Ruby 2.6 adds String#split with block

By Taha Husain

on July 17, 2018

This blog is part of our Ruby 2.6 series.

Before Ruby 2.6, String#split always returned an array of the split strings.

In Ruby 2.6, a block can be passed to String#split. The block is called with each split string in turn, so no array of results is created, which saves memory.

We will add a method, is_fruit?, to show how to use split with a block.

def is_fruit?(value)
  %w(apple mango banana watermelon grapes guava lychee).include?(value)
end

The input is a comma-separated string of vegetable and fruit names. The goal is to fetch the fruit names from the input string and store them in an array.

String#split
input_str = "apple, mango, potato, banana, cabbage, watermelon, grapes"

splitted_values = input_str.split(", ")
# => ["apple", "mango", "potato", "banana", "cabbage", "watermelon", "grapes"]

fruits = splitted_values.select { |value| is_fruit?(value) }
# => ["apple", "mango", "banana", "watermelon", "grapes"]

With plain split, an intermediate array is created that contains the names of both fruits and vegetables.

String#split with a block
fruits = []

input_str = "apple, mango, potato, banana, cabbage, watermelon, grapes"

input_str.split(", ") { |value| fruits << value if is_fruit?(value) }
# => "apple, mango, potato, banana, cabbage, watermelon, grapes"

fruits
# => ["apple", "mango", "banana", "watermelon", "grapes"]

When a block is passed to split, it returns the string on which split was called and does not create an array. String#split yields each split string to the block, which in our case pushes the fruit names into a separate array.
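Because the block form returns the receiver rather than the collected values, it cannot be chained the way split followed by select can. One way to keep call sites tidy is a small wrapper. The sketch below assumes Ruby 2.6 or later; split_select and FRUITS are hypothetical names introduced only for this illustration and are not part of Ruby.

```ruby
# Hypothetical helper (not part of Ruby): splits str on separator and
# returns only the pieces for which the caller's block is truthy.
def split_select(str, separator)
  selected = []
  # The block form of split yields each piece; yield hands it to the
  # block given to split_select.
  str.split(separator) { |piece| selected << piece if yield(piece) }
  selected
end

# Hypothetical constant mirroring the list used by is_fruit?.
FRUITS = %w(apple mango banana watermelon grapes guava lychee).freeze

input_str = "apple, mango, potato, banana, cabbage, watermelon, grapes"
fruits = split_select(input_str, ", ") { |value| FRUITS.include?(value) }
p fruits
```

The wrapper still allocates the result array, but only the pieces that pass the check are ever stored in it.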

Update

Benchmark

We created a large random string to benchmark the performance of split and split with a block.

require 'securerandom'

# Unary plus returns an unfrozen string that can be appended to in place.
test_string = +''

# Build 100,000 random 10-character words separated by spaces.
100_000.times do
  test_string << SecureRandom.alphanumeric(10)
  test_string << ' '
end

require 'benchmark'

Benchmark.bmbm do |bench|

  bench.report('split') do
    arr = test_string.split(' ')
    str_starts_with_a = arr.select { |str| str.start_with?('a') }
  end

  bench.report('split with block') do
    str_starts_with_a = []
    test_string.split(' ') { |str| str_starts_with_a << str if str.start_with?('a') }
  end

end

Results

Rehearsal ----------------------------------------------------
split              0.023764   0.000911   0.024675 (  0.024686)
split with block   0.012892   0.000553   0.013445 (  0.013486)
------------------------------------------- total: 0.038120sec

                       user     system      total        real
split              0.024107   0.000487   0.024594 (  0.024622)
split with block   0.010613   0.000334   0.010947 (  0.010991)

We ran a second round of benchmarking using benchmark/ips, which is provided by the benchmark-ips gem.

require 'benchmark/ips'
Benchmark.ips do |bench|

  bench.report('split') do
    splitted_arr = test_string.split(' ')
    str_starts_with_a = splitted_arr.select { |str| str.start_with?('a') }
  end

  bench.report('split with block') do
    str_starts_with_a = []
    test_string.split(' ') { |str| str_starts_with_a << str if str.start_with?('a') }
  end

  bench.compare!
end

Results

Warming up --------------------------------------
               split     4.000  i/100ms
    split with block    10.000  i/100ms
Calculating -------------------------------------
               split     46.906  (± 2.1%) i/s -    236.000  in   5.033343s
    split with block    107.301  (± 1.9%) i/s -    540.000  in   5.033614s

Comparison:
    split with block:      107.3 i/s
               split:       46.9 i/s - 2.29x  slower

In this benchmark, split with a block is about 2.3 times faster than split followed by select.
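The benchmarks above cover speed. For the memory side, a rough sketch like the one below can show how large the intermediate array is that the block form never builds. It uses ObjectSpace.memsize_of from the objspace standard library, which reports an approximate, implementation-specific size, so the numbers should be treated as a hint rather than an exact measurement. The words string is made-up sample data for this illustration, not the benchmark input.

```ruby
require 'objspace'

# Made-up sample data: 7 words repeated 1,000 times.
words = "apple mango potato banana cabbage watermelon grapes " * 1_000

# Without a block, split materialises every piece in one array.
intermediate = words.split(' ')

# Approximate size in bytes of the array object itself (its slots),
# not of the strings it references.
array_bytes = ObjectSpace.memsize_of(intermediate)

# With a block, only the matching pieces are stored.
starts_with_a = []
words.split(' ') { |word| starts_with_a << word if word.start_with?('a') }
kept_bytes = ObjectSpace.memsize_of(starts_with_a)

puts "intermediate array: #{intermediate.size} elements, ~#{array_bytes} bytes"
puts "kept array:         #{starts_with_a.size} elements, ~#{kept_bytes} bytes"
```

Each piece is still allocated as its own String in both forms, so the saving is mostly the array that holds all the pieces at once.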

Here are the relevant commit and discussion for this change.

The Chinese version of this blog is available here.