Ruby 2.6 adds String#split with block

Taha Husain

July 17, 2018

This blog is part of our Ruby 2.6 series.

Before Ruby 2.6, String#split returned an array of split strings.

In Ruby 2.6, a block can be passed to String#split, which yields each split string to the block so that it can be operated on. This avoids creating an array and is therefore more memory efficient.

We will add a method is_fruit? to understand how to use split with a block.

def is_fruit?(value)
  %w(apple mango banana watermelon grapes guava lychee).include?(value)
end

The input is a comma-separated string containing the names of vegetables and fruits. The goal is to fetch the fruit names from the input string and store them in an array.

String#split
input_str = "apple, mango, potato, banana, cabbage, watermelon, grapes"

splitted_values = input_str.split(", ")
=> ["apple", "mango", "potato", "banana", "cabbage", "watermelon", "grapes"]

fruits = splitted_values.select { |value| is_fruit?(value) }
=> ["apple", "mango", "banana", "watermelon", "grapes"]

With plain split, an intermediate array is created that contains the names of both fruits and vegetables.

String#split with a block
fruits = []

input_str = "apple, mango, potato, banana, cabbage, watermelon, grapes"

input_str.split(", ") { |value| fruits << value if is_fruit?(value) }
=> "apple, mango, potato, banana, cabbage, watermelon, grapes"

fruits
=> ["apple", "mango", "banana", "watermelon", "grapes"]

When a block is passed to split, it returns the string on which split was called and does not create an array. String#split yields each split string to the block, which in our case pushes the fruit names into a separate array.
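As a further illustration of the return value, here is a minimal sketch that sums comma-separated numbers without building an array of fields. The helper name sum_fields and the sample input are hypothetical and are not part of the original example.

```ruby
# Requires Ruby 2.6 or later.

# Hypothetical helper: sums comma-separated integers without
# building an intermediate array of fields.
def sum_fields(line)
  total = 0
  # With a block, split yields each field and returns the receiver.
  returned = line.split(",") { |field| total += field.to_i }
  [total, returned]
end

total, returned = sum_fields("10,20,30,40")
puts total                      # 100
puts returned == "10,20,30,40"  # true
```

Because the block form returns the original string rather than the pieces, any result has to be accumulated inside the block, as total is here.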

Update

Benchmark

We created a large random string to benchmark the performance of split and split with a block.

require 'securerandom'

test_string = ''

100_000.times do
  test_string += SecureRandom.alphanumeric(10)
  test_string += ' '
end

require 'benchmark'

Benchmark.bmbm do |bench|

  bench.report('split') do
    arr = test_string.split(' ')
    str_starts_with_a = arr.select { |str| str.start_with?('a') }
  end

  bench.report('split with block') do
    str_starts_with_a = []
    test_string.split(' ') { |str| str_starts_with_a << str if str.start_with?('a') }
  end

end

Results

Rehearsal ----------------------------------------------------
split              0.023764   0.000911   0.024675 (  0.024686)
split with block   0.012892   0.000553   0.013445 (  0.013486)
------------------------------------------- total: 0.038120sec

                       user     system      total        real
split              0.024107   0.000487   0.024594 (  0.024622)
split with block   0.010613   0.000334   0.010947 (  0.010991)

We did another iteration of benchmarking using benchmark/ips.

require 'benchmark/ips'

Benchmark.ips do |bench|

  bench.report('split') do
    splitted_arr = test_string.split(' ')
    str_starts_with_a = splitted_arr.select { |str| str.start_with?('a') }
  end

  bench.report('split with block') do
    str_starts_with_a = []
    test_string.split(' ') { |str| str_starts_with_a << str if str.start_with?('a') }
  end

  bench.compare!
end

Results

Warming up --------------------------------------
               split     4.000  i/100ms
    split with block    10.000  i/100ms
Calculating -------------------------------------
               split     46.906  (± 2.1%) i/s -    236.000  in   5.033343s
    split with block    107.301  (± 1.9%) i/s -    540.000  in   5.033614s

Comparison:
    split with block:      107.3 i/s
               split:       46.9 i/s - 2.29x  slower

This benchmark shows that split with a block is about 2.3 times faster than split followed by select.
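The benchmarks above measure time only. To get a rough sense of the memory held by the intermediate array that plain split builds, one option is ObjectSpace.memsize_of from the objspace standard library. This is a sketch and was not part of the original benchmark. The helper name intermediate_array_bytes and the sample string are hypothetical, and the number reported depends on the Ruby implementation and version.

```ruby
require 'objspace'

# Hypothetical helper: approximate bytes used by the array that plain
# split builds (the array itself, not the strings it references).
def intermediate_array_bytes(str, separator)
  ObjectSpace.memsize_of(str.split(separator))
end

# Sample input of 100,000 space-separated words.
sample = Array.new(100_000) { |i| "word#{i}" }.join(' ')
puts intermediate_array_bytes(sample, ' ')
```

The block form skips this array entirely, although each yielded substring is still allocated as its own string.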

Here are the relevant commit and discussion for this change.

The Chinese version of this blog is available here.
