Rex, Rexical and Rails routing

Please read Journey into Rails routing for background on this Rails routing discussion.

A new language

Let's say that the route definition looks like this.

/page/:id(/:action)(.:format)

The task at hand is to develop a new programming language that understands the rules of route definitions. Since this language deals with routes, let's call it Poutes. Well, Pout sounds better, so let's roll with that.

It all begins with a scanner

rexical is a gem that provides a scanner generator. Notice that rexical is not a scanner itself; it generates a scanner for the rules you give it. Let's give it a try.

Create a folder called pout_language and in that folder create a file called pout_scanner.rex. Notice that the extension of the file is .rex.

class PoutScanner
end

Before we proceed any further, let's compile to make sure it works.

$ gem install rexical
$ rex pout_scanner.rex -o pout_scanner.rb
$ ls
pout_scanner.rb pout_scanner.rex

When installing, do not run gem install rex. The gem we want is called rexical, not rex.

Time to add rules

Now it's time to add rules to our pout_scanner.rex file.

Let's try to develop a scanner that can tell integers from strings.

class PoutScanner
rule
  \d+         { puts "Detected number" }
  [a-zA-Z]+   { puts "Detected string" }
end

Regenerate the scanner.

$ rex pout_scanner.rex -o pout_scanner.rb

Now let's put the scanner to the test. Let's create pout.rb.

require './pout_scanner.rb'
class Pout
  @scanner = PoutScanner.new
  @scanner.tokenize("123")
end

You will get the error undefined method `tokenize' for #<PoutScanner:0x007f9630837980> (NoMethodError).

To fix this error, open pout_scanner.rex and add an inner section like this.

class PoutScanner
rule
  \d+         { puts "Detected number" }
  [a-zA-Z]+   { puts "Detected string" }

inner
  def tokenize(code)
    scan_setup(code)
    tokens = []
    while token = next_token
      tokens << token
    end
    tokens
  end
end

Regenerate the scanner by executing rex pout_scanner.rex -o pout_scanner.rb. Now let's run the pout.rb file.

$ ruby pout.rb
Detected number

So this time we got some result.

Now let's test for a string.

require './pout_scanner.rb'

class Pout
  @scanner = PoutScanner.new
  @scanner.tokenize("hello")
end

$ ruby pout.rb
Detected string

So the scanner correctly distinguishes strings from integers. We are going to add a lot more tests, so let's create a test file so that we do not have to keep changing pout.rb.

Tests and Rake file

This is our pout_test.rb file.

require 'test/unit'
require './pout_scanner'

class PoutTest < Test::Unit::TestCase
  def setup
    @scanner = PoutScanner.new
  end

  def test_standalone_string
    assert_equal [[:STRING, 'hello']], @scanner.tokenize("hello")
  end
end

And this is our Rakefile.

require 'rake'
require 'rake/testtask'

task :generate_scanner do
  `rex pout_scanner.rex -o pout_scanner.rb`
end

task :default => [:generate_scanner, :test_units]

desc "Run basic tests"
Rake::TestTask.new("test_units") { |t|
  t.pattern = '*_test.rb'
  t.verbose = true
  t.warning = true
}

Also, let's change the pout_scanner.rex file so that each rule returns an array instead of calling puts. The array contains the type of the element and its value.

class PoutScanner
rule
  \d+         { [:INTEGER, text.to_i] }
  [a-zA-Z]+   { [:STRING, text] }

inner
  def tokenize(code)
    scan_setup(code)
    tokens = []
    while token = next_token
      tokens << token
    end
    tokens
  end
end
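
To make the token shape concrete, here is a small sketch in plain Ruby. It does not use the generated scanner: the token list is typed in by hand to look like the scanner's output, and describe is a made-up helper name used only for illustration.

```ruby
# Hypothetical token list, shaped like what tokenize is expected to return.
tokens = [[:STRING, 'hello'], [:INTEGER, 123]]

# Each token is a two-element array [type, value], so block parameters
# can destructure it directly.
def describe(tokens)
  tokens.map { |type, value| "#{type}=#{value.inspect}" }
end

puts describe(tokens)
```

Because every token has the same two-element shape, later stages can branch on the type and ignore how the value was matched.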

With all this set up, all we need to do now is write tests and run rake.

Tests for integer

I added the following test and it passed.

def test_standalone_integer
  assert_equal [[:INTEGER, 123]], @scanner.tokenize("123")
end

However, the following test failed.

def test_string_and_integer
  assert_equal [[:STRING, 'hello'], [:INTEGER, 123]], @scanner.tokenize("hello 123")
end

The test fails with the following message.

  1) Error:
test_string_and_integer(PoutTest):
PoutScanner::ScanError: can not match: ' 123'

Notice that in the error message there is a space before 123. The scanner does not know how to handle a space. Let's fix that.

Here is the updated rule. We do not want any action to be taken when whitespace is detected, so the \s+ rule has no action block. Now the test passes.

class PoutScanner
rule
  \s+
  \d+         { [:INTEGER, text.to_i] }
  [a-zA-Z]+   { [:STRING, text] }

inner
  def tokenize(code)
    scan_setup(code)
    tokens = []
    while token = next_token
      tokens << token
    end
    tokens
  end
end
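
If it helps to see what these three rules amount to, here is a rough hand-written equivalent using StringScanner from Ruby's standard library. This is only a sketch and not the code that rex generates; sketch_tokenize is a made-up name, and the error message merely imitates the one shown earlier.

```ruby
require 'strscan'

# Hand-written stand-in for the three rules above; not the generated scanner.
def sketch_tokenize(code)
  ss = StringScanner.new(code)
  tokens = []
  until ss.eos?
    if ss.skip(/\s+/)
      next                               # whitespace: consume it, emit nothing
    elsif (text = ss.scan(/\d+/))
      tokens << [:INTEGER, text.to_i]
    elsif (text = ss.scan(/[a-zA-Z]+/))
      tokens << [:STRING, text]
    else
      raise "can not match: '#{ss.rest}'" # no rule applies at this position
    end
  end
  tokens
end

p sketch_tokenize("hello 123")
```

Remove the whitespace branch and the same input stops at ' 123', which is the situation the failing test ran into.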

Back to routing business

Now that we have some background on how scanning works, let's get back to the business at hand. The task is to properly parse a routing statement like /page/:id(/:action)(.:format).

Test for slash

The simplest route is one with just /. Let's write a test and then a rule for it.

require 'test/unit'
require './pout_scanner'

class PoutTest < Test::Unit::TestCase
  def setup
    @scanner = PoutScanner.new
  end

  def test_just_slash
    assert_equal [[:SLASH, '/']], @scanner.tokenize("/")
  end

end

And here is the .rex file.

class PoutScanner
rule
  \/         { [:SLASH, text] }

inner
  def tokenize(code)
    scan_setup(code)
    tokens = []
    while token = next_token
      tokens << token
    end
    tokens
  end
end

Test for /page

Here is the test for /page.

def test_slash_and_literal
  assert_equal [[:SLASH, '/'], [:LITERAL, 'page']], @scanner.tokenize("/page")
end

And here is the rule that was added.

  [a-zA-Z]+  { [:LITERAL, text] }

Test for /:page

Here is the test for /:page.

def test_slash_and_symbol
  assert_equal [[:SLASH, '/'], [:SYMBOL, ':page']], @scanner.tokenize("/:page")
end

And here are the rules.

rule
  \/          { [:SLASH, text]   }
  \:[a-zA-Z]+ { [:SYMBOL, text]  }
  [a-zA-Z]+   { [:LITERAL, text] }

Test for /(:page)

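Here is the test for /(:page). Note the extra level of nesting in the expected value: the new rule returns all four tokens as a single array, and tokenize wraps that in its own array.

def test_symbol_with_paran
  assert_equal [[[:SLASH, '/'], [:LPAREN, '('], [:SYMBOL, ':page'], [:RPAREN, ')']]], @scanner.tokenize("/(:page)")
end

And here is the new rule.

  \/\(\:[a-z]+\) { [ [:SLASH, '/'], [:LPAREN, '('], [:SYMBOL, text[2..-2]], [:RPAREN, ')']] }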
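
Where this rule sits among the others matters. The sketch below assumes that rules are tried from top to bottom and that the first one matching at the current position wins, which is how the generated scanner appears to behave. It uses StringScanner from the standard library rather than the generated scanner, and first_match and the :COMBINED label are made-up names for illustration.

```ruby
require 'strscan'

# Hypothetical helper: try each [regex, label] pair in order and return
# the label and matched text of the first rule that matches at the start.
def first_match(rules, input)
  ss = StringScanner.new(input)
  rules.each do |regex, label|
    text = ss.scan(regex)
    return [label, text] if text
  end
  nil
end

combined = [/\/\(:[a-z]+\)/, :COMBINED]
slash    = [/\//, :SLASH]

p first_match([combined, slash], "/(:page)") # combined rule listed first
p first_match([slash, combined], "/(:page)") # plain slash rule listed first
```

With the combined rule first, the whole input is consumed in one go. With the slash rule first, only "/" is consumed, and under the same assumption the remaining "(:page)" would match none of the other rules. That is why the longer rule goes at the top.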

We'll stop here and look at the final set of files.

Final files

This is the Rakefile.

require 'rake'
require 'rake/testtask'

task :generate_scanner do
  `rex pout_scanner.rex -o pout_scanner.rb`
end

task :default => [:generate_scanner, :test_units]

desc "Run basic tests"
Rake::TestTask.new("test_units") { |t|
  t.pattern = '*_test.rb'
  t.verbose = true
  t.warning = true
}

This is pout_scanner.rex.

class PoutScanner
rule
  \/\(\:[a-z]+\) { [ [:SLASH, '/'], [:LPAREN, '('], [:SYMBOL, text[2..-2]], [:RPAREN, ')']] }
  \/          { [:SLASH, text]   }
  \:[a-zA-Z]+ { [:SYMBOL, text]  }
  [a-zA-Z]+   { [:LITERAL, text] }

inner
  def tokenize(code)
    scan_setup(code)
    tokens = []
    while token = next_token
      tokens << token
    end
    tokens
  end
end

This is pout_test.rb.

require 'test/unit'
require './pout_scanner'

class PoutTest < Test::Unit::TestCase
  def setup
    @scanner = PoutScanner.new
  end

  def test_just_slash
    assert_equal [[:SLASH, '/']], @scanner.tokenize("/")
  end

  def test_slash_and_literal
    assert_equal [[:SLASH, '/'], [:LITERAL, 'page']], @scanner.tokenize("/page")
  end

  def test_slash_and_symbol
    assert_equal [[:SLASH, '/'], [:SYMBOL, ':page']], @scanner.tokenize("/:page")
  end

  def test_symbol_with_paran
    assert_equal [[[:SLASH, '/'], [:LPAREN, '('], [:SYMBOL, ':page'], [:RPAREN, ')']]], @scanner.tokenize("/(:page)")
  end
end

How scanner works

Here we used rex to generate the scanner. Now take a look at the generated pout_scanner.rb file and study the code. It is only 91 lines of code.

If you look at the code, it is clear that scanning is not that hard. You can hand-roll it without using a tool like rex. And that's exactly what Aaron Patterson did in Journey. He hand-rolled the scanner.
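
To give a feel for what hand-rolling involves, here is a minimal sketch built on StringScanner from Ruby's standard library. It is not Journey's actual scanner. The class name HandRolledPoutScanner and the :DOT token are assumptions made for this example, and unlike the combined rule used earlier, it emits parentheses as separate tokens so that the full route from the start of this post can be scanned.

```ruby
require 'strscan'

# A hand-rolled sketch, not Journey's real implementation.
# :DOT is an assumed token name; the others follow the names used above.
class HandRolledPoutScanner
  RULES = [
    [/\//,         :SLASH],
    [/\./,         :DOT],
    [/\(/,         :LPAREN],
    [/\)/,         :RPAREN],
    [/:[a-zA-Z]+/, :SYMBOL],
    [/[a-zA-Z]+/,  :LITERAL]
  ].freeze

  def tokenize(code)
    ss = StringScanner.new(code)
    tokens = []
    until ss.eos?
      # Pick the first rule whose pattern matches at the current position.
      regex, type = RULES.find { |pattern, _| ss.match?(pattern) }
      raise "can not match: '#{ss.rest}'" unless regex
      tokens << [type, ss.scan(regex)]
    end
    tokens
  end
end

p HandRolledPoutScanner.new.tokenize("/page/:id(/:action)(.:format)")
```

Emitting each parenthesis as its own token keeps the result a flat list of [type, value] pairs, which leaves the job of grouping optional segments to a later parsing step.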

Conclusion

In this blog we saw how to use rex to build a scanner that reads our routing statements. In the next blog we'll see how to parse a routing statement and how to find the matching routing statement for a given URL.
