Solr, Sunspot, Websolr and Delayed job

on October 11, 2012

Solr is an open source search platform from Apache. Among other things, it offers powerful full-text search.

Solr is written in Java and runs as a standalone search server within a servlet container like Tomcat. When you are working on a Ruby on Rails application you do not want to maintain a Tomcat server. This is where websolr comes into the picture. Websolr manages the index, and the Rails application interacts with the index using a gem called sunspot_rails.

Getting started

# Gemfile
gem 'sunspot_rails', '= 1.3.3' # search feature

Here I am interested in searching products.

class Product < ActiveRecord::Base
  searchable do
    text :name, boost: 1.5
    text :description
  end
end

Using sunspot gem

rails g sunspot_rails:install

The above command creates the config/sunspot.yml file. By default this file looks like the following.

production:
  solr:
    hostname: localhost
    port: 8983
    log_level: WARNING

development:
  solr:
    hostname: localhost
    port: 8982
    log_level: INFO

test:
  solr:
    hostname: localhost
    port: 8981
    log_level: WARNING

The way sunspot works is that after every single web request it commits to Solr the changes that took place in the request. This is not desirable. To turn that off, set the auto_commit_after_request option to false in the config/sunspot.yml file.

I would also change the log_level for development and test to DEBUG. The revised config/sunspot.yml file would look like this.

production:
  solr:
    hostname: localhost
    port: 8983
    log_level: WARNING
    auto_commit_after_request: false

development:
  solr:
    hostname: localhost
    port: 8982
    log_level: DEBUG
    auto_commit_after_request: false

test:
  solr:
    hostname: localhost
    port: 8981
    log_level: DEBUG
    auto_commit_after_request: false
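
One way to sanity-check the revised file is to load it and confirm that the flag is off in every environment. The following is a minimal sketch in plain Ruby, assuming the relevant settings are inlined as a string (in an application you would read config/sunspot.yml instead). The `CONFIG_YAML` constant and the `auto_commit_enabled?` helper are made up for illustration and are not part of sunspot. Treating a missing key as "on" is an assumption that matches the default behaviour described above.

```ruby
require 'yaml'

# Inlined copy of the relevant settings; a hypothetical stand-in for config/sunspot.yml.
CONFIG_YAML = <<~YAML
  production:
    solr:
      log_level: WARNING
      auto_commit_after_request: false
  development:
    solr:
      log_level: DEBUG
      auto_commit_after_request: false
  test:
    solr:
      log_level: DEBUG
      auto_commit_after_request: false
YAML

# Returns true unless the environment explicitly sets the option to false.
# Assumption: a missing key means "commit after each request".
def auto_commit_enabled?(config, env)
  solr = config.fetch(env, {}).fetch('solr', {})
  solr.fetch('auto_commit_after_request', true) != false
end

config = YAML.safe_load(CONFIG_YAML)
config.each_key do |env|
  puts "#{env}: auto commit #{auto_commit_enabled?(config, env) ? 'on' : 'off'}"
end
```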

Taking care of callbacks

In the above case, anytime I create, update or destroy a product, sunspot issues the corresponding Solr commands from model callbacks such as after_save. Since these callbacks are part of the ActiveRecord transaction, they slow down the create, update and destroy operations. I would like all of this work to happen in the background.

Here is how I handled it.

class Product < ActiveRecord::Base
  searchable do
    text :name, boost: 1.5
    text :description
  end
  handle_asynchronously :solr_index, queue: 'indexing', priority: 50
  handle_asynchronously :solr_index!, queue: 'indexing', priority: 50
  handle_asynchronously :remove_from_index, queue: 'indexing', priority: 50
end

In the above case I used Delayed Job but you can use any background job processing tool.
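
To make the mechanism concrete, here is a toy version of the idea in plain Ruby: the original method is kept under another name, and the public name only records a job to be run later. This is a sketch of the general technique, not Delayed Job's implementation. `ToyQueue`, `Deferrable`, `defer` and `ToyProduct` are hypothetical names, and the queue is just an in-memory array.

```ruby
# In-memory stand-in for a job table (hypothetical; a real tool persists its jobs).
class ToyQueue
  Job = Struct.new(:receiver, :method_name, :queue, :priority)

  def self.jobs
    @jobs ||= []
  end

  # Runs every pending job and empties the queue, as a worker process would.
  def self.work_off
    until jobs.empty?
      job = jobs.shift
      job.receiver.send(job.method_name)
    end
  end
end

module Deferrable
  # Replaces `name` with a version that enqueues a call to the original method.
  def defer(name, queue:, priority:)
    original = "#{name}_without_defer"
    alias_method original, name
    define_method(name) do
      ToyQueue.jobs << ToyQueue::Job.new(self, original, queue, priority)
      nil
    end
  end
end

class ToyProduct
  extend Deferrable
  attr_reader :indexed

  def solr_index
    @indexed = true # pretend this talks to Solr
  end
  defer :solr_index, queue: 'indexing', priority: 50
end

product = ToyProduct.new
product.solr_index            # only enqueues; nothing is indexed yet
puts product.indexed.inspect
ToyQueue.work_off             # the "worker" runs the real method
puts product.indexed.inspect
```

The caller returns right after the job is recorded, which is why the save no longer waits for the indexing work.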

In Delayed Job, a higher priority value means a lower priority: jobs with smaller numbers are picked up first. By bumping the priority value to 50, I'm making sure that emails and other background jobs are processed before the Solr work is taken up.
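
The ordering rule is easy to get backwards, so a small illustration may help. The sketch below is plain Ruby, not Delayed Job itself: it sorts some hypothetical jobs by ascending priority number, which is the order described above. The job names, their priority values and the `run_order` helper are made up for the example.

```ruby
PendingJob = Struct.new(:name, :priority)

# Orders jobs the way the text describes: the smallest priority number goes first.
def run_order(jobs)
  jobs.sort_by(&:priority).map(&:name)
end

jobs = [
  PendingJob.new('solr_index', 50),
  PendingJob.new('welcome_email', 0),
  PendingJob.new('invoice_pdf', 10)
]

puts run_order(jobs).inspect
```

With these values the indexing job comes last, behind the email and the PDF job.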

Problem with remove_from_index

In the above case the call to remove_from_index has been deferred to Delayed Job. However, by the time the job runs, the record has already been destroyed. When Delayed Job takes up the work it first tries to retrieve the record, the record is missing, and the background job fails.

Here is how we solved this problem.

class Product < ActiveRecord::Base
  searchable do
    text :name, boost: 1.5
    text :description
  end
  handle_asynchronously :solr_index, queue: 'indexing', priority: 50
  handle_asynchronously :solr_index!, queue: 'indexing', priority: 50

  def remove_from_index_with_delayed
    job = RemoveIndexJob.new(record_class: self.class.to_s, attributes: self.attributes)
    Delayed::Job.enqueue job, queue: 'indexing', priority: 50
  end
  alias_method_chain :remove_from_index, :delayed
end

Add the job class that does the actual removal, for example in a file named remove_index_job.rb so that the file name matches the class name.

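class RemoveIndexJob < Struct.new(:options)
  def perform
    return if options.nil?
    options.symbolize_keys!
    # Rebuild an in-memory record from the saved attributes; no database lookup is needed.
    record = options[:record_class].constantize.new options[:attributes].except("id")
    # Set id separately because it is left out of the attributes passed to new.
    record.id = options[:attributes]["id"]
    # Call the original, synchronous method so the job does not enqueue itself again.
    record.remove_from_index_without_delayed
  end
end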
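
The essential trick is that the job carries enough data to rebuild a throwaway object, so nothing has to be loaded from the database. Here is the same idea stripped of Rails and sunspot so that it can run on its own. `FakeProduct`, `REMOVED_IDS` and `ToyRemoveIndexJob` are hypothetical stand-ins, and `Object.const_get` takes the place of ActiveSupport's constantize.

```ruby
REMOVED_IDS = [] # records which ids were "removed from the index"

# Stand-in for a model; there is no database behind it.
class FakeProduct
  attr_accessor :id, :name

  def initialize(attributes = {})
    attributes.each { |key, value| public_send("#{key}=", value) }
  end

  def attributes
    { 'id' => id, 'name' => name }
  end

  def remove_from_index
    REMOVED_IDS << id # pretend this sends a delete to Solr
  end
end

# Carries the class name and attributes instead of a reference to the record.
ToyRemoveIndexJob = Struct.new(:record_class, :attributes) do
  def perform
    klass = Object.const_get(record_class)
    record = klass.new(attributes.reject { |key, _| key == 'id' })
    record.id = attributes['id'] # set the id separately, mirroring the worker above
    record.remove_from_index
  end
end

product = FakeProduct.new({ 'id' => 42, 'name' => 'Lamp' })
job = ToyRemoveIndexJob.new(product.class.to_s, product.attributes)
product = nil # the original record is gone by the time the job runs
job.perform
puts REMOVED_IDS.inspect
```

Because the job holds only a string and a hash, it can be serialized and run after the row has been deleted.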

Connecting to websolr

It was not clear from the websolr documentation that the sunspot gem first looks for an environment variable called WEBSOLR_URL. If that variable has a value, sunspot assumes that the Solr index is at that URL. If no value is found, it assumes that it is dealing with a local Solr instance.

So if you are using websolr, make sure that your application has the environment variable WEBSOLR_URL properly configured in the staging and production environments.
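
The lookup order can be pictured with a few lines of plain Ruby. This illustrates the behaviour described above and is not sunspot's source. The `solr_url` helper, the example URL and the `/solr` path in the fallback are assumptions made for the sketch.

```ruby
# Picks the Solr endpoint: a non-empty WEBSOLR_URL wins,
# otherwise fall back to a local instance.
def solr_url(env = ENV, fallback = 'http://localhost:8983/solr')
  url = env['WEBSOLR_URL'].to_s.strip
  url.empty? ? fallback : url
end

# No variable set: the local instance is used.
puts solr_url({})

# Variable set: the hosted index is used.
puts solr_url({ 'WEBSOLR_URL' => 'http://index.example.com/solr/abc123' })
```

A check like this in a deploy script is one way to catch a staging or production environment where the variable was never set.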
