<?xml version="1.0" encoding="utf-8"?>
    <feed xmlns="http://www.w3.org/2005/Atom">
     <title>BigBinary Blog</title>
     <link href="https://www.bigbinary.com/feed.xml" rel="self"/>
     <link href="https://www.bigbinary.com/"/>
     <updated>2026-03-08T08:02:12+00:00</updated>
     <id>https://www.bigbinary.com/</id>
     <entry>
       <title><![CDATA[Understanding Queueing Theory]]></title>
       <author><name>Vishnu M</name></author>
      <link href="https://www.bigbinary.com/blog/understanding-queueing-theory"/>
      <updated>2025-06-03T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/understanding-queueing-theory</id>
      <content type="html"><![CDATA[<p><em>This is Part 6 of our blog series on <a href="/blog/scaling-rails-series">scaling Rails applications</a>.</em></p><hr><h2>Queueing Systems</h2><p>In web applications, not every task needs to be processed immediately. When you upload a large video file, send a bulk email campaign, or generate a complex report, these time-consuming operations are often handled in the background. This is where queueing systems like <a href="https://sidekiq.org/">Sidekiq</a> or <a href="https://github.com/rails/solid_queue">Solid Queue</a> come into play.</p><p>Queueing theory helps us understand how these systems behave under different conditions - from quiet periods to peak load times.</p><p>Let's understand the fundamentals of queueing theory.</p><h2>Basic Terminology in Queueing Theory</h2><ol><li><p><strong>Unit of Work</strong>: This is the individual item needing service - a job.</p></li><li><p><strong>Server</strong>: This is one &quot;unit of parallel processing capacity.&quot; In queueing theory, this doesn't necessarily mean a physical server. It refers to the ability to process one unit of work at a time. For JRuby or TruffleRuby, each thread can be considered a separate &quot;server&quot; since they can execute in parallel. For CRuby/MRI, because of the GVL, the concept of a &quot;server&quot; is different. We'll discuss it later.</p></li><li><p><strong>Queue Discipline</strong>: This is the rule determining which unit of work is selected next from the queue. For Sidekiq and Solid Queue, it is FCFS (First Come First Serve). If there are multiple queues, which job is selected depends on the priority of the queue.</p></li><li><p><strong>Service Time</strong>: The actual time it takes to process a unit of work (how long a job takes to execute).</p></li><li><p><strong>Latency/Wait Time</strong>: How long jobs spend waiting in the queue before being processed.</p></li><li><p><strong>Total Time</strong>: The sum of service time and wait time. 
It's the complete duration from when a job is enqueued until it finishes executing.</p></li></ol><h2>Little's law</h2><p>Little's law is a theorem in queueing theory that states that the average number of jobs in a system is equal to the average arrival rate of new jobs multiplied by the average time a job spends in the system.</p><pre><code class="language-ruby">L = λW</code></pre><p>L = Average number of jobs in the system <br>λ = Average arrival rate of new jobs <br>W = Average time a job spends in the system</p><p>For example, if jobs arrive at a rate of 10 per minute (λ), and each job takes 30 seconds (W) to complete:</p><p>Average number of jobs in system = 10 jobs/minute * 0.5 minutes = 5 jobs</p><p>This helps us understand the current state of our system and that there are 5 jobs in the system on average at any given moment.</p><p><code>L</code> is also called <strong>offered traffic</strong>.</p><p>Note: Little's Law assumes the arrival rate is consistent over time.</p><h3>Managing Utilization</h3><p>Utilization measures how busy our processing capacity is.</p><p>Mathematically, it is the ratio of how much processing capacity we're using to the processing capacity we have.</p><pre><code class="language-ruby">utilization = (average number of jobs in the system / capacity to handle jobs) * 100</code></pre><p>In other words, it could be written as follows.</p><pre><code class="language-ruby">utilization = (offered_traffic / parallelism) * 100</code></pre><p>For example, if we are using Sidekiq to manage our background jobs, then in a single-threaded case, parallelism is equal to the number of Sidekiq processes.</p><p>Let's look at a practical case with numbers:</p><ul><li>We have 30 jobs arriving every minute</li><li>It takes 0.5 minutes to process a job</li><li>We have 20 Sidekiq processes</li></ul><p>In this case, the utilization will be:</p><pre><code class="language-ruby">utilization = ((30 jobs/minute * 0.5 minutes) / 20 processes) * 100 = 75%</code></pre><h3>High 
utilization is bad for performance</h3><p>Let's assume we maintain 100% utilization in our system. It means that if, on average, we get 30 jobs per minute, then we have just enough capacity to handle 30 jobs per minute.</p><p>One day, we started getting 45 jobs per minute. Since utilization is at 100%, there is no extra room to accommodate the additional load. This leads to higher latency.</p><p>Hence, having a high utilization rate may result in low performance, as it can lead to higher latency for specific jobs.</p><h3>The knee curve</h3><p>Mathematically, it would seem that only when the utilization rate hits 100% should the latency spike up. However, in the real world, it has been found that latency begins to increase dramatically when utilization reaches around 70-75%.</p><p>If we draw a graph between utilization and performance then the graph would look something like this.</p><p><img src="/blog_images/2025/understanding-queueing-theory/knee-curve.png" alt="The knee curve"></p><p>The point at which the curve bends sharply upwards is called the &quot;knee&quot; in the performance curve. At this point, the exponential effects predicted by queueing theory become pronounced, causing the queue latency to climb up quickly.</p><p>Running any system consistently above 70-75% utilization significantly increases the risk of spiking the latency, as jobs spend more and more time waiting.</p><p>This would directly impact the customer experience, as it could result in delays in sending emails or making calls to Twilio to send SMS messages, etc.</p><p>Tracking this latency will be covered in the upcoming blogs. The tracking of metrics depends on the queueing backend used (Sidekiq or Solid Queue).</p><h2>Concurrency and theoretical parallelism</h2><p>In Sidekiq, a process is the primary unit of parallelism. 
However, concurrency (threads per process) significantly impacts a process's effective throughput. Because of the GVL, we need to take into account how long the job is waiting for I/O.</p><p>The more time a job spends waiting on external resources (like databases or APIs) rather than executing Ruby code, the more other threads within the same process can run Ruby code while the first thread waits.</p><p>We learned about Amdahl's law in <a href="/blog/amdahls-law-the-theoretical-relationship-between-speedup-and-concurrency">Part 3</a> of this series.</p><p><img src="/blog_images/2025/understanding-queueing-theory/amdahls-law.png" alt="Amdahl's law"></p><p>Where:</p><p><code>p</code> is the portion that can be parallelized (the I/O percentage)</p><p><code>n</code> is the number of threads (concurrency)</p><p>Speedup is equivalent to theoretical parallelism in this context. In queueing theory, parallelism refers to how many units of work can be processed simultaneously. When we calculate speedup using Amdahl's Law, we're essentially determining how much faster a multi-threaded system can handle work compared to a single-threaded system.</p><p>Let's assume that a system has an I/O of 50% and a concurrency of 10. Then Speedup will be:</p><pre><code class="language-ruby">Speedup = 1 / ((1 - 0.5) + 0.5 / 10) = 1 / 0.55 = 1.82 ≈ 2</code></pre><p>This means one Sidekiq process with 10 threads will handle jobs twice as fast as Sidekiq with a single process with a single thread.</p><p>Let's recap what we are saying here. We are assuming that the system has I/O of 50%. The system is using a single Sidekiq process with 10 threads (concurrency). Then, because of the 10 threads, the system has a speed gain of 2x compared to a system having just a single thread. In other words, just because we have 10 threads running, we are not going to gain a 10x performance improvement. 
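As a quick sanity check on this arithmetic, the Amdahl's law formula above can be expressed in a few lines of Ruby. This is a minimal sketch we added for illustration; the method name is ours, not from the original post.

```ruby
# Amdahl's law: speedup = 1 / ((1 - p) + p / n)
# p = parallelizable portion (the I/O fraction), n = number of threads.
def theoretical_parallelism(io_fraction, threads)
  1.0 / ((1 - io_fraction) + io_fraction / threads.to_f)
end

puts theoretical_parallelism(0.5, 10).round(2) # => 1.82, i.e. roughly 2
```

Plugging in other I/O fractions and thread counts reproduces the figures discussed in this section.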
What those 10 threads are getting us is what is called &quot;theoretical parallelism&quot;.</p><p>Similarly, for other values of I/O and concurrency, we can get the theoretical parallelism.</p><table><thead><tr><th>I/O</th><th>Concurrency</th><th>Theoretical parallelism</th></tr></thead><tbody><tr><td>5%</td><td>1</td><td>1</td></tr><tr><td>25%</td><td>5</td><td>1.25</td></tr><tr><td>50%</td><td>10</td><td>2</td></tr><tr><td>75%</td><td>16</td><td>3</td></tr><tr><td>90%</td><td>32</td><td>8</td></tr><tr><td>95%</td><td>64</td><td>16</td></tr></tbody></table><p>Let's go over it one more time. In the last example, what we are stating is that if a system has 95% I/O and the system has 64 threads running, then that will give a 16x performance improvement over the same system running on a single thread.</p><p>Here is the graph for this data.</p><p><img src="/blog_images/2025/understanding-queueing-theory/concurrency-vs-effective-parallelism.png" alt="Theoretical Parallelism"></p><p>As shown in the graph, a Sidekiq process with 16 threads handling jobs that are 75% I/O-bound achieves a theoretical parallelism of approximately 3. In other words, there is a 3x performance improvement over a single-threaded system.</p><h2>Calculating the number of processes required</h2><p>At the beginning of this article, we discussed &quot;Little's law&quot; and we discussed that <code>L</code> is also called &quot;offered traffic,&quot; which depicts the &quot;average number of jobs in the system&quot;.</p><p>If &quot;offered traffic&quot; is 5, then it means we have 5 units of work arriving on average that require processing simultaneously.</p><p>We just learned that if the utilization is greater than 75% then it can cause problems, as there is a risk of the latency spiking.</p><p>For queues with low latency requirements (e.g. <code>urgent</code>), we need to target a lower utilization rate. 
Let's say we want utilization to be around 50% to be on the safe side.</p><p>Now we know the utilization rate that we need to target and we know the &quot;offered traffic&quot;. So now we can calculate the &quot;parallelism&quot;.</p><pre><code class="language-ruby">utilization = offered_traffic / parallelism
=&gt; 0.50 = 5 / parallelism
=&gt; parallelism = 5 / 0.50 = 10</code></pre><p>This means we need a theoretical parallelism of 10 to ensure that the utilization is 50% at max.</p><p>Let's assume the jobs in this queue have an average of <code>50%</code> I/O. Based on the above-mentioned graph, we can see that if the concurrency is 10 then we get a parallelism of 2. However, increasing the concurrency further doesn't increase the parallelism. It means if we want a parallelism of 10, then we can't just switch to a concurrency of 50. Even a concurrency of 50 (or 50 threads) will only yield a parallelism of about 2.</p><p>So we have no choice but to add more processes. Since one process with a concurrency of 10 is yielding a parallelism of 2, we need to add 5 processes to get a parallelism of 10.</p><p><em>To get the I/O wait percentage, we can make use of perfm. <a href="https://github.com/bigbinary/perfm?tab=readme-ov-file#sidekiq-gvl-instrumentation">Here</a> is the documentation on how it can be done.</em></p><pre><code class="language-ruby">Total number of Sidekiq processes required = 10 / 2 = 5</code></pre><p>Here we're talking about the Sidekiq free version, where we'll only be able to run a single process per dyno. If we're using Sidekiq Pro, we can run multiple processes per dyno via Sidekiq Swarm.</p><p>We can provision 5 dynos for the urgent queue. But we should always have a queue-time based autoscaler like Judoscale enabled to handle spikes.</p><h2>Sources of Saturation</h2><p>We discussed earlier that, in the context of queueing theory, the saturation point is typically reached at around 70-75% utilization. 
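Before moving on, the capacity-planning arithmetic from the previous section can be condensed into a short Ruby sketch. These are our own illustrative helpers, using the example's numbers (offered traffic of 5, 50% target utilization, per-process parallelism of 2).

```ruby
# Target total parallelism so that utilization stays at or below the target:
# utilization = offered_traffic / parallelism.
def required_parallelism(offered_traffic, target_utilization)
  (offered_traffic / target_utilization).ceil
end

# Divide by what one process achieves (from Amdahl's law) to get process count.
def required_processes(total_parallelism, per_process_parallelism)
  (total_parallelism.to_f / per_process_parallelism).ceil
end

parallelism = required_parallelism(5, 0.5) # => 10
required_processes(parallelism, 2)         # => 5 Sidekiq processes
```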
This is from the point of view of further gains by adding more threads.</p><p>However, saturation can occur in other parts of the system.</p><h3>CPU</h3><p>The servers running your Sidekiq processes have finite CPU and memory. While CPU usage is a metric we can track for Sidekiq, it's generally not the only one we need to focus on for scaling decisions.</p><p>CPU utilization can be misleading. If our jobs spend most of their time doing I/O (like making API calls or database queries), CPU usage will be very low even when our Sidekiq system is at capacity.</p><h3>Memory</h3><p>Memory utilization impacts performance very differently from CPU utilization. Memory utilization generally exhibits minimal changes in latency or throughput from 0% to 100% utilization. However, after 100% utilization, things will start to deteriorate significantly. The system will start using the swap memory, which can be very slow and thereby increase the job service times.</p><h3>Redis</h3><p>Another place where saturation can occur is in our datastore, i.e. Redis in the case of Sidekiq. We have to make sure that we provision a separate Redis instance for Sidekiq and also make sure to set the eviction policy to <code>noeviction</code>. This ensures that Redis will reject new data when the memory limit is reached, resulting in an explicit failure rather than silently dropping important jobs.</p><p><em>This was Part 6 of our blog series on <a href="/blog/scaling-rails-series">scaling Rails applications</a>. If any part of the blog is not clear to you then please write to us at <a href="https://www.linkedin.com/company/bigbinary">LinkedIn</a>, <a href="https://twitter.com/bigbinary">Twitter</a> or the <a href="https://bigbinary.com/contact">BigBinary website</a>.</em></p>]]></content>
    </entry><entry>
       <title><![CDATA[How we migrated from Sidekiq to Solid Queue]]></title>
       <author><name>Chirag Shah</name></author>
      <link href="https://www.bigbinary.com/blog/migrating-to-solid-queue-from-sidekiq"/>
      <updated>2024-03-05T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/migrating-to-solid-queue-from-sidekiq</id>
      <content type="html"><![CDATA[<p>BigBinary is building a suite of products under <a href="https://neeto.com">neeto</a>. We currently have around 22 products under development, and all of the products are using <a href="https://github.com/sidekiq/sidekiq">Sidekiq</a>. After the <a href="https://dev.37signals.com/introducing-solid-queue/">launch of Solid Queue</a>, we decided to migrate <a href="https://neeto.com/neetoform">NeetoForm</a> from Sidekiq to Solid Queue.</p><p>Please note that Solid Queue currently doesn't support cron-style or recurring jobs. There is a <a href="https://github.com/basecamp/solid_queue/pull/155">PR open</a> regarding this issue. We have only partially migrated to Solid Queue. For recurring jobs, we are still using Sidekiq. Once the PR is merged, we will migrate completely to Solid Queue.</p><h2>Migrating to Solid Queue from Sidekiq</h2><p>Here is a step-by-step migration guide you can use to migrate your Rails application from Sidekiq to Solid Queue.</p><h2>1. Installation</h2><ul><li>Add <code>gem &quot;solid_queue&quot;</code> to your Rails application's Gemfile and run <code>bundle install</code>.</li><li>Run <code>bin/rails generate solid_queue:install</code>, which copies the config file and the required migrations.</li><li>Run the migrations using <code>bin/rails db:migrate</code>.</li></ul><h2>2. Configuration</h2><p>The installation step should have created a <code>config/solid_queue.yml</code> file. Uncomment the file and modify it as per your needs. 
Here is how the file looks for our application.</p><pre><code class="language-yaml">default: &amp;default
  dispatchers:
    - polling_interval: 1
      batch_size: 500
  workers:
    - queues: &quot;auth&quot;
      threads: 3
      processes: 1
      polling_interval: 0.1
    - queues: &quot;urgent&quot;
      threads: 3
      processes: 1
      polling_interval: 0.1
    - queues: &quot;low&quot;
      threads: 3
      processes: 1
      polling_interval: 2
    - queues: &quot;*&quot;
      threads: 3
      processes: 1
      polling_interval: 1

development:
  &lt;&lt;: *default

staging:
  &lt;&lt;: *default

heroku:
  &lt;&lt;: *default

test:
  &lt;&lt;: *default

production:
  &lt;&lt;: *default</code></pre><h2>3. Starting Solid Queue</h2><p>On your development machine, you can start Solid Queue by running the following command.</p><pre><code>bundle exec rake solid_queue:start</code></pre><p>This will start Solid Queue's supervisor process and will start processing any enqueued jobs. The supervisor process forks <a href="https://github.com/basecamp/solid_queue?tab=readme-ov-file#workers-and-dispatchers">workers and dispatchers</a> according to the configuration provided in the <code>config/solid_queue.yml</code> file. The supervisor process also controls the heartbeats of workers and dispatchers, and sends signals to stop and start them when needed.</p><p>Since we use <a href="https://github.com/ddollar/foreman">foreman</a>, we added the above command to our Procfile.</p><pre><code class="language-ruby"># Procfile
web: bundle exec puma -C config/puma.rb
worker: bundle exec sidekiq -C config/sidekiq.yml
solidqueueworker: bundle exec rake solid_queue:start
release: bundle exec rake db:migrate</code></pre><h2>4. 
Setting the Active Job queue adapter</h2><p>You can set the Active Job queue adapter to <code>:solid_queue</code> by adding the following line in your <code>application.rb</code> file.</p><pre><code class="language-ruby"># application.rb
config.active_job.queue_adapter = :solid_queue</code></pre><p>The above change sets the queue adapter at the application level for all the jobs. However, since we wanted to use Solid Queue for our regular jobs and continue using Sidekiq for cron jobs, we didn't make the above change in <code>application.rb</code>.</p><p>Instead, we created a new base class that inherited from <code>ApplicationJob</code> and set the queue adapter to <code>:solid_queue</code> inside it.</p><pre><code class="language-ruby"># sq_base_job.rb
class SqBaseJob &lt; ApplicationJob
  self.queue_adapter = :solid_queue
end</code></pre><p>Then we made all the classes implementing regular jobs inherit from this new class <code>SqBaseJob</code> instead of <code>ApplicationJob</code>.</p><pre><code class="language-diff"># send_email_job.rb
- class SendEmailJob &lt; ApplicationJob
+ class SendEmailJob &lt; SqBaseJob
  # ...
end</code></pre><p>By making the above change, all our regular jobs got enqueued via Solid Queue instead of Sidekiq.</p><p>But we realized later that emails were still being sent via Sidekiq. On debugging and looking into Rails internals, we found that <code>ActionMailer</code> uses <code>ActionMailer::MailDeliveryJob</code> for enqueuing or sending emails.</p><p><code>ActionMailer::MailDeliveryJob</code> inherits from <code>ActiveJob::Base</code> rather than the application's <code>ApplicationJob</code>. So even if we set the queue_adapter in <code>application_job.rb</code>, it won't work. <code>ActionMailer::MailDeliveryJob</code> falls back to using the adapter defined in <code>application.rb</code> or environment-specific (production.rb / staging.rb / development.rb) config files. 
But we can't do that because we still want to use Sidekiq for cron jobs.</p><p>To use Solid Queue for mailers, we needed to override the queue_adapter for mailers. We can do that in <code>application_mailer.rb</code>.</p><pre><code class="language-ruby"># application_mailer.rb
class ApplicationMailer &lt; ActionMailer::Base
  # ...
  ActionMailer::MailDeliveryJob.queue_adapter = :solid_queue
end</code></pre><p>This change is only needed while we use both Sidekiq and Solid Queue. Once the cron-style jobs feature lands in Solid Queue, we can remove this override and set the queue_adapter directly in <code>application.rb</code>, which will enforce the setting globally.</p><h2>5. Code changes</h2><p>For migrating from Sidekiq to Solid Queue, we had to make the following changes to the syntax for enqueuing a job.</p><ul><li>Replaced <code>.perform_async</code> with <code>.perform_later</code>.</li><li>Replaced <code>.perform_at</code> with <code>.set(...).perform_later(...)</code>.</li></ul><pre><code class="language-diff">- SendMailJob.perform_async
+ SendMailJob.perform_later

- SendMailJob.perform_at(1.minute.from_now)
+ SendMailJob.set(wait: 1.minute).perform_later</code></pre><p>In some places we were storing the job ID on a record, for querying the job's status or for cancelling the job. For such cases, we made the following change.</p><pre><code class="language-diff">def disable_form_at_deadline
- job_id = DisableFormJob.perform_at(deadline, self.id)
- self.disable_job_id = job_id
+ job = DisableFormJob.set(wait_until: deadline).perform_later(self.id)
+ self.disable_job_id = job.job_id
end

def cancel_form_deadline
- Sidekiq::Status.cancel(self.disable_job_id)
+ SolidQueue::Job.find_by(active_job_id: self.disable_job_id).destroy!
  self.disable_job_id = nil
end</code></pre><h2>6. 
Error handling and retries</h2><p>Initially, we thought the <a href="https://github.com/basecamp/solid_queue?tab=readme-ov-file#other-configuration-settings"><code>on_thread_error</code> configuration</a> provided by Solid Queue could be used for error handling. However, during the development phase, we noticed that it wasn't capturing errors. We raised <a href="https://github.com/basecamp/solid_queue/issues/120">an issue with Solid Queue</a> as we thought it was a bug.</p><p><a href="https://github.com/rosa">Rosa Gutiérrez</a> <a href="https://github.com/basecamp/solid_queue/issues/120#issuecomment-1894413948">responded</a> on the issue and clarified the following.</p><blockquote><p><code>on_thread_error</code> wasn't intended for errors on the job itself, but rather errors in the thread that's executing the job, but around the job itself. For example, if you had an Active Record's thread pool too small for your number of threads and you got an error when trying to check out a new connection, on_thread_error would be called with that.</p><p>For errors in the job itself, you could try to hook into Active Job itself.</p></blockquote><p>Based on the above information, we modified our <code>SqBaseJob</code> base class to handle the exceptions and report them to <a href="https://www.honeybadger.io/">Honeybadger</a>.</p><pre><code class="language-ruby"># sq_base_job.rb
class SqBaseJob &lt; ApplicationJob
  self.queue_adapter = :solid_queue

  rescue_from(Exception) do |exception|
    context = {
      error_class: self.class.name,
      args: self.arguments,
      scheduled_at: self.scheduled_at,
      job_id: self.job_id
    }
    Honeybadger.notify(exception, context:)
    raise exception
  end
end</code></pre><p>Remember we mentioned that <code>ActionMailer</code> doesn't inherit from <code>ApplicationJob</code>. So similarly, we would have to handle exceptions for mailers separately.</p><pre><code class="language-ruby"># application_mailer.rb
class ApplicationMailer &lt; ActionMailer::Base
  # ... 
  ActionMailer::MailDeliveryJob.rescue_from(Exception) do |exception|
    context = {
      error_class: self.class.name,
      args: self.arguments,
      scheduled_at: self.scheduled_at,
      job_id: self.job_id
    }
    Honeybadger.notify(exception, context:)
    raise exception
  end
end</code></pre><p>For retries, unlike Sidekiq, Solid Queue doesn't include any automatic retry mechanism; it <a href="https://edgeguides.rubyonrails.org/active_job_basics.html#retrying-or-discarding-failed-jobs">relies on Active Job for this</a>. We wanted our application to retry sending emails in case of any errors. So we added the retry logic in the <code>ApplicationMailer</code>.</p><pre><code class="language-ruby"># application_mailer.rb
class ApplicationMailer &lt; ActionMailer::Base
  # ...
  ActionMailer::MailDeliveryJob.retry_on StandardError, attempts: 3
end</code></pre><p>Note that, although the queue adapter configuration can be removed from <code>application_mailer.rb</code> once the entire application migrates to Solid Queue, the error handling and retry overrides cannot be removed because of the way <code>ActionMailer::MailDeliveryJob</code> inherits from <code>ActiveJob::Base</code> rather than the application's <code>ApplicationJob</code>.</p><h2>7. Testing</h2><p>Once all the above changes were done, it was obvious that a lot of tests were failing. Apart from fixing the usual failures related to the syntax changes, some of the tests were failing inconsistently. 
On debugging, we found that the affected tests were all related to controllers, specifically tests inheriting from <code>ActionDispatch::IntegrationTest</code>.</p><p>We tried debugging and searched for solutions when we stumbled upon <a href="https://github.com/bensheldon">Ben Sheldon's</a> <a href="https://github.com/bensheldon/good_job/issues/846#issuecomment-1432375562">comment on one of Good Job's issues</a>. Ben points out that this is actually <a href="https://github.com/rails/rails/issues/37270">an issue in Rails</a> where Rails sometimes inconsistently overrides ActiveJob's queue_adapter setting with TestAdapter. A <a href="https://github.com/rails/rails/pull/48585">PR is already open</a> for the fix. Thankfully, Ben, in the same comment, also mentioned a workaround for it until the fix has been added to Rails.</p><p>We added the workaround in our test <code>helper_methods.rb</code> and called the method in each of our controller tests which were failing.</p><pre><code class="language-ruby"># test/support/helper_methods.rb
def ensure_consistent_test_adapter_is_used
  # This is a hack mentioned here: https://github.com/bensheldon/good_job/issues/846#issuecomment-1432375562
  # The actual issue is in Rails, for which a PR is pending merge:
  # https://github.com/rails/rails/pull/48585
  (ActiveJob::Base.descendants + [ActiveJob::Base]).each(&amp;:disable_test_adapter)
end</code></pre><pre><code class="language-ruby"># test/controllers/exports_controller_test.rb
class ExportsControllerTest &lt; ActionDispatch::IntegrationTest
  def setup
    ensure_consistent_test_adapter_is_used
    # ...
  end

  # ...
end</code></pre><h2>8. Monitoring</h2><p>Basecamp has released <a href="https://github.com/basecamp/mission_control-jobs">mission_control-jobs</a>, which can be used to monitor background jobs. 
It is generic, so it can be used with any compatible ActiveJob adapter.</p><p>Add <code>gem &quot;mission_control-jobs&quot;</code> to your Gemfile and run <code>bundle install</code>.</p><p>Mount the mission control route in your <code>routes.rb</code> file.</p><pre><code class="language-ruby"># routes.rb
Rails.application.routes.draw do
  # ...
  mount MissionControl::Jobs::Engine, at: &quot;/jobs&quot;
end</code></pre><p>By default, mission control would try to load the adapter specified in your <code>application.rb</code> or individual environment-specific files. Currently, Sidekiq isn't compatible with mission control, so you will face an error while loading the dashboard at <code>/jobs</code>. The fix is to explicitly add <code>solid_queue</code> to the list of mission control adapters.</p><pre><code class="language-ruby"># application.rb
# ...
config.mission_control.jobs.adapters = [:solid_queue]</code></pre><p>Now, visiting <code>/jobs</code> on your site should load a dashboard where you can monitor your Solid Queue jobs.</p><p>But that isn't enough. There is no authentication. For development environments, it is fine, but the <code>/jobs</code> route would be exposed on production too. By default, Mission Control's controllers will extend the host app's <code>ApplicationController</code>. If no authentication is enforced, <code>/jobs</code> will be available to everyone.</p><p>To implement some kind of authentication, we can specify a different controller as the base class for Mission Control's controllers and add the authentication there.</p><pre><code class="language-ruby"># application.rb
# ...
MissionControl::Jobs.base_controller_class = &quot;MissionControlController&quot;</code></pre><pre><code class="language-ruby"># app/controllers/mission_control_controller.rb
class MissionControlController &lt; ApplicationController
  before_action :authenticate!, if: :restricted_env?

  private

    def authenticate!
      
authenticate_or_request_with_http_basic do |username, password|
        username == &quot;solidqueue&quot; &amp;&amp; password == Rails.application.secrets.mission_control_password
      end
    end

    def restricted_env?
      Rails.env.staging? || Rails.env.production?
    end
end</code></pre><p>Here, we have specified that <code>MissionControlController</code> would be our base controller for mission control related controllers. Then in <code>MissionControlController</code> we implemented basic authentication for the staging and production environments.</p><h2>Observations</h2><p>We haven't had any complaints so far. Solid Queue offers simplicity, requires no additional infrastructure and provides visibility for managing jobs since they are stored in the database.</p><p>In the coming days, we will migrate all of our 22 Neeto products to Solid Queue. And once cron-style job support lands in Solid Queue, we will completely migrate from Sidekiq.</p>]]></content>
    </entry><entry>
       <title><![CDATA[Solid Queue & understanding UPDATE SKIP LOCKED]]></title>
       <author><name>Chirag Shah</name></author>
      <link href="https://www.bigbinary.com/blog/solid-queue"/>
      <updated>2024-01-23T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/solid-queue</id>
      <content type="html"><![CDATA[<h2>What is Solid Queue?</h2><p>Recently, <a href="https://37signals.com/">37signals</a> open sourced <a href="https://dev.37signals.com/introducing-solid-queue">Solid Queue</a>.</p><p>Solid Queue is a database-based queuing backend for Active Job. In contrast, Sidekiq and Resque are Redis-based queuing backends.</p><p>In her blog, <a href="https://github.com/rosa">Rosa Gutiérrez</a> mentioned the following lines, which captured our attention.</p><blockquote><p>In our case, one feature that PostgreSQL has had for quite some time and that was finally introduced in MySQL 8, has been crucial to our implementation:</p><p>SELECT ... FOR UPDATE SKIP LOCKED</p><p>This allows Solid Queue's workers to fetch and lock jobs without locking other workers.</p></blockquote><p>As per her, this feature had been <a href="https://www.postgresql.org/docs/current/sql-select.html#SQL-FOR-UPDATE-SHARE">in PostgreSQL</a> for a while, and now this feature has landed <a href="https://dev.mysql.com/blog-archive/mysql-8-0-1-using-skip-locked-and-nowait-to-handle-hot-rows">in MySQL</a>, making it possible to build Solid Queue.</p><p>We had never heard of the <code>UPDATE SKIP LOCKED</code> feature in either PostgreSQL or MySQL. We were wondering what this <code>UPDATE SKIP LOCKED</code> is, without which it was not possible to build Solid Queue. So we decided to look into it.</p><h2>Processing jobs from a queue</h2><p>Consider a case where we need to build a system where a bunch of jobs need to be processed in the background.</p><p>There are a bunch of workers waiting to grab a job and start processing the moment a job becomes available. The challenge is: when multiple workers attempt to claim the same job simultaneously, how do we ensure that only one of the workers claims the job for processing? 
At any point in time, a worker should claim only an &quot;unclaimed&quot; job, and an &quot;unclaimed&quot; job should be claimed by one and only one worker.</p><p>Here is how one might go about implementing it.</p><pre><code class="language-sql">START TRANSACTION;
SELECT * FROM JOBS WHERE processed='no' LIMIT 1;
-- Process the job
COMMIT;</code></pre><p>With the above code, it's possible that two workers might claim the same job.</p><p>One way to resolve this issue is by marking a particular row as locked for update.</p><pre><code class="language-sql">START TRANSACTION;
SELECT * FROM JOBS WHERE processed='no' LIMIT 1 FOR UPDATE;
-- Process the job
COMMIT;</code></pre><p><code>SELECT ... FOR UPDATE</code> locks a particular row, and hence no one else can lock that record.</p><p>As soon as a new job comes in, multiple workers will execute the above query and will try to take a lock on that record. The database will ensure that only one of the workers gets the lock.</p><p>The first worker will take the lock on the record using <code>FOR UPDATE</code>. When other workers come to that record and they see that there is a lock <code>FOR UPDATE</code>, they will wait for the lock to be lifted. Yes, these workers will wait until the lock is released.</p><p>The lock will only be released when the transaction is committed. When the transaction is committed and the lock is released, then other workers will get hold of the record only to find that the job has already been processed. As you can see, this is a highly inefficient process.</p><p>That is where <code>FOR UPDATE SKIP LOCKED</code> comes in.</p><h2>SKIP LOCKED skips locked rows</h2><pre><code class="language-sql">START TRANSACTION;
SELECT * FROM jobs_table LIMIT 1 FOR UPDATE SKIP LOCKED;
-- Process the job
COMMIT;</code></pre><p>Imagine the same scenario here. A job comes in. Multiple workers compete to claim the job. The database ensures that only one worker gets the lock. 
However, in this case, the other workers will move on to the next record. They will not wait. That's what <code>SKIP LOCKED</code> does.</p><p>MySQL has detailed documentation on <a href="https://dev.mysql.com/blog-archive/mysql-8-0-1-using-skip-locked-and-nowait-to-handle-hot-rows/">how SKIP LOCKED works</a> if you want to read about it in more detail.</p><p>Solid Queue uses the <code>FOR UPDATE SKIP LOCKED</code> feature to ensure that a job is claimed by only one worker.</p><h2>How GoodJob manages job processing without SKIP LOCKED</h2><p><a href="https://github.com/bensheldon/good_job">GoodJob</a> burst onto the scene <a href="https://island94.org/2020/07/introducing-goodjob-1-0">around July 2020</a>. GoodJob supports only the PostgreSQL database because it uses advisory locks to guarantee that no two workers claim the same job.</p><p>The PostgreSQL developers understood that the locking mechanisms provided by the database would not satisfy all the variety of cases that might arise in an application. Advisory locks are a mechanism that allows applications to establish a communication channel to coordinate actions between different sessions or transactions. Unlike regular row-level locks enforced by the database system, advisory locks are implemented as a set of low-level functions that applications can use to acquire and release locks based on their requirements. You can read more about them <a href="https://www.postgresql.org/docs/current/explicit-locking.html#ADVISORY-LOCKS">here</a>.</p><p>The <a href="https://www.postgresql.org/docs/9.1/functions-admin.html#FUNCTIONS-ADVISORY-LOCKS">pg_advisory_lock function</a> will lock the given resource. However, if another session already holds a lock on the same resource, then this function will wait. This is similar to the <code>FOR UPDATE</code> case we saw above.</p><p>However, the <code>pg_try_advisory_lock</code> function will either obtain the lock immediately and return true, or return false if the lock cannot be acquired immediately. 
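</p><p>A simplified sketch of this claim pattern (this is not GoodJob's actual query; the <code>jobs</code> table and its columns are hypothetical):</p><pre><code class="language-sql">-- Try to claim one job by taking a session-level advisory lock keyed on its id.
-- pg_try_advisory_lock returns immediately, so a job whose lock is already held
-- by another worker is skipped rather than waited on.
SELECT id FROM jobs
WHERE processed = 'no' AND pg_try_advisory_lock(id)
LIMIT 1;

-- Process the claimed job, then release its advisory lock
-- (here 42 stands for the id returned by the query above).
SELECT pg_advisory_unlock(42);
</code></pre><p>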
As you can see, the name has the word <code>try</code>. This function attempts to acquire a lock; if it can't get the lock, it won't wait. This function can be utilized to build a queuing system.</p><p>Any usage of an advisory lock means the application needs to coordinate the locking itself. It gives more power to the application, but it also means more work for the application. In contrast, <code>FOR UPDATE SKIP LOCKED</code> is natively supported by PostgreSQL.</p><p>Based on the discussions <a href="https://github.com/bensheldon/good_job/issues/896">here</a> and <a href="https://github.com/bensheldon/good_job/discussions/831#discussioncomment-6780579">here</a>, it seems GoodJob is evaluating the possibility of migrating from advisory locks to <code>FOR UPDATE SKIP LOCKED</code> for better performance. Going through these issues was quite revealing, and I got to learn a lot about things I was unaware of.</p><h2>DelayedJob implementation</h2><p><a href="https://github.com/collectiveidea/delayed_job">DelayedJob</a> has been around since 2009, long before Sidekiq. It doesn't use <code>SKIP LOCKED</code>. Instead, it uses a row-level locking system by <a href="https://github.com/tobi/delayed_job/blob/719b628bdd54566f80ae3a99c4a02dd39d386c07/lib/delayed/job.rb#L164-L181">updating a field in the job record</a> to indicate that the job is being processed. In short, DelayedJob ensures at the application level that no two workers take the same job, without any help in this direction from the database.</p><h2>What about SQLite?</h2><p>So far, we have discussed PostgreSQL and MySQL. What about SQLite? Does it support <code>SKIP LOCKED</code>? No, it doesn't, but that's OK. As per the <a href="https://www.sqlite.org/whentouse.html#dbcklst">documentation</a>, it allows only <code>one writer at any instant in time</code>.</p><blockquote><p>High Concurrency</p><p>SQLite supports an unlimited number of simultaneous readers, but it will only allow one writer at any instant in time. 
For many situations, this is not a problem. Writers queue up. Each application does its database work quickly and moves on, and no lock lasts for more than a few dozen milliseconds. But there are some applications that require more concurrency, and those applications may need to seek a different solution.</p></blockquote><h2>NOWAIT</h2><p>For completeness, let's discuss the <code>NOWAIT</code> feature. We saw earlier that if we take a lock on a row using <code>FOR UPDATE</code>, then other workers will wait until the lock is released.</p><pre><code class="language-sql">START TRANSACTION;
SELECT * FROM jobs WHERE processed = 'no' LIMIT 1 FOR UPDATE NOWAIT;
-- Process the job
COMMIT;</code></pre><p>The <code>NOWAIT</code> feature allows other transactions not to wait for the lock to be released. In this case, if a transaction is unable to get a lock on the given row, it will raise an error, and the application needs to handle that error.</p><p>In contrast, <code>SKIP LOCKED</code> allows the transaction to move on to the next row if a lock is already taken.</p><h2>Redis-backed queue vs database-backed queue</h2><p>Now that we have looked at how <code>FOR UPDATE SKIP LOCKED</code> helps build a queuing system using the database itself, let's look at some pros and cons of database-backed queuing systems.</p><h4>Simplicity and familiarity</h4><p>Database-backed queues are often simpler to set up and manage, especially if your application is already using a relational database. There's no need for an additional dependency like Redis.</p><h4>No additional infrastructure</h4><p>Since the job information is stored in the same database as your application data, you don't need to set up and maintain separate infrastructure like a Redis server.</p><h4>Transactionality</h4><p>Database-backed queues can leverage database transactions, ensuring that both the job creation and any related database operations are committed or rolled back together. 
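</p><p>With a database-backed queue, enqueueing a job is just another <code>INSERT</code>, so it can share a transaction with the application's own writes. Here is a hypothetical sketch (the tables and columns are made up for illustration):</p><pre><code class="language-sql">START TRANSACTION;
INSERT INTO orders (customer_id, total) VALUES (7, 100);
INSERT INTO jobs (queue_name, arguments, processed)
VALUES ('default', '{"order_id": 7}', 'no');
-- Both rows commit together; if either insert fails, both are rolled back.
COMMIT;
</code></pre><p>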
This can be important in scenarios where data consistency is critical.</p><h4>Modifiability</h4><p>It is easier to modify jobs stored in the database than jobs stored in Redis, but doing so requires caution and is generally not recommended. In Redis, jobs are often stored as serialized data, and modifying them directly is neither straightforward nor common. Redis provides commands to interact with data, but modifying job data directly is not standard practice and could result in data corruption.</p>]]></content>
    </entry>
     </feed>