<?xml version="1.0" encoding="utf-8"?>
    <feed xmlns="http://www.w3.org/2005/Atom">
     <title>BigBinary Blog</title>
     <link href="https://www.bigbinary.com/feed.xml" rel="self"/>
     <link href="https://www.bigbinary.com/"/>
     <updated>2026-05-19T03:34:11+00:00</updated>
     <id>https://www.bigbinary.com/</id>
     <entry>
       <title><![CDATA[Active Job Continuations]]></title>
       <author><name>Vishnu M</name></author>
      <link href="https://www.bigbinary.com/blog/active-jobs-continuations"/>
      <updated>2025-06-09T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/active-jobs-continuations</id>
      <content type="html"><![CDATA[<p>Active Job Continuations was recently merged into Rails. We recommend that you go through the description in the <a href="https://github.com/rails/rails/issues/55127">pull request</a> since it is very well written.</p><p>If you prefer watching a video to learn about Active Job Continuations, we made a video for you.</p><p><iframe width="560" height="315" src="https://www.youtube.com/embed/r4uuQh1Zog0" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen></iframe></p><p>In short, this feature lets you configure your jobs so that a job can be interrupted and, the next time it starts, resume from a particular point, so the work done so far is not wasted.</p><p>This work is heavily inspired by Shopify's <a href="https://github.com/Shopify/job-iteration">job-iteration</a> gem.</p>]]></content>
    </entry><entry>
       <title><![CDATA[Understanding Queueing Theory]]></title>
       <author><name>Vishnu M</name></author>
      <link href="https://www.bigbinary.com/blog/understanding-queueing-theory"/>
      <updated>2025-06-03T12:00:00+00:00</updated>
      <id>https://www.bigbinary.com/blog/understanding-queueing-theory</id>
      <content type="html"><![CDATA[<p><em>This is Part 6 of our blog series on <a href="/blog/scaling-rails-series">scaling Rails applications</a>.</em></p><hr><h2>Queueing Systems</h2><p>In web applications, not every task needs to be processed immediately. When you upload a large video file, send a bulk email campaign, or generate a complex report, these time-consuming operations are often handled in the background. This is where queueing systems like <a href="https://sidekiq.org/">Sidekiq</a> or <a href="https://github.com/rails/solid_queue">Solid Queue</a> come into play.</p><p>Queueing theory helps us understand how these systems behave under different conditions, from quiet periods to peak load times.</p><p>Let's understand the fundamentals of queueing theory.</p><h2>Basic Terminology in Queueing Theory</h2><ol><li><p><strong>Unit of Work</strong>: The individual item needing service, i.e. a job.</p></li><li><p><strong>Server</strong>: One &quot;unit of parallel processing capacity.&quot; In queueing theory, this doesn't necessarily mean a physical server. It refers to the ability to process one unit of work at a time. For JRuby or TruffleRuby, each thread can be considered a separate &quot;server&quot; since threads can execute in parallel. For CRuby/MRI, because of the GVL, the concept of a &quot;server&quot; is different. We'll discuss it later.</p></li><li><p><strong>Queue Discipline</strong>: The rule that determines which unit of work is selected next from the queue. For Sidekiq and Solid Queue, it is FCFS (First Come, First Served). If there are multiple queues, which job is selected next depends on the priority of the queue.</p></li><li><p><strong>Service Time</strong>: The actual time it takes to process a unit of work (how long a job takes to execute).</p></li><li><p><strong>Latency/Wait Time</strong>: How long a job spends waiting in the queue before being processed.</p></li><li><p><strong>Total Time</strong>: The sum of service time and wait time. It's the complete duration from when a job is enqueued until it finishes executing.</p></li></ol><h2>Little's law</h2><p>Little's law is a theorem in queueing theory that states that the average number of jobs in a system is equal to the average arrival rate of new jobs multiplied by the average time a job spends in the system.</p><pre><code class="language-ruby">L = λ * W</code></pre><p>L = Average number of jobs in the system<br>λ = Average arrival rate of new jobs<br>W = Average time a job spends in the system</p><p>For example, if jobs arrive at a rate of 10 per minute (λ), and each job spends 30 seconds (W) in the system:</p><p>Average number of jobs in system = 10 jobs/minute * 0.5 minutes = 5 jobs</p><p>This tells us that, on average, there are 5 jobs in the system at any given moment.</p><p><code>L</code> is also called <strong>offered traffic</strong>.</p><p>Note: Little's law describes long-run averages. It assumes the system is stable, i.e. the arrival rate stays roughly consistent over time and, on average, jobs are completed as fast as they arrive.</p><h3>Managing Utilization</h3><p>Utilization measures how busy our processing capacity is.</p><p>Mathematically, it is the ratio of the processing capacity we're using to the processing capacity we have.</p><pre><code class="language-ruby">utilization = (average_jobs_in_system / capacity_to_handle_jobs) * 100</code></pre><p>In other words, it could be written as follows.</p><pre><code class="language-ruby">utilization = (offered_traffic / parallelism) * 100</code></pre><p>For example, if we are using Sidekiq to manage our background jobs, then in the single-threaded case, parallelism is equal to the number of Sidekiq processes.</p><p>Let's look at a practical case with numbers:</p><ul><li>We have 30 jobs arriving every minute</li><li>It takes 0.5 minutes to process a job</li><li>We have 20 Sidekiq processes</li></ul><p>In this case, the utilization will be:</p><pre><code class="language-ruby">utilization = ((30 jobs/minute * 0.5 minutes) / 20 processes) * 100 = 75%</code></pre><h3>High utilization is bad for performance</h3><p>Let's assume we maintain 100% utilization in our system. This means that if we get 30 jobs per minute on average, we have just enough capacity to handle 30 jobs per minute.</p><p>Now suppose that one day we start getting 45 jobs per minute. Since utilization is already at 100%, there is no spare capacity to absorb the additional load, so jobs pile up in the queue and latency grows.</p><p>Hence, a high utilization rate can hurt performance, because it leaves no headroom and leads to higher latency for jobs.</p><h3>The knee curve</h3><p>Intuitively, it would seem that latency should spike only when the utilization rate hits 100%. However, in the real world, latency begins to increase dramatically when utilization reaches around 70-75%.</p><p>If we plot latency against utilization, the graph looks something like this.</p><p><img src="/blog_images/2025/understanding-queueing-theory/knee-curve.png" alt="The knee curve"></p><p>The point at which the curve bends sharply upwards is called the &quot;knee&quot; of the performance curve. Beyond this point, the non-linear effects predicted by queueing theory become pronounced, causing the queue latency to climb quickly.</p><p>Running any system consistently above 70-75% utilization significantly increases the risk of latency spikes, as jobs spend more and more time waiting.</p><p>This directly impacts the customer experience, for example through delays in sending emails or in calling Twilio to send SMS messages.</p><p>We will cover tracking this latency in upcoming posts. How these metrics are tracked depends on the queueing backend used (Sidekiq or Solid Queue).</p><h2>Concurrency and theoretical parallelism</h2><p>In Sidekiq, a process is the primary unit of parallelism.
However, concurrency (threads per process) significantly impacts a process's effective throughput. Because of the GVL, only one thread per process can execute Ruby code at a time, so we need to take into account how long a job spends waiting on I/O.</p><p>The more time a job spends waiting on external resources (like databases or APIs) rather than executing Ruby code, the more the other threads within the same process can run Ruby code while the first thread waits.</p><p>We learned about Amdahl's law in <a href="/blog/amdahls-law-the-theoretical-relationship-between-speedup-and-concurrency">Part 3</a> of this series.</p><p><img src="/blog_images/2025/understanding-queueing-theory/amdahls-law.png" alt="Amdahl's law"></p><p>In formula form, Speedup = 1 / ((1 - p) + p / n), where:</p><p><code>p</code> is the portion that can be parallelized (the I/O percentage)</p><p><code>n</code> is the number of threads (concurrency)</p><p>Speedup is equivalent to theoretical parallelism in this context. In queueing theory, parallelism refers to how many units of work can be processed simultaneously. When we calculate speedup using Amdahl's law, we're essentially determining how much faster a multi-threaded system can handle work compared to a single-threaded system.</p><p>Let's assume that jobs spend 50% of their time on I/O and the concurrency is 10. Then the speedup will be:</p><pre><code class="language-ruby">Speedup = 1 / ((1 - 0.5) + 0.5 / 10) = 1 / 0.55 = 1.82 ≈ 2</code></pre><p>This means one Sidekiq process with 10 threads will handle jobs roughly twice as fast as a single Sidekiq process with a single thread.</p><p>Let's recap. We assume that jobs spend 50% of their time on I/O, and the system uses a single Sidekiq process with 10 threads (concurrency). Because of those 10 threads, the system is about 2x faster than a system with just a single thread. In other words, running 10 threads does not give us a 10x performance improvement.
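</p><p>To make the Amdahl's law arithmetic above easy to repeat for other workloads, here is a minimal Ruby sketch. The method name <code>theoretical_parallelism</code> and its keyword arguments are our own hypothetical choices. It assumes <code>io</code> is given as a fraction (0.5 for 50%) of a job's time that can overlap with other threads.</p>
<pre><code class="language-ruby"># A minimal sketch of Amdahl's law as used in this post (hypothetical helper).
#   io      - fraction of a job's time spent waiting on I/O (p), e.g. 0.5 for 50%
#   threads - number of threads in the Sidekiq process (n)
def theoretical_parallelism(io:, threads:)
  1.0 / ((1.0 - io) + io / threads.to_f)
end

# The example from the text: 50% I/O with 10 threads.
puts theoretical_parallelism(io: 0.5, threads: 10).round(2) # prints 1.82

# Adding threads quickly stops helping: the speedup approaches 1 / (1 - io),
# which is 2 for 50% I/O.
puts theoretical_parallelism(io: 0.5, threads: 50).round(2) # prints 1.96

# The I/O and concurrency combinations used in this section's table.
[[0.05, 1], [0.25, 5], [0.50, 10], [0.75, 16], [0.90, 32], [0.95, 64]].each do |io, threads|
  puts format("%2d%% I/O, %2d threads: %.2f", (io * 100).round, threads, theoretical_parallelism(io: io, threads: threads))
end</code></pre>
<p>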
What those 10 threads get us is what is called &quot;theoretical parallelism&quot;.</p><p>Similarly, we can compute the theoretical parallelism for other values of I/O and concurrency.</p><table><thead><tr><th>I/O (share of job time)</th><th>Concurrency (threads)</th><th>Theoretical parallelism (approx.)</th></tr></thead><tbody><tr><td>5%</td><td>1</td><td>1</td></tr><tr><td>25%</td><td>5</td><td>1.25</td></tr><tr><td>50%</td><td>10</td><td>2</td></tr><tr><td>75%</td><td>16</td><td>3</td></tr><tr><td>90%</td><td>32</td><td>8</td></tr><tr><td>95%</td><td>64</td><td>15</td></tr></tbody></table><p>To restate the last row: if jobs spend 95% of their time on I/O and the process runs 64 threads, we get roughly a 15x performance improvement over the same system running on a single thread.</p><p>Here is the graph for this data.</p><p><img src="/blog_images/2025/understanding-queueing-theory/concurrency-vs-effective-parallelism.png" alt="Theoretical Parallelism"></p><p>As shown in the graph, a Sidekiq process with 16 threads handling jobs that are 75% I/O-bound achieves a theoretical parallelism of approximately 3. In other words, it is about 3x faster than a single-threaded process.</p><h2>Calculating the number of processes required</h2><p>At the beginning of this article, we discussed Little's law and saw that <code>L</code>, the average number of jobs in the system, is also called &quot;offered traffic&quot;.</p><p>If the offered traffic is 5, then on average we have 5 units of work that need to be processed simultaneously.</p><p>We just learned that running above roughly 75% utilization is risky because latency can spike.</p><p>For queues with low latency requirements (e.g. <code>urgent</code>), we need to target a lower utilization rate.
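</p><p>Before picking a target, it can help to check where a queue currently stands. The following minimal Ruby sketch combines Little's law with the utilization formula from earlier in this post. The helper names are hypothetical. It assumes the arrival rate and the time in the system use the same time unit (minutes here) and expresses utilization as a fraction rather than a percentage.</p>
<pre><code class="language-ruby"># Hypothetical helpers; units must match (jobs/minute and minutes here).
# Little's law: offered traffic L = arrival rate (lambda) * average time in system (W).
def offered_traffic(arrival_rate:, time_in_system:)
  arrival_rate * time_in_system
end

# Utilization as a fraction: offered traffic / theoretical parallelism.
def utilization(offered_traffic:, parallelism:)
  offered_traffic / parallelism.to_f
end

# The earlier example: 30 jobs/minute, 0.5 minutes per job, 20 single-threaded processes.
traffic = offered_traffic(arrival_rate: 30, time_in_system: 0.5)
puts traffic                                                # prints 15.0
puts utilization(offered_traffic: traffic, parallelism: 20) # prints 0.75</code></pre>
<p>With the numbers from the earlier example, the queue sits at 75% utilization, which is around the knee of the curve.</p>
<p>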
Let's say we want utilization to be around 50% to be on the safe side.</p><p>Now that we know the target utilization and the offered traffic, we can calculate the required parallelism.</p><pre><code class="language-ruby">utilization = offered_traffic / parallelism
=&gt; 0.50 = 5 / parallelism
=&gt; parallelism = 5 / 0.50 = 10</code></pre><p>This means we need a theoretical parallelism of 10 to keep utilization at or below 50%.</p><p>Let's assume the jobs in this queue spend, on average, 50% of their time on I/O. Based on the table and graph above, a concurrency of 10 gives us a parallelism of about 2. Increasing the concurrency further barely helps: with 50% I/O, the speedup can never exceed 1 / (1 - 0.5) = 2. So if we want a parallelism of 10, we can't just switch to a concurrency of 50. Even a concurrency of 50 (50 threads) will only yield a parallelism of about 2.</p><p>So we have no choice but to add more processes. Since one process with a concurrency of 10 yields a parallelism of about 2, we need 5 processes to reach a parallelism of 10.</p><pre><code class="language-ruby">Total number of Sidekiq processes required = 10 / 2 = 5</code></pre><p><em>To get the I/O wait percentage, we can make use of perfm. <a href="https://github.com/bigbinary/perfm?tab=readme-ov-file#sidekiq-gvl-instrumentation">Here</a> is the documentation on how it can be done.</em></p><p>Here we're talking about the free version of Sidekiq, where we'll only be able to run a single process per dyno. If we're using Sidekiq Enterprise, we can run multiple processes per dyno via Sidekiq Swarm.</p><p>We can provision 5 dynos for the <code>urgent</code> queue, but we should always have a queue-time-based autoscaler like Judoscale enabled to handle spikes.</p><h2>Sources of Saturation</h2><p>We discussed earlier that, in the context of queueing theory, the saturation point is typically reached at around 70-75% utilization. That is saturation from the point of view of the queue itself.</p><p>However, saturation can also occur in other parts of the system.</p><h3>CPU</h3><p>The servers running your Sidekiq processes have finite CPU and memory. While CPU usage is a metric we can track for Sidekiq, it's generally not the only one we need to focus on for scaling decisions.</p><p>CPU utilization can be misleading. If our jobs spend most of their time doing I/O (like making API calls or database queries), CPU usage will be very low even when our Sidekiq system is at capacity.</p><h3>Memory</h3><p>Memory utilization impacts performance very differently from CPU utilization. Between 0% and 100% memory utilization, latency and throughput generally change very little. However, once memory runs out, things deteriorate significantly: the system starts using swap memory, which is very slow and increases job service times.</p><h3>Redis</h3><p>Another place where saturation can occur is our datastore, i.e. Redis in the case of Sidekiq. We have to make sure that we provision a separate Redis instance for Sidekiq and set its eviction policy to <code>noeviction</code>. This ensures that Redis will reject new writes when the memory limit is reached, resulting in an explicit failure rather than silently dropping important jobs.</p><p><em>This was Part 6 of our blog series on <a href="/blog/scaling-rails-series">scaling Rails applications</a>. If any part of the blog is not clear to you, please write to us on <a href="https://www.linkedin.com/company/bigbinary">LinkedIn</a>, <a href="https://twitter.com/bigbinary">Twitter</a> or the <a href="https://bigbinary.com/contact">BigBinary website</a>.</em></p>]]></content>
    </entry>
     </feed>