Three case studies of debugging Redis running out of memory

By Unnikrishnan KP

on September 12, 2022

In this blog, we discuss three separate case studies of Redis running out of memory. All three case studies have videos demonstrating how the debugging was done.

All three videos were prepared for my team members to show how to go about debugging. The videos are presented as they were recorded.

First Case Study

When a job fails in Sidekiq, Sidekiq puts that job in the retry set and retries it until the job succeeds or reaches the maximum number of retries. By default, the maximum number of retries is 25. If a job fails 25 times, that job is moved to the dead set. By default, Sidekiq stores up to 10,000 jobs in the dead set.
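
As a quick illustration, the retry count can be tuned per worker with sidekiq_options so that failing jobs reach the dead set sooner; the worker name below is hypothetical.

class HypotheticalWorker
  include Sidekiq::Worker

  # Move failing jobs to the dead set after 5 retries instead of the default 25.
  sidekiq_options retry: 5

  def perform(record_id)
    # ...
  end
end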

We had a situation where Redis was running out of memory. Here is how the debugging was done.

How to inspect the dead set

ds = Sidekiq::DeadSet.new
ds.each do |job|
  puts "Job #{job['jid']}: #{job['class']} failed at #{job['failed_at']}"
end

Run the following to view the latest entry in the dead set and the total number of entries:

ds.first
ds.count

To see the memory usage, the following commands were executed in the Redis console.

> memory usage dead
30042467

> type dead
zset
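
As seen above, the dead key (the sorted set holding the dead jobs) alone was taking about 30 MB. Once the failed jobs have been inspected and are no longer needed, the dead set can be emptied from a Rails console; this is destructive, so use it with care.

# Remove all entries from the Sidekiq dead set.
Sidekiq::DeadSet.new.clear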

As discussed in the video, a large payload was being sent with each job. This is not the right way to send data to the worker. Ideally, some sort of id should be sent to the worker, and the worker should fetch the necessary data from the database based on the received id.
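
For example, instead of serializing the whole record into the job arguments, only the id can be enqueued and the data loaded inside the worker. This is a minimal sketch; the worker and model names are hypothetical.

# Instead of ReportWorker.perform_async(report.attributes.to_json),
# enqueue only the id:
ReportWorker.perform_async(report.id)

class ReportWorker
  include Sidekiq::Worker

  def perform(report_id)
    # Load the data from the database inside the worker.
    report = Report.find(report_id)
    # Process the report here.
  end
end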

References

  1. How to increase the number of jobs in the Sidekiq deadset or disable deadset
  2. Maximum number of job retries in Sidekiq
  3. Maximum number of jobs in Sidekiq Deadset

Second Case Study

In this case, the Redis instance of neetoChat was running out of memory. The Redis instance had a capacity of 50 MB, but we were getting the following error.

ERROR: heartbeat: OOM command not allowed when used memory > 'maxmemory'.
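
Redis raises this error when its used memory has crossed the configured maxmemory limit and it cannot free enough space for new writes. A quick way to compare the two from a Ruby console is sketched below; the connection URL environment variable is an assumption.

require "redis"

redis = Redis.new(url: ENV["REDIS_URL"])

# Compare current usage with the configured limit.
info = redis.info("memory")
puts "used_memory: #{info['used_memory_human']}"
puts "maxmemory:   #{info['maxmemory_human']}"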

We were pushing too many geo info records to Redis and that caused the memory to fill up. Here is the video capturing the debugging session.

Following are the commands that were executed while debugging.

> ping
PONG

> info

> info memory

> info keyspace

> keys *failed*

> keys *process*

> keys *geocoder*

> get geocoder:http://ipinfo.io/41.174.30.55/geo?
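
Since the geocoder cache entries were the ones piling up, a useful follow-up check is whether those keys ever expire; a TTL of -1 means the key never expires and will keep accumulating. This is a sketch reusing the key pattern from the session above; the connection URL environment variable is an assumption.

require "redis"

redis = Redis.new(url: ENV["REDIS_URL"])

# Sample a few cached geocoder entries and print their TTLs.
redis.scan_each(match: "geocoder:*").first(5).each do |key|
  puts "#{key} ttl=#{redis.ttl(key)}"
end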

Third Case Study

In this case, the authentication service of neeto was failing because of memory exhaustion.

Here the number of keys was limited, but the payload data was huge, and all that payload data was hogging the memory. Here is the video capturing the debugging session.

Following are the commands that were executed while debugging.

> ping

> info keyspace
db0:keys=106,expires=86,avg_ttl=1233332728573

> keys * (to see all the keys)

The last command listed all the 106 keys. Next, we needed to find out how much memory each of these keys was using. For that, the following commands were executed.

> memory usage organizations/subdomains/bigbinary/neeto_app_links
736 bytes

> memory usage failed
10316224 (10MB)

> memory usage dead
29871174 (29MB)
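
With only around a hundred keys, checking them one by one is workable, but the same information can be collected in a single pass by ranking every key by its MEMORY USAGE. This is a sketch; the connection URL environment variable is an assumption, and redis.call is used to issue the raw MEMORY USAGE command.

require "redis"

redis = Redis.new(url: ENV["REDIS_URL"])

# Rank all keys by the number of bytes each one occupies.
sizes = redis.keys("*").map do |key|
  [key, redis.call("MEMORY", "USAGE", key).to_i]
end

sizes.sort_by { |_key, bytes| -bytes }.first(10).each do |key, bytes|
  puts "#{key}: #{bytes} bytes"
end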
