Search
⌘K
    to navigateEnterto select Escto close

    N+1 Queries and memoization

    What is an N+1 query?

    The N+1 query issue arises when your code executes N number of extra database queries to fetch some data that could have been fetched in a single query.

    This is a major performance bottleneck. Let us understand it better and see how it can bring down application performance with the help of an example:

    1class Movie < ApplicationRecord
    2  has_many :reviews
    3end
    4
    5class Review < ApplicationRecord
    6  belongs_to :movie
    7end

    In the above example, each review belongs to a movie and each movie can have multiple reviews.

    Now, consider a scenario where you are required to fetch a list of all movies along with the reviews for each movie in the list.

    This can be done by querying the movies and then looping over the movies to get the reviews for each movie like so:

    1Movie.all.map do |movie|
    2	movie.reviews
    3end
    4
    5# SELECT "movies".* FROM "Movies"
    6# Review Load (0.1ms)  SELECT "reviews".* FROM "reviews" WHERE "reviews"."movie_id" = ?  [["movie_id", 1]]
    7# Review Load (0.0ms)  SELECT "reviews".* FROM "reviews" WHERE "reviews"."movie_id" = ?  [["movie_id", 2]]
    8# Review Load (0.0ms)  SELECT "reviews".* FROM "reviews" WHERE "reviews"."movie_id" = ?  [["movie_id", 3]]
    9# Review Load (0.0ms)  SELECT "reviews".* FROM "reviews" WHERE "reviews"."movie_id" = ?  [["movie_id", 4]]

    As you can see in the above example, the number of queries made appear to be N+1, where one query fetches the list of movies and then N number of queries fetch the reviews for N number of movies.

    Hence, the number of network calls appear to be proportional to the number of movie records in this case. What if the movie list contained a thousand or more records? Surely making that many calls to the database for a single operation is not efficient.

    In the coming sections we will see how to overcome this issue.

    Query with includes method

    Rails provides an ActiveRecord method called includes which can be used to specify associations to be included in the result set. It loads associated records beforehand using techniques like eager loading or preloading, and thus reducing the number of database calls and improving the application performance significantly.

    Let us take the previous example and try to implement it using includes method to see how it works:

    1movies = Movie.includes(:reviews)
    2
    3# Movie Load (0.5ms)  SELECT "movies".* FROM "movies"
    4# Review Load (0.5ms)  SELECT "reviews".* FROM "reviews" WHERE "reviews"."movie_id" IN (?, ?, ?, ?)  [["movie_id", 1], ["movie_id", 2], ["movie_id", 3], ["movie_id", 4]]

    As you can see from the above example, specifying the reviews association inside includes method has led to a reduced number of queries to fetch the same information that previously took N+1 queries. This can provide a significant boost to the application's performance.

    Now, when you call the reviews method on a movie object from the list of movies, your application won't make an additional database call to get you the information. This is because the information was preloaded.

    Under the hood the includes method either uses the preload method or eager_load method depending upon the type of query to load up the association data in advance.

    Preload

    By default includes delegates to preload method which makes two queries like we saw in the previous example.

    1. First query loads up the the data from the parent model.
    2. Second query loads up data from the specified association.

    Previous example would look the exact same if we were to replace includes with preload like so:

    1movies = Movie.preload(:reviews)
    2
    3# Movie Load (0.5ms)  SELECT "movies".* FROM "movies"
    4# Review Load (0.5ms)  SELECT "reviews".* FROM "reviews" WHERE "reviews"."movie_id" IN (?, ?, ?, ?)  [["movie_id", 1], ["movie_id", 2], ["movie_id", 3], ["movie_id", 4]]

    Eager load

    In case an includes query contains a where clause which contains an attribute from the specified association, then includes delegates to the eager_load method. eager_load method loads all association data along with the data from parent model in a single query using a LEFT OUTER JOIN on the associated table. For example:

    1movies = Movie.eager_load(:reviews).where("reviews.content = 'sample content'")
    2
    3# SQL (0.3ms)  SELECT "movies"."id" AS t0_r0, "movies"."title" AS t0_r1, "reviews"."id" AS t1_r0, "reviews"."content" AS t1_r1, "reviews"."movie_id" AS t1_r2, FROM "movies" LEFT OUTER JOIN "reviews" ON "reviews"."movie_id" = "movies"."id" WHERE (reviews.content = 'sample content')

    We will see how joins work later in this chapter.

    includes method in Rails cannot use the preload method when a query contains the where clause because preload always generates two SQL queries and where clause cannot be used in such a case.

    For example:

    1movies = Movie.preload(:reviews).where("reviews.content = 'sample content'")
    2
    3# Movie Load (1.1ms)  SELECT "movies".* FROM "movies" WHERE (reviews.content = 'sample content')
    4# (Object doesn't support #inspect)

    As you can see, using preload in a query with the where clause when the conditions inside where clause contain attributes from the association table leads to an error because Rails applies the conditions inside where clause on the parent model which is the Movie class in this case.

    This means that you can still use preload with a where clause if the where clause contains attributes which are not from the specified association. For example:

    1movies = Movie.preload(:reviews).where("movies.title = 'sample title'")
    2
    3# Task Load (0.6ms)  SELECT "tasks".* FROM "tasks" WHERE (tasks.title = 'hello')

    In the above example, the where clause doesn't contain any attributes from the association table i.e. the reviews table.

    Specifying multiple associations in includes method

    In Rails, you can specify multiple associations inside the includes method. Consider the following example where the Movie class also has an association with the Actor class:

    1class Movie < ApplicationRecord
    2	has_many :reviews
    3	has_many :actors
    4end

    You can query the reviews and actors along with the movies using the includes method like so:

    1Movie.includes(:reviews, :actors)
    2
    3# Movie Load (0.5ms)  SELECT "movies".* FROM "movies"
    4# Review Load (0.5ms)  SELECT "reviews".* FROM "reviews" WHERE "reviews"."movie_id" IN (?, ?, ?, ?)  [["movie_id", 1], ["movie_id", 2], ["movie_id", 3], ["movie_id", 4]]
    5# Actor Load (0.3ms)  SELECT "actors".* FROM "actors" WHERE "actors"."id" IN (?, ?, ?)  [["movie_id", 1], ["movie_id", 2], ["movie", 3]]

    There is an additional query to load up the information for the actors association in the above example. Thus a separate query will be made for each association specified inside the includes method.

    Loading nested associations with includes

    Suppose each movie has many actors and each actor belongs to a certain region. Can you visualize the nested associations in this case? We can use includes to load nested associations like so:

    1Movie.includes(actors: [:region])
    2
    3# Task Load (1.8ms)  SELECT "movies".* FROM "movies"
    4# User Load (0.3ms)  SELECT "actors".* FROM "actors" WHERE "actors"."id" IN (?, ?, ?, ?)  [["id", 1], ["id", 2], ["id", 3], ["id", 4]]
    5# UserNotification Load (0.1ms)  SELECT "region".* FROM "regions" WHERE "actor"."region_id" IN (?, ?, ?, ?)  [["region_id", 1], ["region_id", 2], ["region_id", 3], ["region_id", 4]]

    Query with joins method

    A join clause is used to combine records from two tables by using values common to each.

    Active Record provides two finder methods for specifying JOIN clauses on the resulting SQL namely joins and left_outer_joins.joins method should be used for INNER JOIN whereas the left_outer_joins method is used for queries using LEFT OUTER JOIN.

    An INNER JOIN query performed on two tables, compares each row of table_1 with each row of table_2 to find all pairs of rows which satisfy the join condition, then combines the rows that match the join condition to produce a single table.

    For example, consider an example where all movie records have to be fetched where the reviews contain less than and equal to 3 stars:

    1movies = Movie.joins(:reviews).where("reviews.stars <= ?", 3)
    2
    3# Movie Load (0.3ms)  SELECT "movies".* FROM "movies" INNER JOIN "reviews" ON "reviews"."task_id" = "movies"."id" WHERE (reviews.stars <= 3)

    In the above example, the joins method is performing an INNER JOIN on movies and reviews table based on the common movie_id value in both tables and producing a table which contains all rows where the review contains 3 or less than 3 stars.

    A LEFT OUTER JOIN is an extension on INNER JOIN, where an INNER JOIN join is performed first, then for each row in table_1 which does not satisfy the join condition with any row in table_2, a joined row is added with null values in columns of table_2. Thus, the joined table always has at least one row for each row in table_1.

    joins vs includes

    You should use joins when you are merely filtering records based on values in associations and you should use includes when you have to access the values in associations because joins method does not load the association data.

    For example, if you have to filter out movie records based on reviews, then use joins and if you have to access the reviews and render them then you should use includes.

    Memoization

    Memoization is a technique used to cache the results of a time consuming and resource intensive operation so that it only needs to be done once. Examples of such operations could be querying data from the database or hitting an external service.

    Although it is a good idea to cache the results of such operations, it doesn't make sense for us to do so when we only require the results once.

    Ruby provides a very idiomatic way for memoizing values with the help of an conditional assignment operator (||=). A conditional assignment operator assigns a value to a variable if the value inside the variable is either false or nil.

    In other words, if the boolean equivalent of value inside the left-hand side of the conditional assignment operator is true then the value resulting from the left-hand side expression will not be assigned to the variable on left-hand side.

    For example:

    1result ||= expensive_method_invocation

    is the same as:

    1result = (result || expensive_method_invocation)

    The code in above example basically tells Ruby to invoke the expensive method and store the result in result variable if the boolean equivalent of value inside result is not true. If result contains a truthy value then the expression on right hand side of the conditional assignment operator will not be evaluated.

    Since a conditional assignment operator will prevent the execution of the right-hand side expression, it should not be used to store results from an operation whose value is likely to change. For example, a method that returns a value based on the current time. In such a case, memoization is not a good choice as you will get stale value present in the result variable.

    Let us see a hypothetical example to understand how we can use memoization in Rails. As per our Granite application, while deleting a user, we assign back the tasks assigned to that user to the task owner. Consider a scenario where we have an admin user and instead of assigning back the tasks to the task owner the tasks have to be assigned back to the admin, like so:

    1class UserDeletionService
    2  ADMIN_EMAIL = "oliver@example.com"
    3
    4  attr_reader :user_id
    5
    6  def initialize(user_id)
    7	  @user_id = user_id
    8  end
    9
    10  def process
    11	  assign_tasks_to_admin
    12	  delete_user
    13  end
    14
    15  private
    16
    17	  def assign_tasks_to_admin
    18	    user.assigned_tasks.each do |task|
    19	      task.update(assigned_user: admin)
    20	    end
    21	  end
    22
    23	  def delete_user
    24	    user.destroy
    25	  end
    26
    27	  def user
    28	    @_user ||= User.find_by(id: user_id)
    29	  end
    30
    31	  def admin
    32	    @_admin ||= User.find_by(email: ADMIN_EMAIL)
    33	  end
    34end

    In the above example, we have declared a UserDeletionService which first assigns the tasks to admin and then deletes the user. Inside the assign_tasks_to_admin method, we are calling the user method for the first time. When user method gets called, it will query the database for the user based on the user_id and store the result of the query in the @_user variable. variable. At Bigbinary we prefix the memoized instance variable with an underscore, like @_user.

    When the user method gets called again for the second time in the delete_user method, the query to fetch the user won't get executed again because the @_user variable already contains the user. Thus saving time and resources, and most importantly reduces a database query.

    While assigning the tasks to admin, we are looping over all the tasks assigned to the user to be deleted and then assigning each task to the admin. admin method will get called during each iteration but the query to fetch the admin will only be executed once during the first run of the iteration and for all subsequent runs, value stored inside the @_admin variable will be used.

    In the Granite application we have a private method inside the ApplicationController which returns the @current_user:

    1class ApplicationController < ActionController::Base
    2  protect_from_forgery with: :exception
    3  include Authorizable
    4
    5  def authenticate_user_using_x_auth_token
    6    user_email = request.headers["X-Auth-Email"]
    7    auth_token = request.headers["X-Auth-Token"].presence
    8    user = user_email && User.find_by_email(user_email)
    9
    10    if user && auth_token && ActiveSupport::SecurityUtils.secure_compare(user.authentication_token, auth_token)
    11      @current_user = user
    12    else
    13      render status: :unauthorized, json: { error: t("session.could_not_auth") }
    14    end
    15  end
    16
    17  private
    18
    19    def current_user
    20      @current_user
    21    end
    22end

    You might be tempted to memoize the initialization of @current_user instance variable. The reason we haven't used memoization here is because, authenticate_user_using_x_auth_token method gets called for every request and for each request we need to fetch the user based on the email present in X-Auth-Email key in request headers.

    Since the user can vary across requests we should not use memoization in this case. We have already discussed that memoization isn't a good choice to store values which are likely to change frequently.

    Another reason why memoization won't work here is because, a new instance of ApplicationController is created for each request and the value inside memoized variables do not persist across different instances of a class. If you had noticed, a memoized variable is an instance variable, for example, @_user, which means they belong to a particular instance of the class.

    This is an in-depth chapter and you do not need to commit any changes.

    Previous
    Next