Dropping tables, dropping columns and renaming columns in a safe way in Ruby on Rails

September 17, 2024

Dropping tables, dropping columns and renaming columns in a safe way in Ruby on Rails

We are building NeetoCal which is a Calendly alternative. Recently, we deployed the latest code to production. The code change involved deleting a table. To our horror, during the deployment, we noticed that some users experienced errors with status code 500 for a few minutes. This happened because the migration to drop the tables ran quickly and the tables got deleted. However, the old code was still referring to those tables.

This kind of issue is pretty common with schema migrations, especially when you're dropping tables or columns. If the migration finishes before the code deployment catches up, you end up with old process still trying to access tables or columns that no longer exist. This mismatch can cause temporary errors, like the 500s we saw.

The safest bet might be to turn on the maintenance mode every time we run a schema migration. At NeetoCal, we deploy changes to production every day. We only want to schedule downtime when it's absolutely necessary. So this option was ruled out. We also heard that some companies manually restart their dynos during schema migrations to roll out the new code changes. However, this did not sit well with us either.

Most of the people we talked to solve this problem by having two step deployments.

Deployment 1: Deploy the code that is not using the table.

Deployment 2: Drop the table.

This can work and it does work. However we were worried about a potential edge case. Let's say that a piece of code is still referring the table. After the deployment 1 this code is working and we don't see anything going wrong. It could be due to recent merges slipping in unnoticed, as we ship very fast.

However when do we deployment 2 the migrations will run first, dropping the table, followed by the deployment of the code. When new code boots up, we realize that one part of the app is not working.

Now we are in trouble. We are in trouble because the table is gone. If we have taken the database backup then we can restore the backup but that causes all kinds of issues because we might not catch this bug for sometime. In the meantime other tables are getting new data. So restoring backup is a messy solution.

The only solution is the fix the code. Now we need to fix the code in a rush. That's what we want to avoid. Before we look at our solution, let's look at what we found when we looked at other solutions.

strong_migrations didn't protect dropping of table

At NeetoCal, we are using the strong_migrations gem to catch unsafe migrations. The gem catches unsafe migration like removing a column but it doesn't capture unsafe operations like dropping a table.

Upon some digging, we found this issue where the author of the gem expressed unwillingness to add drop_table as an unsafe operation.

No worries. We can add dropping of a table as an unsafe operation in strong_migration ourselves. Here's how it can be done.

# config/initializers/strong_migrations.rb

StrongMigrations.add_check do |method, args|
  if method == :drop_table
    stop! "Dropping tables via migrations is discouraged."
  end
end

To drop the table, we can use the safety_ensured block provided by the strong_migrations gem to mark the step as safe.

# db/migrate/20240809131941_drop_users.rb

class DropUsers < ActiveRecord::Migration[8.0]
  def change
    safety_assured { drop_table :users }
  end
end

While this gets the work done, it doesn't solve the problem of "some code still referring the table" problem. Hence, this solution was a "no go" from our side.

Sam wants to delay dropping of tables and columns

Sam Saffron had run into similar problems. He came up with a solution and he wrote about it in this blog.

His solution was not to drop the tables and columns immediately. Instead use "defer drops" to drop column or tables at least 30 minutes after the particular migration was run.

He introduced ColumnDropper and TableDropper to get this work done.

We felt that this solution adds an extra layer of complexity and we rejected this solution. In fact later we found that they ran into some issues with "defer drops" as discussed here.

Dropping tables and columns should be allowed if it follows a pattern

After some internal discussion, we also decided to follow a three-step deployment process to ensure zero downtime and easy roll back without any data loss.

In "Deployment 1", we remove all the code that refers to the table we want to drop. This ensures that nothing in the application is depending on that table anymore.

In "Deployment 2", the table will be renamed. For example, table users will be renamed to users-deprecated-on-2024-08-09. This step helps catch any dangling code that is still referring the old table. If any part of the app still tries to use the table, the errors will show up, and we can fix the problem one of the two ways. We can revert the migration and the code. Or we can change the code. We have a choice. If we delete the table then we don't have a choice.

Finally, in the "Deployment 3", once we’re confident that the table is no longer in use, we can drop it completely. Since the table follows a specific naming pattern, it's clear that it's ready to be safely deleted.

We can follow a similar approach when dropping columns. To add an extra layer of safety, we mark the column that we need to drop as ignored using ActiveRecord's ignored_columns method. For example, if we need to drop the display_name column from the users table, start by marking it as ignored:

class User < ActiveRecord::Base
  self.ignored_columns += [:display_name]
end

By doing this, even if the display_name column still exists in some lingering code, our model won’t recognize it. This helps avoid any accidental references to the column in your code. Once you’ve successfully dropped the column, you can remove this line from your model.

If our model won't recognize it then why do we need RuboCop for dropping a column. Once again the answer is to avoid an edge case. Let's say that we are using executing SQL directly. And this SQL is referring the column display_name. Since a direct SQL is being used adding this column to ignored_columns will have no impact. Once this column is deleted then only we will get to know about the error.

By renaming the column we maintain the data but at the same time all the lingering code would start failing.

RuboCop rules to ensure the policy is followed

Now the task was to build a custom cop to enforce the policy.

# bad
drop_table :users

# bad
drop_table :users do |t|
  t.string :email, null: false
  t.string :first_name, null: false
end

# good
drop_table :users_deprecated_on_2024_08_09

# good
drop_table :users_deprecated_on_2024_08_09 do |t|
  t.string :email, null: false
  t.string :first_name, null: false
end

We need to handle removal of column similarly.

# bad
remove_column :users, :email

# bad
change_table :users do |t|
  t.remove :email
end

# good
remove_column :users, :email_deprecated_on_2024_08_09

# good
drop_table :users do |t|
  t.remove :email_deprecated_on_2024_08_09
end

We added these two cops to our rubocop-neeto repo.

Safely renaming database columns

Renaming a column brings the same challenges as we have discussed in the previous sections. Renaming a column directly will cause temporary downtime since the new code references the new column name while the old code refers to the old column name. To avoid downtime, we need to deliberately carry out this operation in multiple deployments.

Here are the steps to rename the username column to display_name in the users table:

Deployment 1

Create the new column: Start by adding the new display_name column to the table.

class AddDisplayNameToUsers < ActiveRecord::Migration[8.0]
  def change
    add_column :users, :display_name, :string
  end
end

Write to both columns: Update your app so it writes to both the old and new columns. ActiveRecord callbacks can help with this:

class User < ApplicationRecord
  before_save do
    self.display_name = username if will_save_change_to_username? }
  end
end

Backfill data from the old column to the new column: Next, backfill the data from the username column to the display_name column:

User.update_all('display_name = username')

Deployment 2

Move reads from the old column to the new column: Update application to read from the display_name column instead of the old username column, and then remove the double writes to both columns.

Deployment 3

Drop the old column: Finally, drop the old column once everything is in place.

class DropUsernameFromUsers < ActiveRecord::Migration[8.0]
  def change
    remove_column :users, :username
  end
end

This approach might seem tedious, but it's essential for achieving zero downtime during the migration and to avoid any edge cases. We can apply the same steps when renaming tables as well. For more details on that process, check out the steps mentioned by the strong_migrations gem.

Running schema migrations can be scary, especially when they involve dropping tables and columns. But with the right safeguards in place, we can confidently deploy updates without worrying about any surprises.

If this blog was helpful, check out our full blog archive.