Raw SQL migrations and their complications
When to use raw SQL?
Sometimes, Rails-style migration scripts might not be enough to meet our needs. Some changes in real-life requires us to run raw SQL code. A good example of this case is adding additional constraints to the database.
We add constraints to our database to ensure the integrity of our data. Constraints are validated every time we make a change to our database. Since constraint validation is a database-level operation, they are always thread-safe. Therefore, database constraints help us prevent race conditions.
We can use Rails migration to add foreign key constraints and unique constraints during table creation.
Some databases provide several constraining techniques which are very unique to the specific DB. exclusion constraint in PostgreSQL and enum constraint in MySQL are some examples. We can't express these in Rails migration scripts as of now.
In such cases, we will have to write raw SQL queries to achieve the desired result.
Writing raw SQL in migrations
Rails supports native queries for migrations too. Here is how we write a native SQL query in Rails migration:
1class SampleSQLMigration < ActiveRecord::Migration[6.1] 2 def up 3 # add an exclusion constraint 4 execute <<~SQL 5 ALTER TABLE sales_orders 6 ADD CONSTRAINT sales_orders_non_deleted_date_unique 7 EXCLUDE (order_date WITH =) WHERE (archived = false); 8 SQL 9 end 10 11 def down 12 # remove the exclusion constraint 13 execute <<~SQL 14 ALTER TABLE sales_orders 15 DROP CONSTRAINT sales_orders_non_deleted_date_unique 16 SQL 17 end 18end
execute method accepts a string of native SQL query and then runs it
on the database.
<<~ squiggly symbol is a plain ruby syntax for writing long multi-line
indented content. Ruby calls it
"Here Documents" or heredocs
This migration script will add a new exclusion constraint
sales_orders_non_deleted_date_unique on the table
constraint ensures that no two rows with same
order_date exists unless any one
of them is
archived. In simpler words, there should only be a single
SaleOrder for any given
Exclusion constraint is a PostgreSQL specific feature and therefore this SQL will succeed only when executed on a PostgreSQL database.
We have to provide the migration separate
down methods instead of a
change method when using native SQL migrations. Otherwise, the
migration won't be reversible. We will discuss more about reversible migrations
in the next section.
Raw SQL is not recommended
Raw SQL migrations can do everything a Rails migration can do. But as discussed earlier, the converse is not true. We can't do everything raw SQL does, using Rails migration scripts.
But still, we use the Rails way for writing migrations. Ever wondered why? There are four strong reasons why raw SQL migrations aren't recommended unless it is the only possible way of achieving our requirement.
- Databases differ in SQL syntax and supported features. An SQL migration script written for PostgreSQL is not guaranteed to work for another database system like SQLite. So we will have to use the same database system for development and production environments.
- Native SQL migrations are inherently non-reversible. Irrespective of the
complexity, Rails isn't capable of finding an appropriate inverse SQL script
from the supplied one as of now. So, we will have to write
downscripts for every migration.
- We manually have to make sure that the names we define adhere to Rails naming conventions to avoid breaking conventions over configuration style.
- Active Record queries are much more readable and maintainable compared to a raw SQL statement.
Reversible and non reversible migrations
If we make any mistakes in the migration script, and if we had migrated that particular migration to database, then reverting the changes of that migration in DB depends on how we had written that migration in the first place.
To make sure we are able to revert changes in database level, we will have to rollback our previous migration and rerun it with modifications. When we rollback a migration, the database will be restored to its previous state.
Rails lets us rollback already executed migrations using the command
bundle exec rails db:rollback.
Even though Rails can automatically devise a strategy for reverting several migration operations, there are some operations that are inherently irreversible.
Some examples will be updating the value of a column or dropping a column. In both cases, there is now way of knowing what data that column contained earlier. So Rails cannot reverse the operation.
Even though it cannot be automated, we can ask Rails to execute a script which will reverse the changes we had made.
As discussed in the earlier chapters, we do this by specifying
methods instead of a single
change method in migration files.
As discussed earlier, Rails isn't smart enough to automatically determine the
inverse of a given SQL script irrespective of its complexity. So we should
down methods if we are using these irreversible
Writing rollback scripts for migrations is only a recommendation. In real life,
we will encounter situations where we cannot specify a strategy for reversing a
migration. Rails do not force us to make our migrations always reversible. We
can use the
change method to define the irreversible migrations in such cases.
Possible operations in Rails migrations
Rails provide us these methods to perform migration operations. Some not-so-relevant methods are omitted to avoid being too verbose:
Description: Used to add a column to an existing table. We will specify the table name, name of the column to add and its data type as parameters.
1add_column :table_name, :column_name, :string
Description: To add index to the specified column. Indices are used to speed up queries in the expense of additional storage. For example:
userstable should be indexed to quickly retrieve user's details using the given
1add_index :table_name, :column_name
Description: Used to change a column's data type. There are some subsidiary methods available as well.
change_column_nullsets or removes a
NOT NULLconstraint on a column.
change_column_defaultchanges the default value of a column.
Note that the change_column command is irreversible.
1change_column :table_name, :column_name, :string 2change_column_null :table_name, :column_name, false 3change_column_default :table_name, :column_name, from: true, to: false
Description: Used to add or remove columns, rename existing columns, change the datatype of columns, add indices to columns, etc, all in one block.
1change_table :table_name do |t| 2 t.remove :column_1, :column_2 3 t.string :column_to_change_datatype 4 t.index :column_to_add_index 5 t.rename :column_old_name, :column_new_name 6end
Description: To create a new table specifying its columns and constraints. As discussed earlier
t.timestampscreates two columns
1create_table :table_name do |t| 2 t.string :column_1 3 t.text :column_2 4 t.timestamps 5end
Description: Drops a table from the database.
The two main options that this command takes in are:
:cascadeto drop dependent objects as well. Defaults to
trueto only drop the table if it exists. Defaults to
1drop_table :table_name, :options
Description: Removes an existing column from a table.
1remove_column :table_name, :column_name
Description: Removes a previously added column index from a table. We can supply
if_exists: trueas an additional parameter to suppress error when trying to remove non-existent indices.
1remove_index :table_name, :column_name 2remove_index :table_name, :column_name, if_exists: true
Description: Renames a single column of specified table.
1rename_column :table_name, :old_column_name, :new_column_name
Seeding initial data
Most of the time we need some basic essential data in our database for our application to start. We want such data to be fed to the database during its initialization.
This process of initializing a database with basic data is called seeding. Rails provides a seeding mechanism that we can use to insert data during database setup.
To see it in action, open the file
db/seeds.rb and add the following content
1oliver = User.create( 2 name: "Oliver", 3 email: "email@example.com", 4 password: "welcome", 5 password_confirmation: "welcome" 6) 7 8Task.create( 9 title: "Initial task.", 10 user_id: oliver.id, 11 creator_id: oliver.id 12)
Now, try resetting the database. Warning: This will clear all the data in your database.
Run the following command to delete and rebuild the database:
1bundle exec rails db:reset
The above command is a shortcut of running all these four commands sequentially:
1bundle exec rails db:drop 2bundle exec rails db:create 3bundle exec rails db:migrate 4bundle exec rails db:seed
Now, if you start the application, we can log in with the username:
firstname.lastname@example.org and password:
welcome without registering any such user
manually. Also, you can see a task assigned to Oliver, when you first log in to
Internals of migrations and schema loads
As you have already seen, the file names of every Rails migration script start with a number. It is actually a timestamp representing the time at which that file was created. Rails calls this timestamp number, the version of the migration script.
Rails uses an internally managed table in our database named
to track which all migrations have been executed. It contains a single column
and it stores the versions of all migrations scripts that have been executed.
Whenever we run
bundle exec rails db:migrate, Rails checks the
schema_migrations table for all executed migrations and then checks versions
of all files in the
db/migrate folder. All the files whose version doesn't
schema_migrations will be executed in ascending order of their
versions specified in filename.
We have one more auto-generated file which keeps track of the executed
migrations, that is the
Whenever we run
bundle exec rails db:migrate, Rails goes through the whole
database structure and rewrites
schema.rb with one big Rails migration script
equivalent of that structure. Also, it copies the version of the last created
migration script to
schema.rb regeneration is independent of the executed script, any
changes we have made manually to the database structure prior to the execution
of migration script will also be reflected in
We can manually request a
schema.rb regeneration using the command
bundle exec rails db:schema:dump.
Significance of schema file
As we have discussed earlier, when initializing a new database, we can use this file to bypass executing all the migrations one by one.
If we run the command
bundle exec rails db:schema:load, then Rails will
execute the migration script defined in
schema.rb and thus initialize the
database structure in one go.
bundle exec rails db:schema:load also marks all of the migration files
preceding the one specified in the
schema.rb as executed. This is done by
adding their version to the
To be specific, Rails compares the timestamp specified in the
the timestamps of all migration files and everything that is prior to the
schema.rb timestamp is added to the table.
In effect, even though we took a shortcut and didn't run any migrations at all,
we marked all of those scripts as executed. So we won't be running them again.
Only the newly added migration scripts will be executed when we run
bundle exec rails db:migrate from now on.
bundle exec rails db:schema:load only loads the structure of the
database. It won't run
bundle exec rails db:seed internally for us. If we have
data to seed, we will have to run
bundle exec rails db:seed manually after
Since Rails migrations can't express some of the advanced SQL features like
exclusion constraints, partial indices, triggers, stored procedures, etc, they
won't be saved to the
schema.rb file and thus won't be restored when we run
bundle exec rails db:schema:load.
So, if we are using such database features in our migration scripts using raw
SQL, then we will have to refrain from initializing our database using
bundle exec rails db:schema:load command.
Bulk mode migrations
Altering an existing table in the following way is a bad idea:
1def change 2 add_column :users, :first_name, :string 3 add_column :users, :last_name, :string 4 add_column :users, :phone_number, :string 5end
This is because each of the queries is performed separately, even if it is done on the same table. Rails translates the above script to the following SQL:
1ALTER TABLE `users` ADD `first_name` varchar(255); 2ALTER TABLE `users` ADD `last_name` varchar(255); 3ALTER TABLE `users` ADD `phone_number` varchar(255);
But, MySQL and PostgreSQL support combining these queries to a single SQL as shown below:
1ALTER TABLE `users` 2 ADD `first_name` varchar(255), 3 ADD `last_name` varchar(255) 4 ADD `phone_number` varchar(255);
Both methods do the same thing, that is, adding three columns to an existing table.
In the database world, executing a single query is much faster than executing three individual queries one by one even though their results are the same. This is because of the optimizations the DB engines can perform when executing operations in batch.
In this case, each of these queries takes almost the same amount of time to complete. That makes the first group of SQL statements three times slower in comparison with the second SQL statement.
Since we will have to keep the server down while running migrations, this is quite a large saving in time, at least for production servers with large databases.
To achieve the same effect in rails migrations we can bring all these three
alterations under a single
change_table method and enable bulk mode. Our
previous example can be re-written as follows in bulk mode:
1def change 2 change_table :users, bulk: true do |t| 3 t.string :first_name 4 t.string :last_name 5 t.string :phone_number 6 end 7end
There is a rubocop rule that enforces bulk mode in table altering migrations. You can refer to details of the cop here.
All changes we have made in this chapter were for demonstration only. We do not need to commit them. To clear the changes run this command:
1git reset --hard HEAD