ActiveRecord Should Ask For More Than ID
This post is targeted at Rails developers who use ActiveRecord with, obviously, a relational database that allows setting a function as a column's default. PostgreSQL, for example.
What We Wanted
I took part in discussions of an issue that led to ActiveRecord migrations accepting a native database function as a column's default value. PostgreSQL already had this. For as long as I've used it, it's been possible to set the default value of a column to a function, whether one that ships with PostgreSQL, one provided by an extension, or a custom procedure. ActiveRecord just didn't have the facility to do that. Well, you could still execute your way to this desired state.
This was our past (with the mandatory down migration, since we called execute):
class CreatePosts < ActiveRecord::Migration
  def up
    create_table :posts do |t|
      t.string :title
      t.timestamps
    end

    execute "ALTER TABLE posts ALTER title SET DEFAULT some_function()"
  end

  def down
    drop_table :posts
  end
end
And this is our present and future (which I love so much):
class CreatePosts < ActiveRecord::Migration[5.2]
  def change
    create_table :posts do |t|
      t.string :title, default: -> { "some_function()" }
      t.timestamps
    end
  end
end
Now it’s easier to set a database function as the default value of a column using an automatically reversible migration. Lots to love, and a proper thank you is in order to whoever worked on it.
Solving this problem led to another. If you read the comments on the issue, you'll see what I'm talking about. A new issue has been created for it, and that is what this post is about.
The Database As A Dumb Store
An ORM, like every other abstraction, tries to live by the true meaning of the word abstraction. It doesn't demand of you a blind trust, but rather it demands that you think of your database, sorry, persistence layer (and this is the first pitfall), as, you guessed it, a persistence layer. The place where you store stuff that survives a crash, shutdown, or anything else that makes the computer lose its memory.
Unfortunately the database isn't a dumb store. As the server tasked with ensuring the storage and integrity of data, it gets to have the last word. And rightly so. It shouldn't (and doesn't) delegate this responsibility to any application (or layer) above it. It would be irresponsible and lazy of the database if it didn't perform this task.
What this means is that a couple of validations in your ActiveRecord model aren't enough to ensure data integrity. Technically, bad data could still end up in your database.
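To make that concrete, here is a minimal sketch. The validation is made up for illustration, but the point holds: both of these writes skip model validations entirely, so only a constraint in the database itself would reject the bad value.

class Post < ApplicationRecord
  validates :title, length: { minimum: 5 }
end

# Validations only run on the normal save path. Both writes below
# bypass them, so the too-short title still reaches the table; only a
# CHECK constraint (or similar) in the database would stop it.
post = Post.new(title: "x")
post.save(validate: false)

Post.where(id: post.id).update_all(title: "y") # skips validations and callbacks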
It also means that when you send a valid query to the database it might not even get the chance to run. It could be summarily aborted or heavily modified before it's applied. There's not much to worry about when the query is aborted; AFAICT ActiveRecord will behave appropriately. It is when the query is modified that things could take a bad turn, because of ActiveRecord's confidence that it knows what the database will do with such a query. This is not hubris; it's all good intentions. After all, it's an abstraction, one that requires you to pay little if any attention to the underlying database, vague-ified as the persistence layer.
Pitfalls
ActiveRecord's confidence is a pitfall. In most cases it does exactly what you want it to. You start fighting (and eventually hating) it when you have columns with default values generated by the database during insert, and/or columns whose eventual value is affected by the result of a trigger.
What You Insert/Update Is Not What Is Stored
Remember when we said that the database, as part of its job to ensure data integrity, determines the fate of all queries (and their values) it receives? If a query isn't aborted but run, then the eventual values stored could be very different from what was originally submitted. Let's look at two such cases.
Function Defaults
When the default of a column is a function (such as now(), gen_random_uuid(), etc.) the stored value is generated at insert-time (to use the compile-time/runtime lingo).
Function defaults are awesome. gen_random_uuid() might be the most popular, but stored procedures are easy to write, can be as complex as needed, and are tucked away in the database. But ActiveRecord punishes you for using a strength of the database.
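For instance, a custom function can back a default just as well as a built-in one. A rough sketch (the function name and body are made up for illustration):

class CreateOrders < ActiveRecord::Migration[5.2]
  def up
    # An illustrative stored function: an eight-character reference code.
    execute <<-SQL
      CREATE OR REPLACE FUNCTION generate_reference_code()
      RETURNS text AS $$
        SELECT upper(substr(md5(random()::text), 1, 8));
      $$ LANGUAGE sql VOLATILE;
    SQL

    create_table :orders do |t|
      t.text :reference_code, null: false, default: -> { "generate_reference_code()" }
      t.timestamps
    end
  end

  def down
    drop_table :orders
    execute "DROP FUNCTION generate_reference_code()"
  end
end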
Triggers
Triggers are like a Pub/Sub system. They allow you to listen to events and perform actions before or after they're applied. The staple of trigger examples is audit tables (tables that keep a history of changes to other tables). A rather simple example is normalizing column values before they're stored, such as lowercasing email addresses. Where they usually shine is complex validations that depend on values in different tables. I've used them liberally this way and enjoyed the benefit of keeping logic close to the data it works on. If you haven't used triggers before, take a look at them. Again, ActiveRecord punishes you for using this incredible database feature.
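To give a flavor of the audit-table staple, here is a rough sketch that could live in a migration's execute call; every name in it is made up for illustration:

# Keep a history of title changes on posts.
execute <<-SQL
  CREATE TABLE posts_audit (
    post_id    bigint,
    changed_at timestamptz NOT NULL DEFAULT now(),
    old_title  text,
    new_title  text
  );

  CREATE OR REPLACE FUNCTION audit_post_title()
  RETURNS TRIGGER AS $$
  BEGIN
    INSERT INTO posts_audit (post_id, old_title, new_title)
    VALUES (OLD.id, OLD.title, NEW.title);
    RETURN NEW;
  END;
  $$ LANGUAGE plpgsql;

  CREATE TRIGGER trg_audit_post_title
  BEFORE UPDATE OF title ON posts
  FOR EACH ROW
  EXECUTE PROCEDURE audit_post_title();
SQL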
What’s A Programmer To Do
The database has a mechanism for telling you what the eventual stored values were. It's the RETURNING clause that you can tack onto the end of your SQL query to specify which values should be returned from the result of the query. RETURNING *, just like SELECT *, returns the entire row, and these are the truest values, which is what you'd expect when you've successfully inserted a new row or updated an existing one.
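As a quick sketch, a hand-written insert into the users table defined in the next section could ask for the whole row back; exec_query hands you the values the database actually stored:

# exec_query runs raw SQL and returns an ActiveRecord::Result.
result = ActiveRecord::Base.connection.exec_query(<<-SQL)
  INSERT INTO users (email, created_at, updated_at)
  VALUES ('  HeLLo@exaMPLe.oRg  ', now(), now())
  RETURNING *
SQL

row = result.first
row["email"] # the value the trigger actually stored: "hello@example.org"
row["token"] # the UUID generated by gen_random_uuid()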
But not with ActiveRecord.
How ActiveRecord Punishes
Take for example the User model below. token has a database function as its default, while the email column's value is trimmed and lowercased before it is inserted or updated.
class CreateUsers < ActiveRecord::Migration[5.2]
  def change
    create_table :users do |t|
      t.string :email, null: false
      t.string :token, null: false, default: -> { "gen_random_uuid()" }
      t.timestamps
    end

    reversible do |dir|
      dir.up do
        execute <<-SQL
          CREATE OR REPLACE FUNCTION lower_trim_email()
          RETURNS TRIGGER AS $$
          BEGIN
            NEW.email = lower(trim(NEW.email));
            RETURN NEW;
          END;
          $$ LANGUAGE plpgsql;

          CREATE TRIGGER trg_lower_trim_email
          BEFORE INSERT OR UPDATE OF email ON users
          FOR EACH ROW
          EXECUTE PROCEDURE lower_trim_email();
        SQL
      end

      dir.down do
        execute "DROP TRIGGER trg_lower_trim_email ON users"
        execute "DROP FUNCTION lower_trim_email()"
      end
    end
  end
end
user = User.create(email: " HeLLo@exaMPLe.oRg ")
puts user.email # " HeLLo@exaMPLe.oRg "
puts user.token # nil
What happened here? During an INSERT, ActiveRecord only asks for the autogenerated id back from the database. Thus, while a trimmed and lowercased email address was stored in the database, the application continues to use the original. Even worse, the user object doesn't have its token set. The workaround is to immediately reload the model after_{create,save}, and that's two queries already where one would suffice:
class User < ApplicationRecord
  after_save :reload

  # If you only have default values and no triggers
  # that affect updates then use after_create instead.
  # after_create :reload
end
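With the reload in place, the earlier example comes back with the values the database actually stored, at the cost of that extra query:

user = User.create(email: " HeLLo@exaMPLe.oRg ")
puts user.email # "hello@example.org", as normalized by the trigger
puts user.token # a UUID generated by gen_random_uuid()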
During an update, you can forget about what your triggers did to the values. There is no RETURNING clause, not even for the specific values that were changed. This can produce a conflict where you work with one value in your application but have another in the database. In the above example, that is exactly what happens.
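Continuing the example, but without the after_save :reload workaround in place, an update shows the mismatch plainly; the in-memory attribute keeps whatever you assigned until you reload:

user.update(email: " AnOther@exaMPLe.oRg ")
puts user.email        # " AnOther@exaMPLe.oRg ", what the application sees
puts user.reload.email # "another@example.org", what the database stored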
The ActiveRecord Way
If immediately reloading models after save bothers you, then don't use database functions as default values. And don't use triggers either. Embrace ActiveRecord fully and do as it expects: set the defaults in the model, and transform attribute values (like trimming and lowercasing) before persisting.
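A rough sketch of that all-in-Ruby approach (the callback names are mine, and the token moves from a database default into application code):

class User < ApplicationRecord
  before_validation :normalize_email
  before_create :set_token

  private

  # What the trigger did, now done in Ruby before persisting.
  def normalize_email
    self.email = email.to_s.strip.downcase
  end

  # What the function default did, now done in Ruby.
  def set_token
    self.token ||= SecureRandom.uuid
  end
end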
I look forward to what the resolution of this new issue will bring. It's more complex than just writing the code to attach RETURNING * to the generated insert or update query. What I wish for is a directive on a model (in the manner of abstract_class) that marks a model as one relying on database features such as those described above.
Got comments or corrections for factual errors? There’s a Hacker News thread for that.