Cassandra & Rails: Cequel Batch Support


Cassandra On Rails using Cequel Atomic Batch Support When I first tried to use Cassandra with Rails over a year ago the lack of a good native (IE not thrift) driver and a good mapper was a show stopper for the project I was on. Today the situation has improved markedly with the cql-rb driver proving stable in usage, and the excellent Cequel ActiveModel capable library switching to use cql-rb.

I’d like to cover a common use case that’s not been handled by ORM style mapping libraries very well, atomic batches. Cequel has support for this even though you have to hunt a bit for it and it works well. Consider the following scenario, you’ve got 2 related tables, and you want one to update when the other one changes.

class Post 
 include Cequel::Record
 belongs_to :blog key :id, :uuid, auto: true
 column :body, :text
 column :author, :text
 column :title, :text 
end

class Blog 
 include Cequel::Record
 has_many :posts key :id, :uuid, auto: true
 column :author, :text 
end

If I want to say update author name with an Active Record style interface I’m totally out of luck, and have to maintain this state separately.

post.blog = blog
post.author = "John Smith"
blog.author = "John Smith"
blog.save!

CQL (2ms) UPDATE blogs SET author = 'Ryan Svihla' WHERE id = c5cdd562-0c2e-11e4-bd14-1d762bc51d5b

Since I forgot to save my post no updates occur. I can use callbacks to help however, and Cequel is smart enough to wrap all of these in an Atomic Batch for me.

class Blog 
 include Cequel::Record
 has_many :posts key :id, :uuid, auto: true
 column :author, :text 

 after_destroy :delete_all_posts
 after_update :update_author_name

 def delete_all_posts
  self.posts.each { |p| p.destroy }
 end

 def update_author_name
  self.posts.each { |p| 
   p.author = self.author
   p.save!
  } 
 end 
end

Now an update will have the behavior we want perform an atomic batch across all tables and rows acted on in the callback. This means that even if the client and even the coordinator node fails eventually this will be done.

post.blog = blog
post2.blog = blog
blog.author = "John Smith"
blog.save!

CQL (2ms) BEGIN BATCH
UPDATE blogs SET author = ‘John Smith’ WHERE id = 48097216-0c2f-11e4-9d4b-230a125e3b62
UPDATE posts SET author = ‘John Smith’ WHERE blog_id = 48097216-0c2f-11e4-9d4b-230a125e3b62 AND id = 4d7fe89c-0c2f-11e4-9d4b-230a125e3b62
UPDATE posts SET author = ‘John Smith’ WHERE blog_id = 48097216-0c2f-11e4-9d4b-230a125e3b62 AND id = 62cc675c-0c2f-11e4-9d4b-230a125e3b62
APPLY BATCH

Likewise if we want to delete the blog, we can rely on all child posts being deleted in a batch manner

post.blog = blog
post2.blog = blog
blog.destroy

CQL (4ms) BEGIN BATCH
DELETE FROM blogs WHERE id = 48097216-0c2f-11e4-9d4b-230a125e3b62
DELETE FROM posts WHERE blog_id = 48097216-0c2f-11e4-9d4b-230a125e3b62 AND id = 4d7fe89c-0c2f-11e4-9d4b-230a125e3b62
DELETE FROM posts WHERE WHERE blog_id = 48097216-0c2f-11e4-9d4b-230a125e3b62 AND id = 62cc675c-0c2f-11e4-9d4b-230a125e3b62
APPLY BATCH

This is all a big step in the right direction and closer to what I’ve envisioned for automatic batch support of related tables than anything I’ve used yet besides my own work (which is on hold for the extended future).

Cequel is the best mapping library I’ve used to date regardless of language for Cassandra. You should check it out, they get the important parts.

Buffer Cache Makes Slow Disks Seem Fast, Till You Need Them.