Powerfully simple persistence: MongoDB


In my post “Great time to be a developer”, I listed MongoDB as one of the tools that made my task (tracking travel times for a given route) easy. This post will show you how.

What do I need to store?

My travel time data collection job needs the URL for the traffic data endpoint for each route that I’ll be tracking. I could have just hardcoded the URL in the script, but I knew that my co-workers would be interested in tracking their routes too, so it made sense to store the list of routes in the database.

I need to store the list of ‘trips’. I define a trip as the reported travel details for a given route and departure time (Josh’s route at 9am, Josh’s route at 9:10am, Tim’s route at 9:10am, etc.). I want to capture the date of each trip so that I can chart the trips for a given day and compare day-to-day variation. Even though I really only need the total travel time for each trip, I want to capture the entire response from the traffic service (travel times, directions, traffic delay, etc.) so that I can add new visualizations in the future.

Setup

First, I had to install mongo on my laptop. I used the homebrew package manager, but binary releases are readily available.

brew install mongodb

I need to add the route for my commute. I fire up the mongo console by typing mongo. I’m automatically connected to the default ‘test’ database in my local mongodb server. I add my route:

> db.routes.save({
  name: 'josh',
  url: 'http://theurlformyroute...'
})

I verify the route was saved:

> db.routes.find()
{"_id" : ObjectId("4f22434d47dd721cf842bdf6"),
 "name" : "josh",
 "url" : "http://theurlformyroute..." }

It is worth noting that I haven’t skipped any steps. I fired up the mongo console, ran the save command, and now I have the route in my database. I didn’t need to create a database, since the ‘test’ database works for my needs. I didn’t need to define the routes collection – it was created as soon as I stored something in it. I didn’t need to define a schema for the data I’m storing, because there is no schema. I am now ready to run my data collection script.

Save some data

I’ll use the ruby MongoDB driver (gem install mongo) directly (you can also use something like mongoid or mongomapper for a higher-level abstraction). My update script needs to work with the URL for each route:

db = Mongo::Connection.new.db("test")
db["routes"].find({}, :fields => {"url" => 1}).each do |route|
  url = route["url"]
  # collect trip data for this route's url
end

I want to group related trips for a commute, so I create a ‘date_key’ based on the current date/time. A date_key looks like: 2012-01-25_AM, 2012-01-25_PM, or 2012-01-26_AM. Now to store the details returned from the traffic service:

trip_details = TrafficSource.get(url)
db["routes"].update({"_id" => route["_id"]}, {
  "$addToSet" => {"trip_keys" => date_key},
  "$push" => {"trips.#{date_key}" => trip_details}
})
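The date_key used in the update above could be built with a small helper. This is only a sketch, not the code from the script: the method name and the rule that anything before noon counts as the AM commute are assumptions made for illustration.

```ruby
# Build a key such as "2012-01-25_AM" from a Time.
# Assumption: trips before noon belong to the morning (AM) commute,
# everything else to the evening (PM) commute.
def date_key(time = Time.now)
  half = time.hour < 12 ? "AM" : "PM"
  "#{time.strftime('%Y-%m-%d')}_#{half}"
end

puts date_key(Time.new(2012, 1, 25, 9, 0))   # prints 2012-01-25_AM
puts date_key(Time.new(2012, 1, 25, 17, 30)) # prints 2012-01-25_PM
```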

After running for a couple days, this will result in a route document that looks something like:

{
  _id: 1234,
  name: 'josh',
  url: 'http://mytravelurl...',
  trip_keys: ['2012-01-25_AM', '2012-01-25_PM', '2012-01-26_AM',...],
  trips: {
    '2012-01-25_AM': [{departure: '9:00', travelTime: 24, ...}, {departure: '9:10', travelTime: 26}, ...],
    '2012-01-25_PM': [{departure: '9:00', travelTime: 28, ...}, {departure: '9:10', travelTime: 29}, ...],
    '2012-01-26_AM': [{departure: '9:00', travelTime: 25, ...}, {departure: '9:10', travelTime: 25}, ...],
    ...
  }
}

That is *all* of the MongoDB-related code in the data collection script. I haven’t left out any steps, programmatic or administrative. None of the structure was defined ahead of time. I just ‘$push’ed some trip details into ‘trips.2012-01-25_AM’ on the route. MongoDB automatically added an object to the ‘trips’ field, with a ‘2012-01-25_AM’ field, which holds an array of trip details. I also store a list of unique keys in the trip_keys field using $addToSet in the same `update` statement.
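To make the effect of that update concrete, here is a rough in-memory emulation on a plain ruby Hash. It only mimics the observable result of $addToSet and $push on a single document; the apply_trip helper is hypothetical and says nothing about how MongoDB implements these operators.

```ruby
# Emulate what the update does to one route document (illustration only).
def apply_trip(route, date_key, trip_details)
  # $addToSet: append the key only if it is not already in the array
  route["trip_keys"] ||= []
  route["trip_keys"] << date_key unless route["trip_keys"].include?(date_key)

  # $push on "trips.<date_key>": create missing parents, then append
  route["trips"] ||= {}
  (route["trips"][date_key] ||= []) << trip_details
  route
end

route = { "name" => "josh" }
apply_trip(route, "2012-01-25_AM", { "departure" => "9:00", "travelTime" => 24 })
apply_trip(route, "2012-01-25_AM", { "departure" => "9:10", "travelTime" => 26 })

p route["trip_keys"]                     # the key appears once
p route["trips"]["2012-01-25_AM"].length # both trips were appended
```

Running the helper twice with the same date_key leaves a single entry in trip_keys but two entries in the trips array, which is the same shape as the route document shown above.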

Show the data

The web page that charts the travel times makes a single call to MongoDB:

route = db["routes"].find_one(
  {:name => 'josh'},
  :fields => {"trips" => 1}
)

The entire trips field, containing all of the trips grouped by date_key, is now available in the ruby hash route. With a little help from ruby’s Enumerable#map, I transform the data into a format consumable by Highcharts JS.
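The transformation might look something like the sketch below. The to_series name, the sample values, and the choice of one series per date_key are assumptions for illustration; the only Highcharts detail relied on is that a series is an object with a name and a data array.

```ruby
require "json"

# Sample shaped like the "trips" field of a route document (values made up).
trips = {
  "2012-01-25_AM" => [
    { "departure" => "9:00", "travelTime" => 24 },
    { "departure" => "9:10", "travelTime" => 26 }
  ],
  "2012-01-25_PM" => [
    { "departure" => "9:00", "travelTime" => 28 },
    { "departure" => "9:10", "travelTime" => 29 }
  ]
}

# One series per date_key; each data point is a trip's travel time.
def to_series(trips)
  trips.map do |date_key, day_trips|
    { "name" => date_key, "data" => day_trips.map { |trip| trip["travelTime"] } }
  end
end

series = to_series(trips)
puts JSON.generate(series) # JSON that can be handed to the chart's series option
```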

Production

Just to be thorough, I’ll mention that I had to modify the script for production use. I replaced the `db` local variable with a method that uses the mongolab connection when available, or falls back to the local test connection:

def db
  @db ||=
    if (mongolab_uri = ENV['MONGOLAB_URI'])
      # the database name is the URI path without its leading slash
      db_name = URI.parse(mongolab_uri).path.gsub(/^\//, '')
      Mongo::Connection.from_uri(mongolab_uri).db(db_name)
    else
      # no MONGOLAB_URI set: fall back to the local 'test' database
      Mongo::Connection.new.db("test")
    end
end
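The database name in the method above comes from the path of the connection URI. The snippet below shows just that step with ruby’s standard URI library; the connection string is a made-up example, and a real MONGOLAB_URI will differ.

```ruby
require "uri"

# Hypothetical connection string, for illustration only.
sample_uri = "mongodb://user:secret@db.example.com:27017/commute"

# The database name is the URI path with its leading slash removed.
db_name = URI.parse(sample_uri).path.sub(%r{\A/}, "")
puts db_name # prints commute
```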

Conclusion

A couple of queries, a single, powerful update statement, and no administration or schema preparation. Paired with the ruby driver’s seamless mapping to native Hash objects, it is hard to imagine a simpler, equally powerful persistence strategy for this type of project.
