Author Archives:

Lambda+ or Event Sourcing with TTLs

Some people are worried about the data volume that a strategy like Lambda+ or Event Sourcing implies. As a disclaimer, by giving up the historical data you have, you risk losing data that could be useful later, and you lose the ability to have …

Posted in Cassandra, Event Sourcing, Lambda
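The “Lambda+ or Event Sourcing with TTLs” excerpt is about bounding data volume by letting old events expire. In Cassandra this is usually done by writing each event with `INSERT ... USING TTL <seconds>`. The following is a minimal, hypothetical in-memory stand-in, not the post’s code. The `TtlEventLog` class, its injectable clock, and the sample stream names are all assumptions. It shows the behavior: expired events vanish from reads, and their storage is reclaimed later.

```python
import time
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Event:
    stream_id: str
    seq: int
    payload: dict
    expires_at: float  # absolute time after which the event is no longer visible


class TtlEventLog:
    """Hypothetical in-memory stand-in for an event table written with USING TTL."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.time):
        self.ttl_seconds = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested without sleeping
        self._events: List[Event] = []

    def append(self, stream_id: str, seq: int, payload: dict) -> None:
        # Every event gets the same TTL, mirroring INSERT ... USING TTL <ttl_seconds>.
        expires_at = self.clock() + self.ttl_seconds
        self._events.append(Event(stream_id, seq, payload, expires_at))

    def read(self, stream_id: str) -> List[Event]:
        # Expired events are invisible to readers, as with Cassandra TTLs.
        now = self.clock()
        live = [e for e in self._events if e.stream_id == stream_id and e.expires_at > now]
        return sorted(live, key=lambda e: e.seq)

    def purge_expired(self) -> int:
        # Physically drop expired events (Cassandra reclaims this space during compaction).
        now = self.clock()
        before = len(self._events)
        self._events = [e for e in self._events if e.expires_at > now]
        return before - len(self._events)
```

The trade-off the excerpt warns about shows up directly here. Once an event has expired, it can no longer be replayed to rebuild or re-derive state.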

Lambda+: Cassandra and Spark for Scalable Architecture

UPDATE: For some background on Spark Streaming and Cassandra, please consult some of my previous blog posts on the subject. Many of you have heard of, and a few of you may have used, the Lambda Architecture. If you’ve not heard …

Posted in Cassandra, Spark

Data Density! Destroyer of Scalability

UPDATE: I’d incorrectly attributed a practice of scaling down daily to Netflix. I cannot find any reference to them using it today, and I’ve been unable to find the original reference, so I’ve simply removed the point. I’ll cover cluster …

Posted in Cassandra

Real Time Analytics With Spark Streaming and Cassandra

Spark Streaming is a good tool for rolling up transaction data into summaries as it enters the system. When paired with an easily idempotent data store like Cassandra, you get a high-performance, low-hassle approach to getting your work done. …

Posted in Cassandra, Spark
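The “Real Time Analytics With Spark Streaming and Cassandra” excerpt pairs streaming roll-ups with an idempotent store. Here is a minimal sketch of why that pairing works. It assumes each micro-batch is summarized into rows keyed by (account, batch time) and written as plain upserts. If the same batch is reprocessed, it rewrites identical rows instead of double counting. The Spark and connector calls are left out. `roll_up`, `write_summaries`, `total_for`, and the dict standing in for a Cassandra table are hypothetical.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical stand-in for a Cassandra table keyed by (account_id, batch_time).
# Like Cassandra writes, assigning to an existing key simply overwrites the row.
SummaryTable = Dict[Tuple[str, int], Tuple[int, int]]


def roll_up(batch_time: int, transactions: Iterable[Tuple[str, int]]) -> SummaryTable:
    """Summarize one micro-batch of (account_id, amount_cents) into (count, total) per account."""
    summary: SummaryTable = {}
    for account_id, amount_cents in transactions:
        count, total = summary.get((account_id, batch_time), (0, 0))
        summary[(account_id, batch_time)] = (count + 1, total + amount_cents)
    return summary


def write_summaries(table: SummaryTable, summaries: SummaryTable) -> None:
    # Plain upserts: rewriting a key with values computed from the same batch changes nothing.
    for key, value in summaries.items():
        table[key] = value


def total_for(table: SummaryTable, account_id: str) -> int:
    # A read sums the per-batch rows for one account.
    return sum(total for (acct, _), (_, total) in table.items() if acct == account_id)
```

This works because the code writes computed totals rather than incrementing a Cassandra counter. Counter updates are not idempotent, so a replayed batch would be counted twice.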

Retry not Rollback: Idempotent Data Models in Cassandra

Naive Consistency: Often the first error-handling code I see from new Cassandra users is a client-side rollback, an attempt to replicate database transactions from the ACID world. This is typically done when a write to multiple tables …

Posted in Cassandra
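The “Retry not Rollback” title names the alternative to client-side rollback: make every write idempotent and simply retry. The sketch below assumes plain upserts. In Cassandra, re-running an INSERT or UPDATE that sets the same values produces the same row, with counter updates and list appends as the exceptions. `FlakyTable` and `write_all_with_retry` are hypothetical in-memory helpers that simulate timeouts. They are not a driver API.

```python
from typing import Dict, List, Tuple


class FlakyTable:
    """Hypothetical in-memory table whose first `failures` upserts time out."""

    def __init__(self, failures: int = 0):
        self.rows: Dict[str, dict] = {}
        self.failures_left = failures

    def upsert(self, key: str, row: dict) -> None:
        if self.failures_left > 0:
            self.failures_left -= 1
            raise TimeoutError("simulated write timeout")
        # Writing the same key with the same values again changes nothing.
        self.rows[key] = row


def write_all_with_retry(
    writes: List[Tuple[FlakyTable, str, dict]], max_attempts: int = 5
) -> int:
    """Apply every write; on a timeout, retry the whole set instead of deleting
    what already succeeded. Returns the number of attempts used."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            for table, key, row in writes:
                table.upsert(key, row)
            return attempt
        except TimeoutError:
            if attempt == max_attempts:
                raise  # surface the error; nothing is rolled back
```

Consider writing a user to both a `users` table and a `users_by_email` table, where the second write times out twice. The third attempt succeeds and both tables end up consistent. The first write was never deleted, so no reader ever sees the row disappear and reappear.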

Event Sourcing and System of Record: Sane Distributed Development In The Modern Era

No matter which message queue or broker you rely on, whether it be RabbitMQ, JMS, ActiveMQ, WebSphere, MSMQ, or yes, even Kafka, you can lose messages in any of the following ways: A downstream system from the broker can have …

Posted in Cassandra, Event Sourcing, Lambda

DataStax a Love Letter

Today is my last day at DataStax, and what an amazing ride it was. This is easily the best job and the best group of people I’ve ever worked with, and I’m very sad to go, but I had an …

Posted in Cassandra

My Cassandra 2.0 Diagnostics Checklist (Brain Dump)

UPDATE: This list needs to be updated; as of today it has only been verified with Cassandra 2.0. Original blog post: This isn’t remotely complete, but I had a colleague ask me to do a brain dump of my process …

Posted in Cassandra

Domain Modeling Around Deletes or “Using Cassandra as a queue even when you know better”

Understanding Deletes: Delete-heavy workloads have a number of pretty serious issues when it comes to using a distributed database. Unfortunately, one of the most common delete-heavy workloads, and one of the most commonly desired use cases for Cassandra, is to …

Posted in Cassandra

Cassandra Query Patterns: Not using the “in” query for multiple partitions.

So let’s say you’re doing your best to data model all around one partition. You’ve done your homework, and all your queries look like this:

SELECT * FROM my_keyspace.users where id = 1

Over time, however, as features are added, …

Posted in Cassandra
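The “Cassandra Query Patterns” excerpt stops before describing its alternative to an IN query across partitions. One common approach is sketched here, assuming you use the DataStax Python driver (`cassandra-driver`). You prepare the single-partition query once, issue one `execute_async` call per id, and then collect the results. With token-aware routing, each request can go straight to a replica that owns its partition. That avoids having one coordinator fan out an IN across many partitions. `fetch_users` is a hypothetical helper; only `prepare`, `execute_async`, and `ResponseFuture.result()` come from the driver.

```python
from typing import Any, Iterable, List


def fetch_users(session: Any, prepared: Any, user_ids: Iterable[int]) -> List[Any]:
    """Fetch rows for many ids with one single-partition query each.

    Assumed setup with a real cluster:
        prepared = session.prepare("SELECT * FROM my_keyspace.users WHERE id = ?")
        rows = fetch_users(session, prepared, [1, 2, 3])
    """
    # Send every request before waiting on any, so they run concurrently.
    futures = [session.execute_async(prepared, (user_id,)) for user_id in user_ids]
    rows: List[Any] = []
    # Wait on each future in turn; result() raises if that particular query failed.
    for future in futures:
        rows.extend(future.result())
    return rows
```

For very large id lists you would likely want to cap the number of in-flight requests. The driver’s `cassandra.concurrent.execute_concurrent_with_args` is one option for that.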