Author Archives:

Real Time Analytics With Spark Streaming and Cassandra

Spark Streaming is a good tool for rolling up transaction data into summaries as it enters the system. When paired with an easily idempotent data store like Cassandra, you get a high-performance, low-hassle approach to getting your work done. … Continue reading

Posted in Cassandra, Spark

Retry not Rollback: Idempotent Data Models in Cassandra

Naive Consistency: Often the first error-handling code I see from new Cassandra users is a client-side rollback, an attempt to replicate database transactions from the ACID world. This is typically done when a write to multiple tables … Continue reading

Posted in Cassandra | 1 Comment

Event Sourcing and System of Record: Sane Distributed Development In The Modern Era

No matter which message queue or broker you rely on, whether it be RabbitMQ, JMS, ActiveMQ, WebSphere, MSMQ, or yes, even Kafka, you can lose messages in any of the following ways: A downstream system from the broker can have … Continue reading

Posted in Cassandra, Event Sourcing, Lambda | 1 Comment

DataStax a Love Letter

Today is my last day at DataStax and what an amazing ride it was. This is easily the best job and the best group of people I’ve ever worked with and I’m very sad to go, but I had an … Continue reading 

Posted in Cassandra

My Cassandra 2.0 Diagnostics Checklist (Brain Dump)

UPDATE: This list needs to be updated and, as of today, has only been verified with Cassandra 2.0. Original Blog Post: This isn’t remotely complete, but I had a colleague ask me to do a brain dump of my process … Continue reading

Posted in Cassandra | 4 Comments

Domain Modeling Around Deletes or “Using Cassandra as a queue even when you know better”

Understanding Deletes: Delete-heavy workloads have a number of pretty serious issues when it comes to using a distributed database. Unfortunately, one of the most common delete-heavy workloads and the most common desired use case for Cassandra is to … Continue reading

Posted in Cassandra | 8 Comments

Cassandra Query Patterns: Not using the “in” query for multiple partitions.

So let’s say you’re doing your best to data model all around one partition. You’ve done your homework and all your queries look like this: SELECT * FROM my_keyspace.users WHERE id = 1 Over time, however, as features are added, … Continue reading

Posted in Cassandra | 5 Comments

Cassandra Auth: Never use the cassandra user in production!

Normal security best practice for applications is never to use the default admin user. In SQL Server, this takes the form of the recommendation not to use the “sa” user. Likewise, in Cassandra the default cassandra user has full rights … Continue reading

Posted in Cassandra

Apache Cassandra: Some useful JMX metrics to monitor

This is not a complete list, but this is what I’ve typically had to look out for in the wild. There may be some selection bias at play, since once I’m involved with a cluster it’s not in a good place. … Continue reading

Posted in Cassandra | 1 Comment

Cassandra C# Driver: Surprising gotcha with SimpleStatement

While helping someone with a Batch using the C# driver, I had a bit of a surprise. I wanted to reuse the CQL, and I couldn’t use a Prepare at that point because of a bug, since SimpleStatement has … Continue reading

Posted in C#, Cassandra