Synthetic Sharding with Cassandra. Or How To Deal With Large Partitions.

15 March, 2016. It was a Tuesday.

Extremely overdue that I write this down as it’s a common problem, and really applies to any database that needs to scale horizontally, not just Cassandra.

Problem Statement

Good partition keys are not always obvious, and it’s easy to create a bad one.

Defining A Bad Partition

Uneven Access. Read OR write count is more than 2x different from one partition to another. This is a purist view and some of you using time series are screaming at me right now, but set that aside, I’ll have another blog post for you, but if you’re new to Cassandra consider it a good principle and goal to aim for.
Uneven Size. Same as above really you can run cfhistograms and if you see really tiny or empty partitions next to really large ones or ones with really high cell count, you are at least on ingest uneven. I would shoot for within an order of magnitude or two. If you’re smart enough to tell me why I’m wrong here that’s fine, I’m not gonna care, but if you’re new to Cassandra this is a good goal.
Too Many Cells. The partition cell count for a given partition is over 100k (run cfhistograms to get these numbers). This is entirely a rule of thumb and varies amazingly between hardware, tuning and column names (length matters). You may find you can add more and not hit a problem and you may find you can’t get near this. If you want to be exacting you should test.
Too Large. Your partition size is over 32mb (also in cfhisograms). This also varies like cell count. Some people tell me this matters less now (as of 2.1), and they run a lot larger. However, I’ve seen it cause problems on a number of clusters. I repeat as a new user this is a good number to shoot for, once you’re advanced enough to tell me why I’m wrong you may ignore this rule. Again you should test your cluster to get the number where things get problematic.

Options if you have a bad partition

Pick a better partition key (read http://www.datastax.com/dev/blog/basic-rules-of-cassandra-data-modeling).
Give up and use Synthetic Sharding.
Pretend it’s not a problem and find out the hard way that it really is, usually this is at 3 am.

Synthetic Sharding Strategy: Shard Table

Pros

Always works.
Easy to parallelize (can be writing to shards in parallel).
Very very common and therefore battle tested.

Cons

May have to do shards of shards for particularly large partitions.
Hard for new users to understand.
Hard to use in low latency use cases (but so are REALLY large partitions, so it’s a problem either way).

Example Idea

Synthetic Sharding Strategy: Shard Count Static Column

Pros

No separate shard table.
No shards of shards problem
2x faster to read when there is a single shard than the shard table option.
Still faster even when there is more more than a single shard than the shard table option.

Cons

Maybe even harder for new users since it’s a little bit of a surprise.
Harder to load concurrently.
I don’t see this in wide use.

Example Idea

Synthetic Sharding Strategy: Known Shard Count

Pros

No separate shard table.
No shards of shards problem
Not as fast as static column shard count option when only a single shard.
Easy to grasp once the rule is explained.
Can easily abstract the shards away (if you always query for example 5 shards, then this can be a series of queries added to a library).
Useful when you just want to shrink the overall size of the partitions by a set order of magnitude, but don’t care so much about making sure the shards are even.
Can use random shard selection and probably call it ‘good enough’
Can even use a for loop on ingest (and on read).

Cons

I don’t see this in wide use.
Shard selection has to be somewhat thoughtful.

Example Idea

Appendix: Java Example For Async

A lot of folks seem to struggle with async queries. So for example using an integer style of shards this would just be a very simple:

← Cassandra’s “Repair” Should Be Called “Required Maintenance”

Spark job that writes to Cassandra just hangs when one node goes down? →