Tag Archives: Cassandra

Reflection Scala-2.10 and Spark weird errors when saving to Cassandra

This originally started with this SO question, and I’ll be honest, I was flummoxed for a couple of days looking at it (in no small part because the code was doing a lot). But at some point I was able to … Continue reading 

Posted in Cassandra, Spark

Logging The Generated CQL from the Spark Cassandra Connector

This has come up a few times in the last few days, so I thought I’d share the available options and the tradeoffs. Option 1: Turn ON ALL THE TRACING! Run: nodetool settraceprobability 1.0. Probabilistic tracing is a handy feature for finding expensive … Continue reading 

Posted in Cassandra, Spark

Don’t use TextField for your unique key in Solr

This seems immediately obvious when you think about it, but TextField is what you use for fuzzy searches in Solr, and why would a person want a fuzzy search on a unique value? While I can come up with some … Continue reading 

Posted in Cassandra, Solr

Spark job that writes to Cassandra just hangs when one node goes down?

If one node takes down your app, do you have any replicas?

Posted in Cassandra, Spark

Synthetic Sharding with Cassandra. Or How To Deal With Large Partitions.

It’s extremely overdue that I write this down, as it’s a common problem and really applies to any database that needs to scale horizontally, not just Cassandra. Problem Statement: Good partition keys are not always obvious, and it’s easy to create … Continue reading 

Posted in Cassandra | 1 Comment

Cassandra’s “Repair” Should Be Called “Required Maintenance”

One of the bigger challenges when you go Eventually Consistent is how to reconcile data that has not been replicated. This happens if you’re using Oracle across multiple data centers with tech like GoldenGate, and it happens if you’re using async replicas … Continue reading 

Posted in Cassandra

Scale is a Dish Best Served Eventually Consistent

A lot of people new to Cassandra find the data modeling required tedious and outrageously hard. They’ll long for their RDBMS: if only [insert favorite vendor or project lead here] would make their RDBMS scale like Cassandra, they could tell their bosses … Continue reading 

Posted in Cassandra, Distributed | 1 Comment

Lambda+ or Event Sourcing with TTLs

Some people are worried about the data volume that a strategy like Lambda+ or Event Sourcing implies. As a disclaimer, by giving up the historical data you have, you risk losing a useful data layer and you lose the ability to have … Continue reading 

Posted in Cassandra, Event Sourcing, Lambda

Lambda+: Cassandra and Spark for Scalable Architecture

UPDATE: For some background on Spark Streaming and Cassandra, please consult some of my previous blog posts on the subject. Many of you have heard of, and a few of you may have used, the Lambda Architecture. If you’ve not heard … Continue reading 

Posted in Cassandra, Spark

Data Density! Destroyer of Scalability

UPDATE: I’d incorrectly attributed a practice to Netflix about scaling down daily. I cannot find any reference to them using it today, and I’ve been unable to find the previous reference to it. So I’ve just removed the point. I’ll cover cluster … Continue reading 

Posted in Cassandra | 1 Comment