Cassandra: Batch Loading Without the Batch — The Nuanced Edition

My previous post on this subject has proven extraordinarily popular, and I get commentary on it all the time, most of it quite good. It has, however, gotten a decent number of comments from people quibbling with the nuance of the post … Continue reading

Posted in Cassandra, Java


A couple of times a week I get a question from someone who wants to know how to “failover” to a remote DC in the driver if the local Cassandra DC fails, or even if there are only a couple of … Continue reading

Posted in Cassandra

Connection to Oracle From Spark

For some silly reason there has been a fair amount of difficulty in reading from and writing to Oracle from Spark when using DataFrames. SPARK-10648: Spark-SQL JDBC fails to set a default precision and scale when they are not defined … Continue reading

Posted in Spark

Reflection Scala-2.10 and Spark weird errors when saving to Cassandra

This originally started with this SO question, and I’ll be honest, I was flummoxed for a couple of days looking at this (in no small part because the code was doing a lot). But at some point I was able to … Continue reading

Posted in Cassandra, Spark

Logging The Generated CQL from the Spark Cassandra Connector

This has come up a few times in the last few days, so I thought I’d share the available options and the tradeoffs. Option 1: Turn ON ALL THE TRACING! Run nodetool settraceprobability 1.0. Probabilistic tracing is a handy feature for finding expensive … Continue reading

Posted in Cassandra, Spark

Don’t use TextField for your unique key in Solr

This seems immediately obvious when you think about it, but TextField is what you use for fuzzy searches in Solr, and why would a person want a fuzzy search on a unique value? While I can come up with some … Continue reading 

Posted in Cassandra, Solr

Spark job that writes to Cassandra just hangs when one node goes down?

If one node takes down your app, do you have any replicas?

Posted in Cassandra, Spark

Synthetic Sharding with Cassandra. Or How To Deal With Large Partitions.

It’s extremely overdue that I write this down, as it’s a common problem and really applies to any database that needs to scale horizontally, not just Cassandra. Problem Statement: Good partition keys are not always obvious, and it’s easy to create … Continue reading

Posted in Cassandra | 1 Comment

Cassandra’s “Repair” Should Be Called “Required Maintenance”

One of the bigger challenges when you go Eventually Consistent is how to reconcile data that has not been replicated. This happens if you’re using Oracle and multiple data centers with tech like GoldenGate, and it happens if you’re using async replicas … Continue reading

Posted in Cassandra

Scale is a Dish Best Served Eventually Consistent

A lot of people new to Cassandra find the required data modeling tedious and outrageously hard. They’ll long for their RDBMS: if only [insert favorite vendor or project lead here] would make their RDBMS scale like Cassandra, they could tell their bosses … Continue reading

Posted in Cassandra, Distributed | 1 Comment