Tag Archives: Spark

Connection to Oracle From Spark

For some silly reason there has been a fair amount of difficulty in reading from and writing to Oracle from Spark when using DataFrames. SPARK-10648: Spark-SQL JDBC fails to set a default precision and scale when they are not defined …

Posted in Spark

Reflection Scala-2.10 and Spark weird errors when saving to Cassandra

This originally started with this SO question, and I’ll be honest: I was flummoxed for a couple of days looking at this (in no small part because the code was doing a lot). But at some point I was able to …

Posted in Cassandra, Spark

Logging The Generated CQL from the Spark Cassandra Connector

This has come up a few times in the last few days, so I thought I’d share the available options and the tradeoffs.

Option 1: Turn ON ALL THE TRACING!

nodetool settraceprobability 1.0

Probabilistic tracing is a handy feature for finding expensive …

Posted in Cassandra, Spark

Spark job that writes to Cassandra just hangs when one node goes down?

If one node takes down your app, do you have any replicas?

Posted in Cassandra, Spark

Lambda+: Cassandra and Spark for Scalable Architecture

UPDATE: For some background on Spark Streaming and Cassandra, please consult some of my previous blog posts on the subject. Many of you have heard of, and a few of you may have used, the Lambda Architecture. If you’ve not heard …

Posted in Cassandra, Spark

Real Time Analytics With Spark Streaming and Cassandra

Spark Streaming is a good tool for rolling up transaction data into summaries as it enters the system. When paired with an easily idempotent data store like Cassandra, you get a high-performance, low-hassle approach to getting your work done. …

Posted in Cassandra, Spark