Category Archives: Spark
For some silly reason there has been a fair amount of difficulty reading from and writing to Oracle from Spark when using DataFrames. SPARK-10648: Spark-SQL JDBC fails to set a default precision and scale when they are not defined … Continue reading
This originally started with this SO question, and I’ll be honest, I was flummoxed for a couple of days looking at it (in no small part because the code was doing a lot). But at some point I was able to … Continue reading
This has come up a few times in the last few days, so I thought I’d share the available options and the tradeoffs. Option 1: Turn ON ALL THE TRACING! nodetool settraceprobability 1.0 Probabilistic tracing is a handy feature for finding expensive … Continue reading
If one node takes down your app, do you have any replicas?
UPDATE: For some background on Spark Streaming and Cassandra, please consult some of my previous blog posts on the subject. Many of you have heard of, and a few of you may have used, the Lambda Architecture. If you’ve not heard … Continue reading
Spark Streaming is a good tool for rolling up transaction data into summaries as it enters the system. When paired with an easily idempotent data store like Cassandra, you get a high-performance, low-hassle approach to getting your work done. … Continue reading