MVP how minimal

    MVPs or Minimum Viable Products are pretty contentious ideas for something seemingly simple. Depending on background and where pepole are coming from experience wise those terms carry radically different ideas. In recent history I’ve seen up close two extreme constrasting examples of MVP:

    • Mega Minimal: website and db, mostly manual on the backend
    • Mega Mega: provisioning system, dynamic tuning of systems via ML, automated operations, monitoring a few others I’m leaving out.


    If we’re evaluating which approach gives us more feedback, Mega Minimal MVP is gonna win hands down here. Some will counter they don’t want to give people a bad impression with a limited product and that’s fair, but it’s better than no impression (the dreaded never shipped MVP). The Mega Mega MVP I referenced took months to demo. only had one of those checkboxes setup and wasn’t ever demod again. So we can categorical say that failed at getting any feedback.

    Whereas the Mega Minimal MVP, got enough feedback and users for the founders to realize that wasn’t a business for them. Better than after hiring a huge team and sinking a million plus into dev efforts for sure. Not the happy ending I’m sure you all were expecting, but I view that as mission accomplished.

    Core Value

    • Mega Minimal, they only focused on a single feature, executed well enough that people gave them some positive feedback, but not enough to justify automating everything.
    • Mega Mega. I’m not sure anyone who talked about the product saw the same core value, and there were several rewrites and shifts along the way.

    Advantage Mega Minimal again

    What about entrants into a crowded field

    Well that is harder and the MVP tends to be less minimal, because the baseline expectations are just much higher. I still lean towards Mega Minimal having a better chance at getting users, since there is a non zero chance the Mega Mega MVP will never get finished. I still think the exercise in focusing on core value that makes your product not a me too, and even considering how you can find a niche in a crowded field instead of just being “better”, and your MVP can be that niche differentiator.

    Internal users

    Sometimes a good middle ground is considering getting lots of internal users if you’re really worried about bad experiences. This has it’s it’s definite downsides however, and you may not get diverse enough opinions. But it does give you some feedback while saving some face or bad experiences. I often think of the example of EC2 that was heavily used by Amazon, before being released to the world. That was a luxury Amazon had, where their customer base and their user base happened to be very similar, and they had bigger scale needs than any of their early customers, so the early internal feedback loop was a very strong signal.


    In the end however you want to approach MVPs is up to you, and if you find success with a meatier MVP than I have please don’t let me push you away from what works. But if you are having trouble shipping and are getting pushed all the time to add one more feature to that MVP before releasing it, consider stepping back and asking is this really core value for the product? Do you already have your core value? if so, consider just releasing it.

    Surprise Go is ok for me now

    I’m surprised to say this, I am ok using Go now. It’s not my style but I am able to build most anything I want to with it, and the tooling around it continues to improve.

    About 7 months ago I wrote about all the things I didn’t really care for in Go and now I either no longer am so bothered by it or things have improved.

    Go Modules so far is a huge improvement over Dep and Glide for dependency management. It’s easy to setup, performant and eliminates the GOPATH silliness. I haven’t tried it yet with some of the goofier libraries that gave me problems in the past (k8s api for example) so the jury is out on that, but again pretty impressed. I now longer have to check in vendor to speed up builds. Lesson use Go Modules.

    I pretty much stopped using channels for everything but shutdown signals and that fits my preferences pretty well, I use mutex and semaphores for my multithreaded code and feel no guilt about it. This cut out a lot of pain for me, and with the excellent race detector I feel really comfortable writing multi-threaded in Go now. Lesson, don’t use channels much.

    Lack of generics still sometimes sucks but I usually implement some crappy casting with dynamic types if I need that. I’ve sorta made my piece with just writing more code, and am no longer so hung up. Lesson relax.

    Error handling I’m still struggling with, I thought about using one of the error Wrap() libraries but an official one is in draft spec now, so I’ll wait on that. I now tend to have less nesting of functions as a result, this probably means longer functions than I like, but my code looks more “normal” now. This is a trade off I’m ok with. Lesson relax more.

    I see the main virtue of Go now that it is very popular in the infrastructure space where I am and so it’s becoming the common tongue (largely replacing Python for those sorts of tasks). For this, honestly it’s about right. It’s easy to rip out command line tools and deploy binaries for every platform with no runtime install.

    The community’s conservative attitude I sort of view as a feature now, in that there isn’t a bunch of different options that are popular and there is no arguing over what file format is used. This drove me up the wall initially, but I appreciate how much less time I spend on these things now.

    So now I suspect Go will be my “last” programming language. It’s not the one I would have chosen, but where I am at in my career, where most of my dev work is automation and tooling it fits the bill pretty well.

    Also equally important most of the people working with me didn’t have full time careers as developers or spend their time reading “Domain Driven Design” (amazing book) so adding in a bunched of nuanced stuff that maybe technically optimal but assumes the reader grasps all that nuance isn’t a good tradeoff for me.

    So I think I sorta get it now. I’ll never be a cheerleader for the language but it definitely solves my problems well enough.

    Lessons from a year of Golang

    I’m hoping to share in a non-negative way help others avoid the pitfalls I ran into with my most recent work building infrastructure software on top of a Kubernetes using Go, it sounded like an awesome job at first but I ran into a lot of problems getting productive.

    This isn’t meant to evaluate if you should pick up Go or tell you what you should think of it, this is strictly meant to help people out that are new to the language but experienced in Java, Python, Ruby, C#, etc and have read some basic Go getting started guide.

    Dependency management

    This is probably the feature most frequently talked about by newcomers to Go and with some justification, as dependency management in Go has been a rapidly shifting area that’s nothing like what experienced Java, C#, Ruby or Python developers are used to.

    Today, the default tool is Dep all other tools I’ve used such as Glide or Godep are deprecated in favor of Dep, and while Dep has advanced rapidly there are some problems you’ll eventually run into (or I did):

    1. Dep hangs randomly and is slow, which is supposedly network traffic but it happens to everyone I know with tons of bandwidth. Regardless, I’d like an option to supply a timeout and report an error.
    2. Versions and transitive depency conflicts can be a real breaking issue in Go. So without shading or it’s equivalent two package depending on different versions of a given package can break your build, there are a number or proposals to fix this but we’re not there yet.
    3. Dep has some goofy ways it resolves transitive dependencies and you may have to add explicit references to them in your Gopkg.toml file. You can see an example here under Updating dependencies – golang/dep.

    My advice

    • Avoid hangs by checking in your dependencies directly into your source repository and just using the dependency tool (dep, godep, glide it doesn’t matter) for downloading dependencies.
    • Minimize transitive dependencies by keeping stuff small and using patterns like microservices when your dependency tree conflicts.


    Something that takes some adjustment is you check out all your source code in one directory with one path (by default ~/go/src ) and include the path to the source tree to where you check out. Example:

    1. I want to use a package I found on github called jim/awesomeness
    2. I have to go to ~/go/src and mkdir -p
    3. cd into that and clone the package.
    4. When I reference the package in my source file it’ll be literally importing

    A better guide to GOPATH and packages is here.

    My advice

    Don’t fight it, it’s actually not so bad once you embrace it.

    Code structure

    This is a hot topic and there are a few standards for the right way to structure you code from projects that do “file per class” to giant files with general concept names (think like types.go and net.go). Also if you’re used to using a lot of sub package you’re gonna to have issues with not being able to compile if for example you have two sub packages reference one another.

    My Advice

    In the end I was reasonably ok with something like the following:

    • myproject/bin for generated executables
    • myproject/cmd for command line code
    • myproject/pkg for code related to the package

    Now whatever you do is fine, this was just a common idiom I saw, but it wasn’t remotely all projects. I also had some luck with just jamming everything into the top level of the package and keeping packages small (and making new packages for common code that is used in several places in the code base). If I ever return to using Go for any reason I will probably just jam everything into the top level directory.


    No debugger! There are some projects attempting to add one but Rob Pike finds them a crutch.

    My Advice

    Lots of unit tests and print statements.

    No generics

    Sorta self explanatory and it causes you a lot of pain when you’re used to reaching for these.

    My advice

    Look at the code generation support which uses pragmas, this is not exactly the same as having generics but if you have some code that has a lot of boiler plate without them this is a valid alternative. See this official Go Blog post for more details.

    If you don’t want to use generation you really only have reflection left as a valid tool, which comes with all of it’s lack of speed and type safety.

    Cross compiling

    If you have certain features or dependencies you may find you cannot take advantage of one of Go’s better features cross compilation.

    I ran into this when using the confluent-go/kafka library which depends on the C librdkafka library. It basically meant I had to do all my development in a Linux VM because almost all our packages relied on this.

    My Advice

    Avoid C dependencies at all costs.

    Error handling

    Go error handling is not exception base but return based, and it’s got a lot of common idioms around it:

    myValue, err := doThing()
    if err != nil {
    	return -1, fmt.Errorf(“unable to doThing %v”, err)

    Needless to say this can get very wordy when dealing with deeply nested exceptions or when you’re interacting a lot with external systems. It is definitely a mind shift if you’re used to the throwing exceptions wherever and have one single place to catch all exceptions where they’re handled appropriately.

    My Advice

    I’ll be honest I never totally made my peace with this. I had good training from experienced opensource contributors to major Go projects, read all the right blog posts, definitely felt like I’d heard enough from the community on why the current state of Go error handling was great in their opinions, but the lack of stack traces was a deal breaker for me.

    On the positive side, I found Dave Cheney’s advice on error handling to be the most practical and he wrote a package containing a lot of that advice, I found it invaluable as it provided those stack traces I missed.


    A lot of people really love Go and are very productive with it, I just was never one of those people and that’s ok. However, I think the advice in this post is reasonably sound and uncontroversial. So, if you find yourself needing to write some code in Go, give this guide a quick perusal. Hope it helps.

    Cassandra: Batch Loading Without the Batch — The Nuanced Edition

    My previous post on this subject has proven extraordinarily popular and I get commentary on it all the time, most of it quite good. It has however, gotten a decent number of comments from people quibbling with the nuance of the post and pointing out it’s failings, which is fair because I didn’t explicitly spell this out as a “framework of thinking” blog post or a series of principles to consider. This is the tension between making something approachable and understandable to the new user but still technically correct for the advanced one. Because of where Cassandra was at the time and the user base I was encountering day to day, I took the approach of simplification for the sake of understanding.

    However, now I think is a good time to write this up with all the complexity and detail of a production implementation and the tradeoffs to consider.


    1. Find the ideal write size it can make a 10x difference in perf (10k-100k is common).
    2. Limit threads in flight when writing.
    3. Use tokenaware unlogged batches if you need to get to your ideal size.

    Details on all this below.

    You cannot escape physics {#23cb}

    Ok it’s not really physics, but it’s a good word to use to get people to understand you have to consider your hard constraints and that wishing them away or pretending they’re not there will not save you from their factual nature.

    So now lets talk about some of the constraints that will influence your optimum throughput strategy.

    large writes hurtNow that the clarification is out of the way. Something the previous batch post neglected to mention was the size of your writes can change these tradeoffs a lot, back in 2.0 I worked with Brian Hess of DataStax on figuring this out in detail (he did most of the work, I took most of the credit). And what we found with the hardware we had at the time was total THROUGHPUT in megabytes changed by an order of magnitude depending on the size of each write. So in the case of 10mb writes, total throughput plummeted to an embarrassing slow level nearly 10x.
    small writes hurt
    Using the same benchmarks tiny writes still managed to hurt a lot and 1k writes were 9x less throughput than 10–100k writes. This number will vary depending on A LOT of factors in your cluster but it tells you the type of things you should be looking at if you’re wanting to optimize throughput.

    Can your network handle 100 of your 100mb writes? Can your disk? Can your memory? Can your client? Pay attention to whatever gets pegged during load and that should drive the size of your writes and your strategy for using unlogged batches or not.


    I’m dovetailing here with the previous statement a bit and starting with something inflammatory to get your attention. But this is often the metric I see people use and it’s at best accidentally accurate.

    I’ve had some very smart people tell me “100 row unlogged batches that are token aware are the best”. Are those 100 rows all 1GB a piece? or are they all 1 byte a piece?

    Your blog was wrong! {#0988}

    I had many people smarter than I tell me my “post was wrong” because they were getting awesome performance by ignoring it. None of this is surprising. In some of those cases they were still missing the key component of what matters, in other cases rightly poking holes in the lack of nuance in my previous blog post on this subject (I never clarified it was intentionally simplistic).

    wrong - client death

    I intentionally left out code to limit threads in flight. Many people new to Cassandra are often new to distributed programming and consequentially multi-threaded programming. I was in this world myself for a long time. Your database, your web framework and all of your client code do everything they can to hide threads from you and often will give you nice fat thread safe objects to work with. Not everyone in computer science has the same background and experience, and the folks in traditional batch jobs programming which is common in insurance and finance where I started do not have the same experience as folks working with distributed real time systems.

    In retrospect I think this was a mistake and I ran into a large number of users that did copy and paste my example code from the previous blog post and ran into issues with clients exhausting resources. So here is an updated example using the DataStax Java driver that will allow you to adjust thread levels easily and tweak the amount of threads in flight:

    version 2.1–3.0:

    Using classic manual manipulation of futures {#3a17}

    This is with a fixed thread pool and callbacks

    Either approach works as well as the other. Your preference should fit your background.

    wrong - token aware batches
    Sure tokenaware unlogged batches that act like big writes, and really match up with my advice on unlogged batches and partition keys. If you can find a way to have all of your batch writes go to the correct node and use a routing key to do so then yes of course one can get some nice perf boosts. If you are lucky enough to make it work well, trust that you have my blessing.

    Anyway it is advanced code and I did not think it appropriate for a beginner level blog post. I will later dedicate an entire blog post just to this.

    Conclusion {#12a2}

    I hope you found the additional nuance useful and maybe this can help some of you push even faster once all of these tradeoffs are taken into account. But I hope you take away my primary lesson:

    Distributed databases often require you to apply distributed principles to get the best experience out of them. Doing things the way we’re used to may lead to surprising results.


    A couple of times a week I get a question where someone wants to know how to “failover” to a remote DC in the driver if the local Cassandra DC fails or even if there is only a couple of nodes in the local data center that are down.

    Frequently the customer is using LOCAL_QUORUM or LOCAL_ONE at our suggestion to pin all reads and writes to the local data center often times at their request to help with p99 latencies. But now they want the driver to “do the right thing” and start using replicas in another data center when there is a local outage.

    Here is how (but you DO NOT WANT TO DO THIS)

    However you may end up with more than you bargained for.


    Here is why you DO NOT WANT TO DO THIS

    Intended case is stupidly rare

    The common case I hear is “my local Cassandra DC has failed” Ok your app servers in the SAME DATACENTER are up but your Cassandra nodes are all down.

    I’ve got news for you, if all your local Cassandra nodes are down one of the following has happened:

    1. The entire datacenter is down including your app servers that are supposed to be handling this failover via the LoadBalancingPolicy
    2. There is something seriously wrong in your data model that is knocking all the nodes off line, if you send this traffic to another data center it will also go off line (assuming you have the bandwidth to get the queries through)
    3. You’ve created a SPOF in your deployment for Cassandra using SAN or some other infrastructure mistake..don’t do this

    So on the off hand that by dumb luck or just neglect you hit this scenario, then sure go for it, but it’s not going to be free (I cover this later on so keep reading).

    Common failure cases aren’t helped {#commonfailurecasesaren’thelped}

    A badly thought out and unproven data model is the most common reason outside of SAN I see widespread issues with a cluster or data center, failover at a datacenter level will not help you in either case here.

    Failure is often transient

    More often you may have a brief problem where a couple of nodes are having a problem. Say one expensive query happened right before, or you restarted a node, or you made a simple configuration change. Do you want your queries bleeding out to the other data center because of a very short term hiccup? Instead you could have had just retried the query one more time, the local nodes would have responded before the remote data center had time to receive the query.

    TLDR More often than not the correct behavior is just to retry in the local data center.

    Application SLAs are not considered

    Electrons only move so fast, and if you have 300 ms latency between your London and Texas data centers, while your application has a 100ms SLA for all queries, you’re still basically “down”.

    Available bandwidth is not considered

    Now lets say the default latency is fine, how fat is that pipe? Often on long links companies go cheap and stay in the sub 100mb range. This can barely keep up with replication traffic from Cassandra for most use cases let alone shifting all queries over. That 300ms latency will soon climb to 1 second or eventually just out and out failure as the pipe totally jams full. So you’re still down.

    Operational overhead of the “failed to” data center are not considered {#operationaloverheadofthe“failedto”datacenterarenotconsidered}

    I’ve mentioned this in passing a few times, but lets say you have fast enough pipes, can your secondary data center handle not only it’s local traffic but also the new ‘failover’ traffic coming in? If you’ve not allowed for that operational overhead you maybe taking down all data centers one at a time until there is none remaining.

    Intended consistency is not met

    Now lets say you have operational overhead, latency and bandwidth overhead to connect to that remote data center HUZZAH RIGHT!

    Say your app is using LOCAL_QUORUM. This implies something about your consistency needs AKA you need your model to be relatively consistent. Now if you recall above I’ve mentioned that most failures in practice are transient and not permanent.

    So what happens in the following scenario?

    • Node A and B are down restarting because a new admin got overzealous.
    • A write logging a user’s purchase is destined for A and B in DC1 instead go to their partners in DC2 and the write quietly completes successfully in the remote dc.
    • Node A and B and DC1 start responding but do not yet have the write from DC2 propagated back to them.
    • The user goes back to read his purchase history which goes to nodes A and B but no does not yet have it.

    Now all of this happened without an error and the application server had no idea it should retry. I’ve now defeated the point of LOCAL_QUORUM and I’m effectively using consistency level of TWO without realizing it

    What should you do?

    Same thing you were before. Use technology like Eureka, F5, whatever customer solution you have or want to use.

    I think there is some confusion in terminology here, Cassandra data centers are boundaries to lock in queries and by extension provide some consistency guarantee (some of you will be confused by this statement but it’s a blog post unto itself so save that for another time) and potentially to pin some administrative operations to a set of nodes. However, new users expect the Cassandra data center to be a driver level failover zone.

    If you need queries to go beyond a data center then use the classic QUORUM consistency level or something like SERIAL, it’s globally honest and provides consistency at a cluster level in exchange for latency hits in the process, but you’ll get the right answer more often than not that way (SERIAL especially as it takes into account race conditions)

    So lets step through the above scenarios:

    • Transient failure. Retry is probably faster.
    • SLAs are met or the query can just fail at least on the Cassandra side. If you swap at the load balancer level to a distant data center the customer may not have a great experience still, but dependent queries won’t stack up.
    • Inter DC bandwidth is largely unaffected.
    • Application consistency is maintained.

    One scenario that still isn’t handled is having adequate capacity in the downstream data centers. But it’s still something you have to think about either way. On the whole uptime and customer experience are way better with failing over using a dedicated load balancer than trying to do Multidc failover with the Cassandra client.

    So is there any case I can use allowRemoteHosts?

    Of course but you have to have the following conditions:

    1. Enough operational overhead to handle application servers from multiple data centers talking to a single Cassandra data center
    2. Enough inter DC bandwidth to support all those app servers talking across the pipe
    3. High enough SLAs or data centers close enough to hit SLA still,
    4. Weak consistency needs but a preference for more consistency than what ONE would give you
    5. Prefer local nodes to remote ones still even though you have enough load, bandwidth and low enough latency to handle all of these queries from any data center.

    Basically the use of this feature is EXTRAORDINARILY minimal and at best only gives you a minor benefit over just using a consistency level of TWO.

    Connection to Oracle From Spark

    For some silly reason there is a has been a fair amount of difficulty in reading and writing to Oracle from Spark when using DataFrames.

    SPARK-10648 — Spark-SQL JDBC fails to set a default precision and scale when they are not defined in an oracle schema.

    This issue manifests itself when you have numbers in your schema with no precision or scale and attempt to read. They added the following dialect here

    This fixes the issue for 1.4.2, 1.5.3 and 1.6.0 (and DataStax Enterprise 4.8.3). However recently there is another issue.

    SPARK-12941 — Spark-SQL JDBC Oracle dialect fails to map string datatypes to Oracle VARCHAR datatype

    And this lead me to this SO issue. Unfortunately, one of the answers has a fix they claim only works for 1.5.x however, I had no issue porting it to 1.4.1. The solution in Java looked something like the following, which is just a Scala port of the SO answer above ( This is not under warranty and it may destroy your server, but this should allow you to write to Oracle.)

    In the future keep an eye out for more official support in SPARK-12941 and then you can ever forget the hacky workaround above.

    Reflection Scala-2.10 and Spark weird errors when saving to Cassandra

    This originally started with this SO question, and I’ll be honest I was flummoxed for a couple of days looking at this (in no small part because the code was doing a lot). But at some point I was able to isolate all issues down to dataFrame.saveToCassandra. Every 5 or so runs I’d get one or two errors:


    I could write this data to a file with no issue, but when writing to a Cassandra these errors would come flying out at me. With the help of Russ Spitzer (by help I mean explaining to my thick skull what was going on) I was pointed to the fact that reflection wasn’t thread safe in Scala 2.10.

    No miracle here objects have to be read via reflection down to the even the type checking (type checking being something writing via text that is avoided) and then objects have to be populated on flush.

    Ok so what are the fixes {#5bed}

  • Run with a version of Spark compiled against Scala 2.11
  • Use foreachPartition to write directly using the Cassandra java driver API and avoid reflection. You’ll end up giving up some performance tweaks with token aware batching if you go this route, however.
  • Accept it! Spark is fault tolerant, and in any of these cases the job is not failing and it’s so rare that Spark just quietly retries the task and there is no data loss and no pain.
  • Logging The Generated CQL from the Spark Cassandra Connector

    This has come up some in the last few days so I thought I’d share the available options and the tradeoffs.

    Option 1: Turn ON ALL THE TRACING! nodetool settraceprobability 1.0 {#354c}

    Probabilistic tracing is a handy feature for finding expensive queries in use cases where there is little control over who has access to the cluster IE most enterprises in my experience (ironic considering all the process). However, it’s too expensive to turn it up too high in production, but in development it’s a good way to give you an idea of what a query turns into. Read more about probalistic tracing here:

    and the command syntax:

    Option 2: Trace at the driver level {#42c9}

    Set TRACE logging level on the java-driver request handler on the spark nodes you’re curious about.

    Say I have a typical join query:

    On the spark nodes now configure the DataStax java driver RequestHandler.

    In my case using the tarball this is dse-4.8.4/resources/spark/conf/logback-spark-executor.xml. In that file I just added the following inside the element: </p>

    On the spark nodes in the executor logs you’ll now have. In my case /var/lib/spark/worker/worker-0/app-20160203094945–0003/0/stdout, app-20160203094945–0003 is the job name.

    You’ll note this is a dumb table scan that is only limited to the tokens that the node owns. You’ll note the tokens involved are not visible, I leave it to the reader to repeat this exercise with pushdown like 2i and partitions.

    Don’t use TextField for your unique key in Solr

    This seems immediately obvious when you think about it, but TextField is what you use for fuzzy searches in Solr, and why would a person want a fuzzy search on a unique value? While I can come up with some oddball use cases, making use of copy fields would seem to be the more valid approach and fitting with the typical use of Solr IE you filter on strings and query on text.

    However, people have done this a few times and they throw me for a loop and in the case of DataStax Enterprise Search (built on Solr) this creates an interesting split between the index and the data.

    Given a Cassandra schema of

    A Solr Schema of (important bits in bold):

    Initial records never get indexed

    I’m assuming this is because the aspect of indexing that checks to see if it’s been visited or not is thrown by the tokens:

    First fill up a table

    Then turn on indexing

    Add one more record

    Then query via Solr and…no ‘1235’ or ‘1234’

    But Cassandra knows all about them

    To recap we never indexed ‘1234’ and ‘1235’ for some reason ‘123’ indexes and later on when I add 9999 it indexes fine. Later testing showed that as soon as readded ‘1234’ is joined the search results, so this only appears to happen to records that were there before hand.

    Deletes can greedily remove LOTS

    I delete id ‘1234’

    But when I query Solr I find only:

    So where did ‘1234 4566’, ‘1235’, and ‘1230’ go? If I query Cassandra directly they’re safe and sound only now Solr has no idea about them.

    To recap, this is just nasty and the only fix I’ve found is either reindexing or just adding the records again.


    Just use a StrField type for your key and everything is happy. Special thanks to J.B. Langston (twitter of DataStax for finding the nooks and crannies and then letting me take credit by posting about it.

    Spark job that writes to Cassandra just hangs when one node goes down?

    So this was hyper obvious once I saw the executor logs and the database schema, but this had me befuddled at first and the change in behavior with one node should have made it obvious. {#23da}

    The code was simple, read a bunch of information, do some minor transformations and flush to Cassandra. This was nothing crazy. But during the users fault tolerance testing, the job would just seamingly hang indefinitely when a node was down.

    If one node takes down your app, do you have any replicas?

    That was it, in fact that’s always it, if something myseriously “just stops” usually you have a small cluster and no replicas (RF 1). Now one may ask why anyone would ever have 1 replica with Cassandra and while I concede it is a very fair question, this was the case here.

    Example if I have RF1 and three nodes, when I wrote a row it’s only going to go to 1 of those nodes. If it dies, then how would I retrieve the data? Wouldn’t it just fail? Ok yeah wait a minute why is the job not failing?

    It’s the defaults!

    This is a bit of misdirection, the other nodes were timing out (probably slow IO layer). If we read the connector docs we get query.retry.count which gives us 10 retries, and the default read.timeout_ms is 120000 (which confusingly is also the write timeout), so 1.2 million milliseconds or 20 minutes to fail a task that is timing out. If you retry that task 4 times (which is the Spark default) it could take you 80 minutes to fail the job, this is of course assuming all the writes timeout.

    The Fix {#98c8}

    Short term

  • Don’t let any nodes fall over
  • drop retry down to 10 seconds, this will at least let you fail fast
  • drop output.batch.size.bytes down. Default is 1mb, half until you stop having issues.
  • Long term

  • Use a proper RF of 3
  • I still think the default retry of 120 seconds is way too high. Try 30 seconds at least.
  • Get better IO usually local SSDs will get you where you need to go.

subscribe via RSS