Recently, we had a Kinesis consumer back up due to a deployment problem. Sadly, the problem went on for 2 days without us noticing, so we were pretty backed up. In theory, we should have been able to catch back up to real-time data sometime later that same calendar day. In reality, we fell 2 days behind and it took us 2 days to catch up. That’s not acceptable, so I wanted to document what we tried, how well each attempt worked, and our process for hunting down the bottleneck. Obviously, there’s tons of room for improvement in the code, but there’s also a lot of room for improvement in how we were doing things beforehand, and probably in how we went about trying to fix the issue.
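As an aside, the kind of thing that would have surfaced the backlog long before day 2 is a check (or better, an alarm) on the stream’s iterator age in CloudWatch. Here’s a rough sketch using the AWS SDK for Java; the stream name and the 15-minute window are placeholders, not our actual setup:

```java
import com.amazonaws.services.cloudwatch.AmazonCloudWatch;
import com.amazonaws.services.cloudwatch.AmazonCloudWatchClientBuilder;
import com.amazonaws.services.cloudwatch.model.Datapoint;
import com.amazonaws.services.cloudwatch.model.Dimension;
import com.amazonaws.services.cloudwatch.model.GetMetricStatisticsRequest;

import java.util.Date;

public class IteratorAgeCheck {
    public static void main(String[] args) {
        AmazonCloudWatch cloudWatch = AmazonCloudWatchClientBuilder.defaultClient();

        // Pull the max GetRecords iterator age for the last 15 minutes.
        // "my-stream" is a placeholder stream name.
        GetMetricStatisticsRequest request = new GetMetricStatisticsRequest()
                .withNamespace("AWS/Kinesis")
                .withMetricName("GetRecords.IteratorAgeMilliseconds")
                .withDimensions(new Dimension().withName("StreamName").withValue("my-stream"))
                .withStartTime(new Date(System.currentTimeMillis() - 15 * 60 * 1000))
                .withEndTime(new Date())
                .withPeriod(300)
                .withStatistics("Maximum");

        for (Datapoint point : cloudWatch.getMetricStatistics(request).getDatapoints()) {
            // A steadily climbing maximum means the consumer is falling behind real time.
            System.out.println(point.getTimestamp() + " iterator age (ms): " + point.getMaximum());
        }
    }
}
```

Wiring that same metric into a CloudWatch alarm is what actually gets someone paged, instead of finding out 2 days later.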
I transferred teams at work recently, and spent about a week trying to get their code running on my laptop so I could do useful development work. This is in addition to trying to wrap my head around the existing codebase and figuring out how to test my changes. It’s not that the code is bad; it’s just that getting all the ****ing components hooked up, communicating with each other, and playing nicely together is an exercise in impossibility. Coming from a group that ran everything in AWS, going back to managing all the third-party services in a development environment makes me want to flip my desk over and start screaming about what the **** is wrong with everyone.
You’re sitting there at work and word reaches you that your application is misbehaving. What’s the first thing you do? Pull up the logs and see what they say. And this is where you may wish that you had an entire class in school dedicated to the art of writing useful log messages. Good logging is an art, and it’s one that most of us don’t learn until we’re stuck trying to diagnose something in the wild with subpar messages to guide us.
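As a quick illustration of the difference (the ReportJob class here is a made-up example, not code from any real project), compare a log line that gives you nothing to search on with one that tells you which account, which date, and why:

```java
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ReportJob {
    private static final Logger log = LoggerFactory.getLogger(ReportJob.class);

    void run(String accountId, String reportDate) {
        try {
            generate(accountId, reportDate);
        } catch (RuntimeException e) {
            // Subpar: log.error("report failed"); -- nothing to grep for, no stack trace.
            // Useful: identifies the account, the date, and the cause, and keeps the stack trace.
            log.error("Report generation failed for account {} on {}", accountId, reportDate, e);
        }
    }

    private void generate(String accountId, String reportDate) {
        // Placeholder for the actual report work.
    }
}
```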
I mentioned in a previous blog post that we had used DynamoDB on an internal project at Bronto (my employer, but I don’t speak for them – they have people for that). That project is the Bronto Cantina – which launched to the whole Durham office a couple of weeks ago. It was a pretty neat little application, most of which was written over the course of a couple of days, followed by bits and pieces of cleanup afterwards to get a test environment up and running. Now we’re live, and I wanted to say a few words about it.
Amazon’s DynamoDB service is a managed NoSQL database that promises great speeds that allow it to be “…a great fit for mobile, web, gaming, ad tech, IoT, and many other applications.” That claim is pretty much just a pipe dream. The reality is that DynamoDB is a terrible fit for most applications, and you’re usually better off running a regular NoSQL database and managing the machines yourself.
I recently finished up some data aggregation work involving Apache’s Hive, and as a means of getting some MapReduce work off the ground quickly, it’s pretty good. Hive’s goal is to abstract away MapReduce behind basic SQL queries, and on that front it succeeds. The fact that I’m ultimately running MapReduce jobs stays hidden, except for what would look like a minor quirk or two if I didn’t know what was going on under the covers. That said, there were a couple of things I ran into, both during development and while running the jobs on Amazon’s EMR service, that are worth noting.
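To give a feel for how thin that abstraction is from the client side, here’s a rough sketch of running an aggregation through the HiveServer2 JDBC driver; the host, table, and column names are placeholders I’ve made up:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveRollup {
    public static void main(String[] args) throws Exception {
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hive-host:10000/default", "user", "");
             Statement stmt = conn.createStatement();
             // Looks like plain SQL, but this GROUP BY is what quietly becomes
             // a MapReduce job under the covers.
             ResultSet rs = stmt.executeQuery(
                     "SELECT event_date, COUNT(*) FROM events GROUP BY event_date")) {
            while (rs.next()) {
                System.out.println(rs.getString(1) + "\t" + rs.getLong(2));
            }
        }
    }
}
```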
It’s a tale as old as development – you make an application, and now you need to sell it. That means you need a demo, and demos need data in them. The dilemma is, what do you do to get that data? Do you run a demo app in a sandboxed environment, do you add sample data to your regular production database, do you just take some screenshots of what the app looks like with data from a development environment, or do you do something else entirely? It seems like a stupid thing to worry about until you’re actually trying to figure it out; then it becomes really important, because whatever you decide to do about sample data, you’re going to have to live with it forever.
One of the last things I worked on in my previous job was converting a cron-generated set of emails about server SLA stats into a web service. Early on in this process, I looked at the ELK stack, discussed it with my team lead, and the two of us ultimately came to the conclusion that going that route would be overkill, and that instead we should just write our own utility. That, boys and girls, was a stupid, stupid mistake. Instead of using an existing, free, open-source project, we decided to re-invent a wheel that other people had already invented and debugged, and that was working well for a lot of projects, including ones bigger than what we were intending to build.
My last post about mocking a Netty server using Mockito worked for Netty 3.x, but the changes made in Netty 4.0 broke a lot of that work. After spending some quality time reading up on the changes from 3 to 4 and debugging my testing code, I got my mocked Netty server working with Netty 4.0, and now I’m posting it here in the hopes it helps anyone else who’s looking to mock a current Netty server for their unit tests.
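The full mocked server is in the post itself, but as a taste of what Mockito against Netty 4 looks like, here’s a minimal sketch; the EchoHandler and its test are stand-ins I’m inventing for illustration, not the post’s actual code:

```java
import io.netty.buffer.ByteBuf;
import io.netty.buffer.Unpooled;
import io.netty.channel.ChannelHandlerContext;
import io.netty.channel.ChannelInboundHandlerAdapter;
import io.netty.util.CharsetUtil;
import org.junit.Test;

import static org.mockito.Mockito.*;

public class EchoHandlerTest {

    // A trivial Netty 4 handler, defined inline so the test has something to exercise.
    static class EchoHandler extends ChannelInboundHandlerAdapter {
        @Override
        public void channelRead(ChannelHandlerContext ctx, Object msg) {
            ctx.writeAndFlush(msg);
        }
    }

    @Test
    public void echoesWhateverItReads() {
        // Mock the context instead of standing up a real channel or event loop.
        ChannelHandlerContext ctx = mock(ChannelHandlerContext.class);
        ByteBuf input = Unpooled.copiedBuffer("ping", CharsetUtil.UTF_8);

        new EchoHandler().channelRead(ctx, input);

        // The handler should hand the same buffer straight back out.
        verify(ctx).writeAndFlush(input);
    }
}
```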
A while back at work, we noticed a periodic issue where remote jobs just weren’t being run. The cause was that the jobs weren’t actually being sent to the remote AWS instances that were supposed to run them. Why the jobs weren’t being sent out was a mystery, but bouncing either the remote servers or the main application itself seemed to get things moving again. In the meantime, we were off hunting for why these jobs were no longer getting dispatched.