Abstractions are hard to get right

May 31, 2025

Abstractions are one of those things that are fairly easy to understand the basics of, but hard to really do right. They’re a tool for simplifying code by working with a simpler-to-understand proxy (like a database record or a class/interface), but that definition barely scratches the surface of using them well vs. just tossing them into code because you’re supposed to use them. Done well, abstractions let you move forward while keeping just the right level of detail in your head. The problem is that almost everything we do in programming is layering abstractions over real-world processes and behaviors. That makes it deceptively easy to end up with software that gets in the way more than it actually helps.

Probably the most fundamental layer of abstraction we have in our code is data persistence. Whether it’s an in-memory system like Redis, a database, or flat files written to object storage like S3, however you’re storing data, you’re fundamentally turning facts and details about the real, physical world into bits written to a disk somewhere. At some point, our code conforms to this abstraction just to persist the application state. At a more fundamental level, this abstraction is the application state, and eventually our code has to work around it.

If you choose to store your data in a relational database, it tends to be spread across multiple tables, and you’re probably going to need a lot of joins when the time comes to read that data back out and populate the actual objects your code will be using. For data with one-to-many relationships (e.g. a customer is associated with many orders), that works just fine. However, if your data involves a lot of has-a relationships (e.g. a customer has a phone number), relational tables can feel like a very awkward way to store it, whereas a document-based datastore (like MongoDB) tends to feel like a much better fit.
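A minimal sketch of that read path, assuming a hypothetical `customers`/`orders` schema queried with an inner join: every result row repeats the customer’s columns alongside one order’s columns, and the code has to fold those rows back into objects. `CustomerOrderRow`, `CustomerAssembler`, and the column layout are made up for illustration, and the rows are built in memory rather than read from a real database driver.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical shape of one row from "customers INNER JOIN orders":
// the customer columns repeat on every row, with one order per row.
record CustomerOrderRow(long customerId, String customerName, String phone,
                        long orderId, String orderStatus) {}

// Domain objects the rest of the code actually works with.
record Order(long id, String status) {}

record Customer(long id, String name, String phone, List<Order> orders) {}

final class CustomerAssembler {
    private CustomerAssembler() {}

    // Folds the flat joined rows back into one Customer per customerId.
    // LinkedHashMap keeps customers in the order they first appear.
    static List<Customer> fromJoinedRows(List<CustomerOrderRow> rows) {
        Map<Long, Customer> byId = new LinkedHashMap<>();
        for (CustomerOrderRow row : rows) {
            Customer customer = byId.computeIfAbsent(row.customerId(),
                id -> new Customer(id, row.customerName(), row.phone(), new ArrayList<>()));
            customer.orders().add(new Order(row.orderId(), row.orderStatus()));
        }
        return new ArrayList<>(byId.values());
    }
}

class JoinFoldingDemo {
    public static void main(String[] args) {
        List<CustomerOrderRow> rows = List.of(
            new CustomerOrderRow(1, "Ada", "555-0100", 10, "SHIPPED"),
            new CustomerOrderRow(1, "Ada", "555-0100", 11, "PENDING"),
            new CustomerOrderRow(2, "Grace", "555-0199", 12, "PENDING"));
        for (Customer c : CustomerAssembler.fromJoinedRows(rows)) {
            System.out.println(c.name() + " has " + c.orders().size() + " order(s)");
        }
    }
}
```

Running `JoinFoldingDemo` prints `Ada has 2 order(s)` followed by `Grace has 1 order(s)`. The one-to-many side folds back cleanly; the has-a detail (the phone number) just rides along as a repeated column here, which is exactly the kind of thing that gets awkward once it lives in its own table.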

The next, and probably most pervasive, abstraction in code tends to be the objects and types we create to represent data within our programs. This is the level of abstraction we use most often, and it’s probably the closest to the “real world” that we deal with, so in theory it’s the one we should spend the most time thinking about. Usually we don’t. Instead, we figure out some schema for writing state to the database, then just make types that map one-to-one onto that schema. This feels easier: you can save and return your data directly, with no conversion between the layers. This is “clean,” and how we’re supposed to do it, right? Well…sure…if you’re writing a CRUD app with little to no actual computing.

If you’re going to be doing more than CRUD, you realize pretty quickly that you want data objects that don’t necessarily line up with the data in your database, and that’s fine. It changes how you write the objects to the database, but simple queries that update specific fields aren’t hard to write (and they help prevent accidental overwrites of other columns once you start doing multiple things in quick succession). Ironically, making this realization and moving to a setup where your code’s abstractions of the real world and your database’s abstractions of the world’s state don’t match exactly is the point where things seem to get easier. In my experience, the awkwardness of working around the data model ends up surprisingly confined to reading from and writing to the database. There are exceptions: I wrote the code implementing staffing capacity logic at my day job, and translating the rules stored in the database into the actual schedule a manager would see was pretty gross. Even so, it wasn’t bad enough that I’d recommend copying your database schema exactly into the rest of your codebase.
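Here’s a rough sketch of that split, under assumptions of my own: a hypothetical `orders` table with `id`, `customer_id`, `status`, and `total_cents` columns. `OrderRow` mirrors the table, `PlacedOrder` is shaped around what the code does with an order, and `OrderMapper` is the only place that knows about both. The save path uses plain JDBC (`Connection.prepareStatement`, `PreparedStatement.executeUpdate`) to write just the one column that changed. All of the class, table, and column names are illustrative.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical persistence shape: mirrors an "orders" table column for column.
record OrderRow(long id, long customerId, String status, long totalCents) {}

enum OrderStatus { PENDING, SHIPPED, CANCELLED }

// Domain shape: built around what the code does with an order.
// It doesn't carry customerId because nothing here needs it.
final class PlacedOrder {
    private final long id;
    private final long totalCents;
    private OrderStatus status;

    PlacedOrder(long id, OrderStatus status, long totalCents) {
        this.id = id;
        this.status = status;
        this.totalCents = totalCents;
    }

    long id() { return id; }
    OrderStatus status() { return status; }
    long totalCents() { return totalCents; }

    // The business rule lives on the domain object, not on the row type.
    void ship() {
        if (status != OrderStatus.PENDING) {
            throw new IllegalStateException("only pending orders can ship, was " + status);
        }
        status = OrderStatus.SHIPPED;
    }
}

// The translation layer: the one place that knows about both shapes.
final class OrderMapper {
    // Touches only the status column, so a stale in-memory copy can't
    // overwrite other columns that were changed elsewhere in the meantime.
    static final String UPDATE_STATUS_SQL = "UPDATE orders SET status = ? WHERE id = ?";

    private OrderMapper() {}

    // OrderStatus.valueOf throws IllegalArgumentException on an unknown status string.
    static PlacedOrder toDomain(OrderRow row) {
        return new PlacedOrder(row.id(), OrderStatus.valueOf(row.status()), row.totalCents());
    }

    // Returns the number of rows the UPDATE affected.
    static int saveStatus(Connection conn, PlacedOrder order) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(UPDATE_STATUS_SQL)) {
            ps.setString(1, order.status().name());
            ps.setLong(2, order.id());
            return ps.executeUpdate();
        }
    }
}
```

`saveStatus` needs a live `Connection`, so it isn’t exercised here. The design point is that the conversion cost is paid in exactly two small methods, while the rest of the code gets to work with `PlacedOrder` and its rules.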

What’s important to remember is that, at the end of the day, what we’re trying to do with software is provide a simplified model of the real world alongside tools that let users safely manipulate its state in order to effect changes offline (for example, e-commerce systems have abstractions representing the number of items on a warehouse shelf, along with tools for reordering more, removing some when a customer buys them, etc.). It’s all too easy to add abstractions to our codebase just for the sake of having abstractions to program to (OK, that may be because I’m a Java developer in my day job). At best, that adds useless bloat to your code; more often, it adds problems without solving any. I shouldn’t have to work harder to wrap my head around how things work in code than I do to wrap my head around how they work in the real world (remember, the point of abstracting is to simplify things, not to introduce complexity for the sake of not repeating yourself).
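To make the warehouse example concrete, here’s a minimal sketch of a model that only lets users manipulate state in ways that make sense on a real shelf. `ShelfStock`, `receive`, and `sell` are hypothetical names chosen for illustration; the point is that the abstraction refuses operations the physical world wouldn’t allow, like selling stock that isn’t there.

```java
// Hypothetical model of one product's stock on a warehouse shelf.
final class ShelfStock {
    private final String sku;
    private int onHand;

    ShelfStock(String sku, int onHand) {
        if (onHand < 0) {
            throw new IllegalArgumentException("onHand must be >= 0");
        }
        this.sku = sku;
        this.onHand = onHand;
    }

    String sku() { return sku; }
    int onHand() { return onHand; }

    // A reorder arrived and was put on the shelf.
    void receive(int quantity) {
        requirePositive(quantity);
        onHand += quantity;
    }

    // A customer bought some; refuse to sell more than is physically there.
    void sell(int quantity) {
        requirePositive(quantity);
        if (quantity > onHand) {
            throw new IllegalStateException("only " + onHand + " of " + sku + " on the shelf");
        }
        onHand -= quantity;
    }

    private static void requirePositive(int quantity) {
        if (quantity <= 0) {
            throw new IllegalArgumentException("quantity must be positive");
        }
    }
}
```

Nothing here is more complicated than the real process it models: stock goes up when a delivery arrives and down when someone buys it, and it can’t go negative.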

When you use an object-oriented language like Java, it’s tempting to overuse abstractions. After all, you’re supposed to program to abstractions, which means you should be creating them from the get-go, right? No. Abstractions should simplify things and make the code easier to understand; if they don’t, you should probably abandon them for something that does. Not everything needs to be (or implement) an interface. You don’t need multiple method calls just to get a property value (yes, I’ve seen people do this); just read it by name. If you’re only ever going to have one implementation of an interface, then it’s not really an interface, it’s the Java equivalent of a C header file.
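As a hypothetical illustration (the names are made up), compare an interface with exactly one implementation against a plain class that does the same job. The plain class leaves one less type to read, and if a second implementation ever does show up, extracting an interface at that point is a mechanical refactor.

```java
// Before (hypothetical): an interface with exactly one implementation.
// It adds a second type to read and nothing to swap out.
//
//   interface DiscountCalculator { long discountCents(long subtotalCents); }
//   class DiscountCalculatorImpl implements DiscountCalculator { ... }

// After: a plain class that does the same job.
final class DiscountCalculator {
    private final int percentOff;

    DiscountCalculator(int percentOff) {
        if (percentOff < 0 || percentOff > 100) {
            throw new IllegalArgumentException("percentOff must be between 0 and 100");
        }
        this.percentOff = percentOff;
    }

    // Integer division truncates, so fractional cents are dropped.
    long discountCents(long subtotalCents) {
        return subtotalCents * percentOff / 100;
    }
}
```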

If you want an example of abstractions done well, look at the List interface in Java. There are multiple ways to implement a list, but all most of us really need to do is add things, remove things, get things, check whether something’s already there, iterate through the list, and use some basic general object utilities (like equals()). Since 99% of what we’re doing assumes only that common denominator, and there are multiple implementations that share it, an abstraction makes sense. The key thing that makes it useful is that the differences between those implementations don’t matter to the outside methods using them. Are you building very context-specific implementations that are going to be widely passed around to places that don’t care about those hyper-specific details? Use an abstraction. If you’re not, then a POJO is just fine.
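A small sketch of why that works: `withoutDuplicates` below (a made-up helper) depends only on the `List` contract, so it behaves the same whether the caller hands it an `ArrayList`, a `LinkedList`, or an immutable list from `List.of`.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

final class ListAbstractionDemo {
    private ListAbstractionDemo() {}

    // Uses only iteration, contains(), and add(), all part of the List contract,
    // so the caller's choice of implementation doesn't matter here.
    // (contains() makes this quadratic; fine for a sketch.)
    static List<String> withoutDuplicates(List<String> items) {
        List<String> result = new ArrayList<>();
        for (String item : items) {
            if (!result.contains(item)) {
                result.add(item);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> fromArrayList = new ArrayList<>(List.of("a", "b", "a"));
        List<String> fromLinkedList = new LinkedList<>(List.of("a", "b", "a"));
        List<String> immutable = List.of("a", "b", "a");

        // Same method, three different implementations, identical results.
        System.out.println(withoutDuplicates(fromArrayList));  // prints [a, b]
        System.out.println(withoutDuplicates(fromLinkedList)); // prints [a, b]
        System.out.println(withoutDuplicates(immutable));      // prints [a, b]

        // List.equals compares elements in order, regardless of implementation.
        System.out.println(fromArrayList.equals(fromLinkedList)); // prints true
    }
}
```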

We spend a lot of time and effort trying to abstract real-world state and processes into our databases and code. Our lives would be a lot easier if we stopped insisting on a grand unified abstraction of everything and instead let our datastore abstract things one way and our code abstract them another, then just wrote a translation between the two. But the biggest offender is object-oriented code where we try to put in abstractions for everything without thinking about whether we really need them. At every step of the way, keep in mind that we create abstractions to simplify things, by grouping things that are the same apart from a few implementation details behind one simple reference that can be used everywhere. Like everything else you bring into your life, abstractions should be making things better. If they’re not, just drop them.
