10 Mistakes to Avoid as a Software Engineer

Tim Estes
Published in CodeX · Apr 1, 2021 · 14 min read

We all have to start somewhere. For me, that was June 2019, when I fired up a Python crash course at DataCamp. After graduating with a B.S. in applied statistics, I was bright-eyed, starry-eyed, and full of pent-up energy from years spent in the classroom. I knew nothing about “best practice.” I only wanted to stop learning and start doing.

Me in my job interviews

But I quickly found that you never stop learning in real life. Each day holds a lesson to take notes on, and each mistake can teach us more than any professor. Starting out, I knew more about Poisson distributions than software engineering.

I’ve grown a lot since then, but I’ve had to learn many lessons the hard way. While some of these lessons may seem obvious to those who have been programming for a while, “hindsight is 20/20,” as they say. This article is dedicated to those young bucks who are in the same place I was a year ago. I hope these 10 mistakes I’ve made can help you in your journey toward greatness.

Mistake 1: Thinking Too Small

It’s pretty easy to assume that if something works on your local machine, it will work well in production. When I built my first microservices, I didn’t think about how well they would scale and perform.

Using a Python dictionary and regular expressions to power a primitive search engine works well if you are searching over one document. Trying to use the same tools to search over thousands of documents is the definition of futility. Something I wish I had done earlier in my projects was to stop and ask myself, “will this scale?” Most of the time, the answer was a solid no.
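
To make that concrete, here’s a toy version of the dictionary-plus-regex approach (the documents and function are made up for illustration). Every query re-scans the entire corpus, which is fine for one document and hopeless for thousands:

import re

# A tiny in-memory "search engine": every query re-scans every document.
documents = {
    "doc_1": "the quick brown fox jumps over the lazy dog",
    "doc_2": "pack my box with five dozen liquor jugs",
}

def naive_search(query):
    pattern = re.compile(re.escape(query), re.IGNORECASE)
    return [doc_id for doc_id, text in documents.items() if pattern.search(text)]

print(naive_search("fox"))  # ['doc_1']

At scale you’d want an inverted index or a dedicated search engine (like Elasticsearch) instead.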

Finding solutions and architectures that work well at scale is a foundational part of software development. The first thing I do when evaluating a potential answer to a problem is to think about how performant it would be in the bigger picture. With time and practice, I’ve learned to anticipate problems that arise from scale and complexity. Here are a few examples:

  • The choice of database can make or break a product. If you choose a weak option, you’ll be knee-deep in tech debt until you pay off your student loans (if that ever happens).
  • When dealing with volume, threading & concurrent tasks can cause unexpected behaviour. I’ve embarked on many Scooby-Doo debug sessions, which led to an unmasking of the culprit at the end. The usual suspects? Race conditions, out-of-memory errors, and throttling.
  • Having a design or a naming convention that is too brittle can make later changes painful. Refactoring happens more often than you think, especially if you work at a startup. Building microservices with the perspective that each service is a replaceable Lego block makes changing them a thousand times easier. My first services worked more like intertwined threads in a quilt than anything that resembled a modular design.
  • Having a system that requires manual changes and updates to work is not sustainable. Automating these types of tasks frees up your time and helps avoid user error later on. We used to have a product that required me to go in and add some regex to a list each time a new client was added. Not scalable!

Mistake 2: Thinking Too Large

On the other side of the thinking-too-small coin is thinking too large. Trying to imagine every edge-case in a problem set can quickly become overwhelming and often paralyzing.

After learning not to think about things in isolation, I overcompensated by trying to think about all the interconnections at once. Once I hit my comprehension threshold, my brain froze up because I didn’t know where to start.

https://www.youtube.com/watch?v=y8OnoxKotPQ
Me trying to think about our microservice architecture

With the help of my coworkers, I learned the value of breaking large, complex problems into bite-sized ones. Whenever I was stuck, they helped me ask “How can this problem be broken down?” Once I was able to break the mammoth task down into a couple of comprehensible ones, I became unstuck and much more productive.

One of the most valuable tools that has helped me here is flowcharting. I’m a visual thinker, so having a picture to go along with an abstract concept increases my understanding of the problem significantly.

I’ll share an example diagram I made to model a data pipeline I helped build. Whenever I discuss this pipeline with other developers on my team, I always pull this out because it helps me explain it. A picture is truly worth 1,000 words:

Sorry for those who don’t use dark mode…

I’m not going to go into detail and explain the whole pipeline, but you can imagine how useful this has been. I can come back to this diagram for a quick refresher of how the pipeline works and the basic idea of what each service does and how it fits into the bigger picture. Imagine trying to keep track of all this in your head! Plus, if I get run over by a bus, someone can figure out how the autodocs pipeline works…

It was actually cheaper to put in a screenshot than chisel in a QR code…

Mistake 3: Re-enacting the Tower of Babel

For those not familiar, the Tower of Babel is synonymous with miscommunication and the barriers caused by different languages. The same struggles that apply to human languages apply to coding languages as well. Using different coding languages might seem more productive in the short term, but in the long term it can prove costly to productivity.

Have microservices gone too far?
Photo Credit: Tim Anderson

Each person has their favorite coding language. For some (like me), it’s Python. For others, it’s Ruby. When asked to complete a task, it’s tempting to pave the way in a language you are comfortable and competent with. But if that language is different from the other pieces in your puzzle, it can actually become a pain point in the development process.

Most of our microservices and APIs were built with Python, but we had a few made in Ruby. The services made in Ruby quickly became black boxes to the other devs (who were junior like me and only knew Python), and the only person who could fix the service if it went down was the dev who built it.

After this problem kept happening, we standardized the languages we used in our infrastructure:

  • Automation services use Java
  • Microservices, APIs, and cloud functions use Python
  • Front-End application uses NodeJs

This had the added effect of unifying our respective code bases and improving the hiring process for potential devs. Those with Python and NodeJs on their resume were more likely to be successful.

Yes, this has forced some people to adapt and inwardly cringe each time they have to type self.my_python_variable, but those who don’t adapt don’t last long in this industry.

I acknowledge there are exceptions to this rule, because sometimes a certain task is best accomplished in a certain language. For example, if you are trying to reduce the time it takes to process a large CSV file, C++ is going to be more performant than Python. But in general, the more consistent your code base, the better off you’ll be.

Mistake 4: Not Documenting My Code

Looking at code I’ve written six months ago usually becomes an exercise of deciphering ancient Egyptian hieroglyphs. Things that seemed so clear in the moment can quickly become obscure in the fog of time and imperfect memory.

They should have really left some comments… Photo Credit: Alex Donvour

Leaving code comments and spending time documenting the code I’ve worked on are important parts of my workflow, but that wasn’t always the case. Early on, when people had questions about a service I worked on, I answered their questions as they came, and most of the knowledge I had about the service stayed at the tribal-knowledge level.

As time went on, I built more and more services, and I started having to answer more and more questions. I realized that documenting the things I’ve worked on would answer a lot of the questions I was getting, and would also help me remember what the service actually did.

Spending the time to document your code can feel tedious and wasteful, especially if you are pressured to move fast. But I’ve found that documentation actually saves time in the long run. It saves me time that would normally be spent answering simple questions, and I can get back up to speed with an old project 10 times faster if I’ve documented it.

I always appreciate team members who document their code (leaving good comments is a part of it). And your team will appreciate you too, trust me.
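
To make “good comments” concrete, here’s a small, invented example of the level of documentation I aim for now; the function and its use case are hypothetical, not from our code base:

def merge_client_configs(base: dict, override: dict) -> dict:
    """Return a new config with `override` values layered on top of `base`.

    Used when onboarding a new client so we don't hand-edit the shared
    config. Neither input is mutated, so callers can safely reuse `base`.
    """
    merged = dict(base)
    merged.update(override)
    return merged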

Having a good platform to host your documentation is key too. Look for a platform that allows you to perform full-text search over documents. Trust me, it will come in handy when you can’t remember exact variable names. My team uses Atlassian, which I would endorse. It’s slightly clunky at times, but it looks great and is much easier to manipulate than a README.md file.

Mistake 5: Not Using Pub/Sub

At its core, pub/sub is a messaging system that publishes messages to subscribers. Publishers push messages to a queue (called a topic), which are in turn ingested by the subscribers hooked up to that queue.

This design is very flexible, scalable, and powerful, especially in data pipelines. However, I didn’t even know it existed until a coworker acquainted me with it, and I’ve been in love ever since. If you want to build scalable solutions, pub/sub should be a part of it.

There are different implementations of pub/sub, and choosing one is like picking a flavor at an ice cream parlor: you really can’t go wrong. Google has Cloud Pub/Sub, Amazon has SNS for pub/sub messaging, and there’s a really powerful one called Apache Kafka (though I haven’t had much experience with it yet).
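
Here’s a minimal sketch of the idea using the Google Cloud Pub/Sub Python client; the project, topic, and subscription names are placeholders, not our actual pipeline:

from google.cloud import pubsub_v1

PROJECT_ID = "my-project"  # placeholder project/topic/subscription names

# Publisher side: push a message onto the topic (the queue).
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT_ID, "documents-to-process")
publisher.publish(topic_path, b'{"doc_id": "123"}').result()  # wait until it's published

# Subscriber side: a worker hooked up to a subscription on that topic.
subscriber = pubsub_v1.SubscriberClient()
subscription_path = subscriber.subscription_path(PROJECT_ID, "documents-to-process-sub")

def callback(message):
    print("received:", message.data)
    message.ack()  # acknowledge so the message isn't redelivered

# In a real worker you would block on this future (e.g. .result()) to keep listening.
streaming_pull = subscriber.subscribe(subscription_path, callback=callback)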

The best resource I’ve found for learning about pub/sub is the following video produced by Hussein Nasser. He gives some great examples of why pub/sub is awesome and how it can be better than the traditional request/response architecture.

Mistake 6: Not Using an Auto-Formatter

Even if you do speak the same language as someone else, there can be differences in dialects and accents. I always love a good Australian or British accent. Sure, it’s fun to listen to, but hard to understand sometimes:

  • Crumpets and tea govenah? Croickey! Oi think oi will.

Dialects form in coding languages when people use different formatting and indentations. Consider the following Python code produced by two different devs. They both do the same thing, but look very different.

# Dev A (multiline method chaining, 4 space indentation)
my_documents = db.collection("users").document("tim_estes").\
    collection("documents").where("createdAt", "<", yesterday).get()
for doc in my_documents:
    doc.reference.update({
        "first_name": "tim",
        "last_name": "estes",
        "edited_at": now,
        "edited_by": "admin"
    })

_____________________________________

# Dev B (no dict indenting, 2 space indentation, but better method chaining)
my_documents = (
  db.collection("users")
  .document("tim_estes")
  .collection("documents")
  .where("createdAt", "<", yesterday)
  .get()
)
for doc in my_documents:
  doc.reference.update({"first_name": "tim", "last_name": "estes", "edited_at": now, "edited_by": "admin"})

Usually, these small differences are not a problem. But once you start working on a team with people who all have different formatting preferences, merge conflicts arise and your code base starts to look like my grandma’s old family scrapbook. There’s no consistency, and it just looks ugly (sorry, grandma!).

I used to spend a lot of time trying to format my code, until a teammate suggested we start using Black. Black is a no-frills, opinionated code formatter that lets me focus on the part of coding that matters instead of the parts that don’t.

Here’s an example of Black in action. After doing a pip install of Black, simply run black my_python_file.py and it will do all the formatting for you! Here’s that same Python script from earlier, formatted into Black’s PEP 8-compliant style:

# ah, much better!
my_documents = (
    db.collection("users")
    .document("tim_estes")
    .collection("documents")
    .where("createdAt", "<", yesterday)
    .get()
)
for doc in my_documents:
    doc.reference.update(
        {
            "first_name": "tim",
            "last_name": "estes",
            "edited_at": now,
            "edited_by": "admin",
        }
    )

For those who want to give Black a spin right now, check out the interactive playground. And if you are not sold yet, you should watch a presentation by the guy who made Black.

As part of the merge request process, devs are highly encouraged to run their code through Black. The result is a professional, consistent code base free of dialects and silly merge conflicts.

While I wouldn’t recommend using a machine that strips the accents out of human voices, I would recommend Black or equivalent packages that do the same thing for code (Prettier for JavaScript, Astyle for C++).

Mistake 7: Not Reusing Code

Imagine the folly of a worker who went out and bought a new shovel each time they needed to dig a hole! We’d call them crazy for not reusing the tools they already had.

Well, I used to be that guy. I had a directory of functions that were super useful in the projects I worked on, and the first thing I did when starting a new one was to take that directory and copy it over to the new code repository.

Not only did I have a bunch of duplicated code, I also needed to update every single repo if one of those functions needed a patch. Eventually, a coworker realized my error and suggested I put these functions into a Python package that could be used internally and exported to the projects that need them.

Those functions became much easier to manage and more accessible to others who wanted to use them in their own projects. This marked the beginning of me becoming a force to be reckoned with on our dev team, and while the package is small, it’s something I’m really proud of.
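
If you haven’t packaged code before, here’s a rough sketch of what it can look like; the package name, module layout, and dependencies below are made up for illustration:

# Illustrative layout for an internal utilities package:
#
#   my_company_utils/
#       __init__.py
#       text.py      # shared text-cleaning helpers
#       storage.py   # shared database/storage helpers
#   setup.py
#
# setup.py: the minimum needed to build and pip-install the package.
from setuptools import setup, find_packages

setup(
    name="my-company-utils",
    version="0.1.0",
    packages=find_packages(),
    install_requires=[],  # list whatever the helpers actually depend on
)

Once it’s published somewhere your team can reach (or installed straight from Git), every project can pip install it and import the helpers instead of copying files around.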

If you are worried about having your code be open-source, there are ways you can host private packages. GitLab has an option where you can upload packages to a code repository and use private tokens to access said package. Steps for how to use this can be found here.

Mistake 8: Not Abstracting Database Requests

When an application communicates with a database, you have a couple of options:

  1. Communicate with the database directly
  2. Communicate with the database through an abstraction layer

Abstraction layers might seem unnecessary at first, but I’ve learned not using them can be a big mistake. I’ll briefly list out the benefits of employing one and the consequences of not.

Benefits:

  • An abstraction layer is more secure because you can have greater control over what requests can access the database.
  • Abstraction layers let you throttle requests in times of heavy traffic.
  • You can simplify the queries made by the app by hiding the uglier query semantics in the abstraction layer.
  • The code for creating, updating, and deleting data from the database is centralized instead of spread-out in the app.
  • You could put a cache into the abstraction layer and increase the read speed for frequently accessed data while reducing money spent on database read operations.

Consequences:

  • If you need to migrate to a different database, refactoring the app will be much more expensive than refactoring the abstraction layer. With an abstraction layer, you could just swap the layer out while preserving the app code. My team has experienced this first-hand: our app is essentially married to a database, and divorce is not a feasible option. Don’t let this happen to you!
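
Here’s a minimal sketch of what such a layer can look like, written in the style of the Firestore-like snippets earlier in this article; the class and method names are hypothetical, not from our code base:

class UserRepository:
    """Every read and write against the users collection goes through here."""

    def __init__(self, db_client):
        self._db = db_client  # Firestore, a SQL wrapper, etc.
        self._cache = {}      # optional cache for frequently read users

    def get_user(self, user_id):
        if user_id in self._cache:
            return self._cache[user_id]  # saves a billed database read
        user = self._db.collection("users").document(user_id).get()
        self._cache[user_id] = user
        return user

    def update_user(self, user_id, fields):
        # One central place to add validation, throttling, or audit logging.
        self._db.collection("users").document(user_id).update(fields)
        self._cache.pop(user_id, None)

If you ever have to change databases, only this class changes; the rest of the app keeps calling get_user and update_user.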

Mistake 9: Ignoring Idempotency

The scholar pigeon knows all. Photo Credit: Thesaurus.plus

Idempotency is a fun word. As defined by Stack Overflow:

“In computing, an idempotent operation is one that has no additional effect if it is called more than once with the same input parameters. For example, removing an item from a set can be considered an idempotent operation on the set.”

For the more mathematically inclined out there, here’s an example of an idempotent function (which some of you might recognize as the Identity Function):

Let f(x) = x

A function is said to be idempotent if f(f(x)) = f(x). Applying the function to its own output changes nothing, and f(x) = x clearly satisfies this, hence the identity function is idempotent.

Bringing it back to the real world, idempotency is important when dealing with network protocols. Many services have an “at-least-once” delivery policy, meaning requests are guaranteed to happen at least once, but might also happen more than once. This is a hedge against timeouts or failures that lead to message loss: the requests are retransmitted to ensure they are delivered at least once.

If you are working with an “at-least-once” delivery policy, it’s important that the operations involved are idempotent. Consider the following scenario:

  • A user places an order on a website. The order request goes out with an at-least-once delivery policy.

If the backend is not set up correctly, people could end up getting charged for their order twice! This is because the order request could be delivered multiple times, and if you treat each request as a new order, you’ll duplicate the order each time the at-least-once policy kicks in.

Here’s a tip to help you make your services idempotent:

  • Create a hash of the incoming request and use it as the transaction ID. If the same request is delivered again, it will harmlessly overwrite the existing transaction instead of creating a new one.
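
Here’s a quick sketch of that tip, again assuming a Firestore-style client like the earlier snippets; the handler and the orders collection are illustrative:

import hashlib
import json

def handle_order(request_body: dict, db) -> str:
    # Hash the request payload and use it as the document ID, so a redelivered
    # copy of the same request maps to the same record.
    payload = json.dumps(request_body, sort_keys=True).encode("utf-8")
    transaction_id = hashlib.sha256(payload).hexdigest()

    # set() with a fixed ID is an upsert: processing the request twice still
    # leaves exactly one order (and one charge).
    db.collection("orders").document(transaction_id).set(request_body)
    return transaction_id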

For those who are hungry for more, check out this video about Why Idempotency is Important.

Mistake 10: Not-CRUDing Completely

C.R.U.D. is an acronym that stands for Create, Read, Update, and Delete. When dealing with a database, it is essential that you have the ability to perform these four operations.

In my early projects, I was super-focused on Creating and Reading data from the database I was working on, and there was a lot of pressure to get the project into production. Things went great … until a user wanted to Update and Delete documents from the database! Uh oh! I should have thought of that!

I acknowledge that there are scenarios where updating and destroying records will not be important and it doesn’t make sense to add the “UD” in CRUD (think audit-trail data and other types of read-only data). But in the other scenarios, it’s easy to neglect the Update and Delete operations.

Without great delete operations, databases can become cluttered with data that shouldn’t be there. When dealing with huge data-lakes, this data can become very hard to find and adds to your overhead.

A great strategy is utilizing Object Lifecycles to help you manage your files. In Google Cloud Storage and AWS, you can let files expire if they reach a certain age or meet a given condition. Or if dealing with an index, you can set an index lifecycle policy that will purge old documents in the index. I’ve had experience doing this in Elasticsearch, which is the topic of a different Medium article…
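
For example, with the Google Cloud Storage Python client you can attach a delete rule to a bucket in a couple of lines; the bucket name and 90-day threshold below are placeholders:

from google.cloud import storage

client = storage.Client()
bucket = client.get_bucket("my-data-lake")  # placeholder bucket name

bucket.add_lifecycle_delete_rule(age=90)  # delete objects older than 90 days
bucket.patch()  # persist the updated lifecycle configuration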

Don’t neglect your CRUD operations kids!

Conclusion

Don’t be afraid of mistakes and jumping into things head first. It’s a part of being a beautiful human being and growing. I’ve learned a great deal from all the mistakes I’ve made in my life (not just the software related ones), but I also don’t want to minimize their consequences.

Instead of getting mad at myself when I mess up, I try to take a step back and listen to the wisdom that echoes in the footsteps of every mistake. Because I think the greatest one we can commit is making the same mistake twice.
