Challenges in naming things in Software and Data Engineering

Chris Cornell the Dataist
7 min read · Mar 20, 2023

Phil Karlton, a computer programmer known for his work at Netscape and Adobe, once said:

“There are only two hard things in Computer Science: cache invalidation and naming things.”

This quote highlights how hard it is to choose clear, concise, and meaningful names for variables, functions, classes, and other elements in software and data engineering. The task is often challenging and time-consuming, yet the importance of good naming conventions cannot be overstated: good names make code easier to read, understand, and maintain, both for the original developer and for anyone who works with the code later.

The main challenges in naming things relate to ambiguity, name length, context, consistency, and multilingual environments. Let’s look at each in turn.

Challenge #1: Ambiguity

One of the main challenges of naming things in software and data engineering is avoiding ambiguity. Names should be clear and unambiguous, and not cause confusion or misinterpretation. For instance, using generic names such as temp or result can cause confusion when there are multiple variables with the same name. What are other cases to avoid?

  • Foo and Bar: These are commonly used placeholder names in programming, but they say nothing about what a value holds. If a developer names a variable foo and later forgets what it represents, debugging the code becomes harder.
  • Thing and Stuff: These words tend to appear when the purpose of a variable or function is unclear. A function named doStuff does not tell the reader what it actually does, which makes the code harder to maintain.
  • Untitled: Sometimes developers leave a name as untitled or unnamed, which causes confusion later on. If a file is saved as untitled.txt, it is difficult to remember what it contains without opening it.
  • i and j: These letters are conventional loop counters, but they can be ambiguous. In nested or long loops, it may be unclear what i and j represent or how they relate to the rest of the code.
  • Magic numbers: A “magic number” is a numeric value that is used in code without explanation. If a developer writes a formula using a hard-coded value of 365, it may be unclear why that number was chosen and whether it is valid in all cases (for example, in a leap year).
  • Generic words: Imagine you are a software developer and need to add a feature to accounting software that lets users categorize their expenses. You create a new variable called type to represent the expense category, such as “office supplies”, “travel expenses”, or “utilities”. Seems legit, right? But what if type is already used in another part of the code to represent the type of transaction, such as “credit” or “debit”? This could lead to unexpected and maybe even dramatic results.
  • Ambiguous abbreviations: Abbreviations that are not well known or that have multiple meanings lead to confusion and errors. If a developer uses the abbreviation dr for both “debit record” and “disaster recovery”, it is difficult to tell which meaning is intended in a given piece of code.
  • Too similar names: Variable names that are too similar also invite errors. If a developer uses the names customer and customers for different data sets, it is easy to type the wrong one and introduce a bug.
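To make the magic-number and generic-name cases concrete, here is a minimal Python sketch. The function names, the constant name, and the interest-rate scenario are assumptions made up for illustration; they do not come from any real codebase.

```python
# Hypothetical example: all names and the scenario are illustrative only.

# Ambiguous version: what is 365, and what do "x" and "result" hold?
def calc(x):
    result = x / 365
    return result


# Clearer version: the constant and the function name state their meaning.
DAYS_PER_NON_LEAP_YEAR = 365


def daily_interest_rate(annual_interest_rate: float) -> float:
    """Convert an annual rate to a simple daily rate (ignores leap years)."""
    return annual_interest_rate / DAYS_PER_NON_LEAP_YEAR
```

Both functions compute the same value. The second one also documents its own limitation: the constant name makes the leap-year assumption visible instead of hiding it in a bare number.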

Challenge #2: Length of names

Another challenge of naming things is finding a balance between choosing a name that accurately describes the resource and keeping the name short and concise. Long names can be difficult to remember and can make code more difficult to read and write, while short names can be too vague or ambiguous.

One common solution to this challenge is to use abbreviations or acronyms to represent longer names. However, this can also lead to confusion if the abbreviation is not well-known or if multiple abbreviations are used to represent different resources.

Another solution is to use descriptive phrases instead of single words to represent resources. For example, instead of naming a variable x, a developer might choose to name it currentTotalRevenue to provide more context and make the code easier to understand.

But using excessively long variable names can also make code harder to read and write. For example, if a developer used the variable name varCurrentTotalNetCompanyRevenueValueInEuro, it would take longer to type out and could potentially cause errors if the name is mistyped. So, the descriptive phrases can also become too long and cumbersome.
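The trade-off can be seen side by side in a short Python sketch. The order amounts and the name total_revenue_eur are assumptions chosen for illustration; the middle-ground name is one possible choice, not the only reasonable one.

```python
# Hypothetical order amounts in euros, used only for illustration.
order_amounts_eur = [120.0, 80.5, 42.25]

# Too short: the reader has to guess what x is.
x = sum(order_amounts_eur)

# Too long: the "var" prefix and the redundant words add typing, not meaning.
varCurrentTotalNetCompanyRevenueValueInEuro = sum(order_amounts_eur)

# A possible middle ground: says what the value is and in which unit.
total_revenue_eur = sum(order_amounts_eur)
```

All three variables hold the same number; only the third tells a reader what it is without costing much to type.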

Finding the right balance between length and accuracy requires careful consideration and communication among the development team. Clear documentation and consistent naming conventions can also help ensure that code is both concise and easy to understand.

Challenge #3: Context

The same name can mean different things in different contexts, leading to confusion and errors. For example, a variable named current might refer to the current user, the current date and time, or the current item in a list, depending on the context in which it is used.

Likewise, in data engineering, naming conventions for columns in a database table are critical for ensuring that data is properly organized and easy to analyze. However, the same column name can mean different things depending on where it is used. For example, a column named date might refer to the date a transaction occurred, the date a record was created, or the date a customer signed up for a service.

Choosing names that accurately reflect the context in which a resource is used requires careful consideration and communication among the development team. By using prefixes or suffixes, domain-specific terminology, and clear documentation, and by following an agreed naming convention, developers can create code that is both effective and easy to understand.
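As a sketch of how prefixes and suffixes can carry context, the Python example below replaces a bare date with three qualified names. The Transaction class, its fields, and the helper function are hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass
class Transaction:
    # Each field name states which date it is, instead of a bare "date".
    transaction_date: date      # when the transaction occurred
    created_at: datetime        # when this record was written
    customer_signup_date: date  # when the customer signed up


def days_from_signup_to_transaction(t: Transaction) -> int:
    """Number of days between customer signup and the transaction."""
    return (t.transaction_date - t.customer_signup_date).days
```

With qualified names, a calculation such as the one in days_from_signup_to_transaction reads unambiguously, whereas t.date minus t.date2 would force the reader to look up what each field means.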

Challenge #4: Consistency

Consistency is an important consideration when it comes to naming things. Inconsistent naming conventions can make it difficult to understand and maintain code, and can even lead to errors or unexpected behavior. However, maintaining consistency can be challenging, especially when working on large projects or with teams of developers who may have different ideas about naming conventions.

Consider a project in which developers use words like “status” and “state” interchangeably, even though they mean different things. For example, a system might have a status field that indicates whether a user is active or inactive, and a state field that indicates the user’s current location or activity. If developers are not consistent in their use of these terms, confusion and errors follow. A developer working on a new feature might assume that the status field refers to the user’s location rather than their active/inactive status, and inadvertently introduce a bug into the system.
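One way to keep the two terms apart is to give each its own type, so that the distinction is written down in code rather than remembered. The sketch below is a hypothetical Python illustration; the enum members and the can_log_in rule are assumptions, not part of any system described here.

```python
from dataclasses import dataclass
from enum import Enum


class UserStatus(Enum):
    """Whether the account is usable at all."""
    ACTIVE = "active"
    INACTIVE = "inactive"


class UserActivityState(Enum):
    """What the user is currently doing (illustrative values)."""
    BROWSING = "browsing"
    CHECKING_OUT = "checking_out"
    IDLE = "idle"


@dataclass
class User:
    name: str
    status: UserStatus
    state: UserActivityState


def can_log_in(user: User) -> bool:
    # Only the account status matters here, not the activity state.
    return user.status is UserStatus.ACTIVE
```

Because status and state have different types, mixing them up (for example, comparing user.status with UserActivityState.IDLE) is easy to spot in review and can be flagged by a type checker.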

Or consider a data engineering team that is responsible for designing and maintaining a large database that stores information about customers and their orders. The database includes several tables, among them a table for customer information and a table for order information. One of the columns in the customer table is called customer_id, a unique identifier for each customer. In the order table, however, the same information is stored in a column called cust_id. Because the naming is inconsistent, developers have to remember both spellings whenever they write queries that join the two tables or extract information about a particular customer’s orders.
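The effect on queries can be sketched with Python’s built-in sqlite3 module. The table layouts and sample rows below are assumptions invented for illustration; only the column names customer_id and cust_id come from the example above.

```python
import sqlite3

# In-memory database with hypothetical tables and sample rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, cust_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# With mismatched names, the join condition must spell out both columns.
rows = conn.execute("""
    SELECT c.name, SUM(o.amount) AS total_amount
    FROM customers AS c
    JOIN orders AS o ON o.cust_id = c.customer_id
    GROUP BY c.customer_id
    ORDER BY c.customer_id
""").fetchall()

# If orders also used the name customer_id, the shorter form would work:
#   JOIN orders AS o USING (customer_id)
```

The query still works, but every author has to know that cust_id and customer_id are the same thing. With a single shared column name, the join becomes shorter and harder to get wrong.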

The most common solution to the challenge of consistency is to adopt a standard naming convention and enforce it rigorously throughout the project. This helps ensure that all developers use the same names for the same resources, making the code easier to understand and maintain.
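Enforcement can be partly automated. The following Python sketch checks a list of names against an assumed snake_case convention; the regular expression, the function name, and the sample column names are all hypothetical, and real projects would more likely rely on an existing linter than on a hand-written check like this one.

```python
import re

# Assumed convention for this sketch: lowercase snake_case, starting with a letter.
SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def find_nonconforming_names(names):
    """Return the names that do not match the assumed snake_case convention."""
    return [name for name in names if not SNAKE_CASE.fullmatch(name)]


# Hypothetical column names to check.
column_names = ["customer_id", "custId", "order_total", "Order_Date"]
violations = find_nonconforming_names(column_names)
```

A pattern check like this only catches deviations in style. It would not notice that cust_id and customer_id name the same thing, so an agreed glossary of terms is still needed alongside it.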

It is also important to be flexible and willing to adapt naming conventions as needed. As a project evolves, new resources may be added that require new naming conventions, or existing conventions may need to be adjusted to reflect changing requirements. By staying open to feedback and willing to adapt, developers can help to ensure that naming conventions remain consistent and effective throughout the life of the project.

Challenge #5: Multilingual environments

When working in multilingual environments, naming conventions can become especially tricky.

In some cases, developers may attempt to translate variable or function names into different languages. However, if they are not fluent in the target language, they may choose words that have unintended meanings or connotations. For example, a German-speaking developer might name a variable “die”, which in German is simply the definite article “the”, without realizing that in English the word means to stop living.

Likewise, the same technical term may have different translations in different languages, or there may be no direct translation at all. For example, the English word “cache” may be rendered in German as “Puffer”, which literally means “buffer”. The word “gift” is another trap: in English it means “present”, but in German it means “poison”. A developer who names a variable “gift” in a multilingual team without being aware of both meanings can cause real misunderstandings.

Summary

Naming things, whether in software engineering or data engineering, is a challenging task that requires careful consideration of several factors. In this article, I explored the challenges of ambiguity, length, context, consistency, and multilingual environments, and gave examples of how poor naming can lead to problems.

Stay tuned for the upcoming series of articles on naming conventions in software and data engineering! In the next articles, I will delve deeper into the various aspects of naming conventions and provide tips and best practices for selecting clear and consistent names. Whether you are a software engineer or data professional, these articles will help you improve the maintainability and understanding of your code. Don’t miss out!

I professionally deal with Software and Data Engineering.