Incidentally #2

Puneet Awasthi
4 min read · Sep 10, 2023

How outages happen and how to prevent them.

(This is the second installment of a series I have started. If you have been here before, welcome back! Each part is fairly independent, but here is the link to part 1 if you are interested.)

I have worked in technology operations for years and witnessed costly mistakes that caused not only stress for the responding teams but also financial loss, regulatory compliance issues, and customer dissatisfaction that ultimately cost the company revenue.

As they say, an ounce of prevention is worth a pound of cure. In my experience, whenever the root cause of an incident is determined, it is rarely earth-shattering. Simple mistakes that are easy to avoid at the outset cause the most damage when left unfixed. These can be coding errors, procedural errors, or even a missing control. They might all seem like things common sense would handle, until you are face to face with them. Trust me, I would know.

Example 2: Input Validation

A priest, a minister, and a rabbit walk in to donate blood. The rabbit says, I think I might be a type-O.

It is critical to have high-quality data in order to provide a valuable service to customers. Data should be validated at the entry point to ensure it meets your expectations, and this can be done in many different ways. A regex can validate the format of an input; two of the most famous examples are the US phone number (^(\([0-9]{3}\) |[0-9]{3}-)[0-9]{3}-[0-9]{4}$) and the ZIP code (^\d{5}(?:[-\s]\d{4})?$). Other forms of validation may rely on standard reference data, such as country or state codes, or on your own business-specific reference data, such as a list of valid customer IDs.
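The two regexes above can be wired into simple validators. A minimal sketch in Python, using the exact patterns from this paragraph (the function names are mine, for illustration):

```python
import re

# Patterns copied verbatim from the text; they cover common US formats only.
US_PHONE = re.compile(r"^(\([0-9]{3}\) |[0-9]{3}-)[0-9]{3}-[0-9]{4}$")
US_ZIP = re.compile(r"^\d{5}(?:[-\s]\d{4})?$")

def is_valid_phone(value: str) -> bool:
    """Accepts '(555) 123-4567' or '555-123-4567'."""
    return US_PHONE.match(value) is not None

def is_valid_zip(value: str) -> bool:
    """Accepts '12345' or '12345-6789'."""
    return US_ZIP.match(value) is not None

print(is_valid_phone("(555) 123-4567"))  # True
print(is_valid_zip("12345-6789"))        # True
print(is_valid_phone("5551234567"))      # False: no separator, so the format check fails
```

Note that a regex only checks shape, not meaning: 000-000-0000 passes the phone pattern. That is where the reference-data checks mentioned above come in.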

It’s easy to visualize a screen with a set of fields and a function associated with each data value to confirm the input complies with your expected format. You can provide tooltip hints for the user. If validation fails: beep, beep, show an annoying dialog box, and put the user right back in the field. Repeat until they get it right!

Another, and more likely, use case in modern systems is an API-based interaction, such as an order service calling a payment service. The payment service may in turn need information about the credit card to be used and call an account service for that. Similarly, if a client requests information about their past orders, the underlying data may come from multiple services called behind the scenes with the relevant payloads.

So what can go wrong?

Each of these services needs to be able to validate its API payload, parts of which it may simply be passing along from an upstream service. To return valid results, you need to confirm that the incoming data is well formed. Somewhere in this chain of calls there is bound to be a human who typed something into the screen mentioned above, and the data-quality controls on that screen were sufficient for its designers but not good enough for a downstream service. Another source of incompatible data can be a transformation performed by an upstream service, such as adding its internally generated transaction ID.
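The idea of validating at each service boundary can be sketched as a small payload check. This is a minimal, hypothetical example: the field names (customer_id, order_id, amount) and rules are assumptions for illustration, not a real API contract:

```python
# Boundary validation for a hypothetical order-service payload.
# Returns a list of errors; an empty list means the payload is acceptable.
def validate_order_payload(payload: dict) -> list:
    errors = []
    # Required fields must be present, even ones only passed along downstream.
    for field in ("customer_id", "order_id", "amount"):
        if field not in payload:
            errors.append("missing required field: " + field)
    # Type checks: a numeric field arriving as a string is a classic mismatch.
    if "amount" in payload and not isinstance(payload["amount"], (int, float)):
        errors.append("amount must be numeric")
    # A present-but-blank identifier is as useless as a missing one.
    if "customer_id" in payload and not str(payload["customer_id"]).strip():
        errors.append("customer_id must not be blank")
    return errors

print(validate_order_payload({"customer_id": "C123", "order_id": "O9", "amount": "12"}))
# -> ['amount must be numeric']
```

Each service runs its own check on entry instead of trusting that the upstream caller already did, which is exactly how a screen-level control that was "good enough" upstream gets caught before it breaks a downstream service.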

  • Field Formatting: The most common issues are with numbers and dates. When a number is passed around in a fixed-size string, do you expect leading zeros or not? Does it have spaces, and is it left-justified or right? Similarly with dates: there can be confusion between British and American formats, timezone interpretation, and whether you expected just a Date or a DateTime. When you see the year value 50, do you read 2050 or 1950? You may have heard of the Y2K problem, and how some systems were coded to treat 9/9/99 as an invalid date.
  • Field Delimiter: One person’s data is another person’s metadata! It is important to avoid using a field delimiter (such as a pipe or semicolon) that can also appear inside one of the field values. It causes incorrect parsing of the data and halts any subsequent processing, because the corrupted record no longer satisfies the field-level validation requirements. Obviously, it’s preferable to avoid parsing delimited text records altogether and use structured data formats such as JSON.
  • Questionable Characters: If the data contains special characters, they need to be escaped so they are treated as plain data rather than given their special meaning. If you miss doing that correctly any time the data is passed downstream, it can be misinterpreted and cause issues. And when you add escape characters, make sure the result does not exceed the maximum allowed size for the field. Also, when calculating field size, remember that not all characters use the same amount of space.
  • Security Issues: CWE-20 is dedicated to this topic and highlights the importance of carefully validating all user inputs. By preventing malicious users from freely entering attack strings, you can reduce your exposure to many injection attacks, including cross-site scripting (XSS), SQL injection, and code injection (RCE).
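The delimiter and escaping pitfalls above can be seen in a few lines. This sketch shows a pipe-delimited record breaking under a naive join-and-split, and the same record surviving when the writer quotes values that contain the delimiter (here using Python's csv module as one way to get correct escaping):

```python
import csv
import io

# A record where one field value itself contains the pipe delimiter.
record = ["ACME", "Widgets|Gadgets", "42"]

# Naive join-and-split corrupts the record: the embedded pipe
# becomes a field break, yielding four fields instead of three.
naive = "|".join(record)
print(naive.split("|"))   # ['ACME', 'Widgets', 'Gadgets', '42']

# A csv writer quotes any value containing the delimiter,
# so the record round-trips intact through parsing.
buf = io.StringIO()
csv.writer(buf, delimiter="|", quoting=csv.QUOTE_MINIMAL).writerow(record)
parsed = next(csv.reader(io.StringIO(buf.getvalue()), delimiter="|"))
print(parsed)             # ['ACME', 'Widgets|Gadgets', '42']
```

Note the quoting itself makes the serialized record longer, which is exactly the field-size concern raised above.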

When designing a robust system, it is important to build in ways to filter out malformed or malicious incoming data before it can cause a failure.

Have an incident-free week! Keep an eye out for the next topic!!
