“Todo” or Not “Todo” — It’s All About Time

Cognizant Softvision Insights
9 min read · Feb 14, 2023

By Ionut Chichisan, .Net Enterprise Software Engineer, Cognizant Softvision

CLR, the .Net Runtime. Or ‘time to run,’ quickly if possible.

In the modern era, people want quick, responsive software. They want to spend the least amount of time in front of a computer and do their jobs in the most autonomous way possible. As a rule of thumb, 10 seconds is about the longest users are prepared to wait for an operation. At the same time, application misbehavior must be quickly resolved by support teams, ensuring fast recovery when incidents arise.

In a developer’s world, there are general principles that should be followed as much as possible: guidelines that can increase application performance, reduce the risk of incidents and errors, and allow quick resolutions in case of failure. The following is a checklist of principles and recommendations for developers.

Assume not null

This is one of the most common exceptions of all time: “Object reference not set to an instance of an object,” or, in the .Net world, NullReferenceException.

An object’s lifetime is dictated by multiple systems in software: the flow interaction (code), the garbage collector, the platform invoke, etc.
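A minimal sketch of defensive null handling. The `Describe` helper and its inputs are illustrative, and `ArgumentNullException.ThrowIfNull` assumes .NET 6 or later:

```csharp
using System;

// Guard clause: fail fast with a clear message instead of letting a
// NullReferenceException surface deep inside the call stack.
static string Describe(string? middleName, string first)
{
    ArgumentNullException.ThrowIfNull(first); // .NET 6+

    // The null-conditional (?.) and null-coalescing (??) operators
    // keep optional values safe to touch.
    return $"{first} {middleName?.ToUpper() ?? "(no middle name)"}";
}

Console.WriteLine(Describe(null, "Ada")); // Ada (no middle name)
```

The guard clause turns a vague downstream failure into an explicit error at the boundary where the bad value entered.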

“throw ex”

Rethrowing with “throw ex” resets the stack trace, leaving no log specifics about where the error originated; later debugging scenarios and incident investigations become much harder. Use “throw;” instead, which preserves the original stack trace.
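A small sketch of the difference. The rethrow below uses `throw;`, so the final stack trace still points at the method that actually failed:

```csharp
using System;

string trace = "";
try
{
    try
    {
        ThrowDeep();
    }
    catch (Exception)
    {
        // "throw;" rethrows and preserves the original stack trace;
        // "throw ex;" would reset it to this line, hiding where the
        // error actually occurred.
        throw;
    }
}
catch (Exception ex)
{
    trace = ex.StackTrace ?? "";
}

// The trace still mentions ThrowDeep, not just the rethrow site.
Console.WriteLine(trace.Contains("ThrowDeep")); // True

static void ThrowDeep() => throw new InvalidOperationException("boom");
```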

“Swallow” exceptions

A try/catch with no error handling or logging in the catch block loses the ability to identify problems later: no logs, no error propagation, nothing that leads to identifying the actual error. Too many logs, on the other hand, become spam. Every flow should output just the right amount of information to quickly identify any error, along with the most important parameter values.

Thread exceptions

You cannot catch an exception outside the thread it was thrown on. Instead, use try/catch inside the thread or, for tasks, the ContinueWith method, and check for errors. ContinueWith accepts a TaskContinuationOptions flags parameter; one of its values, OnlyOnFaulted, ensures the continuation executes only if the antecedent task terminated with an unhandled exception.
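A sketch of the OnlyOnFaulted pattern, with a deliberately failing background task:

```csharp
using System;
using System.Threading.Tasks;

string caught = "";

// A background task that faults with an unhandled exception.
Action work = () => throw new InvalidOperationException("worker failed");
var task = Task.Run(work);

// The continuation runs only if the antecedent task faulted, so the
// exception is observed instead of being lost with the thread.
var handler = task.ContinueWith(
    t => caught = t.Exception?.InnerException?.Message ?? "",
    TaskContinuationOptions.OnlyOnFaulted);

handler.Wait();
Console.WriteLine(caught); // worker failed
```

Note that `t.Exception` is an AggregateException; the original error sits in its inner exceptions.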

Loops

Loops are the easiest place to lose execution performance, especially if the length of the input enumeration cannot be controlled. What if the enumeration has 1 million records?

The solution is to use multithreading with chunks: split the list into smaller chunks that can run in parallel, or make use of a maximum degree of parallelism.

There is a catch. Database operations usually are not thread-safe. Entity Framework and other ORMs do not allow their database context to be used from multiple threads. To work around this, collect all the data in a multithreaded manner, then commit/save it once, after all threads have completed.

Another thing about the maximum degree of parallelism is that for optimal performance, the number of threads must be correlated with the number of CPU cores on the server.
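A minimal sketch of chunking plus a capped degree of parallelism, assuming .NET 6+ for `Enumerable.Chunk`. The workload (summing numbers) is a stand-in for real per-record processing:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

var records = Enumerable.Range(1, 1_000_000);

// Cap the worker count near the core count and process the input in
// chunks, aggregating the per-chunk results at the end.
var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };
var totals = new ConcurrentBag<long>();

Parallel.ForEach(records.Chunk(10_000), options, chunk =>
{
    long subtotal = 0;
    foreach (var item in chunk) subtotal += item; // CPU-bound work per item
    totals.Add(subtotal);
});

Console.WriteLine(totals.Sum()); // 500000500000
```

Collecting per-chunk results and combining them once at the end mirrors the advice above for ORMs: do the parallel work first, then commit in a single thread.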

For really large enumerations that still cost precious time regardless of whether you use multithreading, consider publisher/subscriber systems, which allow the processing power of multiple servers to be used. Also consider making the loop asynchronous from the user’s point of view: the user launches the operation and is notified of its status later, without directly waiting for the process to complete.

Local datetime

Never assume the application is going to be used locally, especially client/server ones.

Use UTC datetimes centrally (server side, SQL Server, etc.), keeping UTC’s own caveats in mind, and convert them to local datetimes only when displaying to the user (client side). Debugging sessions and new SQL scripts can then accommodate any date/time in your application much more easily.

Because datetime values could be in many places in an application, the code must be consistent about the time zone the dates are using.

When dealing with web APIs, the API response must output the UTC date (to be converted to the right time zone later, in JavaScript), while in server-side rendering scenarios (cshtml), the dates must be converted to the right time zone in JavaScript (client side) after the rendered result reaches the browser.

The best approach is a user setting for specifying the local time zone, with automatic selection based on browser APIs. This way the user may work in the default (browser-detected) time zone or in a specific one, for example when traveling while still accustomed to a time zone other than the one the browser reports.

With the user-specific time zone known, working with text/CSV files, whose dates are not automatically converted upon opening, also becomes easier: the exact time zone can be applied while generating the file content.
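A sketch of the store-in-UTC, convert-at-the-edge rule. The IANA zone id below is an assumption for illustration (IANA ids work natively on Linux and, via ICU, on .NET 6+ Windows installs):

```csharp
using System;

// Stored and computed in UTC; converted only when rendering.
var storedUtc = new DateTime(2023, 2, 14, 12, 0, 0, DateTimeKind.Utc);

var userZone = TimeZoneInfo.FindSystemTimeZoneById("Europe/Bucharest");
var display = TimeZoneInfo.ConvertTimeFromUtc(storedUtc, userZone);

Console.WriteLine(display); // 14:00 in the user's zone (UTC+2 in February)
```

The same `userZone` value, loaded from user settings, can be reused when generating CSV content so the file matches what the user sees on screen.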

Local number formats

Like local date formats, local number formats may be different from user to user.

A dot instead of a comma might ruin the data integrity for a whole year’s budget for the client company, with unforeseen consequences.

Keep server-side numbers in the invariant culture (with the dot as the decimal separator), and convert everything to the local number format on the client side. Ensure the process is reversed when the data is pushed back to the server.

Just like time zones, local number formats should be persisted in user settings, so that text/CSV files containing numbers can be generated correctly.

Note: Some servers are installed with a locale different from the default one, so be aware that server-side code could be running under a specific locale as well.
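A small sketch of the round trip: invariant culture on the server and wire, user culture only for display (`de-DE` is an arbitrary example culture):

```csharp
using System;
using System.Globalization;

// Parse and serialize with the invariant culture on the server side;
// apply the user's culture only when formatting for display.
decimal amount = decimal.Parse("1234.56", CultureInfo.InvariantCulture);

string forWire = amount.ToString(CultureInfo.InvariantCulture);   // "1234.56"
string forUser = amount.ToString("N2", new CultureInfo("de-DE")); // "1.234,56"

Console.WriteLine($"{forWire} / {forUser}");
```

Relying on the thread's ambient culture instead would make the output depend on the server's locale, which is exactly the trap the note above warns about.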

Blocking code/threads

Never use Task.Run()/Task.Factory.StartNew() with a Wait() with no timeout.

A CancellationTokenSource can also be created with a timeout; its CancellationToken is then automatically canceled once the time runs out. You can also pass a cancellation token or a timeout to the Wait() method.
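A sketch combining both ideas: a CancellationTokenSource that cancels itself after a timeout, and a Wait() call bounded by an explicit timeout instead of blocking forever (the 200 ms/5 s values are arbitrary):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// The token cancels itself automatically after 200 ms.
using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(200));

var work = Task.Run(() =>
{
    while (true) // simulated long-running job that honors the token
    {
        cts.Token.ThrowIfCancellationRequested();
        Thread.Sleep(10);
    }
}, cts.Token);

string outcome;
try
{
    // Always give Wait() a timeout instead of blocking indefinitely.
    outcome = work.Wait(TimeSpan.FromSeconds(5)) ? "completed" : "timed out";
}
catch (AggregateException ex) when (ex.InnerException is OperationCanceledException)
{
    outcome = "canceled";
}
Console.WriteLine(outcome); // canceled
```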

Notify users too late or not at all

According to this article about time perception in human-computer interaction, we could approximate how to interact with the user from a web page or form. Here are some results:

  • 0–2 seconds: no indication needed (generally perceived as instantaneous by the user)
  • 2–5 seconds: busy animation (spinner or spinning animation)
  • 5–10 seconds: progress indication needed, such as a progress bar or a visual indication of start and finish (a textual indication of the time remaining is generally recommended)
  • 10+ seconds: progress indication plus a cancel button, so the user can cancel the process (a textual indication of the time remaining is strongly recommended)

Parallel.ForEach loops with async/await

The parallel loop doesn’t wait for async/await code to complete; the async lambdas become fire-and-forget, leaving orphaned operations behind. There are also a lot of pitfalls described in this article.
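A sketch of the pitfall and two safe alternatives. `ProcessAsync` is an illustrative stand-in for real async work, and `Parallel.ForEachAsync` assumes .NET 6+:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

var processed = new ConcurrentBag<int>();

async Task ProcessAsync(int id)
{
    await Task.Delay(10); // stand-in for real async I/O
    processed.Add(id);
}

var ids = Enumerable.Range(1, 5);

// BROKEN: the async lambda becomes async void, so the loop returns
// before any awaited work completes:
// Parallel.ForEach(ids, async id => await ProcessAsync(id));

// Correct: materialize the tasks and await them all...
await Task.WhenAll(ids.Select(ProcessAsync));

// ...or, on .NET 6+, use the async-aware loop:
await Parallel.ForEachAsync(Enumerable.Range(6, 5),
    async (id, ct) => await ProcessAsync(id));

Console.WriteLine(processed.Count); // 10
```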

I/O

Assume source availability

Whenever you read from storage, the network, or any other input device, the source might not be available. Checks such as folder existence or bytes received from the network should always be part of the initial verification, before reading actual bytes.
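A minimal sketch with a hypothetical `ReadIfAvailable` helper. The existence check gives a clear, early failure mode, but keep exception handling in real code too, since the source can vanish between the check and the read:

```csharp
using System;
using System.IO;

// Check availability first instead of letting the read blow up.
static string ReadIfAvailable(string path) =>
    File.Exists(path) ? File.ReadAllText(path) : "(source not available)";

// A path that is guaranteed not to exist, for illustration.
string missing = Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N"), "data.csv");
Console.WriteLine(ReadIfAvailable(missing)); // (source not available)
```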

Path/Url length

Never assume a path will fit within the operating system’s maximum allowed path length, especially if the path is dynamic. Windows traditionally limits paths to 260 characters (longer paths require an opt-in on Windows 10 and later), while Linux allows up to 4,096.

Also avoid using special characters in paths.

Read past the end of a stream

Always check the stream length when reading portions of a stream.

Always reset the stream position back to 0 when you are done interacting with it.

If you are developing in a web environment and need to gather the request content, remember to reset the stream position to 0 after reading it, because subsequent reads expect the stream position to be at the beginning.
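A generic sketch of the rule (in ASP.NET Core specifically, `Request.EnableBuffering()` must also be called before a request body can be re-read):

```csharp
using System;
using System.IO;
using System.Text;

using var stream = new MemoryStream(Encoding.UTF8.GetBytes("payload"));

// The first consumer reads the stream to the end...
using var first = new StreamReader(stream, Encoding.UTF8, false, 1024, leaveOpen: true);
string a = first.ReadToEnd();

// ...so rewind before anyone else reads, or they will see nothing.
stream.Position = 0;

using var second = new StreamReader(stream, Encoding.UTF8, false, 1024, leaveOpen: true);
string b = second.ReadToEnd();

Console.WriteLine(a == b && a == "payload"); // True
```

Note `leaveOpen: true`, so disposing a reader does not close the shared stream out from under the next consumer.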

Read stream contents into memory

Never read an entire stream into memory. The classic example is copying a large file to another destination: an out-of-memory error is thrown as soon as the file no longer fits in the available memory.

A solution is to buffer only parts of the source data and send it to the destination.

The tricky part is implementing this in a web environment, for example when uploading large files. Countless client-side libraries can upload a file in chunks; the idea is not to store the whole file in memory (server side) but to redirect those file parts (bytes) to the destination storage.
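A minimal sketch of chunked copying, using in-memory streams as stand-ins for the source and destination. Memory use stays flat regardless of the source size (`Stream.CopyTo` does essentially the same internally):

```csharp
using System;
using System.IO;

// Copy in fixed-size chunks so memory use stays bounded by the buffer.
static void CopyInChunks(Stream source, Stream destination)
{
    var buffer = new byte[81920]; // 80 KB, the framework's default size
    int read;
    while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
        destination.Write(buffer, 0, read);
}

using var src = new MemoryStream(new byte[500_000]); // stand-in for a large file
using var dst = new MemoryStream();
CopyInChunks(src, dst);
Console.WriteLine(dst.Length); // 500000
```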

Different speeds, chained read/writes

When doing raw processing of stream bytes gathered from an arbitrary source, plan to use buffers to accommodate differences between reading and processing speeds.

Ignore logs

There is a fine line between an informative log and a spamming one. Too few messages become hard to follow, while too many messages could easily become spam.

Group messages by a common identifier, a “Correlation Id,” indicating messages that come from the same processing flow for a specific user.

Give log messages enough information, such as the date/time of occurrence, process id, user id, thread id, and exception details (message and stack trace), to make fixing issues later less painful. Include any major parameter values that could save you hours of code digging.
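The kind of entry described above can be sketched with a hypothetical `FormatLog` helper (structured logging frameworks such as Serilog or Microsoft.Extensions.Logging carry these fields as properties and scopes rather than concatenated strings):

```csharp
using System;

// A log line carrying the fields worth recording: timestamp, level,
// correlation id, user id, thread id, and the message itself.
static string FormatLog(string level, Guid correlationId, int userId,
                        int threadId, string message) =>
    $"{DateTime.UtcNow:O} [{level}] corr={correlationId:N} user={userId} thread={threadId} {message}";

string line = FormatLog("INFO", Guid.NewGuid(), 42,
    Environment.CurrentManagedThreadId, "order submitted");
Console.WriteLine(line);
```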

Make clear distinctions between log levels in order to control how many messages end up in the logs. Log frequently occurring messages at debug level, and enable the debug level only when strictly necessary.

Make threads cancelable

Threads could pile up very quickly and if left unchecked, could decrease performance considerably.

Define timeouts whenever possible and cancel any operation that is lasting more than expected.

.Net also makes it possible to combine cancellation tokens into a new linked token, which is canceled as soon as any of the combined tokens times out.
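A sketch of combining tokens via `CancellationTokenSource.CreateLinkedTokenSource`, here mixing a per-request timeout with a global shutdown token (the 50 ms value is arbitrary):

```csharp
using System;
using System.Threading;

// The linked token fires as soon as EITHER source is canceled.
using var requestTimeout = new CancellationTokenSource(TimeSpan.FromMilliseconds(50));
using var shutdown = new CancellationTokenSource();

using var linked = CancellationTokenSource.CreateLinkedTokenSource(
    requestTimeout.Token, shutdown.Token);

linked.Token.WaitHandle.WaitOne(); // released when the 50 ms timeout fires
Console.WriteLine(linked.Token.IsCancellationRequested); // True
```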

Beware of timeout ordering: a timeout for a database operation cannot be set longer than the web request timeout the database call is being used in, otherwise the database timeout won’t be effective.

Have a centralized location to define timeouts for every aspect of your application.

Retry specific operations

Some operations (like database reads/writes or API requests) occasionally fail with transient errors caused by unavoidable factors (e.g., network failures). Always plan to retry these operations before marking them as failed.

The default retry strategies are linear or exponential backoff. The idea is to retry at the time the operation is most likely to succeed.

You can also customize the retry interval by providing specific checks that determine whether a retry still has a chance of succeeding.
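A hand-rolled sketch of exponential backoff; the `Retry` helper and its delay values are illustrative (a library such as Polly covers this declaratively, with jitter and policy composition):

```csharp
using System;
using System.Threading;

// Retry with exponentially growing delays: 200 ms, 400 ms, 800 ms...
static T Retry<T>(Func<T> operation, int maxAttempts = 4)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            return operation();
        }
        catch (Exception) when (attempt < maxAttempts)
        {
            // Back off before the next attempt; the last attempt
            // is allowed to fail for real.
            Thread.Sleep(TimeSpan.FromMilliseconds(100 * Math.Pow(2, attempt)));
        }
    }
}

int calls = 0;
// Simulated transient failure: the first two calls throw, the third succeeds.
int result = Retry(() => ++calls < 3 ? throw new TimeoutException() : calls);
Console.WriteLine(result); // 3
```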

In .Net there is a comprehensive library for retry operations called Polly. It also provides other resilience strategies, such as:

  • Circuit breaker — the ability to stop invoking an operation the system can no longer handle, giving a struggling service time to cool off instead of being hammered with further requests
  • Timeout — provide timeout where the existing code does not permit it
  • Bulkhead Isolation — independent resiliency of different application elements
  • Rate-limiting — slows down an operation so the system can cope
  • Fallback — alternatives to the main flow in case it fails

Database queries

Retrieving data from a database is one of the most underestimated operations in an application when it comes to performance. Because working with databases has become fairly easy, questions about query performance, the number of database round-trips, and database indexes are often ignored.

When working with millions of records in different tables, make sure proper database design has been considered from the beginning (indexes, table partitioning, normalization/denormalization of data), even proper database server.

Redesigning a database from scratch after the application has gone live is a procedure that almost no one wants to happen.

Data maintenance is another aspect worth mentioning. As time goes on and data accumulates, different indexing strategies must be considered. Azure offers this as automatic tuning, a service that manages indexes automatically based on query history.

Last but not least: Pyramidal thinking

Think of the project as a pyramid. The very base of the pyramid constitutes the environment, hardware and software the application is going to run on.

It could be the most ingenious application ever, but if the environment won’t support it, all bets are off.

You must ensure the optimal processing power: CPU, memory, network bandwidth, and disk speed are just some of the parameters you should ask about.

Then comes the database. Because the database usually constitutes the “Single Point Of Truth (SPOT)” when it comes to data, you must ensure the server can handle enough concurrent connections to satisfy the application needs.

Database calls are usually expensive in terms of performance. To overcome that, introduce at least some level of caching to keep data in memory instead of reading it from the database. However, to ensure you still serve the most recent data when working with a cache, you should do at least one of the following:

  • Cache static table data that won’t change at all, or at least not very often
  • Have ORMs implement multiple levels of caching. This ensures caches are invalidated as soon as write operations are performed against the database.
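A minimal read-through cache with write invalidation, using hypothetical in-memory dictionaries standing in for the database and the cache layer:

```csharp
using System;
using System.Collections.Concurrent;

// "store" stands in for the database; "cache" for the memory layer.
var store = new ConcurrentDictionary<int, string> { [1] = "initial" };
var cache = new ConcurrentDictionary<int, string>();

// Reads hit the cache first and fall through to the store on a miss.
string Read(int id) => cache.GetOrAdd(id, key => store[key]);

// Writes go to the store and invalidate the cached entry, so the
// next read is guaranteed fresh.
void Write(int id, string value)
{
    store[id] = value;
    cache.TryRemove(id, out _);
}

Console.WriteLine(Read(1)); // initial
Write(1, "updated");
Console.WriteLine(Read(1)); // updated
```

Real ORM-level caching layers follow the same shape, just with expiration policies and cross-server invalidation added.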

Every piece of software out there was designed to perform a specific operation. People quickly tire of waiting, and slow software gets scrapped. With a few precautions, serious software incidents and data loss can easily be avoided.

References:

https://learn.microsoft.com/en-us/windows/win32/win7appqual/preventing-hangs-in-windows-applications

https://www.zdnet.com/article/the-escape-button-a-key-to-the-psychology-of-device-design/

https://blog.prototypr.io/guidelines-for-time-indication-and-progress-bars-in-user-interaction-design-4d5038084c84

https://learn.microsoft.com/en-us/dotnet/standard/parallel-programming/potential-pitfalls-in-data-and-task-parallelism

https://github.com/App-vNext/Polly

https://dotnettutorials.net/lesson/maximum-degree-of-parallelism-in-csharp/

https://learn.microsoft.com/en-us/azure/azure-sql/database/automatic-tuning-overview?view=azuresql
