**Why is data structures so fundamental?**

Where is it fundamental exactly?. In computer science. Computer science is about solving problems in just about any domain. To do that, clearly, the ecosystem or the domain has to be represented inside the computer to set the context against which to solve the problems. That is where data comes in. It is common knowledge that the amount of data that is typically represented in these systems is quite high. That is where the science of data structures comes in — to motivate the best way to represent the data in the systems. Algorithms come over data structures as they operate on the data structures.

**Why is data structures so fundamental?**

Knowledge of data structures allows you to choose the right data structures for your purpose at hand. Typical operations you need to do on data are the CRUD operations, you need to create/insert a new piece of data, you need to able to update some data or be able to delete some data or basically just read some data. There are some data structures whose search is faster but whose inserts are much slower. So if your system has data that needs to search a whole lot often than you need to insert, you use the data structure that lets you search faster.

I see. Can you please tell me a data structure that is faster for access and slower for insert and another one that is faster for insert and slower for access?

Sure. Array is a data structure that is pretty fast for access.

If you have an array with 10K elements, you can access the 8738th element as fast as you can access the first element where as for list the system would need to traverse from the first element to the second to the third to ….. One thousand seven hundredth to the one thousand seven hundred and first.. to the five thousand eighty fifth to the five thousand eighty sixth .. to the .. Eight thousand seven hundredth to the eight thousand seven hundred and first to ……..the eight thousand seven hundred and second and so on… to the eight thousand seven hundred and thirty fifth to the eight thousand seven hundred and thirty sixth to the eight thousand seven hundred and thirty seventh and finally on to the eight thousand seven hundred and thirty eighth.

List is faster to insert as it is a dynamic data structure as opposed to Array which is a static data structure. So whenever you need to insert data above the current size limit of the array, the underlying language runtime needs to allocate more memory for your array first and then insert data. You would not realize this while coding it but the user of your system is going to be hit with it by the slowness over the expected time when he runs into this use case. And you know that everybody hates the running whirl. Everyone is so impatient these days.

**Why is data structures so fundamental?**

If the programmer who wrote the system you are using knew data structures well enough and he clearly understood the use cases that you are going to run into and he really loves you and made provisions for all the corner cases you are going to run into, your experience with the software he made is going to be real nice. Your blood vessels are not going to burst as your critical financial transactions bounce back due to slowness issues.

The Array to List comparison with regards to insert did not seem to put List on a higher pedestal that the array. I always thought that the List is the cooler data structure that the Array. Is that not the case?

That is a good question actually. The reason that question surfaced is that I did not give you the best example to illustrate my point. I should have compared the delete operation on the list to the delete operation on the array. When an element in the array is deleted, the elements to the right of the array needs to be moved over to make up for the space. But when an element in the list is deleted, it is a constant time operation, all you do is that the previous node in the list would now point to the node after the node that you are deleting. You just dereference the node there. Simple.

In the same manner, for the insert operation of the array into a particular index of the array, you need to shove all elements from that position over to the right where as for the List, you just bring in a new Node with the data and wire the next pointers in so that the node before where you need to insert would point to the new node as it’s next and this node would point to the node after the point where you need to insert as the next.

Well, Array is a pretty good data structure too. I think you should see every data structure for what it is and make wise choices for your usage scenarios.

**Why is data structures so fundamental?**

Even if you were a well meaning programmer who cares enough for the system to think up all the corner cases, and add all those validations that help the user use the system right and you think up ways to make usage of the system intuitive, your user’s experience may bog down when his usage hits on issues due to the wrong usage of data structures. And your eclipse or your favorite IDE is not going to prompt you to use some other data structure while you are coding. You can code perfect looking well indented and well intended code but still set your system up for substandard experience if you did not understand the implications of data structure choice.

Can you please tell me a scenario where the choice of data structure would hit on the user pretty bad?

Let us go to a rural setting and may be back in time a bit and you are computerizing a system of operation that used to be paper based. Let us assume a hospital. So now you need to manually enter in all the form data of the patients into the system and you do not have the data in papers in the correct order which means there is going to be a lot of data movement while you are creating data. I would imagine that towards evening of the day job of the user who is entering data, he is going to hit timeout issues and frustration at having to re-enter the data over and over, all this for the same user who was excited at the elegance of the form and was excited to take on work on the new system in the morning while his colleagues where still doing paper work and he was Neo, the chosen one.

Honestly, that was not as Hollywood for me as I would have liked but I think I see the point there. I think I dare say that I understand that there is reason to choose appropriate data structure based on my use case at hand. Can I get a little more meat on the specifics of which data structure to use when and how to determine that etc.?

Sure. So theoretical computer scientists have done elaborate studies on what are the operations that are typically needed on data and that is how they came up with the data structures in the first place and after they came up with the data structures, they further analyzed the operation trends on these data structures. I am sure you would have gathered by now that I am talking about the O notations. Order of notations.

The biggest factor that affects the operation time is how much data you already have in the data structure that you are going to operate on. Meaning it is going to be whole lot faster to insert an element in the middle of an array that has three entries in it versus an array that has twenty three thousand entries in it. At this point you are probably thinking about databases. Like you are not going to use an array to put twenty thousand entries, you would probably use databases. To state the obvious, database is just a software as well and a software that better be written with good data structures.

Coming back to the time complexity or O(n) notations, all the standard useful and hence popular data structures have been experimentally as well as theoretically studied and determinations have been done as to the time and space complexity of the various operations. This knowledge should be in the toolkit of the software developers. Typical time complexities are O(n) where the time taken for the operation is directly proportional to the number of elements in the data structure. O(log n) where the time for operation is directly proportional to the log of n which is significantly lower than O(n). Considering two instances of data represented in data structure one with O(n) time complexity, the operation is going to get eight times slower when the number of elements in the data structure increases from 2 to 16 where as the same scenario for the data structure with O(logn) would be log(16)/log(2) which should be four times slower.

Also, please try to plot a graph in your head for log(n) versus n. log(32) is 5, log(64) is 6, log(128) is 7, log(256) is 8, log(512) is 9, log(1024) is 10.

Ok if we pause there for some Hollywood, with 1024 elements, one data structure is one hundred times better than the other there.

I see that. So our rural data entry guy in his nice white shirt and pleasant positive countenance and slightly balding hair would see the response time of his save button drop from one second to hundred seconds(one and half minutes) when he hits the 1024th form?

That is correct. As you can see, the impact is real. His boss had come in the morning and seen that he had completed thirty forms in the first hour and had anticipated about two hundred fifty forms to be done by end of day and he even passed a good word about Kumar to his boss. And that is when he started hitting the timeout issues and then things just don’t look good overall for him and his boss and before long for the software that he was using.

Nice. That drives the point home for me. I am clear that data structures is fundamental. I definitely hope that the systems that are being operated on by the cashiers who give out the 2000 Ruppee notes out were built with good data structure understanding. Those systems are definitely hitting some numbers at the moment and for the sake of the tired farmer at the wee end of the line, I hope that they had O(logn) complexity over a O(n squared) complexity.