Data structures: handling complex information

The basics of Computer Science: part four

Dan Quine
The basics of Computer Science

--

In the second part of this series we looked at abstraction — the art of hiding away complexity to make it easier to write programs. This time, we’ll find out how abstraction can make it easier to handle complex information.

Data structures

Programs are a set of instructions that manipulate information. As we saw previously, you specify which information your code is working on using variables.

More about variables

Variable let you specify which bit of information you are working with. They are names that reference specific parts of the computer’s memory.

Here’s how we create variables:

author = "Mark Weiss"
title = "Data Structures and Algorithm Analysis in Java"
publisher = "Addison Wesley Longman"

This code creates three variables called author, title and publisher. It also puts information into each of these. For example, we put the text “Mark Weiss” into the variable called author. The name of a variable is a useful reminder of what information it contains: you could use anything for the name, but author is much more memorable than, say, hjkdugwh73h.

As we saw in previous examples, you can also use variables to read information from memory:

IF author="Mark Weiss" THEN
title = "One of Mark's books"
END

This code checks to see what information is stored in the variable called author and if it is the text “Mark Weiss”, it changes the information stored in the variable title.

The limitations of variables

Simple variables like these quickly become difficult to manage. A program can have hundreds of variables, each of which you have to remember. Managing all these becomes quite a mental feat, especially when you need dozens of variables to describe a single thing.

If you need to store information about hundreds of books, you will quickly have a lot of variables with very similar names. You might have author1, author2, author3… then title1, title2, title3… Keeping track of all of these and writing programs that use them becomes utterly unmanageable.

Programming languages provide ways to make it much easier to organise complex information: data structures.

Organizing information

One of the most basic data structures is a record. A record groups together all the variables that relate to a single object in the real world. The variables author, title and publisher all relate to a book, so we can create a record called book:

RECORD book
author
title
publisher
END

The three variables are neatly grouped together. Now you can create a book that has an author, a title and a publisher:

java2 = NEW book
java2.author = "Rogers Cadenhead"
java2.title = "Teach yourself Java 2"
java2.publisher = "Sams"

This code creates a variable called java2. The instruction NEW book says that the variable will contain a book record. The record contains variables called author, title and publisher. In this example, the code sets values for these variables within the record.

Another useful data structure is the array. An array is a list of variables. For example, I want a list of all the books I own. I can create an array and add book variables to it:

library = NEW array
library.add(java2)

This creates a variable called library that holds an array. I then add the variable java2 to the array. Assuming I own more than a single book, I can now add all of those to the array as well.

Once I have all my books added to library I can do cool things like show the titles of all the books I have by a particular author:

DEFINE ShowBooksBy authorToFind
FOR i=1 TO library.size()
IF library.get(i).author=authorToFind THEN
PRINT library.get(i).title
END
END
END

This defines a new re-usable code block called ShowBooksBy with a variable authorToFind which will contain the name of the author whose books we will display. The code block contains a loop which counts from 1 up to the size of librarylibrary.size() gets the number of books in the array library.

Each time through the loop, the IF logic is tried. This logic gets the next book in the library (“library.get(i)”) and checks to see if the author of that book (“library.get(i).author”) is the same as the author we are looking for (“library.get(i).author=authorToFind”).

If the author of the next book is the same as the variable authorToFind, then the title of the next book is displayed using the PRINT statement.

Now you can write code that shows the titles of all the books by Mark Weiss:

ShowBooksBy "Mark Weiss"

The ShowBooksBy code block has hidden the details of how to do this operation. Similarly, the library array and the book record have hidden away the details of how this data is stored.

Lots of data structures

The record and the array are two useful data structures. There are many other types of data structure; a few of the more common ones are:

  • set: a group of objects rather like an array, but a set guarantees you cannot add the same object to it more than once.
  • map: a group of objects, but instead of having to access them one by one, you can add and get objects by name.
  • graph: a group of objects where pairs of objects can have connections that define how those objects are related.
  • tree: a special type of graph where each object has one “parent” and multiple “children” objects, defining a hierarchy of objects.

By combining these, and other, data structures, you can create rich and interesting organizations of information that allow you to represent and manipulate data about all sorts of real (and imagined) objects.

Next Time

We’ve now covered the core of Computer Science and how it applies to programming. Abstraction is the fundamental idea and we’ve seen how we can apply it to both code and data.

The next part of this series will look again at what a computer actually is; in particular is the physical machine we normally call a computer really the computer at all?

--

--

Dan Quine
The basics of Computer Science

SVP of Eng at Hologram. Past: Mode, Lever, AltSchool, Songkick, Google, Blurb, Apple. Startup tech entrepreneur. Proud father of twins. @crowquine