Instructions & Data
What software is made of
There’s a famous joke in computer science: any problem can be solved by adding another layer of abstraction. Once you get this joke, you will truly understand how to build software.
At its core, all software consists of two things: instructions and data. Instructions tell the computer what to do. Since these instructions are typically followed by a computer (as opposed to a robot), they usually can’t effect change in the physical environment. Instead, they operate on data (a symbolic representation of information): acquiring it, analysing it, copying it, modifying it, storing it, transmitting it and presenting it to humans.
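To make the split concrete, here is a minimal sketch in Python (one of many possible languages). The temperature readings and the function names `average` and `to_fahrenheit` are made up for illustration: the list is the data, and the functions are instructions that analyse and modify it.

```python
# Data: a symbolic representation of information (made-up temperatures in Celsius).
readings_celsius = [20.0, 25.0, 30.0]


# Instructions: steps the computer follows to work on the data.
def average(values):
    """Analyse the data: compute the mean of a list of numbers."""
    return sum(values) / len(values)


def to_fahrenheit(values):
    """Modify the data: return a converted copy, leaving the original alone."""
    return [value * 9 / 5 + 32 for value in values]


mean_celsius = average(readings_celsius)
readings_fahrenheit = to_fahrenheit(readings_celsius)

# Present the results to a human.
print(mean_celsius)         # 25.0
print(readings_fahrenheit)  # [68.0, 77.0, 86.0]
```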
Unlike humans, who can usually interpret vague or sloppy instructions, computers need to be given extremely precise instructions and will do exactly what you say, not what you mean.
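A small Python sketch of "what you say, not what you mean", using made-up values: the numbers below are stored as text, so asking the computer to sort them sorts them as text, character by character, which is probably not what a human would have meant.

```python
# These look like numbers to a human, but they are stored as text (strings).
ages = ["10", "9", "2"]

# What we said: sort these pieces of text. Text is compared character by
# character, and the character "1" comes before "2" and "9".
as_text = sorted(ages)
print(as_text)     # ['10', '2', '9']

# What we meant: sort them by their numeric value. We have to say so explicitly.
as_numbers = sorted(ages, key=int)
print(as_numbers)  # ['2', '9', '10']
```

A person handed the same list would almost certainly have guessed the intent. The computer does not guess.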
While this may sound like a nuisance, in exchange for very precise instructions computers will follow them extremely fast for a very long time without complaining, taking breaks, eating, getting sick or wanting to be paid. If you can come up with a set of instructions that a computer can follow to perform a task you need done, it is nearly always cheaper, faster and more reliable to have the task done by a computer than by a human. That said, as you will discover, coming up with the right set of instructions can often be tricky, especially for complex problems.
The instructions are generally provided in advance by humans, commonly referred to as programmers (AKA developers or coders). Programmers write down these instructions in plain (i.e. unformatted) text using a language designed specifically for programming computers. There are many such languages, each of which has its pros and cons.
As with natural languages (e.g. English, Chinese, Arabic), the popularity of programming languages follows a power-law distribution: most software is written in a few of the most popular languages and the other languages are largely relegated to niches (the long tail). Rather than mastering just one language, understanding how various concepts are expressed in several of the most popular languages will make you a much better programmer.
Data may be acquired from a variety of sources: directly from humans or the environment via sensors (e.g. camera, microphone) or input devices (e.g. keyboard, mouse, touchscreen), from a storage device (e.g. flash drive, SD card), or from other computers across a network (e.g. Wi-Fi, 3G, Ethernet).
Data may be analysed in sequence by a single computer or split up across thousands of computers and analysed in parallel. When data is modified, the new version may be used to replace the original version or every revision of the data may be stored in case somebody wants an interim version someday.
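The difference between analysing data in sequence and in parallel can be sketched in a few lines of Python. This is only an illustration under some assumptions: the sample lines and the helper names `count_words` and `split_into_chunks` are made up, and two threads on one machine stand in for what would really be many separate computers.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(lines):
    """Analyse one piece of the data: count the words in a list of lines."""
    return sum(len(line.split()) for line in lines)


def split_into_chunks(items, chunk_count):
    """Split a list into roughly equal pieces, one per worker."""
    size = max(1, -(-len(items) // chunk_count))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


lines = ["the quick brown fox", "jumps over", "the lazy dog", "again and again"]

# In sequence: a single worker looks at every line in turn.
sequential_total = count_words(lines)

# In parallel: each worker counts its own chunk, then the partial
# results are combined into one answer.
chunks = split_into_chunks(lines, 2)
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_counts = list(pool.map(count_words, chunks))
parallel_total = sum(partial_counts)

print(sequential_total)  # 12
print(partial_counts)    # [6, 6]
print(parallel_total)    # 12
```

Both approaches give the same answer. Splitting the work only pays off when there is enough data that dividing it up and recombining the results costs less than the time saved.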
Data may be deleted entirely after being used (e.g. Snapchat) or many copies of it may be transmitted to and stored in different places (e.g. Google Drive, BitTorrent), either to reduce the chances of losing it (i.e. backups) or to have more convenient access to it in each place (i.e. caches).
Data may be stored or transmitted in a form different from the original for various reasons: compressed for efficiency or performance (e.g. MP3, PNG, zip), encrypted for security (e.g. FileVault), or simplified or translated for archival or presentation (e.g. PDF, HTML, Unicode).
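Two of these transformations can be tried with Python's standard library alone. The sketch below uses a made-up, deliberately repetitive piece of text: it first encodes the text as bytes using UTF-8 (one way of writing down Unicode text), then compresses those bytes with `zlib`, and finally reverses both steps. Unlike MP3, this kind of compression is lossless, so the original comes back exactly. Encryption is left out because it needs more than a few lines to do properly.

```python
import zlib

# Made-up sample text, repeated so there is something to compress.
text = "instructions and data " * 50

# Encode: turn the text into bytes using the UTF-8 encoding.
raw = text.encode("utf-8")

# Compress: produce a shorter sequence of bytes that carries the same information.
compressed = zlib.compress(raw)

# Reverse both steps to get the original text back.
restored = zlib.decompress(compressed).decode("utf-8")

print(len(raw))                    # 1100
print(len(compressed) < len(raw))  # True
print(restored == text)            # True
```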
Data may be presented to humans using output devices (e.g. display screens, speakers, etc.). When presenting data to humans, it is important to be cognizant of differences between individuals in terms of sensory perception; vision and hearing impairments may reduce or eliminate the effect of data presentation that uses only a single medium.
Next up: types of data