Coding for journalists: 10 programming concepts it helps to understand
If you’re looking to get into coding, chances are you’ll stumble across a raft of jargon which can be off-putting, especially in tutorials which are oblivious to your lack of previous programming experience. Here, then, are 10 concepts you’re likely to come across, and what they mean.
1. Variables
A variable is one of the most basic elements of programming. It is, in a nutshell, a way of referring to something so that you can use it in a line of code. To give some examples:
- You might create a variable to store a person’s age and call it ‘age’
- You might create a variable to store the user’s name and call it ‘username’
- You might create a variable to count how many times something has happened and call it ‘counter’
- You might create a variable to store something’s position and call it ‘index’
Variables can be changed, which is their real power. A user’s name will likely be different every time one piece of code runs. An age can be added to at a particular time of year. A counter can increase by one every time something happens. A list of items can have other items added to it, or removed.
They can also be combined: an age (one variable) might be calculated based on a birth date (another variable).
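As a rough illustration of the ideas above, here is a minimal Python sketch. The name, the dates and the variable names are all made up for the example; the point is only that a variable can be set, changed, and combined with others.

```python
from datetime import date

# A variable is a name that refers to a value.
username = "Jane"   # hypothetical user name
counter = 0         # nothing has happened yet

# Variables can change: add one each time something happens.
counter = counter + 1
counter = counter + 1

# Variables can be combined: work out an age from a birth date.
birth_date = date(1990, 8, 17)   # made-up birth date
today = date(2014, 8, 16)        # fixed date so the result is repeatable
had_birthday = (today.month, today.day) >= (birth_date.month, birth_date.day)
age = today.year - birth_date.year - (0 if had_birthday else 1)

print(username, counter, age)
```

In real code you would probably use date.today() rather than a fixed date, so that the age changes as time passes.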
2. Strings, integers and other jargon for types of data
You can have different types of variables, which changes what you can do with them. Typical types of variables include:
- Numbers — integers (whole numbers) and floats (those with decimal places)
- Text — generally called strings and indicated by quotation marks like so: “17 August”
- Lists or arrays — explained below — normally indicated by square brackets and commas like so: [“Manchester”, “Glasgow”, “Paris”]
- Dictionaries or dicts — explained below — normally indicated by curly brackets, colons and commas like so: {“Age” : 23, “Name”: “Jane”}
This is important because problems can occur when code encounters information in the wrong format. For example, in many languages you cannot do arithmetic with a number stored as a string, or join text and numbers together, without converting one of them first.
In those cases coding often involves telling the code to treat ‘7’ as a number and not a string, or even to convert the string ‘seven’ to its numerical equivalent. Computers are great at performing tasks repetitively but need explicit instruction on what they’re doing.
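Here is a small Python sketch of that kind of conversion. The variable names and the lookup table for ‘seven’ are assumptions made for illustration; other languages handle mixed types differently.

```python
text_seven = "7"     # a string, because of the quotation marks
number_three = 3     # an integer

# Adding a string to a number raises an error in Python.
try:
    text_seven + number_three
    mixing_worked = True
except TypeError:
    mixing_worked = False

# Tell the code to treat "7" as a number, then the sum works.
total = int(text_seven) + number_three

# Going the other way: turn a number into text so it can be joined to text.
label = "Age: " + str(23)

# Python has no built-in way to read the word 'seven' as a number,
# so one simple approach is a lookup table you write yourself.
words_to_numbers = {"seven": 7}
seven = words_to_numbers["seven"]

print(total, label, seven)
```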
3. .classes and #ids and selectors
HTML code uses the class= and id= attributes to identify particular types of content and make it possible to manipulate them with other code. For example, you might have code like this in a webpage:
- <div class="article">
- <ul class="nav">
- <h2 class="subhead">
- <div id="footer">
There are at least three ways in particular that these classes and ids then become useful:
- They can be styled using CSS
- They can be changed using Javascript and other languages
- They can be identified and scraped using languages like Python, Ruby and PHP
In most cases, this is done using selectors. These selectors indicate a class with a period, and an id with a hash like so:
- .article (for class="article")
- #footer (for id="footer")
If you see that sort of thing, or read about classes and ids, you know what’s going on.
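To show roughly what a selector does behind the scenes, here is a sketch using only Python’s built-in html.parser module. The sample page text, the SelectorFinder class and the find function are all invented for this example; real scraping libraries offer far more complete selector support.

```python
from html.parser import HTMLParser

# A made-up snippet of a webpage, using the classes and id from above.
PAGE = """
<div class="article"><h2 class="subhead">Budget cuts</h2></div>
<ul class="nav"><li>Home</li></ul>
<div id="footer">Contact us</div>
"""


class SelectorFinder(HTMLParser):
    """Records the tag name of each element matching a simple .class or #id selector."""

    def __init__(self, selector):
        super().__init__()
        # A leading "." means a class; anything else is treated as "#id".
        self.attribute = "class" if selector.startswith(".") else "id"
        self.wanted = selector[1:]
        self.matches = []

    def handle_starttag(self, tag, attrs):
        value = dict(attrs).get(self.attribute) or ""
        # An element can carry several classes separated by spaces.
        if self.wanted in value.split():
            self.matches.append(tag)


def find(selector, html):
    """Return the tag names of elements in html that match the selector."""
    finder = SelectorFinder(selector)
    finder.feed(html)
    return finder.matches


print(find(".article", PAGE))
print(find("#footer", PAGE))
```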
4. Functions, methods
Functions and methods are generally one-word recipes to do things that would otherwise take many lines of code to explain. Here are just two examples:
- len in some languages means ‘give me the length of the thing I specify’
- split in some languages means ‘split this thing into one or more things based on a criterion I specify’
In order to do that, the function needs an ingredient called an argument or parameter (explained below), and the method is attached to an ingredient called an object (also explained below).
The distinction between methods and functions is subtle and not worth exploring here, especially given that their use differs slightly between languages.
One particularly useful thing about functions is that you can define your own in your code — useful for anything you want to do more than once (thanks to @tomp in the comments for pointing out you can also define your own methods).
Other functions are ready to use in the programming language from the start. Python, for example, documents a list of built-in functions, and JavaScript has its own built-in functions and objects too.
A third type of function or method is only available by using the relevant library — see below.
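In Python, for instance, len is a built-in function and split is a method attached to strings. The sketch below uses both and then defines a new function; the name initials is made up for the example.

```python
name = "Paul Bradshaw"

# A built-in function: the thing to measure goes inside the parentheses.
length = len(name)

# A method: attached to the string with a dot, then called.
parts = name.split(" ")


def initials(full_name):
    """Return the first letter of each word in full_name."""
    return "".join(word[0] for word in full_name.split())


print(length, parts, initials(name))
```

Once defined, initials can be reused on any name without repeating the lines inside it.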
5. Arguments or parameters
Functions and methods (explained above) need ingredients to work, each called an argument or parameter.
These appear in parentheses following the name of the function, like so:
len("Paul")
len(myname)
The len function, for example, gives you the length of whatever argument is supplied in the parentheses.
In the first example that is a string (indicated by quotation marks) — “Paul”. The result here would be 4 (4 characters).
In the second example the argument is a variable — myname — so the result will vary depending on what that variable contains at that point. If myname is “Bradshaw” at that point, the result will be 8 (8 characters).
However, if the variable is a list then the result will be the number of items in that list, not the number of characters. For other types of data it might not work at all, and give an error.
The documentation for a function or method should tell you more about what exactly it does — and what arguments it takes. These are called parameters in general, but they both mean the same thing — it’s just that one term is used for the general (“This function has one parameter: an object to be measured”), and another for the specific (“Taking the argument ‘Paul’”).
Some functions and methods use more than one parameter, each one separated by a comma. And some parameters are optional. Sometimes the parentheses are left empty like so: ready(). Again, this should be detailed in the documentation.
A useful tip when learning to code is to search for the terms ‘documentation’ and ‘function’ or ‘method’ along with what you want to do, or with the name of a function or method you’re struggling with.
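The points in this section can be sketched in Python as follows. The greet and ready functions are invented for illustration: one shows a required and an optional parameter, the other shows empty parentheses.

```python
def greet(name, greeting="Hello"):
    """name is required; greeting is optional and defaults to 'Hello'."""
    return greeting + ", " + name


def ready():
    """A function with no parameters: the parentheses stay empty."""
    return True


myname = "Bradshaw"

length_of_string = len("Paul")      # argument is a string
length_of_variable = len(myname)    # argument is a variable
length_of_list = len(["Manchester", "Glasgow", "Paris"])  # counts items

# For some types of data len does not work at all and gives an error.
try:
    len(23)
    number_has_length = True
except TypeError:
    number_has_length = False

print(greet("Paul"), greet("Paul", "Hi"), ready())
```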
6. Libraries
Libraries are collections of further functions and methods which allow you to do more than you can with just the basics of the programming language. Put another way: they are a way of using other people’s code, and they are one of the most powerful parts of programming.
If you can think of a problem, it’s likely someone has created a library to deal with it: drawing a map; scraping information from a series of webpages; converting a document; charting data or putting it into interactive tables; creating animations or effects.
For that reason, another useful tip is to search for your problem, the language you’re using or learning, and the term ‘library’, e.g. ‘javascript mapping library’.
To be used, most libraries have to be imported — often with a link to the file containing the library’s code.
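In Python, importing usually takes one line at the top of the code. The sketch below uses statistics, a library that ships with Python, so nothing needs installing; the list of ages is illustrative. Libraries written by other people normally have to be installed before they can be imported.

```python
import statistics  # a library included with Python

ages = [23, 24, 18]  # illustrative values

# median comes from the library, so we don't have to write it ourselves.
middle_age = statistics.median(ages)

print(middle_age)
```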
7. Lists/arrays and dictionaries/dicts
Lists and dictionaries are special types of information which can be enormously useful in programming — but also confusing for those not used to them.
The terminology varies — in some programming languages lists are called arrays, and dictionaries are called dicts. I prefer ‘lists’ and ‘dictionaries’, however, because they are easiest to understand.
A list or array is just that: a list of items, which looks like this:
["Asia", "Africa", "Europe"]
The contents of a list can be one or more of any of the data types listed above.
Lists are enormously useful both for
- storing information (for example in scraping, or a user’s answers in a quiz) and for
- repeating actions (for example plotting or mapping each number or location in a list). See below on loops.
Dictionaries are similar in that they are also a form of list, but with this key difference: they are a list of pairs.
Each pair has a label (called a key) and a value, connected together by a colon, for example:
"Age": 24
The term dictionary is useful: think of it as a collection of words with an associated definition. But you can also think of it as column headings (age, name, location) and values (18, Sarah, Chicago).
Each pair is then separated by commas and the whole is placed in a list in curly brackets like so:
{"Age" : 23, "Name": "Jane"}
This makes it particularly useful for storing data that has more than one label. For example you might store a list of ages as a simple list. But if you wanted to connect each age to a name or location, you’d need a dictionary to do that.
This is precisely the logic behind the data format JSON, a format used by a number of APIs (see below).
As with lists, dictionaries can contain all types of data under each key, including lists and dictionaries (i.e. you can have a dictionary-within-a-dictionary).
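Here is a brief Python sketch of both structures, plus a conversion to and from JSON using Python’s built-in json library. The extra entries (‘Oceania’ and the list called people) are made up for the example.

```python
import json

# A list: items in order, inside square brackets.
continents = ["Asia", "Africa", "Europe"]
continents.append("Oceania")   # add an item
first = continents[0]          # positions start at 0

# A dictionary: pairs of key and value, inside curly brackets.
person = {"Age": 23, "Name": "Jane"}
name = person["Name"]            # look up a value by its key
person["Location"] = "Chicago"   # add a new pair

# A list of dictionaries works like rows under column headings.
people = [{"Name": "Jane", "Age": 23}, {"Name": "Sarah", "Age": 18}]

# JSON is text that follows the same pattern.
as_json = json.dumps(people)
back_again = json.loads(as_json)

print(first, name, back_again[1]["Age"])
```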
8. Loops — for, each, while
As mentioned above, one of the great things about lists is that they allow you to repeat actions many times — one of the main uses of programming.
To do this you normally use a loop. The loop starts at the first item in a list, performs some action with it, and then repeats for the second, and so on until it comes to the last item.
Examples include:
- Taking each location in a list and placing it on a map
- Taking each number in a list and sizing a bar in a bar chart to that amount
- Taking each item in a list (for example ID codes) and adding it to a partial URL to form the full URL
- Taking each URL in a list and running some code to grab information from it
- Running an animation ‘while’ someone’s score is below or above a certain value.
What can be confusing about a loop is the way it creates a variable at the same time, and how many times that variable’s value changes as the loop runs. For example:
numberlist = [1, 2, 3, 4]
for num in numberlist:
    print(num)
In that code, ‘num’ is created as a new variable to hold a value while the loop runs. In quick succession its value is 1, 2, 3 and 4 before the loop finishes (all values in the list are used).
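Two of the other uses listed above can be sketched the same way: building full URLs from a list of ID codes, and repeating something ‘while’ a score is below a value. The partial URL, the ID codes and the scoring rule are all hypothetical.

```python
base_url = "http://example.com/report?id="   # hypothetical partial URL
id_codes = ["A1", "B2", "C3"]                # made-up ID codes

# A 'for' loop: 'code' takes each value in the list in turn.
full_urls = []
for code in id_codes:
    full_urls.append(base_url + code)

# A 'while' loop: keeps repeating as long as the condition is true.
score = 0
rounds = 0
while score < 10:
    score = score + 4
    rounds = rounds + 1

print(full_urls)
print(score, rounds)
```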
9. Objects
‘Object’ is used regularly in tutorials without any reference to what it means. Without going into too much detail, when people talk about an ‘object’ in programming, they generally mean something which can be manipulated or used in some way by the code, such as a variable containing an age, name, list, etc.
It is when the term ‘object’ is preceded by another that it can get particularly frustrating. For example you might read about a ‘jQuery object’ or an ‘lxml object’.
When objects are described in this way it basically means that the object in question can be manipulated or used by code from that library:
jQuery methods, then, can be used on a ‘jQuery object’; lxml methods can be used on an ‘lxml object’. How do they get to become such an object? There is normally a point in the code at which a variable is made into a ‘jQuery object’, ‘lxml object’ etc. Look for that, or find tutorials or code comments which make it clear.
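The same idea can be seen with Python’s built-in datetime library, used here as a stand-in for jQuery or lxml, which are not shown. In this sketch a date starts life as a plain string and only gains date methods once it is turned into a ‘datetime object’. The date itself is made up.

```python
from datetime import datetime

raw = "17/08/2014"   # a plain string: only string methods work on it

string_has_weekday = hasattr(raw, "weekday")

# This is the point where the variable is made into a 'datetime object'.
when = datetime.strptime(raw, "%d/%m/%Y")

# Now methods and properties from the datetime library are available.
year = when.year
day_number = when.weekday()   # Monday is 0, Sunday is 6

print(string_has_weekday, year, day_number)
```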
10. APIs
An API is an Application Programming Interface. It means a way of asking questions and getting answers.
APIs are particularly useful in programming, because they allow you to ask lots of questions and get lots of answers (generally as structured data), often based on live data and without any middleman.
The coding is often focused on the presentation of the resulting information — for example, on a map or in a chart, or in a timeline.
Many apps are based on APIs. Twitter apps, for example, get answers to the question ‘what are the people I’m following tweeting?’ from the Twitter API. The resulting data is presented in different ways by different apps based on their particular code — but the underlying data (the tweets) is the same.
Typical APIs with journalistic uses include:
- Social media APIs (what are people saying/sharing in a specified location/with a particular term?)
- News APIs (what content has been published by a specified journalist/with a specified category?)
- Political APIs (how has a specified politician voted? What constituency does a specified person represent?)
- Location APIs (what is the latitude and longitude for this postcode? What is the local authority?)
- Crime APIs (what crimes have occurred near this location on this date? What were the outcomes?)
If you have lots of data you can use an API to ask the same question for each piece of data (using loops — see above), for example each postcode or politician or search term.
They can also be combined: for example, you might use the answers from one API as the basis for questions to another.
A question to an API is normally formed as a URL. For example, the URL to ask the UK police API about crimes during a specified month at a specified latitude and longitude is:
http://data.police.uk/api/crimes-at-location?date=2012-02&lat=52.629729&lng=-1.131592
Note that the date, latitude and longitude are all given in the URL, which is formed based on guidance in the documentation.
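A question like that can be assembled in code rather than typed by hand. The sketch below rebuilds the URL above using Python’s built-in urllib library; the function name crime_url and the list of locations are assumptions for illustration. It stops short of sending the request, so it runs without an internet connection.

```python
from urllib.parse import urlencode

BASE = "http://data.police.uk/api/crimes-at-location"


def crime_url(date, lat, lng):
    """Build the question URL for one month and one location."""
    query = urlencode({"date": date, "lat": lat, "lng": lng})
    return BASE + "?" + query


url = crime_url("2012-02", "52.629729", "-1.131592")

# With a list of locations, a loop asks the same question for each one.
locations = [("52.629729", "-1.131592")]
urls = [crime_url("2012-02", lat, lng) for lat, lng in locations]

print(url)
```

To get the answers you would then request each URL, for example with urllib.request.urlopen, and read the response according to the API’s documentation.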
Have I missed a piece of jargon? Let me know and I’ll try to cover it.
This article first appeared on the Online Journalism Blog.