Understanding your data
In this post you will learn about data types and best practices for structuring your data.
Introduction
When you have a data set it is important that you use a logical structure so the data can be easily interpreted and accurately represented. In this post we will explore best practices for using and organising your data.
Structuring your data
Data that is easy to understand is easier to analyse. There are several things you must consider in order to produce consistent and reliable data.
Spelling: Check your spelling and typing. This helps you avoid miscategorising data or creating duplicate categories. Should “small” and “smll” really be two different categories?
Duplication: Check for duplication. Have the same data values been added in different ways by the respondents? For example, if your data is from a survey, have you made sure that “car” and “automobile” are not two different categories?
Notes and labels: Use clear notes and labels. Make sure the logic behind your notes is clear to an outsider; people need to be able to understand your data! Will you remember what “exp1” and “exp2” meant when you look at your work in a few months? Will the people you work with?
Blanks: Carefully explain the blanks. You should distinguish between readings that are missing because a respondent refused to answer and readings that are missing because the question did not apply to that respondent. For example, code them as “-1” and “-2” respectively, as long as those values cannot occur as genuine readings, and record what each code means in your notes.
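The spelling, duplication and blanks checks above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample responses, the correction table and the helper names (clean_category, code_blank) are hypothetical, and a real data set would need its own list of known variants.

```python
from collections import Counter

# Hypothetical survey responses; None marks a missing reading.
raw_sizes = ["small", "smll", "Small ", "large", "medium", None, "LARGE"]

# Assumed correction table: known misspellings and synonyms -> one label.
CANONICAL = {"smll": "small", "automobile": "car"}

# Assumed codes for the two kinds of blank. They must never be valid readings.
REFUSED = -1
NOT_APPLICABLE = -2


def clean_category(value):
    """Trim whitespace, lower-case, and map known variants to one label."""
    normalised = value.strip().lower()
    return CANONICAL.get(normalised, normalised)


def code_blank(value, applies):
    """Replace a missing reading with a code that records why it is missing."""
    if value is not None:
        return value
    return REFUSED if applies else NOT_APPLICABLE


# Count categories after cleaning, ignoring missing readings.
cleaned = [clean_category(v) for v in raw_sizes if v is not None]
counts = Counter(cleaned)
print(counts)
```

Counting the cleaned values is a quick way to spot leftover variants: any category with a surprisingly small count is worth a second look.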
Data types
When data is recorded formally in a statistical analysis tool, database or programming tool, you must choose the data type or format for each variable. This choice affects both how the values are stored and how they are displayed.
Typically, you may choose one of the following data types for each variable:
Numeric: This is a common number format that may or may not include decimal places.
Currency: This is numeric data that is specifically in a monetary format.
Scientific notation: This is for very large or very small numbers, where ‘aEb’ means “a multiplied by 10 to the power of b”. For example, 1.5E3 is 1500 and 1.5E-3 is 0.0015.
Date: There are usually various date and time formats to choose from; standards for this vary in different parts of the world.
String: Text format, used for qualitative data that is not categorical and cannot be represented any other way.
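As a rough illustration, here is how the data types listed above might look in Python using only the standard library. The variable names and values are made up for the example, and other tools (spreadsheets, databases, statistics packages) will name and store these types differently.

```python
from datetime import date
from decimal import Decimal

# Numeric: with or without decimal places.
count = 42
height_m = 1.75

# Currency: Decimal avoids binary floating-point rounding in monetary sums.
price = Decimal("19.99")
total = price * 3

# Scientific notation: aEb means a multiplied by 10 to the power of b.
large = float("6.022E23")
tiny = float("1.5E-3")

# Date: one stored value can be displayed in different regional formats.
survey_date = date.fromisoformat("2024-03-05")  # hypothetical date
day_first = survey_date.strftime("%d/%m/%Y")
month_first = survey_date.strftime("%m/%d/%Y")

# String: free text that is not categorical.
comment = "Arrived late because of traffic"

print(total, tiny, day_first, month_first)
```

The date lines show why the stored type and the displayed format are separate choices: the same date prints as 05/03/2024 day-first and 03/05/2024 month-first, so storing it in an unambiguous form such as YYYY-MM-DD avoids misreading.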
Further resources
- Introduction to statistics
- Data and Statistics subject guide
- Research data explained
- Specialist Library Support — workshops and online tutorials