Understanding your data

Specialist Library Support
2 min read · Jul 26, 2019

In this post you will learn about data types and best practices for structuring your data.


Introduction

When you have a data set, it is important to give it a logical structure so that the data can be easily interpreted and accurately represented. In this post we will explore best practices for organising and using your data.

Structuring your data

Data that is easy to understand is easier to analyse. There are several things you must consider in order to produce consistent and reliable data.

Spelling: Check your spelling and typing. This helps you avoid miscategorising data or creating duplicate categories. Should “small” and “smll” really be two different categories?

Duplication: Check for duplication. Have respondents entered the same value in different ways? For example, if your data comes from a survey, make sure that “car” and “automobile” are not treated as two different categories.

Notes and labels: Use clear notes and labels. Make sure the logic behind your notes is clear to an outsider; people need to be able to understand your data! Will you remember what “exp1” and “exp2” meant when you look at your work in a few months? What about other people who you work with?

Blanks: Carefully explain the blanks. You should distinguish between readings that are missing because a respondent refused to answer and readings that are missing because the question did not apply to that respondent. For example, code refusals as “-1” and questions that did not apply as “-2”, and only use codes that could never be mistaken for a real reading.
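As a rough illustration, the checks above could be sketched in Python along the following lines. The category list, the synonym table and the function names are made up for this example, and the similarity cutoff of 0.8 is a judgement call rather than a recommended value; treat it as a starting point, not a finished cleaning routine.

```python
from difflib import get_close_matches

# Hypothetical lookup tables for illustration; build your own from your data.
CANONICAL = ["small", "medium", "large", "car"]
SYNONYMS = {"automobile": "car"}

# Codes from the example above: -1 = refused to answer, -2 = did not apply.
REFUSED = -1
NOT_APPLICABLE = -2


def clean_category(raw):
    """Map a raw label to a canonical category, or return None if unknown."""
    label = raw.strip().lower()
    # Duplication: fold known synonyms into a single category.
    label = SYNONYMS.get(label, label)
    if label in CANONICAL:
        return label
    # Spelling: a fuzzy match catches typos such as "smll".
    matches = get_close_matches(label, CANONICAL, n=1, cutoff=0.8)
    return matches[0] if matches else None


def valid_readings(values):
    """Drop the missing-value codes before calculating any statistics."""
    return [v for v in values if v not in (REFUSED, NOT_APPLICABLE)]


print(clean_category("smll"))
print(clean_category(" Automobile "))
print(valid_readings([3, -1, 5, -2, 4]))
```

Labels that cannot be matched come back as None so that they can be reviewed by hand instead of being silently assigned to a category.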

Data types

When data is recorded formally in a statistical analysis tool, database or programming tool, you must choose the data type or format for each variable. This affects both how it is stored and how it is displayed.


Typically, you may choose one of the following data types for each variable:

Numeric: This is a common number format that may or may not include decimal places.

Currency: This is numeric data that is specifically in a monetary format.

Scientific notation: This is for very large or very small numbers, where “aEb” means “a multiplied by 10 to the power of b”. For example, 1.5E3 is 1,500.

Date: There are usually various date and time formats to choose from; standards for this vary in different parts of the world.

String: Text format, used for qualitative data that is not categorical and cannot be represented any other way.
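To make these types concrete, here is a minimal Python sketch that reads the same kind of raw text into each type. The variable names and sample values are invented for illustration, and other tools (spreadsheets, statistical packages, databases) have their own equivalents.

```python
from datetime import date, datetime
from decimal import Decimal

# Numeric: with or without decimal places.
count = int("42")
height = float("1.75")

# Currency: Decimal avoids the rounding surprises of binary floats.
price = Decimal("19.99") + Decimal("0.01")

# Scientific notation: "1.5E3" means 1.5 multiplied by 10 to the power of 3.
large = float("1.5E3")

# Date: the same text can mean different days depending on the convention.
raw = "03/04/2019"
day_first = datetime.strptime(raw, "%d/%m/%Y").date()    # 3 April 2019
month_first = datetime.strptime(raw, "%m/%d/%Y").date()  # 4 March 2019

# The ISO 8601 form (year-month-day) avoids that ambiguity.
unambiguous = date.fromisoformat("2019-04-03")

# String: free text is stored as it was entered.
comment = "Arrived late because of traffic"

print(count, height, price, large, day_first, month_first, unambiguous)
```

The date lines show why the chosen format matters: the same text is read as two different days until you state which convention applies.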
