Cleaning Data with SQL — TRIM Your White Space

Leah Erb
Women Data Greenhorns
4 min readJul 21, 2018

A frequent problem with importing data into a database is unwanted white space at the beginning or end of strings.

Leading and trailing white space can cause issues when you try to identify duplicate records, attempt to JOIN tables on string columns, or simply query for a particular value.

Fortunately, the SQL TRIM function removes that pesky leading and trailing white space from strings.

What exactly is ‘white space’?

White space is a character in a string that represents horizontal or vertical space in typography. In other words: tabs, line feeds, carriage returns and, yes, spaces.

A white space has character value, just like ‘A’, ‘B’ and ‘C’ have a value. We just can’t always see white space when printed.

And just like any other character, a white space has a decimal value in the ASCII table. This little fact comes in quite handy when you are trying to identify a white space character using a SQL command.

The TRIM Function

Safari Books online has a nice snippet on the TRIM command:

ANSI SQL Standard Syntax

TRIM( [ [{LEADING | TRAILING | BOTH}] [removal_char] FROM ]
target_string
[COLLATE collation_name] )

The default for the first parameter, LEADING | TRAILING | BOTH , is ‘BOTH’.

The default for [removal_char] is a space char(32) .

In other words, if you specify only the target_string (the only required parameter), all leading and trailing spaces will be trimmed.

Example

Accept defaults to trim BOTH leading and trailing space characters:

SELECT '  Hello World  ', 
TRIM(' Hello World ');
+-----------------+-------------------------+
| Hello World | TRIM(' Hello World ') |
+-----------------+-------------------------+
| Hello World | Hello World |
+-----------------+-------------------------+

Note: The TRIM function for SQL Server versions 2016 and lower do not have arguments specifying the character(s) to TRIM from a string. They only TRIM leading/trailing space characters.

Example

Remove only LEADING asterisk (*) characters. Notice, even though only one ‘*’ is specified as the [removal_char] parameter, TRIM removes all leading asterisks (*), regardless of how many there are.

SELECT '*** Hello World **',
TRIM(LEADING '*' FROM '*** Hello World **');
+--------------------+---------------------------------------------+
| *** Hello World ** | TRIM(LEADING '*' FROM '*** Hello World **') |
+--------------------+---------------------------------------------+
| *** Hello World ** | Hello World ** |
+--------------------+---------------------------------------------+

An example white space scenario

Imagine you want to convert your toy shop inventory records from spreadsheets to an RDBMS database. You successfully load a database table from the spread sheet.

You verify the load by comparing the original spreadsheet with new the database table. The results looks really good.

Now you want to find how many toy ducks have ever been received, so you query the database:

SELECT toy_name, COUNT(*)
FROM toys
WHERE toy_name = 'duck'
GROUP BY toy_name;
Empty set (0.00 sec)

Wait, what?! That can’t be right, no records are returned? You’re looking right at the TOYS table in the database, you can see the 11 records with toy_name = ‘duck’.

So you run this handy little SQL statement to make white space visible:

SELECT id, toy_name, 
REPLACE(
REPLACE(
REPLACE(
REPLACE(toy_name, char(9), '{TAB}'),
char(10), '{LF}'),
char(13), '{CR}'),
char(32), '{SPACE}')
AS toy_whitespace
FROM toys;

For more on the REPLACE function, click here.

Ah ha! Now you can see all the toy_names have a leading space, and some of them have other trailing white space characters.

You run a series of UPDATE statements using the TRIM function for leading and trailing white space characters: tabs char(9), line-feeds char(10), carriage-returns char(13), and spaces char(32).

UPDATE toys SET toy_name = TRIM(char(9) FROM toy_name);
UPDATE toys SET toy_name = TRIM(char(10) FROM toy_name);
UPDATE toys SET toy_name = TRIM(char(13) FROM toy_name);
UPDATE toys SET toy_name = TRIM(char(32) FROM toy_name);

You run the SELECT again to makes white space visible and, phew, they’re all gone:

Now you can count your ducks with confidence:

SELECT toy_name, COUNT(*)
FROM toys
WHERE toy_name = 'duck'
GROUP BY toy_name;

White space is a common problem when loading data into a database from various sources. That’s why removing leading and trailing white space is considered common practice, and one of the first steps taken when cleaning data.

Happy cleaning.

--

--