Let’s start !

Case study:

We want to know the number of each categories and the condition is that the wording of “description” and “category”.

Jest take a look at below code. Explain later.

We can find out where description like ‘…..trash….’

LIKE and ILIKE queries will help you find relevant descriptions and categories.

% matches 0 or more characters.

Difference between this two is ILIKE don’t need to consider character is upper or lower.

Function for character data type:

change to lower and upper case
Trim function can remove the spaces out, LTRIM only trim the left spaces
Can also trim other characters, case matter!
left and right use to select how many character in a string you want from left or right
Split the string by delimiter and can select which part you want
  • Case study:

Select the first 50 characters of description when description starts with the word "I".

The description column of evanston311 can be very long. You can get the length of a string with the length() function.

For displaying or quickly reviewing the data, you might want to only display the first few characters. You can use the left() function to get a specified number of characters at the start of each value.

To indicate that more data is available, concatenate '...' to the end of any shortened description. To do this, you can use a CASE WHEN statement to add '...' only when the string length is greater than 50.

So I case when to set up condition I want to get is length more than 50 characters, then combine left 50 character with ‘…’, then the result as below.

Case study:

There are almost 150 distinct values of evanston311.category. But some of these categories are similar, with the form "Main Category - Details". We can get a better sense of what requests are common if we aggregate by the main category.

To do this, create a temporary table recode mapping distinct category values to new, standardized values. Make the standardized values the part of the category before a dash ('-'). Extract this value with the split_part() function:

split_part(string text, delimiter text, field int)

also need to do some additional cleanup of a few cases that don’t fit this pattern.

Then the evanston311 table can be joined to recode to group requests by the new standardized category values.

Case study:

Determine whether medium and high priority requests in the evanston311 data are more likely to contain requesters' contact information: an email address or phone number.

  • Emails contain an @.
  • Phone numbers have the pattern of three characters, dash, three characters, dash, four characters. For example: 555–555–1212.

Use LIKE to match these patterns. Remember % matches any number of characters (even 0), and _ matches a single character. Enclosing a pattern in % (i.e. before and after your pattern) allows you to locate it within other text.

For example, '%___.com%'would allow you to search for a reference to a website with the top-level domain '.com' and at least three characters preceding it.

Create and store indicator variables for email and phone in a temporary table. LIKE produces True or False as a result, but casting a boolean (True or False) as an integer converts True to 1 and False to 0. This makes the values easier to summarize later.

If you enjoy this article please click the thump up (Keep pressing can send multiple thump up!) and follow me, I’ll keep updating content about data science! Hope you like it.

Reference:

Christina Maimone: EXPLO RATO RY DATA ANALYSIS IN SQL, datacamp

--

--

Shawn

Self taught — Data Analyst | Business Intelligence Specialist | Business Analyst | Data scientist