Using Google Sheets to quickly visualize event stats

When Thomas asked me to visualize the audience of the CSSConf.Asia and JSConf.Asia conferences, I was curious and excited to see what the data would show. The results?

As I did this on top of welcoming attendees and helping the conference run smoothly, I did not have time to use advanced tools. We registered attendees using Google Sheets, so I simply added some formulas in between registrations.

I’ve published the template spreadsheet so that everyone can see the formulas used. I’ve even added some explanations of the formulas to make the spreadsheet easier to read. Enjoy.

“Conditional formatting” with a “custom formula” for attendee registration

The conference tickets were sold through EventNook (developed by Kyaw Lin Oo from Blk71, Singapore), which generated a simple spreadsheet of attendees. Several laptops were used at the registration desk to mark attendees as having registered, with the data being collected on a Google spreadsheet.

Rows of registered attendees are colored to provide a simple visual clue.

To simplify this process (and avoid manual coloring of cells — which drives me nuts), the rows are colored using “conditional formatting”:

Google Sheets conditional formatting tool

The ‘Customer formula’ option available in Google Sheet allows you to format a cell based on the content of another cell. That’s pretty cool.

The range of “1:284” indicates that the whole sheet (which has 284 rows in this case) will be parsed according to the conditional formatting formula.

To register an attendee ‘YES’ was typed in column A so the custom formula checks this column. The ‘$’ sign ‘locks’ the column so that each cell can be matched against the content of the cell in column A of the row being considered, applying the style for the whole row.

The test sheet ConditionalFormatting lets you observe how the same formatting works when the ‘$’ is applied to individual cells instead of whole rows.

UNIQUE, INDIRECT, COUNTIFS and data validation to explore data

Pivot tables could be used, but I don’t like the way they behave when you want to chart them (because the resulting size of the table is unknown). So, I created a Stats sheet that lists the different data fields available and quantifies them.

Exploring data with the spreadsheet

The first section of the sheet lists the available columns (in column B) using the neat TRANSPOSE formula.

Column A then uses a MATCH to get the column number for each of these.

From here, we can generate a selector of any of these column through a ‘data validation’ menu using the transposed list of column headers — This is displayed cell I4 (the dark green cell). The selector works by obtaining the matching column number from our previous MATCH.

The selected value is then matched in D8 using the INDEX function of the A column and obtaining the row number from a MATCH of the selector

INDEX and MATCH, for those of you who want to go beyond VLOOKUP.

The next step is to extract the possible values of the column considered. That’s when UNIQUE comes in handy, and it does wonders combined with INDIRECT.

  • UNIQUE extracts the unique values of a list.
  • INDIRECT allows you to reference the list through variables. In the screenshot above, D7 represents the sheet name, D9 the first row of data and D8 the column number. This means that we are referring to, for example, ‘Orders’!R8C5:C5 . The ‘ (single quotation sign) is used to make the formula fail-safe in the case of sheet names with spaces. As we used RC naming, instead of the usual A1, the INDIRECT function’s second variable is ‘false’. Without the RC notation, ‘R8C5:C5’ would become ‘E8:E’ but numbers makes it easier to reference as variables based on the selected column.

Using the same INDIRECT reference with a COUNTIFS calculates the rate of occurrence of each of the unique values.

With just 1 condition for the COUNTIFS, a COUNTIF would have worked as well. By habit and to avoid adding the ‘S’ when adding another condition, COUNTIFS is my formula of choice vs COUNTIF.

The last step is to sort the possible values by occurrence, which is done through a -get ready for the function name - SORT of the resulting table in another table.

On the left: table generated with UNIQUE values, and COUNTIFS of the occurrence of each value. On the right: the table is sorted in descending order (the third parameter of SORT being ‘false’). The rate of occurrence is added again, for easy chart generation.

From this, we now have a sheet in which we can select the column to analyse and see all the possible values for it as well as the most popular ones, both as text and as a chart.

Give it a try in Stats sheet.

Mapping nationalities and addresses

Google Sheets comes with native mapping ability, as long as information can be analysed as geographical data.

Maps based on two letter country codes and COUNTIF of attendees.

This means that creating a map of the nationalities of attendees is just a click away.

Determining the country that people travelled from is slightly more time-consuming as we need to extract the country from the address of each attendee. Thankfully, Google Sheets supports regex functions

The address string format, which lists street name, city, state, country and postcode, can be parsed with regular expressions extractions thanks to REGEXEXTRACT functions

The first regex, ‘[a-zA-Z 0–9()\-/]*$’ extracts all letters, numbers, spaces, brackets (parentheses), dashes (which we escape with ‘\’ in the regex) and forward slashes (‘/’). By placing all these characters in a bracket followed by ‘*’ they can be repeated an unknown number of times. The ‘$’ means these characters should be at the end of the string. The result is an extract of the end of the string that terminates when a character that is not listed appears, in this case a comma (‘,’). Hence, this first REGEXEXTRACT extracts the country string and postcode for us.

The second regex (‘[a-zA-Z ]*’) only extracts letters and spaces, excluding postcodes (which are numbers within brackets, not letters obviously).

Once the distance of each country from Singapore is added, along with a COUNTIFS formula and a calculation of the distance travelled, we get a map of the countries that attendees reside in and an estimate of the total distance travelled by all attendees (for the conference attendees, we figured it was 1,200,000km, a distance equivalent to 30 trips around the globe).

Parsing comma-separated strings to analyse the favored programming languages of attendees

As a developers’ conference, languages practised by attendees is of the utmost importance. Luckily, the preferred programming languages of attendees was listed on the registration forms. Unfortunately, one attendee can favor several languages, and the data is thus saved in a comma-separated string, not allowing simple counting of the column’s value.

That’s when the data manipulation formulas of Google Sheets come in handy.

In the filtered language tab, we extract the columns of interest, using ‘{ … }’ to create a single formula, extracting columns E, X and Y. As I wanted to refine the results according to conference (CSS and/or JS), I used a FILTER with a regular expression.

The ‘{}’ generates an array in Google Sheet

Let’s focus on the ‘{ … }’ notation for now. It’s a simple array constructor with a ‘,’ (comma) used to add a column, and a ‘;’ (semicolon) used to add a row. The only constraint is to have each cell of the array must be filled (ie. each row and column should have the same number of values than other rows or columns). An example of an array with 2 rows and 3 columns is in arrays.

Cells filled by an array constructor formula.

Once we have the list of languages, we need to parse it, extracting each value from the comma-separated strings. In order to manage everything using a single formula, I first joined the strings in one.

The different steps are detailed in data parsing which shows the initial list becoming a long string (JOIN), then a horizontal series of values (SPLIT) and then a vertical series of values (TRANSPOSE) before finally representing a list of unique values, which can then be counted (using values from the full vertical series of values, not the initial comma-separated list).

The beauty of how Google Sheets manipulates data.

For the actual calculation spreadsheet, all of these formulas are combined into one (using an INDIRECT reference for flexibility).

Check out the Piktochart infographic and the spreadsheet.

Show your support

Clapping shows how much you appreciated Florian Cornu’s story.