Coda and undouble your (email) list

Published in

Nerd For Tech

8 min readMar 14, 2021

Keeping data aligned is a challenge. Partly because data changes (people move, change phone numbers, use different email addresses and so on) and partly because data lives in different systems throughout the organisation. This blog is about how to bring data together and to deduplicate using a button outside a table.

Create a master table

Often I am called in for a job that should have been ready yesterday and is related to data collection. To move on fast, I create a new table in a doc and link it to the form I put online to harvest the data. When the project is done, I have my own table with end user data, while other data lives in different systems elsewhere in the organisation. It is not yet aligned. If this data is delivered to me, it is mostly in spreadsheet format (mainly Excel). To keep track , I import the data in seperate tables (each page has its own table).

AddRow

The first step is to push all the data in one single table. I use the function AddRow() in a button to complete the master table. Afterwards I hide the pages with the source tables. In this master table I have information like firstname, last name, birthday, gender, salutation and so on. More often than not, end-users appear more than once.

Filter on Doubles in the Master Table

When it comes to addresses (physical and digital) most likely the latest version is the one still in use. However often we are left in the dark and we have no date related information. What we can do is already filtering out doubles. The example below is based on an answer I formulated in the Coda Community

The count:
thisTable.Filter(email.Contains(thisRow.email)).Count()The first:
thisTable.Filter(email.Contains(thisRow.email)).First()

I show the count to have a visual understanding of the issue. In case a value appears really a lot compared to other values, it may teach you something. Normal data sets ‘behave’ normally is a rule of thumb and if we notice rows that differ really a lot, we may want to look into these values. What if 10 different people all use the same general email address?

I used the formula with First() to show which value appears first in the table (so with the lowest RowId). This is relevant only if the order of the data is related to date related info. One could also check for Last() . However in case your source table has no relevante date reference, it is not of very much value, Last() or First() either works.

The final step is to set a checkbox active on the rows we want to keep. We elaborate on the formula with First() and we add Contains(thisRow) . I once shared that you have to read functions from right to left. In this case we read that of this row, we take the first result of any email we may find on this row. As I wrote in my blog over Coda and lists, what a filter does is evaluating all items in a list by default. The formula below stipulates that when you take the list of all email adresses, you check for the email address on this row (thus each row because the formula runs over every row, row after row) and then you filter out the first email in case you find one you came already across (if the count is more than 1).

thisTable.Filter(email.Contains(thisRow.email)).First().Contains(thisRow)

For a rather simple task as deduplicating, this is a complex formula. Over time Coda might come up with a formula that executes this formula with a function as DeDupelicate() Till so far, we use this beauty.

The Coda user asked afterwards for a bit more support. Not only how to delete the doubles (thus the onchecked rows via a button and the action DeleteRows()) but also to take away the hyperlink that sometimes appears when you enter an email address. Before we continue, the simple delete function in the button is this one for the table Duplicates. We filter on unchecked boxes (“False”).

DeleteRows(Filter(Duplicates, [to keep]="false"))

How to use a button to Modify all rows?

The tricky thing with a button outside the table is that we no longer can use thisRow() for obvious reasons. The formula we developed above to check for duplicates we can thus as such not write in a button that lives on the canvas. However we can still access each row, this time not as ThisRow(), but as currentValue. Below we show how this is done. I borrowed the insight from Joostmineur who showed me how to put CurrentValue at work in the simplest possible manner. My initial and working solution was a bit more complicated. I used a virtual rowindex via Sequence() however that is not needed as we’ll see below. I put in bold the two important conceptual steps.

Duplicates.FormulaMap(CurrentValue.ModifyRows(email,CurrentValue.email.ToText().Trim()))

The first observation is that we use CurrentValue in a specific way and that is as a reference to each item in the list we evaluate via FormulaMap() This list however refers to the complete table. Happily for us, every table has a display column and this is the list we talk about when we apply CurrentValue to the complete table. The display column has a status aparte compared to all other columns. It shows up in other tables when you reference to — in our case — the table Duplicates. As the image of the table shows, the display column is the RowIndex. We thus have a list that contains all items of the Column RowIndex and the CurrentValue refers to any position in this list, while the FormulaMap() makes sure we run over all of them, the first row, the second row, the third and so on until the formula has been applied to all rows.

The second observation is that the value we look for to modify is the CurrentValue.email. This is the value in each row in the column email, item after item is subjected to the manipulation we defined (turn into text and trim) via FormulaMap. This approach enables you to modify row by row, without filling out each row with all the (altered) list values.

In sum

The CurrentValue before ModifyRows refers to any item in the display column, the FormulaMap works as a result on each row (as an item in the list) and we alter the value of each item in the list emails by writing CurrentValue.email .

I consider this as an elegant solution and I wondered why do we need a column like ‘to keep’ if we can have the same logic applied via a button?

Delete the doubles and keep the first ones

The formula for the keep formula is this one:

thisTable.Filter(email.Contains(thisRow.email)).First().Contains(thisRow)

Our first step is to get the same result as the ‘to keep’ formula with a formula in the button. With the above in mind, we rewrite this function in the button in two steps. Step one that we put all the checkboxes active via a button:

The formula is a variation on the previous one and reads as follows:

Deduplicate.FormulaMap(currentvalue.ModifyRows(Deduplicate.Checkbox,True()))

Actually, to get this result (all rows checked), you do not need this complexity. We need this logic as part of a larger concept in which we have to replace the always True() with a True() in case the email is the first in the list of all email addresses. We adapt the above formula a bit and it results in:

Deduplicate.FormulaMap(currentvalue.ModifyRows(Deduplicate.Checkbox,Deduplicate.Filter(email.Contains(CurrentValue.email)).First().Contains(CurrentValue)))

The above checks the checkboxes in the same way as via the formula behind the column ‘to keep’. However our focus is to delete duplicates and not (only) to modify a row.

The delete function

By now we understand how to construct the argument properly. We call for the Delete function and add the previously developed filter. Leaving the formula as it is means that the unchecked rows remain, however we need the inverse. That is why we put Not() at the end of the function. As such unique emails remain.

DeleteRows(Filter(Deduplicate, Deduplicate.Filter(email.Contains(CurrentValue.email)).First().Contains(CurrentValue).Not()))

Runactions()

What is left to do is to combine the ModifyRow() function with the DeleteRows(). The put the two actions together via RunActions() and the result looks as below:

And this is the function that does the job for us:

RunActions(DeleteRows(Filter(Deduplicate, Deduplicate.Filter(email.Contains(CurrentValue.email)).First().Contains(CurrentValue).Not())),Deduplicate.FormulaMap(CurrentValue.ModifyRows(email,CurrentValue.email.ToText().Trim()))
)

What it does is first filtering on doubles and keeping the unique values that came in first. Second it puts the remaining emails as text fields and it trims (if necessary) white space away.

We are done. One button on the canvas does the job!

Why this complexity?

Some readers might wonder why all the hassle of creating complex formulas outside a table to direct the data in a table while this can be easily done with extra columns and buttons in the table. The answer is twofold. First it shows how the formula logic works. How we can evaluate each item in a list without using the thisRow logic so typical for the many formulas that evaluate a list towards its own values. Second and of great importance when you deal with larger data sets and thus tables that contain maybe 10.000 rows or more, you do not want buttons in these tables for they slow down the application, but you still need their functionality to modify and delete rows. With the above you have the best of two worlds: execution of the job and speed.

I hope you enjoyed this article. If you do not follow, feel free to reach out. Though this article is for free, my work (including advice) won’t be.

My name is Christiaan Huizer and I am the owner of Huizer Automation. A company specialised in serving SME in harvesting data and keeping it aligned. I mainly rely on AirTable, Coda, Mailjet, Zapier & Paperform to get the job done. I live in Ghent and serve the BeNelux market.