How to avoid mistakes while calculating the GMV of an international marketplace?

Yurgen Bashkatov
Lalafo
Published Sep 2, 2017

When you work with a marketplace, sooner or later you need to calculate its Gross Merchandise Volume (GMV). It seems simple at first: gather the prices of all active ads in one column and apply SUM. However, if your marketplace operates in several countries, things are not quite so easy.

Why is it hard to calculate GMV?

I work at Lalafo, an AI-powered peer-to-peer marketplace. The platform has 3 million monthly active users across 4 countries. At the end of 2016, we needed to calculate the total monetary value of all ads placed on Lalafo that year.

First, we summed up all the prices set by users, excluding ads that were never placed or were blocked. The resulting GMV was larger than the GDP of one of the countries in the evaluation!

We cleaned up the data and converted the prices using real-time exchange rates. This made the total smaller, but it was still unreasonably large.

The most popular way to cut anomalies out of a messy dataset is to use the interquartile range (the data between the 25th and 75th percentiles). This approach did not solve the problem either: once we applied it, the total volume of user prices became far too low.
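To show why the interquartile range cuts too much, here is a minimal sketch (not the code used at Lalafo). It follows the definition given above, keeping only prices between the 25th and 75th percentiles. The sample prices and the function name `iqr_filter` are made up for illustration.

```python
import statistics

def iqr_filter(prices):
    """Keep only prices between the 25th and 75th percentiles (inclusive)."""
    q1, _median, q3 = statistics.quantiles(prices, n=4, method="inclusive")
    return [p for p in prices if q1 <= p <= q3]

# Hypothetical sample: nine plausible prices and one obvious input error.
prices = [5, 8, 10, 12, 15, 20, 25, 40, 60, 1_000_000]
kept = iqr_filter(prices)

print(sum(prices))  # total including the erroneous price
print(sum(kept))    # total after the interquartile filter
```

In this sample the filter removes the erroneous price, but it also discards every plausible ad outside the middle half, so the remaining total understates the real volume.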

Pay attention to percentiles

A quick brainstorming session gave us the idea that the problem lay in the percentile values: the difference between zero and the first was too dramatic.

A percentile (or a centile) is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations fall. For example, the 20th percentile is the value (or score) below which 20% of the observations may be found. Percentiles may act as linear or nonlinear depending on if there are any errors in the dataset. (Wikipedia)

We decided to review how all the percentiles from 1 to 100 behaved. The data between the 1st and 99th percentiles was linear, while between the 99th and 100th we found a big surge, indicating that the errors were concentrated in that section. We removed all the data between the 99th and 100th percentiles, which resolved the problem.
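The percentile review described above can be sketched roughly as follows. This is an illustration under assumptions, not the original analysis: the data is synthetic, and `percentile_profile` and `trim_top_percent` are hypothetical helper names.

```python
import statistics

def percentile_profile(prices):
    """Return the 1st..99th percentile cut points plus the maximum (100th)."""
    cuts = statistics.quantiles(prices, n=100, method="inclusive")
    return cuts + [max(prices)]

def trim_top_percent(prices):
    """Drop every price above the 99th percentile."""
    p99 = statistics.quantiles(prices, n=100, method="inclusive")[98]
    return [p for p in prices if p <= p99]

# Synthetic data: 995 ordinary prices and 5 wildly mistyped ones.
prices = list(range(1, 996)) + [10**9] * 5

profile = percentile_profile(prices)
# Compare the 98th, 99th and 100th percentiles to spot the surge.
print(profile[97], profile[98], profile[99])

gmv = sum(trim_top_percent(prices))
print(gmv)
```

In this synthetic data the profile grows steadily up to the 99th percentile and then jumps by several orders of magnitude, which is the pattern that points to errors in the top 1%. Note that the cut also removes a few legitimate prices just above the 99th percentile.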

The importance of product category

The Lalafo marketplace contains various product categories. The most “expensive” categories are real estate and vehicles. To get a more precise GMV, we decided to account for product categories when calculating percentiles. For example, the percentile values for vehicles and real estate are dozens of times larger than those for clothing items.

As a result, we:

- determined the percentile values for each category separately.

- applied the percentile cut-off only in cases of non-linear GMV.

This gave us a precise GMV.
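A per-category version might look like the sketch below. It rests on an assumption that is not in the article: "non-linear" is approximated by checking whether a category's maximum price exceeds its 99th percentile by a chosen factor. The `surge_ratio` parameter, the function name and the sample ads are all hypothetical.

```python
import statistics
from collections import defaultdict

def gmv_by_category(ads, surge_ratio=10):
    """Sum prices per category, trimming the top 1% only where it surges.

    `ads` is an iterable of (category, price) pairs. `surge_ratio` is an
    assumed threshold: a category is trimmed only when its maximum price is
    more than `surge_ratio` times its own 99th percentile.
    """
    by_category = defaultdict(list)
    for category, price in ads:
        by_category[category].append(price)

    totals = {}
    for category, prices in by_category.items():
        if len(prices) < 2:
            # Too few ads to compute percentiles; keep them as they are.
            totals[category] = sum(prices)
            continue
        p99 = statistics.quantiles(prices, n=100, method="inclusive")[98]
        if max(prices) > surge_ratio * p99:
            prices = [p for p in prices if p <= p99]
        totals[category] = sum(prices)
    return totals

# Hypothetical ads: clothing contains one mistyped price, vehicles none.
ads = [("clothing", p) for p in range(1, 200)] + [("clothing", 10**7)]
ads += [("vehicles", 1000 * k) for k in range(1, 201)]

totals = gmv_by_category(ads)
print(totals)
```

Because each category uses its own percentiles, expensive but legitimate vehicle ads are not cut just for being far above clothing prices, and categories without a surge are left untouched.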

Another important thing — currency

Users often forget to set the currency when placing their ads. As a result, an iPhone 7 ends up listed for just $20, or a stove costs more than the entire GDP of Colombia!

To correct the currency while keeping the 99th-percentile clean-up of the dataset, we decided to do the following:

- set a reference price for each category equal to the category median (the 50th percentile).

- look through the dataset and adjust the currency wherever a value is far too low or too high compared with that reference price.
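One way to read these two steps is sketched below. It is an interpretation under stated assumptions rather than the method actually used: the exchange rates, the currency list, the `tolerance` factor and the function `fix_currency` are all hypothetical. A price far from the category median is reassigned to whichever candidate currency brings it closest to that median.

```python
import math
import statistics

# Hypothetical exchange rates to USD; real values would come from a rates feed.
RATES_TO_USD = {"USD": 1.0, "KGS": 0.0145, "AZN": 0.59}

def fix_currency(amount, declared, median_usd, tolerance=10.0):
    """Return the currency that best explains `amount` for its category.

    The declared currency is kept when the converted price is within a factor
    of `tolerance` of the category median. Otherwise the candidate currency
    whose converted price is closest to the median (on a log scale) is chosen.
    `amount` must be positive.
    """
    def distance(currency):
        return abs(math.log(amount * RATES_TO_USD[currency] / median_usd))

    if distance(declared) <= math.log(tolerance):
        return declared
    return min(RATES_TO_USD, key=distance)

# Reference price: the median of hypothetical phone prices already in USD.
median_usd = statistics.median([400, 450, 500, 550, 700])

print(fix_currency(450, "USD", median_usd))     # plausible, currency kept
print(fix_currency(35000, "USD", median_usd))   # far too high for USD
```

A price that stays implausible in every candidate currency keeps its declared currency here, so such ads would still need the percentile cut-off or a manual check.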

Calculating GMV. General advice:

- common methods (median, interquartile range) work only when GMV is linear. If they don’t work, check the values of all percentiles.

- calculate percentiles separately for each category.

- filter your data before calculating.

- check the currency values.
