Because your product catalog typology matters

Nolwenn Poirier
Published in Akeneo Labs
9 min read · Jun 1, 2018

To measure the size of a catalog of products, which axis would you choose?

Lots of people out there love to talk about the number of products. But did you know that it’s not an accurate representation of your catalog size at all?

We conducted a study on this topic and the results are quite revealing! So, let’s dive into what we found. 🚀

Gathering catalog data

First, when you want to study product catalog typology, you need data. Fortunately, we managed to gather a lot of customer catalogs, and in particular information about their volume of products, families, attributes, channels and locales. In the end, we got 106 product catalogs to analyse, across both the Community and Enterprise editions.

A few words about the catalogs we sampled

Here are some numbers that characterise the 106 product catalogs. As you can see, the split between Community and Enterprise edition users is not quite 60/40, with slightly more coming from Community users.

Almost one fifth of them are operating in the Home & Garden industry, and there is a mix of manufacturers and retailers.

The catalog volume axes

If you want to determine which axis is a better representation of your catalog volume, you’ll need to choose several axes to study. Here are the ones we chose:

  • the total number of products: obviously, we couldn’t resist. It’s the one everyone uses, especially when it comes to assessing performance. “Does your PIM work with 1,284,874 products?” I’m pretty sure you either said or heard that at least once in your life. Here is an example of a product.
A product example
  • the total number of attributes: a small definition here, for those who are not used to our PIM. What we call an attribute is the entity in which you can store an elementary piece of information about your product. In the above example, there are seven attributes: the Description, the Collection, the Brand, the Material and the Care instructions, and also the Name of the product and its Picture.
  • the total number of families: a family in Akeneo PIM is used as a template for your products. It defines which set of attributes describes your product best. In the case of our previous parka example, the attribute set is: Name, Picture, Description, Collection, Brand, Material and Care instructions. Let’s call this family “Clothing”. You will need as many families as you have different attribute sets for your products.
Two different families
  • the number of channels: channels are extremely important in Akeneo PIM. A channel corresponds to a point of sale where you want to share the product information you are managing in your PIM. You can have different information for the same attribute depending on the channel, as you can see in the parka examples below.
Product information can differ depending on the channel
  • the number of activated locales: the locales help you store translated product data in several languages inside the PIM, as you can see, again, in our parka example.
Product information can be translated and stored in different languages thanks to the locales

Results

So here we are! 106 product catalogs, 5 volume axes. We crunched the numbers, and things got quite interesting.

A few diagrams

Let’s begin with our first axis, the number of products.

In the diagram below, each blue mark represents one of our sampled customers. The further to the right the mark is, the more products the catalog has.

As you can see, the catalog with the largest number of products in our sample holds more than 3 million products. However, 80% of the customer catalogs in our sample have fewer than 150,000 products.

We can see exactly the same phenomenon when we look at the other axes. All the diagrams look exactly the same. Whatever the axis, there are extreme values in at least one of our customers’ catalogs.

For the number of attributes, the vast majority of the sampled catalogs are similarly clustered at the low end of the diagram. 80% of the sampled catalogs have fewer than 500 attributes, whereas one of our customers uses more than 12,000 attributes. Quite a huge gap, isn’t it?

For the families, we can say that most of our customers don’t need too many of them to model their product catalog: 80% of the sampled catalogs need fewer than 125 families. But as with the other axes, we also have customers who need 10 times more families.

Same thing regarding the channels. 80% of our customers have 3 channels or fewer, but the number can skyrocket up to 47.

For the number of locales, it’s 5 locales or fewer for 80% of our customers, whereas some others need up to 40 locales.

So, what can we learn from these diagrams?

They tell us that, whatever the axis, 80% of our customers have a small count. But the remaining 20% can show some extreme values, which can easily be 10 times bigger than the 80% threshold.
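To put a number on that gap, here is a minimal Python sketch that divides the largest value we mentioned for an axis by its 80% threshold. It only uses the figures quoted in this article; the "more than" figures are treated as lower bounds, and families are left out because we did not quote an exact maximum for them. The names `reported` and `tail_ratio` are ours, chosen for illustration.

```python
# Figures quoted in this article: (80% threshold, largest value mentioned).
# "More than" figures (products, attributes) are used as lower bounds.
reported = {
    "products": (150_000, 3_000_000),
    "attributes": (500, 12_000),
    "channels": (3, 47),
    "locales": (5, 40),
}


def tail_ratio(threshold: float, maximum: float) -> float:
    """How many times bigger the largest catalog is than the 80% threshold."""
    return maximum / threshold


ratios = {axis: tail_ratio(t, m) for axis, (t, m) in reported.items()}

for axis, ratio in ratios.items():
    print(f"{axis}: largest catalog is about {ratio:.0f}x the 80% threshold")
```

With these figures, the ratio ranges from 8x for locales to 24x for attributes.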

In other words, we at Akeneo have to deal with really different volumes when it comes to these axes.

Fine. But the next question that comes to mind is: “Does the customer that has the most products also need the most attributes and/or families?”

Can we say that the more products there are in your catalog, the more attributes you will need? Or the more attributes you need, the more families you will have?

In other words, are there any links between the different axes we analysed above? Can we define typologies of product catalogs?

Trying to define product catalog typologies

The objective is to find correlations between the axes, for example, between the number of products and the number of families, or between the number of products and the number of attributes.

The result is really interesting. It tells us that there is no significant correlation between any of the axes, whatever the combination you choose.

Let’s take a look at an example to illustrate this fact.

Below, you have a graph of the number of products and the number of attributes per customer.

As you can see, there does not seem to be a correlation between the number of products and the number of attributes. One of our customers has more than 1,400,000 products and 12,915 attributes, so we might think that the two magnitudes are related. But let’s take a look at another customer, who has half the number of products but only 169 attributes. So no correlation at all. This was confirmed by some advanced statistical methods that we will not mention here, because we want you to finish reading this article. 😉

This trend can be visually confirmed if we zoom in on our previous graph, on the values that are gathered in the bottom-left corner.

You can see that there is no kind of relationship between the two magnitudes: each point seems to be randomly distributed across the graph. Total chaos, isn’t it?
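If you would like to run this kind of check on your own numbers, here is a minimal Python sketch. It is not the method used in our study: it simply computes two standard measures, the Pearson coefficient (linear relationship) and the Spearman coefficient (rank-based, less sensitive to extreme catalogs). The function names and the sample lists are hypothetical; the sample values are made up for illustration and are not the study's data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Made-up catalogs for illustration only (NOT the study's data).
products = [1_400_000, 700_000, 120_000, 45_000, 8_000, 300_000]
attributes = [12_915, 169, 450, 2_100, 300, 90]

print(f"Pearson:  {pearson(products, attributes):.2f}")
print(f"Spearman: {spearman(products, attributes):.2f}")
```

A coefficient close to 1 or -1 suggests a strong relationship, and a value near 0 suggests none. With a handful of catalogs and one extreme value, the Pearson coefficient can be misleading, which is why the rank-based version is shown next to it.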

So yeah! The important finding here is that it seems we cannot define typologies of product catalogs.

Your product catalog is totally unique. So is its typology and volume.

Having 5,000,000 products does not mean that you have the biggest catalog. Another customer could have 1,000,000 products and still have a bigger catalog than yours, because they may have 10,000 attributes, whereas you only need 200 to describe your products.

Talking only about your number of products is not enough. It’s quite simplistic, and when you do that, you are overlooking all the other axes that can have a pretty big impact on your catalog volume.

Introducing a new axis

When studying all these axes, we realised that there is a simple notion that has existed for quite some time and that takes into account all the volume axes mentioned above.

It’s the number of product values.

But what is a product value?

Your PIM is a storage, a container, right? And what do you store inside your PIM? Product information. So simple. Product value is just another wording for the “product information” notion. It’s the P and I that compose the PIM. 🙂

To make it very simple, let’s take an example. In the Akeneo PIM screenshot below, there are several product values: the value of the name, the picture, the brand, the collection and the description, 5 product values in total.

As you can see, the description in this example is “scopable” and “localisable”, meaning that it has a different value for each channel and each activated locale. So, in fact, there are not really 5 product values. There are more, because the description does not store one single value, but n*m values, n being the number of channels and m being the number of activated locales.

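Knowing this, you can easily estimate the number of product values that are stored inside your PIM. As we are very nice people here at Akeneo, we give you the “magic” formula:

number of product values for one product =
    number of attributes that are neither scopable nor localisable
  + number of locales x number of attributes that are localisable only
  + number of channels x number of attributes that are scopable only
  + number of channels x number of locales x number of attributes that are both scopable and localisable

To get the estimate for your whole catalog, you’ll have to make the calculation for each one of your products.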

Let’s compute it for a fictional catalog, which has:

  • 300,000 products
  • 500 attributes and an average of 100 attributes per family
  • 3 channels and an average of 13 scopable-only attributes per family
  • 5 activated locales and an average of 25 localisable-only attributes per family
  • an average of 13 attributes per family that are both scopable and localisable
  • an average of 49 attributes per family that are neither scopable nor localisable

So here we go, a lot of numbers:

( 49 + 5 x 25 + 3 x 13 + 5 x 3 x 13 ) x 300,000 =
122,400,000 product values

It’s potentially more than 122 million values stored in your PIM. We say “potentially” because some of the product information may be incomplete. 😉
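For those who prefer code to arithmetic, the same estimate can be sketched in a few lines of Python. This is only an illustration of the calculation above, not something that ships with Akeneo PIM: the `FamilyProfile` class and both function names are hypothetical, and the estimate assumes every product uses the average attribute counts and has all its values filled in.

```python
from dataclasses import dataclass


@dataclass
class FamilyProfile:
    """Average attribute counts per family (hypothetical helper type)."""
    plain: int                     # neither scopable nor localisable
    localisable_only: int          # one value per activated locale
    scopable_only: int             # one value per channel
    scopable_and_localisable: int  # one value per channel and per locale


def values_per_product(profile: FamilyProfile, channels: int, locales: int) -> int:
    """Maximum number of product values stored for a single product."""
    return (
        profile.plain
        + locales * profile.localisable_only
        + channels * profile.scopable_only
        + channels * locales * profile.scopable_and_localisable
    )


def estimate_catalog_values(products: int, profile: FamilyProfile,
                            channels: int, locales: int) -> int:
    """Rough catalog estimate: every product is assumed to match the average."""
    return products * values_per_product(profile, channels, locales)


# The fictional catalog described above.
profile = FamilyProfile(plain=49, localisable_only=25,
                        scopable_only=13, scopable_and_localisable=13)
per_product = values_per_product(profile, channels=3, locales=5)
total = estimate_catalog_values(300_000, profile, channels=3, locales=5)

print(per_product)     # 408
print(f"{total:,}")    # 122,400,000
```

For a more precise figure, you could call `values_per_product` with each family's real attribute counts, multiply by the number of products in that family, and sum the results instead of relying on a single average.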

As you can see, the formula involves all the previous axes we talked about earlier: products, attributes, families, channels and locales. We acknowledge that it is not totally perfect as we don’t take into account other axes that could be very important, such as the categories, the associations, the product models… But still, it’s a first step toward a good estimation of your catalog volume.

Time to conclude

Well, we hope that you now have a clearer view of how you can measure your product catalog size. It’s not only about your number of products. It’s much more complex than that: there are a lot of axes to take into account.

In this article, we introduced the number of product values as a proper way to measure your catalog size, and when you think about it, it’s quite logical.

You buy a PIM to centralize and store product information. Each single piece of information that is inside your PIM can help you reduce your time to market. In other words, we can say that each one of your product values is valuable and contributes to the ROI of the PIM. The bigger your catalog is, the more product values you will have, and the higher your ROI might be. 🙂

Nolwenn Poirier
Akeneo Labs

Product Owner @Akeneo, enhancing the DX for our Akeneo community