Single Product Storage?

Nicolas Dupont
May 23, 2017 · 4 min read

In our previous post of this series, we discussed the evolution of our product storage, its current state and limitations. We talk here about our R&D approach to define what .

With version 1.7, a customer’s project may be configured to use:

  • MySQL only: with EAV approach for product data and flat tables for other entities
  • MySQL + MongoDB: with product documents in MongoDB and flat tables for other entities in MySQL
  • MySQL + MongoDB + ElasticSearch: with product documents in MongoDB, product index for querying in ElasticSearch and flat tables for other entities in MySQL

In late 2016, we took a break, a deep breath and we wondered if we could .

A unique storage that may be used by both Community and Enterprise Editions through a native packaged implementation.

This storage should obviously support our existing constraints and customers’ use cases. More importantly, using our acquired experience and our better understanding of our market and its evolution, we tried to figure out what the expectations would be in 5 years. The idea is to prepare for the future.

We started a first study listing the eligible technical stacks (i.e. sets of tools) that would allow us to address our expectations. This list also includes our current storage implementations, such as the MySQL + MongoDB + ElasticSearch stack.

We defined a list of criteria to evaluate these possibilities:

  • Adoption: open source, adoptable by the community, by the customers, by the partners, by the team, ease of installation, ease of configuration, availability on the main OS distributions, required dependencies
  • : maturity, documentation, implementation for PHP, flexible storage capabilities
  • : full stack simplicity, implementation simplicity
  • Cost: build cost, total cost of ownership
  • : ability to store a high number of products, reliability on products storage (no data loss), ability to query a high number of products

A few criteria directly eliminated a solution, for instance when a tool in the storage stack is not open source.

The other criteria were rated from 1 to 5 by the team. Each group of criteria had a defined weight from 1 to 5; for instance, the adoption group matters more to us than the cost group.

The outcome of this first study was a complete matrix of eligible solutions.
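To make the scoring concrete, here is a minimal sketch of how such a matrix could be computed. It assumes the global rating is a weighted mean of the per-group ratings, which the study does not spell out; the weights, ratings and stack names below are invented for illustration and are not the values we used.

```python
# Hypothetical group weights (1 to 5); only the group names come from the study.
GROUP_WEIGHTS = {"adoption": 5, "cost": 2}

def global_rating(group_ratings, weights):
    """Weighted mean of per-group ratings, each on a 1 to 5 scale."""
    total_weight = sum(weights[group] for group in group_ratings)
    weighted_sum = sum(rating * weights[group] for group, rating in group_ratings.items())
    return weighted_sum / total_weight

def evaluate(solution, weights):
    """Return None for an eliminated solution, else its rounded global rating."""
    # Eliminating criterion: every tool of the stack must be open source.
    if not solution["open_source"]:
        return None
    return round(global_rating(solution["ratings"], weights), 2)

# Hypothetical candidate stacks with made-up ratings.
stack_a = {"open_source": True, "ratings": {"adoption": 4, "cost": 3}}
stack_b = {"open_source": False, "ratings": {"adoption": 5, "cost": 5}}

matrix = {name: evaluate(stack, GROUP_WEIGHTS)
          for name, stack in {"stack_a": stack_a, "stack_b": stack_b}.items()}
print(matrix)
```

In this sketch an eliminated stack gets no rating at all, rather than a low one, so it can never outrank an eligible stack.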

Our current MySQL EAV implementation gets a global rating of 3.65 / 5. It is very good regarding adoption and tech stack simplicity. On the other hand, it is limited regarding querying performance and the number of supported products.

Our current MySQL / MongoDB / Elastic Search implementation gets a global rating of 3.94 / 5 for almost opposite reasons.

Interesting fact, with their now and related features, traditional RDBMS like .

When we did our first EAV implementation in 2013, storing the product data in a JSON field was not an option because this data could not be queried. This has been possible since MySQL 5.7 and PostgreSQL 9.3.
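As a rough illustration of querying product data stored in a JSON field, here is a self-contained sketch. It uses SQLite's `json_extract` through Python's standard `sqlite3` module purely as a stand-in so that it runs without a server (it assumes your SQLite build includes the JSON functions). MySQL 5.7 offers a similarly named `JSON_EXTRACT` function and PostgreSQL offers the `->` and `->>` operators. The table name, column names and product data are hypothetical.

```python
import json
import sqlite3

# Hypothetical products; attribute codes and values are made up for illustration.
PRODUCTS = [
    ("tshirt-1", {"color": "blue", "size": "M", "price": 19.9}),
    ("tshirt-2", {"color": "red", "size": "L", "price": 24.5}),
    ("mug-1", {"color": "blue", "capacity_ml": 300}),
]

def find_by_attribute(conn, code, value):
    """Return identifiers of products whose JSON document holds code == value."""
    rows = conn.execute(
        "SELECT identifier FROM product "
        "WHERE json_extract(raw_values, '$.' || ?) = ? "
        "ORDER BY identifier",
        (code, value),
    )
    return [identifier for (identifier,) in rows]

conn = sqlite3.connect(":memory:")
# One row per product: all attribute values live in a single JSON column.
conn.execute("CREATE TABLE product (identifier TEXT PRIMARY KEY, raw_values TEXT)")
conn.executemany(
    "INSERT INTO product VALUES (?, ?)",
    [(identifier, json.dumps(values)) for identifier, values in PRODUCTS],
)

blue_products = find_by_attribute(conn, "color", "blue")
print(blue_products)
```

Compared with EAV, each product is a single row and products with different attributes need no schema change; a product lacking the queried attribute simply does not match.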

With this JSON field approach, MySQL and PostgreSQL get ratings of 4.47 / 5 and 4.37 / 5 respectively. The difference mainly comes from our adoption criteria, PostgreSQL still being (sadly) less popular and less used in the PHP ecosystem.

An alternative stack appears very attractive, to leverage the querying possibilities. This stack gets a rating of 4.32 / 5, almost the same as MySQL or PostgreSQL only. However, this option is more balanced: pretty easily adoptable, quite simple, very performant, scalable and future-proof.

We studied around 10 other options that we will not detail here; all were either not as good as these or very unbalanced.

We started to work on a POC on the .

Our mission was to benchmark the storage stack itself, without taking the application layer into account, to validate the following assumptions:

  • we can implement the read / write queries our application currently needs to address business needs
  • we can enable known future features: there is no technological lock-in, and we can unleash innovation with this stack
  • we can handle growing future expectations regarding read / write performance and product data scalability

The first part of this POC was to design a relevant schema for both MySQL tables and ElasticSearch indexes. This schema was challenged from various angles to guarantee support for our current and future expectations regarding read and write accesses.

Another significant topic was to improve our internal tooling and create new tools. These tools allowed us to insert very large volumes of data and to benchmark the storage. Its performance was measured using a matrix of data queries, progressively increasing the data volume and playing with several data axes. These axes aim to represent different catalog typologies, for instance to push the limits on the amount of structured data, or to design a set of very heterogeneous products.
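The shape of such a benchmark can be sketched as follows. This is not our internal tooling: the function names, the axes and the two queries are hypothetical, and a plain in-memory list stands in for the real storage so the sketch stays self-contained.

```python
import random
import time

def generate_products(count, attributes_per_product, attribute_pool, seed=0):
    """Build synthetic products; a larger attribute_pool gives a more heterogeneous catalog."""
    rng = random.Random(seed)
    codes = [f"attr_{i}" for i in range(attribute_pool)]
    products = []
    for index in range(count):
        chosen = rng.sample(codes, attributes_per_product)
        products.append({
            "identifier": f"sku-{index}",
            "values": {code: rng.randrange(100) for code in chosen},
        })
    return products

# Hypothetical read queries, run against the in-memory list standing in for the storage.
QUERIES = {
    "filter_on_value": lambda products: [
        p for p in products if p["values"].get("attr_0", -1) > 50
    ],
    "sort_on_value": lambda products: sorted(
        products, key=lambda p: p["values"].get("attr_1", -1)
    ),
}

def run_benchmark(volumes, attributes_per_product, attribute_pool):
    """Time every query at every data volume; returns {(volume, query_name): seconds}."""
    timings = {}
    for volume in volumes:
        products = generate_products(volume, attributes_per_product, attribute_pool)
        for name, query in QUERIES.items():
            start = time.perf_counter()
            query(products)
            timings[(volume, name)] = time.perf_counter() - start
    return timings

timings = run_benchmark(volumes=[100, 1_000, 10_000],
                        attributes_per_product=5, attribute_pool=20)
for (volume, name), seconds in sorted(timings.items()):
    print(f"{volume:>6} products  {name:<16} {seconds * 1000:.2f} ms")
```

Fixing the random seed keeps the generated catalog identical between runs, so a timing difference reflects the storage or the query rather than the data.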

This realization finally for the storage layer.

Even if this storage is eligible when used directly in a standalone mode, what about our current application using it?

What would be the limitations of our application? Where would be the bottlenecks? What would be the build cost of this implementation? What would be the impacts of such a change for our product and ecosystem?

We’ll answer these open questions in an upcoming post. Stay tuned! 📻

Akeneo Labs

Stories about Akeneo product & engineering experiments
