Vestiaire Connected - Medium

Optimizing E2E Test Suites for Reliable Monolith Deployments at Vestiaire Collective

Clement Sehan — Tue, 21 Jan 2025 16:29:23 GMT

Introduction

In the fast-paced world of fashion and tech, keeping critical applications reliable is paramount. At Vestiaire Collective, our monolith application underpins essential features such as Login, Listing Form, and Checkout, which are vital to our business operations. With numerous engineers contributing to this repository, every deployment puts the reliability of the entire system at risk. To mitigate these risks and ship smaller sets of changes more often, we embarked on a journey to optimize our end-to-end (E2E) test suites.

The Initial Shift to Cypress

Our first step in this optimization journey was migrating our E2E tests to Cypress. Previously, we were using another tool that lacked integration with GitLab for running test suites in CI/CD. This limitation made us depend on our QA team to manually trigger the test suites and give the go-ahead for deployments. As a result, with teams spread across several time zones, deployments were often delayed. We moved to Cypress for three reasons: to integrate tests seamlessly into our continuous integration (CI) pipeline, to extend ownership of the test suites beyond the QA team to all engineers, and ultimately to adopt a shift-left development cycle built on CI/CD (continuous integration and continuous delivery). Cypress offered us robust testing capabilities, easy CI pipeline integration, and a shared responsibility model. However, this migration unveiled new challenges.

Challenges: Execution Time and False Positives

The first challenge in migrating to Cypress was that our QA team was not familiar with Cypress or JavaScript. This required effort in training, establishing best practices, and reviewing merge requests. However, our QA team at VC quickly adapted to these changes, demonstrating remarkable dedication and agility. In just 6–9 months, they migrated over 700 test cases from the old solution to Cypress, showing how effectively they can adopt new technologies.

Then, upon migrating to Cypress, we encountered two significant issues: extended execution times and numerous false-positive test failures due to test flakiness. The execution time for running regression tests before deployment expanded to around 45 minutes, which was unsustainable. Moreover, the flakiness of tests — leading to false positives — raised doubt about the reliability of our CI pipeline. Interestingly, we found that these two issues were intricately linked.

With an average of 11 regression runs per week, reducing execution time from 45 minutes to 20 minutes saved us more than 4.5 hours per week on the monolith deployment process.
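As a quick check of that figure, the arithmetic can be sketched as follows. The three constants are the numbers quoted above; the function and variable names are only illustrative.

```python
# Figures quoted in the article: 11 regression runs per week,
# 45 minutes per run before the optimization, 20 minutes after.
RUNS_PER_WEEK = 11
MINUTES_BEFORE = 45
MINUTES_AFTER = 20


def weekly_minutes_saved(runs: int, before: int, after: int) -> int:
    """Minutes saved per week when every run gets faster by (before - after)."""
    return runs * (before - after)


saved_minutes = weekly_minutes_saved(RUNS_PER_WEEK, MINUTES_BEFORE, MINUTES_AFTER)
saved_hours = saved_minutes / 60
print(f"{saved_minutes} minutes per week, about {saved_hours:.1f} hours")
```

Each run saves 25 minutes, so 11 runs save 275 minutes, which is slightly more than 4.5 hours.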

Addressing Execution Time Through Dynamic Waits

During our migration to Cypress, the primary focus was on decommissioning the old solution as quickly as possible, so code optimization wasn’t a priority at that time. As a result, one of the main reasons for our prolonged test execution times was the excessive use of static cy.wait() commands within our test code. A simple regex search in the codebase, cy\.wait\(\d+\), revealed over 600 occurrences, adding up to more than 2 million milliseconds (roughly 34 minutes) of unnecessary waiting time. This resulted in over 50% of our test execution time being spent on static waits. We identified two key strategies to address this:

1. Replacing Static Waits with Dynamic Assertions: By replacing cy.wait() with assertions such as should('be.visible'), we could leverage Cypress’s built-in waiting mechanisms. Cypress inherently waits for elements to be in a ready state before continuing. This change helped in making our waiting time dynamic, effectively reducing waste and minimizing test flakiness.

2. Using API Synchronization: In many scenarios, the waiting time involves waiting for specific API call responses. In those cases, we implemented the usage of cy.intercept(). This allowed us to synchronize our tests with API call completions, ensuring the application state was ready before proceeding with subsequent test steps.

Before:

myAppPage.submitAction.click();
cy.wait(3000);
myAppPage.confirmationButton.click();

After:

cy.intercept('POST', '**/endpoint/*/action*').as('actionAlias');
myAppPage.submitAction.click();
cy.wait('@actionAlias').then(interception => {
  // Continue only if the API confirmed the action
  if (interception.response.statusCode === 201) {
    myAppPage.confirmationButton.click();
  } else {
    throw new Error(`Action failed, status code: ${interception.response.statusCode}`);
  }
});

In this example, we are replacing a 3-second static wait with a dynamic wait for a specific API response. If the API takes less than 3 seconds to respond (which is very likely), the test continues immediately and runs faster. If the API takes more than 3 seconds, the test still waits for the response (up to the timeout set in the Cypress configuration), preventing potential flakiness.

You might have noticed in this case there is no usage of an assertion such as .should('be.visible'). It is not necessary because .click() is an “action command” in Cypress, which automatically checks the current state of the DOM and takes steps to ensure the element is “ready” for the action. For more details, refer to the Cypress documentation on interacting with elements.

These improvements helped make our tests more resilient and efficient by adjusting waiting times dynamically, thus addressing flakiness caused by variations in application loading times.
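The audit of static waits described at the start of this section can be reproduced with a small script. The sketch below is an illustration in Python rather than the tooling we actually used: it applies the same pattern, cy\.wait\(\d+\), to a piece of source text and sums the wait durations. The sample spec content and the function name are hypothetical.

```python
import re

# Matches static waits such as cy.wait(3000),
# but not alias waits such as cy.wait('@actionAlias').
STATIC_WAIT = re.compile(r"cy\.wait\((\d+)\)")


def audit_static_waits(source: str) -> tuple[int, int]:
    """Return (number of static waits, total milliseconds waited)."""
    durations = [int(ms) for ms in STATIC_WAIT.findall(source)]
    return len(durations), sum(durations)


# Hypothetical spec content, for illustration only.
sample_spec = """
myAppPage.submitAction.click();
cy.wait(3000);
cy.wait('@actionAlias');
cy.wait(1500);
"""

count, total_ms = audit_static_waits(sample_spec)
print(f"{count} static waits, {total_ms} ms in total")
```

Running the same function over every spec file in a repository and adding up the results gives the kind of totals quoted above.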

It’s also important to highlight the outstanding work of the Vestiaire Collective frontend engineering team, who successfully migrated the entire web application to React. This transition significantly enhanced the application’s overall performance and stability, showcasing their technical expertise and dedication to continuous improvement.

Reducing Flakiness and Enhancing Test Efficiency with APIs

To further optimize our test suites, we focused on reducing unnecessary UI interactions. UI interactions are inherently slower and more prone to flakiness. For example, to test the checkout feature, we previously needed to perform several UI interactions — log in, access a product page, add an item to the cart — before reaching the checkout. While these interactions are critical for dedicated testing of individual features, for the checkout test itself, we could achieve the same state using API calls.

Example of Cypress custom command to handle add to cart by API:

Cypress.Commands.add('addToCart', (itemIds, apiUrl, headersKey) => {
  cy.getCookie(headersKey).then(cookieValue => {
    const headers = {
      Authorization: `Bearer ${cookieValue.value}`
    };

    itemIds.forEach(itemId => {
      cy.request({
        url: `${apiUrl}/entities/current/items`,
        method: 'POST',
        headers,
        body: {
          itemId: itemId,
        },
        failOnStatusCode: false,
      }).then(response => {
        expect(response.status).to.eq(201);
      });
    });
  });
});

If we apply the same approach to the other steps, a test case can look like this:

it('checks payment method availability in checkout', () => {
  cy.login(email, password);
  cy.setCookie(cookieValues);
  cy.cleanCart().addToCart([productId]);
  cy.visit('/checkout');
[...]
});

In this example, you might have noticed that the test is cleaning the cart with cy.cleanCart() before performing the addToCart([productId]) command. By cleaning the cart at the start, we ensure that each test begins with a clean slate, free from residual data that might distort results. This strategy aligns with a best practice in testing known as “state reset”, which emphasizes the importance of initializing or resetting data before executing test operations rather than cleaning up after the test.

Starting with a consistent and controlled state not only prevents unwanted test dependencies but also enhances test reliability by eliminating side effects caused by leftover data. Consequently, this ensures that our tests are both repeatable and dependable, providing accurate insights into the application’s behaviour.

To sum up, by performing API calls to log in and add items to the cart, we were able to bypass several layers of UI interactions and directly access the checkout page using cy.visit(). By applying this API-driven approach to several complex pathways within our product, particularly those occurring post-purchase, we reduced the execution time of some tests by over 60% and significantly diminished test flakiness.

In May 2024, spec files’ median duration was 38 seconds and the slowest spec file had a median duration above 13 minutes.

In Nov 2024, spec files’ median duration was 13 seconds and the slowest spec file had a median duration of 4 minutes. In 6 months we’ve been able to decrease the spec file median duration by 66% and the slowest spec file duration dropped by 70%.

Conclusion

Through strategic optimization efforts, we have succeeded in bolstering the reliability and efficiency of our E2E test suites at Vestiaire Collective. By migrating to Cypress and addressing key pain points such as execution time and flakiness, we have reinforced trust in our deployment pipeline. Our focus on dynamic waiting mechanisms and API integration has not only streamlined the testing process but also paved the way for faster and more reliable deployments, ensuring that our monolith application continues to serve our business and customers effectively.

Optimizing E2E Test Suites for Reliable Monolith Deployments at Vestiaire Collective was originally published in Vestiaire Connected on Medium.

Speed Up Image Background Removal Service With FastAPI and Triton Inference Server

Tu Ta Quang — Tue, 10 Dec 2024 10:44:52 GMT

VESTIAIRE COLLECTIVE

Serving AI Models for Concurrent Requests at Scale

Introduction

At Vestiaire Collective, we receive tens of thousands of product listings daily. To ensure a uniform and professional look highlighting products for buyers, we need to remove the background from each image.

Until now, this work has been done automatically by BARE, our in-house image background removal service.

Sneakers after being processed by BARE

We have been running the service on Vestiaire Collective’s Kubernetes clusters without GPU support. The core model behind BARE is a U2Net deep learning model trained from scratch on Vestiaire Collective’s product images. (For details of the model implementation, see this post).

The Rise of Problems

We focus on clipping the main image of each product. In total, we process around 25,000 images daily. To meet this demand with no GPU support, the first version of the BARE service scaled horizontally to at least 15 instances, each requiring a minimum of 2 CPUs and 8GB RAM and up to 4 CPUs and 16GB RAM. Despite this setup, resource requirements for what seems like a “simple” service were still high. The average response time of 2.5 to 3 seconds was acceptable for batch (offline) inferencing.

The number of requests increased 3 times from July 2024

However, demand increased when Vestiaire Collective’s internal pricing recommendation service began using BARE as a preprocessing step with stricter speed requirements. The service used BARE to remove the background from seller-uploaded images at listing time, in order to improve the quality of the product similarity search powering its pricing recommendations. The load on BARE rose to approximately 75,000 images per day (~3x increase).

Additionally, as the number of requests grew, we identified a memory leak in the service. Memory usage steadily increased over time, driving up infrastructure costs and negatively impacting the overall performance.

At this point, BARE’s processing time became a significant bottleneck. This prompted us to refactor the service to improve speed, enable real-time use cases, handle more concurrent clients, resolve the memory leak, and reduce reliance on horizontal scaling.

Review of the Initial Deployment

We reviewed BARE’s initial implementation to identify improvement areas. Here’s a recap of the first version’s setup:

FastAPI: Used to expose the service via HTTP, with a single async endpoint for image background removal.

Endpoint Tasks:

Pre-processing: Resizing images with numpy (CPU-bound).
Model Inferencing: Predicting the foreground mask (CPU-bound).
Post-processing: Resizing the mask to match the original image, clipping, and returning the masked image (CPU-bound).

Model: Served in ONNX format, using the OpenVINO backend, with no GPU support.

In code, the endpoint could be simplified as follows:

import asyncio
import io
from fastapi import APIRouter, Depends, Response
from fastapi.responses import FileResponse
import app_models
from utils.request import read_image
from model.openvino_inference import BareInferenceOpenVINO

router = APIRouter()
# loading u2net
bare_fast = BareInferenceOpenVINO()

@router.post("/clipping", response_class=FileResponse)
async def clipping(
    payload: app_models.PayloadInputU2NetFormData = Depends(
        app_models.PayloadInputU2NetFormData.as_form
    ),
):
    img_pil = read_image(payload.file)
    # preprocessing
    img_numpy = await asyncio.to_thread(bare_fast.preprocess, img_pil)
    # model inferencing
    pred = await asyncio.to_thread(bare_fast.predict, img_numpy)
    # postprocessing
    clipped_image = await asyncio.to_thread(
        bare_fast.post_process, pred=pred, img_pil=img_pil
    )
    # saving the image to a buffer, return it as a response
    buffer = io.BytesIO()
    clipped_image.save(
        buffer, format="JPEG", icc_profile=img_pil.info.get("icc_profile")
    )
    buffer.seek(0)
    return Response(content=buffer.getvalue(), media_type="image/jpeg")

We identified two primary issues with this design:

1. Spawning Python Threads for CPU-Bound Tasks

Each request dispatches subtasks to different threads from a pool of 40 (the default in Starlette, on which FastAPI is based), as investigated in this issue. When concurrent requests exceed the available threads, some requests must wait, creating an I/O bottleneck that occupies more RAM at high loads. While we could increase the thread pool, Python threading has limitations.

Why isn’t Python threading efficient?

Historically, Python threads don’t run in true parallel due to the Global Interpreter Lock (GIL), which prevents race conditions but limits parallelism for CPU-bound tasks. For more information on the GIL, see What Is the Python Global Interpreter Lock (GIL).

Our endpoint tasks use numpy and OpenVINO, which are implemented in C/C++ and can release the GIL while they compute. However, the Python code that orchestrates these calls still runs under the GIL, so threading alone does not give us true parallelism. This highlighted an improvement area in our service.

2. Violation of Separation of Concerns

While FastAPI is excellent for API development, it’s not tailored for deep learning model serving. Our approach of serving the model directly in FastAPI complicates horizontal scaling.

For example, imagine a scenario where a service requires 1GB of RAM (primarily for background tasks like logging) and 4 CPUs (mainly for deep learning model inferencing). To ensure availability, we deploy two replicas. If a high load pushes a replica to 80% RAM usage, another replica is created, even if CPU usage is only at 50%. This results in inefficiency, as we now have three replicas, each using 4 CPUs, but primarily to handle memory demands.

If we had separated concerns from the start — offloading the model serving to a dedicated platform — we could scale the hardware for background tasks independently, keeping the hardware requirements for model serving constant.

The Refactor

With these issues in mind, we opted to implement a dedicated model-serving server. This offloads the CPU-intensive model inferencing from FastAPI, restores FastAPI’s async benefits, and simplifies horizontal scaling to serve more clients efficiently.

On AWS, CPU instances are generally much cheaper than GPU instances. However, we added GPU support for model inferencing with the idea that a single expensive GPU can deliver more efficient performance than multiple inexpensive CPUs at the same total cost.

Why Triton Inference Server?

We chose NVIDIA’s Triton Inference Server for the model serving part due to several advantages:

Queuing and Batching: Triton efficiently queues and batches requests, boosting throughput, especially under heavy load.
Multi-framework Support: Supports diverse frameworks (e.g., PyTorch, TensorFlow, ONNX), adding flexibility.
Scalability and Load Management: Dedicated platforms like Triton simplify horizontal scaling and provide load balancing.
Performance Optimization: Triton, optimized for NVIDIA GPUs, accelerates inference for intensive models like U2Net.

For setup details, please take a look at NVIDIA documentation. Triton requires a config file defining parameters like batching time and input/output formats. For BARE, the config looks like this:

platform: "tensorrt_plan"
max_batch_size: 4
instance_group [
  {
    count: 1
    kind: KIND_GPU
    gpus: [ 0 ]
  }
]
dynamic_batching {
  max_queue_delay_microseconds: 1000
}
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 3, 320, 320 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ -1, -1, -1 ]
  }
]

We used a single GPU and one instance of the model, as this setup was sufficient for our needs. If we increase the number of model instances, Triton will spawn processes to serve them when using Python backend or threads when using TensorRT backend (our case).

We set “max_queue_delay_microseconds: 1000”, meaning that Triton holds a request in the queue for up to 1 millisecond so that requests arriving in that window can be grouped into a single batch for inference. The model was converted to TensorRT for deployment using the TensorRT engine (platform: “tensorrt_plan”) to maximize the performance of the NVIDIA GPU.
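To build intuition for how the queue delay and the maximum batch size interact, here is a deliberately simplified simulation. It is not Triton's actual scheduler, which also takes into account whether a model instance is free; the function name and the arrival times are hypothetical. In this model, a batch opens when its first request arrives and closes when it is full or when the delay measured from that first request has expired.

```python
def form_batches(arrival_times_us, max_batch_size=4, max_queue_delay_us=1000):
    """Group request arrival times (in microseconds) into batches.

    Simplified model: a batch is closed when it already holds max_batch_size
    requests, or when the next request arrives more than max_queue_delay_us
    after the first request of the batch.
    """
    batches = []
    current = []
    for t in sorted(arrival_times_us):
        batch_is_full = len(current) == max_batch_size
        delay_expired = bool(current) and t - current[0] > max_queue_delay_us
        if batch_is_full or delay_expired:
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches


# Hypothetical arrival times, in microseconds.
arrivals = [0, 200, 900, 1500, 1600, 1700, 1800, 1900, 5000]
batches = form_batches(arrivals)
for batch in batches:
    print(len(batch), batch)
```

With these arrivals, the first three requests share a batch, the next four fill a batch of size 4, and the last two requests are each processed alone because no other request arrives within 1 millisecond.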

In FastAPI, only minimal modifications were required. The key change was introducing an asynchronous call to the Triton server via a client library:

# import libraries

triton_client = new_client(
    config.TRITON_URI,
    config.TRITON_SSL,
    verbose=False,
    use_grpc=config.TRITON_USE_GRPC,
)


@router.post("/clipping", response_class=FileResponse)
async def clipping(
    payload: app_models.PayloadInputU2NetFormData = Depends(
        app_models.PayloadInputU2NetFormData.as_form
    ),
):
    # preprocessing is kept as before
    # model inferencing
    triton_inputs = triton_client.prepare_triton_input_img(img_numpy)
    pred = await triton_client.predict_mask(
        triton_inputs, model_name=config.TRITON_BARE_MODEL
    )
    # postprocessing is kept as before

Questions Raised in the Refactor

At this point, you may be wondering why the preprocessing and postprocessing steps are still handled in FastAPI, given that they are CPU-bound tasks and could potentially cause blocking.

We kept the processing (resize) part in FastAPI to ensure all calls to Triton use the same input shape, thereby leveraging batching. Consequently, the postprocessing also needs to remain in the FastAPI server.

If you’re a fan of Triton, another question might come to mind: Why not move all processing to Triton, since it supports Python backend? For us, doing so would essentially turn Triton into another version of the FastAPI server, but with queuing support. Additionally, we would have to transfer the input image in its original size (on average 1.5 MB) from FastAPI to Triton, which is time-consuming.

Results

Improvement in Latency

After the refactor, our service became significantly leaner and approximately 15x faster than before, with the capacity to handle more requests. To highlight the difference clearly, the figure below shows the latency data logged by Grafana from July 1, 2024, to the end of August 2024.

Service latency before and after the refactor

We deployed the first version of the refactor on July 21. This version introduced a new API endpoint with the improvements while keeping the old endpoint to give clients time to switch completely. After the initial deployment, clients were still using the old API endpoint with the model being hosted directly on the FastAPI server, so although there was a noticeable drop in latency, it wasn’t very significant.

The steadily increasing latency trend during this period was attributed to the previously mentioned memory leak. On August 5, all clients switched to the new endpoint. At that point, latency dropped significantly and stabilized consistently.

Reducing the Cost of Infrastructure

We were also able to divide the number of replicas, memory, and CPU usage by three, at the cost of adding 2 GPUs. The service actually needs only 1 GPU; the second one ensures high availability during Kubernetes Pod evictions scheduled by the platform team.

CPU & RAM cost for BARE before refactoring

The average daily cost of operating the old version of BARE was $78 ($56 for CPU (in green) and $22 for RAM (in blue)).

CPU & RAM & GPU cost for BARE after refactoring

After the refactor, the daily cost is $60 ($14 for CPU, $11 for RAM, and $35 for GPU (in yellow)), corresponding to a reduction of 23% compared to the previous version.
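The 23% figure follows directly from the daily costs quoted above. A short sketch of the calculation, where only the dollar amounts come from our dashboards and the variable names are illustrative:

```python
# Average daily costs in USD, as quoted in the text.
old_cost = {"cpu": 56, "ram": 22}
new_cost = {"cpu": 14, "ram": 11, "gpu": 35}

old_total = sum(old_cost.values())
new_total = sum(new_cost.values())

# Relative reduction of the daily bill after the refactor.
reduction = (old_total - new_total) / old_total

print(f"before: ${old_total}, after: ${new_total}, reduction: {reduction:.0%}")
```

The GPU line is new, but it is more than offset by CPU and RAM costs, which drop from $78 to $25 combined.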

Memory Leak Mitigation

Another interesting point we discovered is that thanks to moving the model inference task to Triton, the memory leak no longer occurs in our FastAPI server. This leads us to suspect that the memory leak was caused by OpenVINO and ONNX.

Memory leak before and after the refactor

In the memory usage chart, before switching all requests to the new endpoint, the replicas’ memory usage was increasing over time. Once the limit (16GB) was reached, Kubernetes redeployed the replicas, thus clearing the memory as shown by the drops on the chart. After switching to the new endpoint, memory usage still dropped, but not because the limit was reached — rather, it was due to the Kubernetes Pod eviction.

Recap

Here’s a concise recap comparing the previous and new implementations of BARE, highlighting the issues in the initial setup and the improvements achieved through the refactor:

Conclusion

This refactor allowed us to handle higher traffic and resource demands, serving multiple concurrent clients with 15x faster inference speed, increasing overall efficiency while reducing the operation cost by 23%.

Speed Up Image Background Removal Service With FastAPI and Triton Inference Server was originally published in Vestiaire Connected on Medium.

How to create your own AI Chat Moderation model

Aurélien Houdbert — Tue, 02 Jul 2024 13:17:08 GMT

Vestiaire Collective

Lessons learned from building a chat message classifier internally

1. Introduction — Why monitoring and moderating your platform’s chat is crucial

On Vestiaire Collective, buyers and sellers can discuss through a chat interface to get more information about products, negotiate prices, clarify item conditions, and agree on sales. This direct communication enables more personal and efficient transactions.

However, this chat feature is also an open door for scammers or users trying to avoid platform fees. Despite our platform’s security and protection measures, some users still try to exchange private information to finalize deals in person or on less secure platforms.

Scammers negatively impact user experience, resulting in decreased trust in our platform.
Circumvention directly affects GMV (revenue) and Cost Per Order by shifting transactions outside the platform.
Toxic behavior significantly degrades user experience and engagement.

Challenges

Automatically blocking messages and banning users from the chat raises several challenges:

Our AI model must achieve a balance of recall and precision, capturing a sufficient proportion of unwanted messages while accurately identifying when a message should be flagged. Incorrectly banning users can lead to poor user experience and decreased engagement.
Scam messages and circumvention attempts require different banning processes. Scammers should be identified quickly and permanently banned, while legitimate users attempting to trade outside our platform should undergo an educational process with progressive banning measures.

2. Chat message classification — A short review of available solutions

There are various approaches you can use to classify text.

Regex

Regex, or pattern matching, is often the easiest way to classify text. It involves defining a set of prohibited words/patterns and creating regex rules around them. However, developing a comprehensive list of patterns can be time-consuming and is not robust enough to counteract sophisticated scammers. Regex also lacks semantic understanding, leading to misinterpretations.

For example, if your regex rules include the word “Instagram,” you will not be able to differentiate between:

Circumvention attempt: “Do you have Instagram?”
Legit information: “I bought it 2 years ago from someone I met on Instagram.”

In our case, the first message intends to move the discussion to Instagram, while the second message only provides information on the item’s origin.
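The limitation is easy to reproduce. The sketch below uses a single hypothetical rule (real rule sets contain many more patterns) and shows that both messages are flagged in exactly the same way:

```python
import re

# Hypothetical single-word rule, for illustration only.
PROHIBITED = re.compile(r"\binstagram\b", re.IGNORECASE)


def is_flagged(message: str) -> bool:
    """Flag a message as soon as a prohibited pattern appears in it."""
    return PROHIBITED.search(message) is not None


messages = [
    "Do you have Instagram?",  # circumvention attempt
    "I bought it 2 years ago from someone I met on Instagram.",  # legit information
]

flags = [is_flagged(m) for m in messages]
for message, flagged in zip(messages, flags):
    print(flagged, message)
```

The pattern matches on the word alone, so the rule cannot separate the intent of the first message from the harmless context of the second.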

Classic Machine Learning

Machine Learning is a powerful tool for text classification. Most traditional ML techniques utilize word counts and co-occurrence methods, such as TF-IDF (Term Frequency-Inverse Document Frequency). Common algorithms include Naive Bayes, Support Vector Machines (SVM), and Logistic Regression. In many cases, traditional machine learning can achieve performance comparable to larger deep learning models, especially when datasets are not very large or complex.

However, they face three main challenges in the context of chat message moderation:

Scammers’ pattern adaptability: Scammers can quickly adapt their language and strategies, rendering your model’s vocabulary and features obsolete.
Multilingual setting: In a multilingual environment, the vocabulary can grow rapidly, resulting in very large embeddings. This can lead to increased computational resources and complexities in model management.
Broader context comprehension: The vocabulary size can significantly increase when n-grams are used to include more “word groups.”
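The vocabulary growth mentioned in the last two points can be illustrated with a tiny corpus. This is a minimal sketch with hypothetical messages and whitespace tokenization, not a full TF-IDF pipeline; it only counts how many distinct features a bag-of-n-grams model would need.

```python
def ngrams(tokens, n):
    """All contiguous word groups of length n."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def vocabulary(messages, max_n):
    """Distinct n-grams (for n = 1..max_n) found in the messages."""
    vocab = set()
    for message in messages:
        tokens = message.lower().split()
        for n in range(1, max_n + 1):
            vocab.update(ngrams(tokens, n))
    return vocab


# Hypothetical messages: two in English, one in French.
messages = [
    "do you have instagram",
    "do you have the box",
    "avez-vous la boîte",
]

unigram_vocab = vocabulary(messages, max_n=1)
bigram_vocab = vocabulary(messages, max_n=2)
print(len(unigram_vocab), len(bigram_vocab))
```

Even with three short messages, adding bigrams nearly doubles the feature count, and the French message shares no features at all with the English ones.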

Deep Learning LLMs

Large Language Models (LLMs) such as BERT excel in text classification tasks due to their transformer-based architecture, which captures text semantics and nuances.

These models are relatively easy to use and fine-tune using the transformers library from HuggingFace. HuggingFace provides pre-trained models and user-friendly tools to customize them for your application.

However, there are several considerations to keep in mind when using deep learning models:

Computational Resources: Training, fine-tuning, and deploying BERT or other deep learning models can be resource-intensive.
Data Requirements: Deep learning models often require large amounts of labeled training data to achieve optimal performance. Acquiring and labeling this data can be time-consuming and expensive.
Interpretability: Deep learning models, especially those based on transformers, do not easily provide insight into which features are used to make decisions, which can be an issue in applications requiring high levels of transparency.

Fine-tuning pre-trained models can help tackle the first two points: you will need much less data and far fewer computing resources. Using the HuggingFace model repository, you can find open models pre-trained on general tasks or tasks similar to yours.

3. Data collection — How we built a dataset out of poorly labeled data

Vestiaire Collective historically used a regex moderation system to identify suspicious messages, which were then manually reviewed by human annotators. This process resulted in a dataset of manually verified messages, providing an excellent source of information for the model to understand semantic nuances in messages.

But there were still two main issues:

This data only represents messages flagged by regex, leaving gaps in unflagged patterns.
Human labelers are not 100% accurate.

Heuristics

To tackle the first issue, we came up with various heuristics to enrich our dataset with safe, circumvention, and scam messages drawn from the chat messages sent on the app.

For instance, one heuristic we employ to identify scam messages involves analyzing the number of line breaks, messages in the channel, and account age in days.

We have implemented similar heuristics for phone numbers and safe message identification, among others.
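A heuristic of this kind can be sketched as a simple rule over the three signals mentioned above. Everything specific in the example below is an assumption made for illustration: the thresholds, the direction of each comparison, and the names are hypothetical and do not reflect our production rules.

```python
from dataclasses import dataclass


@dataclass
class MessageContext:
    text: str
    messages_in_channel: int
    account_age_days: int


def looks_like_scam(
    ctx: MessageContext,
    min_line_breaks: int = 5,        # hypothetical threshold
    max_channel_messages: int = 1,   # hypothetical threshold
    max_account_age_days: int = 2,   # hypothetical threshold
) -> bool:
    """Candidate scam: a long templated first message from a very recent account."""
    line_breaks = ctx.text.count("\n")
    return (
        line_breaks >= min_line_breaks
        and ctx.messages_in_channel <= max_channel_messages
        and ctx.account_age_days <= max_account_age_days
    )


suspicious = MessageContext("Hello\n\n\nContact me\n\nhere", 1, 0)
ordinary = MessageContext("Hey, what is the size?", 4, 400)
print(looks_like_scam(suspicious), looks_like_scam(ordinary))
```

Messages selected this way are candidates for the dataset rather than ground truth, which is why label quality still has to be addressed separately.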

Message relabeling

Because human labelers are not 100% accurate at classifying messages, they create inaccuracies that result in less stable training and lower performance.

To improve labeling accuracy, we tested self-training and clustering/majority voting techniques.
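The clustering and majority voting idea can be sketched as follows: messages that fall in the same cluster are assumed to be near-duplicates, so each one receives the most frequent label of its cluster. The clustering step itself is not shown, and the cluster ids and labels below are hypothetical.

```python
from collections import Counter, defaultdict


def majority_vote_relabel(cluster_ids, labels):
    """Replace each label with the most common label of its cluster."""
    labels_by_cluster = defaultdict(list)
    for cluster_id, label in zip(cluster_ids, labels):
        labels_by_cluster[cluster_id].append(label)

    majority = {
        cluster_id: Counter(cluster_labels).most_common(1)[0][0]
        for cluster_id, cluster_labels in labels_by_cluster.items()
    }
    return [majority[cluster_id] for cluster_id in cluster_ids]


# Hypothetical cluster assignments and human labels.
cluster_ids = [0, 0, 0, 1, 1, 1]
labels = ["scam", "scam", "safe", "safe", "safe", "circumvention"]

relabeled = majority_vote_relabel(cluster_ids, labels)
print(relabeled)
```

In this toy input, one outlier label per cluster is overwritten by the majority. Ties are not handled specially here; a real pipeline would need a rule for them.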

LLM relabeling

When exploring data relabeling solutions, we also tried to use the latest LLMs to label our dataset. Generative AI LLMs have strong semantic understanding capabilities. With a little prompt engineering, it is possible to describe our moderation rules and chat guidelines to the model.

In our experiments, we used private models such as ChatGPT (3.5 and above) from OpenAI and Claude (version 1 and above) from Anthropic, in addition to trying open-source models such as Mistral (Mixtral-8x7b, Mistral-7b) and Llama 2.

These models have strong semantic understanding capabilities but fail to understand the intent behind a single message. To reuse the same example — “Do you have Instagram?” — most of these models fail to understand the real underlying intention, which is to move the discussion outside the platform.

To improve performance, we tried various prompting strategies:

Few-shot prompting: providing a few examples of correct classification with the expected output format.
Reasoning strategy: providing a reasoning framework that forces the model to explain the message content and interpret the intent before giving a definitive label.
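Combining the two strategies, a prompt could be assembled as in the sketch below. The wording of the instructions and the label names are assumptions made for illustration, not the prompts we used; the two example messages are the Instagram messages discussed earlier.

```python
# Example messages taken from the article; the label names are illustrative.
FEW_SHOT_EXAMPLES = [
    ("Do you have Instagram?", "circumvention"),
    ("I bought it 2 years ago from someone I met on Instagram.", "safe"),
]


def build_prompt(message: str, examples=FEW_SHOT_EXAMPLES) -> str:
    """Build a few-shot prompt that asks for reasoning before the label."""
    lines = [
        "Classify the chat message as one of: safe, circumvention, scam.",
        "First explain the intent of the message, then give the label.",
        "",
    ]
    for text, label in examples:
        lines += [f"Message: {text}", f"Label: {label}", ""]
    # Ending on "Reasoning:" nudges the model to explain before labeling.
    lines += [f"Message: {message}", "Reasoning:"]
    return "\n".join(lines)


prompt = build_prompt("Can I call you?")
print(prompt)
```

The resulting string can be sent to any chat or completion model; the model call itself is intentionally left out.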

These methods improved the raw performance of the model but weren’t good enough to relabel our entire dataset.

We also fine-tuned small open source models (7B and 13B versions of Llama 2 and Mistral) on a small sample of curated messages and labels. With fine-tuning, we tried to teach the model our moderation guidelines and some reasoning strategies. It worked and the model learned our rules and reasoning strategy, but it still could not understand underlying intents.

The results are not satisfactory yet, and this work is ongoing.

Contextualization of messages

The first versions of our model were performing inference at the message level. This strategy works well but sometimes lacks the context of previous messages. For example, the message “33” could be an answer to a user inquiring about the size of the item, but it could also be the first part of a phone number sent over multiple messages (+33 is the French country code).

🙍‍♀️: “Hey, what is the size?”
🙎‍♂️: “33”

🙎‍♂️: “Here is my”
🙎‍♂️: “number”
🙎‍♂️: “33”
🙎‍♂️: “06~”
🙎‍♂️: “ 82”
…

Such a model can work on messages concatenated with a few past messages, but the performance is not great because messages with their context are a lot longer than single messages. The first version of the model had not been trained on such message lengths and often got lost with too much information in large context messages.

Re-building labeled messages within entire conversations

To achieve great performance for both individual messages and messages within their context (past messages), we needed to rethink our dataset. We redefined our heuristics to identify conversations where all messages are safe.

For conversations with unsafe messages, we ideally need perfect labels for all messages in the context. This allows us to create data batches with varying context lengths, helping the model understand what makes a conversation unsafe.

However, as the dataset size increases, this requirement becomes less critical since larger datasets naturally capture more nuances and variations.

Data processing

As mentioned earlier, using language models and tokenizers from Hugging Face requires very few pre-processing steps.

However, some special characters might not be handled correctly by the tokenizer and be attributed an [UNK] unknown token (a default token for elements/words/subwords not available in the tokenizer’s vocabulary). This is typically the case for emojis 😀. If your pre-processing doesn’t handle emojis correctly, scammers might be able to communicate information through emojis.

“0️⃣6️⃣4️⃣2️⃣…” and “🔥🔥🔥🔥…” will both be tokenized as “[UNK] [UNK] [UNK] [UNK]…” making it extremely difficult to correctly predict the message label.

But if you convert emojis to text before tokenization, you will end up with a far better emoji representation in your model.

“0️⃣6️⃣4️⃣2️⃣…” is converted to “:zero: :six: :four: :two: …” which will give a precise tokenized sentence.
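As a rough illustration of this conversion, here is a minimal sketch that uses only the Python standard library. The function name `emojis_to_text` and the `:name:` placeholder format are assumptions made for this example; in practice a dedicated package (for instance the `demojize` function of the `emoji` library) covers far more emoji sequences than this sketch does.

```python
import unicodedata

DIGIT_NAMES = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]
VARIATION_SELECTOR = "\ufe0f"  # optional "emoji presentation" marker
KEYCAP = "\u20e3"              # combining enclosing keycap


def emojis_to_text(message: str) -> str:
    """Replace emojis with ':name:' placeholders before tokenization.

    Hypothetical helper: handles keycap digits and single-code-point
    symbols only, which is enough to illustrate the idea.
    """
    parts = []
    i = 0
    while i < len(message):
        ch = message[i]
        # Keycap digits are sequences: digit + optional U+FE0F + U+20E3.
        j = i + 1
        if j < len(message) and message[j] == VARIATION_SELECTOR:
            j += 1
        if ch in "0123456789" and j < len(message) and message[j] == KEYCAP:
            parts.append(f" :{DIGIT_NAMES[int(ch)]}: ")
            i = j + 1
        elif unicodedata.category(ch) == "So":
            # "So" (Symbol, other) covers most pictographic emojis.
            name = unicodedata.name(ch, "emoji").lower().replace(" ", "_")
            parts.append(f" :{name}: ")
            i += 1
        elif ch == VARIATION_SELECTOR:
            i += 1  # drop stray presentation markers
        else:
            parts.append(ch)
            i += 1
    # Collapse the extra spaces introduced around placeholders.
    return " ".join("".join(parts).split())


print(emojis_to_text("0\ufe0f\u20e36\ufe0f\u20e34\ufe0f\u20e32\ufe0f\u20e3"))
# :zero: :six: :four: :two:
```

Because the placeholders are plain text, the tokenizer can split them into known subwords instead of falling back to [UNK].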

As part of the tokenizer choice, you can also experiment with the cased (sensitive to case) or uncased (all characters are lowered, accents are removed, etc.) versions. If your use case involves only English text classification you might want to head towards an uncased tokenizer, whereas in a multilingual setup, a cased tokenizer is better suited to keep all accents and specificities of languages.

4. BERT — Efficient text classification using Transformers Architecture

BERT

For text classification tasks, you can find many architectures and pre-training open source. We chose a pre-trained BERT mostly because it was trained on multilingual data (> 100 languages) and had different tokenizers available.

The BERT model we use is a relatively small model of 179 million parameters which requires only 700 MB to fit into memory. Although you need only one small GPU for efficient fine-tuning, this model can be deployed on a CPU and still guarantee a short response time. In our case, the 95th percentile response time is below 100ms. In comparison, a 70 billion parameter LLM (such as Llama 3 or Mistral 70b versions) would require 260 GB to fit into memory.
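The memory figures above can be checked with simple arithmetic. The sketch below assumes weights stored in 32-bit floats (4 bytes per parameter), which the article does not state explicitly, and counts weights only (no activations or optimizer state).

```python
def model_size_bytes(num_parameters: int, bytes_per_parameter: int = 4) -> int:
    """Memory needed for the weights alone, assuming fp32 by default."""
    return num_parameters * bytes_per_parameter


bert_bytes = model_size_bytes(179_000_000)
llm_bytes = model_size_bytes(70_000_000_000)

print(f"BERT (179M parameters): {bert_bytes / 1e6:.0f} MB")       # 716 MB
print(f"LLM (70B parameters):   {llm_bytes / 2**30:.0f} GiB")     # 261 GiB
```

Under that assumption, both results land close to the 700 MB and 260 GB quoted above. Storing weights in 16-bit precision would halve both numbers.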

The very first version of our model was a binary classifier. The poor initial data quality led to merging circumvention and scam labels into one unique category.

Performance was great at this point, but we needed to differentiate between a scam message and a circumvention attempt.

BERT multi-class

Distinguishing between scam messages and circumvention attempts enables targeted blocking and banning procedures. Ideally, soft bans can be implemented for legitimate users unaware of chat guidelines, while scammers should be subject to a more stringent hard ban procedure.

Therefore, the next versions of our classifier were trained on multi-class data and refined using the methods detailed in previous sections.

We observed a slight performance decrease when distinguishing between circumvention and scam messages, likely due to the semantic similarity between both classes. The model struggles to differentiate between the two categories, leading to lower confidence in each class.

BERT + ML classifier

Through the two previous versions of our model, we noticed that binary classification yielded better results but failed to distinguish between circumvention attempts and scam messages. To address this limitation, we incorporated a CatBoost classifier to predict the likelihood of a message being a scam. This approach leverages categorical features such as account age, sender type, and purchase history to improve our model’s accuracy.

5. Model training — Technical details

Batch creation

Given a set of conversations, how can we create training samples? We want the model to train on single messages and messages within context. To illustrate our sampling process, let’s use this example conversation:

“Are you on Instagram?” is a circumvention attempt. The user is trying to move the conversation to Instagram to continue negotiating prices and avoid platform fees.

From this conversation, we can create various data samples:

Single message with positive label

Message with context with positive label

Message with context with negative label

From one single conversation, we can create multiple training data samples of various lengths, various numbers of messages included in context, etc. By doing so, your model will learn to deal with varying message lengths and understand what makes a set of messages or conversations unsafe.
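The sampling idea can be sketched as follows. This is an illustration under stated assumptions, not our exact pipeline: the helper `build_samples`, the `[SEP]` separator, the `max_context` limit, and the two safe messages in the example are hypothetical, and each sample is assumed to take the label of its last (target) message.

```python
from typing import List, Tuple

Message = Tuple[str, int]  # (text, label) with 1 = unsafe, 0 = safe


def build_samples(conversation: List[Message],
                  max_context: int = 3,
                  separator: str = " [SEP] ") -> List[Tuple[str, int]]:
    """Create one sample per message and per context length.

    Assumption: a sample is labeled with the label of its target
    (last) message; the preceding messages only provide context.
    """
    samples = []
    for i, (text, label) in enumerate(conversation):
        # n_context = 0 gives the single-message sample.
        for n_context in range(0, min(i, max_context) + 1):
            context = [t for t, _ in conversation[i - n_context:i]]
            samples.append((separator.join(context + [text]), label))
    return samples


conversation = [
    ("Hi, is this still available?", 0),  # hypothetical message
    ("Yes it is!", 0),                    # hypothetical message
    ("Are you on Instagram?", 1),         # circumvention attempt
]

for text, label in build_samples(conversation):
    print(label, "|", text)
```

With this three-message conversation, the sketch yields six samples: three single messages and three messages with one or two messages of context.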

This sampling process also highlights the need for clean labels at the message level. As previously mentioned, having labels for every message in a conversation is more beneficial than having a dataset with labels on individual messages sampled from different conversations.

If you have a large amount of data, it will naturally capture more nuances and variations without needing such a sampling strategy.

Data augmentation

Data augmentation in NLP tasks is less straightforward than for computer vision tasks. The sampling process we use can already be viewed as a dataset augmentation technique.

Chat messages are often misspelled, either by inattention or because scammers are deliberately misspelling words to try and bypass the model. Based on this observation, we came up with three augmentation techniques:

Random character deletion: randomly removing characters in messages. This will break words or tokens, forcing invariance on word misspelling.
Random character insertion: randomly inserting characters in messages. This will break words or tokens, forcing invariance on word misspelling.
Digit replacement: replacing all digits in a message will introduce invariance regarding phone number, prices, sizes, etc.
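A minimal sketch of these three augmentations is shown below. The function names, the probability `p`, and the choice of replacing each digit with a random digit are assumptions made for this example.

```python
import random
import string


def random_char_deletion(text: str, p: float = 0.05, rng=random) -> str:
    """Drop each character independently with probability p."""
    return "".join(ch for ch in text if rng.random() >= p)


def random_char_insertion(text: str, p: float = 0.05, rng=random) -> str:
    """After each character, insert a random lowercase letter with probability p."""
    out = []
    for ch in text:
        out.append(ch)
        if rng.random() < p:
            out.append(rng.choice(string.ascii_lowercase))
    return "".join(out)


def digit_replacement(text: str, rng=random) -> str:
    """Replace every ASCII digit with a random digit, keeping everything else."""
    return "".join(
        rng.choice(string.digits) if ch in string.digits else ch
        for ch in text
    )


rng = random.Random(42)  # seeded generator for reproducible augmentation
message = "Call me on 06 42 13 37 00, I can do 250 euros"
print(random_char_deletion(message, p=0.1, rng=rng))
print(random_char_insertion(message, p=0.1, rng=rng))
print(digit_replacement(message, rng=rng))
```

Passing a seeded `random.Random` instance keeps augmented batches reproducible between training runs.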

AWS training job and hardware choice

To train our BERT model, we use an AWS SageMaker training job with the Hugging Face estimator on a g4dn GPU instance. The model is small enough to fit on the NVIDIA T4 GPU: the weights themselves require as little as 700 MB of GPU memory. The limitation comes from the maximum input length and the batch size. In our case, training with a maximum length of 512 tokens (the upper bound on input length for BERT), we are limited to a batch size of 12 to fit in GPU memory.

We trained the model for 3 epochs using the AdamW optimizer with a linear learning rate decay starting from a learning rate of 1e-5, a batch size of 12 with 4 steps of gradient accumulation (larger batches don’t fit on a T4 GPU), a weight decay of 3e-4 for regularization, and a few warmup steps. We use default values for Beta1, Beta2, and epsilon for the AdamW optimizer.
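To make these settings concrete, here is a small sketch of a linear schedule with warmup and of the effective batch size. The function `linear_schedule`, the `warmup_steps=100` value and the `total_steps=1000` value are hypothetical (the exact numbers are not given above); the shape follows the usual "linear warmup, then linear decay to zero" schedule.

```python
def linear_schedule(step: int, total_steps: int,
                    base_lr: float = 1e-5, warmup_steps: int = 100) -> float:
    """Learning rate at a given optimizer step.

    Ramps linearly from 0 to base_lr during warmup, then decays
    linearly back to 0 at total_steps.
    """
    if step < warmup_steps:
        return base_lr * step / max(warmup_steps, 1)
    remaining = max(total_steps - step, 0)
    return base_lr * remaining / max(total_steps - warmup_steps, 1)


# Gradient accumulation: 4 micro-batches of 12 samples per optimizer step.
per_device_batch_size = 12
gradient_accumulation_steps = 4
effective_batch_size = per_device_batch_size * gradient_accumulation_steps

print(effective_batch_size)  # 48
for step in (0, 50, 100, 550, 1000):
    print(step, linear_schedule(step, total_steps=1000))
```

Gradient accumulation lets the optimizer see an effective batch of 48 samples while only 12 samples are held in GPU memory at a time.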

6. Evaluation — How do we compare models?

Evaluation is a tricky process.

Training evaluation

We use a test set to measure classification metrics such as precision, recall, F1 score, AUC, etc. However, the ground truth labels in our test set are not 100% accurate, which introduces noise and makes it difficult to compare models. Due to these inaccuracies, the differences in metrics between models may be smaller than what is required to achieve statistical significance. This means that any observed differences could be attributed to the noise from mislabeled test data rather than true performance differences, making the comparison statistically meaningless.
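One common way to check whether a gap between two models exceeds sampling noise is a paired bootstrap over the test set. The sketch below is an illustration, not the procedure we used: the function name and the toy predictions are hypothetical, accuracy stands in for any metric, and the method does not correct for mislabeled ground truth, which remains an extra source of uncertainty.

```python
import random


def bootstrap_accuracy_diff(labels, preds_a, preds_b,
                            n_resamples: int = 1000, seed: int = 0):
    """95% bootstrap interval for accuracy(A) - accuracy(B).

    The same resampled indices are used for both models (paired
    bootstrap), so the interval reflects the difference itself.
    """
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        acc_a = sum(preds_a[i] == labels[i] for i in idx) / n
        acc_b = sum(preds_b[i] == labels[i] for i in idx) / n
        diffs.append(acc_a - acc_b)
    diffs.sort()
    low = diffs[int(0.025 * n_resamples)]
    high = diffs[int(0.975 * n_resamples) - 1]
    return low, high


# Toy data, purely illustrative.
labels = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
model_a = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
model_b = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]

low, high = bootstrap_accuracy_diff(labels, model_a, model_b)
print(f"95% interval for the accuracy difference: [{low:.2f}, {high:.2f}]")
```

If the interval contains zero, the observed difference is not distinguishable from noise on this test set.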

Inference evaluation

To evaluate model performance in production, we follow metrics such as precision and recall but also customer contacts or the number of users targeted by bad messages.

We achieve this by using human labelers who review a sample of predictions of the model.

7. Serving — How to serve this model in production?

Even though the model needs to be trained on GPU, we can perform inference on CPU for real-time moderation (with a 95th percentile below 100ms).

Our API, written in Python FastAPI, is served on Kubernetes for optimal service scaling and cost optimization.

8. Conclusion

Chat moderation is crucial to prevent GMV loss due to platform circumvention and improve user experience.

Data is often the main driver of success. More data and granular labels can truly unlock great performances. In our case, data collection, processing, and relabeling were our biggest challenges.

Through continuous refinement and evaluation, our AI-powered moderation system can adapt to evolving threats, maintaining a secure and engaging environment for all users.

Even though BERT is not the latest language model out there, it is a better fit for our use case: smaller, faster, bidirectional encoder, etc. Large GenAI models can be a good solution to get started on a subject, but keep in mind that there are many other great models out there designed specifically for your use case.

How to create your own AI Chat Moderation model was originally published in Vestiaire Connected on Medium, where people are continuing the conversation by highlighting and responding to this story.

Our Android Application Meets Jetpack Compose

Efe Ejemudaro — Mon, 29 Jan 2024 14:15:30 GMT

Photo by Elodie Oudot on Unsplash

As with every new technology, it’s always a good idea to introduce it pragmatically into a codebase that’s live in production and has millions of users, making sure no unnecessary problems are introduced and understanding the benefits that come with the new tool. For us on the Android team at Vestiaire Collective, Jetpack Compose was one of these tools. So even though there was a lot of buzz around it and it was being actively pushed by many, here’s how we went about it.

For a long time in Native Android development, User Interfaces (UI) were built using Extensible Markup Language (XML). Views and ViewGroups are represented using XML tags, and the UI is fully developed as a tree of these tags in XML documents and inflated into Activities or Fragments, actively updating each view’s state wherever and whenever necessary.

But in 2021, Jetpack Compose was released: a brand new declarative methodology for building User Interfaces in Android. Compose has been actively gaining buzz in the Android community, making its way into more and more applications on the Google Play Store. Some advantages it provides over XML are:

Less code to achieve the same UI, leading to fewer bugs and faster development time.
More intuitive to write as Compose utilises a declarative API.
Compose is interoperable with XML, meaning you can have both of them together in the same screen and codebase, enabling a progressive migration to Compose in existing projects.
It’s also powerful in terms of what can be achieved. Creating complex animations has never been easier.

Learning Compose as a team using the official Codelabs

With Compose comes a significant shift in thinking about code, methodology and architecture, amongst other things. The official Android documentation has some codelabs that provide a great path for getting on-boarded into Jetpack Compose, from the basic concepts, such as creating a text or a button in Compose and arranging composables in groups, to the more complex ones like state management, Side Effects, and UI Testing. In the Android team, we began our Compose journey by going through these codelabs together. These group sessions, while taking a time slot of an hour a week, brought really good results: they kept learning motivation high and let us rely on one another’s understanding for the confusing parts of the new concepts. Altogether, these codelabs gave us a foundation to build on: a solid understanding of the ideas Compose brought forward.

For these sessions, we used the Mob Programming format, which consists of three kinds of roles rotated weekly among the team members. First, we have the “driver”, one team member assigned to be the active developer in that session. They are the one who actively codes, sharing their screen for everyone to follow. Then we have the “navigator”, a second team member who guides and directs the driver, looking for resources on what to do and organising which step of the code needs to be addressed next. And lastly, we have the “facilitators”, the other team members, who follow the resources and validate what’s being done by the driver and the navigator. It’s almost as if we are in a car, going on a journey.

Graphic highlighting the Mob Programming Format

Afterwards, taking advantage of the momentum from the codelabs, we decided to migrate one of our existing screens on the Android application to Jetpack Compose, using the same format. The ideal screen would be one that carries low business risk and could do with some polish user experience-wise. These criteria landed us on the user profile screen. Revamping this screen enabled us to think about Compose outside the guidance of the codelabs and actually use it to create a user interface directly related to our project. Of course, we did face a number of challenges, but nothing too difficult to figure out as a team. Eventually, with these sessions, we gained some valuable experience and confidence to create UI with Jetpack Compose.

Profile screen migration on Vestiaire Android app

Powering our Design System with Jetpack Compose

Jetpack Compose itself can be seen as a ‘design system’. What do I mean by that? Compose, as we interact with it as developers, is basically components (called composables). These components can be grouped together to make a desired UI and can be themed with a base theme providing colours and fonts. The Product Design Team at Vestiaire Collective has been working on unifying screens on the app, and with Jetpack Compose came an opportunity to introduce a design system for Vestiaire Collective’s mobile applications. A design system ensures uniformity, as it provides everything from basic tokens such as colours, shapes, and fonts to patterns like List of Product Blocks and Error Screens. As such, our design system, named Accent, was introduced. Accent provides all components that can be used on the Android application going forward: the tokens as well as the actual components and patterns built from these tokens. The design system provides consistency and is uniform across all Vestiaire Collective platforms: Android app, iOS app and Web app. To learn more about our Accent design system, here’s a talk from Rami Trabelsi describing it and showing how it was developed and introduced into the Android application.

Accent design system showcasing its tokens, components, patterns, and blocks

How is it related here? Well on the Android app, Accent was created and driven using Jetpack Compose. All atoms, colours, dimensions, and typography were created in the base theme used on all new screens called AccentTheme, providing uniformity and maintainability with minimal lines of code on the feature itself. This also means any change made in Accent takes effect across the entire app instantly. Powerful, isn’t it?

Code snippet showcasing the use of Accent

Investigation into Compose performance

As with every new technology, especially one as big as this that would change the very way we think about code, we wanted to have as much information as we could before introducing it to the application. Keeping this in mind, the Core Mobile team investigated many topics in Jetpack Compose and introduced guidelines for a number of Compose concepts, such as recomposition count, performance, managing state effectively, navigation among composables, and of course, relying on and contributing to the design system.

To also ensure adding Compose didn’t introduce sudden spikes to metrics such as project build time and application size, we added Compose dependencies to the Android codebase and revamped a screen only used internally to get a measure of these metrics: our feature flag manager screen.

Revamped feature flag manager screen

And some features…

After all the preliminary work that has been done, of course we have to introduce Jetpack Compose to the codebase at some point. Finally in the first quarter of 2023, the Hero Product Detail Page was introduced and Compose and the Accent design system were deemed sufficiently ready to work on this feature with. Therefore, we went with it. On a personal note, this was some of the most fun I’ve had working on a new feature.

And yes, there have been more and more features using the Accent design system and Jetpack Compose through the entire year. To name just a few, we have the Pick Up Location and Product Slider components on the Homepage, and the Notification Centre.

Graphic showcasing the use of Jetpack Compose in the application

Some pitfalls we encountered

While the good has undoubtedly outweighed the bad, we have also encountered some issues with Compose in our relatively short stint. Compose is still quite immature as it has been out for just about two years, and that means there are quite a number of functionalities that are not yet developed. For some context, its counterpart, XML, is over a decade old and you can imagine the stability that comes with that.

Some of the issues we have encountered are:

Implementing Pull to Refresh on the new Notification Centre Revamp, as the Compose Material 3 artifact does not have that functionality yet.
Achieving a sticky scrolling behaviour on the profile screen’s tabs while using Compose interoperability with XML.
Some lag while testing using development builds. (This does not happen on our release builds.)
Consistent issues with composable previews used while screens are under development.

However, we’ve been able to find our way around these pitfalls and overall, it has been a good experience so far and it should get even better in the future with Compose and Accent design system getting more mature.

Conclusion

From my experience so far, I would say it has been a resounding success for us to take the step to introduce Jetpack Compose to our codebase. And due to more features using Jetpack Compose, the Accent design system has also gotten more and more mature, leading to faster development time as new features can take advantage of components already developed in Accent. Currently our most used component is AccentText with exactly 150 usages, proving scalable so far, with the added advantage of only needing to change a configuration in exactly one place if ever needed. Progressively introducing Compose has also proven to be a very smart decision as we have been able to swiftly mitigate any problems we faced in production keeping the precious stability of our application.

Thanks for reading.

Our Android Application Meets Jetpack Compose was originally published in Vestiaire Connected on Medium, where people are continuing the conversation by highlighting and responding to this story.

How to create your own background removal tool ?

Aurélien Houdbert — Tue, 19 Sep 2023 13:31:09 GMT

Vestiaire Collective

How to create your own AI background removal tool?

Lessons learned from building a clipping engine internally

Photo by ShareGrid on Unsplash

Introduction — Why you should clip images

Images are the heart of Vestiaire Collective. Thanks to images, potential buyers are able to assess the quality, color and condition of products. This makes image quality and consistency critical to user engagement and conversion rate.

When browsing our apps and website, you may have noticed that all main item pictures have no background on Vestiaire Collective, thanks to a third-party background removal tool to clip them. Some advantages of using a background removal tool include:

Luxury look and feel of the platform
The background becomes less distracting
Consistency & better image quality
Better assessment of the color and quality of the product

Products for sale on Vestiaire Collective’s website

While there are many existing providers in the market, building your own tool can save costs and provide a customized solution. This is what inspired us to explore how we could create our own background removal deep learning model. In this article, we will share our experience with building a background removal tool at Vestiaire Collective, including the challenges we faced and the solutions we found.

Understanding the basics of background removal — A quick review of available solutions

Removing the background from an image may seem like a simple task but it’s actually quite challenging. The background of an image can be composed of different objects, textures, and colors, and it can be difficult to distinguish between the background and the foreground objects (even for human eyes).

Background removal is a highly researched subject in AI and recently gained attention with the rise of deep learning models. We can distinguish two different types of approaches: Instance Segmentation and Salient Object Detection (SOD).

Instance Segmentation

Instance Segmentation is a technique that specializes in identifying and labeling objects in an image, while also segmenting them from the background. Mask-RCNN is a good example of an instance segmentation network.

Mask-RCNN prediction example

Salient Object Detection (SOD)

Salient Object Detection is a technique that identifies the most visually significant object(s) in an image and separates them from the background. Its goal is to highlight the most important parts of an image, which are typically the objects or regions that draw the viewer’s attention. UNet and U2Net are great examples of SOD networks.

U2Net prediction example

After a quick review of state-of-the-art models, we identified U2Net models as the most effective approach for our use case. Indeed, U2Net (a SOD model) only needs precise segmentation masks and doesn’t require an object label to segment images. SOD models are designed to accurately delineate foreground from background, whereas Instance Segmentation is optimized to locate the object in the image. Also, we wanted a model as general and as flexible as possible, one that is able to handle labels unseen during training.

Collecting the data — Data pre-processing is your best friend

As mentioned in the previous section, in order to train U2Net properly, we need to build a dataset of images paired with their corresponding segmentation masks.

Original image and its corresponding segmentation mask

Good news: Vestiaire Collective has been clipping images for years with the help of a third party provider. This means we have access to an almost unlimited source of images. 🎉

Bad news: At Vestiaire Collective, all images are stored in JPEG format, and clipped images are cropped, centered and saved with a white background.

So why is that an issue?

The white background makes it difficult to extract a clean segmentation mask from the clipped images. If you try to set the white pixels as background, you may end up with “holes” in the object you are trying to extract if it contains white parts.
Because clipped images were cropped and centered, they are no longer aligned with the original image. During the training of U2Net, we need pixels between mask and image to be perfectly aligned.
JPEG images are compressed versions of original images, which will result in poor mask quality around edges.

To solve these issues we had to go through two main pre-processing steps:

Image Stitching

Image stitching is the process of combining multiple overlapping images to create a single, wider image. This is achieved by identifying common features in the images and aligning them to create a seamless, panoramic view. In our case, the clipped image was directly extracted from the original picture, so this technique works pretty well to realign images.

Image stitching

Mask Refinement

Mask refinement is used to remove the artifacts caused by the white background removal and the JPEG format. To remove these artifacts, we use a smoothing technique (Gaussian blur) combined with morphological transforms (a combination of dilation and erosion).

Segmentation mask artifacts
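The "white holes" problem and the morphological part of the refinement can be illustrated on a tiny grayscale image. This is a simplified sketch in pure Python: the helper names, the threshold of 250, the square structuring element, and the dilation-then-erosion order (a morphological "closing") are assumptions for the example, and the Gaussian blur step is left out. A real pipeline would rely on an image-processing library.

```python
def naive_mask(image, white_threshold=250):
    """1 where the pixel is not (near-)white, 0 elsewhere (grayscale 0-255)."""
    return [[1 if px < white_threshold else 0 for px in row] for row in image]


def _morph(mask, radius, op):
    """Apply `op` (max or min) over a square window around each pixel."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [mask[yy][xx]
                      for yy in range(max(0, y - radius), min(h, y + radius + 1))
                      for xx in range(max(0, x - radius), min(w, x + radius + 1))]
            out[y][x] = op(window)
    return out


def dilate(mask, radius=1):
    return _morph(mask, radius, max)


def erode(mask, radius=1):
    return _morph(mask, radius, min)


def close_holes(mask, radius=1):
    """Morphological closing: dilation followed by erosion."""
    return erode(dilate(mask, radius), radius)


# 9x9 white image with a 5x5 grey object that has one white pixel in its centre.
image = [[255] * 9 for _ in range(9)]
for y in range(2, 7):
    for x in range(2, 7):
        image[y][x] = 100
image[4][4] = 255  # white detail on the object, e.g. a logo

mask = naive_mask(image)
closed = close_holes(mask)
print(mask[4][4], closed[4][4])  # 0 1
```

Closing only fills holes smaller than the structuring element, which is consistent with large white areas remaining uncorrected.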

These techniques proved to be efficient but were not sufficient to blindly recreate a clean dataset. About 1/3 of images were of poor quality, with “white holes” too large to be corrected by the mask refinement step. In the end we still needed to review our dataset manually.

Thanks to these methods we were able to build a 5K-image dataset!

Notes: Later in the project, we realized that this mask refinement approach was not scalable, as we needed more data to reach even better performance. Our final dataset contains only PNG images coming from our manual clipping provider here at Vestiaire. This first 5K JPEG image dataset provided very encouraging results that gave us traction with stakeholders and helped kick-start the project.

Also, even if our final dataset was built using a different source of data (PNG format with no background), we still used the stitching method to realign clipped images with the originals.

Building the model — U2Net

As mentioned earlier, the model we selected is U2Net. Their paper and code can be found here:

GitHub - xuebinqin/U-2-Net: The code for our newly accepted paper in Pattern Recognition 2020: "U^2-Net: Going Deeper with Nested U-Structure for Salient Object Detection."

We followed the implementation of the paper and used the exact same training settings. We trained the model from scratch on a dataset of 18,000 hand-curated images.

When building the model, we made several general observations:

A smaller dataset of higher quality led to better results than a bigger dataset of lesser quality. Of course, more data of better quality would result in even greater performance!
The model is “only” 40 million parameters. The pre-trained model, provided by the research team, was trained on a dataset of 10,000 images. Retraining from scratch is a viable option if you have enough training data. It is particularly useful if your custom dataset is significantly different from the pre-trained dataset (which was our case because our data is fashion items and the pre-training data is DUTS-TR).
e.g. In our case, we wanted to remove human body parts from images, but the pre-trained dataset included numerous examples of humans treated as foreground. When we fine-tuned the pre-trained model on our custom dataset, it resulted in undetermined regions, making the background removal less accurate.

Evaluating the model — A tricky task

At Vestiaire Collective, all images are manually reviewed and clipped images are visually checked by a human. A given proportion of images that are clipped by our third-party provider don’t pass this manual check. Our goal with this project is to be at least as good as the current system, but more economical.

So here comes the difficult part of the project. Background removal quality strongly relies on visual criteria, and it is very difficult to find a metric that perfectly reflects the current rejection rate. Usual quality metrics such as F1-score, Dice coefficient, or IoU are not always sufficient to assess the overall quality of the model. The human rejection rate doesn’t correlate well with these classic segmentation metrics.

A much more useful metric we use is the “relax-f1”, a metric used in the original U2Net paper. This metric is nothing more than an F1-score assessed only on the edges of the clipped object. It is particularly effective because the visual quality of a clipping mostly comes from the quality of the edges.

relax-f1 visualisation
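To give an intuition of an edge-only F1-score, here is a simplified sketch in pure Python. It is not the reference implementation from the U2Net repository: the 4-neighbour boundary definition, the square tolerance window, the `tolerance` value and the plain (unweighted) F1 are all assumptions made for this example.

```python
def boundary(mask):
    """Foreground pixels with at least one background (or out-of-image) 4-neighbour."""
    h, w = len(mask), len(mask[0])
    edge = set()
    for y in range(h):
        for x in range(w):
            if not mask[y][x]:
                continue
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                    edge.add((y, x))
                    break
    return edge


def _matched_fraction(points, reference, tolerance):
    """Share of `points` lying within `tolerance` pixels of a reference point."""
    if not points:
        return 0.0
    hits = sum(
        any(abs(y - ry) <= tolerance and abs(x - rx) <= tolerance
            for ry, rx in reference)
        for y, x in points
    )
    return hits / len(points)


def relax_f1(pred_mask, gt_mask, tolerance=1):
    """F1-score computed on boundary pixels only, with a pixel tolerance."""
    pred_edge, gt_edge = boundary(pred_mask), boundary(gt_mask)
    precision = _matched_fraction(pred_edge, gt_edge, tolerance)
    recall = _matched_fraction(gt_edge, pred_edge, tolerance)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def square_mask(size, top, left, side):
    """Binary mask of a filled square, used here as toy data."""
    return [[1 if top <= y < top + side and left <= x < left + side else 0
             for x in range(size)] for y in range(size)]


gt = square_mask(8, 2, 2, 3)
pred = square_mask(8, 2, 3, 3)  # same square shifted by one pixel
print(relax_f1(pred, gt, tolerance=0))  # 0.5
print(relax_f1(pred, gt, tolerance=1))  # 1.0
```

The tolerance is what makes the score "relaxed": a prediction whose edge is off by one pixel is penalized with a strict match but accepted with a one-pixel tolerance.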

In order to assess the production performance of the model and compare it against our third-party provider, we use a dataset of 1,000 images manually reviewed by our curation agents. This process is long and costly and prevents quick iterations.

Metrics such as relax-f1 were a powerful way to carefully tune and select the best models to be sent for manual performance evaluation.

Post-processing to enhance and refine model predictions

The model choice, data quality and training strategy are very important but in the end, what differentiates a bad clipping from a good one is the post-processing.

The most important thing to understand here is that all quality metrics that we discussed in the previous section will not be impacted by our post-processing step. This means that even if those metrics indicate good results, there’s still a chance that the output may not look visually appealing, which we are trying to solve with our post-processing strategy.

To understand post-processing, it’s helpful to understand that the model (in our case, U2Net) outputs a probability map for each pixel in a low dimension (320 x 320px). When upscaling this map to the original image’s dimensions, even minor errors and inconsistencies can become more apparent and visually disturbing. For example, a blurry region in the predicted foreground can appear much larger when upscaled, leading to poor visual quality, particularly around the edges of the clipping.

Upscaling blurriness effect on the segmentation prediction

The edges are regions of uncertainty with a smooth gradient between 0 and 1. This is expected: it is a natural consequence of the model’s attempt to delineate the background from the foreground. In our example, we can also notice a blurry region at the bottom of the shoe, which can lead to unwanted effects if left unaddressed.

To resolve this, we can try to binarize the map to get rid of these blurry areas (notice how the bottom of the sole was corrected).

Binarization of the segmentation prediction

However, this method may not be entirely sufficient. Although the edges are now better defined, the smooth transition between the background and foreground has been lost, resulting in a stair-step effect that is less visually appealing.

We need to take two additional steps to address this issue: blurring the mask and stretching it. The blurring will reintroduce a smooth transition between the background and foreground. However, this approach can result in too much blurring, which can further degrade the image. To overcome this problem, we can use a linear stretching step to reduce the blurring radius and achieve smooth, steep, visually appealing edges.
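The three steps (binarize, blur, stretch) can be sketched on a single row of the probability map. The function names, the 0.5 threshold, the box blur used in place of whatever blur kernel is applied in production, and the `low`/`high` stretch bounds are assumptions chosen for the example.

```python
def binarize(values, threshold=0.5):
    """Hard decision: foreground (1.0) or background (0.0)."""
    return [1.0 if v >= threshold else 0.0 for v in values]


def box_blur(values, radius=2):
    """Average each value with its neighbours to soften the hard edge."""
    out = []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        out.append(sum(window) / len(window))
    return out


def linear_stretch(values, low=0.25, high=0.75):
    """Map [low, high] linearly onto [0, 1] and clip, steepening the edge."""
    return [min(1.0, max(0.0, (v - low) / (high - low))) for v in values]


# One row of the upscaled probability map crossing an object edge.
row = [0.0, 0.1, 0.3, 0.6, 0.9, 1.0, 1.0]

hard = binarize(row)
soft = box_blur(hard, radius=2)
final = linear_stretch(soft)

print(hard)
print([round(v, 2) for v in soft])
print([round(v, 2) for v in final])
```

In this toy row, the blurred edge spreads over four intermediate pixels, and the stretch narrows it to two, which keeps the transition smooth but steep.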

Results and Lessons Learned

Our background removal tool has achieved an impressive level of performance: it outperforms our current third-party provider while significantly reducing costs.

Clipping engine results on fashion items

One of the biggest challenges we faced during this project was obtaining sufficient quantities of high-quality data. We spent a huge amount of time searching for suitable data sources in our database. However, even with limited available data, we were able to jump-start the project, generating interest from stakeholders. The traction with stakeholders then enabled us to access more and better quality data, unlocking budgets and resources.

Final considerations

Building our own background removal tool internally helped us drastically reduce image curation cost. In this project, data was our key to success.

Although the model itself (U2Net) greatly impacts the clipping quality of your tool, keep in mind that post-processing must not be ignored as it may help you get perfect results.

How to create your own background removal tool ? was originally published in Vestiaire Connected on Medium, where people are continuing the conversation by highlighting and responding to this story.

SEO Case Study: Migrating Tradesy to Vestiaire Collective in 6 Months

Jean-Eric Blas-Châtelain — Tue, 02 May 2023 15:49:05 GMT

SEO migration | Vestiaire Collective

As businesses grow and evolve, website migrations become crucial for expansion and enhanced user experiences. In March 2022, Vestiaire Collective acquired Tradesy, its American counterpart. As a leading online marketplace specializing in pre-owned luxury fashion items and growing rapidly in the US, we needed to migrate the Tradesy website to the Vestiaire Collective website smoothly and in a timely manner, which was critical for both our new and experienced users.

Mission accomplished! Thanks to incredible team efforts, we successfully completed the soft migration of tradesy.com to us.vestiairecollective.com in only six months. This in-depth article outlines the sophisticated, data-driven techniques and strategies employed to ensure a smooth and efficient migration, ultimately leading to significant improvements in keyword rankings, traffic, and overall website performance.

Data-Driven Techniques and Strategies

1. Leveraging Search Console Data for URL Analysis
The first step in the process was to identify all the URLs on tradesy.com that had received at least two clicks in the past 16 months. This information was extracted from Google Search Console, a tool that helps website owners monitor and optimize their search performance. With this data in hand, the team could then focus on redirecting these high-performing URLs to their corresponding pages on us.vestiairecollective.com.
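The filtering rule described above can be sketched in a few lines. This is only an illustration: the `SearchConsoleRow` type, its field names, and the example URLs are assumptions, not the actual Search Console export schema.

```swift
/// One row of a Search Console export. The field names are assumptions
/// for this sketch, not the real export schema.
struct SearchConsoleRow {
    let url: String
    let clicks: Int
}

/// Keeps only URLs that reached the click threshold, best performers first.
func urlsToRedirect(from rows: [SearchConsoleRow], minimumClicks: Int = 2) -> [String] {
    rows.filter { $0.clicks >= minimumClicks }
        .sorted { $0.clicks > $1.clicks }
        .map { $0.url }
}

// Hypothetical sample rows.
let exportedRows = [
    SearchConsoleRow(url: "https://www.tradesy.com/example-a", clicks: 15),
    SearchConsoleRow(url: "https://www.tradesy.com/example-b", clicks: 1),
    SearchConsoleRow(url: "https://www.tradesy.com/example-c", clicks: 2),
]
let candidates = urlsToRedirect(from: exportedRows)
print(candidates)
```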

2. Deleting Duplicate Products
To ensure a seamless transition, all duplicate products on Tradesy were deleted, as they would be redirected 1-to-1 by the IT team. This step helped to avoid any confusion or duplicate content issues during the migration process.

3. Competitor Benchmarking with SEMrush
SEMrush, an SEO tool that provides data on competitor performance, was used to benchmark the performance of Tradesy and Vestiaire Collective. This analysis helped the team to identify gaps and areas of opportunity in terms of keywords and SEO strategies.

4. Matching URLs Using Search Console and Scraping APIs
A combination of Search Console data and web scraping APIs was used to identify the best Vestiaire Collective URL for each Tradesy URL. This process involved finding the keyword that drove the most clicks for each Tradesy URL and then searching site:us.vestiairecollective.com {Tradesy keyword} to find the most relevant page on the new domain.
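To make the matching step more concrete, here is a minimal sketch that picks the keyword with the most clicks for each Tradesy URL and builds the corresponding `site:` query. The `KeywordRow` type and the sample data are hypothetical, and running the query through a scraping API is left out.

```swift
/// One (URL, keyword) pair with its click count. Hypothetical shape.
struct KeywordRow {
    let url: String
    let keyword: String
    let clicks: Int
}

/// For each URL, keeps the keyword that drove the most clicks and turns it
/// into a `site:` search query restricted to the target domain.
func siteSearchQueries(for rows: [KeywordRow], targetDomain: String) -> [String: String] {
    var bestRowByURL: [String: KeywordRow] = [:]
    for row in rows {
        if let current = bestRowByURL[row.url], current.clicks >= row.clicks {
            continue // the keyword we already hold performs at least as well
        }
        bestRowByURL[row.url] = row
    }
    return bestRowByURL.mapValues { "site:\(targetDomain) \($0.keyword)" }
}

// Hypothetical sample data.
let keywordRows = [
    KeywordRow(url: "https://www.tradesy.com/example-bags", keyword: "pre-owned designer bag", clicks: 40),
    KeywordRow(url: "https://www.tradesy.com/example-bags", keyword: "used luxury handbag", clicks: 12),
    KeywordRow(url: "https://www.tradesy.com/example-shoes", keyword: "second hand heels", clicks: 7),
]
let queries = siteSearchQueries(for: keywordRows, targetDomain: "us.vestiairecollective.com")
print(queries)
```

The first result returned for each query would then be the candidate redirect target for that Tradesy URL.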

5. Manual 1-to-1 Matching for High-Performing URLs
For the remaining URLs that generated significant traffic, the team performed a manual 1-to-1 matching process. This ensured that the most relevant and high-performing pages were accurately redirected to their new counterparts on Vestiaire Collective’s domain.

6. Breadcrumb Scraping for Category Matching
To further refine the matching process, the team used web scraping techniques to extract the last breadcrumb element from each Tradesy URL. This information was then used to match Tradesy’s categories with those on Vestiaire Collective, ensuring a smooth user experience and preserving SEO value.

Categories | Vestiaire Collective website

7. Redirecting Unmatched Brands and Closet URLs to the Homepage
For brands not available on Vestiaire Collective and unmatched closet URLs, the team chose to redirect these pages to the homepage of us.vestiairecollective.com. This strategy helped to maintain a positive user experience and preserve some of the SEO value from these pages.

8. Catalog Page Matching and Product Page Redirects
The team successfully matched over 30,000 catalog pages between the two websites, ensuring that users could easily find the products they were looking for on the new domain. For product pages, the team implemented redirect rules based on product IDs, with special consideration for products that were migrated by the sellers themselves. This meticulous approach ensured that legacy Tradesy URLs were taken into account during the migration process.
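An ID-based redirect rule could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the legacy URL shape (a slug ending in a numeric ID), the lookup table for seller-migrated products, and the fallback path. The actual rules were defined with the IT team and are not reproduced here.

```swift
import Foundation

/// Extracts the trailing numeric ID from a legacy product path such as
/// "/bags/some-bag-12345/". This URL shape is invented for the example.
func legacyProductID(fromPath path: String) -> Int? {
    let trimmed = path.trimmingCharacters(in: CharacterSet(charactersIn: "/"))
    guard let lastSegment = trimmed.split(separator: "/").last,
          let idPart = lastSegment.split(separator: "-").last else { return nil }
    return Int(idPart)
}

/// Resolves the redirect target for a legacy path. Products re-listed by
/// their sellers get an explicit mapping; anything else uses the fallback.
func redirectPath(forLegacyPath path: String,
                  sellerMigratedPaths: [Int: String],
                  fallbackPath: String) -> String {
    guard let id = legacyProductID(fromPath: path) else { return fallbackPath }
    return sellerMigratedPaths[id] ?? fallbackPath
}

// Hypothetical mapping from legacy IDs to new product paths.
let migrated = [12345: "/women-bags/example-new-listing.shtml"]
let target = redirectPath(forLegacyPath: "/bags/some-bag-12345/",
                          sellerMigratedPaths: migrated,
                          fallbackPath: "/")
print(target)
```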

Results Achieved

Migration announcement on the website

The successful migration of Tradesy.com to us.vestiairecollective.com yielded impressive results, demonstrating the effectiveness of the detailed and data-driven techniques employed:

Catalog Page Matching: A total of 30,000 catalog pages were matched and redirected, ensuring a seamless user experience while minimizing any potential traffic loss.
Product Page Redirection: ID-based rules were implemented to redirect product pages, including specific tactics for products that were migrated by the sellers themselves. This approach ensured that visitors could easily access the desired items on the new site.
Legacy URL Consideration: The migration plan took into account legacy URLs from Tradesy, ensuring that these older links were also redirected and maintained their value in terms of search rankings and user experience.
Keyword Growth: The total number of keywords targeted by the website doubled, reflecting a more comprehensive and robust SEO strategy that catered to a wider range of search queries.
Improved Ranking Positions: The number of keywords ranking in the top 3 positions (1–3) also doubled. This improvement indicates enhanced visibility in search results, leading to higher click-through rates and increased organic traffic.

TOP 3 trends

Large-scale URL Redirection: A total of 800K URLs were redirected throughout the migration process, showcasing the vast scope and meticulous planning involved in successfully executing such an extensive migration project.
Advanced Tracking: To better understand the user journey and identify visitors searching for Tradesy in Google, tracking was implemented to pinpoint users who landed on us.vestiairecollective.com after searching for Tradesy-related terms. This information provided valuable insights into user behavior and allowed for further optimization of the website experience for these visitors.
User Engagement and Retention: The migration resulted in improved user engagement and retention rates, as users searching for Tradesy were seamlessly redirected to the corresponding pages on us.vestiairecollective.com. This positively impacted key engagement metrics such as bounce rate, time on site, and pages per session.

These results testify to the effectiveness of the data-driven techniques, meticulous planning, and rigorous execution employed throughout the migration process. Speaking of which, let’s dive deep into the key steps of our global migration plan in the next section.

Detailed Migration Plan: 8 Major Steps

Social media announcement

Preparatory Measures: Integration and verification of Tradesy’s Search Console, IP whitelisting to facilitate website crawling, obtaining admin access for Google Analytics, and conducting an in-depth analysis of Tradesy’s SEO templates and tools.
Current Situation Evaluation: Benchmarking hard and soft KPIs (search analytics and indexed pages), assessing link profiles (including toxicity cleanup), generating a migration SEO checklist, and establishing a precise migration timeline.
Redirection Planning: Crafting a detailed mapping strategy by page type based on crawl results, evaluating external link structures, identifying and rectifying crawling errors and additional redirect requirements, refining redirection spreadsheets, and validating redirection rules in collaboration with the development team, followed by a thorough final review with DevOps.
Deployment Phase: Uploading the carefully crafted redirection plan (to remain live for at least one year post-migration) and executing the post-migration SEO checklist.
Tracking Implementation: Updating Search Console and Google Analytics with the change of address features, settings adjustments, and ensuring thorough documentation.
Communication Strategy: Incorporating the former name in title tags and meta descriptions, featuring it in the website footer, designing interstitials for redirected users, and updating social media account handles and descriptions.
Promotion and Outreach: Orchestrating email announcements, public relations campaigns, guest posts, social media engagement, Pay-per-click (PPC) advertising, and updating LinkedIn profiles.
Monitoring and Adjustment: Establishing a dedicated dashboard to track the migration’s impact and make data-driven adjustments as needed.

Conclusion

Vestiaire Collective’s successful soft migration of Tradesy.com to us.vestiairecollective.com exemplifies the power of meticulous planning, data-driven execution, and thorough monitoring in achieving a seamless transition. Leveraging data, employing multiple techniques, and adhering to a comprehensive migration plan enabled the company to achieve remarkable improvements in keyword rankings, traffic, and overall website performance. This case study provides valuable insights and serves as a model for effectively managing large-scale website migrations in the e-commerce sector, particularly when dealing with complex, multi-faceted websites in the luxury fashion space.

SEO Case Study: Migrating Tradesy to Vestiaire Collective in 6 Months was originally published in Vestiaire Connected on Medium, where people are continuing the conversation by highlighting and responding to this story.

Balancing Innovation and Scalability: Developing Vestiaire Collective’s Photo Library Feature on…

Ali Fakih — Wed, 19 Apr 2023 09:53:41 GMT

Balancing Innovation and Scalability: Developing Vestiaire Collective’s Photo Library Feature on iOS

Apple Photos and Vestiaire Collective

As an experienced iOS engineer who has navigated the fast-paced, iterative world of agile development, you know that creating a feature with intricate technicalities can be a daunting task. Whether it’s utilizing native libraries or third-party tools, the learning curve can be steep. I want to take you on a journey through the development of the Photo Library feature at Vestiaire Collective and share the strategies that I found to be effective. Remember, there’s no one-size-fits-all solution, but hopefully, my experience will provide valuable insights and inspiration for your next project. Let’s dive in!

RFC

An essential aspect of any successful iOS development team is the creation and adherence to an RFC (Request for Comments) process. But what exactly is an RFC and why is it so crucial? An RFC is a document that outlines the proposed feature, its impact on users, and the technical and architectural plan for its development. The term originated from the days of ARPANET (Advanced Research Projects Agency Network), where researchers would present ideas for discussion and feedback from their peers. In the team standards, it is a requirement that an RFC is created and approved before development can begin. This may seem like an added step that could potentially slow down the development process, but it serves a critical purpose. It ensures that all necessary research and planning have been done. It also helps to identify and address potential issues before they become costly and time-consuming problems down the line.

RFC to the Rescue: How to Avoid Common Pitfalls in iOS Development

When a developer (let’s call them “A”) is tasked with creating a new feature, they may jump straight into coding without conducting proper research. This leads to a flurry of questions and feedback from the other iOS engineers on the team during the review process. This constant back-and-forth can be incredibly disruptive to the developer’s focus and attention and can leave them feeling like they’re walking on eggshells. Furthermore, if these changes end up altering the core of the initial plan, it can lead to two types of bugs: those that are hard to detect during testing due to time constraints, and those that slow down the overall progress of the feature’s development. This can be a risky and nerve-wracking experience for the developer.

With a little reflection, it becomes clear that the time spent on creating an RFC may seem like a wasted effort in the beginning, but it is actually a valuable investment in terms of onboarding the team, sharing ideas, and receiving feedback. It’s important to remember that this is just one example of the many benefits that an RFC can provide.

“None of us is as smart as all of us.” — Ken Blanchard

Coding Time

First steps with the Photos framework on iOS

Now let’s go on a journey through the process of developing a photo library. To begin, let’s review the fundamental methods of fetching photos. One way to retrieve photos from the iOS native photo library is by utilizing the Photos framework provided by Apple. This framework grants access to the user’s photos and videos, enabling various operations such as retrieving metadata and editing them.

Here’s an example of how to fetch all photos from the user’s library:

1. Import the Photos framework in your ViewController:

import Photos

2. Create a PHFetchOptions object to specify the options for fetching the photos. For example, you can set the sort descriptor to sort the photos by date:

let fetchOptions = PHFetchOptions()
fetchOptions.sortDescriptors = [NSSortDescriptor(key: "creationDate", ascending: false)]

3. Call PHAsset.fetchAssets, which returns a PHFetchResult containing the photos from the library:

let fetchResult = PHAsset.fetchAssets(with: .image, options: fetchOptions)

4. Iterate through the fetchResult object to access each photo:

fetchResult.enumerateObjects { (asset, _, _) in
    // do something with the asset, for example, retrieve the image by calling the following function
    let image = getAssetThumbnail(asset: asset)
}

5. To retrieve the image from PHAsset, you can use the following function:

func getAssetThumbnail(asset: PHAsset) -> UIImage {
    let manager = PHImageManager.default()
    let option = PHImageRequestOptions()
    // Synchronous, so the handler runs before we return; avoid calling this on the main thread
    option.isSynchronous = true
    var thumbnail = UIImage()
    manager.requestImage(for: asset, targetSize: CGSize(width: 100, height: 100), contentMode: .aspectFit, options: option) { result, _ in
        // result can be nil (for example, if the asset is unavailable), so avoid force-unwrapping it
        if let result = result {
            thumbnail = result
        }
    }
    return thumbnail
}

Note: You should check for the user’s authorization before accessing their photo library.
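As a minimal sketch of that authorization check, the code below asks for read/write access and maps the system status to a small app-level enum. `LibraryAccess`, `canReadLibrary`, and `requestLibraryAccess` are names invented for this example. The Photos-specific part relies on the iOS 14+ `PHPhotoLibrary.requestAuthorization(for:handler:)` API, and the app also needs an `NSPhotoLibraryUsageDescription` entry in its Info.plist.

```swift
#if canImport(Photos)
import Photos
#endif

/// Simplified view of the library permission, independent of Photos types.
/// This enum is an assumption for the sketch, not part of the framework.
enum LibraryAccess {
    case full, limited, denied, notDetermined
}

/// Reading is possible with full access or with the user's limited selection.
func canReadLibrary(_ access: LibraryAccess) -> Bool {
    access == .full || access == .limited
}

#if canImport(Photos)
/// Maps the system authorization status to the app-level enum.
@available(iOS 14, macOS 11, tvOS 14, *)
func libraryAccess(from status: PHAuthorizationStatus) -> LibraryAccess {
    switch status {
    case .authorized: return .full
    case .limited: return .limited
    case .denied, .restricted: return .denied
    case .notDetermined: return .notDetermined
    @unknown default: return .denied
    }
}

/// Prompts the user if needed, then reports the resulting access level.
@available(iOS 14, macOS 11, tvOS 14, *)
func requestLibraryAccess(completion: @escaping (LibraryAccess) -> Void) {
    PHPhotoLibrary.requestAuthorization(for: .readWrite) { status in
        completion(libraryAccess(from: status))
    }
}
#endif
```

Fetching would then only start when `canReadLibrary` returns true for the reported access level.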

Straightforward, right? It’s a good place to start.

Defining the architecture of your photo library

With these steps in mind, you need to think about refactoring it and scaling it so it fits the project architecture. Here, you will need to brainstorm your ideas and try to explain every thought to yourself to make sure that you’ve found the right way forward. You will need to convert your thoughts into an architectural approach that you can explain to your team.

Here’s my architectural plan after reading the Apple documentation and researching existing solutions:

UML Diagram

In this diagram, you can see a new way of implementing the functional PHAsset and PHAssetCollection.

Note: I will not dive deep into writing the code.

In the bottom left, the MediaPickerCoordinator is where the navigation is handled (check the coordinator pattern). Inside the coordinator, we initiate two properties:

MediaManager
LibraryDataLogic

The MediaManager is similar to a functional repository that will handle your requests to fetch PHAsset, PHAssetCollection, or the PHAsset objects of a specific PHAssetCollection.

On the bottom right of the diagram, the MediaManager holds two kinds of repositories:

1. PhotoMediaRepository

2. AlbumMediaRepository

Each repository is responsible for providing the requested data and passing them to the MediaManager to convert them to a friendly model so that the UI can consume it.

PhotoMediaRepository

Three functions will be contained by this class:

1. loadLibrary

2. getPhotoPaginated

3. getFirstMedia

loadLibrary returns the number of assets in the library along with the PHFetchResult that holds all of them. It exists only because we previously used a different approach, and it will be deleted after the full release.

getPhotoPaginated, as per its name, will return a paginated MediaProtocol. MediaProtocol is a type wrapper of the PHAsset.

getFirstMedia will return the first media from a PHAssetResult.
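To give an idea of what pagination means here, the sketch below turns a page request into a range of asset indexes. The `LibraryMediaPagination` type is not shown in this article, so its shape here (zero-based `page` plus `pageSize`) is an assumption, as is the helper name `indexRange`.

```swift
/// Hypothetical shape for the pagination value; the real
/// LibraryMediaPagination type is not shown in this article.
struct LibraryMediaPagination {
    let page: Int      // zero-based page index
    let pageSize: Int  // number of assets per page
}

/// Returns the range of asset indexes covered by a page, or nil when the
/// request is invalid or the page starts beyond the end of the library.
func indexRange(for pagination: LibraryMediaPagination, totalCount: Int) -> Range<Int>? {
    guard pagination.page >= 0, pagination.pageSize > 0 else { return nil }
    let start = pagination.page * pagination.pageSize
    guard start < totalCount else { return nil }
    return start..<min(start + pagination.pageSize, totalCount)
}

// The last page of a 120-asset library with 50 assets per page is shorter.
let lastPage = indexRange(for: LibraryMediaPagination(page: 2, pageSize: 50), totalCount: 120)
print(String(describing: lastPage))
```

With Photos, such a range could be converted with `IndexSet(integersIn:)` and passed to `PHFetchResult.objects(at:)`, so that only one page of assets is materialized at a time.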

AlbumMediaRepository

This class contains one main function named loadCollections and will return an array of MediaCollectionProtocol type (which is a type wrapper of PHAssetCollection).

Building the LibraryMediaManager

Now that we have these two repositories, we can start building the LibraryMediaManager, which will feed us with several properties and let us take action upon album selection.

Here are the provided functions:

1. func loadLibrary() -> Int

2. func loadMedia(from collection: MediaCollectionProtocol) -> Int

3. func getPhotoPaginated(pagination: LibraryMediaPagination) throws -> [T]

4. func getFirstMedia(from mediaResult: PHAssetResult) throws -> T

5. func loadCollections(collectionType: MediaCollection.CollectionType) -> [T]

6. func getRecentCollection() throws -> T

These functions are called from a DataLogic class that communicates with the ViewModel accordingly. The MediaManager also exposes delegates to refresh the results when the library changes.

This architecture allows for a separation of concerns with the following split:

The MediaManager handles the communication with the photo library.
The repositories handle the specific data fetching.
The DataLogic and ViewModel handle the application logic and presentation of the data.

This makes the code more organized, maintainable, and easy to test.

This design also allows for flexibility in the future, as new features or changes to the photo library can be easily implemented in the MediaManager and repositories without affecting the rest of the codebase.

By using a functional repository pattern, the MediaManager can be reused across different parts of the app. It also allows for easy testing as the MediaManager and repositories can be tested independently of the rest of the codebase.
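The testability argument can be illustrated with a small sketch in which the manager depends on a repository protocol, so a fake repository can stand in for the real photo library. The protocol shapes, the `identifier` property, and the `firstPageIdentifiers` method are simplifications invented for this example and do not mirror our production code.

```swift
/// Stand-in for the MediaProtocol wrapper; `identifier` is an assumption.
protocol MediaProtocol {
    var identifier: String { get }
}

/// What the manager needs from a photo repository (hypothetical shape).
protocol PhotoMediaRepositoryProtocol {
    func loadLibrary() -> Int
    func getPhotoPaginated(offset: Int, limit: Int) -> [MediaProtocol]
}

/// The manager only knows the protocol, never the concrete repository.
final class MediaManager {
    private let photos: PhotoMediaRepositoryProtocol

    init(photos: PhotoMediaRepositoryProtocol) {
        self.photos = photos
    }

    /// Returns the identifiers of the first page, or nothing for an empty library.
    func firstPageIdentifiers(limit: Int) -> [String] {
        guard photos.loadLibrary() > 0 else { return [] }
        return photos.getPhotoPaginated(offset: 0, limit: limit).map { $0.identifier }
    }
}

// Test doubles: no Photos framework involved.
struct FakeMedia: MediaProtocol {
    let identifier: String
}

struct FakePhotoRepository: PhotoMediaRepositoryProtocol {
    let items: [FakeMedia]

    func loadLibrary() -> Int { items.count }

    func getPhotoPaginated(offset: Int, limit: Int) -> [MediaProtocol] {
        let page: [FakeMedia] = Array(items.dropFirst(offset).prefix(limit))
        return page
    }
}

let fakeItems = ["a", "b", "c"].map { FakeMedia(identifier: $0) }
let manager = MediaManager(photos: FakePhotoRepository(items: fakeItems))
print(manager.firstPageIdentifiers(limit: 2))
```

In production, the same manager would receive a repository backed by PHAsset fetches instead of the fake one.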

In summary, the MediaManager is designed to handle the communication with the photo library, the PhotoMediaRepository and AlbumMediaRepository handle the specific data fetching, and the DataLogic and ViewModel handle the application logic and presentation of the data. This design allows for a separation of concerns, flexibility, and maintainability in the codebase.

Testing and troubleshooting

With this architectural setup, all our use cases were succeeding without any issues. Surprisingly, however, one problem popped up at the last moment before release.

It’s important to keep in mind that even with a solid architecture, bugs may still appear, but having a good structure in place can make it easier to track down and fix them. Additionally, keeping in mind the user experience, accessibility, and security will help you deliver a better product.

During testing, we primarily utilized test devices but occasionally used personal devices. We encountered no performance issues as the number of media on these devices did not exceed 4,000 photos. However, when a teammate tested the feature on their device containing 70,000 photos, it became clear that there were significant performance issues, specifically that the library screen took 2–3 seconds (and sometimes longer) to open. This bug was unexpected as performance considerations were not explicitly detailed in Apple’s documentation or other sources.

Here are my findings on improving fetching performance.

Use caching

In order to apply caching using the Photos framework’s built-in mechanism, you can use the PHCachingImageManager class. This class allows you to cache multiple assets at once, and automatically caches them for fast access.

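let manager = PHCachingImageManager()
let fetchResult = PHAsset.fetchAssets(with: .image, options: nil)
// startCachingImages expects an array of PHAsset, so convert the fetch result first
let assets = fetchResult.objects(at: IndexSet(integersIn: 0..<fetchResult.count))
manager.startCachingImages(for: assets, targetSize: CGSize(width: 100, height: 100), contentMode: .aspectFill, options: nil)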

This will cache all assets of type “image” with a target size of 100x100 pixels and content mode of aspectFill.

The PHCachingImageManager class also provides the stopCachingImages method to stop caching images when they are no longer needed. This can help to improve the performance of your app by only caching the images that are currently being displayed.
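One possible way to cache only what is close to the screen is to keep a "preheat" window around the visible items and, on each scroll update, compute which indexes enter and leave that window. The sketch below covers only this index bookkeeping. The function names and the padding value are assumptions for illustration.

```swift
/// Expands the visible range by `padding` items on each side,
/// clamped to the bounds of the library.
func preheatRange(visible: Range<Int>, padding: Int, totalCount: Int) -> Range<Int> {
    let lower = max(visible.lowerBound - padding, 0)
    let upper = min(visible.upperBound + padding, totalCount)
    return lower..<max(upper, lower)
}

/// Compares the previously cached range with the new one and returns the
/// indexes to start caching and the indexes to stop caching.
func cachingDelta(previous: Range<Int>, current: Range<Int>) -> (start: [Int], stop: [Int]) {
    let old = Set(previous)
    let new = Set(current)
    return (start: new.subtracting(old).sorted(),
            stop: old.subtracting(new).sorted())
}

// The user scrolls from items 10..<20 to items 20..<30.
let before = preheatRange(visible: 10..<20, padding: 5, totalCount: 100)
let after = preheatRange(visible: 20..<30, padding: 5, totalCount: 100)
let delta = cachingDelta(previous: before, current: after)
print(delta.start, delta.stop)
```

The assets at the `start` indexes would go to `startCachingImages(for:targetSize:contentMode:options:)` and those at the `stop` indexes to `stopCachingImages(for:targetSize:contentMode:options:)`, using the same target size and content mode for both calls.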

It’s worth noting that when using the PHCachingImageManager class, you should consider your app’s specific use case and your company’s requirements. Besides, it’s important to test the app on different device models and iOS versions to make sure that it works correctly. Also, make sure to consider the performance and memory usage when working with large amounts of data.

Third-party caching libraries such as SDWebImage or Kingfisher are designed for images loaded from a URL rather than from the photo library, and they come with built-in caching. For example, SDWebImage provides a UIImageView extension that allows you to load an image from a URL and automatically cache it for future use.

import SDWebImage

let imageView = UIImageView()
imageView.sd_setImage(with: imageURL)

Kingfisher provides similar functionality: you can load an image from a URL, and it will be cached for future use.

import Kingfisher

let imageView = UIImageView()
imageView.kf.setImage(with: imageURL)

Use the correct target size

When fetching assets from the Photo Library, you can specify the target size of the image by using the PHImageManager class. The target size is specified in pixels and represents the dimensions of the image you want to retrieve. By specifying a smaller target size, you can reduce the time it takes to fetch the image.

Here’s an example of how to fetch an image with a target size of 100x100 pixels:

let manager = PHImageManager.default()
let options = PHImageRequestOptions()
options.deliveryMode = .fastFormat
options.resizeMode = .fast
manager.requestImage(for: asset, targetSize: CGSize(width: 100, height: 100), contentMode: .aspectFill, options: options) { (image, _) in
   // Use the image here
}

It’s important to note that specifying a smaller target size may result in a lower-resolution image. For this reason, it’s always better to consider the specific use case of your app and the needs of your users when choosing the target size. You should also consider that using a small target size will have a positive impact on the performance of your app, but using too small a target size might not be enough to display the image correctly in some scenarios.
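Because the target size is expressed in pixels while layouts are expressed in points, one common way to pick it is to multiply the on-screen size by the display scale. The helper below is a minimal sketch of that conversion. The function name is hypothetical, and on iOS the scale would typically come from something like `traitCollection.displayScale`.

```swift
import Foundation

/// Converts an on-screen size in points to the pixel size to request.
func pixelTargetSize(forPointSize size: CGSize, screenScale: CGFloat) -> CGSize {
    CGSize(width: size.width * screenScale, height: size.height * screenScale)
}

// A 100x100-point cell on a 3x display needs a 300x300-pixel image.
let cellTargetSize = pixelTargetSize(forPointSize: CGSize(width: 100, height: 100), screenScale: 3)
print(cellTargetSize.width, cellTargetSize.height)
```

Requesting exactly what the cell can display avoids both blurry thumbnails (size too small) and wasted decoding work (size too large).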

Implement asynchronous requests

Using asynchronous requests to fetch assets from the photo library is a good way to improve the responsiveness of your app. A synchronous request blocks the calling thread until it completes, so making one on the main thread freezes the UI and results in a poor user experience.

By using asynchronous requests, the app can continue running while the request is being processed in the background, resulting in a more responsive user interface.

The Photos framework provides several ways to make asynchronous requests. One of the most common ways is to use the PHImageManager class to make an asynchronous request for an image. Here’s an example:

let manager = PHImageManager.default()
let options = PHImageRequestOptions()
options.deliveryMode = .highQualityFormat
options.isSynchronous = false
manager.requestImage(for: asset, targetSize: CGSize(width: 100, height: 100), contentMode: .aspectFill, options: options) { (image, _) in
  // Use the image here
}

In the above code snippet, the isSynchronous property is set to false, which is also its default value. This makes the request asynchronous. You can also specify other options, such as the delivery mode, to further optimize the request.

You can also use third-party libraries such as Alamofire to make asynchronous network requests, for example when images come from a server rather than from the photo library.

In all cases, it’s important to note that using asynchronous requests will generally improve performance. It’s also essential to handle the responses correctly and not overload the system with too many requests at once.

You can use the options provided by PHImageRequestOptions to specify how the image should be delivered and to optimize the fetching process.

The PHImageRequestOptions class provides various options such as deliveryMode, resizeMode, isNetworkAccessAllowed, version and more which you can use to fine-tune the request.

For example, you can set the deliveryMode to .fastFormat which will increase the fetching speed.

let options = PHImageRequestOptions()
options.deliveryMode = .fastFormat

This will tell the image manager to deliver the requested image as quickly as possible. When this option is set, the image manager may return a lower-quality image.

Another option you can use is resizeMode. You can set it to .fast, which lets the image manager resize the image efficiently to a size close to (or slightly larger than) the target size instead of matching it exactly.

options.resizeMode = .fast

You can also set isNetworkAccessAllowed to true to allow the image manager to download the image from iCloud if it’s not available locally.

options.isNetworkAccessAllowed = true

PHImageRequestOptions provides a version property too. This property is used to request a specific version of an asset.

options.version = .original

It’s important to note that when using the PHImageRequestOptions class, you should consider your app’s specific use case and your company’s requirements. Additionally, you should test the app on different device models and iOS versions to make sure that it works correctly. Make sure to also consider the performance and memory usage when working with large amounts of data.

Choose the correct image format

The format of the image you fetch can have a significant impact on the performance of your app. When fetching images that will be displayed on the screen, you should use a format that is optimized for screen display, such as JPEG or PNG. These formats are widely supported and take up far less space than uncompressed formats like TIFF.

JPEG is a lossy format, meaning that it discards some image data to reduce file size. It’s good for images with a lot of color detail, such as photographs. It’s also supported by all web browsers and most image-editing software.

PNG is a lossless format, meaning that it preserves all the data in the image. It’s good for images with solid colors and simple shapes. It’s supported by all web browsers and most image-editing software as well.

On the other hand, if you are fetching images that will be processed or manipulated, you should use a format such as HEIF or TIFF that can provide higher quality. HEIF is a newer format, standardized by MPEG and adopted by Apple as the default camera format since iOS 11. It’s designed for high-quality images and image sequences, and it’s also more efficient than JPEG and TIFF in terms of storage space.

TIFF is typically a lossless format that is good for images that need to be edited and manipulated. It’s also good for images with a lot of color detail, such as photographs, but most web browsers cannot display it, even though many image-editing tools support it.

It’s important to consider the specific use case of your app and the requirements of your company when choosing the format of the image you fetch. Additionally, you should test the app on different device models and iOS versions to make sure that it works correctly and also to consider the performance and memory usage when working with large amounts of data.

Leverage the content mode option

When fetching an image, you can use the content mode option to specify how the image should be fitted to the target size you request. Picking the mode that matches your layout helps you avoid requesting a larger image than you need, which can reduce the time it takes to fetch the image.

The Photos framework provides several content mode options that you can use, including:

PHImageContentMode.aspectFit: This content mode scales the image to fit within the specified size while preserving the aspect ratio of the original image. This can be useful for displaying small thumbnails where the entire image should be visible but doesn’t have to fill the entire space.
PHImageContentMode.aspectFill: This content mode scales the image to completely fill the specified size while preserving the aspect ratio of the original image, so parts of the image may extend beyond the requested size and be cropped when displayed. This can be useful for grid cells or full-screen views where the image should cover the whole space.
PHImageContentMode.default: This is the default content mode, and it behaves the same as aspectFit.
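The difference between the two modes comes down to which scale ratio is kept. The sketch below reproduces that arithmetic outside of the Photos framework. `FitMode` and `scaledSize` are names invented for this example, and the size Photos actually returns can differ slightly depending on the resize mode.

```swift
import Foundation

/// Local stand-in for the two content modes discussed above.
enum FitMode {
    case aspectFit, aspectFill
}

/// Scales `original` toward `target` while preserving the aspect ratio.
/// aspectFit keeps the smaller ratio (the whole image fits inside the target),
/// aspectFill keeps the larger one (the target is fully covered).
func scaledSize(original: CGSize, target: CGSize, mode: FitMode) -> CGSize {
    let widthRatio = target.width / original.width
    let heightRatio = target.height / original.height
    let ratio = mode == .aspectFit ? min(widthRatio, heightRatio) : max(widthRatio, heightRatio)
    return CGSize(width: original.width * ratio, height: original.height * ratio)
}

// A 400x200 landscape photo requested at 100x100.
let landscape = CGSize(width: 400, height: 200)
let square = CGSize(width: 100, height: 100)
let fitted = scaledSize(original: landscape, target: square, mode: .aspectFit)
let filled = scaledSize(original: landscape, target: square, mode: .aspectFill)
print(fitted, filled)
```

With aspectFit, the result is 100x50 and fits entirely inside the square. With aspectFill, it is 200x100, which covers the square but overflows horizontally.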

You can use the appropriate content mode based on your app’s specific use case and your company’s requirements. Additionally, you should test the app on different device models and iOS versions to make sure that it works correctly, and also to consider the performance and memory usage when working with large amounts of data.

It’s important to note that when specifying the content mode, the framework will try to return the image in the exact size you asked for, but if the image size is not available, the framework will scale the image to match the size you asked for. Scaling images can take a long time and consume more resources on a device, so you should use the appropriate content mode for the specific use case of your app.

Conclusion

Developing a photo library or any complex feature for an iOS app can be a challenging task, but with your skills as an iOS developer, you are well-suited to tackle it. Here are a few things to consider as you begin.

1. Understand the requirements: Before you start coding, make sure you have a clear understanding of what the photo library should do and how it should behave. Get answers to questions such as “Will users be able to upload their own photos?”, “Will the library include editing tools?”, “How will the photos be organized and displayed to users?”

2. Research existing solutions: Apple’s frameworks and many open-source libraries can help you implement a photo library in your app. Look into options such as the Photos framework, PhotosUI, and Kingfisher and see if they meet your needs.

3. Plan your architecture: Once you have a good understanding of the requirements and existing solutions, you can start planning the architecture of your library. Consider factors such as performance, scalability, and maintainability.

4. Write clean, well-commented code: As you begin coding, make sure to write clean, well-organized code that is easy to understand and maintain. Remember to add comments and documentation to help others understand your code.

5. Test and debug: As you develop the library, be sure to test it thoroughly, and debug any issues that you encounter. Make sure it works correctly on different device models and iOS versions.

6. Follow the company’s guidelines and use their tools. Also, it’s better to use the company’s RFC process to get feedback and approval before implementation.

7. Remember that coding is not the only element of your project. You should also consider the user experience, accessibility, and security.

8. Don’t hesitate to ask for help or feedback from other team members. They may have different experiences that could help you.

Balancing Innovation and Scalability: Developing Vestiaire Collective’s Photo Library Feature on… was originally published in Vestiaire Connected on Medium, where people are continuing the conversation by highlighting and responding to this story.

How we increased our Web Google Shopping conversion by 65%

Livio ERUTTI — Mon, 03 Apr 2023 13:51:06 GMT

Spoiler: do not underestimate the power of highly relevant product alternatives.

Picture by John Schnobrich — Unsplash

“Why is our Google Shopping bounce rate much higher than our other acquisition channels?”

“What’s wrong with our product landing pages: is it the design or the featured info?”

If you’ve ever made these comments when trying to make sense of poor acquisition campaign results, we have some good news: you’re not alone. We did too.

As part of our company’s Traffic Collective, our mission is to make buying second-hand items desirable and drive more sustainable fashion habits by growing a community of buyers and sellers. To fulfill our purpose, Google Shopping has always been one of our key acquisition channels; so much so that we strongly increased our investment in recent years, making it our first source of new visitors today.

However, we soon started to notice a low conversion rate on this channel, which limited our ability to scale it. We decided to launch a full user journey audit to optimize our Google Shopping campaign.

In this article, we’ve gathered some key learnings on how we tackled this landing page improvement opportunity. We’ll emphasize how we identified the problem and tested our new approach to validate the performance of the new Google Shopping campaign.

Initial user journey

Re-clarifying the user journey was the first step in our roadmap. The overall funnel was rather classic, but we noticed distinct behaviors when visitors landed on the product pages (PDPs). We split them into three categories: Happy Case, Successful Alternative, and Unsuccessful Alternative.

You can find the full description of the flow in the image below.

High level customer journey from Google Shopping

Understanding the user pain points

When buying fashion, users look for items matching specific criteria: size, color, expected condition, etc. In that context, it is unlikely that the first item they encounter will be the right one. They want to browse alternatives.

But we already covered that pattern thanks to how we designed our product pages… Or did we?

Well, here’s what we found. When landing on a product page (PDP), visitors could:

Follow the breadcrumbs to go back to a catalog page, but the usage was very low.
Start a new search/browsing session, but this also had limited adoption.
Browse recommendations, but they were below the fold and not always available.
Go back to Google Shopping, which was the main journey.

And there it was: the main source of our problems. Because we were not answering our users’ need to easily find product alternatives on Vestiaire Collective, our Google Shopping campaigns resulted in high bounce and low conversion rates.

From then on, we focused on one single goal: help our users find product alternatives by browsing our platform.

What we learned from the data

Vestiaire Collective Listing page

We further analyzed the behavior of our clients and discovered that users landing on the listing pages had a much lower bounce rate and a higher CVR than those landing on the product pages. So we were able to formulate the following hypothesis: if users can more easily find product alternatives on our catalog pages, they’ll spend more time discovering our platform and purchasing items.

Our solution design

We opted for a first new design showing a listing of items from the same brand and category as the clicked item in our campaign landing pages. We knew this improvement could bring value as these criteria are the most used in users’ searches.

The technical solution had a lot of flexibility as any catalog path could be associated with any product allowing us to test multiple combinations (adding/removing criteria).

Vestiaire Collective Shopping landing page
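To make the flexibility of that mapping more concrete, here is a minimal sketch of how a landing path could be derived from a product's attributes. The URL scheme, the `Product` fields, and the `landing_path` helper are all hypothetical and only illustrate the idea of adding or removing criteria; they are not the actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Product:
    # Hypothetical attributes; the real product model is not described in the article.
    brand: str
    category: str
    model: Optional[str] = None


def slugify(value: str) -> str:
    """Lower-case a label and join its words with hyphens."""
    return "-".join(value.lower().split())


def landing_path(product: Product, criteria: tuple[str, ...] = ("brand", "category")) -> str:
    """Build a catalog path from the selected criteria, skipping missing values."""
    parts = []
    for name in criteria:
        value = getattr(product, name)
        if value:
            parts.append(slugify(value))
    if not parts:
        return "/"
    return "/" + "/".join(parts) + "/"
```

Passing `criteria=("brand", "category", "model")` would test a narrower listing, while a product without a model silently falls back to brand and category.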

Testing approach and results

When we started the tests, our website did not have proper A/B testing capabilities, so we had to find a workaround that leveraged the A/B testing capabilities of our product feed partner.

You can have a look at the full flow below.

Testing approach illustration

Verdict? The results were excellent! We observed a substantial uplift in our Shopping traffic KPIs:

-20% Bounce rate
+24% Product page views per session
+65% Conversion rate of New visitors to New buyers within seven days

Next up? Iterate and learn

We believed there were still improvements we could make to optimize the conversion rate. Because of limited A/B testing capabilities at the time, we had to make trade-offs to be able to move on while keeping a rational approach.

1. Adding a model criterion

Bags are one of the top-selling categories on our platform. Also, half of the bags sold on Vestiaire Collective have an identified model, which is a key purchasing criterion for users. We hypothesized that filtering on the brand, category, and model could improve the relevance of the content when that information was available.

Consequently, we launched another Productsup A/B test, measuring the ads’ ROI metric in Italy and the US.

Conclusion: Results were significantly better, so we decided to roll out this change.

2. Showing the Welcome Offer on the Google Shopping landing page

New users can benefit from a special offer for their first order on the platform, which is a strong incentive to browse and convert. This discount was already being highlighted on the classic product page but not on our Shopping landing page. We launched a geo test and saw a positive trend in geographies where the Welcome Offer was displayed. This confirmed data from previous A/B tests indicating that showing the Welcome Offer positively impacts new users’ conversion in general.

Conclusion: We also chose to roll out this change.

Key takeaways

Here are the top three takeaways to remember from this Google Shopping campaigns revamp journey.

Mobile-first doesn’t mean mobile only: Although Vestiaire Collective has a mobile-first strategy, our web platform represents 85% of our new traffic. Hence, optimizing it to help new users discover our products and keep growing our business is crucial.
Do not underestimate the complexity of analyzing A/B test results: This was a challenge for this project, as we had to analyze browsing behaviors on our website while the traffic split was done in our feed management tool.
Showing the Welcome Offer from the beginning of the journey is a key incentive for users to purchase items.

Picture by Hannah Morgan — Unsplash

How we increased our Web Google Shopping conversion by 65% was originally published in Vestiaire Connected on Medium, where people are continuing the conversation by highlighting and responding to this story.

How we scaled a Data microservice on Kubernetes

Tanakorn Kriengkomol — Thu, 30 Mar 2023 07:58:58 GMT

The story of how our data team performed load testing to validate the scalability of one of their key microservices.

Kubernetes services | Growtika via Unsplash

What is load testing?

Load testing is a type of performance test that measures how software responds under different real-world load conditions.

This phase is highly important in the lifecycle of a microservice. It is the main piece of the puzzle for ensuring that the software will handle the load it is expected to face in reality.

At Vestiaire Collective, Predator is our official tool for all load tests, and it’s maintained by our Vestiaire Collective platform team. It’s a powerful, flexible tool that we leverage to perform unlimited tests at low cost.

Predator UI

In this article, we’re going to walk through each load testing step of our CRIME service. CRIME is an in-house software dedicated to flagging counterfeit products. Recently, we developed a new Machine Learning model and implemented it into CRIME.

The goal of this post is to share the strategy and learnings that came along with the load testing process of CRIME, so that you can better understand how we make sure that all of the microservices we ship to Production are scalable.

At the end of the performance improvement phase, we wanted CRIME to:

1. Be able to handle 50 RPS.

2. Have a p95 response time < 1 second.
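As a reminder of what these two targets mean in practice, here is a minimal sketch that checks a batch of measured latencies against them. It assumes the nearest-rank definition of a percentile and a simple "requests completed divided by test duration" definition of RPS; the function names are illustrative, and tools such as Predator compute these figures for you.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def meets_targets(
    latencies_s: list[float],
    duration_s: float,
    target_rps: float = 50.0,
    target_p95_s: float = 1.0,
) -> bool:
    """Check both goals: sustained throughput and p95 latency strictly under the threshold."""
    achieved_rps = len(latencies_s) / duration_s
    return achieved_rps >= target_rps and percentile(latencies_s, 95) < target_p95_s
```

A p95 target tolerates a small tail of slow requests, which is why it is usually preferred over the average or the maximum latency.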

A little context

In the next sections, we will mention elements that play a role in the CRIME microservice architecture, namely Snowflake, DatAPI, Pricing service, DataDog and Grafana. It’s important for you to get a rough idea of why they’re important to better understand the load testing process of CRIME.

Here’s a brief glossary of those elements. Don’t hesitate to come back to these definitions later in your reading!

Technical glossary

Since a picture is worth a thousand words, here’s a diagram of the CRIME service architecture.

CRIME service architecture

Let’s put CRIME to the test

The test was done in the production environment by gradually increasing the load on CRIME. This way, we could ensure the smallest possible impact on other microservices relying on it.

You can find the results in the table below.

First load testing results for CRIME

*At 25 RPS (Requests Per Second), the load caused high latency on the CRIME service and affected other production calls. When checked against other dependent services, here’s what we saw.

Database query latency

DatAPI uses PostgreSQL as a back-end database.

The table that is used for serving features of CRIME doesn’t have an index on the key column.

Spike in CPU usage

CPU utilization of CRIME service went up by a large margin compared to the usual load.

Metrics from Grafana

Metrics from Datadog

Other metrics of the service during the load test

The spikes in figures 4 and 5 represent the increase in CPU usage for the following load-testing scenarios: 10 RPS, 20 RPS, and 25 RPS (canceled early) respectively. Both Grafana and Datadog pictures show roughly the same period.

What we concluded

CRIME was able to serve at most around 20 RPS for a short time (testing tasks lasted 5 minutes each) and was very sensitive to DatAPI performance.

Possible improvements

Thanks to the various tests, we were able to identify three different areas of improvement that could boost CRIME’s performance.

1. Add an index on the table in PostgreSQL.

2. Change the configuration of our pods’ CPU and Memory.

3. Increase the number of serving pods for CRIME and DatAPI.
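To illustrate why the first improvement matters, here is a minimal sketch that compares the query plan of a key lookup before and after adding an index. It uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL, and the table and column names (`features`, `product_id`) are hypothetical, since the article does not name them.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> str:
    """Return the planner's description of how it would run the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row holds the human-readable plan step.
    return "; ".join(row[3] for row in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (product_id INTEGER, score REAL)")
conn.executemany(
    "INSERT INTO features VALUES (?, ?)",
    [(i, i * 0.001) for i in range(1000)],
)

lookup = "SELECT score FROM features WHERE product_id = ?"

# Without an index, the planner has to scan the whole table for each lookup.
plan_before = query_plan(conn, lookup, (42,))

# With an index on the key column, the planner can search the index instead.
conn.execute("CREATE INDEX idx_features_product_id ON features (product_id)")
plan_after = query_plan(conn, lookup, (42,))

print(plan_before)
print(plan_after)
```

The same principle applies in PostgreSQL, where `EXPLAIN` would show a sequential scan turning into an index scan once the index exists and the planner judges it worthwhile.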

Optimizing CRIME in preproduction environment

Every time we get to the optimization phase, our idea is to get a general feeling of what change will be most impactful. That’s why we decided to optimize CRIME based on two axes: CPU and max replicas.

The tests were all done using the settings below in Predator.

Starting RPS: 10 RPS

Ramp to: 100 RPS

Duration: 10 min
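For intuition about what this profile sends, here is a minimal sketch that assumes the ramp is linear over the whole duration (an assumption on our side; the exact ramp shape depends on the load-testing tool). The helper names are illustrative.

```python
def ramp_rps(t_s: float, start_rps: float = 10.0, end_rps: float = 100.0, duration_s: float = 600.0) -> float:
    """Request rate at time t_s, assuming a linear ramp from start_rps to end_rps."""
    if not 0 <= t_s <= duration_s:
        raise ValueError("t_s must be within the test duration")
    return start_rps + (end_rps - start_rps) * t_s / duration_s


def total_requests(start_rps: float = 10.0, end_rps: float = 100.0, duration_s: float = 600.0) -> float:
    """Area under the linear ramp: average rate multiplied by duration."""
    return (start_rps + end_rps) / 2 * duration_s


def time_to_reach(target_rps: float, start_rps: float = 10.0, end_rps: float = 100.0, duration_s: float = 600.0) -> float:
    """Seconds into the test at which the ramp reaches target_rps."""
    if not start_rps <= target_rps <= end_rps:
        raise ValueError("target_rps must lie between start_rps and end_rps")
    return (target_rps - start_rps) / (end_rps - start_rps) * duration_s
```

Under these assumptions, the test crosses the 50 RPS target a little under halfway through, so a service that saturates below that level shows the bottleneck early in the run.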

Baseline: Initial configuration before optimization

First, we set the initial CRIME preproduction pod configuration to match the production. It was useful to start optimizing in a baseline environment as close to the production environment as possible. We later tweaked this configuration to improve CRIME’s performance.

Here is the initial configuration of the CRIME pod.

CPU: 200m, 400m

Memory: 500Mi, 1Gi

Min Replicas: 2

Max Replicas: 3

Target CPU Utilization: 70%

Results

RPS maxed out at around 27–30 RPS and caused a bottleneck that made the rest of the requests stagnate. CPU was also maxed out and could not serve more requests.

Initial configuration

CPU utilization for each pod — Initial configuration

Optimization #1: Increase max replica pods

We increased the maximum number of replicas from 3 to 10.

CPU: 200m, 400m

Memory: 500Mi, 1Gi

Min Replicas: 2

Max Replicas: 10

Target CPU Utilization: 70%

Results

Increasing max replicas did help to a certain extent but the response time was still too high.

Also, the service never scaled up to the new maximum of 10 replicas: it maxed out at around 6–7 instances.

Only increase max replicas

CPU utilization by each pod — Increase replica
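One plausible reading of why the replica count stopped short of the maximum is the way the Kubernetes Horizontal Pod Autoscaler computes its target, assuming that is what drives the scaling here. Kubernetes documents the rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with a default tolerance of 10% around the target. The sketch below reimplements that rule; the utilization figures used with it are made up for illustration and are not measurements from CRIME.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_utilization: float,
    target_utilization: float = 70.0,
    min_replicas: int = 2,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Replica count the autoscaler would aim for, following the documented HPA rule."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: the autoscaler leaves the replica count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # The result is always clamped to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

With 3 replicas averaging 140% of their CPU request, the rule asks for 6 replicas. Once the average settles near the 70% target, no further scale-up is requested, even though the maximum of 10 has not been reached.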

Optimization #2: Increase CPU size

We then increased the CPU from 200m, 400m to 700m, 1200m.

CPU: 700m, 1200m

Memory: 500Mi, 1Gi

Min Replicas: 2

Max Replicas: 3

Target CPU Utilization: 70%

Results

Increasing CPU size seemed to help much more than purely increasing the maximum number of replicas. With this configuration, we concluded that we should be able to serve requests at a maximum of 50 RPS, which was our target!

Only increase CPU size

CPU Utilization by each pod — Increase CPU

Note

Most of the time within requests was spent waiting for external calls to return their outputs. The performance bottleneck was not caused by the model inference as it took less than 50 ms to complete for most of the calls.

Results of optimization in production

After moving from initial findings to the testing of multiple configurations in CRIME service, we successfully reached the end of our optimization process.

CRIME could now handle loads of 40 RPS, provided that there were no load spikes on its external dependencies, i.e. DatAPI and the Pricing service.

The below charts are load testing results obtained from the production environment.

Most of the time, request latency was under 1 second. However, with the current implementation, we could still observe high latency spikes whenever CRIME spawned new instances. This happens because each CRIME container initializes and sets up its models for inference at startup (cf. cold start behavior).

Load test latency — final configuration

Load test RPS — final configuration

Final configuration

CPU: 500m, 1000m

Memory: 500Mi, 1Gi

Min Replicas: 2

Max Replicas: 10

Target CPU Utilization: 60%

Conclusion

Although we did not reach our target of 50 RPS, the throughput we achieved after optimization (40 RPS) is sufficient for our planned use case. The response time is also more than good enough, as most requests were answered within 1 second.

For the CRIME service, most of the bottlenecks came from insufficient CPU resources. Increasing the CPU size of each pod really helped scale up the load the service could handle. But as the service still depends on DatAPI, our next step will be to look into improving DatAPI to ensure that all our data team services work well together.

This optimization was only possible thanks to the good tooling available to us. Predator, as a load-testing tool, makes it very easy to iterate on a change and see its impact immediately. In addition, Datadog and Grafana, our service monitoring tools, give us a detailed view of service performance and valuable insights into where the bottlenecks are.

Picture by Fab Lentz via Unsplash

How we scaled a Data microservice on Kubernetes was originally published in Vestiaire Connected on Medium, where people are continuing the conversation by highlighting and responding to this story.

Designing Vestiaire Collective’s private messaging feature

Charly Leporc — Fri, 17 Mar 2023 15:57:41 GMT

Our journey to enable our community with a safe user-to-user chat.

What is Buyer-Seller Chat?

Are you a fellow user of the Vestiaire Collective app? In that case, you’re probably used to checking your messages in your notification center. Internally, we call this space buyer-seller chat.

New message in the Buyer-Seller chat

Buyer-seller chat is Vestiaire Collective’s private messaging feature on desktop and mobile. It is the go-to place for buyers and sellers to communicate about products for sale and orders in progress. Since its early days, it has become a key feature of the experience to boost transparency and conversion.

Our teams completed many design and development iterations to make the chat as safe and user-friendly as it is today. This article will shed some light on how we came up with the solution that is now live on Vestiaire Collective.

We hope you’ll take away some useful lessons from our failures and successes for your own projects.

So without further ado, let’s get started!

Defining the needs

The private messaging feature is one of the most critical features that Vestiaire Collective launched in 2020. Our members had been in need of this kind of feature for a long time.

Restricted interaction capacities

Before buyer-seller chat became available, our members had to use the comment sections on product pages to ask for more product details. The flow was a bit cumbersome. Plus, real-time communication was impossible because comments were moderated by our teams.

The other way our members could connect was via our negotiation rooms, called Make Me An Offer (MMAO), which allow our buyers and sellers to propose discounts on their items. Here again, members were very limited in their interactions because they could only exchange prices.

MMAO negotiation room

Business challenges: staying competitive while maintaining our safety standards

Most marketplace platforms offer a private messaging feature as their primary interaction tool. Consequently, this became an expectation for all users, and not having it put us behind in terms of feature parity.

Another concern was the risk of marketplace circumvention. The role of a marketplace platform is to allow safe transactions and to protect both sides of the platform. Sellers need to be paid and buyers need to receive the item they paid for.

Multiple questions remained that we could not answer unless we launched the new chat to market:

Since Vestiaire Collective is a global platform, could it limit the risk of people taking their trades outside the marketplace?
Vestiaire Collective’s average basket size is significantly larger than the competition’s. Would selling outside the platform consequently be riskier for both our buyers and sellers?
We provide control and authentication of our products. Shouldn’t that be a sufficient reason to keep sales on the platform?

Product challenges: UX coherence and efficient development

On top of those business questions, we also ran into product development challenges in developing such a feature without going for a complete revamp of the app:

How could we make our Make Me An Offer feature coexist with a private messaging feature?
What control mechanism should we implement to monitor potential marketplace circumvention?
Should we build the entire feature in-house or use a third party?

The design phase

One of my favorite quotes that I remember from my design class is, “a good design answers the brief”. For this project, the brief was “to allow buyers to start a chat and enable sellers to answer a chat without touching a single piece of the current Make Me an Offer feature code.”

The requirements immediately put challenging constraints on the design team, which was already considering blending the two experiences. Blending the experience of chatting and making a price offer would make perfect sense if we had to build the feature from scratch, but that would have led us to touch so many pieces of our architecture that the project would have taken months.

Prototype blending the chat with the negotiation feature

Funnily enough, what we thought was a constraint actually ended up being an important feature. Dissociating the Make Me An Offer feature from the private messaging feature would allow us to enable one or the other for different users and be more nimble in deciding certain rules.

The technical solution

While our design team was working on the mockups, our engineering team and our product manager looked at third-party solutions. As exciting as it sounded for the engineering team to develop such a feature, we also had some expectations regarding time to market and were convinced that finding a backbone would help us gain time.

We decided to shortlist two vendors and spent two sprints building a basic proof of concept with each provider to decide which one would be the right foundation for a feature that we knew was likely to stay for a while. We definitely did not want to choose the wrong architecture, so we gave our front-end, back-end, and mobile engineers time to play with each solution and decide which option they would feel most comfortable building on.

We based our comparisons on the following characteristics:

Scalability
The coding language used
The available features
The reactivity of the third-party tech team
The price of the solution

A few months after launching the feature, we were very happy about our decision and, more importantly, about the process we put in place. When working on such a project, it is essential to obtain approval from all stakeholders, so everyone feels engaged in delivering the best product possible.

This was also a good example of how important it is to dive deep into all the technical aspects of a third-party partner. A private messaging feature can potentially deal with thousands of messages simultaneously. The coding language of the service matters because it helps keep IT infrastructure costs as low as possible. Any increase in load on our partner’s infrastructure would inevitably result in additional costs on our side. With the wrong choice, we would be tied to a partner that couldn’t scale without impacting our ROI. This could be the topic of another post: how infrastructure cost is often forgotten when designing a feature or engaging in commercial activities.

Focusing on the most impactful KPIs from Day 1

As with every new product initiative, setting the proper KPIs is the first step to success. Within Vestiaire Collective, we tend to have different stakeholders for initiatives impacting our daily active users or conversion. Since such a feature could impact both engagement and conversion, it was even more vital for us to define the key metrics we would follow up on to iterate on our MVP. Do we want to create a feature that brings buyers and sellers back to the platform on a daily basis? Or do we prefer to optimise it to help sellers sell faster?

We settled on two main success metrics related to conversion: the adoption and success rate, which we defined as follows.

We also built a monitoring dashboard to examine proxy metrics that helped us monitor marketplace circumventions. One good example is the number of products taken off the platform after a chat was started.

We were ready to launch our product iteratively and be aware in real time of how it was impacting our business positively while making sure we were managing risk.

Managing the risks

As described above, launching this product was planned to bring value to our users while simultaneously introducing the risk of more transactions happening outside the platform.

We ran a few brainstorming sessions with our engineers, designers, product managers, and business stakeholders to consider the different options to manage this hazard.

One might think that bringing added value to users would encourage them to stick to the platform and prevent sales from happening outside of it. However, we could not rule out that some users would still take their transactions elsewhere. Not only are circumventions a source of revenue loss for the company, they are also a threat to our members, who trust us as a safe and controlled marketplace.

Making a transaction secure and preventing counterfeit products from being sold on the platform is the core of our business. Therefore, we started to think about a few features that would at least create a safety net to prevent misbehaviour.

Dictionary search

The first one we put in place was a dictionary-based flag mechanism. Here is how it worked in a nutshell:

A user wrote a message
Our algorithm checked the message against a dictionary of inappropriate words
If we detected unacceptable words, our solution would remove the input and replace it with a statement explaining the reasons.

Automatic message after a message deletion
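The three steps above can be sketched as follows. This is a simplified illustration under our own assumptions: the tokenization rule, the wording of the removal notice, and the `moderate` function are hypothetical, and the production mechanism is more elaborate than an exact word match.

```python
import re

# Hypothetical wording; the real notice shown to users is not quoted in this article.
REMOVAL_NOTICE = "This message was removed because it did not follow our guidelines."


def moderate(message: str, banned_words: set[str]) -> tuple[str, bool]:
    """Return the text to display and whether the original message was flagged.

    banned_words is expected to contain lower-case (casefolded) entries.
    """
    # Split the message into whole words so that substrings are not flagged by accident.
    tokens = re.findall(r"\w+", message.casefold())
    flagged = any(token in banned_words for token in tokens)
    return (REMOVAL_NOTICE if flagged else message, flagged)
```

Matching on whole words keeps the check cheap and avoids flagging harmless words that merely contain a banned term, at the cost of missing creative spellings.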

The first iteration relied on quite a strict dictionary. It evolved manually and we later added a layer of automation to make it smarter at catching the most inappropriate words in every language while allowing all appropriate messages to go through.

This dictionary became the backbone of more features that we then started to add.

Automated user bans

We began to ban users who broke our guidelines multiple times and automatically sent emails to educate them about the good practices around using our buyer-seller chat.
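A repeated-violation rule like this can be sketched with a simple per-user counter. The threshold of three strikes and the `StrikeTracker` class are hypothetical; the article does not state how many violations trigger a ban or how the state is stored.

```python
from collections import Counter


class StrikeTracker:
    """Count guideline violations per user and ban once a threshold is reached."""

    def __init__(self, max_strikes: int = 3) -> None:
        if max_strikes < 1:
            raise ValueError("max_strikes must be at least 1")
        self.max_strikes = max_strikes
        self._strikes: Counter = Counter()
        self._banned: set[str] = set()

    def record_violation(self, user_id: str) -> bool:
        """Register a violation; return True only when it triggers a new ban."""
        if user_id in self._banned:
            return False
        self._strikes[user_id] += 1
        if self._strikes[user_id] >= self.max_strikes:
            self._banned.add(user_id)
            return True
        return False

    def is_banned(self, user_id: str) -> bool:
        return user_id in self._banned
```

Returning True only on the violation that triggers the ban gives the caller a natural hook for sending the educational email exactly once.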

Reporting harmful messages

We also included a report option. We were pleasantly surprised by the number of community members who proactively reported misbehaviour. They contributed significantly to strengthening our engine and making our private messaging feature safer and safer.

As stated before, keeping the chat separate from our Make Me An Offer feature enabled us to turn it off for given users while still allowing them to make and receive offers. Merging the two features would have made that more complicated. This was an excellent example of how a constraint can actually turn into a nice feature.

Conclusion

Risk vs Reward | Unsplash

We opted for a canary deployment strategy, releasing the update incrementally to ever-growing subsets of users starting at the end of summer 2020. Two years after the launch, up to 25% of our transactions already involve a private message. Meanwhile, the share of conversations that we judged inappropriate was below 3% at the time of the launch. As a result, buyer-seller chat proved to bring more benefits than drawbacks, and our sellers praised the feature as it made them more likely to sell.

Ultimately, this confirms that solving a user’s problem and making their life easier should drive every product development process.

What do you think we should add to this feature?

If you have any questions about our processes and how we made it happen, feel free to drop us a comment.

We are always on the lookout for product folks to share the ride. Visit us here.

Designing Vestiaire Collective’s private messaging feature was originally published in Vestiaire Connected on Medium, where people are continuing the conversation by highlighting and responding to this story.