Building a Product Catalog: eBay’s 2nd Annual University Machine Learning Competition

Participating universities will structure listing data to help solve a real-world ecommerce challenge.

Published in eBayTech, August 25, 2020

After last year’s success, eBay is once again hosting a machine learning competition on an ecommerce dataset of eBay listings. This challenge is open to college and university students, and the winning team* will be offered a 2021 summer internship with eBay.

We invite students to start using our dataset to solve a real-world ecommerce challenge. Many public datasets exist, but they have focused primarily on recommender systems, price estimation, computer vision, natural language processing (NLP), and similar tasks. None of them operates at a scale relevant to mapping unstructured listings to well-cataloged products. As with last year's competition, we hope that making this real-world dataset available will entice students to explore the ecommerce domain further and to develop novel approaches to complex problems that can positively impact our platform and services.

The Challenge

Problem

The question we invite students to address is how to identify two or more listings as being for the same product by placing them in the same group. We call this Product Level Equivalency (PLE). That is, if a buyer purchased two items from two different listings in a single group, and the items were in the same condition, the buyer would consider them two instances of the same product. PLE is defined over manufacturer specifications; offer-specific details, such as condition or item location, are ignored. For example, a broken phone and a new phone with exactly the same specifications (make, model, color, memory size, etc.) are Product Level Equivalent, while a gold phone and a gray phone of otherwise the same make and model are not.
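To make the PLE definition concrete, here is a minimal sketch, assuming listings have already been parsed into clean attribute dictionaries. The field names (`brand`, `model`, `color`, `memory`), the brand "Acme", and the functions `ple_key` and `group_by_ple` are hypothetical and are not taken from the dataset's Annexure. The point is only that the grouping key is built from manufacturer specifications, while offer-specific fields such as condition are never consulted.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical attribute names for illustration only; the real columns
# are described in the Annexure document and will differ.
SPEC_FIELDS = ("brand", "model", "color", "memory")


def ple_key(listing: Dict[str, str]) -> Tuple[str, ...]:
    """Build a grouping key from manufacturer specifications only.

    Offer-specific fields such as "condition" or "location" are never read,
    so they cannot split an otherwise equivalent pair of listings.
    """
    return tuple(listing.get(field, "").strip().lower() for field in SPEC_FIELDS)


def group_by_ple(listings: Dict[str, Dict[str, str]]) -> List[List[str]]:
    """Group listing IDs whose normalized specifications match exactly."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for listing_id, listing in listings.items():
        groups[ple_key(listing)].append(listing_id)
    return list(groups.values())


listings = {
    "a": {"brand": "Acme", "model": "X1", "color": "Gold", "memory": "64GB", "condition": "Broken"},
    "b": {"brand": "acme", "model": "X1", "color": "gold", "memory": "64GB", "condition": "New"},
    "c": {"brand": "Acme", "model": "X1", "color": "Gray", "memory": "64GB", "condition": "New"},
}
print(group_by_ple(listings))  # [['a', 'b'], ['c']]
```

The broken and new gold phones land in one group and the gray phone in another, mirroring the example above. Exact key matching only works when attributes are already clean and complete; real listings are largely unstructured, which is what makes the challenge hard.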

The objective is thus to produce a clustering of the listings according to PLE. More formally, let $L$ be the set of all listings. A clustering $C$ is a partition of $L$ into disjoint subsets:

$$C = \{C_1, C_2, \ldots, C_n\}, \qquad \bigcup_{i=1}^{n} C_i = L, \qquad C_i \cap C_j = \emptyset \ \text{for all } i \neq j.$$

Ideally, all listings in each $C_i$ are Product Level Equivalent, and listings from different clusters are not Product Level Equivalent.
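The partition conditions can be checked mechanically before a submission is formatted. The sketch below uses a hypothetical helper, `is_partition`, and assumes clusters are given as collections of listing IDs. It also enforces the usual convention that a partition has no empty blocks, which the definition above implies but does not state explicitly.

```python
from typing import Hashable, Iterable, Set


def is_partition(clusters: Iterable[Iterable[Hashable]], listings: Set[Hashable]) -> bool:
    """Check that clusters are non-empty, pairwise disjoint, and together cover listings."""
    covered: Set[Hashable] = set()
    for cluster in map(set, clusters):
        if not cluster:
            return False  # partitions have no empty blocks
        if covered & cluster:
            return False  # a listing appears in two clusters
        covered |= cluster
    return covered == listings  # union of all C_i must equal L exactly


L = {"a", "b", "c"}
print(is_partition([{"a", "b"}, {"c"}], L))       # True
print(is_partition([{"a", "b"}, {"b", "c"}], L))  # False: "b" is in two clusters
print(is_partition([{"a"}, {"c"}], L))            # False: "b" is not covered
```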

The measurable objective, evaluation, submission format, and other details are available on EvalAI.

Data

The dataset consists of approximately 1 million selected, unlabeled public listings. We also provide an Annexure document that describes the columns and the parsing logic.

Approximately 25,000 of those listings will be clustered by eBay using human judgment (“true clustering”). These clustered listings will be split into three groups: a) Validation set (approximately 12,500 listings), b) Quiz set (approximately 6,250 listings), c) Test set (approximately 6,250 listings).

The validation set is intended for participants to evaluate their approach. Anonymized identifiers and cluster labels will be provided to the participants. We will release the validation set along with the main dataset.
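The competition's actual metric is defined on EvalAI and is not restated in this post. As an assumption for local experimentation only, a participant could compare a predicted clustering against the validation set's cluster labels using pairwise precision and recall, where every pair of listings sharing a cluster counts as a positive. The functions `same_cluster_pairs` and `pairwise_scores`, and the dict-of-labels input format, are hypothetical; this is a sketch, not the official scoring.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def same_cluster_pairs(labels: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Return every unordered pair of listing IDs that share a cluster label."""
    members: Dict[str, List[str]] = {}
    for listing_id, label in labels.items():
        members.setdefault(label, []).append(listing_id)
    pairs: Set[Tuple[str, str]] = set()
    for ids in members.values():
        # Sorting makes (a, b) and (b, a) the same pair.
        pairs.update(combinations(sorted(ids), 2))
    return pairs


def pairwise_scores(predicted: Dict[str, str], truth: Dict[str, str]) -> Tuple[float, float, float]:
    """Pairwise precision, recall, and F1 of a predicted clustering vs. labeled truth."""
    # Score only the listings that carry a human-assigned label.
    predicted = {k: v for k, v in predicted.items() if k in truth}
    pred_pairs = same_cluster_pairs(predicted)
    true_pairs = same_cluster_pairs(truth)
    tp = len(pred_pairs & true_pairs)
    precision = tp / len(pred_pairs) if pred_pairs else 0.0
    recall = tp / len(true_pairs) if true_pairs else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


truth = {"a": "1", "b": "1", "c": "2", "d": "2"}
predicted = {"a": "x", "b": "x", "c": "x", "d": "y", "e": "x"}  # "e" is unlabeled, so ignored
print(pairwise_scores(predicted, truth))  # precision 1/3, recall 1/2, F1 0.4 (up to float rounding)
```

Pair enumeration is quadratic in cluster size, so very large predicted clusters make this check slow; it is meant as a quick sanity signal, not a substitute for the leaderboard.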

The quiz set is used for leaderboard scoring, and the test set is one of the factors used to determine the winner. For the quiz and test sets, neither the listing identifiers nor the cluster labels will be provided to participants.

Hosting

The challenge will be hosted on the open-source platform EvalAI. College and university students will submit their entries through EvalAI, where the entries will be evaluated for leaderboard scoring. Please check out the EvalAI challenge page for more details.

Timelines

Dates are subject to change, but the expected schedule is:

August 24th, 2020 — Challenge begins. Access to the dataset is granted. We start accepting submissions through EvalAI and begin the evaluations.

February 1st, 2021 — Challenge ends.

February 22nd, 2021 — We announce winners.

Participation Criteria and Prize

Teams (of no more than five members) must include only students who are interested in an internship.

Assuming eligibility criteria are met, members of the winning team will be offered a Summer 2021 internship at eBay Inc. eBay's internship program combines real work experience with a robust program that gives interns exposure to various business verticals, executives, and networking opportunities. The internship is also an excellent opportunity for students to put their ML models to real use.

Further details on participant eligibility, internship prize eligibility, the official contest agreement, the competition rules, and other matters are available in the official contest rule package. See the eBay Contact section below to receive it.

eBay Contact

To find out more about how to participate in the challenge and receive the official contest rule package, please reach out to MLChallenge@ebay.com.

*Teams may have no more than five members.

Originally published at https://tech.ebayinc.com on August 25, 2020.