Building a Product Catalog: eBay’s University Machine Learning Competition

Trade has played a critical role in the history of humanity, and yet data from ecommerce, the modern form of trading, has received limited attention from academia. We at eBay want to change that.

At eBay, we use state-of-the-art machine learning (ML), statistical modeling and inference, knowledge graphs, and other advanced technologies to solve business problems associated with massive amounts of data, much of which enters our system unstructured, incomplete, and sometimes incorrect. The use cases include query expansion and ranking, image recognition, recommendations, price guidance, fraud detection, machine translation, and more.

Though most of the above use cases are common among technology companies, one challenge is distinctive to eBay: making sense of more than 1.3 billion listings, many of which are unstructured. Currently, we use our in-house machine learning solutions to approach this problem, but we also want to grow our community and reach future technologists who haven’t had access to this type of data. By working with universities, we hope to pique academic curiosity within ML, spur more research in the ecommerce domain powered by a real-world ecommerce dataset, and improve our platform.

To support this idea, eBay is hosting a machine learning competition to structure listing data, in other words, to produce a product catalog. We are very excited to partner with students at the universities listed below, who can now start using a subset of our public listing data to help solve a real-world ecommerce challenge. More than 40 students are participating, either in teams or individually, from:

  • NYU
  • Stanford
  • University at Buffalo
  • The University of Texas at Dallas

There are plenty of datasets out there, but their primary focus has been recommender systems, price estimation, computer vision, natural language processing (NLP), and so on. None have been at a scale suited to mapping unstructured items to well-cataloged products. We are using the EvalAI open source platform for hosting the challenge. Our main challenge page has all the relevant details.

The challenge

The dataset

Approximately 25,000 of those listings will be clustered by eBay using human judgment (“true clustering”). These clustered listings will be split into three groups: a) Validation set (approximately 12,500 listings), b) Quiz set (approximately 6,250 listings), c) Final submission set (approximately 6,250 listings).

The validation set is intended for participants to evaluate their approach. Anonymized identifiers and cluster labels will be provided to the participants. The quiz data is used for leaderboard scoring. The final submission set is used to determine the winner. For the quiz and the final submission dataset, neither the listing identifiers nor the cluster labels will be provided to the participants.
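
As a rough illustration of the proportions above, the sketch below splits a set of labeled listings into validation, quiz, and final submission sets at roughly 50% / 25% / 25%. The post does not say how eBay actually produced the split, so the uniform random shuffle at the listing level, the function name `split_listings`, and the integer identifiers are all assumptions made for this example.

```python
import random


def split_listings(listing_ids, seed=0):
    """Shuffle listing ids and split them roughly 50% / 25% / 25%.

    Hypothetical helper: the real competition split may have been
    produced differently (for example, per cluster rather than per listing).
    """
    ids = list(listing_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible

    n_validation = len(ids) // 2
    n_quiz = (len(ids) - n_validation) // 2

    validation = ids[:n_validation]
    quiz = ids[n_validation:n_validation + n_quiz]
    final = ids[n_validation + n_quiz:]
    return validation, quiz, final


# Stand-in identifiers for the roughly 25,000 human-clustered listings.
validation, quiz, final = split_listings(range(25_000))
print(len(validation), len(quiz), len(final))  # 12500 6250 6250
```

Fixing the seed keeps the three sets stable across runs, which matters when the quiz and final sets are meant to stay hidden from participants.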

Timeline

Prize

The team behind

  • Engineering and Research — Roman Maslovskis, Uwe Mayer, Jean-David Ruvini, Anneliese Eisentraut, Akrit Mohapatra, Bennet Barouch, Pavan Vutukuru, Sathish Shanmugam, and Jon Degenhardt
  • Program Management — Roya Foroud
  • Legal — Brian Haslam, Brad Sanders, Sonia Valdez, and Kai Weingarten
  • Recruitment — Cindy Loggins
  • Comms — Melissa Ojeda

We would also like to thank the EvalAI team for quickly responding to our numerous queries. And finally, a shoutout to our senior leadership (Mohan Patt and Ron Knapp), who have been supporting this idea from the get-go.

We sincerely hope that making this real-world dataset available will entice universities and students to explore the ecommerce domain further and come up with novel approaches to solve complex problems that can have a positive impact on customers and sellers alike.

Originally published at https://tech.ebayinc.com on October 16, 2019.
