Vroom! Vroom! New Dataset Rolls Out 64,000 Pictures of Cars

By Synced | Published in SyncedReview | 3 min read | Jan 7, 2020

To the machine learning community, high-quality data is as vital as fuel is to a car: it’s what keeps the ML engines running. Recently, a dataset of 64,000 car pictures appeared on GitHub, the work of data scientist Nicolas Gervais. The Car Connection Picture Dataset is of added interest because its images are conveniently labeled by make, model, year, price, horsepower, body style and more.
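As a rough illustration of how per-image labels like these might be consumed, the sketch below assumes the labels are encoded in each filename as underscore-separated fields, in the order make, model, year, price. That layout, the `CarLabel` type, the `parse_label` helper and the sample filename are all assumptions made for illustration; the dataset's actual scheme should be checked in its repository.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class CarLabel:
    """A few of the labels the article mentions (hypothetical container)."""
    make: str
    model: str
    year: int
    price: int


def parse_label(filename: str) -> CarLabel:
    """Parse an ASSUMED 'Make_Model_Year_Price_<anything>.jpg' filename."""
    parts = Path(filename).stem.split("_")
    if len(parts) < 4:
        raise ValueError(f"unexpected filename layout: {filename!r}")
    make, model, year, price = parts[:4]
    return CarLabel(make=make, model=model, year=int(year), price=int(price))


# The filename and its values are made up for this example.
example = parse_label("Audi_Example_2019_30000_a1.jpg")
print(example.make, example.year, example.price)
```

Keeping the parsed fields in a small typed record makes it straightforward to filter or group images by any label before training.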

Gervais first collected more than a quarter million images from the website thecarconnection.com. His focus was on exteriors; excluding interior shots and other images left him with the 64k set, with pictures of about 320x210 pixels. Users can also obtain larger versions of the images by adjusting the settings of the included scraper, “scrape.py.”

To demonstrate the dataset’s potential in practical applications, Gervais created a car price prediction model and an Audi vs BMW deep learning classification task in PyTorch.
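The data preparation behind a two-make classifier can be sketched in plain Python. This is not Gervais's code: it only shows one plausible way to keep Audi and BMW images, attach binary labels, and make a reproducible train/validation split. It assumes the make is the first underscore-separated field of each filename, and the helper names (`label_for`, `train_val_split`) and sample filenames are hypothetical.

```python
import random
from typing import List, Optional, Tuple

# Binary class indices for the two makes named in the article.
CLASSES = {"Audi": 0, "BMW": 1}


def label_for(filename: str) -> Optional[int]:
    """Return 0 for Audi, 1 for BMW, None for any other make.

    ASSUMPTION: the make is the first underscore-separated field.
    """
    make = filename.split("_", 1)[0]
    return CLASSES.get(make)


def train_val_split(
    filenames: List[str], val_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Keep only Audi/BMW files, shuffle reproducibly, then split."""
    labeled = [(f, label_for(f)) for f in filenames if label_for(f) is not None]
    rng = random.Random(seed)  # local RNG so the split is repeatable
    rng.shuffle(labeled)
    n_val = int(len(labeled) * val_fraction)
    return labeled[n_val:], labeled[:n_val]


# Made-up filenames standing in for real dataset files.
files = (
    [f"Audi_{i}.jpg" for i in range(5)]
    + [f"BMW_{i}.jpg" for i in range(5)]
    + ["Acura_0.jpg"]
)
train, val = train_val_split(files)
print(len(train), len(val))
```

The resulting `(filename, label)` pairs could then be fed to whatever image-loading and training pipeline is preferred, PyTorch included.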

So, what was the first thing the ML community thought of with these 64,000 pictures of cars in hand? Making fantasy rides, of course: “Seems like this would be really fun to hook up to StyleGAN2 and be able to generate cars based on those properties,” suggested Reddit user Skylion007, a sentiment echoed by others on the machine learning subreddit. StyleGAN is the hyperrealistic image generator developed by chip giant NVIDIA in 2018. Philip Wang used the tool to create “This Person Does Not Exist,” a website that generates a new hyperrealistic fake human face every time it’s refreshed. The tech has since been extended to cats, Airbnb listings and anime faces, so why not cars?

(Image: Reddit exchange on the new car dataset’s potential for building dream cars with GANs.)

Aside from amusing vehicle style mashups, it has also been suggested that the dataset could be used to predict future car designs or trends in style and price.

Gervais is a Python software engineer with TD in Montreal and a Machine Learning and Data Science student at McGill. The Car Connection Picture Dataset is available on his GitHub.

Journalist: Fangyu Cai | Editor: Michael Sarazen

