A Friendly Introduction to Regression Analysis: Bid Suggestions

Published in Queenly Engineering · 6 min read · Sep 29, 2020

Now more than ever, people are cleaning out their closets and selling their stuff for the best offered price, and this is where the Queenly “Make an Offer” feature comes into play. Online auctions and offers are one of the key pillars of ecommerce marketplaces like eBay and Poshmark.

At Queenly, we oversee a great deal of bids, known as “offers”, from our inventory of over 30,000 dresses. The seller has 3 days to accept or reject an offer before it expires, and the buyer will only be charged when the seller accepts the offer. To optimize this feature, we made suggestions for offers based on regression models using data from prior successful orders.

Problem at hand: Effective Price Suggestion

We’ve noticed frustration with bidding in the past; for instance, sellers might feel that they’re being “lowballed”, or be offended by deliberately low offers.

Trust is the foundation of any online marketplace, and we want to ensure our sellers are receiving serious and fair offers on their dresses.

The more realistic your offer is, the more likely the seller is to accept it. At the same time, a buyer looking for that perfect dress also wants the most value out of their hard-earned dollars. Thus, we created the “Make an Offer” feature to facilitate bidding privately, rather than having these negotiations play out in public in the comment section.

The only way we can evolve as an online marketplace is to respond to customer feedback and real-time data. For that reason, what better than to incorporate a data science solution into the “Make an Offer” feature!

Quick intro to regression analysis

We’ve all studied regression curves in statistics class, but how are they used in real-world applications? Linear regression is a classic statistical method that has also been adopted as a machine learning algorithm. Machine learning is the science (and art) of programming computers so they can learn from data. One type of machine learning is supervised learning, in which an algorithm infers a function from a labeled dataset in order to predict output values for new data. With supervised learning, we built a model that can predict an output Y given an input X, based on training data (X, Y). A regression model predicts a target value (here, the offer price) from one or more independent variables (here, the listed dress price).

To put it simply, a regression curve is a line drawn through existing data points that is used to predict future data points. It is also known as the “line of best fit”, because it minimizes the error between the true and predicted values. Regression curves are used in every industry, from real estate to auto sales to fashion ecommerce. In our case, it’s bidding on dresses!
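To make “line of best fit” concrete, here is a minimal sketch of ordinary least squares for a single input variable. The price pairs are invented for illustration and are not Queenly data; the variable names are our own.

```python
import numpy as np

# Hypothetical (listed price, accepted offer) pairs, invented for illustration.
listed = np.array([250.0, 400.0, 600.0, 900.0, 1500.0])
accepted = np.array([210.0, 330.0, 480.0, 700.0, 1150.0])

# Closed-form ordinary least squares for the line y = slope * x + intercept.
x_mean, y_mean = listed.mean(), accepted.mean()
slope = ((listed - x_mean) * (accepted - y_mean)).sum() / ((listed - x_mean) ** 2).sum()
intercept = y_mean - slope * x_mean

# The quantity the line minimizes: the sum of squared prediction errors.
predicted = slope * listed + intercept
sum_squared_error = ((accepted - predicted) ** 2).sum()

print(f"offer = {slope:.3f} * listed_price + {intercept:.2f}")
```

Any other choice of slope and intercept would give a larger `sum_squared_error` on these points, which is what “best fit” means here.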

Implementation through Scikit-Learn

We want to figure out the optimal bid that will be accepted by the seller. To predict this, we use insights from past accepted offers. Keep in mind that dresses on Queenly can be listed anywhere from under $100 to several thousand dollars. A seller listing a dress at $70 is most likely less willing to budge on the price, while a dress listed at $2,000 (which is not uncommon for a custom-made designer gown) usually has more flexibility in what the seller is willing to let it go for. Therefore, this feature only applies to dresses over $200.

First, we extracted data from the offers based on listed price, offer price, and status of the offer. The data was then categorized by offer status: “accepted”, “rejected”, “expired”, “pending”, and “canceled”.
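As a rough sketch of this step, grouping offers by status with pandas might look like the following. The column names (`listed_price`, `offer_price`, `status`) and the rows are assumptions made for illustration; the actual Queenly schema is not shown in this post.

```python
import pandas as pd

# Hypothetical offer records; column names and values are invented for illustration.
offers = pd.DataFrame(
    {
        "listed_price": [250, 400, 600, 900, 1500, 320],
        "offer_price": [210, 200, 480, 500, 1150, 300],
        "status": ["accepted", "rejected", "accepted", "expired", "accepted", "pending"],
    }
)

# One DataFrame per offer status, so each group can be modeled separately.
offers_by_status = {status: group for status, group in offers.groupby("status")}

accepted = offers_by_status["accepted"]
X = accepted[["listed_price"]]  # 2D feature matrix, the shape scikit-learn expects
y = accepted["offer_price"]     # 1D target
```

Selecting the feature with double brackets keeps `X` two-dimensional, which avoids a reshape later when fitting a scikit-learn model.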

Thanks to the Python packages pandas and scikit-learn, running the regression analysis is pretty straightforward. We were able to quickly plot the data and extract the relevant coefficients.

Linear and 2nd-degree polynomial regression curves of “Accepted” offers data
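A minimal sketch of how a linear fit on accepted offers could turn into a bid suggestion is shown here. The data points, the `suggest_offer` helper, and the `MIN_LISTED_PRICE` constant are all hypothetical; only the $200 threshold comes from the post, and this is not necessarily how the production feature is implemented.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical accepted offers, invented for illustration.
listed = np.array([[250.0], [400.0], [600.0], [900.0], [1500.0]])  # shape (n, 1)
accepted = np.array([210.0, 330.0, 480.0, 700.0, 1150.0])

model = LinearRegression().fit(listed, accepted)
slope, intercept = model.coef_[0], model.intercept_  # the fitted coefficients

MIN_LISTED_PRICE = 200  # the post's threshold: suggestions only for dresses over $200


def suggest_offer(listed_price):
    """Return a suggested offer, or None when the dress is at or below the threshold."""
    if listed_price <= MIN_LISTED_PRICE:
        return None
    return round(float(model.predict(np.array([[listed_price]]))[0]), 2)
```

Calling `suggest_offer(1000)` returns the point on the fitted line for a $1,000 dress, while `suggest_offer(150)` returns `None` because the dress falls below the threshold.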

Comparing different regression curves

The coefficient of determination, also known as the R² score, is used to evaluate the performance of the model. It is usually read on a scale from 0 to 1, with 0 indicating that your model does no better than always guessing the mean, and 1 indicating that your model predicts perfectly. (It can even go negative for a model that does worse than guessing the mean.)
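The definition can be checked by hand. This sketch computes R² as one minus the ratio of residual to total sum of squares and compares it with scikit-learn’s `r2_score`; the true and predicted values are made up for illustration.

```python
import numpy as np
from sklearn.metrics import r2_score

y_true = np.array([210.0, 330.0, 480.0, 700.0, 1150.0])
y_pred = np.array([215.0, 327.0, 477.0, 701.0, 1149.0])  # hypothetical model predictions

# R² = 1 - (residual sum of squares) / (total sum of squares around the mean)
ss_res = ((y_true - y_pred) ** 2).sum()
ss_tot = ((y_true - y_true.mean()) ** 2).sum()
r2_manual = 1 - ss_res / ss_tot

# A "model" that predicts the mean everywhere scores 0 by construction.
r2_mean_baseline = r2_score(y_true, np.full_like(y_true, y_true.mean()))
```

Here `r2_manual` matches `r2_score(y_true, y_pred)`, and the mean-only baseline lands at 0, which is where the “no better than guessing the mean” reading comes from.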

For our “accepted offers” linear regression model, we obtained an R² score of 0.96. In other words, our model explains about 96% of the variance in accepted offer prices, so the listed price and the accepted offer are almost perfectly correlated. The “rejected offers” regression curve has an R² score of 0.85. This is a noisier dataset, but it gives enough guidance to suggest a “bottom price” that offers should likely be placed above.

For analysis purposes, we also ran a regression on the “expired offers” data alone (that is, the data points where an offer on a dress does not receive an “accept” or “reject” response from the seller and expires after 3 days), and the resulting curve had an R² of 0.80. From a product perspective, offers tend to expire if the seller is on the fence about the offered price, or is simply not motivated enough to accept the bid. We thus decided against using this data in our prediction model, as the sentiment from the seller is less clear.

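Simply put, polynomial regression can fit the data more closely than linear regression, because a straight line is just the special case of a 1st-degree polynomial. Up to a certain “degree”, a higher-order polynomial curve tends to show greater accuracy on the data it was fitted to.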

Interestingly, in our analysis the linear and polynomial regression models were quite similar! The 2nd-degree polynomial model had an R² of 0.97, a minuscule improvement over our linear model. With a higher power, the model has more freedom to hit as many data points as possible. However, a higher-degree regression (beyond 2nd degree) doesn’t guarantee better accuracy on new data, because it also fits the noise in the data. It ultimately all comes down to neither underfitting nor overfitting the data.

from sklearn.preprocessing import PolynomialFeatures

polynomial_features = PolynomialFeatures(degree=2)  # create object for the class
X_poly = polynomial_features.fit_transform(X)  # transform data; X must be 2D, shape (n_samples, 1)
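To show how the transform fits into a full model comparison, here is a sketch that fits 1st-, 2nd-, and 8th-degree polynomial regressions and scores each on both training and held-out data. The data is synthetic (a noisy linear relationship we made up), and the train/test split, the `StandardScaler` step, and the choice of degrees are our own assumptions rather than details from the original analysis.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic data: accepted offers are roughly 75% of the listed price, plus noise.
rng = np.random.default_rng(0)
listed = rng.uniform(200, 5000, size=200).reshape(-1, 1)
accepted = 0.75 * listed.ravel() + rng.normal(0, 100, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    listed, accepted, test_size=0.25, random_state=0
)

scores = {}
for degree in (1, 2, 8):
    # Scaling first keeps high powers of the price numerically manageable.
    pipeline = make_pipeline(
        StandardScaler(), PolynomialFeatures(degree=degree), LinearRegression()
    )
    pipeline.fit(X_train, y_train)
    # (R² on training data, R² on held-out data)
    scores[degree] = (pipeline.score(X_train, y_train), pipeline.score(X_test, y_test))

for degree, (train_r2, test_r2) in scores.items():
    print(f"degree {degree}: train R² = {train_r2:.4f}, test R² = {test_r2:.4f}")
```

The training R² can only go up as the degree increases, since each higher-degree model contains the lower-degree ones. The held-out R² carries no such guarantee, which is why it is the more useful number for spotting overfitting.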

Final Results

From a product perspective, we have suggested pricing that simply “makes sense” to our end users. These guidelines help buyers make more informed bids.

Improvements and Future of this project

Our dress inventory is constantly growing, so the “line of best fit” needs to be constantly re-evaluated. As you can see in our charts, we have less data, and therefore less accuracy, in certain outlier ranges; in particular, $5k+ dresses are (as you can guess) sold much less frequently than $200 dresses. In the future, we hope to iterate on our data pipeline and refine the accuracy of our analysis, as well as experiment with weighting and blending different approaches.

Regression analysis could be an effective strategy for other aspects of the product as well. For instance, we could add time as a variable, since the bid suggestion can be more flexible if the seller has had the listing on the app for longer. Or, if a user doesn’t know how much to list their dress for, we could suggest a listing price for their gown according to the designer, condition, fabric, etc. We would develop a regression model and rules engine for this Price Suggestion feature that can account for different seller and buyer behavior. Furthermore, I would like to integrate regression analysis into our marketing strategies to make more data-driven business decisions.

Thank you for reading! We hope this post served as an educational introduction and showed data science to be a fun, approachable field 🤓

Psst! Queenly is always on the lookout for future engineers and engineering interns. Hit us up!