Product Recommendations with Machine Learning

Alexandra Petrus
Published in Bucharest AI
8 min read · May 30, 2018

Notes from the Joint Meetup — Bucharest JS-BigData-AI — Event

✌️Thank you for joining us for a Joint Meetup to remember, focused on bringing you #ProductRecommendations with #ML. Thank you as well to our lovely venue, 👌TechHub Bucharest, for their continuous support.

Powerful things happen when like-minded people connect.

We stand by the thinking that it’s the ability to collaborate well, rather than specific technical or cognitive skills, that will help humans of the future thrive at work and beyond. For this reason and many more, Bucharest JS, Bucharest BigData and Bucharest AI teamed up for a powerful edition on Recommender Systems: how to build Product Recommendations with ML libs.

As AI introduces new challenges we are still discovering, it is increasingly important to create mixed and diverse teams where a wide range of skills and experiences are considered. Technology is here to augment our lives and tasks, provided we build it with the right specs in mind and focus on helping, or at least not harming, the user of a given tech solution.

With the growing demand for recommender systems/engines, we offered you an edition focused on one approach to designing your product recommendations with Machine Learning: collaborative filtering. If you’re interested in predicting the “rating” or “preference” a user would give to an item, then definitely read on or ping us for help. ;)

Collaborative filtering builds a model from the user’s past behaviour (previous purchases, favourite items etc.) as well as from similar decisions made by other users. The model is then used to predict items (or ratings for items) that the user may be interested in.
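To make the idea concrete, here is a minimal sketch of user-based collaborative filtering in plain Python. It is not the code shown at the meetup: the toy `ratings` data, the user and item names, and the helper functions `cosine_similarity` and `predict_rating` are all hypothetical, and treating a missing rating as 0 is a simplification made for illustration.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating on a 1-5 scale}.
ratings = {
    "ana": {"laptop": 5, "mouse": 4, "monitor": 5},
    "bogdan": {"laptop": 5, "mouse": 5, "keyboard": 4},
    "carmen": {"mouse": 1, "keyboard": 2, "monitor": 1},
}


def cosine_similarity(a, b):
    """Cosine similarity of two rating dicts; a missing rating counts as 0."""
    dot = sum(a[item] * b[item] for item in set(a) & set(b))
    norm_a = sqrt(sum(value * value for value in a.values()))
    norm_b = sqrt(sum(value * value for value in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def predict_rating(user, item, ratings):
    """Similarity-weighted average of the other users' ratings for `item`."""
    weighted_sum = 0.0
    similarity_sum = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        similarity = cosine_similarity(ratings[user], other_ratings)
        weighted_sum += similarity * other_ratings[item]
        similarity_sum += similarity
    if similarity_sum == 0:
        return None  # no similar user has rated this item
    return weighted_sum / similarity_sum


# Ana has never rated a keyboard; estimate her rating from Bogdan and Carmen.
prediction = predict_rating("ana", "keyboard", ratings)
print(prediction)
```

Because Ana's past ratings look more like Bogdan's than Carmen's, the estimate lands closer to Bogdan's keyboard rating than to Carmen's.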

The main challenges when setting out to build such a system are:

  • Cold start: large data sets are needed before the model becomes useful, a challenge common to most ML projects;
  • Scalability: large data sets again; you will deal with millions of items/users, and the calculations take a lot of computational power;
  • Sparsity: an e-commerce site offers a wide and complex range of products. At a significant scale, users only interact with a small subset of the entire database. What is popular with the typical buyer might still not be popular enough to build a model on.
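The sparsity and cold-start challenges above can be measured before any model is built. The sketch below is only an illustration: the `interactions` log, the `catalogue` list and the `sparsity` helper are hypothetical names, not something presented at the event.

```python
# Hypothetical interaction log: one (user, item) pair per purchase or rating.
interactions = [("u1", "i1"), ("u1", "i2"), ("u2", "i2"), ("u3", "i3")]
# Hypothetical product catalogue, including items nobody has touched yet.
catalogue = ["i1", "i2", "i3", "i4", "i5"]


def sparsity(interactions, n_users, n_items):
    """Fraction of the user x item matrix that has no observed interaction."""
    observed = len(set(interactions))
    return 1.0 - observed / (n_users * n_items)


users = {user for user, _ in interactions}
seen_items = {item for _, item in interactions}

matrix_sparsity = sparsity(interactions, len(users), len(catalogue))
# Items with no interactions at all are the cold-start cases.
cold_items = [item for item in catalogue if item not in seen_items]

print(f"sparsity: {matrix_sparsity:.2%}, cold-start items: {cold_items}")
```

On a real shop the same ratio is typically far closer to 100%, which is why the choice of algorithm and data structures matters at scale.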

This Edition’s agenda included:

  • Keynote: Building Recommender Systems, by Ruxandra Burtica, Computer Scientist, Machine Learning @Adobe Romania
  • Keynote & Demo: Using Spark’s MLlib to Make Product Recommendations, by Sorin Peste, Technology Solutions Professional, Data & AI @Microsoft Romania
  • Keynote: Pretrained models for TensorFlow.js, by Adrian Oprea, Fullstack Developer w/o ML knowledge, on a quest to understand it

Thank you kindly to our Welcoming & Networking sponsor, Netopia MobilPay.

The talks can be watched in full via the live-streamed recording.

Keynote: Building Recommender Systems

Ruxandra is a Data Scientist and is actively involved in the Bucharest developer community. She is working with us, Bucharest.AI, on designing Machine Learning workshops for the community, as she has previous experience in leading ML training programs. She has also held various presentations on Deep Learning, tech and entrepreneurship in the local developer ecosystem, and has a keen interest in sharing her knowledge with the community.

Bucharest AI asked, Ruxandra answered:

  1. As a machine learning engineer what do you value more, the quality or quantity of the data you work with? Or is it a mix of both?

It’s a mix of both. I don’t think you can find data that is already clean in any production environment, so the quantity plays an important role.

2. Do you remember your first machine learning project? What did you learn from it?

I remember it, definitely! I guess the most important thing I learned was how solid logistic regression can be, when applied to enough data.

3. What are the ‘struggles’ and difficulties of a machine learning specialist?

I guess focus is an important trait one should have, given the multitude of opportunities and fields where ML could be applied.

👆Click for the slide deck from Ruxandra.

Keynote & demo: Using Spark’s MLlib to Make Product Recommendations

Sorin has been involved with software in one way or another ever since he was ten. He went from building his own games for fun to professional developer, Technical Team Leader, Consultant and Solution Architect, and nowadays showcases the latest and greatest stuff coming to you from Microsoft. Sorin is interested in all the things technology can do to make our lives better and more fun.

Specifically, he is keen to find out about — and play with — the latest and greatest on artificial intelligence, virtual reality, cloud computing and the Internet of Things.

Sorin’s talk covered Apache Spark’s Machine Learning Library (MLlib). This is a library built on top of Spark’s engine which allows us to train, test, validate and operationalize machine learning models while working with lots of data in a convenient way thanks to its robust abstractions over data sets.
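MLlib's collaborative filtering is built on matrix factorization trained with alternating least squares (ALS). As a rough intuition for what such a model learns, here is a small sketch that runs without Spark. It swaps in plain stochastic gradient descent for ALS, and the toy `ratings` triples, the hyperparameters and the `train`, `predict` and `rmse` helpers are assumptions made for illustration, not the code from Sorin's demo.

```python
import random

# Hypothetical (user_id, item_id, rating) triples.
ratings = [
    (0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
    (2, 1, 1.0), (2, 2, 5.0), (3, 0, 5.0), (3, 2, 1.0),
]
N_USERS, N_ITEMS = 4, 3


def predict(P, Q, user, item):
    """Predicted rating: dot product of the user and item factor vectors."""
    return sum(p * q for p, q in zip(P[user], Q[item]))


def train(ratings, n_users, n_items, rank=2, epochs=500, lr=0.01, reg=0.05, seed=0):
    """Learn user factors P and item factors Q with SGD (not ALS)."""
    rng = random.Random(seed)
    P = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0.1, 0.5) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(epochs):
        for user, item, rating in ratings:
            error = rating - predict(P, Q, user, item)
            for k in range(rank):
                p, q = P[user][k], Q[item][k]
                # Gradient step on squared error with L2 regularisation.
                P[user][k] += lr * (error * q - reg * p)
                Q[item][k] += lr * (error * p - reg * q)
    return P, Q


def rmse(ratings, P, Q):
    """Root mean squared error of the predictions on the given ratings."""
    squared = [(r - predict(P, Q, u, i)) ** 2 for u, i, r in ratings]
    return (sum(squared) / len(squared)) ** 0.5


# Same seed, zero epochs: the untrained starting point, used as a baseline.
rmse_before = rmse(ratings, *train(ratings, N_USERS, N_ITEMS, epochs=0))
P, Q = train(ratings, N_USERS, N_ITEMS)
rmse_after = rmse(ratings, P, Q)
print(f"training RMSE before: {rmse_before:.3f}, after: {rmse_after:.3f}")
```

In a real project you would evaluate on held-out ratings rather than on the training data, and a distributed library such as MLlib takes care of doing this over millions of users and items.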

Bucharest AI asked, Sorin answered:

  1. What are some of the most common mistakes aspiring data scientists make from your experience?

In my humble opinion, mistake no. 1 is making the assumption that, if they just get their hands on some data and build a model, the business will use it. I think it’s vital to define the business objective first, and make sure the data scientists really understand “the right question” the business is trying to answer. After that, it’s about designing and building the entire solution: not just the training / testing / validation stuff, but the entire thing from data acquisition all the way to the results being consumed by business users from their existing apps. Depending on the business vertical and the problem to be solved, operationalisation may end up being where most of the work takes place.

2. When people think about Data Science they tend to think that you need a PhD in Maths or Statistics. Do you think that’s necessary? What is your view on this?

Well, I don’t have a PhD so of course my answer will be NO. 🙂 But I believe that you do need to have a solid, practical grasp of the basics. For example, with statistics, I would (re-)study things like descriptive statistics, basic probability theory, random variables, the most common continuous and discrete distributions, Bayes’ Theorem and Bayesian inference, the Central Limit Theorem, covariance and correlation, sampling, hypothesis testing, regression, Simpson’s Paradox, finding and dealing with outliers, statistical techniques for dealing with missing data… and I’m sure that I left out a couple more.

Beyond statistics, a good grasp of basic Calculus and Linear Algebra will really help. And with the advent of Deep Neural Networks, a working knowledge of optimization algorithms (gradient descent in particular) is very, very useful to have.

Last but not least, you should really become proficient at visualizing your data. If you want to understand your data, the first step is to turn it into a picture! Fortunately today we have both tools (like Microsoft Power BI) and libraries (matplotlib, ggplot) which make this task much easier.

Of course, all of the above applies only to Data Science practitioners; a researcher will need a much more in-depth knowledge than that.

3. In what industry do you think Big Data will have a big impact in the next few years?

I’m going to say the medical industry and also oil & gas. The first one, because they now have the ability to deal efficiently with tons of data — not only numerical but also visual — with potentially huge implications for our collective well-being. The second, because it’s an industry that’s been traditionally slower to adopt the newest advances in information technology, so there is a lot of potential for using data and AI to optimize their very specific command & control processes, as well as planning, distribution, retail channels etc.

👆Click for the slide deck from Sorin. [The code Sorin used in the demo is here: https://github.com/neaorin/databricks-demos]

Keynote: Pre-trained models for TensorFlow.js

Adrian is a full-stack developer and his areas of expertise and interests are:

  • Technology migration and adoption strategies and advice
  • Technical recruitment process optimization for JavaScript interviews
  • Software architecture consulting
  • Git workflow design and SVN to Git migrations
  • Technical trainings on JavaScript development using ES6 and beyond, Node.js, React.js and Git
  • Continuous Integration strategies for GitLab and BitBucket
  • Code review
  • Pair programming
  • Development environment design and setup using Docker
  • Mentorship and advice for aspiring JavaScript developers

In the meantime, here’s a quick starter’s guide to learning ML (feel free to pass these on):

https://www.elementsofai.com/ A simple course covering the basics of ML. It makes very clear distinctions between all the fuzzy words. It’s a good start if you would rather not dive into the standard formulas and models straight away.

https://ai.google/education/

https://www.datacamp.com/community/tutorials/sets-in-python A must-read for anyone who programs data processing code in Python (and not just Python), courtesy of Andriy Burkov’s LinkedIn post share.

Local events making a difference 💪

Who attended the Joint Meetup? 🙏

More than 150 of you, AI practitioners and enthusiasts, together with the leaders of our three communities.

When’s the next Bucharest AI event?

June 26th, be there or be square if you’re into identity recognition, identification, management and governance. We’ll bring you the AI approach to signatures and facial recognition. Reserve a seat; we can’t wait to have you join us.

🏄Summer vacation alert: in July and August we’re off for a seaside brainstorm on applied AI use cases. :) We shall return with a diverse and fantastic Autumn line-up of events. Have a specific idea in mind that you want to share with us? Jot it down via email or a message. :) Find us on Facebook, LinkedIn or Meetup.

🔱May the AI force be with you & forever mindful and positively impactful.

Alexandra Petrus
Bucharest AI

New Tech Product Strategist & ENFJ-T | @BucharestAI |@Women_in_AI | ex-VP Products @reincubate | ❤#products #innovation #emergingtech #AI