

RE·WORK: Deep Learning in Retail Summit (London, UK)


RE•WORK is one of the top global Machine Intelligence industry conference/summit organizers.

The aim of Rework: Combining entrepreneurship, technology & science to re-work the future.

Time: June 1–2, 2017

Location: ETC Venues 155 Bishopsgate Liverpool St London EC2M 3YD

Introduction for this Rework conference:

In general, RE•WORK summits invite extraordinary speakers to present advances in deep learning and artificial intelligence from the world’s leading innovators, and to showcase the opportunities created by emerging trends in deep learning and their impact on business and society. Many excellent data scientists, machine learning scientists and related entrepreneurs attend to meet people with similar interests and discuss the technology shaping the future.

The topic of this summit was discovering the latest deep learning advancements and how to leverage these methods to improve advertising and the retail experience. There were 19 excellent speakers, and the event attracted many people and companies, including IBM and Amazon. The themes of the conference included Deep Learning Trends and Customer Insight, Forecasting and Recommendations, Warehouse and Stock Optimization, and Computer Vision and Image Recognition. Many of the speakers were from startups, which brought a lot of interest and energy to the conference.

This is a compact report on the summit, including an introduction to each speaker and the content of each talk.

On June 1st, ten speakers talked about applications of machine learning and deep learning in the retail field.

The first speaker is Ben Chamberlain, Senior Data Scientist, from ASOS.

In this talk, Chamberlain started with the concept of customer lifetime value (CLTV), which is the value a customer creates during his or her entire relationship with the company [1]. He then discussed random forests (RF) and compared RF, deep neural networks (DNN) and logistic regression in terms of both efficiency and cost. Next, he offered a solution for calculating CLTV using a Wide & Deep model, which consists of a logistic regression component and a deep neural network. After that, he introduced embeddings and hyperbolic space, both of which are used at ASOS.
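As a rough illustration of the Wide & Deep idea, the sketch below sums the logit of a linear (logistic-regression-style) part and the logit of a small neural network, then applies a sigmoid. It is a minimal forward pass only. The feature values, weights, layer sizes and the reading of the output as a probability are all assumptions made for illustration, not details from the ASOS talk.

```python
import math

def relu(values):
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """Fully connected layer: one row of weights per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def wide_and_deep(wide_x, deep_x, params):
    # Wide part: a linear model over sparse or crossed features.
    wide_logit = sum(w * x for w, x in zip(params["wide_w"], wide_x))
    # Deep part: one hidden ReLU layer over dense features.
    hidden = relu(dense(deep_x, params["deep_w1"], params["deep_b1"]))
    deep_logit = sum(w * h for w, h in zip(params["deep_w2"], hidden))
    # Both logits are summed and squashed, as in logistic regression.
    logit = wide_logit + deep_logit + params["bias"]
    return 1.0 / (1.0 + math.exp(-logit))

# Hypothetical, hand-picked parameters (a real model would learn these).
params = {
    "wide_w": [0.5, -0.25],
    "deep_w1": [[0.1, 0.2, 0.3], [-0.3, 0.1, 0.2]],
    "deep_b1": [0.0, 0.1],
    "deep_w2": [0.4, -0.6],
    "bias": -0.1,
}
score = wide_and_deep([1.0, 0.0], [0.5, 1.0, 2.0], params)
```

In a trained model both parts would be optimised jointly, so the wide part can memorise specific feature combinations while the deep part generalises.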

The second speaker is Kumar Ujjwal, Sr Product Manager Big Data & ML, Kohl’s Department Stores. He shared their research on using computer vision and natural language processing to help customers make smart decisions while shopping. The talk was divided into two parts: the concept of the Micro-Moment trend in retail, and leveraging big data and machine learning for Micro-Moments. He talked about the importance of Micro-Moments and what people are thinking about when they search online, especially on a phone. Since people usually search online before they step into the store, what customers do online offers the company a lot of information to analyse and use to make better recommendations for each user. In essence, they make product discovery more natural and easy via Natural Language Search, Visual Search and personalised content. They analyse users’ behaviour using big data and machine learning and then make recommendations for those users. They also built their own system, called the AI First Decision Making Approach, to provide a personalised experience in real time.

The third speaker is Jan Gasthaus, Machine Learning Scientist from Amazon. He showed their autoregressive recurrent networks, which predict the future probability distribution of items from past data. If we know the distribution of items in the future, we can use our resources more sensibly, for example by reducing excess inventory in the supply chain. Traditional approaches like Box-Jenkins or state-space models require a lot of manual work by experts and cannot learn patterns across time series. He then introduced feed-forward neural networks, which compose complex black-box functions from simpler building blocks and learn them end-to-end; however, their outputs are not correlated across time, which is a poor fit for this kind of time-series prediction task. Therefore, they proposed autoregressive recurrent networks with LSTM cells to solve this problem. The model he designed yields a flexible, accurate and scalable forecasting system, and it can learn complex temporal patterns across time series. The paper is available on arXiv.
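To make the autoregressive, probabilistic part concrete, here is a minimal sketch of how such a forecaster can be sampled: each step outputs the parameters of a distribution, a value is drawn from it, and that value is fed back as the next input. The function `toy_step` is a hypothetical hand-written stand-in for a trained LSTM cell, and the Gaussian output and all constants are assumptions, so this shows only the sampling loop and not Amazon's model.

```python
import random

def toy_step(prev_value, state):
    """Hypothetical stand-in for one recurrent (e.g. LSTM) step.

    Returns the mean and standard deviation of the next value,
    plus the updated hidden state.
    """
    new_state = 0.5 * state + 0.5 * prev_value
    mean = 0.8 * new_state + 2.0
    std = 1.0
    return mean, std, new_state

def sample_paths(history, horizon, num_paths, seed=0):
    rng = random.Random(seed)
    # Warm up the hidden state on the observed history.
    state = 0.0
    for value in history[:-1]:
        _, _, state = toy_step(value, state)
    paths = []
    for _ in range(num_paths):
        prev, path_state, path = history[-1], state, []
        for _ in range(horizon):
            mean, std, path_state = toy_step(prev, path_state)
            prev = rng.gauss(mean, std)  # sample, then feed it back in
            path.append(prev)
        paths.append(path)
    return paths

def quantile_forecast(paths, q):
    """Empirical q-quantile of the sampled paths at each future step."""
    result = []
    for step_values in zip(*paths):
        ordered = sorted(step_values)
        result.append(ordered[int(q * (len(ordered) - 1))])
    return result

paths = sample_paths(history=[10.0, 12.0, 11.0, 13.0], horizon=3, num_paths=200)
p10, p50, p90 = (quantile_forecast(paths, q) for q in (0.1, 0.5, 0.9))
```

The spread between `p10` and `p90` is what makes the forecast useful for inventory decisions: it describes a range of plausible outcomes rather than a single number.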

Next up is Rami Al-Salman, Data Scientist & Machine Learning Engineer from Trivago. He talked about their methods for making good hotel recommendations for both text search and image search. Whether people enter text or an image, the search engine can return the right results for their query. He started with artificial neural networks, the basic neural network model, and then turned to word embeddings. He talked about two models for representing words as vectors: Skip-gram and Continuous Bag of Words (CBOW). Finally, he introduced their deep artificial neural networks, based on the paper DeepTags: Integration of Various VGI Resources Towards Enhanced Data Quality. Their product can be reached by googling trivago, but so far it only supports text search.
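The difference between the two word-vector models is easiest to see in the training examples they generate. In the sketch below, Skip-gram predicts each context word from the centre word, while Continuous Bag of Words predicts the centre word from its whole context. The window size and the example query are assumptions chosen for illustration, and no embedding is actually trained here.

```python
def context_indices(position, length, window):
    """Indices within `window` words of `position`, excluding itself."""
    start = max(0, position - window)
    stop = min(length, position + window + 1)
    return [j for j in range(start, stop) if j != position]

def skipgram_pairs(tokens, window=2):
    """(centre, context) pairs: the centre word predicts each neighbour."""
    return [(tokens[i], tokens[j])
            for i in range(len(tokens))
            for j in context_indices(i, len(tokens), window)]

def cbow_examples(tokens, window=2):
    """(context words, target) examples: the neighbours predict the centre."""
    examples = []
    for i, target in enumerate(tokens):
        context = [tokens[j] for j in context_indices(i, len(tokens), window)]
        if context:
            examples.append((context, target))
    return examples

query = ["hotel", "near", "the", "beach"]
sg = skipgram_pairs(query, window=1)
cb = cbow_examples(query, window=1)
```

With `window=1`, the word "near" yields two Skip-gram pairs, `("near", "hotel")` and `("near", "the")`, but a single CBOW example, `(["hotel", "the"], "near")`.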

The fifth talk is about an online shopping company called Picnic. Daniel Gebler, the CTO of Picnic, told us they use machine learning for customer behavioral analysis and as a prediction engine. He started with two challenges: the precision of bulk recommendations and seasonal variation. Then he talked about how to formalize the problem and how to solve it with a neural network. He discussed big data (a broad range of many different customers) and deep data (the order history of a single customer), using an LSTM RNN and an RFM-based (recency, frequency, monetary) strategy to preprocess the data. Because shopping data is distinctive and different seasons produce different features in the data, manual preprocessing can improve the model.
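As a small illustration of RFM-style preprocessing, the sketch below turns a list of orders into three features per customer, using the common definition of RFM as recency, frequency and monetary value. The order tuple layout, field names and sample data are hypothetical; the talk did not describe Picnic's actual pipeline.

```python
from datetime import date

def rfm_features(orders, today):
    """Compute RFM features from (customer_id, order_date, amount) tuples."""
    summary = {}
    for customer_id, order_date, amount in orders:
        entry = summary.setdefault(
            customer_id, {"last": order_date, "frequency": 0, "monetary": 0.0})
        entry["last"] = max(entry["last"], order_date)  # most recent order
        entry["frequency"] += 1                         # number of orders
        entry["monetary"] += amount                     # total spend
    return {
        customer_id: {
            "recency_days": (today - entry["last"]).days,
            "frequency": entry["frequency"],
            "monetary": entry["monetary"],
        }
        for customer_id, entry in summary.items()
    }

# Hypothetical order history.
orders = [
    ("alice", date(2017, 5, 1), 20.0),
    ("alice", date(2017, 5, 20), 30.0),
    ("bob", date(2017, 4, 10), 15.0),
]
features = rfm_features(orders, today=date(2017, 6, 1))
```

Features like these summarise a customer's order history in a few numbers, which can then be fed to a model alongside the raw order sequence.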

The sixth speaker, Calvin Seward, Research Scientist at Zalando, showed us how to find the best way to pick items in a warehouse. His talk had three parts: the picker routing problem, the order batching problem and a neural network estimate of pick route length. They developed an algorithm called OCaPi and estimate the length of its pick routes using a convolutional neural network with ReLUs. In the future, he might use reinforcement learning to solve the batching problem, because it can also be seen as a game like Go.


The next speaker is Pau Carre Cardona from Gilt. His topic is deep learning for product faceting and similarity, using product images and text descriptions separately. He introduced the automated faceting mechanism they use to improve dataset quality. Then he talked about ResNet and spatial transformers, which can be used to locate features in a product image. After that, he turned to text descriptions and talked about using dilated convolutions in place of recurrent neural networks. Dilated convolutions can detect patterns between words that are distant from each other (standard convolutions only detect patterns between words close to each other). Finally, he told us they use an unsupervised method to measure product image similarity, with the distance between embeddings as the dissimilarity metric. Therefore, given a product, they can retrieve the top-N similar products.
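The effect of dilation can be shown with a one-dimensional convolution over a short sequence. In the sketch below, the same two-tap kernel combines adjacent positions when the dilation is 1 and positions three steps apart when the dilation is 3. The numeric sequence stands in for word features and the kernel values are arbitrary, both assumptions for illustration.

```python
def dilated_conv1d(sequence, kernel, dilation=1):
    """Valid (unpadded) 1-D convolution with gaps of `dilation` between taps."""
    span = (len(kernel) - 1) * dilation
    return [
        sum(k * sequence[i + j * dilation] for j, k in enumerate(kernel))
        for i in range(len(sequence) - span)
    ]

def receptive_field(kernel_size, dilations):
    """Input positions seen by one output of a stack of dilated layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

sequence = [1, 2, 3, 4, 5, 6]
adjacent = dilated_conv1d(sequence, kernel=[1, 1], dilation=1)
distant = dilated_conv1d(sequence, kernel=[1, 1], dilation=3)
field = receptive_field(kernel_size=3, dilations=[1, 2, 4])
```

Stacking layers with growing dilations widens the receptive field quickly: three layers with kernel size 3 and dilations 1, 2 and 4 cover 15 input positions, which is how distant words can interact without a recurrent network.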

Spatial transformers are explained here

and dilated convolutions for NLP are explained here

Next, Miroslav Kobetski, Co-Founder of Volumental, showed their technology for measuring the body using 3D scanning with CNNs. To provide consumers with the best recommendations and engaging shopping assistance, they try to collect accurate data about each consumer for analysis. He presented a new technique that uses a CNN to calculate the distance between two images in order to find pictures similar to a query. They can reduce the annotation effort needed to reach high accuracy on new types of visual data.

Next, Susana Zoghbi, Postdoctoral Researcher at KU Leuven, showed their research on attribute extraction. With their research, people can perform a novel cross-modal search task in fashion, develop novel representations for cross-modal translation from noisy data, and annotate images. She started with their goal: translating images into text and vice versa. For example, the system could extract the attribute “red” from a picture of a red hat. This is good for improving product recommendations with fine-grained attributes. They used bag-of-words and semantic word embeddings for textual representations; scale-invariant feature transform (SIFT) and convolutional neural networks for image representations; and bilingual latent Dirichlet allocation, canonical correlation analysis and neural networks as alignment models. Their results show that it is possible to design algorithms that automatically “translate” visual concepts into text and vice versa.

Finally, Arnau Ramisa, Sr Computer Vision Researcher at Wide Eyes Technologies, showed an interesting method for searching using only a picture captured on a phone. Wide Eyes Technologies provides its technology to fashion companies for use in their applications. He started with the question of what should be used as a query: search by text is out of date, and we should use pictures now. They use Siamese networks to calculate the similarity of two pictures and then return the top-n most similar pictures (products). They can find almost every product in a picture and recommend similar products, as long as those products are in their dataset. The key point of their technology is that it can rule out noise and identify different products at the same time.

There were eight speakers on the second day (June 2nd).
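Once a network has mapped every picture to an embedding vector, the top-n retrieval step itself is simple. The sketch below ranks catalogue items by cosine similarity to a query embedding. The two-dimensional vectors and product names are made up, and cosine similarity is an assumed choice of measure; in a real system the embeddings would come from a trained Siamese network.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_n_similar(query_embedding, catalog, n=3):
    """Return the ids of the n catalogue items most similar to the query."""
    scored = [(cosine_similarity(query_embedding, embedding), product_id)
              for product_id, embedding in catalog.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [product_id for _, product_id in scored[:n]]

# Hypothetical embeddings; real ones would have hundreds of dimensions.
catalog = {
    "red_dress": [0.9, 0.1],
    "blue_jeans": [0.0, 1.0],
    "red_skirt": [0.8, 0.3],
}
matches = top_n_similar([1.0, 0.0], catalog, n=2)
```

A Siamese network is trained so that pictures of the same product land close together in this embedding space, which is what makes a plain nearest-neighbour lookup like this one effective.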

First of all, Deepomatic showed their product, which helps customers build their own datasets and train models on them by themselves. It is offered as a service. People can use it to work like a machine learning engineer without knowing how to build a machine learning model; all users need to do is build a dataset. Augustin Marty, Co-Founder & CEO of Deepomatic, started with Siamese CNNs, similar to the approach Wide Eyes presented on the first day. Then he presented the performance of their technology and gave a demo. Their company offers a tool for building an AI tailored to each company, which these companies can then use to analyse their customers and offer better services.

The second speaker is Jekaterina Novikova, from Heriot-Watt University. Her talk was about Pepper, a well-known robot developed by people from the UK, France and Japan. She talked about the application of Pepper in retail as a social robot that helps customers. Evaluation is the first challenge: we need to know whether the robot conducts a good dialogue. Machine learning is used both to develop the dialogue strategy between robot and customer and to evaluate the sentences generated by the robot (by comparing the similarity between the human answer and the robot answer and giving the robot answer a score). They used reinforcement learning to combine task-related and chat-related dialogue according to human behavior, since correct rewards are a crucial factor in dialogue policy training. Finally, she concluded that social robots are coming to the retail industry and that ML is used both for developing dialogue strategies and for evaluating results.


The third speaker, from Skyscanner, started with the caching technology they use to speed up queries, which also created a problem: users could not get the latest prices, and they complained about this strongly. He then talked about the strategy they use to predict the binary change in price (increase or decrease), which avoids frequently asking their partners for flight prices. The method they use is random forest, with more than ten features selected as inputs. Then he said they use augmented data to address the lack of visibility when Quote Age > TTL, and developed a data-trace simulator of the cache to overcome the limitations of supervised evaluation. Finally, he shared embeddings as a way to encode location data. After the conference, we got in touch with him and talked about the acquisition of Skyscanner, and we were happy to hear that it has had basically no impact on them.

The fourth speaker is Kostas Perifanos from Argos. He started with the basic idea of word embeddings and some popular methods for producing them. Then he introduced the neural probabilistic language model, followed by the training process and results. They use embeddings to find synonyms for query terms: they convert the query sentence into a vector and then use it to look up synonyms, improving search accuracy.


The fifth speaker is Tom Charman, Co-founder & CEO of KOMPAS. His talk focused on the importance of leveraging data to train machines to learn behavioral patterns and make recommendations. He looked at the accuracy of these recommendations and at how the success of such machines and algorithms can be tested. He covered computer vision (object recognition and facial recognition), NLP (machine translation, sentiment analysis and chatbots) and machine learning (pattern analysis and data clustering). Finally, he outlined applications of AI in the retail field.


The sixth speaker is Ekaterina Volkova-Volkmar from Codec. She showed us that finding out what customers need is a serious problem for every company, because people nowadays waste a lot of time on meaningless things like boring videos on YouTube. Codec helps companies understand their customers and helps customers make decisions. She started with the challenges that we (companies and customers) are facing now. They combine five questions (What are they interested in? Who do they listen to? Who are these people? How do they interact? How does everything change across time?) to define a tribe, and build a community around that tribe. They can then offer companies an analysis based on this tribe, so that the companies can make more practical recommendations. Finally, she left us with a good thought: big data is good, smart data is better.

The seventh talk, Next Generation Consumer Analytics, was given by Cathal Gurrin, a lecturer from Dublin City University, who showed how to use the data captured by wearable devices to analyse a consumer. He argued that this new era of personal sensing allows us to understand people in previously unimaginable detail. First, he described the three stages of consumer analytics: professional and expensive; lower-cost and data-driven; and low-cost, high-volume, with extreme insights. Then he introduced wearable devices and the various data they collect. Finally, he predicted that people will engage in this data collection and that deep consumer analytics will become as simple as a Google search.

The last talk was given by Ofri Ben Porat, CEO & Co-Founder of Pixoneye. Pixoneye can build a profile of a specific person from the photo gallery on his or her phone. They said they only extract information about the user and do not care what each photo actually shows, so a few noisy photos do not matter. He started with the lifeline and then turned to the photo gallery, noting that the two are similar. Then he introduced image understanding and contextual understanding, the two technologies they currently apply.




Author: Junyi Li





AI Technology & Industry Review | Twitter: @Synced_Global