View of Thessaloniki. The white building next to the sea is the Thessaloniki Concert Hall, where the IFCS 2019 conference took place. (Photo taken from the conference’s website)

Share and Seek Knowledge unto the Land of Aristotle

Putri Wikie
Published in Inside Bukalapak, Oct 11, 2019

“Machine learning is not about methods, but principles”
Theodoros Evgeniou, Professor of Decision Sciences and Technology Management at INSEAD Paris, at the Plenary Invited session of IFCS 2019: “Principles for building your own machine learning methods: From theory to application to practice”

With a huge amount of data at hand, Bukalapak is fortunate to have the chance to make the most of it to empower more and more small and medium enterprises (SMEs) in Indonesia. Dozens of its Data Scientists work passionately to transform structured and unstructured data into actionable insights. Supervised and unsupervised learning are arguably the most commonly used techniques in data science, especially in the e-commerce domain, and Data Scientists at Bukalapak face daily challenges that require these techniques to answer our business problems. We have a number of in-house research projects that, in our opinion, should not only end up in production but also make a scientific contribution. In the spirit of sharing, three of us Data Scientists at Bukalapak presented our research to the scientific community at the 2019 conference of the International Federation of Classification Societies (IFCS), held on 26–29 August 2019 in Thessaloniki, Greece. It was a great honor to represent the company as contributing speakers and to share our research with fellow research scientists and practitioners from twenty-nine countries.

The conference was formally opened by Theodore Chadjipadelis, the director of the IFCS 2019 conference, at the Thessaloniki Concert Hall. He recalled the society’s significant contribution to the birth of Data Science: “Data Science, Classification, and Related Methods” was the title of the IFCS conference back in 1996. The community was aware of the rapid growth in (big) data availability, which in turn affected the methods used to exploit such data. It responded by hosting a biennial conference where researchers, academics, and practitioners exchange ideas and build networks and collaborations.

In the first panel discussion, on Data Science, Elections and Government, Athanasios Thanopoulos, President of the Hellenic Statistical Authority, addressed a topic related to this massive data availability. He argued that better data can help all types of decision-makers make better decisions, but more data does not necessarily mean better data, or faster and better decisions. His reasons were that (i) processing speed is a challenge for humans, and (ii) emulating aspects of human decision-making is a key challenge for machines, which face no speed or volume constraints.

David J. Hand giving a talk at IFCS 2019 in Thessaloniki, Greece.

David John Hand, Emeritus Professor of Mathematics at Imperial College London, former president of the Royal Statistical Society, and author of the well-known book “The Improbability Principle: Why Coincidences, Miracles, and Rare Events Happen Every Day”, delivered a short lecture entitled “Deciding what is what: Classification from A to Z”. He made an important (and persuasive) remark about one of the performance measures for classification models, the area under the receiver operating characteristic curve (AUC). He strongly advised:

“Do not use the AUC to evaluate classifiers unless you are sure it is the right measure of performance”

His reasoning was as follows:

“The AUC (area under the receiver operating characteristic curve) is equivalent to a cost-weighted average of misclassification rates as the classification threshold varies, where the averaging distribution depends on the classifier used.
This is absurd. The relative belief that the different values of the threshold (a probability) will be appropriate depends on the problem and the objectives, not on the data or on the method you use to analyze the data.
The relative severity of the two kinds of misclassification cannot depend on the classifier used. The AUC is (then) fundamentally incoherent.”
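To make the point concrete, here is a toy Python sketch. The scores are made up and the function names are hypothetical; this is not code from Hand’s talk. It computes the AUC as the probability that a randomly chosen positive is scored above a randomly chosen negative. It then separately computes the misclassification cost at a threshold fixed by the problem’s costs, assumed here to be c_fp / (c_fp + c_fn) for calibrated probabilities. Classifier A ranks the examples perfectly, yet at the cost-derived threshold it makes more costly errors than classifier B.

```python
from itertools import product


def auc(pos_scores, neg_scores):
    """AUC as P(random positive scores higher than random negative); ties count as 1/2."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def misclassification_cost(pos_scores, neg_scores, c_fn, c_fp):
    """Total cost at a threshold fixed by the problem's costs, not by the classifier.

    For calibrated probabilities, predicting "positive" is optimal when
    p * c_fn >= (1 - p) * c_fp, i.e. when p >= c_fp / (c_fp + c_fn).
    """
    threshold = c_fp / (c_fp + c_fn)
    false_negatives = sum(1 for p in pos_scores if p < threshold)
    false_positives = sum(1 for n in neg_scores if n >= threshold)
    return c_fn * false_negatives + c_fp * false_positives


# Made-up scores for two hypothetical classifiers on 4 positives and 4 negatives.
classifiers = {
    "A": ([0.9, 0.8, 0.45, 0.4], [0.35, 0.3, 0.2, 0.1]),
    "B": ([0.9, 0.8, 0.7, 0.6], [0.65, 0.3, 0.2, 0.1]),
}

for name, (pos, neg) in classifiers.items():
    print(f"{name}: AUC={auc(pos, neg):.4f}, cost={misclassification_cost(pos, neg, c_fn=1, c_fp=1)}")
# A: AUC=1.0000, cost=2
# B: AUC=0.9375, cost=1
```

Changing c_fn and c_fp moves the threshold and changes each classifier’s cost, while the AUC of each classifier stays exactly the same. In this sense the AUC alone cannot tell you which classifier serves a given problem better.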

In recognition of his numerous significant contributions, David John Hand was awarded the IFCS 2019 Research Medal by the committee.

Julie Josse, Professor of Statistics at École Polytechnique in France, shared her thoughts and research findings on the consistency of supervised learning with missing values. She remarked that simple mean imputation is bad for estimating parameters but works well for prediction. This strengthens the case for using such simple imputation in practice, since it is cheap to apply.
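Both sides of that remark show up in a toy Python sketch. The helper names and data are hypothetical, and this is not code from the talk. Filling gaps with the column mean shrinks the spread of the feature, which is one reason parameter estimates such as the variance suffer. For prediction, the means learned on the training data are simply reused to fill incoming rows before they reach the model.

```python
from statistics import mean, pvariance


def fit_column_means(rows):
    """Learn each column's mean from training rows, skipping missing (None) entries."""
    n_cols = len(rows[0])
    return [mean(row[j] for row in rows if row[j] is not None) for j in range(n_cols)]


def impute(rows, means):
    """Replace every None with the training mean of its column."""
    return [[means[j] if value is None else value for j, value in enumerate(row)] for row in rows]


# Toy training data: one feature with two missing values.
train = [[1.0], [2.0], [None], [4.0], [None], [6.0]]
means = fit_column_means(train)  # [3.25]
train_filled = impute(train, means)

observed = [row[0] for row in train if row[0] is not None]
filled = [row[0] for row in train_filled]
# Imputed values sit exactly at the mean, so the variance estimate is pulled downwards.
print(pvariance(observed), pvariance(filled))

# At prediction time, new rows are filled with the *training* means before calling the model.
test = [[None], [5.0]]
print(impute(test, means))  # [[3.25], [5.0]]
```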

Apart from classification, clustering analysis also received a lot of attention from the society. An initiative for neutral benchmarking studies of clustering was launched during the conference, in a session hosted by Iven Van Mechelen (Professor at the Faculty of Psychology and Educational Sciences, KU Leuven, Belgium) and Christian Hennig (Senior Lecturer in Statistics at UCL, London). Five research topics on clustering benchmarking were presented and discussed intensively, covering (statistical) clustering methods, types of data sets, and performance metrics. Furthermore, in a Plenary Invited talk, Vladimir Batagelj (Professor Emeritus at the University of Ljubljana, Slovenia) revisited the concept of graph clustering, from its definition to its applications. On a separate occasion, David Hunter (Professor in the Department of Statistics at Penn State) discussed clustering in networks and model-based clustering without parametric assumptions.

Our Contribution

We were glad to have the opportunity to participate in the conference as speakers and to contribute to the community by sharing our scientific research. The three research topics presented at the conference were as follows:

1. Knowledge graph mining and affinity analysis for product recommendation on online marketplace platforms

Authors: NS Muninggar, RA Permadi, S Simbolon, Verra Mukty and PW Novianti.
Snapshot: A multi-stage machine learning pipeline (from knowledge graph mining, through clustering with the speaker-listener label propagation algorithm, to affinity analysis) was applied to improve our recommendation systems, specifically to recommend complementary items.

2. User profiling for a better search strategy in e-commerce website

Authors: PW Novianti and FK Dewi
Snapshot: The study applies machine learning algorithms to identify user signals, which were then incorporated into the search strategy. Both offline and online validation were performed to evaluate the newly proposed search strategies.

3. Double helix multi-stage text classification model to enhance chat user experience in e-commerce website

Authors: F Revadiansyah, A Ghifari and R Meyvriska
Snapshot: The study focuses on text-based classification in the chat feature to detect out-of-stock products from sellers’ text messages.
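The clustering stage of the first contribution uses the speaker-listener label propagation algorithm (SLPA). Its general scheme can be sketched in Python as follows. This is a minimal sketch of the textbook algorithm, not Bukalapak’s implementation. The item graph, the parameter values, and the fallback for nodes with no label above the threshold are all assumptions made for illustration. Each node keeps a memory of labels. In every round, each listener receives one label from each neighbour (the speakers) and stores the most popular one. At the end, labels that occur often enough in a node’s memory become its (possibly overlapping) communities.

```python
import random
from collections import Counter


def slpa(adjacency, iterations=20, threshold=0.2, seed=0):
    """Speaker-listener label propagation (SLPA) sketch for overlapping communities.

    adjacency: dict mapping every node to a list of its neighbours (undirected graph).
    Returns a dict mapping each node to the set of community labels it retains.
    """
    rng = random.Random(seed)
    # Each node's memory starts with a single, unique label: its own id.
    memory = {node: [node] for node in adjacency}
    order = list(adjacency)
    for _ in range(iterations):
        rng.shuffle(order)  # listeners are visited in a random order each round
        for listener in order:
            speakers = adjacency[listener]
            if not speakers:
                continue
            # Each speaker sends one label, drawn with probability proportional to its
            # frequency in the speaker's memory (uniform choice over the memory list).
            received = Counter(rng.choice(memory[s]) for s in speakers)
            top = max(received.values())
            # The listener stores the most popular received label (ties broken at random).
            memory[listener].append(rng.choice([lab for lab, c in received.items() if c == top]))
    # Post-processing: keep labels whose relative frequency reaches the threshold.
    communities = {}
    for node, labels in memory.items():
        counts = Counter(labels)
        kept = {lab for lab, c in counts.items() if c / len(labels) >= threshold}
        # Fallback (a design choice in this sketch): keep the most frequent label if none passes.
        communities[node] = kept or {counts.most_common(1)[0][0]}
    return communities


# Hypothetical item graph: two separate groups of items that are linked to each other.
graph = {
    "a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b"],
    "x": ["y", "z"], "y": ["x", "z"], "z": ["x", "y"],
}
print(slpa(graph))
```

Because a label can only travel along edges, items in the two disconnected groups never share a community label. On a real item graph, the retained labels would overlap wherever items connect several groups.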

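Affinity analysis for complementary items is commonly built on market-basket measures: support, confidence, and lift. The sketch below computes these for item pairs in Python. The transaction data and helper names are hypothetical, and it shows only this generic step, not how the first contribution combined it with knowledge graph mining and clustering. A lift above 1 means two items are bought together more often than they would be if purchases were independent.

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets):
    """Support, confidence and lift for every ordered pair of items bought together."""
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))  # count each item once per basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))
    rules = {}
    for (a, b), together in pair_counts.items():
        support = together / n  # P(a and b)
        lift = support / ((item_counts[a] / n) * (item_counts[b] / n))  # P(a,b) / (P(a) P(b))
        for x, y in ((a, b), (b, a)):
            rules[(x, y)] = {
                "support": support,
                "confidence": together / item_counts[x],  # P(y | x)
                "lift": lift,
            }
    return rules


def complements(rules, item, min_lift=1.0):
    """Items that co-occur with `item` more often than independence predicts, best first."""
    candidates = [(y, r["lift"]) for (x, y), r in rules.items() if x == item and r["lift"] > min_lift]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Made-up transaction data.
baskets = [
    ["phone", "case"],
    ["phone", "case", "charger"],
    ["phone", "charger"],
    ["case"],
    ["laptop", "mouse"],
]
rules = pair_rules(baskets)
print(complements(rules, "phone"))  # charger (lift ~1.67) ranks above case (lift ~1.11)
```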
Remarks

The conference was attended by academics, research scientists, and practitioners. It was a great opportunity to share part of our research and, more importantly, to receive positive feedback from the community. We believe that science and research can help us reach our goals more efficiently.

P.S. If you are interested in more details about our research, feel free to stop by our base camp at Kemang Timur Raya 22, South Jakarta, Indonesia (we have many more research topics inside!)
