How Does AI Predict Air Quality?
The Alibaba tech team’s innovative solution for KDD Cup 2018
Hu Ke, a senior engineer at Alimama, Alibaba’s advertising division, and two teammates from Microsoft and Beijing University formed a team called “Getmax” and won the KDD Cup 2018.
KDD Cup, the most influential competition in the global data mining field, is organized by SIGKDD, ACM’s top international conference, and has been held annually since 1997. The competition focuses on applying data mining to real-world scenarios. This year’s task was to predict the concentrations of PM2.5, PM10, and O3 over the next 48 hours, based on weather data for Beijing, China and London, UK provided by the organizers. The task matters for improving harsh environmental conditions and protecting human health.
Hu Ke currently works on Alimama’s Search Advertising Algorithm team, where he helps develop ad ranking algorithms, including applying deep learning models to business problems. The team applies and optimizes a variety of deep learning models, and the experience Hu accumulated in this daily work was the main reason for winning the competition.
Secrets of winning three awards: weather forecast features + deep learning models for air quality prediction
Unlike previous years, when only the final result counted, KDD Cup 2018 also scored performance during the competition and set up three new awards: “The General Track”, “The Last 10 Days Special Award”, and “The Best Long-Term Prediction Award”, rewarding the outstanding teams in each of these three dimensions. “Getmax” stood out from more than 4,000 teams thanks to its comprehensive and outstanding performance. It was the only team to win all three awards, taking one runner-up place and two championships.
This year’s competition task was unusual. Air quality is characterized by weak regularity, instability, and abrupt changes, and the task required a prediction for every hour of the next 48 hours at dozens of locations across Beijing and London. The time series and the topological relationships between locations both pose challenges for machine learning models.
In Hu Ke’s view, the final result mainly comes from the optimization of both features and models.
The team found that wind speed and direction are key to long-term prediction and to predicting abrupt changes, so they refined the weather forecast features along both the time and space dimensions. They also found that noise handling, binning, and adjustments to the neural network structure can address the inconsistency problems that arise when weather forecast data is missing from the training data.
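As a rough illustration of the kind of wind feature engineering described above, the sketch below splits wind into directional components and bins wind speed. It is only a minimal sketch under stated assumptions: the article does not publish the team’s actual features, so the function names, the meteorological direction convention, and the bin edges are all hypothetical.

```python
import math
from bisect import bisect_right

# Hypothetical bin edges in m/s; the article does not give the real ones.
SPEED_BIN_EDGES = [1.0, 3.0, 6.0, 10.0]


def wind_components(speed_ms, direction_deg):
    """Split wind into east-west (u) and north-south (v) components.

    Assumes the meteorological convention: direction_deg is the direction
    the wind blows FROM, measured clockwise from north.
    """
    theta = math.radians(direction_deg)
    u = -speed_ms * math.sin(theta)  # positive u: air moving toward the east
    v = -speed_ms * math.cos(theta)  # positive v: air moving toward the north
    return u, v


def speed_bin(speed_ms, edges=SPEED_BIN_EDGES):
    """Map a wind speed to a coarse bin index (0 .. len(edges))."""
    return bisect_right(edges, speed_ms)


def wind_features(speed_ms, direction_deg):
    """Build a small feature dict for one hourly weather record."""
    u, v = wind_components(speed_ms, direction_deg)
    return {"wind_u": u, "wind_v": v, "wind_speed_bin": speed_bin(speed_ms)}
```

Using components instead of a raw angle avoids the jump between 359 and 0 degrees, and binning makes the feature less sensitive to noisy readings. Whether the team made these particular choices is not stated in the article.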
In addition to tree models built on fine-grained feature engineering, the team used deep learning models to mine feature-to-sequence and inter-sequence relationships relatively automatically. To handle the characteristics of long sequences, they tuned the DNN and the RNN separately, which addressed two problems: predicted values that were too close to each other across sequences, and unstable predictions over long sequences.
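Sequence models of this kind are typically trained on pairs of a history window and the values that follow it. The sketch below shows one way such pairs could be cut from a single station’s hourly readings for a 48-hour horizon. It is an illustrative assumption, not the team’s pipeline: the function name and the 72-hour history length are hypothetical.

```python
HORIZON = 48  # hours to predict, as stated in the article


def make_windows(series, history=72, horizon=HORIZON):
    """Return (inputs, targets) pairs from one station's hourly readings.

    history=72 is an illustrative choice, not a value from the article.
    Each pair holds `history` consecutive readings and the `horizon`
    readings that immediately follow them.
    """
    pairs = []
    last_start = len(series) - history - horizon
    for start in range(last_start + 1):
        x = series[start:start + history]
        y = series[start + history:start + history + horizon]
        pairs.append((x, y))
    return pairs
```

A series shorter than `history + horizon` yields no pairs, so short or heavily interrupted station records would need separate handling.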
Hu credits his work experience with providing both technical support and ideas for solving these problems. “The application of deep learning models was the key to separating the top-ranked teams,” he said. “Luckily, I had explored DNN and RNN models in depth in the advertising field in my daily work.”
Alimama builds Ad Tech: driving marketing through technological advancement
“The competition inspired some open source solutions, which play a key role in promoting the industry’s development,” Hu said. Hu Ke is keen on algorithm competitions and was also a champion of last year’s KDD Cup.
He works hard at AI. Predicting air quality and advertising may look like two different scenarios, but they use the same technology. The competition involved modeling and optimization with machine learning algorithms, the same as Hu’s daily work. On the one hand, Hu can practice applying his skills to a variety of practical problems; on the other hand, his understanding of the technology deepens, and that new understanding can be applied to his future work.
As Alibaba’s big data marketing platform, Alimama proposed a brand strategy this year built around marketing technology, or “Ad Tech”. Building on the exploration of its original business, Alimama continues to deepen its exchanges with academia. Every year, papers from Alimama are accepted at top international conferences such as IJCAI, WWW, and AAAI. This year, a total of 14 papers from Alibaba Group were again accepted at the SIGKDD conference. Alimama also shares algorithmic knowledge by hosting algorithm contests: for example, IJCAI, the international artificial intelligence conference, and the Alibaba Tianchi platform jointly organized the IJCAI 2018 Alimama International Advertising Algorithm Competition.
Under the Ad Tech philosophy, ideal marketing is gradually becoming reality as continued technological advances drive the optimization of advertising scenarios.