Science Reversions in Torino: DSAA’18

Oh, sweet Torino! Far from the mainstream international routes that lead businessmen to Milan and tourists to Rome and Venice, Torino is a gem of art and architecture and a vibrant hub for science and innovation.

Last week, Torino hosted three high-profile scientific events: the 5th Conference on Data Science and Advanced Analytics (DSAA); an Industrial Day sponsored and hosted by Intesa Sanpaolo in its brand-new skyscraper designed by Renzo Piano; and “Science Crossroads”, a yearly event organized by the ISI Foundation to award young scientists from all over the world with a Fellowship in recognition of their scientific achievements. Francesco Bonchi and Foster Provost, general chairs of DSAA (+team), and Ciro Cattuto, Scientific Director of the ISI Foundation (+team), did an excellent job of putting together five full days of scientific gatherings.

Even though the program was very interdisciplinary, most of the talks felt interconnected by a common philosophical theme: in the inchoate era of Artificial Intelligence, we may need to revise our scientific methods to gain a full understanding of the technologies we develop and of their ethical implications. Sometimes, this process might entail borrowing some forgotten elements of our past to move faster towards a brighter future. Here’s a summary of these “Science Reversions”.

The Role of Modeling in the Era of Big Data

A generative, mechanistic epidemic model: Big Data and Machine Learning form a predictive system wrapped around a core model tailored on the specific pandemic under study

A Mechanistic View of Science (Alessandro Vespignani, Northeastern University). Alessandro Vespignani is a modern numen of computational epidemiology. He opened his talk with a brief survey of the history of numerical epidemic models, emphasizing how their advances virtually halted in the ’50s. Those data-hungry models started to work accurately only after the Big Data revolution: multidimensional and granular data from a galaxy of public and private providers allowed epidemiologists to predict with striking accuracy the spreading of pandemics like H1N1, Ebola, and Zika. Inebriated by the power of Big Data, the scientific community explored new predictive models that were increasingly data-driven and less focused on understanding the underlying phenomena. That line of thinking led to glaring failures, epitomized by the infamous Google Flu Trends fiasco. The “I don’t care about understanding, as long as it works” narrative is fundamentally wrong, Vespignani argues, advocating for a reversion to a mechanistic approach. Researchers should first produce carefully crafted models with domain knowledge baked into them and then integrate those into a larger predictive system. The use of Big Data and machine learning is still crucial for these models to work with maximum accuracy. He provided examples of how social media data can be used to bootstrap modern epidemic models and how machine learning techniques help create ensembles of different models. The Big Data enthusiasts who proclaimed The End of Theory ten years ago may have spoken too soon. His new, colorful book Charting the Next Pandemic is out.
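
To give a feeling of what “mechanistic” means here, below is a minimal sketch of the textbook SIR compartmental model. This is my own illustration, not one of the models shown in the talk: the function name `simulate_sir` and all parameter values (`beta`, `gamma`, population size) are made up for the example. The point is that the dynamics are written down from assumptions about how contagion works, and data would only be used afterwards to calibrate the parameters.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Discrete-time SIR model with one-day steps.

    beta  = assumed transmission rate per day (hypothetical value below)
    gamma = assumed recovery rate per day (hypothetical value below)
    Returns a list of (susceptible, infected, recovered) tuples, one per day.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        # Mechanistic assumption: new infections are proportional to the
        # contacts between susceptible and infected individuals.
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Illustrative parameters only, not fitted to any real epidemic.
history = simulate_sir(population=1_000_000, initial_infected=10,
                       beta=0.3, gamma=0.1, days=200)
peak_day, peak_state = max(enumerate(history), key=lambda item: item[1][1])
print(f"Infections peak on day {peak_day}")
```

In a larger predictive system of the kind described above, data-driven components would plausibly sit around a core like this one, for example to estimate `beta` from observed case counts.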

Chris Bishop illustrating the results of models based on factor graphs with a toy interface for movie recommendations

One Model to Rule Them All (Christopher Bishop, Microsoft Research). There is quite an abundance of Machine Learning methods out there. A question that must have flashed in the minds of most machine learning students at least once is: which of those approaches is the absolute best? Unfortunately, in the general case, there is no free lunch: when averaged over all the possible problems, any algorithm is as good (or bad) as any other. But what about neural networks? Aren’t those sort of magic? Well, sadly, neural networks have a lot of assumptions and prior knowledge baked into them, which makes them excellent tools only for some (albeit very important) classes of problems. In line with Prof. Vespignani, Chris Bishop (world-renowned pioneer of AI and author of a best-selling technical book on machine learning) asserted that we cannot abandon domain knowledge and rely on Big Data only: to obtain good predictions we need to lay out assumptions based on our knowledge of the problem. Only after embedding those assumptions into a model can we turn to Big Data for training. But what if we could automate this process? Could we have a tool that asks the analyst to write down a set of assumptions and takes care of all the rest? Infer.NET is Microsoft Research’s attempt to achieve this vision. Infer.NET offers a compiler that converts a high-level model description directly into source code to perform inference. Its flexibility is its power: it can cope with the variety of real-world problems and it has already been used in a number of real-world use cases. Bishop’s new book Model Based Machine Learning is out and in early open access. Physical copies will be sold and the proceeds will be matched by MSR and will go to fund research on cystic fibrosis. So, in Bishop’s words: “please buy the book even if you don’t read it”.

It is easy to tell bots apart from humans, less so to identify different types of users

The Limit of Predictability (Claudia Perlich, Dstillery). How far can we push the accuracy of machine learning predictions? In many real-world problems, classification models can reach impressive accuracy. But this doesn’t mean that those models are particularly smart. By design, machine learning models focus on what’s easy. In classification, for example, the model focuses on those datapoints that can be easily separated from the others. Regrettably though, easily separable instances are those with the least value. Drawing on her experience as an advisory scientist at Dstillery, Claudia Perlich elucidated this concept with a number of examples coming from real-world ML tasks. It is easy to classify types of web users until you realize that most of your true positives are bots that leave very distinctive digital traces. Click prediction on mobile devices can achieve excellent results mostly because of people accidentally tapping on links while trying to activate the flashlight on their phone. Last, by mining geo-referenced data, you may be able to easily spot plenty of frequent travelers only to realize later on that all of them are flight attendants. To avoid frustration, we need to stop and (again) revert to the basics. Her main advice is to think carefully about the set of metrics we optimize for. Blindly using Click-Through Rate (CTR) is often not a good idea. More crucially, when machine learning is applied to critical tasks such as policing or job recommendations, we ought to use learning techniques very responsibly: picking the wrong metric can yield disastrous societal outcomes.
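
A bit of toy arithmetic makes the bot example concrete. The numbers below are invented purely for illustration (they are not from the talk), and the `precision` helper is a hypothetical name: the sketch only shows how a headline metric can look excellent while being carried almost entirely by the easy, low-value instances.

```python
def precision(rows):
    """Fraction of predicted positives that are actual positives."""
    predicted = [row for row in rows if row["predicted"]]
    if not predicted:
        return 0.0
    return sum(row["actual"] for row in predicted) / len(predicted)


# Invented toy data: 90 bots that are trivially and correctly flagged,
# plus 10 flagged humans, of which only 2 are correct.
rows = (
    [{"is_bot": True, "predicted": True, "actual": True}] * 90
    + [{"is_bot": False, "predicted": True, "actual": True}] * 2
    + [{"is_bot": False, "predicted": True, "actual": False}] * 8
)

overall = precision(rows)
humans_only = precision([row for row in rows if not row["is_bot"]])
print(f"precision overall: {overall:.2f}, humans only: {humans_only:.2f}")
# prints "precision overall: 0.92, humans only: 0.20"
```

The same metric, computed on the slice of users that actually matters, tells a very different story.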

Explaining machine learning predictions is ultimately a causal inference question

Explaining High-dimensional Machine Learning Models (Foster Provost, NYU). In a machine learning prediction task, when the gain in precision given by new training instances declines, adding new features is the main way forward. With more features, a model can once more benefit from additional training points. Looping over the addition of rows and columns to a training set yields monstrously complex models that work well but whose output is hardly interpretable. Foster Provost wants to find out why high-dimensional models work. Prof. Provost can boast long experience in the startup world (he founded 5 successful ones) and his multifaceted background gives him a good edge in addressing real-world problems by solving their underlying fundamental challenges. He rightly contended that listing the top-predictive variables (maybe out of millions) is not a good way to provide explanations. Instead, he argues that explainability should be approached as a causal inference problem: finding the minimal piece of evidence that, if not present, would have led the model to take a different decision. In an algorithmic setting, causal inference is possible because we can potentially observe the outputs of all the possible inputs. He used this reasoning to provide explanations in the task of document classification, but the method he proposes is very general: it is completely agnostic of the learning algorithm and looks only at the inputs and the observed outputs. Prof. Provost’s book Data Science for Business is out. He is an artist too: his first album “Mean Reversion” (which inspired this post’s title) has been recently released. Check it out!
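
Here is a rough sketch of the “evidence removal” idea for document classification. It is my own simplification, not Prof. Provost’s actual algorithm: it uses a greedy search (which is not guaranteed to find the truly minimal set), and the classifier, its weights, and the names `explain_by_removal` and `sports_score` are all hypothetical. Note that the search only calls the model as a black box, which is what makes the approach model-agnostic.

```python
def explain_by_removal(words, score, threshold=0.0):
    """Greedily remove words until the black-box score stops being positive.

    Returns the list of removed words (the explanation), an empty list if
    the document was not classified as positive to begin with, or None if
    removing every word still leaves the decision unchanged.
    """
    remaining = list(words)
    removed = []
    if score(remaining) <= threshold:
        return removed
    while remaining:
        # Pick the word whose removal lowers the score the most.
        best = min(
            range(len(remaining)),
            key=lambda k: score(remaining[:k] + remaining[k + 1:]),
        )
        removed.append(remaining.pop(best))
        if score(remaining) <= threshold:
            return removed
    return None


# Hypothetical linear "is this about sports?" scorer, for illustration only.
WEIGHTS = {"goal": 1.5, "match": 1.0, "league": 0.8,
           "election": -1.2, "budget": -0.7}
BIAS = -1.0


def sports_score(words):
    return BIAS + sum(WEIGHTS.get(word, 0.0) for word in words)


doc = ["the", "league", "match", "ended", "with", "a", "late", "goal"]
explanation = explain_by_removal(doc, sports_score)
print(explanation)  # ['goal', 'match']
```

The output reads as: “this document was classified as sports because it contains ‘goal’ and ‘match’; without them, it would not have been.”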

Scientist and musician: listen to a sample of Mean Reversion’s first single “Blessing of My Birth”.

Enlightening the Dark Side of Technology

Disabilities are mapped into their corresponding “Coolabilities” for the purpose of job matching

Averting the Apocalypse of Intelligent Technologies (V R Ferose, SAP). A decade ago, Web technologies were seen as nothing but good: they were bringing people together and information to everyone. In the last few years, this positive view of technology has changed dramatically. V R Ferose, visionary senior VP at SAP, listed some examples of the long-term negative consequences of technology, ranging from the emerging behavioral issues in kids interacting with home assistants to the deaths caused by fake news spreading in local communities. Ferose’s mission is to learn how we can understand the long-term consequences of technology and steer them towards good outcomes. He provided five possible ways to move forward. The first is decentralization: when the data is controlled by just a few it is harder to prevent misuse (Tim Berners-Lee’s Inrupt is an attempt to bring the Web back into the hands of The People). The second is changing the narrative: he provided a powerful example of describing as “coolabilities” the unique strengths of people with disabilities and how those can be used to increase job-offer matches. The third is automation of policies: ideally, any technological requirement decided by law should propagate automatically to every existing deployment and implementation. The fourth is changing learning metrics: the way in which machine learning methods optimize their objective functions can dangerously amplify existing anomalies (e.g., discrimination). The last is democratization: it should not be only up to a few billionaires to solve the world’s problems, especially when they have huge conflicts of interest; he exemplified this concept with a Gandhi quote: “it’s not what to do with your money that is important, it is the purity of the means of what you earn that matters”.

The Web suffers from a number of biases that are all inextricably entangled

The Gordian Knot of Biases (Ricardo Baeza-Yates, NTENT). The Web is also plagued with biases. Ricardo Baeza-Yates, one of the fathers of modern Information Retrieval, has recently devoted much of his work to the study of how biases emerge online and interact with each other. There are biases of all kinds. Most of the content online is produced by a tiny fraction of Web users; we consume that content through interfaces that catch our attention in predictable ways; the information we access is filtered and ranked by algorithms which, in turn, are trained on biased behavioral data. These biases have very different roots (cultural, statistical, cognitive) and are all entangled in a Gordian knot that is impossible to untie (and it would probably be absurd to cut it with the swing of a sword either). Some solutions indicated by Ricardo recall Ferose’s recipes. He advocates for going back from a “human in the loop” paradigm to the “human at the center” philosophy, which entails the adoption of better policies and a more conscientious use of Big Data (by transitioning to small data when needed). He concluded with an optimistic rebuttal to Yuval Noah Harari’s latest book, leaving the audience with the hope that the benefits of AI greatly outweigh its potential risks.

Debunking attempts of resurrecting physiognomy

Learning Bias Mitigation (Margaret Mitchell, Google). Try to play this little game: take a look at an image of bananas and come up with five terms to describe it. Done? It’s quite likely that your list of words won’t include the color of those bananas. This is because our brain has a stored representation of reality which does not necessarily reflect what happens in the world: in our mind bananas are yellow and we don’t need to mention that because it sounds obvious. Similarly, we associate the word “doctor” with a male figure and “couple” with heterosexual partners. When describing things around us, we report surprising experiences (a murder) and gloss over obvious and very frequent facts (a person blinking). The online data we use to train machine learning models are soaked in such human-reporting biases, which propagate through all the automated pipelines of our algorithmic systems (a phenomenon called “bias network effect” or “bias laundering”), ultimately leading to the amplification of injustice. Margaret Mitchell is a scientist working on AI at Google and she is interested in understanding how we can mitigate the biases in machine learning. There are glaring examples of these biases percolating into machine learning models, including the latest Lombrosian algorithms that claim to detect criminals and homosexuals from faces, which Dr. Mitchell promptly debunked. She then proposed a number of ways to mitigate biases, the most interesting of which is an adversarial approach. The model simultaneously learns a predictor and an adversary: the objective is to maximize the predictor’s ability to predict the outcome variable while minimizing the adversary’s ability to predict the protected attribute.
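
One way to write down this adversarial objective is as a min-max problem. The notation is my own assumption, not taken from the slides: f_θ is the predictor, a_φ is the adversary (assumed here to read the predictor’s output), y is the outcome variable, z is the protected attribute, L_y and L_z are the respective losses, and λ ≥ 0 is a hypothetical weight that trades accuracy against fairness.

```latex
\min_{\theta}\;\max_{\phi}\;\;
  \mathcal{L}_{y}\big(f_{\theta}(x),\, y\big)
  \;-\; \lambda\, \mathcal{L}_{z}\big(a_{\phi}(f_{\theta}(x)),\, z\big)
```

The adversary (φ) tries to make its loss on z as small as possible, while the predictor (θ) tries to keep its own loss on y small and, at the same time, the adversary’s loss large.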

Maximizing diversity of information in social networks

Breaking the Filter Bubble with Algorithms (Aris Gionis, Aalto University). Echo chambers are yet another facet of social media biases. Aris Gionis, professor at Aalto University and newly-nominated ISI Fellow, provided three possible algorithmic solutions to burst those filter bubbles, or at least to mitigate their effect. The first mitigation strategy is to improve awareness. He proposes to do so by mapping users and content from Twitter into a latent space, which makes it possible to automatically compute an ideology score for each user. The second is to maximize the diversity of content users are usually exposed to in their social circles (“tell me something that my friends don’t know”), which turns out to be an NP-hard problem even in its approximated version. The last is selecting people who are good representatives of opposing ideologies but whose agreement is as high as possible. He concluded with a warning, though: we don’t know to what extent our attempts to reduce polarization might actually backfire and increase conflict rather than reduce it.

Social contagion is one of the fundamental social processes that has impact on a number of societal outcomes

Understanding Collective Phenomena for Social Development (Marton Karsai, École Normale Supérieure de Lyon). Understanding social phenomena on the Web is the first step to turn technology into something that improves people’s lives. Marton Karsai, a complex systems scientist who masters data science and computational methods, has studied large-scale online data to link collective social phenomena to social development. He studied social influence on the Skype social network, the relationship between the use of language and socioeconomic status, and how homophily is impacted by economic status. In this last work, he looked at an impressive dataset that matches mobile calls with credit history information and found that social classes are homophilous and the richest class is much better connected than the poorer ones. His methods and data open up many opportunities to corroborate theories from social sciences about class mobility (Bourdieu’s theory is the first that comes to mind). Prof. Karsai is also one of the recipients of the ISI fellowship.

Advancing public policy by relying on algorithmic systems can be quite a nightmare

The Governance-by-Design Dystopia (Deirdre Mulligan, UC Berkeley). Many speakers pointed out the importance of smart policies to save ourselves from the potential risks of new technologies. We were lucky enough to be able to listen to an expert in the domain: Deirdre K. Mulligan, professor of law at the UC Berkeley School of Information, an academic at the forefront of the endeavor of designing new policies for privacy management. According to Prof. Mulligan, we already live in an era of Governance-By-Design: we use and design technological systems for the advancement of public policy. Even if governing through technology is quite a seductive concept, she acknowledges that, at the moment, this approach suffers from severe limitations. To illustrate her point, she told the stories of four famous cases in which law and technology clashed in various ways: the Apple vs. FBI encryption war; the EU General Data Protection Regulation (GDPR); the Stop Online Piracy Act (SOPA); and online voting. GDPR is a good representative example of how achieving privacy-by-design is hard: it is tough to foresee what the implications of GDPR might be on future businesses and on desirable properties orthogonal to privacy, such as fairness. Unfortunately, the inability to estimate the impact of design choices on future developments is one of the curses of design in general (read the story of Robert Moses for an example in the domain of urban design). However, Prof. Mulligan offered some suggestions on how governance-by-design could be improved. 1) We should design with modesty and not dictate exactly what to do, because it’s hard to predict the future; 2) We should improve the technical expertise of regulators; 3) We need to keep the public in the loop of the discussions about policies.
The importance of those propositions becomes apparent when you consider that even the choice of a simple threshold in a classification system becomes a heavy political responsibility when the system has the power to impact the lives of many.

Mapping health, culture, and well-being

A platform to monitor exposure and estimate its impact on health

Mapping Air Pollution Mortality (Francesca Dominici, Harvard University). In the US alone, thousands of people die every year because of air pollution. Francesca Dominici, an interdisciplinary statistician who has built a stellar career by linking Big Data with health outcomes, talked about her experience in studying this silent killer. She has developed a neural network model that uses on-the-ground air-monitoring data and satellite-based measurements to estimate daily pollution levels across the US at very high granularity. By matching that information with Medicare data (460 million health records covering 97% of the population aged 65 or older) she showed that exposure to air pollution is killing thousands of senior citizens each year. These people could be saved by just cutting 1 microgram of fine particulate matter per cubic meter of air below current standards. The richness of the data she has studied opens wide opportunities but also poses challenges. She discussed how the threat of unmeasured confounding bias is amplified, and causality is even harder to assess with observational studies. More about her project in this podcast.

The intangible layers of the urban fabric

Spatial AI for Health and Well-being (Rossano Schifanella, University of Torino). The surrounding environment influences us in ways that we often don’t consciously perceive. Upon receiving his ISI Fellowship, Prof. Rossano Schifanella talked about his extraordinary research journey to map the urban space with the aim of improving well-being through new services and design policies. He argued that well-being is connected to intangible aspects of urban life. Over the years, he mined data from social media and a variety of other sources to build a multi-layer model of the intangible aspects that define the urban space. He mapped the sensory perceptions that people have while walking in the city (visual beauty, smell, sound), the walkability of a street, the activities that happen indoors and outdoors, and even the ambiance of neighborhoods. He has also worked to measure how physical transformations of the city can impact subjective well-being. Using data from a mobile operator, he has studied the impact of the construction of large-scale urban infrastructures on the behaviour of city dwellers. He also talked about his recent work in mapping the health of city dwellers using granular and large-scale medical data. Overall, an impressive range of studies about urban life, some of which are collected on the goodcitylife.org portal.

Languages in tweets reveal the level of ethnic integration of cities

Studying Geography from the Lens of Twitter (Bruno Gonçalves, JP Morgan). Bruno Gonçalves is a physicist, computer scientist, new ISI fellow, and prolific twitterologist. In his talk, he provided a visually powerful overview of how geo-referenced tweets can be used to learn about cultural and social dynamics at world scale and with an impressive granularity. He studied the multiplicity of languages used to tweet in cities to estimate their level of ethnic integration. He also looked at how the discussion around a topic spreads spatially to spot the trend-setting cities and to identify the urban centers that follow them (the paper cleverly uses transfer entropy to estimate influence between cities). Last, he presented a comparative study between Twitter and Sina Weibo to cast a cautionary tale about data representativity: the observed characteristics of a phenomenon may vary greatly depending on the social media platform used for the analysis. So, pick the appropriate data to answer your research question.
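
For the curious, transfer entropy measures how much knowing the past of one time series reduces the uncertainty about the next value of another, beyond what that second series’ own past already tells us. The sketch below is a bare-bones plug-in estimator with a history of one step on discrete series; it is my own illustration and not the estimator used in the paper. The two “cities” are synthetic: the `follower` series simply copies the `leader` with one step of delay.

```python
import random
from collections import Counter
from math import log2


def transfer_entropy(source, target):
    """Plug-in estimate (in bits) of transfer entropy source -> target,
    using a history length of one step on discrete-valued series."""
    nxt, now, src = target[1:], target[:-1], source[:-1]
    n = len(nxt)
    triples = Counter(zip(nxt, now, src))   # (y_next, y_now, x_now)
    now_src = Counter(zip(now, src))        # (y_now, x_now)
    nxt_now = Counter(zip(nxt, now))        # (y_next, y_now)
    now_only = Counter(now)                 # y_now
    total = 0.0
    for (y_next, y_now, x_now), count in triples.items():
        p_joint = count / n
        p_given_both = count / now_src[(y_now, x_now)]
        p_given_self = nxt_now[(y_next, y_now)] / now_only[y_now]
        total += p_joint * log2(p_given_both / p_given_self)
    return total


# Synthetic data: 1 means "the topic is trending in this city right now".
rng = random.Random(0)
leader = [rng.randint(0, 1) for _ in range(2000)]
follower = [0] + leader[:-1]  # copies the leader one step later

te_forward = transfer_entropy(leader, follower)
te_backward = transfer_entropy(follower, leader)
print(f"leader -> follower: {te_forward:.3f} bits")
print(f"follower -> leader: {te_backward:.3f} bits")
```

The asymmetry between the two directions is what lets you tell trend-setters from followers: in this toy setup the forward value comes out close to 1 bit and the backward one close to 0.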

What else?

There was so much going on, including more parallel sessions with scientific presentations and tutorials, a great social dinner, and celebrations of all sorts. In this post I reported a selection of the presentations I saw, but check out the DSAA program for more pointers.

Until next time…

The Science Crossroads will be held next year in Torino, as usual. The next edition of DSAA will be hosted in Washington, DC. I hope the organizers will keep the tradition of having an industrial day: this year it was terrific!

DSAA 2019 in Washington, DC

In the meantime, if you happen to be in the UK in December, you should consider dropping by the Complex Networks conference in Cambridge. Hope to see you there!

Complex Networks conference in Cambridge, UK

Thanks for reading! If you liked the post, please hit the applause 👏 button. You can also connect with me on Twitter to share ideas or give suggestions.