MultiMedia in 2016: One Step Beyond!

Miriam Redi
Published in SocialDynamics
Nov 3, 2016 · 4 min read
[you can listen to this Madness beat while reading]

This was my 6th ACM MultiMedia (on and off) - a very short time compared to senior members of the community, but still long enough to start feeling signs of change in the multimedia analysis field. A few years ago, most papers were about content-based multimedia retrieval: how to automatically tag and rank images/videos according to their semantic content (e.g. “this is a cat”). This year, we went one step beyond, and showed the world how multimedia analysis can be used to read, tell, and create stories through audiovisual data. And this is VERY exciting :)

To give you an example — I’ll start with the best paper awards:

1) Best Paper: Topic-opinion mining from images and text in social media, with visualisation applications!

2) Best Student Paper: a Deep Learning framework for ingredient recognition in food images.

3) Best Short Paper: Deep Learning for Image Memorability Prediction: the Emotional Bias — an image memorability detector revealing important insights about how datasets for non-semantic image analysis need to be appropriately distributed within the emotional space.

More! What can Multimedia Analysis tell us from audiovisual data?

Subjective properties of visual data. Multimedia analysis can see invisible properties beyond semantics. This is done mainly using deep learning, which offers limited insight into why certain properties get detected; I hope to see more analytical work next year.

1. Emotions. In the past few years, a few works in MM have looked at understanding the emotions conveyed by visual data. This year we went one step beyond with Predicting Personalized Emotion Perceptions of Social Images! Many other works on affective audiovisual analysis this year follow the same trend: context-aware emotion recognition for videos, and an emotion-biased soundtrack generator for videos (this is VERY cool).

2. Aesthetics. The good old computational aesthetics field has gone more:

a) multimodal (text+image) with the Joint Image and Text Representation for Aesthetics Analysis (see my very bad picture)

b) multicultural with our work in collaboration with manovich on analysing the aesthetics of photo cultures around the world (with insights like “people in Berlin tend to take monochrome pictures of architectural subjects” — some people said “oh, hipsters!”).

I’ll also put in this category an automatic make-up artist based on deep learning, called Beauty eMakeup.

3. Ambiance. From IDIAP’s @SabMayaHai, a work on detecting the ambiance of places by training a CNN to look at pictures of the venue, called InnerView: Learning Place Ambiance from Social Media Images (a rough sketch of this kind of pipeline follows after this list).

4. Interestingness. Michael from ETH @GygliMichael has shown his amazing work on detecting the interestingness of animated GIFs. People find pictures of cats more interesting than pictures of people, apparently.

5. Sarcasm. Rossano Schifanella and @palomadejuan did beautiful work on teaching machines to detect sarcasm in social media posts!

Rossano Schifanella on stage
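
Side note for the curious: below is a minimal sketch of the general “train a CNN on venue photos” recipe behind works like InnerView. This is not the authors’ code, just the standard fine-tuning pattern (here in PyTorch); the folder layout, number of ambiance labels, and hyperparameters are all made up for illustration.

```python
# Minimal sketch, NOT the InnerView authors' code: fine-tune a
# pretrained CNN so its final layer predicts ambiance categories
# from venue photos. "venue_photos/<ambiance_label>/*.jpg" is a
# hypothetical dataset layout; AMBIANCE_LABELS is made up too.
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

AMBIANCE_LABELS = 8  # hypothetical number of ambiance categories

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

dataset = datasets.ImageFolder("venue_photos", transform=preprocess)
loader = torch.utils.data.DataLoader(dataset, batch_size=32, shuffle=True)

# Start from ImageNet weights and swap in a new classification head.
model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, AMBIANCE_LABELS)

optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

model.train()
for images, labels in loader:  # one epoch, for brevity
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
```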

Social dynamics and visual data. Some fairly recent works have started exploring the impact of visual data in social and urban spaces — hooray!

  1. Popularity. @MasoudMazloom did some fantastic work on social multimedia analysis for marketing: Multimodal Popularity Prediction of Brand-related Social Media Posts. Also at MM we saw very novel work in my beloved world of micro-videos, showing the power of visual features to predict Vine popularity.
  2. Use of Emoji. Francesco Barbieri from UPF Barcelona did interesting work on understanding emoji usage around the world using Twitter data, finding that Italians use the TOP emoji to say “OK” :D
  3. Urban Spaces. Marco de Nadai from the University of Trento @denadai2, using deep learning + mobile phone data mining, answered the question: are safer-looking neighborhoods more lively? He found that safer-looking neighborhoods are more active than traditional metrics would suggest, and that the correlation between the appearance of safety and activity depends on the demographics of the population (a toy sketch of this kind of check follows this list).
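
To make that last point concrete, here is a toy sketch of what “correlating the appearance of safety with activity” can look like. This is not de Nadai et al.’s actual pipeline; the scores, activity counts, and variable names are invented for illustration (in practice, the safety score would come from a model scoring street-level images, and the activity from mobile phone records).

```python
# Toy sketch, NOT de Nadai et al.'s pipeline: correlate a
# perceived-safety score per neighborhood with mobile-phone
# activity. All numbers below are made up for illustration.
import numpy as np
from scipy import stats

# Hypothetical per-neighborhood aggregates:
# average perceived safety of street images (0 = unsafe, 1 = safe)
safety_score = np.array([0.62, 0.71, 0.45, 0.80, 0.55, 0.68])
# mobile phone records observed in each neighborhood
phone_activity = np.array([1200, 1850, 700, 2100, 950, 1500])

r, p = stats.pearsonr(safety_score, phone_activity)
print(f"Pearson r = {r:.2f} (p = {p:.3f})")

# The paper's finding also depends on who lives there, so a fuller
# analysis would stratify this correlation by demographic group.
```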

Other Highlights

Photo credits: David @ayman Shamma — thanks!
  1. Our friend Bart got the “Multimedia Rising Star Award” and we are very proud of him. Bart, @ayman and Gerald released this awesome Flickr dataset (YFCC100M) with 100 million pictures.
  2. At the Multimedia Rising Star Symposium, young impactful researchers had the opportunity to talk about their work and experiences in a plenary session: @informusiccs Cynthia on being a professional musician and an assistant professor in computer science, @HayleyHung on “analog” social signal processing (can we discriminate academics vs non-academics from the way they sit? YES!), and @judith_redi on making the quality of the video experience better (especially for mom-to-daughter Skype connections :) )
@judith_redi on stage
  3. The Art exhibition this year was MASSIVE! It included manovich’s On Broadway and boredomresearch’s beautiful visualisations of malaria spreading.
Photo (CC) of boredomresearch
