Computer Vision Networks

Janna Joceli Omena
Published in iNOVAMedialab
Sep 4, 2020

Reflections on novel perspectives and challenges for digital visual methods.

The history of “climate emergency” visuality based on Google Image search results (2008–2019) and the circulation of this visuality across the web according to Google Cloud machine learning services. Image source: https://smart.inovamedialab.org/2020-digital-methods/project-reports/cross-platform-digital-networks/climate-change/ (Work with GV API by Jason Chao, network visualisation with images by Giacomo Flaim).

Computer vision web services have served big tech companies well for many years, growing exponentially over that period. However, the potentialities of artificial intelligence and machine learning models for digital network studies remain unknown or little-explored. There is no one to blame for this lack of exploration; after all, we are talking about networks that demand to be created, not networks readily afforded by social media APIs, as is the case, for instance, when one uses YouTube Data Tools to visualise and explore networks of video-content relatedness or to map political affinities through channel networks. Computer vision networks meet an ever-more-complex context because they are built upon

i) pre-trained machine learning models; ii) the advantages of software and data for building and plotting networks; and, not least, iii) the medium-specific perspective proposed by digital visual methods.

This attempt to define computer vision networks may speak for itself, right? I mean, computer vision networks are not as simple as they may look, such as the beautiful network visualisation above. In this network, we see the visuality of “climate emergency” according to Google Image Search results and its sites of circulation according to Google Vision’s web page detection module. The creation and purpose of this network are explained elsewhere (here and also here); more important now is to understand what is inherent to the process of building and analysing this type of network. That is a process made of many layers of technical mediation; a research process that also brings digital visual methods into existence.

This short introduction sets the scene for my first Medium story (I have written other stories elsewhere), which raises questions such as: Why should new media scholars look at vision APIs for studying networks of images? What for? How should these networks be read? As you may have realised, yes, this story will be quite informative and descriptive as well.

But before responding to these questions, there is another important note to highlight, concerning the original proposal of merging machine vision and networks for social research, introduced by a group of designers and researchers in 2017 (Donato Ricci, Gabriele Colombo, Axel Meunier and Agata Brilli). Their work has inspired other scholars, including myself, to explore computer vision networks and to benefit from the interesting findings these networks afford.

A second important note concerns the types of services currently offered by the main web-based artificial intelligence and machine learning services. In the alluvial diagram below (which does not provide a thick description of these services), we see the services and launch years of Amazon Rekognition, Google Vision API, Microsoft Azure, Clarifai and Imagga. Automated image classification based on predefined labels and content moderation (the detection of unsafe content, e.g. sensitive, violent or pornographic content) are the services shared by all vision APIs. The detection and recognition of faces and facial attributes, demographic analysis and celebrity detection are other potentialities afforded by computer vision. Moreover, specific features of Amazon Rekognition caught my attention, such as detecting labels in a video or detecting people's paths in stored videos. The latter provides information about “the location of the person in the video frame at the time their path is tracked and facial landmarks such as the position of the left eye when detected”.

A map of the main computer vision APIs and services. Source: https://www.slideshare.net/jannajoceli/how-to-read-computer-visionbased-networks-repurposing-machine-learning-to-social-media-research
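As a side note for readers curious about that person-tracking feature: it is exposed through Rekognition as an asynchronous job. The sketch below is a minimal illustration in Python with boto3, not a research pipeline; the bucket and video names are hypothetical, and the wait-for-completion step is only indicated as a comment.

```python
# A minimal sketch of Amazon Rekognition's person-tracking feature, using
# boto3. Bucket and file names are hypothetical; tracking runs as an
# asynchronous job that must finish before results can be fetched.
import boto3

rekognition = boto3.client("rekognition")

job = rekognition.start_person_tracking(
    Video={"S3Object": {"Bucket": "my-video-bucket", "Name": "clip.mp4"}}
)

# ...once the job has completed (via polling or an SNS notification):
result = rekognition.get_person_tracking(JobId=job["JobId"])
for person in result["Persons"]:
    # timestamp (in ms) and bounding box of the tracked person in the frame
    print(person["Timestamp"], person["Person"].get("BoundingBox"))
```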

Why computer vision for studying networks of images?

Web-based vision APIs indeed have several affordances, not only positive ones in terms of re-purposing technologies but also controversial and problematic ones for research. Here, however, I will address specific elements of vision APIs and their potentialities for social and medium research. Put simply, three features allow the creation of image networks (a sketch of the corresponding API calls follows the list):

  1. Image classification according to pre-defined or custom labels, which allows the building of networks of images and their descriptive layers.
  2. The detection of web entities in an image, which allows the building of networks of images and their web entities.
  3. The detection of web pages in which an image has appeared, which allows the building of networks of images and their sites of circulation across the web.
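For readers who want to see what these three features look like in practice, here is a minimal sketch in Python using the official google-cloud-vision client. It assumes credentials are already configured (e.g. via the GOOGLE_APPLICATION_CREDENTIALS environment variable), and the image file name is purely illustrative.

```python
# A minimal sketch (not the exact research pipeline) of querying the three
# Vision API features listed above with the official Python client.
from google.cloud import vision

client = vision.ImageAnnotatorClient()

with open("climate_emergency_001.jpg", "rb") as f:  # hypothetical image
    image = vision.Image(content=f.read())

# 1. Image classification: pre-defined labels with confidence scores
for label in client.label_detection(image=image).label_annotations:
    print(label.description, round(label.score, 2))

# 2 and 3. Web detection: web entities plus the pages where the image appears
web = client.web_detection(image=image).web_detection
for entity in web.web_entities:
    print(entity.description, round(entity.score, 2))
for page in web.pages_with_matching_images:
    print(page.url)
```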

While the so-called image-label networks have been gaining space in digital research over the years by facilitating the interpretation of large image datasets, the potentialities of computer vision image-web entity and image-domain networks are still underexploited. Here we face innovative methodological attempts that aim to develop new forms of re-purposing computer vision for digital network-driven studies. To illustrate what the three networks look like, and to understand their particularities, see the visualisation below.

Instagram #microcephaly engagement | 10,797 images published between June 2012 and October 2017. Google Vision API | Modules: Detect Labels and Detect Web Entities and Pages

What for?

How may computer vision networks serve digital research? What for? What research questions can be asked? To respond to these questions, let's keep the structure previously presented, but feed it with extra information.

  1. Image classification according to pre-defined or custom labels, which allows the building of networks of images and their descriptive layers. Computer vision image-label networks serve as a means for studying, mapping and exploring an imagery (based on pre-defined or custom labels) and for interrogating the medium (the vision API itself): for instance, the imagery of political polarisation, institutional communication, issue networks, cultural representation etc. While doing so, we can also detect the limitations and biases of the vision API.
  2. The detection of web entities in an image, which allows the building of networks of images and their web entities. Computer vision image-web entity networks serve as a means for studying, mapping and exploring an imagery (through labels obtained from the web) and for interrogating the medium (the vision API itself combined with the cultures of use within the web environment and its infrastructure). For instance, web entities' descriptive terms can support the study of a collection of images related to Covid-19 or the Zika virus (e.g. given an image containing the mosquito that transmits the Zika virus, Google Vision API would return entities such as Chikungunya virus infection, mosquito-borne disease, outbreak, infection).
  3. The detection of web pages in which an image has appeared, which allows the building of networks of images and their sites of circulation across the web. Computer vision image-domain networks serve as a means for studying image circulation: the sites of image appearance across the web and the related actors (link domains). Moreover, they allow the detection of the visuality that sticks within or flows out of social platforms. Compared with the previous ones, computer vision image-domain networks offer a dynamic view of the subject of study. They also allow both the detection of the dominant link domains within the network (those capable of gathering a more diverse set of image appearances) and the detection of clusters of link domains that share similar visualities. A sketch of how such networks can be assembled follows this list.
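To make the construction step concrete, below is a minimal sketch assuming the Vision API responses have already been collected into a simple Python dictionary; the file names, labels and URLs shown are invented for illustration. It builds the image-label and image-domain networks with networkx and exports a GEXF file that can be opened in Gephi for visual analysis.

```python
# A minimal sketch, not the author's exact pipeline. Assumes the Vision API
# output has been collected into a dict; all file names, labels and URLs
# below are invented for illustration.
from urllib.parse import urlparse

import networkx as nx

results = {
    "img_001.jpg": {
        "labels": ["mosquito-borne disease", "outbreak"],
        "pages": ["https://news.example.org/zika",
                  "https://twitter.com/user/status/1"],
    },
    # ... one entry per image in the corpus
}

G = nx.Graph()
for image, data in results.items():
    G.add_node(image, kind="image")
    for label in data["labels"]:            # image-label edges
        G.add_node(label, kind="label")
        G.add_edge(image, label)
    for page in data["pages"]:              # image-domain edges
        domain = urlparse(page).netloc      # reduce page URLs to link domains
        G.add_node(domain, kind="domain")
        G.add_edge(image, domain)

# Dominant link domains: those gathering the most diverse image appearances
domains = [n for n, d in G.nodes(data=True) if d["kind"] == "domain"]
print(sorted(domains, key=G.degree, reverse=True)[:10])

nx.write_gexf(G, "vision_networks.gexf")    # open in Gephi
```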

How to read computer vision networks?

This is definitely the most complex question here, which is also the reason I am writing an academic article about it. So, instead of sharing solutions, I am closing this Medium story with preliminary findings that may help the exercise of reading computer vision networks (see below).

Image source: https://www.slideshare.net/jannajoceli/how-to-read-computer-visionbased-networks-repurposing-machine-learning-to-social-media-research

But I also want to say that, in order to read computer vision networks, we should first address another type of question:

What precedes and takes place with and in computer vision networks?

This is actually “the question”, or what drives us to reflect on the challenges of using digital visual methods for social and medium research. Let's say that the first step has been taken, through the provision of a technical description and definition of computer vision networks and the introduction of their potentialities for digital research. Let's get back to this matter in the near future, shall we?

If you want to learn more about the affordances of computer vision networks for digital research, have a look at the following list of papers:

Ricci, D., Colombo, G., Meunier, A., & Brilli, A. (2017). Designing Digital Methods to monitor and inform Urban Policy. The case of Paris and its Urban Nature initiative. In: 3rd International Conference on Public Policy (ICPP3)-Panel T10P6 Session 1 Digital Methods for Public Policy. SGP, 2017. p. 1–37.

Mintz, A., Silva, T., Gobbo, B., Pilipets, E., Azhar, H., Takamitsu, H., … Oliveira, T. (2019). Interrogating Vision APIs. Lisbon. Retrieved from https://smart.inovamedialab.org/smart-2019/project-reports/interrogating-vision-apis. https://doi.org/10.13140/RG.2.2.17204.40323

Omena, J.J., Chao, J., Pilipets, E., Kollanyi, B., Zilli, B., Flaim, G., … Nero, S. (2019). Bots and the black market of social media engagement. https://doi.org/10.13140/RG.2.2.30518.52804

Omena, J.J., & Granado, A. (2020). Call into the platform! Revista ICONO14 Revista Científica de Comunicación y Tecnologías Emergentes, 18(1), 89–122. https://doi.org/10.7195/ri14.v18i1.1436

Omena, J. J., Rabello, E. T., & Mintz, A. G. (2020). Digital Methods for Hashtag Engagement Research. Social Media + Society. https://doi.org/10.1177/2056305120940697

Geboers, M. A., & Van De Wiele, C. T. (2020). Machine Vision and Social Media Images: Why Hashtags Matter. Social Media + Society, 6(2). https://doi.org/10.1177/2056305120928485

Two references for Portuguese speakers:

Silva, T., Mintz, A., Omena, J. J., Gobbo, B., Oliveira, T., Takamitsu, H. T., … Azhar, H. (2020). APIs de Visão Computacional: investigando mediações algorítmicas a partir de estudo de bancos de imagens. Logos, 27(1), 25–54. https://doi.org/10.12957/logos.2020.51523

Silva, T., Barciela, P., & Meirelles, P. (2018). Mapeando Imagens de Desinformação e Fake News Político-Eleitorais com Inteligência Artificial. 3º CONEC: Congresso Nacional de Estudos Comunicacionais da PUC Minas Poços de Caldas — Convergência e Monitoramento, 413–427. Retrieved from https://conec.pucpcaldas.br/wp-content/uploads/2019/06/anais2018.pdf

Janna Joceli Omena is a digital methods researcher interested in platforms & software studies, the technicity-of-the-mediums and digital networks. https://thesocialplatforms.wordpress.com/