Adventures of a TensorFlow.js n00b: Part II: The Machine Trains Me

TensorFlow
7 min read · Aug 7, 2019

David Weinberger, an independent writer, is currently a writer in residence on Google’s People + AI Research (PAIR) team. During his ongoing residency, he is looking at machine learning technology within a broader context of social issues and ideas, and documenting his experiences using new ML tools, from What-If to TensorFlow.js. (See Part One of this series: “Adventures of a TensorFlow.js n00b.”)

Recap

Taking advantage of the fact that I am a writer in residence at Google embedded in a machine learning (ML) research group staffed by extremely patient developers, I’m learning how to write an ML app using TensorFlow.js, which happens to be one of the group’s projects. Note that I am not a developer. My aim is to better understand conceptually what goes into training a machine learning system. This is not a tutorial. (This is an official tutorial. And here’s a good one from outside the group.) I am a hobbyist JavaScript coder. I greatly enjoy it. I am very bad at it.

This is part two of a series in which I document just how bad. (And how greatly I enjoy it.) In my first post, I wrote about how, after some false starts, I decided on an example project to train a machine learning system to figure out which tags apply to images a user inputs. So I downloaded metadata for about 3,000 images at Flickr that were posted under a Creative Commons license. That metadata includes the tags users have applied to those photos. Then I found the 200 most-used tags; the least used of these had been applied to nine photos. I threw out any image that did not have at least one of those top tags, leaving about 2,000 photos. I created a JSON file that records the relevant metadata — especially the tags — for each of those images.
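The post doesn't include the author's actual code, but the tag-counting and filtering step described above can be sketched roughly like this (the photo records and field names here are hypothetical placeholders, not the real Flickr metadata):

```javascript
// Hypothetical metadata shape: one record per photo, with its user-applied tags.
const photos = [
  { id: "p1", tags: ["sunset", "beach", "nikon"] },
  { id: "p2", tags: ["sunset", "mountain"] },
  { id: "p3", tags: ["cat"] },
];

// Count how often each tag appears across the whole collection.
function countTags(photos) {
  const counts = new Map();
  for (const photo of photos) {
    for (const tag of photo.tags) {
      counts.set(tag, (counts.get(tag) || 0) + 1);
    }
  }
  return counts;
}

// Keep the n most-used tags, then drop any photo that has none of them.
function filterByTopTags(photos, n) {
  const counts = countTags(photos);
  const topTags = new Set(
    [...counts.entries()]
      .sort((a, b) => b[1] - a[1]) // most-used first
      .slice(0, n)
      .map(([tag]) => tag)
  );
  const kept = photos.filter((p) => p.tags.some((t) => topTags.has(t)));
  return { topTags, kept };
}

const { topTags, kept } = filterByTopTags(photos, 2);
```

With the real data set, `n` would be 200 and `photos` would hold the ~3,000 downloaded records, leaving the ~2,000 survivors described above.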

The cliffhanger was: Might I have mixed up the images and the tags while doing all of the above? To check this, all I had to do was write a trivial little JavaScript app that has nothing to do with machine learning …

Some time later

…and it took me a couple of weeks. Not full time, so at least I have a little dignity left.

My little JavaScript app reads in the JSON files and displays an alphabetized list of the tags. Click on a tag and it loads in all the images that have that tag, as well as the metadata for each of those images. Then you can use your EBIS (eyeball-brain inspection system) to check that the right tags are associated with the right images.
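The data structures behind such a viewer are simple: an alphabetized list of tags, plus a tag-to-photos index so that clicking a tag can pull up its images. A minimal sketch, with hypothetical photo records standing in for the real metadata (the DOM rendering is left out):

```javascript
// Hypothetical photo records, as a stand-in for the JSON metadata files.
const photos = [
  { id: "p1", tags: ["beach", "sunset"] },
  { id: "p2", tags: ["sunset"] },
];

// Build a tag -> photos index from the metadata.
function buildTagIndex(photos) {
  const index = new Map();
  for (const photo of photos) {
    for (const tag of photo.tags) {
      if (!index.has(tag)) index.set(tag, []);
      index.get(tag).push(photo);
    }
  }
  return index;
}

const index = buildTagIndex(photos);
const alphabetized = [...index.keys()].sort();
// In the real app, each tag in `alphabetized` would be rendered as a clickable
// element that displays index.get(tag) — the photos carrying that tag —
// for EBIS review.
```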

Snippet of my fabulous viewer app. Note the error in tag counts — an error in the viewer, not in the data set. Oh well.

Weeks. Sigh. For example, I felt it crucial to my education in machine learning to make sure that the images show up in horizontal rows that resize as the window resizes. I mean, the proper use of flexboxes is at the very heart of machine learning, isn’t it?

Isn’t it?

Wrangling the data to death

Now, this might sound like a side story about error, poor judgment, and failure. And it is. But the story of my attempts to clean up the data is also about how machine learning can be influenced by seemingly extraneous factors in judgments about data — seemingly extraneous to ML n00bs like me, that is. And, spoiler alert, it’s also about how we humans are influenced by the process of thinking through what data to give machine learning to chew on.

Some of the issues had nothing to do with Flickr. They require a mental model of how the ML model learns. Here’s one example: In my randomly downloaded set of images, there’s a batch of about a hundred, all uploaded by the same person, and each with the same set of forty tags. Each is tagged “Osaka,” and depicts the sorts of things one might photograph as a visitor, such as landmarks, but also memorable lunches, shoes in windows, colorful fruit in a grocery store, and so on. These are meaningful tags to the user, but they’re likely to just confuse a machine learning system. For example, if my finished app were to suggest “Osaka” for your photo of colorful fruit in your local (non-Osaka) grocery, I’m pretty sure you’d conclude that my app is broken. And you’d be right.

The Osaka 100 were not the only image sets that shared identical tag sets. It seemed (according to my feeble mental model, abetted by Yannick Assogba, the Google dev guiding me through this process) to be asking a lot for ML to sort out what to learn from multiple, and often quite generic, tags attached to an image set. It’d be like trying to teach you Japanese by showing you a hundred Osaka photos each tagged with the same forty tags in Japanese.

So I tried a bunch of things, including reducing the big tag sets attached to some images to the single most used tag among them. For example, “woman” is a highly used tag across the entire set of photos, but if it is one of twenty tags attached to a scene at the Grand Canyon because there happens to be a small figure of a woman balanced on a canyon edge, reducing those twenty tags to just “woman” would be telling the ML model that it should learn about “woman” from that tiny figure in the Grand Canyon photo. That’s likely to mislead the ML model. Maybe not, but maybe.
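One way to sketch that tag-reduction experiment: count tag usage across the whole collection, then replace each photo's tag set with its single most-used tag. (Again, the data shape here is a hypothetical stand-in; this also assumes every photo has at least one tag, which the earlier filtering guarantees.)

```javascript
// Reduce each photo's tag set to the single tag that is most used
// across the whole collection.
function reduceToTopTag(photos) {
  const counts = new Map();
  for (const p of photos) {
    for (const t of p.tags) counts.set(t, (counts.get(t) || 0) + 1);
  }
  return photos.map((p) => ({
    ...p,
    // Assumes p.tags is non-empty; pick the globally most-used of its tags.
    tags: [
      p.tags.reduce((best, t) => (counts.get(t) > counts.get(best) ? t : best)),
    ],
  }));
}

// "woman" dominates the collection, so every photo collapses to it —
// including the Grand Canyon shot where the woman is a tiny figure.
const sample = [
  { id: "p1", tags: ["woman", "canyon"] },
  { id: "p2", tags: ["woman"] },
  { id: "p3", tags: ["woman", "rock"] },
];
const reduced = reduceToTopTag(sample);
```

Which is exactly the problem described above: the reduction can tell the model to learn “woman” from a photo that is really about a canyon.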

Photo by Eric Froehling on Unsplash, fully openly licensed

Plus, the tabulation of tag usage was slanted by the inclusion of the big repeated tag sets. It all just got hairier and hairier.

I also did a lot of manual pruning. I first removed tags that were about the photography or the camera used, not about the content of the photo, e.g. the tags “Nikon” and “no_flash.” Then I decided to delete tags that are translations in multiple languages, only because this simplifies the task for my toy app. (If I were building a project meant to serve as many users as possible, and to avoid bias, I would not delete those tags; this was purely a simplification for this exercise.)
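The equipment-tag pruning amounts to filtering against a blocklist. A sketch, where the blocklist entries beyond “Nikon” and “no_flash” are my own guesses, not from the post:

```javascript
// Tags about the camera or the shot, not the photo's content.
// "nikon" and "no_flash" come from the post; the rest are illustrative guesses.
const equipmentTags = new Set(["nikon", "canon", "no_flash", "hdr", "50mm"]);

// Return a copy of the photo with equipment tags stripped out.
function pruneTags(photo) {
  return {
    ...photo,
    tags: photo.tags.filter((t) => !equipmentTags.has(t.toLowerCase())),
  };
}
```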

I’ll spare you the details of weeks of trying to wrangle this wild flock of tags. In short, I gave up on this photo set and looked for a new source of photos. The photos had to be available for use without restrictions, they had to have good, sensible tags, and they had to be available via an API (application programming interface) that enables batch downloading. I quickly found two sources: Unsplash and Pixabay. Two hours later, I had a new set of images, tags, and associated metadata. (After I’d finished, Creative Commons launched its own search site for openly licensed photos. And after that, I learned that TensorFlow itself has datasets good for educational purposes. D’oh!)
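A batch download like the two-hour one above might look roughly like this against Pixabay’s REST API. The field names (`hits`, `tags`, `webformatURL`) are from Pixabay’s documentation as I recall it, so treat them as assumptions; each hit carries its tags as one comma-separated string, so the main work is splitting that into per-photo tag arrays:

```javascript
// Split Pixabay's comma-separated tag string into a clean array.
function parseTags(tagString) {
  return tagString
    .split(",")
    .map((t) => t.trim())
    .filter(Boolean);
}

// Fetch one page of results and normalize them into {id, url, tags} records.
// Field names are assumptions based on Pixabay's documented API; the apiKey
// and query are placeholders. Requires Node 18+ for the global fetch().
async function fetchBatch(apiKey, query, perPage = 200) {
  const url =
    `https://pixabay.com/api/?key=${apiKey}` +
    `&q=${encodeURIComponent(query)}&per_page=${perPage}`;
  const res = await fetch(url);
  const data = await res.json();
  return data.hits.map((hit) => ({
    id: hit.id,
    url: hit.webformatURL,
    tags: parseTags(hit.tags),
  }));
}
```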

Why go on about this, other than that working with badly tagged images cost me two months of time? Well, actually, that is the reason. Flickr photos are tagged by users to serve their own needs. But that means that many of them are tagged badly if your aim is to train machine learning models about the content of photos. Because photos at Unsplash and Pixabay are tagged by the site — not by photographers or visitors — to help people find an image they want to use, the tags tend to be more reliably descriptive of the actual content of the photo.

My problems with tags are not unique to my little app. Machine learning systems often learn from data not designed to train machine learning systems. Often that data has been carefully labeled; for example, data from the U.S. Census comes with unambiguous labels such as “zipcode” and “sex” — although the Census only allows two values for the latter, encoding a bias. But not all data comes so well structured and labeled that a literalist statistical engine can easily learn from it.

Prepping data — often the hardest part of creating an ML system — requires thinking like a machine learning system, which is difficult because machines don’t “think” the way we do. Then, after the machine is trained, it means spotting wrong, and sometimes prejudicial, lessons the system has learned. And then you have to figure out what to do about it.

In truth, in trying to come up with a suitable data set, I’m not teaching machine learning to think like me.

It’s training me to “think” like it.

I seem now to have a clean set of images and tags. The tags are generally clear, straightforward, non-controversial, and boring. Good. I will be training a machine learning system on how to tag photos boringly. I’m not looking for surprises. Boring is good.

So, why do I have a strong hunch I’m not done managing the data yet…?

In David’s next post, he learns how machine learning trains itself.

