On artificial intelligence, museums, and feelings

A few years ago, my improv teacher had our class stand in a big circle, look each person in the eyes, and say what we liked about each other. Each of us heard at least 14 glowing, positive qualities about ourselves. It was heartwarming, humbling, and painfully awkward.

And it created an instant connection between all of us. As it turns out, building genuine connections is a key quality of good improv: it creates trust between scene players, which fosters creativity, which leads to better scenes. (It didn’t make me the next Kristen Wiig or Amy Poehler, but that’s another set of challenges… *cough*notgonnahappen*cough*)

In a small class setting, it’s easy to imagine how this feedback could lead to insights into individual personality traits. It’s another thing to scale “tell me how you feel about me” feedback from an individual up to an organizational level.

On the organizational side, there are well-known feedback tools like customer satisfaction surveys and Net Promoter Scores. To gather direct, qualitative data, a museum might interview visitors in a hall, email them after a visit, or survey them on its website to learn more about their experience.

How, when, and where these questions are asked is usually decided by the museum, sometimes with a hired partner. Once the work is done, the findings are a snapshot in time, and the process must be repeated to understand trends in visitor sentiment.

And yet, these hard-won findings are a tiny slice of visitor feedback compared to the reviews posted on third-party websites.

Not surprisingly, more visitors are sharing their experiences through reviews on websites like Yelp or TripAdvisor. Mid-sized and larger museums can have tens of thousands of reviews covering every aspect of a visit, but manually surfacing insights from this gigantic, rich data set is challenging and difficult to repeat.

Machine learning (a branch of AI) offers a set of natural language processing (NLP) algorithms that can analyze text and quickly extract information. Popular NLP tools include:

  • Translation services: Translation of text from one language into another
  • Entity analysis: Given a body of text, what are the people, places, and things described?

And the focus here:

  • Sentiment analysis: Given a body of text (or an entity), how negative or positive is it?

Sidebar: If you’ve used social media listening tools like Meltwater or the Tweet Sentiment Viz tool, you’ve seen sentiment analysis in action on social media content.

Sentiment analysis can be applied to reviews on sites like Yelp.com. Using a service like Google Cloud’s Sentiment Analysis, the process might look like this:

  • Review texts are identified and compiled
  • The text is sent to the machine learning tool via an API or uploaded to a cloud storage environment
  • The NLP service analyzes the text and returns results
  • Corrections are made to erroneous analyses
  • Analyses are rerun until the results reach an acceptable level of accuracy
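
Here’s a minimal sketch of the first three steps in Python. Treat it as an illustration under assumptions, not the exact code behind this article: `clean_review`, `score_reviews`, and `google_cloud_score` are hypothetical names, and the Google call assumes the `google-cloud-language` client library is installed with credentials configured. The scorer is passed in as a function so the pipeline can be tried with any stand-in.

```python
from typing import Callable, Dict, Iterable, List


def clean_review(text: str) -> str:
    """Drop non-printable characters and collapse runs of whitespace."""
    kept = "".join(ch for ch in text if ch.isprintable() or ch.isspace())
    return " ".join(kept.split())


def google_cloud_score(text: str) -> float:
    """Score one review with Google Cloud Natural Language.

    Assumption: the google-cloud-language package is installed and
    application credentials are set up. Imported lazily so the rest of
    this sketch runs without it.
    """
    from google.cloud import language_v1

    client = language_v1.LanguageServiceClient()
    document = language_v1.Document(
        content=text, type_=language_v1.Document.Type.PLAIN_TEXT
    )
    response = client.analyze_sentiment(request={"document": document})
    return response.document_sentiment.score


def score_reviews(
    reviews: Iterable[str], score_fn: Callable[[str], float]
) -> List[Dict[str, object]]:
    """Clean each review, skip empty ones, and attach a sentiment score."""
    results: List[Dict[str, object]] = []
    for raw in reviews:
        text = clean_review(raw)
        if not text:
            continue  # nothing left to score after cleaning
        results.append({"text": text, "score": score_fn(text)})
    return results
```

For a dry run without a cloud account, pass a stand-in scorer such as `lambda text: 0.0`, then swap in `google_cloud_score` once credentials are ready.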

Sentiment is often expressed as a score ranging from -1 (negative) to +1 (positive). A score of zero is neutral. If a review comes back with a score of -0.6, your visitor wasn’t very happy. On the other hand, if a review comes back with a score of +0.7, you had a happy visitor. Words and phrases like “beautiful” and “we loved it” indicate a positive sentiment and push the score above 0.
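
One way to turn a score into a plain-language label is sketched below. The `label_sentiment` helper and its `neutral_band` cutoff of 0.25 are assumptions for illustration; nothing here prescribes where “neutral” ends, so tune it against your own reviews.

```python
def label_sentiment(score: float, neutral_band: float = 0.25) -> str:
    """Map a score in [-1, 1] to 'negative', 'neutral', or 'positive'.

    neutral_band is a hypothetical cutoff: scores strictly between
    -neutral_band and +neutral_band are treated as neutral.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must be between -1 and 1")
    if score >= neutral_band:
        return "positive"
    if score <= -neutral_band:
        return "negative"
    return "neutral"


for example in (-0.6, 0.0, 0.7):
    print(example, label_sentiment(example))
```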

To get acquainted with Google Cloud’s sentiment analysis service (and to try a simple & fun experiment), I ran sentiment analysis on the top 7 museums in the U.S. by attendance, according to Wikipedia. I compiled reviews from a popular tourism website and submitted them to Google Cloud’s Sentiment Analysis service to generate the scores.

The average sentiment score for the top 7 museums in the U.S. was 0.43.

The range of scores on the compiled review content was 0.3–0.6. Only English reviews were considered.
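
The average and range are simple to reproduce once each museum has a score. A sketch, using placeholder numbers rather than the actual per-museum scores from this experiment (which aren’t listed here); `summarize` is a hypothetical helper.

```python
from statistics import mean
from typing import Dict, Sequence


def summarize(scores: Sequence[float]) -> Dict[str, float]:
    """Return the rounded mean plus the lowest and highest score."""
    if not scores:
        raise ValueError("need at least one score")
    return {
        "mean": round(mean(scores), 2),
        "low": min(scores),
        "high": max(scores),
    }


# Placeholder values for illustration only, not real museum scores.
placeholder_scores = [0.2, 0.5, 0.8]
summary = summarize(placeholder_scores)
print(summary)
```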

Some notes:

  • The score in and of itself is an interesting finding, but it’s better used as a way to measure trends in sentiment over time.
  • Bear in mind this is a default analysis, without attempts to train or correct Google’s NLP model.
  • Volume of reviews is a big factor. One museum in the list had over 80,000 reviews; another had fewer than 5,000.
  • An attempt was made to submit “clean” text (no invalid characters), but no additional changes were made.
  • And, last but not least, sentiment analysis isn’t a flawless algorithm. Context is everything, and you’ll find a fair number of erroneous classifications. It comes with the territory. That said, Google’s service gets very high marks for its accuracy.
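
To act on the first note, scores can be averaged by month to watch sentiment move over time. This is a sketch with made-up dates and scores; `monthly_average` is a hypothetical helper, and it assumes each review comes with a date.

```python
from collections import defaultdict
from datetime import date
from statistics import mean
from typing import DefaultDict, Dict, Iterable, List, Tuple


def monthly_average(scored: Iterable[Tuple[date, float]]) -> Dict[str, float]:
    """Group (review_date, score) pairs by YYYY-MM and average each group."""
    buckets: DefaultDict[str, List[float]] = defaultdict(list)
    for review_date, score in scored:
        buckets[review_date.strftime("%Y-%m")].append(score)
    # Sort by month so the result reads as a timeline.
    return {month: round(mean(vals), 2) for month, vals in sorted(buckets.items())}


# Made-up sample data for illustration only.
sample = [
    (date(2019, 1, 5), 0.4),
    (date(2019, 1, 20), 0.6),
    (date(2019, 2, 3), -0.2),
]
trend = monthly_average(sample)
print(trend)
```

Because one month may hold thousands of reviews and another only a handful, it’s worth keeping the review count next to each average before reading much into a dip.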

(Also worth noting: Everything described here was free.)

Sidebar: Beyond what’s covered here, sentiment analysis on extracted entities is an even better application of the algorithm (and, in most cases, more relevant to visitor-focused organizations). For example, you can use it to analyze feedback on specific things, like special exhibitions, museum admissions, or onsite food options.

Sentiment analysis of third-party reviews is a solid supplement to quantitative data about visitors. It’s not a magic trick of jaw-dropping insights, but it does offer a way to surface visitor insights on content that’s tough to analyze.

This was think piece #4 in the “On artificial intelligence, museums, and hot dogs” series. Thanks for reading! Thoughts? I’m at @CuriousThirst.