Exploring the AI Landscape. Part 3: Picturing the Present — Visualizing the Rise and Fall of Topics

12 min read · Mar 22, 2023

Image generated by the author using Stable Diffusion.

Welcome to the last part of our journey through the AI landscape. In the previous publications, Part 1: Crafting the Data Foundation — Data Selection and Preparation, and Part 2: Words of Change — Gaining Insights from Article Titles, we demonstrated how to collect and prepare data from arXiv.org for trend analysis, as well as how to extract valuable insights from article titles using trigrams. In this concluding part of our series, we’ll visualize and interpret the results, shedding light on the key topics and trends poised to define the AI industry.

This part consists of the following sections:

Visualize Trigram Frequencies: Histogram, Radar Chart, Circular Chart,
Visualize Trigram Ranks: Heatmap, Scatterplot with Outlier Boundaries,
Exploring the Rise and Fall of Topics in AI Research: Emerging Trend in AI, Topic in TOP-10 with Notable Growth, Other Rising Topics, Topics with Declining Interest,
Conclusion.

We will be using various visualization techniques to display the data and trends, which will make it easier for readers to understand and interpret the information. By the end of this article, we will have a better understanding of the state of AI in 2022.

Visualize Trigram Frequencies

Let’s start by visualizing the frequency table. Load the frequency table trigram_frequency_table.csv, which we obtained in Part 2 of our series:

import pandas as pd

# Name the table table_freq, since the plotting code below refers to it by that name
table_freq = pd.read_csv('trigram_frequency_table.csv', sep=',')

Histogram

A histogram is a powerful and user-friendly visualization tool that lets us explore the frequency distribution of trigrams across different years. We create an interactive, animated histogram that displays the frequency of the TOP-15 trigrams in arXiv.org Computer Science articles from 2018 to 2022. The histogram presents the trigrams on the x-axis and their corresponding frequencies on the y-axis. The bars are color-coded by year, and the animation frame lets users see how the trigram frequencies change over time. The layout and formatting of the plot are customized for better readability and visual appeal.

We use the Plotly library, specifically the Plotly Express module, to create an interactive histogram for the TOP-15 trigrams in our frequency table. Then, we create a histogram by calling px.histogram, passing in our frequency table, melted and reshaped for better compatibility with Plotly. We set the x-axis, y-axis, color, and animation frame to display the trigrams, their frequency, the corresponding year, and to animate the graph over time. Finally, we update the layout and formatting of the plot, and display the resulting visualization.

import plotly.express as px

fig = px.histogram(table_freq[:15].melt(id_vars='Trigram', var_name='year', value_name='frequency'), 
                   x='Trigram', y='frequency', color='year',
                   color_discrete_sequence=px.colors.qualitative.Dark24,
                   nbins=len(table_freq['Trigram'].unique()),
                   animation_frame='year')

fig.update_layout(title='Frequency of Trigrams in arxiv.org CS articles from 2018 to 2022',
                  xaxis_title='Trigram',
                  yaxis_title='Frequency',
                  xaxis_tickangle=-45,
                  yaxis_tickformat='.2%',  # show the frequencies on the y-axis as percentages
                  yaxis=dict(range=[0, 0.003]),
                  height=600,
                  width=1000)

fig.show()

Animated Histogram that Displays the Frequency of the TOP-15 Trigrams in arXiv.org Computer Science Articles from 2018 to 2022. Image by the author.
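To make the reshaping step concrete, here is a minimal sketch of what melt does to a wide frequency table. The tiny table is made up for illustration (the values are hypothetical), and it assumes the layout used above: one Trigram column plus one column per year.

```python
import pandas as pd

# Hypothetical miniature of the frequency table: one row per trigram,
# one column per year (the values are invented for illustration only).
toy_freq = pd.DataFrame({
    'Trigram': ['graph neural network', 'convolutional neural network'],
    '2021': [0.0020, 0.0025],
    '2022': [0.0024, 0.0018],
})

# melt turns the wide table into a long one: one row per (trigram, year) pair,
# which is the shape Plotly Express works with for color= and animation_frame=.
toy_long = toy_freq.melt(id_vars='Trigram', var_name='year', value_name='frequency')

print(toy_long)
```

Each year column becomes a value in the new year column, so a table with two trigrams and two years yields four rows.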

Radar Chart

Next, let’s build a radar chart. A radar chart is a graphical representation that displays multivariate data across multiple axes originating from the same point, allowing for a clear comparison of various data points. One of its main advantages is its ability to effectively visualize complex data relationships.

The frequency of the “convolutional neural network” (CNN) trigram in 2018 is significantly higher than the frequencies of other trigrams throughout the entire study period. To avoid switching to a larger scale and losing the dynamics of the other years, we’ll create a radar chart specifically for the years 2019 to 2022. This will enable us to better compare the trends and trigram frequencies within this specific timeframe.

To construct the radar chart, we import the graph_objects module from the Plotly library using the alias go. The graph_objects module provides a wide array of trace types and layout options for creating interactive and visually appealing charts. In this case, we'll be utilizing the ‘Scatterpolar’ trace to generate the radar chart:

import plotly.graph_objects as go

trigrams = table_freq['Trigram'][:5]
years = ['2019','2020', '2021', '2022']
data = table_freq.set_index('Trigram').loc[trigrams, years]

colors = ['#636EFA', '#EF553B', '#00CC96', '#AB63FA', '#FFA15A']

fig = go.Figure()

for i, trigram in enumerate(trigrams):
    fig.add_trace(go.Scatterpolar(
        r=data.loc[trigram],
        theta=years,
        fill='none',
        line=dict(color=colors[i], width=2),
        name=trigram
    ))

fig.update_layout(
    title='Radar Chart of Trigram Frequency in arXiv.org CS Articles from 2019-2022',
    polar=dict(
        radialaxis=dict(
            #range=[0, 0.0045],
            tickfont=dict(size=10)
        ),
        angularaxis=dict(
            tickfont=dict(size=10)
        )
    )
)

fig.show()

Radar Chart Describing the Frequency of the Top-5 Trigrams of 2022 in arXiv.org Computer Science Articles from 2019 to 2022. Image by the author.
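One detail worth knowing about Scatterpolar: as far as I am aware, a line trace is drawn from the first point to the last and is not closed automatically, so the segment between the last and the first year is missing unless the first point is repeated. Below is a minimal sketch of a helper that does this. The name close_loop is hypothetical and the values are made up.

```python
def close_loop(values):
    """Return the values as a list with the first element repeated at the end."""
    values = list(values)
    return values + values[:1]

years = ['2019', '2020', '2021', '2022']
freqs = [0.0010, 0.0015, 0.0020, 0.0024]  # hypothetical frequencies

theta_closed = close_loop(years)
r_closed = close_loop(freqs)
print(theta_closed)
```

In the loop above, the closed lists would be passed as r=close_loop(data.loc[trigram]) and theta=close_loop(years). Whether a closed outline is desirable for time-ordered axes is a matter of taste, since it visually connects 2022 back to 2019.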

Circular Chart

Another type of chart, the circular chart, also known as a polar or radial chart, offers a way to visualize data in a circular format. This chart type is especially useful for displaying cyclical or periodic data, where relationships or patterns become more apparent in a circular layout. Additionally, it serves as an attractive alternative to conventional bar or line charts.

To visualize the changing frequency of the TOP-5 trigrams of 2022 for each year from 2018 to 2022, we define a function, create_circular_chart, that accepts a year as an argument. Within the function, we select the TOP-5 trigrams for 2022 and their corresponding data for the specified year. We then set up colors and create a polar plot with a horizontal bar chart for each trigram. Next, we adjust the polar plot settings, such as the theta zero location, direction, and label positions. We configure the legend and title for each plot, displaying the data’s year. Finally, we loop through the list of years (2018 to 2022) and call the create_circular_chart function for each year. This process yields separate circular bar charts for each year:

import numpy as np
import matplotlib.pyplot as plt

def create_circular_chart(year):
    trigrams = table_freq['Trigram'][:5]
    data = table_freq.set_index('Trigram').loc[trigrams, year]

    max_val = data.max() * 1.001
    colors = ['#636EFA', '#EF553B', '#00CC96', '#AB63FA', '#FFA15A']
    
    ax = plt.subplot(projection='polar')

    for i, trigram in enumerate(trigrams):
        ax.barh(i, data.loc[trigram] * 2 * np.pi / max_val,
                label=trigram, color=colors[i])

    ax.set_theta_zero_location('N')
    ax.set_theta_direction(1)
    ax.set_rlabel_position(0)
    ax.set_thetagrids([], labels=[])
    ax.set_rgrids(range(len(trigrams)), labels=[''] * len(trigrams))  # Set labels to an empty list

    plt.tight_layout()
    handles, labels = plt.gca().get_legend_handles_labels()
    order = list(range(len(labels)))
    order.reverse()
    plt.legend([handles[idx] for idx in order],[labels[idx] for idx in order],
               bbox_to_anchor=(1.1, 1.01), title=f'Year: {year}')
    
    plt.title(f'{year}', y=1.1)
    plt.show()

years = ['2018', '2019', '2020', '2021', '2022']

for year in years:
    create_circular_chart(year)

Circular Bar Charts Illustrating the Frequencies of the Top-5 Trigrams in 2022 and Their Changes from 2018 to 2022. Image by the author.

The resulting charts are visually appealing and make it easy to compare the trigram frequencies across different years.
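The bar lengths in create_circular_chart come from a simple rescaling: each frequency is divided by a value slightly above the yearly maximum and multiplied by 2π, so the largest bar stops just short of a full turn. Here is a minimal sketch of that arithmetic, using hypothetical frequencies and a helper name, to_angle, that is not part of the original code.

```python
import math

def to_angle(value, max_val):
    """Map a frequency to an angle in radians, where max_val is a full circle."""
    return value * 2 * math.pi / max_val

freqs = [0.0024, 0.0018, 0.0012]   # hypothetical frequencies for one year
max_val = max(freqs) * 1.001       # same 0.1% headroom as in the chart code

angles = [to_angle(f, max_val) for f in freqs]
for f, a in zip(freqs, angles):
    print(f'{f:.4f} -> {math.degrees(a):.1f} degrees')
```

Because the scaling is recomputed per year, bar lengths are comparable within one chart but not across charts: the longest bar always spans almost the full circle, whatever its absolute frequency.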

Visualize Trigram Ranks

Let’s move on to visualizing trigram ranks. The rank table offers more insight than the frequency table: it shows how the position of a specific trigram changes relative to all other trigrams in each year. This is essential for tracking the evolution of research trends and the significance of specific topics within the field over time.

We load the rank table trigram_rank_table.csv, which we obtained in Part 2 of our series:

table_rank = pd.read_csv('trigram_rank_table.csv', sep=',')
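For readers who skipped Part 2, a rank table can be derived from a frequency table by ranking each year's column in descending order. The sketch below shows one way to do that with pandas. It illustrates the general idea and is not necessarily the exact procedure used in Part 2. The trigrams are placeholders and the numbers are invented.

```python
import pandas as pd

toy_freq = pd.DataFrame({
    'Trigram': ['a b c', 'd e f', 'g h i'],   # placeholder trigrams
    '2018': [0.003, 0.001, 0.002],            # invented frequencies
    '2022': [0.001, 0.004, 0.002],
})

year_cols = ['2018', '2022']
toy_rank = toy_freq.copy()
# Rank 1 = highest frequency within each year; ties share the smallest rank.
toy_rank[year_cols] = (
    toy_freq[year_cols].rank(ascending=False, method='min').astype(int)
)

print(toy_rank)
```

A rank of 1 therefore marks the most frequent trigram of that year, which is why smaller numbers mean greater prominence in the charts that follow.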

Heatmap

A heatmap is a powerful data visualization technique that uses color to represent values in a matrix or grid, making it easier to identify patterns, trends, or outliers. It is an excellent choice for our task of visualizing the top values of the trigram rank table.

Let’s create a heatmap to visualize how the rankings of the TOP-20 trigrams of 2022 evolved from 2018 to 2022, highlighting shifts in their significance within the field. We use the seaborn library, together with Matplotlib, to generate a heatmap with a custom color map and annotations. We apply a logarithmic scale to better visualize the data: the color of each cell corresponds to the natural logarithm of the rank, and the number displayed in each cell is the actual rank. The logarithmic scale helps us display a wide range of values in a more compact and readable manner.

import seaborn as sns
import matplotlib.pyplot as plt

table_filtered = table_rank[
    table_rank['Trigram'].isin(table_rank['Trigram'][:20].tolist())
]

table_filtered = table_filtered.sort_values(
    by='2022', ascending=True
)

heatmap_data = table_filtered.set_index('Trigram')

log_heatmap_data = np.log(heatmap_data)

sns.set(font_scale=1.3)

plt.figure(figsize=(10, 12))
ax = sns.heatmap(
    log_heatmap_data,
    cmap='viridis_r',
    annot=heatmap_data,
    cbar=False,
    fmt='.0f',
    yticklabels=True
)

ax.xaxis.tick_top()
ax.xaxis.set_label_position('top')

plt.xticks(rotation=45)

plt.show()

Heatmap Visualizing How the Rankings of the TOP-20 Trigrams of 2022 Evolved from 2018 to 2022. The color of each cell corresponds to the natural logarithm of the rank, and the number displayed in each cell represents the actual rank. Image by the author.
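To see why the logarithm helps, compare the log values of ranks that differ by a factor of ten. This is a small illustration, separate from the article's code.

```python
import numpy as np

ranks = np.array([1, 10, 100, 1000])
log_ranks = np.log(ranks)

# On a linear scale, ranks 1 and 10 are nearly indistinguishable next to 1000;
# on the log scale each tenfold step adds the same amount (ln 10, about 2.303).
for r, lr in zip(ranks, log_ranks):
    print(f'rank {r:>4} -> ln(rank) = {lr:.3f}')

steps = np.diff(log_ranks)
```

In the heatmap, this means that color differences among the top ranks stay visible even when a trigram ranked in the hundreds in an earlier year.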

The heatmap highlights the dynamic evolution of trigrams, showcasing several trigrams that gained significant popularity in 2022 compared to previous years:

Graph Neural Network,
3D Object Detection,
Bidirectional Encoder Representation (Transformer),
Pre-trained Language Model,
Large Language Model.

The heatmap offers a convenient visual representation of the data, making it easier to identify trends and shifts in the AI landscape. However, one drawback is that if we were to display all the top topics, which amount to 1,000, the graph would become almost unreadable due to the sheer volume of data points. To cover all topics, we may need to explore other visualization tools that can effectively display a larger volume of data points.

Scatterplot with Outlier Boundaries

Scatterplot

In the previous section, we highlighted that the heatmap visualization, while effective for a limited number of trigrams, might not be suitable for displaying a large volume of data points due to readability issues. To address this challenge, we will now explore an alternative approach using a scatterplot to visualize the rank changes of trigrams between 2018 and 2022.

Given the large number of trigrams (1000) and their potential rank range from 1 to 1000, employing a logarithmic scale on both axes is a more effective approach for visualization. We calculate the rank difference and store it in a new column. The rank difference (in the standard, non-logarithmic scale) is represented on the scatterplot by the color of the point for each trigram.

We create a scatterplot with the 2022 rank on the x-axis (natural logarithm scale) and the 2018 rank on the y-axis (natural logarithm scale). This method allows for improved visualization of trigram rank changes while maintaining readability and interpretability of the data.

import plotly.express as px
from scipy.stats import linregress

table_filtered = table_rank.copy()
table_filtered['Diff'] = table_filtered['2018'] - table_filtered['2022']
table_filtered['Diff_ln'] = np.log(np.abs(table_filtered['2018'] - table_filtered['2022']))
table_filtered['2018_ln'] = np.log(table_filtered['2018'])
table_filtered['2022_ln'] = np.log(table_filtered['2022'])

table_filtered = table_filtered.dropna(subset=['Diff_ln'])

slope, intercept, r_value, p_value, std_err = linregress(
    table_filtered['2022_ln'], table_filtered['2018_ln']
)

table_filtered['residuals'] = table_filtered['2018_ln'] - (
    slope * table_filtered['2022_ln'] + intercept
)
residuals_std = np.std(table_filtered['residuals'])
num_std_dev = 2

fig = px.scatter(
    table_filtered, x='2022_ln', y='2018_ln', hover_name='Trigram'
)

fig.update_traces(
    marker=dict(
        size=12,
        color=table_filtered['Diff'],
        colorscale='RdBu_r',
        colorbar=dict(title='Rank Difference: 2018 vs. 2022'),
        line=dict(color='black', width=1)
    )
)

fig.update_layout(
    title='Scatterplot of Trigrams Rank in 2022 vs. 2018 with Outlier Boundaries',
    xaxis_title='Rank in 2022 (natural log scale)',
    yaxis_title='Rank in 2018 (natural log scale)'
)

fig.update_traces(text=table_filtered.index)
fig.show()

Outliers Detection

To identify trigrams with anomalously high growth or decline, we will focus on finding outliers. Outliers are data points that significantly deviate from the overall pattern or trend observed in the dataset. We add the following lines to the scatterplot:

A grey solid line represents the best-fit line for the data points found using linear regression.
Two red dotted lines represent the upper and lower boundaries for outlier detection, placed two standard deviations of the residuals above and below the best-fit line.

# Add lines for outlier boundaries
def add_line(fig, x0, x1, y0, y1, color, dash=None):
    fig.add_shape(
        type='line',
        x0=x0, y0=y0, x1=x1, y1=y1,
        line=dict(color=color, dash=dash)
    )

add_line(fig, min(table_filtered['2022_ln']), max(table_filtered['2022_ln']),
        slope * min(table_filtered['2022_ln']) + intercept,
        slope * max(table_filtered['2022_ln']) + intercept, 'gray')

add_line(fig, min(table_filtered['2022_ln']), max(table_filtered['2022_ln']),
         slope * min(table_filtered['2022_ln']) + intercept + num_std_dev * residuals_std,
         slope * max(table_filtered['2022_ln']) + intercept + num_std_dev * residuals_std,
         'red', 'dot')

add_line(fig, min(table_filtered['2022_ln']), max(table_filtered['2022_ln']),
         slope * min(table_filtered['2022_ln']) + intercept - num_std_dev * residuals_std,
         slope * max(table_filtered['2022_ln']) + intercept - num_std_dev * residuals_std,
         'red', 'dot')

# Display the figure again so that the added lines are visible
fig.show()

Scatterplot Visualizing the Rank Changes of Trigrams between 2018 and 2022. The 2022 rank is on the x-axis and the 2018 rank is on the y-axis, both on a natural logarithm scale. The grey solid line is the best-fit line found using linear regression. The two red dotted lines mark the upper and lower boundaries for outlier detection, placed two standard deviations from the best-fit line. An interactive scatterplot is available at the link. Image by the author.

An interactive scatterplot that allows you to view the value of each point is available at the link.

Let’s take a closer look at the trigrams that lie outside the boundaries of outlier detection. These trigrams are the ones that have shown a significantly higher (or lower) increase in interest compared to other trigrams in the field of AI. Analyzing these trigrams can provide valuable insights into the specific topics or applications within AI that are gaining traction or losing popularity among users and researchers.

Exploring the Rise and Fall of Topics in AI Research

Emerging Trend in AI: Transformers

Three trigrams representing one of the most significant AI trends — transformers — have emerged as a distinct group:

BERT (2018: rank 513, 2022: rank 9),
Pretrained Language Model (2018: rank 511, 2022: rank 11),
Large Language Model (2018: rank 792, 2022: rank 14).

Transformers have revolutionized the field of artificial intelligence, particularly in natural language processing. Key players like OpenAI, Facebook, and Google Research have developed groundbreaking models such as GPT-3, GPT-4, PaLM, RoBERTa, and BERT. These models have seen a rapid rise in popularity, evident in the significant increase in their ranking between 2018 and 2022.

These transformer-based models have influenced AI research, academia, and real-world applications, including customer service chatbots, content moderation, and automated content generation. As the trend continues, more breakthroughs and advancements in AI are expected, further enabling complex applications in human-machine interaction and intelligent decision-making support.

Topic in TOP-10 with Notable Growth

Graph Neural Networks (2018: rank 44, 2022: rank 1).

As a powerful tool for processing graph-structured data, GNNs have found applications in diverse domains such as social network analysis, recommendation systems, and drug discovery.

Topics with Declining Interest

Some trigrams have seen a decrease in interest, including:

Convolutional Neural Network, which lost its first place and in 2022 landed in fourth position,
Generative Adversarial Network (2018: rank 3, 2022: rank 6),
Neural Machine Translation (2018: rank 7, 2022: rank 20),
Non-Orthogonal Multiple Access (2018: rank 12, 2022: rank 49),
Long Short-Term Memory (2018: rank 10, 2022: rank 64),
Deep Convolutional Neural (Network) (2018: rank 11, 2022: rank 92).

The decline in popularity of some of these topics can be attributed to the growth of transformers, which have demonstrated exceptional performance in various AI tasks, particularly in natural language processing. For instance, the rise of transformers has overshadowed Neural Machine Translation, as models like GPT-3, GPT-4 and BERT have shown superior performance in language translation tasks. Similarly, the advancements in transformers have led to a reduced focus on LSTM, which was once the go-to choice for sequential data processing.

While CNNs and GANs have not been directly replaced by transformers, the rise of other techniques like Graph Neural Networks and Physics-Informed Neural Networks has contributed to their reduced prominence.
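The rank changes quoted in this section can be put on a common footing by comparing log ranks, which is what the scatterplot does visually. The sketch below uses only the ranks stated above. The name log_change is an illustrative choice.

```python
import math

# (2018 rank, 2022 rank) as quoted in this section
ranks = {
    'BERT': (513, 9),
    'Pretrained Language Model': (511, 11),
    'Large Language Model': (792, 14),
    'Graph Neural Network': (44, 1),
    'Convolutional Neural Network': (1, 4),
    'Generative Adversarial Network': (3, 6),
    'Neural Machine Translation': (7, 20),
    'Non-Orthogonal Multiple Access': (12, 49),
    'Long Short-Term Memory': (10, 64),
    'Deep Convolutional Neural (Network)': (11, 92),
}

# Positive log_change = the topic moved up the ranking between 2018 and 2022.
log_change = {
    name: math.log(r18) - math.log(r22) for name, (r18, r22) in ranks.items()
}

for name, change in sorted(log_change.items(), key=lambda kv: kv[1], reverse=True):
    r18, r22 = ranks[name]
    print(f'{name:<36} {r18:>4} -> {r22:>3}   ln change {change:+.2f}')
```

Sorting by the log change rather than the raw difference keeps a move from rank 44 to rank 1 comparable with a move from rank 513 to rank 9, even though the raw differences are very different.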

We encourage readers to explore the graph in more detail and draw their own conclusions about the evolution of topics in AI research.

Conclusion

In this concluding part of our series, Exploring the AI Landscape, we have journeyed through the mesmerizing realm of AI, utilizing a variety of data preprocessing and visualization techniques to illustrate the prevailing topics and trends. As we reflect on our experiences in Part 1: Crafting the Data Foundation — Data Selection and Preparation, and Part 2: Words of Change — Gaining Insights from Article Titles, we appreciate how these earlier segments have set the stage for our discoveries in this final part.

In Part 3: Picturing the Present — Visualizing the Rise and Fall of Topics, we learned valuable techniques for data visualization, including:

Histogram,
Radar Chart,
Circular Chart,
Heatmap,
Scatterplot with Outlier Boundaries.

By successfully extracting and visualizing the 1000 most popular trigrams for 2022 from titles of computer science articles on arXiv.org (2018–2022), we delved into the most significant trends and themes in the AI realm.

Thank you for joining us on this enlightening expedition through the AI landscape. We hope that the knowledge and skills gleaned from this series serve as a springboard for your own exploration and discovery in the ever-evolving world of artificial intelligence.

The complete code for all parts can be found in my GitHub repository.