China is closing in on the US in AI research

By Jiangjiang Yang and Oren Etzioni

May 11

Abstract

In 2017, China announced plans to become the world leader in AI by 2030. In March 2019, the Semantic Scholar team analyzed over two million academic AI papers published through the end of 2018 and concluded that China was poised to overtake the US as the world leader in AI research. We recently ran a similar study using AI publication data updated through the end of 2020. We found that China has now surpassed the US in the total number of published AI papers and has also taken the lead in the top 50% most-cited AI papers. Our new projections are that China will surpass the US in the top 10% most-cited AI papers this year (2021) and in the top 1% most-cited AI papers in 2023.

Our Findings

China’s rise in AI research is reflected not only in the total quantity of its AI publications (past studies [1][2][3] showed that China had already surpassed the US in total AI publications in 2006) but now also in its growing share of the most-cited AI papers.

In this study, we found that China had already overtaken the US in the top 50% most-cited AI papers in 2019. That year, China’s overall share of the top 50% papers had grown to 31.5%, vs. 30.0% for the US. China achieved this lead by outperforming the US in journal publications: among the top 50% papers published in journals in 2019, 11.5k were from China vs. 8.0k from the US. (The US still led in conference publications, with 8.0k papers vs. China’s 5.3k.)
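As a quick consistency check, the journal and conference counts above reproduce the reported shares. The sketch below is our own arithmetic on the figures quoted in that paragraph, not part of the original analysis: it sums each country's top 50% papers and divides by its reported share to estimate the size of the 2019 top 50% pool. Because the counts are rounded to 0.1k, the result is approximate.

```python
# Top 50% most-cited AI papers in 2019, in thousands (figures quoted above).
counts = {
    "China": {"journal": 11.5, "conference": 5.3},
    "US": {"journal": 8.0, "conference": 8.0},
}
# Reported shares of the 2019 top 50% pool.
shares = {"China": 0.315, "US": 0.300}


def total_papers(country: str) -> float:
    """Sum a country's journal and conference papers (thousands)."""
    return sum(counts[country].values())


for country in counts:
    total = total_papers(country)
    # Implied pool size = country's paper count / country's share of the pool.
    implied_pool = total / shares[country]
    print(f"{country}: {total:.1f}k papers, implied pool ~{implied_pool:.1f}k")
```

Both countries imply a pool of roughly 53.3k papers (16.8k / 0.315 and 16.0k / 0.300), so the counts and shares agree. Note that a paper co-authored by US and Chinese researchers counts toward both totals.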

When we focused on the most impactful AI papers, the top 10% and top 1% most-cited, we saw that China’s share of these papers has also grown quickly in recent years and is closing in on the US’s share. In 2020, for example, the US contributed 37% of the top 10% most-cited AI papers, compared with China’s 36%.

Linear extrapolations of the US and Chinese publication trends, based on each country’s yearly shares over the last six years, show China on track to catch up with the US in 2021 for the top 10% most-cited papers and in 2023 for the top 1%.
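The year at which two linear trends cross can be computed directly from their fitted lines. Below is a minimal sketch, assuming an ordinary least-squares fit to each country's yearly share (the post does not say which fitting method was used). The shares in the usage example are made-up placeholders, not the study's data.

```python
from typing import Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def crossover_year(
    years: Sequence[int],
    us_share: Sequence[float],
    cn_share: Sequence[float],
) -> Optional[float]:
    """Year at which the two fitted trend lines intersect, or None if they are parallel."""
    us_slope, us_icpt = fit_line(years, us_share)
    cn_slope, cn_icpt = fit_line(years, cn_share)
    if cn_slope == us_slope:
        return None
    # Solve us_icpt + us_slope * t == cn_icpt + cn_slope * t for t.
    return (us_icpt - cn_icpt) / (cn_slope - us_slope)


# Hypothetical shares (%) for six years; NOT the study's data.
years = [2015, 2016, 2017, 2018, 2019, 2020]
us = [40.0, 39.5, 39.0, 38.5, 38.0, 37.5]
cn = [30.0, 31.0, 32.0, 33.0, 34.0, 35.0]
print(f"{crossover_year(years, us, cn):.2f}")  # → 2021.67
```

A fractional result such as 2021.67 means the fitted lines cross partway through that year; how to round it to a calendar year is a presentation choice.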

Our current projections mostly align with those from our 2019 study, except that China’s share of the top 1% most-cited AI papers is now projected to overtake the US’s even sooner than we expected.

Conclusions

Our data highlight the impact that Chinese investment in AI research has had on China’s leadership in the field. Our original analysis projected that China would have the greater share of top 1% papers by 2025, but our updated projections show that the gap is closing even faster. Future work from the Semantic Scholar team will continue to examine citation behavior and the relationships between preeminent individuals and groups publishing in AI-related fields, to better understand this activity and what it means for AI research supremacy in the coming years. For the US to remain a key player in AI research globally, we will need to make strong investments in AI technology and in attracting and retaining top global talent in this critical field.

Appendix

1. Methodologies

We used data from the 2021-03-01 release of the Microsoft Academic Graph (MAG) [4] for the analyses in this study.

To identify AI papers, we included (a) all journal and conference papers carrying the explicit ‘Artificial intelligence’ concept/topic tag from MAG’s fields of study, and (b) all papers accepted to a list of top AI conferences: NeurIPS, ICML, KDD, AAAI, IJCAI, CVPR, ICCV, ECCV, ACL, EMNLP, and NAACL. For (b), we identified 112k AI papers from these conferences between 1980 and 2000 (~80% of which lack an explicit ‘Artificial intelligence’ concept/topic tag).
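The two inclusion criteria can be sketched as a simple filter. This is an illustration under assumptions, not the authors' code: the `Paper` record and its field names are hypothetical, standing in for MAG's Papers table already joined with PaperFieldsOfStudy and ConferenceSeries (that join is omitted here).

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Optional

# Conference list from criterion (b), upper-cased for case-insensitive matching.
TOP_AI_CONFERENCES = frozenset(
    {"NEURIPS", "ICML", "KDD", "AAAI", "IJCAI", "CVPR", "ICCV",
     "ECCV", "ACL", "EMNLP", "NAACL"}
)
AI_TAG = "artificial intelligence"


@dataclass
class Paper:
    """Hypothetical, simplified paper record (not MAG's actual schema)."""
    paper_id: int
    doc_type: str  # e.g. "Journal" or "Conference"
    fields_of_study: FrozenSet[str] = frozenset()  # lower-cased tag names
    conference_series: Optional[str] = None  # short venue name, e.g. "ICML"


def is_ai_paper(p: Paper) -> bool:
    # (a) journal or conference paper carrying the explicit AI tag
    tagged = p.doc_type in ("Journal", "Conference") and AI_TAG in p.fields_of_study
    # (b) any paper accepted to one of the listed top AI conferences
    top_venue = (p.conference_series or "").upper() in TOP_AI_CONFERENCES
    return tagged or top_venue


def select_ai_papers(papers: Iterable[Paper]) -> List[Paper]:
    """Keep papers that satisfy criterion (a) or criterion (b)."""
    return [p for p in papers if is_ai_paper(p)]


papers = [
    Paper(1, "Journal", frozenset({"artificial intelligence"})),
    Paper(2, "Conference", frozenset({"computer vision"}), "CVPR"),
    Paper(3, "Journal", frozenset({"biology"})),
]
print([p.paper_id for p in select_ai_papers(papers)])  # → [1, 2]
```

Paper 2 is included only through criterion (b), which mirrors the observation above that most top-conference papers lack the explicit AI tag.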

To obtain paper citation counts, we used the estimated citation count (ECC) from the MAG Papers data.
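Given a citation count per paper, the "top X% most-cited" sets can be built by ranking papers within each publication year. A minimal sketch: ranking within each year, breaking ties by paper ID, and rounding the cutoff up to at least one paper are our assumptions, since the post does not specify these details. The input tuples are hypothetical (paper ID, year, ECC) records.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def top_cited_by_year(
    papers: Iterable[Tuple[int, int, int]],  # (paper_id, year, estimated_citations)
    percent: int,
) -> Dict[int, Set[int]]:
    """For each year, return the IDs of the top `percent`% of papers by citation count."""
    by_year: Dict[int, List[Tuple[int, int]]] = defaultdict(list)
    for paper_id, year, ecc in papers:
        by_year[year].append((ecc, paper_id))
    top: Dict[int, Set[int]] = {}
    for year, rows in by_year.items():
        # Highest citation count first; equal counts are ordered by descending ID.
        rows.sort(reverse=True)
        # Integer ceiling of percent * n / 100, keeping at least one paper.
        k = max(1, (percent * len(rows) + 99) // 100)
        top[year] = {paper_id for _, paper_id in rows[:k]}
    return top


sample = [(1, 2020, 50), (2, 2020, 10), (3, 2020, 30), (4, 2020, 0),
          (5, 2019, 7), (6, 2019, 3)]
print(top_cited_by_year(sample, 50))  # → {2020: {1, 3}, 2019: {5}}
```

Integer arithmetic for the cutoff avoids floating-point surprises such as `0.1 * 30` evaluating slightly above 3.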

To link papers to countries, we relied primarily on the GRID-ID in MAG’s PaperAuthorAffiliations data. By mapping an author’s affiliation to the GRID metadata [5], we can reliably find the affiliation’s country. For author affiliations missing a GRID-ID, we used the ‘Iso3166Code’ or the website URL of the affiliation to deduce its country. We classified a paper as a US paper if any of its author affiliations were in the US, and as a Chinese paper if any of its author affiliations were in China. A paper co-authored by a US researcher and a Chinese researcher therefore adds to both the US and the Chinese paper counts.
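The attribution rule and the resulting country shares can be sketched as follows. This is a minimal illustration under assumptions: the GRID lookup is passed in as a plain dict from GRID-ID to ISO 3166 alpha-2 code, and the URL fallback uses a hypothetical country-code top-level-domain table, since the post does not say how URLs were mapped to countries.

```python
from typing import Dict, Iterable, Optional, Set

# Hypothetical, partial ccTLD table for the URL fallback.
TLD_TO_COUNTRY = {"cn": "CN", "us": "US"}


def affiliation_country(
    grid_id: Optional[str],
    iso3166_code: Optional[str],
    url: Optional[str],
    grid_countries: Dict[str, str],
) -> Optional[str]:
    """Resolve an affiliation's country: GRID first, then ISO 3166 code, then URL."""
    if grid_id and grid_id in grid_countries:
        return grid_countries[grid_id]
    if iso3166_code:
        return iso3166_code.upper()
    if url:
        # Strip any scheme and path, then look up the last domain label.
        host = url.split("//")[-1].split("/")[0].lower()
        return TLD_TO_COUNTRY.get(host.rsplit(".", 1)[-1])
    return None


def country_shares(
    paper_countries: Dict[int, Set[str]], countries: Iterable[str]
) -> Dict[str, float]:
    """Fraction of papers with at least one author affiliation in each country.

    A co-authored US-China paper counts toward both countries, so the shares
    are not mutually exclusive.
    """
    countries = list(countries)
    total = len(paper_countries)
    if total == 0:
        return {c: 0.0 for c in countries}
    return {c: sum(c in cs for cs in paper_countries.values()) / total for c in countries}


# Hypothetical papers mapped to the set of countries of their affiliations.
papers = {1: {"US", "CN"}, 2: {"CN"}, 3: {"US"}, 4: {"DE"}}
print(country_shares(papers, ["US", "CN"]))  # → {'US': 0.5, 'CN': 0.5}
```

Paper 1 counts toward both the US and China, matching the counting rule described above.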

2. Data and Code

We used the following data from the Microsoft Academic Graph (2021-03-01 release): Papers, PaperFieldsOfStudy, ConferenceSeries, PaperAuthorAffiliations, Authors, and Affiliations.

We also used GRID data (2020-12-09 release) to find the countries of the author affiliations.

Note: The number of papers tagged with the ‘Artificial intelligence’ concept/topic can vary between releases of MAG PaperFieldsOfStudy. Newer releases can have ‘tighter’ definitions and tag fewer papers as AI papers. However, this variance does not appear to have shifted the relative positions of the US and China over the years studied.

The source code for this work is available on GitHub.

References

[1] Yoav Shoham, Raymond Perrault, Erik Brynjolfsson, Jack Clark, James Manyika, Juan Carlos Niebles, Terah Lyons, John Etchemendy, Barbara Grosz, and Zoe Bauer (2021). The AI Index 2021 Annual Report. AI Index Steering Committee, Human-Centered Artificial Intelligence (HAI), Stanford University, Stanford, CA.

[2] Joerg Hellwig, Sarah Huggett, Mark Siebert (2019). Data for Elsevier’s AI report Artificial Intelligence: How knowledge is created, transferred, and used. Mendeley Data, V2, doi: 10.17632/7ydfs62gd6.2

[3] Field Cady, Oren Etzioni, Carissa Schoenick (2019). China May Overtake US in AI Research. Allen Institute for Artificial Intelligence.

[4] Kuansan Wang, Zhihong Shen, Chiyuan Huang, Chieh-Han Wu, Darrin Eide, Yuxiao Dong, Junjie Qian, Anshul Kanakia, Alvin Chen, Richard Rogahn (2019). A Review of Microsoft Academic Services for Science of Science Studies. Frontiers in Big Data.

[5] GRID (Global Research Identifier Database), release 2020-12-09. www.grid.ac/downloads. DOI: 10.6084/m9.figshare.14316596

AI2 Blog

AI for the Common Good.
