Top Songs to Learn Spanish According to Data Science

Stéphanie Crêteur
Published in Geek Culture
5 min read, Jun 29, 2022

After I wrote my previous article about the best songs to learn French, many people asked me to repeat the exercise for another language. So I thought I’d give it a go with Spanish.

By the way, if you’re interested in the easiest French songs, my article can be found here:

I followed more or less the same process, so I won’t dwell on the code but rather on the results. There are some new features, though: I used the MusiXmatch API to find each song's genre and each singer's nationality. The procedure to register for an API key is available by following this link.
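For readers who want to try the genre lookup themselves, here is a minimal sketch, not the code used for this article. It assumes the `track.search` endpoint and the `primary_genres` response fields as I understand them from the public Musixmatch documentation; please check both against the current docs. The function names, the sample payload and the `YOUR_API_KEY` placeholder are hypothetical, and no request is actually sent.

```python
from urllib.parse import urlencode

# Assumed API root; verify against the current Musixmatch documentation.
API_ROOT = "https://api.musixmatch.com/ws/1.1/"


def build_track_search_url(title, artist, api_key):
    """Build a track.search request URL for one song (no request is sent)."""
    params = {
        "q_track": title,
        "q_artist": artist,
        "page_size": 1,
        "apikey": api_key,
    }
    return API_ROOT + "track.search?" + urlencode(params)


def extract_genres(payload):
    """Return the genre names of the first matching track, or [] if none."""
    body = payload.get("message", {}).get("body")
    if not isinstance(body, dict):  # error responses may carry an empty body
        return []
    track_list = body.get("track_list") or []
    if not track_list:
        return []
    track = track_list[0].get("track", {})
    genre_list = track.get("primary_genres", {}).get("music_genre_list", [])
    return [g["music_genre"]["music_genre_name"] for g in genre_list]


# Hand-written sample shaped like a track.search response (not real API output).
sample = {
    "message": {
        "body": {
            "track_list": [
                {
                    "track": {
                        "track_name": "Eres tú",
                        "primary_genres": {
                            "music_genre_list": [
                                {"music_genre": {"music_genre_name": "Pop"}}
                            ]
                        },
                    }
                }
            ]
        }
    }
}

print(build_track_search_url("Eres tú", "Mocedades", "YOUR_API_KEY"))
print(extract_genres(sample))
```

Keeping the parsing separate from the request makes it easy to handle songs with no match, which simply end up with an empty genre list.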

Exploratory Data Analysis

My dataframe consists of 4582 songs by artists from 38 different countries. The most represented genres are pop (1139 songs), Latin (505 songs) and Latin Urban (275 songs).

Photo by Ibrahim Rifath on Unsplash

As mentioned earlier, the novelty in this analysis is the addition of nationalities. Despite a large amount of missing data, we can still see a clear top three: the USA (862 songs), Spain (671 songs) and Mexico (596 songs).

As with the French songs, the vocabulary remains linked to love (‘amor’, ‘amar’), life (‘vida’) and desire (‘quiero’, which means ‘I want’ or ‘I love’). Compared to the French lyrics, though, my impression is that these songs put a greater emphasis on the body, with terms like ‘labios’ (‘lips’), ‘besos’ (‘kisses’), ‘piel’ (‘skin’), ‘sentir’ (‘feel’), ‘boca’ (‘mouth’) and ‘cuerpo’ (‘body’) occurring regularly.
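A word count like this can be reproduced with a few lines of standard-library Python. The sketch below is only an illustration: the tokenizer, the tiny stopword list and the two toy lyrics are assumptions of mine, not the corpus or the preprocessing used for this article.

```python
import re
from collections import Counter

# Lowercase Spanish letters, including accented vowels, ü and ñ.
TOKEN = re.compile(r"[a-záéíóúüñ]+")

# Tiny illustrative stopword list (an assumption; a real one would be longer).
STOPWORDS = {"de", "la", "el", "que", "y", "en", "mi", "tu", "te", "me", "un", "una"}


def top_words(lyrics_list, n=10, stopwords=STOPWORDS):
    """Return the n most frequent non-stopword tokens across all lyrics."""
    counts = Counter()
    for lyrics in lyrics_list:
        tokens = TOKEN.findall(lyrics.lower())
        counts.update(t for t in tokens if t not in stopwords)
    return counts.most_common(n)


# Toy lyrics, invented for the example.
toy_lyrics = ["Amor, amor de mi vida", "Tu piel y tu boca, amor"]
print(top_words(toy_lyrics, n=3))
```

Removing stopwords matters here: without that step the top of the list is dominated by articles and prepositions rather than by words like ‘amor’ or ‘vida’.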

The artists with the most songs are Julio Iglesias (78 songs), Shakira (62 songs), Daddy Yankee (58 songs), Wisin and Yandel (53 songs) and Enrique Iglesias (53 songs).

Classification

This time, for my vocabulary, I used the frequency list offered by the Real Academia Española (REAL ACADEMIA ESPAÑOLA: Banco de datos (CORDE) [en línea]. Corpus diacrónico del español. http://www.rae.es [22.06.2022]). A major difference between this list and the one I used for French is that it contains not only the base form of a word but all of its forms, ranked by how often they occur. Thus, we find “soy” (‘I am’) as well as “fuimos” (‘we were’) or “ser” (‘to be’), whereas my previous list was limited to the infinitive of each verb (here it would have been ‘ser’).

To estimate how many words I should include in each level, I relied on this article. In this way, I again created four lists of words divided into A1, A2, B1 and B2. For the rest, the process is nearly identical to the one used for the French songs (creation of a matrix with CountVectorizer, insertion of the level of each word in this same matrix, creation of a dictionary with the percentage of words per level for each song, …).
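The idea behind this step can be sketched in plain Python. Note the swaps: this sketch uses `collections.Counter` rather than CountVectorizer, the level cutoffs are invented (the real list sizes come from the article mentioned above), and it counts every occurrence of a word, although counting distinct words would be an equally plausible reading. All names are hypothetical. Because the frequency list contains inflected forms, the lookup works directly on the words as they appear in the lyrics, with no lemmatization.

```python
import re
from collections import Counter

TOKEN = re.compile(r"[a-záéíóúüñ]+")


def build_level_lookup(ranked_words, cutoffs):
    """Map each word to a level from its rank in the frequency list.

    ranked_words: words ordered from most to least frequent.
    cutoffs: cumulative rank limits in ascending order, e.g. {"A1": 2, "A2": 4}.
    """
    lookup = {}
    for rank, word in enumerate(ranked_words, start=1):
        for level, limit in cutoffs.items():
            if rank <= limit:
                lookup[word] = level
                break  # keep the lowest level that contains this rank
    return lookup


def level_percentages(lyrics, lookup):
    """Return the percentage of tokens per level; unlisted words are 'unknown'."""
    tokens = TOKEN.findall(lyrics.lower())
    if not tokens:
        return {}
    counts = Counter(lookup.get(t, "unknown") for t in tokens)
    return {level: 100 * n / len(tokens) for level, n in counts.items()}


# Toy frequency list and cutoffs, invented for the example.
ranked = ["de", "la", "que", "amor", "vida", "quiero", "piel", "cuerpo"]
cutoffs = {"A1": 2, "A2": 4, "B1": 6, "B2": 8}

lookup = build_level_lookup(ranked, cutoffs)
print(level_percentages("Quiero la vida, quiero amor", lookup))
```

With a real frequency list the cutoffs would be in the hundreds or thousands of words, but the lookup and the percentage computation stay the same.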

Results

Here are the results of the analysis. The first graph shows the distribution of songs according to their lexical range: either A (A1 to A2) or A + B (A1 to B2).

And finally, here are the three lists of songs: those with the highest percentage of known words for the A1 level, then for the A2 level and finally for the B2 level.
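One way to read "percentage of known words for a level" is cumulative: a learner at A2 knows the A1 and A2 words, and a learner at B2 knows everything from A1 to B2. The sketch below ranks songs under that assumption. It is my interpretation rather than the exact code behind the lists, and the song titles and percentages are invented.

```python
LEVELS = ["A1", "A2", "B1", "B2"]


def coverage_up_to(percentages, level):
    """Sum the per-level percentages from A1 up to and including `level`."""
    known = LEVELS[: LEVELS.index(level) + 1]
    return sum(percentages.get(name, 0.0) for name in known)


def top_songs(songs, level, n=10):
    """Return the titles of the n songs with the highest coverage at `level`."""
    ranked = sorted(
        songs,
        key=lambda song: coverage_up_to(song["percentages"], level),
        reverse=True,
    )
    return [song["title"] for song in ranked[:n]]


# Toy data, invented for the example.
songs = [
    {"title": "Song X", "percentages": {"A1": 50.0, "A2": 25.0, "B1": 12.5, "B2": 12.5}},
    {"title": "Song Y", "percentages": {"A1": 75.0, "A2": 12.5, "B1": 0.0, "B2": 0.0}},
    {"title": "Song Z", "percentages": {"A1": 25.0, "A2": 25.0, "B1": 12.5, "B2": 12.5}},
]

for level in ("A1", "A2", "B2"):
    print(level, top_songs(songs, level, n=2))
```

Under this reading, the lexical range "A" corresponds to `coverage_up_to(..., "A2")` and "A + B" to `coverage_up_to(..., "B2")`.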

TOP 10 songs for A1 level

  • Sueño de Noche by Gipsy Kings (genre = Latin)
  • ¡Qué bien! [Hot Dog Dance] (Latin Spanish) by Mickey Mouse Clubhouse (OST) (genre = Children’s Music)
  • La despedida by Shakira (genre = Folk)
  • Te quiero tanto by OV7 (genre = Pop)
  • Toma mi vida by Milly Quezada (genre = Merengue)
  • ¿Dónde Están, Corazón? by Enrique Iglesias (genre = Pop)
  • Así es la mujer que amo by Victor Manuelle (genre = Latin)
  • Más by Nelly Furtado (genre = Pop)
  • 5 razones by Manu Chao (genre = World)
  • Nada ni nadie by Soy Luna (OST)

Again, I created a Spotify playlist with some A1 level songs from the corpus (almost 3 hours of songs).

TOP 10 songs for A2 level

  • Tú estás aquí by Jesús Adrián Romero (genre = Christian & Gospel)
  • Reflejo de Luna by Alacrán (genre = Electronica)
  • Eres tú by Mocedades (genre = Pop)
  • A Lo Mejor by Banda MS (genre = Pop)
  • Mi primer amor by Ender Thomas (genre = Latin)
  • Lo que yo sé de ti by Ha*Ash (genre = Pop)
  • Me voy by Yasmin Levy (genre = Pop)
  • Por qué será by Xtreme (genre = Salsa y Tropical)
  • Amor Genuino by Zion & Lennox (genre = Latin Urban)
  • Renuevame by Christian Hymns & Songs (genre = Christian & Gospel)

Here is the playlist for the A2 level.

TOP 10 songs for B2 level

  • Porque by Yasmin Levy (genre = Raices)
  • Dame tu mano by Glorya (genre = Electronica)
  • Los adolescentes by Dënver (genre = Pop)
  • La incondicional by Luis Miguel (genre = Pop)
  • Vayamos compañeros by Marquess (genre = Pop)
  • Años by Mercedes Sosa (genre = Latin)
  • Amor, amor, amor by Julio Iglesias (genre = Pop)
  • You are my hiding place by Selah (genre = Christian & Gospel)
  • Vai (LLP Remix) by Muneca (genre = Pop)
  • Amada amante by Roberto Carlos (genre = Brazilian)

Finally, the playlist for the B2 level.

If you would rather search by yourself according to your preferences, you can use the table below to browse the whole corpus. You can search by artist, title, genre or nationality. You can also sort the list according to the lexical range: A1-A2 or A1-B2.

To see the table in full screen follow this link: https://datawrapper.dwcdn.net/D8Xul/1/

If you are interested in the full code, here is the GitHub repository. Thank you very much for reading!
