Exploratory Data Analysis of K-Pop Idols

Feni Rahmi
Feb 1 · 6 min read

K-pop (short for Korean pop) is a genre of popular music originating in South Korea. The term is widely used on the internet, and the genre has a large fan following around the world. This ‘once local, now global’ phenomenon has an interesting background.


K-pop idols are groups and artists formed by entertainment companies to create catchy Korean popular music aimed at younger audiences. The groups are assembled from people who are particularly talented in at least one of the following: singing, rapping, and dancing. These idols often join an entertainment company in their teens and then train hard for years in areas like singing, rapping, dancing, and foreign languages. Then, if they are lucky and talented enough, they are picked for an idol group by the end of their teens.

K-pop idols have data! Exploring it can answer several general questions. For this project I used Jupyter Notebook and derived some new features from the existing ones. The dataset covers K-pop idols’ profiles from 1992 to 2020.

Data source : K-Pop Database

First, data preparation: I imported the packages I needed (pandas and matplotlib), loaded the dataset, and checked the data types (all object) and the missing values. Here’s the data :

I dropped the Korean Name and K. Stage Name columns, assuming that most readers here cannot read Hangul (the Korean alphabet). I also dropped Birthplace because it is not needed for this analysis.
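A minimal sketch of the column-dropping step, assuming the data is already in a pandas DataFrame. The three rows below are made-up placeholders, not rows from the K-Pop Database, and the `Stage Name` column label is my assumption; the other column names come from the text above.

```python
import pandas as pd

# Made-up placeholder rows standing in for the real dataset
idols = pd.DataFrame({
    "Stage Name": ["Idol A", "Idol B", "Idol C"],   # assumed column label
    "Full Name": ["Name A", None, "Name C"],
    "Korean Name": ["이름 A", None, "이름 C"],
    "K. Stage Name": ["아이돌 A", "아이돌 B", "아이돌 C"],
    "Birthplace": ["Seoul", None, "Busan"],
})

# Inspect the data type pandas inferred for each column
print(idols.dtypes)

# Drop the columns that are not needed for the analysis
idols = idols.drop(columns=["Korean Name", "K. Stage Name", "Birthplace"])
print(idols.columns.tolist())
```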

Count of missing values.

I didn’t fill in or drop these missing values because none of the columns are numeric, and I suspect the values are not truly missing.

The six rows above are really interesting: how can an idol not have a full name? My assumption is that they have not yet published their full profile on the internet. People already know them by their stage names, and most of them come from the group Lusty.
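For readers who want to reproduce this check, here is a small sketch of counting missing values per column and listing the rows without a full name. The sample rows and the `Stage Name` label are hypothetical; only the `Full Name` and `Group` column names come from the article.

```python
import pandas as pd

# Hypothetical sample; None marks a missing value
idols = pd.DataFrame({
    "Stage Name": ["Idol A", "Idol B", "Idol C", "Idol D"],
    "Full Name": ["Name A", None, "Name C", None],
    "Group": ["Group 1", "Group 2", None, "Group 2"],
})

# Number of missing values in each column
missing = idols.isna().sum()
print(missing)

# Rows where the full name is missing
no_full_name = idols[idols["Full Name"].isna()]
print(no_full_name)
```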

Next, I used the data to answer several questions.

1. How do the numbers of male and female idols compare?

K-pop idols gender comparison.

The two percentages differ only slightly, so the number of female idols is close to the number of male idols.
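The percentages behind such a comparison can be computed roughly like this. The `Gender` column name, the `M`/`F` codes, and the five sample values are assumptions for illustration, not the real data.

```python
import pandas as pd

# Hypothetical gender column; the real dataset may use other labels
idols = pd.DataFrame({"Gender": ["M", "F", "F", "M", "M"]})

# Share of each gender as a percentage, rounded to one decimal place
share = idols["Gender"].value_counts(normalize=True).mul(100).round(1)
print(share)
```

Calling `share.plot(kind="pie")` on the result is one way to get a pie chart out of it, provided matplotlib is installed.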

2. Do K-pop idols only come from South Korea?

Counts of idols per country.

We can see that K-pop idols aren’t only South Korean: there are foreign idols from 11 other countries. I wanted to know who comes from Indonesia (because I’m Indonesian hehe).

K-pop idols from Indonesia.

They are Dita from SECRET NUMBER and Loudi from 14U.
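A lookup like this one is a simple boolean filter in pandas. In the sketch below, the two Indonesian rows follow the article, while the third row and the `Country` and `Stage Name` column labels are assumptions.

```python
import pandas as pd

idols = pd.DataFrame({
    "Stage Name": ["Dita", "Loudi", "Idol C"],        # "Idol C" is a placeholder
    "Group": ["SECRET NUMBER", "14U", "Group C"],
    "Country": ["Indonesia", "Indonesia", "South Korea"],
})

# Keep only the rows whose country matches
from_indonesia = idols[idols["Country"] == "Indonesia"]
print(from_indonesia[["Stage Name", "Group"]])
```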

Back to the counts. Because the numbers differ so widely between countries, I chose to plot only ‘The Top 3 Countries’.

From the bar chart above, the top 3 countries of origin for K-pop idols are South Korea, China, and Japan. According to The Pudding, this reflects international casting. Agencies began more proactive global recruiting in the mid-2000s, perhaps to capitalize on K-pop’s international growth. Global auditions were held primarily in the USA and Canada before expanding into China, Japan, Thailand, Australia, and more. Many agencies now enable hopefuls to audition online and, while there are not always limits on nationality, the majority of idols that debut are from East Asian countries or have East Asian heritage.
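The count-and-rank step behind a top 3 chart can be sketched as follows. The counts here are invented purely to make the example run and do not reflect the real dataset; `Country` is an assumed column name.

```python
import pandas as pd

# Invented counts for illustration only
countries = pd.Series(
    ["South Korea"] * 5 + ["China"] * 3 + ["Japan"] * 2 + ["Thailand"],
    name="Country",
)

# value_counts sorts from most to least frequent
counts = countries.value_counts()

# The three most common countries
top3 = counts.head(3)
print(top3)
```

From here, `top3.plot(kind="bar")` would draw the bar chart if matplotlib is available.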

3. a) How many K-pop groups are there?

The number of K-pop groups.

There are 208 K-pop groups in total from 1992 to 2020. However, if we look back at the missing values, the Group column has 91 of them.

Idols that have no group.

A quick internet search shows that Ailee, for example, is a solo singer who doesn’t belong to a group. So a missing value in the Group column isn’t really missing: the idol is a soloist.
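Counting groups and listing soloists might look like the sketch below. Ailee (a soloist) and Yuta (a member of NCT) are taken from the article; the other rows are placeholders, and `Stage Name` is an assumed column label.

```python
import pandas as pd

idols = pd.DataFrame({
    "Stage Name": ["Ailee", "Yuta", "Idol C", "Idol D"],
    "Group": [None, "NCT", "NCT", "Group D"],
})

# nunique ignores missing values by default, so soloists are not counted as a group
n_groups = idols["Group"].nunique()
print(n_groups)

# Idols with no group, treated here as soloists
soloists = idols[idols["Group"].isna()]
print(soloists)
```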

3. b) How many idols belong to more than one group (other groups)?

Idols that have another group.

Take Yuta as an example. Yuta is a member of NCT, but he is also a member of NCT 127. Other Group here means a sub-unit: NCT 127 is a sub-unit of NCT.

As groups became larger and international casting became more popular, subunits became more prevalent. These are smaller groups-within-a-group that may target a different market or audience by exploring different musical influences or promoting in non-Korean languages.

There are 122 idols who are also members of a sub-unit.
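A count like this comes from the non-missing entries in the Other Group column. Here is a small sketch: the Yuta row follows the article, and the remaining rows are placeholders.

```python
import pandas as pd

idols = pd.DataFrame({
    "Stage Name": ["Yuta", "Idol B", "Idol C"],
    "Group": ["NCT", "Group B", None],
    "Other Group": ["NCT 127", None, None],
})

# An idol is in a sub-unit when Other Group is filled in
in_subunit = idols[idols["Other Group"].notna()]
n_subunit_members = len(in_subunit)
print(n_subunit_members)
```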

I didn’t make a visualization for question 3 because the counts differ so widely that the chart would not be readable.

4. How many idols use a stage name?

To answer this, I split each idol’s full name into surname and first name and dropped the Full Name column. I then compared the stage name with the first name and created a pie chart.

The percentage of K-pop idols using stage names.

Impressive, 44.4% of K-pop idols use stage names! According to Selma Finn, they do it for a variety of reasons :
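One possible way to do this matching is sketched below. It assumes the full name is written surname first and separated by a space, and that an idol "uses a stage name" whenever the stage name differs from the first name (ignoring case). Both rules are my assumptions, and the names are invented.

```python
import pandas as pd

# Invented names for illustration
idols = pd.DataFrame({
    "Stage Name": ["Areum", "Nova", "Bora"],
    "Full Name": ["Kim Areum", "Lee Haneul", "Park Bora"],
})

# Split on the first space only: assumed "Surname Firstname" order
parts = idols["Full Name"].str.split(pat=" ", n=1, expand=True)
idols["Surname"] = parts[0]
idols["First Name"] = parts[1]
idols = idols.drop(columns=["Full Name"])

# True when the stage name is not simply the first name
idols["Uses Stage Name"] = (
    idols["Stage Name"].str.lower() != idols["First Name"].str.lower()
)

# Percentage of idols using a stage name
stage_name_share = idols["Uses Stage Name"].mean() * 100
print(round(stage_name_share, 1))
```

Names with missing parts or a different word order would need extra handling, which this sketch leaves out.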

  • There’s another debuted idol with that name
  • They want to distance their stage/on-camera personas from their real selves
  • They’re a foreign idol and don’t want to stand out
  • Their name is too long
  • They don’t like their name
  • They want to appeal to an international audience, so they pick an English stage name instead of their Korean one
  • It’s already the nickname their friends call them

5. Which generations do K-pop idols belong to?

To answer this, I changed the type of Date of Birth column from string to datetime. Next, I extracted the year into the Year column.
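A minimal sketch of that conversion, assuming the dates are stored as `YYYY-MM-DD` strings (the real format may differ). The sample dates and the `Stage Name` label are made up.

```python
import pandas as pd

idols = pd.DataFrame({
    "Stage Name": ["Idol A", "Idol B", "Idol C"],
    "Date of Birth": ["1977-05-01", "1995-10-26", "2005-02-14"],  # made-up dates
})

# Parse the strings into datetime values
idols["Date of Birth"] = pd.to_datetime(idols["Date of Birth"], format="%Y-%m-%d")

# Extract the year into its own column
idols["Year"] = idols["Date of Birth"].dt.year
print(sorted(idols["Year"].unique()))
```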

Year of birth unique values.

From these unique values, we can see that the birth years range from 1977 to 2005. So, I used these three generation bands (according to the Pew Research Center, with ages as of 2019) :
- Generation X : Born 1965–1980 (39–54 years old)
- Generation Y (Millennials) : Born 1981–1996 (23–38 years old)
- Generation Z : Born 1997–2012 (7–22 years old)
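The bands above can be turned into a label per idol with `pd.cut`. This is a sketch of one way to do it, not necessarily how the notebook does it. The bin edges are chosen so that each band includes its end year, and the sample years are made up.

```python
import pandas as pd

# Made-up birth years covering the boundaries of each band
years = pd.Series([1977, 1980, 1981, 1996, 1997, 2005], name="Year")

# Intervals are closed on the right: (1964, 1980], (1980, 1996], (1996, 2012]
generation = pd.cut(
    years,
    bins=[1964, 1980, 1996, 2012],
    labels=["Generation X", "Generation Y (Millennials)", "Generation Z"],
)
print(generation.tolist())

# Percentage of idols in each generation
generation_share = generation.value_counts(normalize=True).mul(100).round(1)
print(generation_share)
```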

Then, I made the visualization.

The percentage of K-pop idols by generation.

From this pie chart, we can see that K-pop idols span three generations, with Millennials (Generation Y) dominating. As of 2020, most K-pop idols were born in 1981–1996.

Conclusion :

After these explorations, we can draw some insights from the K-pop idols data from 1992 to 2020 :

  • The numbers of male and female idols differ only slightly.
  • The idols come not only from South Korea but also from 11 other countries. The top 3 countries are South Korea, China, and Japan. Agencies likely recruit foreign idols to capitalize on K-pop’s international growth.
  • There are 208 K-pop groups in total. Some idols are soloists who don’t belong to any group.
  • K-pop groups also have sub-units that may target a different market or audience by exploring different musical influences or promoting in non-Korean languages.
  • About 44.4% of K-pop idols use a stage name rather than their own first name, for a variety of reasons.
  • As of 2020, K-pop idols span three generations (X, Y, and Z), with Millennials (Generation Y) dominating the K-pop industry.

Please read the next article here and visit my GitHub to see the full notebook.

Bonus :

Here’s the final look at the dataset after data exploration.

Dataset after data exploration.

The Startup
