If you have read my previous article here, you will not be surprised by this one: I have continued exploring K-pop data.
As I said before, K-pop idols have data, and it is interesting to explore! I analyzed the features one at a time to keep the exploration simple. I used two datasets: K-pop boy group and K-pop girl group profiles from 1992 to 2020. I used pie charts for the visualizations because I wanted to see each category as a percentage of the total.
Data source : K-Pop Database
First of all, data preparation: I imported the packages I needed (pandas and matplotlib) and the datasets, then checked the data types (mostly object) and the missing values. Here are the datasets:
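As a rough sketch of this preparation step (not the exact notebook code), the checks can look like this. The rows below are toy data made up for illustration, and the column names Company, Members, Orig. Memb. and Active are assumptions; only Short, Korean Name, Fanclub Name and Debut are named in this article. With the real files you would pass the CSV path to `pd.read_csv` instead of a string buffer.

```python
import io

import pandas as pd

# Toy stand-in for one of the CSV files (made-up rows, assumed column names).
csv_text = """Name,Short,Korean Name,Debut,Company,Members,Orig. Memb.,Fanclub Name,Active
Group A,,그룹A,2012-04-08,Company X,9,12,Fans A,Yes
Group B,GB,그룹B,2016-07-01,"Company X, Company Y",5,5,,Yes
Group C,,그룹C,1998-03-24,Company Z,4,4,,No
"""
boy_groups = pd.read_csv(io.StringIO(csv_text))

# Data type of every column (text columns are reported as object/string).
print(boy_groups.dtypes)

# Number of missing values in every column.
missing_counts = boy_groups.isna().sum()
print(missing_counts)
```

Empty fields in the CSV are read as missing values, which is why `isna()` picks them up.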
Most boy and girl groups here don’t have a short name, so I dropped the Short column. I also dropped the Korean Name column because I assumed readers already know the groups without their Korean names.
In both datasets, more than 50% of the values in the Fanclub Name column are missing. I assumed there are two possible reasons:
- They don’t have an official fanclub name yet.
- They already have an official fanclub name, but the dataset has not been updated yet.
According to WiDS Datathon, you can drop a variable that has more than 50% missing values if it doesn’t carry much information. So I dropped this column.
Now, let’s get insights from each feature!
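The 50% rule can be sketched in a few lines. This is a minimal example on toy rows, assuming the threshold is applied to the share of missing values per column; the variable names are only for illustration.

```python
import pandas as pd

# Toy rows: three of the four groups have no fanclub name recorded.
groups = pd.DataFrame({
    "Name": ["Group A", "Group B", "Group C", "Group D"],
    "Fanclub Name": ["Fans A", None, None, None],
    "Members": [9, 5, 4, 7],
})

# Share of missing values per column, between 0 and 1.
missing_share = groups.isna().mean()

# Drop every column whose missing share is above the threshold.
THRESHOLD = 0.5
to_drop = missing_share[missing_share > THRESHOLD].index.tolist()
cleaned = groups.drop(columns=to_drop)

print(missing_share)
print(cleaned.columns.tolist())
```

Columns dropped for other reasons, such as Short and Korean Name, can be passed to the same `drop(columns=[...])` call by name.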
We can see here that there are 147 boy groups and 152 girl groups in K-pop from 1992 to 2020. There are slightly more girl groups than boy groups.
110 is a big number for the company count, given that there are only 147 boy groups. If we look deeper, we find entries such as 2AM, which belonged to two companies, JYP and Big Hit, and there are many entries like this one. A group can be managed by one company or by several.
The same is true for the companies of girl groups. A company can also have more than one girl group or boy group. For example, SM Entertainment is currently home to idol groups such as SUPER JUNIOR, Girls’ Generation, SHINee, EXO, Red Velvet, NCT, SuperM and aespa, in addition to numerous non-idol groups and solo artists.
So the .nunique() method I used here doesn’t give an exact result, because each combination of companies is counted as its own value. Counting how many companies there really are is a challenge. Do you want to try? Let me know.
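One possible way to attack this, as a sketch only: split each company entry into single companies before counting. It assumes the column is called Company and that multiple companies are separated by commas, which I have not verified for every row; apart from 2AM, the rows are made up.

```python
import pandas as pd

# 2AM with its two companies (from the text above) plus two made-up rows.
groups = pd.DataFrame({
    "Name": ["2AM", "Group B", "Group C"],
    "Company": ["JYP, Big Hit", "JYP", "Big Hit"],
})

# Counting raw strings treats "JYP, Big Hit" as a third company.
raw_unique = groups["Company"].nunique()

# Split on commas, give each company its own row, and trim the spaces.
companies = groups["Company"].str.split(",").explode().str.strip()
split_unique = companies.nunique()

print(raw_unique, split_unique)
```

Different spellings of the same company would still be counted separately, so the real data may need some name cleaning as well.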
3. Member and Original Member
I noticed something interesting in these two columns: they are not always the same. Why? Let’s look deeper.
First, I created a function to compare the values in these two columns. Then I created a new column, Memb. Status, with three possible values:
- ‘Same’, if member = original member
- ‘Added’, if member > original member (members were added to the group)
- ‘Subtracted’, if member < original member (members left the group)
I visualized it in the following pie charts:
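The three rules above can be sketched as a small function applied row by row. The column names Members and Orig. Memb. are assumptions about the dataset, `member_status` is a name chosen for this example, and the rows are toy data.

```python
import pandas as pd


def member_status(row):
    """Compare the current member count with the original member count."""
    if row["Members"] > row["Orig. Memb."]:
        return "Added"
    if row["Members"] < row["Orig. Memb."]:
        return "Subtracted"
    return "Same"


# Toy rows, one for each possible status.
groups = pd.DataFrame({
    "Name": ["Group A", "Group B", "Group C"],
    "Members": [9, 5, 4],
    "Orig. Memb.": [12, 4, 4],
})

# axis=1 passes one row at a time to the function.
groups["Memb. Status"] = groups.apply(member_status, axis=1)
print(groups)
```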
a) Boy groups
b) Girl group
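A pie chart like these can be drawn from the value counts of the status column. This is a minimal sketch with matplotlib on toy values, not the exact plotting code from the notebook; the `Agg` backend line is only there so the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering; not needed inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy status column: three groups unchanged, one grew, one shrank.
status = pd.Series(
    ["Same", "Same", "Same", "Added", "Subtracted"], name="Memb. Status"
)
counts = status.value_counts()

fig, ax = plt.subplots()
# autopct prints each slice as a percentage of the total.
ax.pie(counts.values, labels=counts.index.tolist(), autopct="%1.1f%%")
ax.set_title("Member status (toy data)")
```

In a notebook, the figure appears under the cell; in a script, `fig.savefig(...)` writes it to a file.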
If we compare these pie charts, boy groups are more stable than girl groups: 70.1% of them kept the same number of members. Girl groups lost members more often, but they also added members more often. To understand why the number of members can change, let’s look at the examples below:
a) Member subtraction
EXO is a nine-member boy group under SM Entertainment. They debuted on April 8, 2012, with their first mini-album Mama. The group originally consisted of twelve members: Xiumin, Luhan, Kris, Suho, Lay, Baekhyun, Chen, Chanyeol, D.O., Tao, Kai, and Sehun, who were divided into two sub-units, EXO-K and EXO-M, one promoting in Korean and the other in Mandarin. Between 2014 and 2015, the group lost three of its members, Kris, Luhan, and Tao, who preferred to focus on their individual careers in China.
b) Member addition
Red Velvet debuted on August 1, 2014, with the song “Happiness” and four members: Irene, Seulgi, Wendy, and Joy. On March 11, 2015, Yeri joined the group. The company, SM, waited on Yeri because of her age (she would’ve been 15 years old) and decided to add her to the group for the 2015 comeback with Ice Cream Cake, when Yeri was 16.
NCT, an acronym for Neo Culture Technology, is best known for being able to have an unlimited number of members and sub-units. As of 2020, the group had 23 members: three were added in 2018 and two more in 2020. The dataset I used here covers 1992 to 2020, but some of the data has not been updated yet.
The number of members in a K-pop group can change, through either addition or subtraction. Every group has its own reasons, so we cannot generalize.
Wait … but what if a group only swaps one member (the idol) for another and the number stays the same? I mean a replacement member. Comparing member counts cannot detect that. Who knows.
Next, I want to see the activity status of K-pop groups in pie charts.
a) Boy group
b) Girl group
If we compare the pie charts, boy groups are more active than girl groups: three-quarters of them are active. Even though there are more girl groups than boy groups, 38.8% of girl groups are inactive. Generally, groups become inactive because they disband. A disbandment can happen because members didn’t renew their contracts with the company, signed with another company, preferred to focus on their individual careers, because the group was only temporary, and so on.
Well-known examples of disbanded girl groups are 2NE1 and 4Minute, which both disbanded in 2016.
The number of groups on hiatus does not differ much between boy and girl groups. Hiatus means the group no longer promotes in Korea at all but has not disbanded. The hiatus status is only given to groups where both the members and the company have stated that it is a hiatus. Here are the boy groups on hiatus:
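The percentages behind a comparison like this can be put side by side in one small table. The sketch below assumes the status column is called Active and holds values such as Yes, No and Hiatus; both the labels and the rows are assumptions made for illustration.

```python
import pandas as pd

# Toy status columns for the two datasets.
boy = pd.Series(["Yes", "Yes", "Yes", "No"], name="Active")
girl = pd.Series(["Yes", "Yes", "No", "No", "Hiatus"], name="Active")

# normalize=True returns shares instead of counts; the two results are
# aligned on the status label, and a missing label becomes 0.
share = pd.DataFrame({
    "boy groups": boy.value_counts(normalize=True),
    "girl groups": girl.value_counts(normalize=True),
}).fillna(0).mul(100).round(1)

print(share)
```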
But what happens to K-pop idols who are on hiatus?
Viki Topalova said many things can happen to groups on hiatus. Members can come back as soloists or sub-units to perform, and they can promote themselves and their group on their own in dramas, films, variety shows, TV shows, musicals, model jobs, article writing, etc. Or they can take a break, maybe a holiday, or rest up. Being an idol is a tough job and injuries are pretty much guaranteed. However, the longer a group is on hiatus, the harder it will be for them to come back. During the hiatus, fans could leave, newer groups could challenge the hiatus group’s position in the K-pop food chain, and so on. Also, a long hiatus could mean impending disbandment and possibly members leaving. Hiatuses are not always a good thing for the company, the group, and the fans.
Lastly, I want to use the year of each debut date to visualize the debut generation of K-pop groups. To do this, I changed the type of the Debut column from string to datetime. Next, I extracted the year into a Debut year column.
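These two steps can be sketched as follows. The date format in the real files is an assumption here (year-month-day); EXO’s debut date is the one mentioned earlier in this article and the second row is made up.

```python
import pandas as pd

groups = pd.DataFrame({
    "Name": ["EXO", "Group B"],
    "Debut": ["2012-04-08", "1998-03-24"],
})

# Convert the text column to datetime, then pull the year out of it.
groups["Debut"] = pd.to_datetime(groups["Debut"], format="%Y-%m-%d")
groups["Debut year"] = groups["Debut"].dt.year

print(groups.dtypes)
print(groups)
```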
Then, I used these three generations (according to The Pudding) :
- 1st generation : debut 1992–1999
- 2nd generation : debut 2000–2009
- 3rd generation : debut 2010–2020
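One way to turn the debut year into these three generations is `pd.cut`, which sorts numbers into labeled ranges. This is a sketch on toy years that sit on the range boundaries, not necessarily how the notebook does it.

```python
import pandas as pd

# Toy debut years, chosen on the edges of each generation.
debut_year = pd.Series([1992, 1999, 2000, 2009, 2010, 2020], name="Debut year")

# Bins are open on the left and closed on the right:
# (1991, 1999], (1999, 2009], (2009, 2020].
generation = pd.cut(
    debut_year,
    bins=[1991, 1999, 2009, 2020],
    labels=["1st generation", "2nd generation", "3rd generation"],
)

print(generation.value_counts(sort=False))
```

A year outside 1992 to 2020 would come out as a missing value, which is a handy check that every group fits into one of the three generations.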
To make it easier to understand, I made the visualizations.
a) Boy group
b) Girl group
We can see from both pie charts that more than 80% of K-pop groups debuted in the 3rd generation. Let’s discuss why this happened:
a) 1st Generation
The 1st generation began with the debut of a 3-member group that is credited with inspiring K-pop as we know it today: Seo Taiji and Boys. Around this time, K-pop began to gain attention in East and Southeast Asia as a result of a growing regional interest in Korean culture called “Hallyu.” Today, Hallyu is a global phenomenon influencing food, beauty, music, technology, and fashion.
b) 2nd Generation
As K-pop’s popularity grew outside of South Korea, so did the size of its groups. While 1st generation groups typically had 3–5 members, the 2nd generation debuted groups with nine members or more, so the new era is called “super-size”. Famous examples are Super Junior (2005) and Girls’ Generation (2007). It is only in the past five years that super-size groups have become the new normal in K-pop.
c) 3rd Generation
The 3rd generation, the current generation, brought with it the largest idol groups yet, among them the 23-member NCT (2016 to 2020). K-pop is now on the cusp of its 4th generation, which is likely to bring more experimentation with size. Now that the success of larger groups has been proven, it’s possible that companies are more willing to take on the financial risk of producing a super-size group, especially considering the popularity of three trends: international casting, subunits, and idol survival shows.
From this short explanation, we can see that the 1st generation was the beginning of “Hallyu”, when there were not many groups, members, or companies. Next, the 2nd generation tried to expand the business by increasing the number of members, and more groups debuted in this era (about 9%-10%). Then the 3rd generation brought even more members, and more than 80% of groups debuted in this era. K-pop is worldwide now, bringing in more international fans, and the hype is everywhere. The companies are always trying to use the opportunities available.
After these explorations, we can see some insights behind the K-pop group data from 1992 to 2020:
- There are more girl groups than boy groups: 147 boy groups and 152 girl groups.
- A K-pop group can be managed by one or multiple companies, and a company can have one or more girl groups and boy groups.
- The number of members in a K-pop group can change for many reasons, through either addition or subtraction. Girl groups lost members more often but also added members more often.
- Boy groups are more active than girl groups. Generally, groups become inactive because they disband.
- The number of groups on hiatus does not differ much between boy and girl groups.
- More than 80% of groups debuted in the 3rd generation (2010–2020). Groups in this generation have more members, and K-pop has become worldwide and more popular, bringing in more international fans, and the hype is everywhere.
Visit my GitHub to see the full notebook.
Here’s the final look at the dataset after data exploration.