Vocaloid

Juliette Ruch
May 9, 2022

As a self-proclaimed music enthusiast, I have gone through many phases as a listener, as I am sure many people have. Music was not just a pastime; it shaped my personal identity. From the ever-persistent emo phase, aligned with my super skinny jeans and elation over the My Chemical Romance reunion, to the undoubtedly stranger period of time as a freshman in high school, when I would actually listen to Jim Morrison’s solo poetry album, I can confidently say that I have explored quite a few of music’s odd ends. Sometimes, in stumbling onto something wildly different from anything one has ever heard, an unexpected love is discovered. Back in 2014, when I was an 11-year-old trying (and failing) to navigate the new frontier of middle school in the thick of my initial emo awakening, I discovered Vocaloid. The music was unlike anything I had previously heard, and the combination of vocals and animation made it distinctly appealing.

This all leads to the present-day conundrum: I understand Vocaloid but struggle to explain it to others. My attempts usually result in a circular explanation made up of rambling examples about singing robots, “trust me it’s good, you should try to listen”, and “Hatsune Miku is amazing, but not actually a real person”. These explanations are often received with half-smiles or expressions of confusion.

Aside from enjoying the music, I am fascinated by how Vocaloid continually pushes past conventional methods of music creation and one’s perception of what music is. Traditionally, music has been attributed to the talents of singers or musicians, but Vocaloid completely changes the game. Love it or hate it, Vocaloid, like other technologies, is forcing people to realize that technology often alters what something “has to be”, even if people find it threatening.

The “Main 6” Vocaloids

Since most people have never heard of Vocaloid, it is necessary to define it. Vocaloid originated as a music synthesizer program developed and released by the Yamaha Corporation in 2003 (Universitat Pompeu Fabra Barcelona). “Vocaloids” themselves are essentially voicebanks pre-recorded by a single voice provider. The singer, or voice provider, records the vocal track, which is then put into a library of sound fragments. The software user can access the vocal tracks to create original music and content without playing an instrument or providing the actual vocals. Users create the music by selecting and dragging short phrases to align them into lyrics, or by editing them to say whatever they want. The user can also select different singing styles, applying techniques like “bouncing” or “swimming” to the vocals.

The popularity of the software can be attributed, in part, to how the companies added a “human” component: users apply the music to an animated “mascot”. The first Vocaloids released, known as Leon and Lola, only had stock images for their box art. Although various fan-created designs exist, neither Leon nor Lola has an official character design. The “Vocaloid” itself is actually the character created for the voicebank, singing the song created by the user.

Leon and Lola’s box art.

Vocaloids are not actual people, nor, despite another common misconception, are they characters in an anime. While the characters are designed in an anime art style, which is why Vocaloid is so often mistaken for a form of anime, Vocaloid is not a scripted television series. To compare, it would be like selecting an actor or actress and creating a role tailored for them, instead of having them read a script. The creators of these characters do, however, give them personalities and backstories, making them into somewhat developed characters. For example, popular Vocaloid personalities Rin and Len Kagamine have the same voice provider, but are known as a twin brother and sister duo, similar to an actor playing multiple roles in a single production.

Asami Shimoda provided the vocals for both Rin and Len Kagamine

Sometimes Vocaloids are named after and based on their voice providers, particularly if the voice provider already has some amount of fame. For example, Japanese musician GACKT, former lead singer of visual kei band Malice Mizer, lent his voice to the popular Vocaloid known as Kamui Gakupo. Other examples are Gumi, named for her voice provider Megumi Nakajima, and Fukase, named for his voice provider Fukase Satoshi.

The characters associated with the voicebanks and the Vocaloid software have become virtual celebrities of a sort, although they are not actual people. Hatsune Miku, the most popular Vocaloid and one commonly mistaken for the first because of that popularity, has become a global phenomenon, appearing in commercials, performing “live” in concert, and even appearing on David Letterman in 2014. Miku’s voice actually belongs to Japanese voice actress and singer Saki Fujita, who provided the vocals for her voicebank. Although it is her voice, Fujita has not gained nearly as much popularity and fame as the character created for her voicebank. Another layer to the Vocaloid phenomenon is that people will actually pay money to see these holograms perform. Miku Expo is the name given to a series of world tours starring the one and only Hatsune Miku, who, as a hologram projected onto a glass screen, “performs” popular songs created by users of the Vocaloid software, complete with elaborate choreography.

Hatsune Miku’s “live” performance

Vocaloid has also become more globalized as the software has gained popularity, with existing Vocaloids getting voicebanks in different languages, and new Vocaloids being introduced in a variety of languages beyond just English and Japanese. SeeU, initially created for the Vocaloid 3 software, is a well-known Korean Vocaloid. Bruno and Clara were the first Vocaloids to have Spanish voicebanks.

Music is ever-changing, and there will always be sour music elitists who dismiss new and unfamiliar styles of music, but whether it is considered “real” music or not, music is music. However, simply calling Vocaloid music does little to explain the concept. Vocaloid introduced the ability for songwriters to create songs with vocals, but without needing an actual vocalist, effectively simplifying the songwriting process and making it more accessible to those who might not have the ability or resources to get vocal tracks for their songs. It essentially removes the limits of both human singers and musicians.

As new technology develops, its influence on music gradually increases, as it does in any industry. The Yamaha DX7, the first mass-produced digital synthesizer, was introduced in 1983, and its influence is evident in the popular music of that era (Yamaha). The existence and popularity of the synthesizer did not take away from the value of more traditional instruments but allowed for an entirely new sound that could not otherwise be created.

Just as it is not a threat to songs, Vocaloid is not a threat to performers. Although Vocaloid can substitute synthesized vocals for recorded ones, it neither blocks nor replaces opportunities for real human singers. This could raise the question of “whether the real world needs to have assurance that something is ‘alive’ or not so as to determine its acceptance” (Jackson and Dines 107). Instead, people need to see it as a separate genre; an actual human vocalist would not be able to create these sounds. Essentially, Vocaloid has allowed vocals to act as a new instrument, something to push the limits of song creation further than ever before. Although some programs seek to create a more realistic, human-like Vocaloid, the unique sound that Vocaloids possess is also valued by their audience. Vocaloid makes it possible to create sounds that would otherwise be impossible for a traditional vocalist to perform, paving the way for unprecedented developments in music. These virtual singers can hit the right notes every time, taking the focus away from what might be traditionally considered vocally skillful or impressive and making quality vocals more accessible. Therefore, the visual aspects of Vocaloid carry more weight than they do for human performers. Although it takes a skillful producer to create an objectively “good” song using Vocaloid, the admiration for the talent of a skilled vocalist is absent. Thus, the aspects of Vocaloid that make it impressive or appealing must go beyond just the vocals themselves, putting a higher emphasis on instrumentals and accompanying visuals such as face characters.
“By perceiving the vocaloid software as a creative process in itself and as a natural progression of this theatrical tradition of illusion (rather than a more ominous prediction of the future), we may see a platform for expression instead of a threat to creative freedom” (Whiteley and Rambarran).

Oftentimes, the drawback of having these recognizable face characters associated with the voicebanks is that the actual composers do not receive credit. No, the song is not actually by Hatsune Miku or Megurine Luka, as they are not real people (which has already been established). The Vocaloids themselves are technically not singing robots, because that would imply that they have some sort of artificial intelligence, which they do not. Essentially, the Vocaloid program can be used as an instrument: anyone can make music with it if they have the access and skills to do so. The majority of Vocaloid songs are made by average people who are fans of the software, with levels of experience ranging from amateur to professional. The accessibility of the Vocaloid software has led to the conception of other similar programs, such as UTAU and CeVIO, the former of which is entirely free to use.

Kasane Teto is a popular UTAUloid, a voice bank made for the UTAU software.

Like anything else, when something that has traditionally taken talent or skill to achieve becomes easier and more accessible to the masses, the parameters which are considered impressive within these practices shift to reflect said changes. Throughout history, there is no doubt that people have tested the limits of the human voice in music; Vocaloid allows for those limits to be broken.

While some may believe that Vocaloid is simply the product of advancing technology meeting music creation, The Oxford Handbook of Music and Virtuality, edited by Sheila Whiteley and Shara Rambarran, includes several essays that explore the many facets, arguments, origins, and popularity of Vocaloid. Louise H. Jackson and Mike Dines, in their article “Vocaloids and Japanese Virtual Vocal Performance”, connect the origins of Vocaloid to Bunraku, a sophisticated Japanese form of puppet theater dating back to the 17th century (102). The authors detail how the art of Bunraku involved several performers working with the puppet to create movement, voice, and accompanying music. Jackson and Dines assert that the connection between Bunraku and vocaloids is clear, in that they are “parallel art forms in problematizing human, and in the latter case technological, emotions and sentiment. As a Bunraku play…mixes fact and fiction to create a work of complex heroics and revenge, vocaloids too combine the real and illusory to explore Japan’s increasing fascination with technology” (103). In essence, Vocaloids can be seen as a continuation of this art form that helps to uncover the evolution of and current relationship between the Japanese and technology (Jackson and Dines 103).

Traditional Bunraku Theater

As in Japan, the popularity of Vocaloid in America has historical origins, in this case the virtual pop stars of the 1950s and 1960s, such as the Chipmunks and the Archies. In the article “Hatsune Miku, 2.0Pac and Beyond: Rewinding and Fast-Forwarding the Virtual Pop Star”, Thomas Conner outlines how the “music group” the Chipmunks were the first virtual performers in the United States. Like some Vocaloid voice providers, singer-songwriter Ross Bagdasarian, recording as David Seville, experimented with vocal effects in his recording of the popular song “Witch Doctor”. Neither David nor Alvin and the Chipmunks, who “sing” in the song, are actually real, much like the Vocaloid performers of today. “Like Miku, whose anime-like image…has never been photo-realistic and has kept shy of the uncanny valley, the Chipmunks and the numerous cartoon bands to follow, honed to a safe, animated template of virtual production” (Conner 132–133). Animated bands like the Archies would soon follow in the 1960s. Don Kirshner was the music supervisor for the Monkees, another arguably industry-created band, and clashed with its members when they wanted to write and perform their own music on the show. Kirshner then went on to create the Archies band, comprised of the characters from the original comics, which became a huge hit. In essence, this suggests that the path to vocaloids, cross-culturally, was paved long before Miku uttered her first digital word (Conner 133).

Industries have been “creating” performers for decades, and the music industry is no exception. The popularity of boy bands in the 90s, as well as K-Pop idol groups in the present day, has shown that people are not exactly concerned with the origins of celebrity musicians. In fact, the massive and intensely loyal fanbases that these groups have exemplify the effectiveness of music groups that are essentially manufactured. Even as “industry plant” has become Twitter’s new favorite buzzword to insult performers people do not like (whether it is true or not), people still adore artists sown from seeds planted by big music executives. The ethics of these practices, though, have been questioned over the years. The tragic stories of people such as Judy Garland, who was essentially worked to death after years of being basically owned by MGM studios, should act as cautionary tales, yet similar practices still occur in today’s industry. With Vocaloid, it has now become possible to quite literally create an artist, allowing for the same controlled elements that ensure the perfect performer without the detrimental effects that being put in such a position often has on people. Vocaloids are not actually people, so issues of performer exploitation do not exist within their sphere. Beyond this, the versatility and accessibility of Vocaloid promote a new kind of creativity.

Vocaloid songs are often accompanied by not only visual aspects, but also complex storytelling elements. “The Evillious Chronicles”, a series by popular Vocaloid producer Akuno-P, also known as MOTHY, includes over 50 songs, along with several books, manga, and other written material. All of the characters that appear within MOTHY’s chronology are represented by a variety of different Vocaloids. Artists have done similar projects in the past, such as Pink Floyd’s The Wall, which follows a single complex narrative throughout the entirety of the album and its accompanying movie. However, the flexibility of a Vocaloid to portray entirely different characters, contingent on whatever the creator may want them to represent, is unmatched by any other performer, human or otherwise.

Vocaloid songs cover a wide variety of topics, some venturing into incredibly dark subjects including cannibalism, while others might be more lighthearted dancey songs about vegetable juice. The content created by artists using Vocaloid software is almost unlimited, transcending single genres or styles of music, along with the unrestricted ability for people to tell their own stories using recognizable characters that they can mold to fit new personas.

I have spent far too much time writing off certain genres of music because of preconceived opinions that stem from the music elitism that affects pretty much everyone at least a little bit. When I was in my “I’m super-cool vintage grungy classic rock” phase, I would cringe at the Black Veil Brides songs I used to listen to when I was in middle school. Clearly, I have moved past this phase (as I literally saw them in concert like two weeks ago), and in my vast life experience, I have found that it is so much easier to just enjoy things that interest you and banish the judgemental middle schooler that lives in your head. The concept of Vocaloid is foreign to many, but in the constantly evolving world of music, there is no predicting what could possibly become mainstream in a few years. I’m sure few people would have predicted the popularity of K-Pop groups in America, but we’ve seen groups such as BTS frequenting American awards shows in the past few years. In fact, Vocaloid’s popularity within its own fandom has declined due to how commercialized it has become, so it is already too mainstream for some people.

“In the digital music age, it could be argued that all idols are virtual. The primary mode of listening to a song is through the use of speakers while being physically separated from the performer. The listener might try to recognize the source of the sound, or might interpret the song in the context of the artist’s previous endeavors…whatever the listener does, the artist will not be there” (Zaborowski 111). With the exception of live performance, the act of listening to music will always be somewhat artificial. From a psychological perspective, Zaborowski gets to the heart of the Vocaloid debate: humans are reluctant to embrace what they initially do not understand.

Like any advancement in technology, Vocaloid challenges music critics, aficionados, and avid fans to stop shunning the inevitable. In other areas, it is already clear what technology has essentially forced people to accept and how it has changed other aspects of human life, economics, and interaction. Companies like Sears and Kmart arguably went out of business due to their reluctance to incorporate online shopping into their business, arriving so late to the game that they never recovered. In an article from The New York Times, Craig Johnson, president of the retail research and consulting firm Customer Growth Partners, stated, “There are generations of people who grew up on Sears and now it’s not relevant. When you are in the retail business, it’s all about newness. But Sears stopped innovating” (Corkery). Many bookstore chains, late to embrace the virtual book or tablet industry, also went bankrupt, and their brick and mortar stores crumbled. As Johnson suggests, a company that stops innovating can begin declining rapidly. Vocaloid is innovation; technology created and technologically driven.

There is no doubt that Vocaloid challenges one’s traditional perception of music. However, the thriving industry around Vocaloids shows that it is not going anywhere. While it may be difficult to pin down the industry’s exact total value, many sources indicate it is in the billions. Social media has allowed it to thrive and, to the dismay of many, that too is not going anywhere. Vocaloid, in some sense, has married music and social media: creators share the music they created with others, generating more fans. In 2012, a hologram of Tupac Shakur performed “live” at Coachella alongside Dr. Dre and Snoop Dogg. In the future, technology may generate Freddie Mercury or even Michael Jackson, singing music generated from the original vocals. These songs may be created by any one of us but will be new music.

While not everyone may enjoy listening to Vocaloid, it is music.
