Developing Interfaces to AI audio models: Internship Report at Qosmo

Matthias Pitscher
Published in Qosmo Lab
Mar 28, 2023

In 2019, I applied for an internship with Qosmo, which I got to know through a friend and colleague from the Interface Cultures department who had previously worked with them. After an interview and some initial paperwork, I was waiting for my visa documents to be processed so I could book a flight, but travel restrictions disrupted those plans for the next two years. In 2022, I expressed my interest in the internship again and went through the same application process, including an interview in which I was asked about my skills as a web developer, my work as a media artist, and my personal interests. Eventually, I received an invitation to begin a 4-month internship with Qosmo and was able to start at last. In this article, I want to give you some insight into a unique internship experience and the challenges that come with building cutting-edge AI tools.

In my first month at Qosmo, I joined the team as a fully remote intern and took over front-end development for the neutone plugin. The release of version 1 was imminent, and several bugs needed to be fixed beforehand. I received onboarding and had several calls with the previous developer to go through the codebase and understand what needed to be done. I then worked closely with Andrew, one of the core developers, to find and fix the issues we were facing. Setting up the complex development environment and debugging someone else’s code was a challenge, but through this process I learned a lot about state management in React and how a VST audio plugin with a web front-end works. In the end, we were able to build and test the plugin in time for its release.

The majority of Qosmo’s employees are based in Japan but work remotely half of the week. This means that a lot of the communication happens in the Slack channel, which is also actively used to share interesting articles and discuss current trends and other topics. Apart from that, there are weekly All Hands meetings and video calls for the projects I was involved in. Sometimes this meant getting up early in Europe to join the team in Japan. Another bi-weekly video session is called “Brainstorming”, where someone presents an interesting topic to spark a discussion. Topics range from broad questions like “What is art?” to specific ones like “Why has logo design become so homogeneous?” In my session, I talked about Collective Intelligence and how AI systems are powered by the work of countless people. So even when joining only remotely, I found the environment interesting and engaging beyond just work-related tasks.

After some struggles with the embassy, which was completely overbooked, I was finally able to pick up my visa and take a flight to Tokyo. A few days later I was welcomed to the office, which is in the beautiful and hip district of Naka-Meguro. Having lunch with colleagues is an absolute joy, as there are so many great choices, from Thai food to udon and pizza.

Together with the designer Ise-san, we started working on an interactive display for neutone and the music generation application for a conference in November. Again, I adapted the React interface built by the previous front-end engineer. This time the challenge was to separate the MIDI channels and add a visual representation of the notes, so that visitors could see what the system was doing. The generated music was fed into Ableton as input, where neutone transformed the sound in real time. The result was presented at the Inter BEE expo and was awarded Innovative Technologies 2022.
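To give a rough idea of what separating MIDI channels involves, here is a minimal Python sketch. It is not the code from the actual display, which was built in React; the function names `split_by_channel` and `active_notes` are hypothetical. It only assumes the standard MIDI convention that the upper four bits of a status byte carry the message type and the lower four bits carry the channel.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# A raw MIDI channel message: (status byte, note number, velocity).
MidiMessage = Tuple[int, int, int]
# A simplified note event: ("on" or "off", note number, velocity).
NoteEvent = Tuple[str, int, int]

NOTE_OFF = 0x80
NOTE_ON = 0x90


def split_by_channel(messages: Iterable[MidiMessage]) -> Dict[int, List[NoteEvent]]:
    """Group note events by MIDI channel (0-15). Hypothetical helper."""
    per_channel: Dict[int, List[NoteEvent]] = defaultdict(list)
    for status, note, velocity in messages:
        kind = status & 0xF0     # upper nibble: message type
        channel = status & 0x0F  # lower nibble: channel number
        if kind == NOTE_ON and velocity > 0:
            per_channel[channel].append(("on", note, velocity))
        elif kind == NOTE_OFF or (kind == NOTE_ON and velocity == 0):
            # A note-on with velocity 0 is conventionally treated as a note-off.
            per_channel[channel].append(("off", note, 0))
    return dict(per_channel)


def active_notes(events: Iterable[NoteEvent]) -> List[int]:
    """Replay events and return the notes still sounding, sorted by pitch."""
    sounding = set()
    for kind, note, _velocity in events:
        if kind == "on":
            sounding.add(note)
        else:
            sounding.discard(note)
    return sorted(sounding)
```

A visualisation could call `active_notes` once per channel on every frame and draw one row of notes for each channel, so that each generated voice is visible separately.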

While hard work and short turnarounds were common, I was free to structure my time as I wished, so I could take a few days off to compensate for some longer days. The well-being of interns and the rest of the team is taken seriously and openly talked about in the general meetings or in person. Outside of work, the team invited me to a games night at the office for my birthday (with the Mario Kart tournament cut short because of the wrong power adapter). Visiting temples and festivals or sharing amazing food together also happens on special occasions and is one of the reasons why it absolutely makes sense to do the internship in person.

Lastly, I helped build a prototype for a client who wanted to explore new voice synthesis models and the potential for recognizing and responding to affective speech patterns. The first stage of this project was to create a proof of concept in three weeks. I teamed up with the AI engineer Bogdan, and we decided to build the interface with Gradio, a Python library that creates a web front-end from pre-built UI components. By chaining open pre-trained models together, we were able to make a quick demo of a speech-reactive system that we can easily adapt and use to explore further creative possibilities.
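As an illustration of the chaining idea, here is a minimal Python sketch. It is not the client prototype: the three stage functions (`transcribe`, `detect_emotion`, `respond`) are hypothetical placeholders that stand in for pre-trained models, and the helper names `chain` and `build_demo` are my own. The Gradio part uses only `gr.Interface` and `gr.Audio`, and it is imported lazily so the pipeline can be tried without Gradio installed.

```python
from typing import Any, Callable, Sequence


def chain(stages: Sequence[Callable[[Any], Any]]) -> Callable[[Any], Any]:
    """Compose stages so that each one receives the previous stage's output."""
    def run(value: Any) -> Any:
        for stage in stages:
            value = stage(value)
        return value
    return run


# Placeholder stages (assumptions): a real demo would call actual models here.
def transcribe(audio_path: str) -> str:
    """Stand-in for a speech-to-text model."""
    return f"placeholder transcript for {audio_path}"


def detect_emotion(text: str) -> dict:
    """Stand-in for an affect classifier; this rule is purely illustrative."""
    emotion = "excited" if "!" in text else "neutral"
    return {"text": text, "emotion": emotion}


def respond(state: dict) -> str:
    """Stand-in for a response or voice synthesis step."""
    return f"[{state['emotion']}] {state['text']}"


pipeline = chain([transcribe, detect_emotion, respond])


def build_demo():
    """Wrap the pipeline in a Gradio web UI (requires the gradio package)."""
    import gradio as gr  # imported lazily so the pipeline works without it

    return gr.Interface(
        fn=pipeline,
        inputs=gr.Audio(type="filepath"),  # passes the recording as a file path
        outputs="text",
    )
```

Calling `build_demo().launch()` would start a local web page with an audio input and a text output. Swapping a model then only means replacing one stage in the list passed to `chain`.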

Having spent the past three months in Tokyo, I’ve gained a deeper understanding of Qosmo’s work culture and daily life. I significantly improved my front-end development skills and explored the exciting world of music and audio processing using machine learning. Additionally, being in Japan has given me the opportunity to immerse myself in its rich culture and cuisine. I wouldn’t hesitate to repeat this experience and look forward to continuing my collaboration with the talented Qosmo team. I am truly grateful for this opportunity.

If you are interested in joining Qosmo, they are always looking for new talent. Reach out to us here.

Matthias Pitscher

Artist, researching new technologies and how they affect society.