Inside Music — YouTube Edition interface


Imagine listening to your favorite song in 360° audio. I am pleased to present Inside Music — YouTube Edition, a web application that uses artificial intelligence to separate the instruments of a song and place them in a virtual room for an immersive experience. Try it in a modern browser with headphones, or on a virtual reality device. Link: Credits: Experiments with Google.


What if you could step inside a song? This simple experiment explores that idea: see and hear the individual layers of music all around you to get a closer look at how music is made.

It’s built with WebVR and spatial audio. Thanks to Song Exploder and Google Creative Lab open-sourcing their code, we could adapt it to let you explore your own music in VR. Learn more from the original GitHub project and the YouTube edition code, and check out more WebVR Experiments here. Please send us any feedback to …

This year the 22nd ACM Conference on Computer-Supported Cooperative Work and Social Computing took place at the Hilton Hotel in Austin, Texas. Our laboratory presented our work on the ethical concerns of AI technologies in the public sector as part of the Good Systems: Ethical AI for CSCW workshop. You can find more details about our work at this link.

Kolina Koltai

A few of the other talks included:

“Ethics of AI: Learning from AI Failures”: this talk was given by Saleema Amershi from Microsoft Research, who presented guidelines for human-AI interaction that can be found here.


“The Ethical Operating System Toolkit”: this talk was given by Sam Woolley from UT-Austin, who presented a toolkit to anticipate the long-term social impact and unexpected uses of the technology we create today. The toolkit can be found here. …

First of all, let me introduce myself: I am Carlos Toxtli, a person devoted to technology who spends a large part of his life sitting at a desk while his mind wants to see the world, and documentaries are no longer enough for it.

It all started with a trip on which I gave a talk in Zurich, Switzerland, and decided to spend two more weeks and a few days working from different locations while getting to know new places. The problem began when I tried to find tours that fit the dates of my visit. …

Multimodal video action recognition and localization methods for spatio-temporal feature fusion by using Face, Body, Audio, and Emotion features


Video blogs are increasingly popular because of online streaming platforms: anyone can post content regardless of their video editing skills. Novice video bloggers have had to acquire these skills to publish quality content, and video editing is usually a time-consuming task that discourages users from publishing periodic content. The most common format for individual video bloggers is the monologue. Monologues have fixed conditions, such as one person on screen at a time and a fixed camera position, which make them a perfect setting for automatic video editing (AVE). In this article, we present AutomEditor, a system that automates monologue video editing. AutomEditor uses multimodal video action recognition techniques to detect video bloopers: it extracts body skeleton, face, emotion, and audio features from video clips, and its model implements early feature fusion over recurrent neural networks and multi-layer perceptrons. The model was trained and evaluated using BlooperDB, a manually collected and annotated dataset, and reached 100% accuracy on the validation set and 90% on the test set. We also propose a blooper localization algorithm for untrimmed videos, based on prediction frequency, and implement a web interface to visualize the blooper fragments. AutomEditor was able to locate and visualize all the bloopers in the test untrimmed videos. …
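The early-fusion idea above can be sketched in a few lines: concatenate the per-frame feature vectors of every modality into one vector before feeding any classifier. This is a minimal illustration, not AutomEditor's actual code; the feature dimensions and the toy sigmoid head are made-up assumptions.

```python
import math
import random

# Hypothetical per-frame feature sizes (illustrative assumptions,
# not AutomEditor's real dimensions)
FACE, BODY, AUDIO, EMOTION = 128, 36, 40, 7
FRAMES = 30  # frames per clip

random.seed(0)

def feats(n):
    return [random.gauss(0, 1) for _ in range(n)]

# One feature vector per modality per frame
frames = [
    {"face": feats(FACE), "body": feats(BODY),
     "audio": feats(AUDIO), "emotion": feats(EMOTION)}
    for _ in range(FRAMES)
]

# Early fusion: concatenate all modality features of a frame into one vector
fused = [f["face"] + f["body"] + f["audio"] + f["emotion"] for f in frames]
assert len(fused[0]) == FACE + BODY + AUDIO + EMOTION  # 211 dims per frame

# Toy classifier head: mean-pool over frames, then a sigmoid over a dot product
dim = len(fused[0])
pooled = [sum(frame[i] for frame in fused) / FRAMES for i in range(dim)]
weights = [random.gauss(0, 0.01) for _ in range(dim)]
score = 1 / (1 + math.exp(-sum(p * w for p, w in zip(pooled, weights))))
print(0.0 <= score <= 1.0)  # a blooper probability in [0, 1]
```

In the real system the fused sequence would go through recurrent layers rather than mean pooling, but the fusion step itself is this same concatenation.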


Some weeks ago I was asked to create a website where multiple people were collaborating. It was an interesting challenge, since the content was constantly changing in a Google Docs document, and keeping track of all the changes to update the static website URL that was provided to me sounded like a very intensive back-and-forth process.

This is why I developed Docs2Web, a very simple interface that allows individuals and teams to turn any Google Docs document into a modern website. While it is true that Google Docs already has a feature that enables users to make content public on the web, it makes long documents hard for readers to digest in such a monotonous format. …
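A minimal sketch of the kind of plumbing involved: Google Docs exposes an export endpoint that returns a document as HTML (for documents with appropriate sharing), which a layer like Docs2Web could fetch and restyle. The document ID below is a placeholder, and this is not Docs2Web's actual implementation.

```python
# Build the export URL for a Google Doc's HTML.
# The document ID here is a placeholder, not a real document.
def export_url(doc_id: str, fmt: str = "html") -> str:
    """Return the Google Docs export endpoint for a given document ID."""
    return f"https://docs.google.com/document/d/{doc_id}/export?format={fmt}"

url = export_url("PLACEHOLDER_DOC_ID")
print(url)
```

Fetching that URL (for example with `urllib.request`) yields the document body as HTML, which a templating layer can re-render with modern styles each time the source document changes.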


Carlos Toxtli Hernandez, Claudia Flores Saviaga, Marco Maurier, Amandine Ribot, Temitayo Samson Bankole, Alexander Entrekin, Michael Cantley, Salvi Singh, Sumitra Reddy, Yenumula V. Reddy.


Even though the advent of the Web, coupled with powerful search engines, has empowered knowledge workers to quickly find the information they need, doing so is still a time-consuming operation. There are presently no readily available tools that can create and maintain an up-to-date personal knowledge base that can be readily consulted when needed. While organizing the entire Web as a semantic network is a long-term goal, creating a semantic network of personal knowledge sources that is continuously updated by crawlers and other devices is an attainable task. We created an app titled ExperTwin that collects personally relevant knowledge units (known as JANs) from the Web, email correspondence, and locally stored files, and organizes them as a semantic network that can be easily queried and visualized in many formats, just in time, when performing a knowledge-based task. The architecture of ExperTwin is based on the model of a “Society of Intelligent Agents”, where each agent is responsible for a specific task: collecting JANs from multiple sources, establishing their relevancy, and creating the personal semantic network are some of the many tasks performed by the individual agents. TensorFlow and Natural Language Processing (NLP) tools have been implemented to let ExperTwin learn from its users. …
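To make the semantic-network idea concrete, here is a toy sketch: knowledge units linked when they share topic tags, with a query that returns direct hits plus their neighbors. The JAN fields, tags, and the shared-tag relevancy rule are illustrative assumptions, not ExperTwin's real schema or agents.

```python
from collections import defaultdict

# A toy semantic network of knowledge units ("JANs"); the fields and the
# shared-tag linking rule are illustrative assumptions, not ExperTwin's schema.
jans = {
    "jan1": {"title": "Spatial audio basics", "tags": {"audio", "webvr"}},
    "jan2": {"title": "WebVR input handling", "tags": {"webvr", "input"}},
    "jan3": {"title": "MIDI file structure", "tags": {"audio", "midi"}},
}

# Link JANs that share at least one tag (a crude relevancy criterion)
edges = defaultdict(set)
ids = list(jans)
for i, a in enumerate(ids):
    for b in ids[i + 1:]:
        if jans[a]["tags"] & jans[b]["tags"]:
            edges[a].add(b)
            edges[b].add(a)

# Query: JANs tagged with a topic, plus their neighbors in the network
def related(tag):
    hits = {j for j, meta in jans.items() if tag in meta["tags"]}
    return hits | {n for j in hits for n in edges[j]}

print(sorted(related("midi")))  # jan3 plus its audio-sharing neighbor jan1
```

In the full system, dedicated agents would maintain this graph continuously as new JANs arrive from crawlers, email, and local files.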

Hum2Song! is an AI-powered web application that composes the musical accompaniment for a melody produced by a human voice.

System overview

These are the components of Hum2Song!


The steps to create this solution are listed below:

  • Learn how MIDI files are structured
  • Scrape the website (16k files)
  • Decide which features to use
  • Preprocess the data
  • Apply stratified sampling
  • Evaluate several NN architecture combinations (325 per condition)
  • Fine-tune the best options
  • Convert the best model to TensorFlow.js
  • Implement an HTTPS site that allows voice recording
  • Integrate my model and the Google Magenta models
  • Clean the noisy transcribed data
  • Get the genre, a drum track, a bass line, a tonal scale, and a chord progression from the melody. …
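The last step, deriving a tonal scale from the transcribed melody, can be sketched by scoring the melody's pitch classes against each candidate major scale. This is a simplified illustration, not Hum2Song!'s actual pipeline, and it only considers major scales.

```python
# Guess the best-fitting major scale for a transcribed melody, given as
# MIDI note numbers. A simplified sketch, not Hum2Song!'s real pipeline.
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = {0, 2, 4, 5, 7, 9, 11}  # semitone offsets of a major scale

def best_major_scale(melody):
    pitch_classes = [n % 12 for n in melody]

    # Count how many melody notes fall inside the major scale on this root
    def score(root):
        return sum(1 for pc in pitch_classes if (pc - root) % 12 in MAJOR_STEPS)

    root = max(range(12), key=score)
    return NOTE_NAMES[root]

# A hummed melody transcribed to MIDI notes: C D E F G A B C
print(best_major_scale([60, 62, 64, 65, 67, 69, 71, 72]))  # -> C
```

With the scale in hand, a system can pick chord progressions and bass notes that stay inside it, which is the role this step plays in the list above.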

By Carlos Toxtli, Claudia Saviaga



Problem definition

The ability to identify, track, and frame objects across different poses and backgrounds is important in many real-time video applications. Object detection, tracking, alignment, and stabilization have been research areas of great interest in computer vision and pattern recognition because of their challenging nature: some objects have multiple facets and require sufficiently precise algorithms to identify, track, and focus on one object apart from the rest. An additional challenge is processing videos captured by mobile devices, which are often unstable and undirected due to the lack of stabilization equipment on these devices. …

By Claudia Saviaga, Carlos Toxtli


Automated video exploration is important in real-life applications; for example, detecting copyrighted material is crucial today because of the large amount of content uploaded to social networks and video platforms. In this article we present a novel method for automated video exploration capable of detecting content that has undergone different kinds of distortions, such as changes in size, lighting, and rotation, among others, with the goal of also detecting content that is being projected or played back on screens. To achieve this, we use LCS (Longest Common Subsequence), a string-algorithm technique usually applied to text, together with image classifiers that generate a dynamic alphabet representing the objects found in each frame. …
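The matching idea can be illustrated with the standard dynamic-programming LCS, applied to sequences of per-frame classifier labels rather than characters. The frame labels below are made up for illustration; only the LCS algorithm itself is standard.

```python
# Longest Common Subsequence over per-frame classifier labels
# (the "dynamic alphabet"). The labels below are made up for illustration.
def lcs_length(a, b):
    """Classic O(len(a) * len(b)) dynamic-programming LCS."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

original = ["car", "car", "person", "dog", "tree"]    # labels of the source video
candidate = ["car", "person", "sign", "dog", "tree"]  # a distorted re-upload

match = lcs_length(original, candidate)
print(match / len(original))  # similarity ratio: 0.8
```

Because LCS tolerates insertions and deletions, the comparison survives dropped frames and distorted footage as long as the classifiers still recognize most of the objects.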


Bots have been deployed in many sectors, for example labor, politics, and tourism, and the education sector is no exception: examples include Botsify, Snatchbot, and Ivy, to mention just a few.

As part of the work carried out at the Human-Computer Interaction Laboratory of West Virginia University, the impact of bots on different human activities is being studied. Carlos Toxtli, who is currently a researcher under the supervision of Dr. Saiph Savage, has focused his work on the development and study of bots in the education, labor, commerce, and productivity sectors.

Among the education-related works, the following are covered…

Carlos Toxtli
