Stories by Solomon Weinstein on Medium

Image Processing 02

Solomon Weinstein — Tue, 21 Apr 2026 15:22:47 GMT

This is absolutely working. Here’s my Processing sketch that sends images:

import oscP5.*;
import netP5.*;
import java.io.File;

OscP5 oscP5;
NetAddress wekinator;

PImage img;

int w = 10;
int h = 10;

void setup() 
{
  size(400, 400);

  oscP5 = new OscP5(this, 12000);
  wekinator = new NetAddress("127.0.0.1", 6448);

  colorMode(RGB, 255);
}

void draw() 
{
  background(0);

  if (img != null) 
  {
    image(img, 0, 0, width, height);
  }
}

void keyPressed() 
{
  if (key == 'l') 
  {
    selectInput("Select an image", "FileSelected");
  }

  if (key == 's' && img != null) 
  {
    sendImageToWekinator(img);
  }
}

void FileSelected(File selection) 
{
  if (selection != null) 
  {
    img = loadImage(selection.getAbsolutePath());
  }
}

void sendImageToWekinator(PImage source) 
{
  // Average brightness (0-1) over the full-resolution image
  source.loadPixels();

  float total = 0;
  for (int i = 0; i < source.pixels.length; i++) 
  {
    float r = red(source.pixels[i]);
    float g = green(source.pixels[i]);
    float b = blue(source.pixels[i]);
    total += (r + g + b) / 3.0;
  }
  float avgBrightness = (total / source.pixels.length) / 255.0;

  // Downscale a copy to w x h (10 x 10) so the loaded image stays untouched
  PImage small = source.copy();
  small.resize(w, h);
  small.loadPixels();

  // Layout: [brightness, R0, G0, B0, R1, G1, B1, ...] = 1 + 3*w*h floats
  OscMessage msg = new OscMessage("/wek/inputs");

  msg.add(avgBrightness);

  for (int i = 0; i < small.pixels.length; i++) 
  {
    // Normalize each color channel to 0-1
    float r = red(small.pixels[i]) / 255.0;
    float g = green(small.pixels[i]) / 255.0;
    float b = blue(small.pixels[i]) / 255.0;

    msg.add(r);
    msg.add(g);
    msg.add(b);
  }

  // Wekinator listens for inputs on port 6448
  oscP5.send(msg, wekinator);
}

I load images by pressing the L key, then send them to Wekinator by pressing S. I trained Wekinator on just four images for now, all of trees. The first is a bright image of one tree, the second a dark image of one tree, the third a bright image of lots of trees, and the fourth a dark image of lots of trees. I trained it so that brighter images correspond to a higher cutoff frequency on my lowpass filter, and more trees correspond to a higher pitch.

It seems to be working relatively well. Because I only trained it on four images, the neural network is picking up on some factors that I didn’t intend it to. Generally, though, images with lots of trees give me a higher pitch, and brighter images give me a higher cutoff frequency. When I train it on more images, these extraneous factors should influence it less.

Next up is adding more information. I’ll train it to recognize a few more object types and image parameters, and I’ll add a few more sound output parameters as well.

Image Processing 01

Solomon Weinstein — Tue, 14 Apr 2026 14:58:13 GMT

I’ve moved on to image processing. I’m starting out with just two features: object classification and a brightness value. I’ll add more once I get this working; if I start out with 10 parameters, I’ll never know whether it’s functioning the way I want.

I’m writing this as another Processing sketch. For now, I’m going to send a 301-element vector to Wekinator. The first element is the average brightness of the image, and the other 300 are the red, green, and blue values of each pixel. I’m going to deal with square images scaled down to 10 by 10 pixels for the time being. It’s not very precise, but it’s less to deal with right now, and the process is no different once I scale it up. So each OSC message is one brightness value plus RGB values for each of 100 pixels (1 + 3 × 100 = 301).
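Here’s a rough plain-Java sketch of that 301-element layout, written outside Processing so it runs on its own. It assumes pixels are packed as 0xAARRGGBB ints (the format Processing uses for PImage.pixels) and that the image is already scaled to 10 by 10. One simplification: it computes the average brightness from the same small pixel array, whereas the sketch can compute it from the full-resolution image. The class name FeatureVector is made up for illustration.

```java
import java.util.Arrays;

// Hypothetical helper: builds [avgBrightness, R0, G0, B0, R1, G1, B1, ...], all in 0..1.
// Assumes pixels are packed 0xAARRGGBB ints, like Processing's PImage.pixels.
final class FeatureVector {

    // Extract 8-bit channels from a packed ARGB int.
    static int red(int argb)   { return (argb >> 16) & 0xFF; }
    static int green(int argb) { return (argb >> 8) & 0xFF; }
    static int blue(int argb)  { return argb & 0xFF; }

    static float[] build(int[] pixels) {
        float[] out = new float[1 + 3 * pixels.length];
        float total = 0;
        for (int i = 0; i < pixels.length; i++) {
            int p = pixels[i];
            float r = red(p) / 255f, g = green(p) / 255f, b = blue(p) / 255f;
            total += (r + g + b) / 3f;   // per-pixel brightness
            out[1 + 3 * i]     = r;      // R, G, B of pixel i, interleaved
            out[1 + 3 * i + 1] = g;
            out[1 + 3 * i + 2] = b;
        }
        out[0] = total / pixels.length;  // average brightness goes first
        return out;
    }

    public static void main(String[] args) {
        int[] pixels = new int[10 * 10];
        Arrays.fill(pixels, 0xFFFFFFFF);            // an all-white 10 x 10 image
        float[] v = build(pixels);
        System.out.println(v.length + " " + v[0]);  // prints "301 1.0"
    }
}
```

The order matters only in that it has to be identical in every message Wekinator sees, for training and for running.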

My understanding is that it doesn’t matter that every 3 values Wekinator receives (R, G, and B) correspond to just one pixel. Wekinator treats each input as a separate number either way, so it should recognize patterns equally well regardless of whether it “knows” that these values belong together, as long as the order is the same in every message.
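A quick way to check that reasoning, at least for a distance-based model like k-nearest neighbors (one of the classifiers Wekinator offers), is to shuffle the feature order the same way for every example and confirm the prediction doesn’t change. Euclidean distance is a sum over features, so a consistent reordering leaves it the same (up to floating-point rounding). The class name and data below are hypothetical; a neural network is a different model, but it also has no built-in notion of which inputs belong to the same pixel.

```java
// Hypothetical demo: a consistent feature reordering doesn't change a
// 1-nearest-neighbour prediction, because every distance stays the same.
final class PermutationDemo {

    // Euclidean distance between two feature vectors of equal length.
    static double distance(double[] a, double[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Reorders a vector so that out[i] = v[order[i]].
    static double[] permute(double[] v, int[] order) {
        double[] out = new double[v.length];
        for (int i = 0; i < v.length; i++) out[i] = v[order[i]];
        return out;
    }

    // 1-nearest-neighbour: label of the closest training example.
    static int predict(double[][] train, int[] labels, double[] query) {
        int best = 0;
        for (int i = 1; i < train.length; i++) {
            if (distance(train[i], query) < distance(train[best], query)) best = i;
        }
        return labels[best];
    }

    public static void main(String[] args) {
        double[][] train = { { 0, 0, 0 }, { 1, 1, 1 }, { 0, 5, 2 } };
        int[] labels = { 0, 1, 2 };
        double[] query = { 0, 4, 2 };
        int[] order = { 2, 0, 1 };   // e.g. store B, R, G instead of R, G, B

        double[][] shuffled = new double[train.length][];
        for (int i = 0; i < train.length; i++) shuffled[i] = permute(train[i], order);

        int before = predict(train, labels, query);
        int after = predict(shuffled, labels, permute(query, order));
        System.out.println(before + " " + after);   // prints "2 2"
    }
}
```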

No finished code to show right now, but it’s all in the works. I’ll aim to have it done by next class.

Another Update

Solomon Weinstein — Fri, 10 Apr 2026 15:05:07 GMT

I wrote a Chuck script that takes Wekinator’s outputs and converts them into the frequency of a sine oscillator and the cutoff frequency of a lowpass filter. I’m not getting any sound out of Chuck, though. Wekinator is outputting fine, but that doesn’t necessarily mean that Chuck is listening. I’m going to try the Chuck script that checks whether it is receiving OSC messages.

Chuck isn’t receiving OSC messages. I’m getting an “unable to create server instance” error. Not sure what’s up.

Ah, I think this is exactly the issue we talked about a few days ago. I closed my old Chuck script but it’s still sitting on port 12000. Let me try a different port.

That worked. Chuck is receiving OSC messages now. There’s probably something wrong with my original script.

I’m getting this error:

:7:22: error: ugen's of type 'DAC' have no input - cannot => from another ugen
[7] SinOsc p0 => LPF lpf => dac;
                         ^

The internet seems to think that Chuck is misinterpreting “dac” as something that can’t take input, and that replacing it with dac.left to specify channel 0 will fix it.

Now it can’t bind to port 12000. I think there’s a ghost Chuck running in the background. I checked and there isn’t. Gonna call it for now and see if I can figure it out later.

Update: I got it working. Just an audio driver issue I think. I also moved over to port 12001; something must have been sitting on port 12000.
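In case a port seems to be taken again: one way to check, sketched here in plain Java, is to try binding a UDP socket (the OSC messages here travel over UDP) to that port and see whether it fails. This is a hypothetical helper, not something ChucK or Wekinator provides. On most systems the bind fails if another process already holds the port, for example a leftover ChucK VM or a Processing sketch created with new OscP5(this, 12000).

```java
import java.net.DatagramSocket;
import java.net.SocketException;

// Hypothetical helper: reports whether a local UDP port can currently be bound.
final class PortCheck {

    static boolean isUdpPortFree(int port) {
        try (DatagramSocket socket = new DatagramSocket(port)) {
            return true;    // bind succeeded; the socket is closed again right away
        } catch (SocketException e) {
            return false;   // bind failed (typically BindException): port already in use
        }
    }

    public static void main(String[] args) {
        for (int port : new int[] { 12000, 12001, 6448 }) {
            System.out.println(port + (isUdpPortFree(port) ? " free" : " in use"));
        }
    }
}
```

A "free" result only means nothing held the port at that instant; something could still grab it before ChucK starts.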

Here’s the Chuck code:

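OscRecv recv;
12001 => recv.port;   // must match the OSC output port set in Wekinator
recv.listen();

// listen for /wek/outputs messages carrying two floats
recv.event("/wek/outputs, f f") @=> OscEvent osc;

SawOsc p0 => LPF lpf => dac;

0.2 => p0.gain;
300 => p0.freq;
1000 => lpf.freq;

// linearly rescale mapme from [low1, high1] to [low2, high2]
fun float map(float mapme, float low1, float high1, float low2, float high2) {
    return (mapme - low1) / (high1 - low1) * (high2 - low2) + low2;
}

while (true) {
    osc => now;   // wait until a message arrives

    while (osc.nextMsg() != 0) {
        osc.getFloat() => float x;   // Wekinator output 1 (0-1)
        osc.getFloat() => float y;   // Wekinator output 2 (0-1)

        // output 1 -> oscillator pitch, 100-1000 Hz
        map(x, 0.0, 1.0, 100.0, 1000.0) => p0.freq;

        // output 2 -> lowpass cutoff, 200-5000 Hz
        map(y, 0.0, 1.0, 200.0, 5000.0) => lpf.freq;
    }
}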

It’s all working when integrated, too. I can set the audio parameters manually in Wekinator, move my cursor to a location that I like in the Processing sketch, record a few examples that way, train the model, and run it. When I move my cursor in the Processing sketch, the audio coming out of Chuck changes in real time.

Update

Solomon Weinstein — Tue, 07 Apr 2026 15:02:37 GMT

(This is all written in real-time as I work through this)

I’m making my way through the detailed walkthrough section of the Wekinator documentation.

I’m trying to write a basic program in Processing that takes mouse position as input and sends it to Wekinator (essentially just mimicking the first half of the example project that I played with a little while ago). I don’t know Java and I’ve never used Processing.

I’ve been looking through the documentation for the oscP5 library for a thousand years and I think I can figure it all out.

Pretty sure I wrote a sketch that will work. The visual component is working fine on the Processing end, at least. Just have to see if Wekinator likes it.

The sketch

Wekinator is not receiving any OSC messages. I don’t know why. My input port is set to 6448, the input types match the type Wekinator is expecting to receive, I have the correct number of inputs, and the input message format is correct. This is probably because I am learning Java completely on the fly. Still troubleshooting.
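For reference, this is roughly what oscP5 puts on the wire for a /wek/inputs message, following the OSC 1.0 encoding: the address and a type-tag string (",ff" for two floats), each null-terminated and padded to a multiple of 4 bytes, followed by big-endian 32-bit floats. The hand-rolled encoder below is only an illustration (oscP5 does all of this inside the sketch); OscEncode is a made-up name, and main just sends one test message to 127.0.0.1:6448.

```java
import java.io.ByteArrayOutputStream;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical minimal OSC 1.0 encoder for messages whose arguments are all float32.
final class OscEncode {

    // OSC string: ASCII bytes plus 1-4 null bytes so the total is a multiple of 4.
    static void writeString(ByteArrayOutputStream out, String s) {
        byte[] bytes = s.getBytes(StandardCharsets.US_ASCII);
        out.write(bytes, 0, bytes.length);
        int pad = 4 - (bytes.length % 4);   // always at least one terminating null
        for (int i = 0; i < pad; i++) out.write(0);
    }

    static byte[] encode(String address, float... args) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, address);
        writeString(out, "," + "f".repeat(args.length));        // type tags, e.g. ",ff"
        ByteBuffer buf = ByteBuffer.allocate(4 * args.length);  // big-endian by default
        for (float f : args) buf.putFloat(f);
        out.write(buf.array(), 0, buf.capacity());
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] packet = encode("/wek/inputs", 0.5f, 0.25f);  // two inputs, like the mouse sketch
        System.out.println(packet.length + " bytes");         // prints "24 bytes"
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.send(new DatagramPacket(packet, packet.length,
                    InetAddress.getByName("127.0.0.1"), 6448));
        }
    }
}
```

If Wekinator expects two inputs, the type-tag string has to contain exactly two f’s; a mismatch in count is one of the things worth ruling out.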

Alright, I’m getting somewhere. I downloaded the WekiInputHelper program that acts as an intermediate step between the inputs themselves and Wekinator receiving them. If I manually send an output from WekiInputHelper to Wekinator (on any port), then Wekinator receives it. I also confirmed that no other program is using port 6448, because WekiInputHelper throws an error when it tries to listen to a port that’s already in use. So the problem isn’t with Wekinator receiving inputs, it’s that my mouse movement isn’t actually sending them.

It started working. Don’t know why. Moving on.

I don’t really remember changing anything between it not working and working but here’s the final sketch:

import oscP5.*;
import netP5.*;

OscP5 oscobject; //declares the OscP5 object (created in setup)
NetAddress destination; //declares the address to send to (also created in setup)

void setup() {
  size(200, 200); //mouse region
  oscobject = new OscP5(this, 12000); //listens to port 12000
  destination = new NetAddress("127.0.0.1", 6448); //sends to port 6448
}

void draw() {
  background(0);
  float x = map(mouseX, 0, width, 0, 1); //maps my mouse position from position in the window to 0-1 range
  float y = map(mouseY, 0, height, 0, 1);
  
  OscMessage msg = new OscMessage("/wek/inputs");
  msg.add(x); 
  msg.add(y);
  
  oscobject.send(msg, destination);
  
  fill(255);
  ellipse(mouseX, mouseY, 10, 10); //little bubble for my cursor
}

All the commenting is much more for me than for anyone reading it. It’s way easier to write what each line does here than reference the documentation every time I forget.

Wekinator is Cool

Solomon Weinstein — Fri, 27 Mar 2026 04:25:32 GMT

Wekinator is definitely the way to go here. I don’t see myself successfully writing an ML algorithm from scratch in any reasonable amount of time, and Wekinator solves that.

I downloaded Wekinator and started playing around with some of the test programs to get a sense of how it works. I paired a mouse location input with continuous sound generation output and trained it a number of times on various data.

What I’m aiming to do is basically a more complicated version of this. That example takes a two-number array as an input, and I can write my parameters so that they also fit into an array as a series of numbers. There are only so many objects that an image of the outdoors can contain, so I can assign each of these a number. Perhaps a second dimension of the array can indicate the relative frequency of objects. Other image parameters, like brightness, general color composition, saturation, and depth of the scene, can each be assigned a value from 0 to n (n TBD) and sit in the array as well. Deriving this input from an image is a much more complicated process than deriving an x-y pair from a mouse location, but an array is an array. Once I have it, I can train Wekinator to map these parameters to parameters of sound synthesis.
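A minimal sketch of one way to flatten that idea, assuming a fixed, hypothetical list of object categories and scalar image parameters already scaled to 0-1. Instead of storing each object’s number and frequency as a pair, each category gets its own slot holding its relative frequency (0 if absent), so both dimensions collapse into the flat list of floats Wekinator takes. One reason to consider this over a single numeric ID per object: an ID would suggest that object 2 sits “between” objects 1 and 3, which a model might read as meaningful.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical scene encoder: one relative-frequency slot per known object,
// followed by scalar image parameters (assumed already in 0..1).
final class SceneFeatures {

    // Made-up vocabulary; the list index acts as each object's "number".
    static final List<String> OBJECTS = List.of("tree", "rock", "water", "sky", "snow");

    static float[] build(Map<String, Integer> objectCounts, float... scalars) {
        float[] out = new float[OBJECTS.size() + scalars.length];
        int total = 0;
        for (int c : objectCounts.values()) total += c;
        for (Map.Entry<String, Integer> e : objectCounts.entrySet()) {
            int idx = OBJECTS.indexOf(e.getKey());
            if (idx < 0) throw new IllegalArgumentException("unknown object: " + e.getKey());
            out[idx] = (float) e.getValue() / total;   // share of all detected objects
        }
        System.arraycopy(scalars, 0, out, OBJECTS.size(), scalars.length);
        return out;
    }

    public static void main(String[] args) {
        // 6 trees and 2 rocks; brightness 0.8, saturation 0.4 (made-up values)
        float[] v = build(Map.of("tree", 6, "rock", 2), 0.8f, 0.4f);
        System.out.println(Arrays.toString(v));   // prints [0.75, 0.25, 0.0, 0.0, 0.0, 0.8, 0.4]
    }
}
```

The vector length stays fixed no matter which objects appear, which matters because Wekinator expects the same number of inputs in every message.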

I’m going to look into exactly which machine learning algorithm the sound synthesis example I ran uses and compare it to others. Next, I’ll decide which to use and figure out how to do it.

Initial Project Overview

Solomon Weinstein — Fri, 13 Mar 2026 23:11:35 GMT

As a brief re-summary: I aim to make a keyboard that can take image input of what’s around me (specifically focusing on outdoor scenes) and, based on the image content, synthesize a set of sounds that I can use on the keyboard.

I imagine integrating this into a physical keyboard will be the last step, and relatively easy compared with the rest of the project. The majority of the work will involve writing a program that can analyze images and synthesize sounds.

My hope is for the image analysis to be fairly comprehensive. It would note objects, colors, textures, depth, spatial layout, and other qualities of images. Objects and textures (and most other qualities, in an implicit way) can be classified by a convolutional neural network (CNN). CNNs find objects and textures in images by detecting edges with convolutions and recognizing patterns in these edges to classify things. This paper from 2017 discusses the math and theory behind an image analysis CNN that is easily trainable, widely applicable, and requires very few parameters relative to other models that perform similarly. Writing this sort of model on my own would require a lot of learning, but it’s a heavily researched area and I’m confident that there are enough resources available for me to do it.
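To make the edge-detection step concrete, here’s a small plain-Java sketch of one 3x3 filter pass over a grayscale image, using the Sobel kernel for horizontal intensity changes. Like CNN layers, it slides the kernel without flipping it (technically cross-correlation, which is what CNN libraries call convolution). In a CNN the kernel values are learned during training rather than fixed like this, and many kernels are stacked in layers; the class name EdgeConv is hypothetical.

```java
import java.util.Arrays;

// Hypothetical example: one "valid" 3x3 filter pass, as a single CNN kernel would do.
final class EdgeConv {

    // Sobel kernel for horizontal intensity changes (responds to vertical edges).
    static final int[][] SOBEL_X = {
        { -1, 0, 1 },
        { -2, 0, 2 },
        { -1, 0, 1 },
    };

    // Output is 2 pixels smaller in each dimension (no padding at the borders).
    static int[][] convolve(int[][] img, int[][] k) {
        int h = img.length, w = img[0].length;
        int[][] out = new int[h - 2][w - 2];
        for (int y = 0; y < h - 2; y++) {
            for (int x = 0; x < w - 2; x++) {
                int sum = 0;
                for (int ky = 0; ky < 3; ky++)
                    for (int kx = 0; kx < 3; kx++)
                        sum += img[y + ky][x + kx] * k[ky][kx];
                out[y][x] = sum;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // 3 x 5 grayscale image: dark on the left, bright on the right
        int[][] img = {
            { 0, 0, 0, 255, 255 },
            { 0, 0, 0, 255, 255 },
            { 0, 0, 0, 255, 255 },
        };
        System.out.println(Arrays.toString(convolve(img, SOBEL_X)[0]));
        // prints [0, 1020, 1020]: zero over the flat region, large around the edge
    }
}
```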

To create sounds, I want to build some sort of mapping from the parameters of the image to parameters of sound synthesis. Different image parameters could decide synthesis type; waveform shape; number of oscillators; filter types and parameter values; volume, spatialization, and filter envelopes; and how force input is interpreted. This approach would require a lot of fine-tuning on my part to ensure that the sounds the program outputs fit the mood of the image. These papers (a and b) discuss different strategies for sonification of images and other non-sound data. They both suffer from being reasonably old (2004 and 2013, respectively). Or rather, this field is actively developing, and a lot has changed in the past 13–22 years (if it makes me wince to call 2013 old, I’d better tread lightly). There are still useful tools in both papers, though.

The hardware implementation will happen later in the project. I have a Novation Launchkey 37 that I can use (it has aftertouch, and that adds an extra layer of expressiveness that I can map to), but the specific keyboard won’t matter too much.

Much farther in the future, I hope to integrate everything into a keyboard. It would have the computer inside and automatically push the sounds it synthesizes to different tracks on the keyboard. I would have to decide whether the keyboard should have a small detachable camera, or just take wireless input from a phone camera.

NIME and ICMA Papers

Solomon Weinstein — Fri, 13 Feb 2026 02:23:29 GMT

Papers I found interesting:

Synthetic Ornithology: Machine learning, simulations and hyper-real soundscapes

The Shadow Harvester: Sonifying the Body Through Light

The Emotional Characteristics of Rain Sound Effects

Brief exploration of the second paper above:

This paper interested me because of its relation to what I am considering creating as my project. The basic concept is the same: visual data is transformed into auditory output.

In this case, the visual data is the shadow of a violinist playing their instrument. The violinist is backlit by a bright light, and their shadow is projected onto a field of sensors that detect the presence and intensity of the light. As the violinist plays, their physical movement (captured by which sensors are and are not blocked by their shadow) also controls different parameters in a Max patch, so the acoustic (violin) sound is accompanied by changes to the digital sound.

The fact that it is the shadow specifically that controls these parameters is interesting, because it means not all changes in the violinist’s position will affect the digital sound. The shadow is a two-dimensional projection of a three-dimensional thing, so not all information is captured.

The goal of the project was for this to be a form of performance art. The shadow would be visible to audience members on a cloth screen, and the violinist would have to pay attention to both the sound they are producing on their instrument and the sound their movement is making.

Project?

Solomon Weinstein — Fri, 06 Feb 2026 16:32:51 GMT

I am a lover of all things outdoors. In the last few years, that has meant rock climbing, caving, white water kayaking, and much more, but the one thing that I’ve always done is hike.

I take hiking loosely. Recently it has skewed much more toward trail running, but I am equally happy running a trail in one day or backpacking it in five.

When I’m out there, I don’t tend to listen to music. But occasionally, upon stopping to take a rest, I get the urge to make music myself. Once or twice I’ve brought a little MIDI keyboard out and gotten frustrated at the mobile version of GarageBand. Part of the difficulty, too, is that I want to use different sounds depending on where I am and what is around me (the vibe, so to speak). That can take a long time to dial in.

So here’s the grand idea: I want to be able to take a picture or, perhaps, a short video of what’s around me, have a program that I create analyze colors, shapes, and overall trends based on parameters that I give it, and generate a series of sounds that it synthesizes. I don’t want to instruct a generative AI system to do this for me; I want to program it myself.

Initially, this can be a photo that I take with my phone, feed into a program manually, and then send to a keyboard, but down the line I think it would be very cool for it all to be integrated into the board. Camera, analysis, and all.