Stories by Xufuji on Medium

Why AI Music Recommenders Get Psychology Wrong — And How We Fixed It

Xufuji — Mon, 25 May 2026 09:47:22 GMT

The $32 billion music therapy market is booming. But the algorithms powering our playlists are stuck in 2006.

A few months ago, a user typed this into our system:

“I just ended a long relationship. I’m not crying anymore, but everything feels hollow.”

Then she hit play.

Most music apps would scan for the keyword “sad,” cross-reference her listening history with other users who liked sad songs, and serve up a playlist of breakup ballads. Spotify’s Discover Weekly does exactly this — it finds people whose tastes overlap with yours, then recommends what they loved. Apple Music’s algorithm is similar. YouTube Music, Amazon Music, Tidal — same architecture, different data lakes.

It’s called collaborative filtering. It powers roughly 80% of digital music discovery today. And from a pure engagement standpoint, it works beautifully. The problem? It has almost nothing to do with how human emotion actually functions.

Our user didn’t get a breakup playlist. She got something quieter, stranger, and — in her own words — “exactly what I needed but couldn’t name.” What she experienced wasn’t better curation. It was a fundamentally different way of thinking about the relationship between music and mood.

This is the story of why we built it.

The Mirror Problem

In 2006, when Netflix launched its famous recommendation prize, the goal was simple: predict what a user would rate highly based on what similar users rated highly. The underlying assumption — that taste is a stable, transferable property — became the DNA of modern recommender systems.

Music apps inherited this logic. They treat your emotional state as a coordinate in taste space, then find the nearest musical neighbors. Feeling melancholic? Here’s more melancholy. Feeling angry? Here’s rage metal. Feeling happy? Here’s upbeat pop.

In technical terms, this is nearest-neighbor matching, or what we call distance matching: minimize the Euclidean distance between the “user mood vector” and the “song mood vector.” It’s elegant, scalable, and psychologically naive.

Because here’s what psychologists have known for decades: people don’t use music to reflect their emotions. They use music to regulate them.

A landmark 2012 study by Thoma, Ryf, Mohiyeddini, and colleagues found that music selection is overwhelmingly strategic. When people feel low, they don’t always want to wallow; they want a gentle path toward something lighter. When they feel anxious, they don’t need more tension. They need release. When they feel hollow (that specific, post-grief flatness), they need re-engagement, not reinforcement.

The “mirror” approach treats music as emotional wallpaper. But music is actually one of humanity’s oldest emotional technologies — a tool we’ve used for millennia to shift states, process trauma, and find our way back to equilibrium.

Collaborative filtering never got that memo.

What “Hollow” Actually Means

Let’s look at that user input again: “I’m not crying anymore, but everything feels hollow.”

Traditional sentiment analysis would tag this as “sad.” Maybe “sad” with a low intensity score. But “hollow” is not low-intensity sadness. It’s a distinct emotional state with specific psychological markers:

Energy: Low but not collapsed (she’s functional, not bedridden)
Valence: Negative, but dulled rather than acute
Stress: Moderate — there’s tension in the flatness
Depth: Surprisingly high — this is a profound emptiness, not boredom

We think of this as emotional flatness with residual depth: the aftermath of grief where the sharp pain has faded but the sense of meaning hasn’t returned. It’s the emotional equivalent of physical therapy: the acute injury has healed, but rehabilitation is just beginning.

A distance-matching system would find songs “close” to this state — probably more hollow, more empty, more flat. It would extend the condition rather than resolve it.

What she actually needed was gentle re-engagement — music that slightly raises energy, modestly improves valence, reduces stress, but maintains depth. Something that honors where she is while quietly suggesting where she might go. Not a mirror. A bridge.

Direction Matching: The Alternative

At Sonome, we built something we call directional matching.

Instead of asking “Which song is closest to this mood?” we ask: “Which song creates the right movement?”

The architecture works in three stages:

First, we map. We translate natural language into a four-dimensional emotional coordinate: Energy, Valence, Stress, and Depth. This isn’t a taxonomy of 12 basic emotions — it’s a continuous space that can represent states like “sad but calm,” “joyful yet tense,” or “nostalgic with underlying hope.”

Second, we prescribe. Based on the mapped coordinates, the system calculates a target emotional state — not an arbitrary “happy,” but a psychologically appropriate next step. For our hollow user, the target isn’t euphoria. It’s slightly more present, marginally more connected, gently less tense — while keeping the depth that makes the experience feel authentic rather than dismissive.

Third, we score by trajectory. Every track in our library gets a directional score: how effectively does it move the listener from current state to target state along each dimension? A folk ballad with quiet hope might score well. An aggressively upbeat pop anthem would score poorly — not because it’s “bad music,” but because the emotional jump is too steep. The direction is right, but the magnitude is wrong.

Think of it like emotional physical therapy. Small, deliberate movements that don’t strain the psyche.
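To make the “prescribe” stage concrete, here is a minimal Python sketch. It is not Sonome’s production logic: the 0 to 100 scale, the example coordinates, the COMFORT_ZONE values, the max_step limit, and the name prescribe_target are all assumptions chosen for illustration. The sketch nudges each axis a bounded distance toward a calmer reference point and leaves depth untouched.

```python
from typing import Dict

# Hypothetical reference point the listener is nudged toward (0-100 scale assumed).
COMFORT_ZONE: Dict[str, float] = {"energy": 55.0, "valence": 60.0, "stress": 30.0}


def prescribe_target(current: Dict[str, float], max_step: float = 10.0) -> Dict[str, float]:
    """Return a nearby target state: a small step on each axis, depth preserved."""
    target = dict(current)  # copy so the caller's state is not mutated
    for axis, goal in COMFORT_ZONE.items():
        delta = goal - current[axis]
        # Clamp the move so the jump never exceeds max_step in either direction.
        step = max(-max_step, min(max_step, delta))
        target[axis] = current[axis] + step
    return target


# Illustrative coordinates for the "hollow" state (assumed values).
hollow = {"energy": 35.0, "valence": 25.0, "stress": 50.0, "depth": 80.0}
print(prescribe_target(hollow))
```

With these assumed numbers the target ends up slightly more energetic, modestly more positive, less tense, and exactly as deep as the starting point, which is the “small, deliberate movement” described above.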

Why This Feels Different

Users describe Sonome’s recommendations with words like “surprisingly fitting” or “exactly what I needed but couldn’t name.” They can’t articulate why it works better than other apps. The reason is invisible: most music apps reinforce where you are. Sonome recognizes where you are, then gently suggests where you might want to be.

There’s no explicit “therapy mode.” There’s no clinical interface. The psychology is embedded in the matching logic. Users just experience the result: music that seems to understand not just their present, but their potential next state.

This matters because the music therapy market is exploding. In 2026, it’s valued at $32.2 billion, growing at 9.4% annually, with mental health applications representing the largest segment at 31.4%. Yet the AI tools powering this space — Suno, Udio, Mureka, and dozens of others — are competing almost entirely on production quality: better vocals, faster generation, more genres.

Almost no one is competing on therapeutic intelligence.

The Engineering Problem No One Talks About

Making directional matching work at scale required solving a specific technical challenge.

Emotional analysis via large language models is powerful but slow. Running full psychological inference for every recommendation would make real-time interaction impossible. You’d be waiting seconds between tracks — an eternity in music streaming.

Our solution was to separate the heavy lifting from the real-time matching.

We pre-computed. We analyzed hundreds of genres and tracks through the lens of our four-dimensional model, distilling each into a compact “emotional signature.” The full psychological analysis might consume 50,000 tokens of reasoning. The pre-computed signature uses roughly 3,000.

At recommendation time, the system performs a fast local scan across these signatures. The deep psychological modeling happens once, offline. The real-time matching is lightweight directional math. The user gets millisecond-level responses powered by clinical-grade emotional reasoning.
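The split between offline analysis and real-time matching can be sketched roughly as follows. This is an illustration under assumptions, not our actual pipeline: fake_analyze stands in for the expensive LLM analysis, the JSON table stands in for whatever storage is really used, and the closeness function is only a placeholder for the directional math. All names and numbers are hypothetical.

```python
import json
from typing import Callable, Dict, List, Sequence

Signature = List[float]  # [energy, valence, stress, depth], 0-100 scale assumed


def build_signature_table(
    descriptions: Dict[str, str],
    analyze: Callable[[str], Signature],
) -> str:
    """Offline step: run the expensive analysis once per track and store the result."""
    table = {track_id: analyze(text) for track_id, text in descriptions.items()}
    return json.dumps(table)


def top_matches(
    table_json: str,
    score: Callable[[Sequence[float]], float],
    k: int = 2,
) -> List[str]:
    """Online step: a plain scan over stored signatures, with no model call."""
    table: Dict[str, Signature] = json.loads(table_json)
    ranked = sorted(table, key=lambda track_id: score(table[track_id]), reverse=True)
    return ranked[:k]


calls: List[str] = []  # records how often the "expensive" analysis runs


def fake_analyze(text: str) -> Signature:
    """Stand-in for the slow LLM analysis; returns hard-coded signatures."""
    calls.append(text)
    fixed = {
        "quiet folk": [45.0, 40.0, 35.0, 75.0],
        "pop anthem": [90.0, 90.0, 20.0, 70.0],
        "drone": [30.0, 20.0, 55.0, 85.0],
    }
    return fixed[text]


table_json = build_signature_table(
    {"t1": "quiet folk", "t2": "pop anthem", "t3": "drone"}, fake_analyze
)

target = [45.0, 35.0, 40.0, 80.0]  # assumed target state


def closeness(signature: Sequence[float]) -> float:
    """Placeholder score: negative squared distance to the target."""
    return -sum((s - t) ** 2 for s, t in zip(signature, target))


print(top_matches(table_json, closeness))
```

The point of the design is visible in the call counter: the analysis function runs once per track while the table is built, and never again while recommendations are served.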

It’s the difference between doing therapy in every session and having a therapist pre-write a playbook that a skilled nurse can execute instantly.

The Bigger Picture: AI and Emotional Agency

Directional matching represents something larger than music recommendation. It’s a shift in how we think about AI and human emotion.

Most emotional AI tries to detect what you feel. We think the more valuable challenge is understanding what you need to feel next. Detection is observation. Direction is care.

This philosophy extends beyond music. Any domain where AI interfaces with human emotion — wellness apps, creative tools, conversational agents, even productivity software — faces the same choice: mirror the user, or guide them?

Mirrors are safer. They don’t risk being wrong. They don’t risk being prescriptive. But they’re also inert — they extend the present rather than shape the future.

Guides are more useful. They acknowledge that human emotional states are not endpoints; they’re waypoints. The goal isn’t to perfectly represent where someone is. It’s to help them get somewhere better.

We’ve chosen to build a guide. The technology is complex — four-dimensional mapping, directional scoring, pre-computed psychological signatures. The idea is simple: good recommendations don’t just understand where you are. They know where you could go, and they find the gentlest path to get there.

What We’re Learning

Early usage data is revealing something unexpected. Users don’t just listen longer — they return differently. One user told us she opens the app not when she wants background noise, but when she feels “stuck” and doesn’t know what she needs. She’s treating it as an emotional tool rather than entertainment.

Another pattern: people use it for transitions. Morning-to-work. Work-to-evening. Evening-to-sleep. The directional logic seems to excel at liminal spaces — those in-between states where you know you need to shift but don’t know how.

We’re also seeing demand for what we call emotional journeys — not single songs but sequences that guide a user through a deliberate state change over 20–30 minutes. This is closer to clinical music therapy protocols than playlist curation, and it’s pointing toward a future where AI music tools might genuinely augment mental health care, not just entertain.

The Road Ahead

The AI music space in 2026 is dominated by generation speed and audio fidelity. Suno V5 is praised for “emotional vocal realism.” Udio competes on “audio fidelity.” Mureka differentiates with “multi-language support and DAW integration.” These are production battles — who can make the best-sounding audio from a text prompt.

We’re playing a different game. We believe the next frontier isn’t better sound. It’s better understanding — AI that grasps not just what you typed, but what you’re going through, and responds with genuine emotional intelligence.

The $32 billion music therapy market isn’t looking for faster beat generation. It’s looking for scalable emotional care. The technology to deliver that exists. What’s been missing is the psychological architecture to deploy it responsibly.

Directional matching is our attempt at that architecture. It’s not perfect. Emotional states are messier than four dimensions. Cultural differences in emotional expression — the Portuguese saudade, the Japanese mono no aware, the Chinese chou chang — resist simple mapping. And the line between “guiding” and “manipulating” is thin and requires constant ethical vigilance.

But the alternative — letting algorithms that don’t understand psychology shape our emotional diets — is no longer acceptable. We’ve spent two decades optimizing for engagement. It’s time to start optimizing for emotional wellbeing.

Our user with the hollow heart? She listened to one track. Then another. Then she closed the app and went for a walk — something she hadn’t felt like doing in weeks. She didn’t mention the genre. She didn’t mention the artist. She mentioned the feeling: “like someone understood exactly where I was, and knew I didn’t want to stay there.”

That’s not curation. That’s care, made scalable.

Sonome is an AI music platform that translates human stories into emotionally resonant compositions. If you’re building at the intersection of psychology and technology, we’d love to hear from you.

The Psychology Behind Sonome: Why We Match Emotions by Direction, Not Distance

Xufuji — Mon, 25 May 2026 09:29:30 GMT

How a Four-Dimensional Mood Map Is Changing the Way AI Understands Music Therapy

Most music recommendation systems get the psychology wrong.

They assume sadness calls for sad songs, anger calls for angry songs, happiness calls for upbeat tracks. It’s intuitive. It’s also incomplete.

At Sonome, we built something different. Instead of matching your current mood to a similar-sounding genre, we ask a deeper question: Where does this emotion want to go?

The Problem with “Like Matches Like”

Traditional recommendation engines operate on proximity. They plot songs and user preferences in vector space and find the nearest neighbors. If you’re feeling melancholic, they serve you more melancholy.

This works for discovery. It fails for emotional wellbeing.

Psychologists have known for decades that music doesn’t just reflect emotion; it regulates it. A 2012 study by Thoma et al. found that people strategically select music to shift their emotional state, not to reinforce it. Someone feeling low doesn’t always want to wallow; often, they want a gentle path toward something lighter. Someone anxious doesn’t need more tension; they need release.

The “distance matching” approach treats music as a mirror. We treat it as a bridge.

Building a Four-Dimensional Mood Space

The foundation of Sonome’s engine is a coordinate system that captures emotion in four dimensions:

Energy — from drained to electrified
Valence — from negative to positive affect
Stress — from calm to tense
Depth — from surface-level to profoundly felt

A user might input: “I just ended a long relationship. I’m not crying anymore, but everything feels hollow.”

Traditional sentiment analysis might tag this as “sad.” Our system maps it to something more precise: low energy (35), low valence (25), moderate stress (50), very high depth (80). It’s not acute grief. It’s the hollow aftermath, a state we describe as emotional flatness with residual depth.

This precision matters because the intervention changes completely.

Directional Matching: The Core Innovation

Here’s where Sonome diverges from every recommendation engine we’ve studied.

Instead of asking “Which song is closest to this mood?” we ask: “Which song creates the right movement?”

We calculate a target emotional state based on the user’s current coordinates. For that hollow, post-grief state, the target isn’t “happy.” That would feel jarring and false. The target is gentle re-engagement: slightly higher energy, modestly improved valence, reduced stress, maintained depth. Think of it as emotional physical therapy — small, deliberate movements that don’t strain the psyche.

Then we score every track in our library not by similarity, but by directional efficacy. A song gets high marks not because it resembles the user’s mood, but because it constructively shifts it along the needed axes.

A folk ballad with quiet hope might score well. An aggressively upbeat pop anthem would score poorly — not because it’s “bad,” but because the emotional jump is too steep. The direction is right, but the magnitude is wrong.

This is why we call it direction matching, not distance matching. We’re not minimizing Euclidean distance in mood space. We’re optimizing for therapeutic trajectory.
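One way to express “right direction, right magnitude” in code is shown below. Treat it as a sketch under stated assumptions rather than the formula Sonome ships: the cosine-times-penalty form, the overshoot_tolerance parameter, the target coordinates, and the track signatures are all hypothetical. Only the current-state coordinates (35, 25, 50, 80) come from the example above.

```python
import math
from typing import Sequence


def directional_score(
    current: Sequence[float],
    target: Sequence[float],
    track: Sequence[float],
    overshoot_tolerance: float = 15.0,
) -> float:
    """Reward tracks that pull in the desired direction without overshooting."""
    desired = [t - c for c, t in zip(current, target)]  # movement we want
    offered = [s - c for c, s in zip(current, track)]   # movement the track offers
    desired_len = math.sqrt(sum(d * d for d in desired))
    offered_len = math.sqrt(sum(o * o for o in offered))
    if desired_len == 0.0 or offered_len == 0.0:
        return 0.0  # nothing to move, or the track simply mirrors the listener
    # Direction: cosine of the angle between desired and offered movement.
    cosine = sum(d * o for d, o in zip(desired, offered)) / (desired_len * offered_len)
    # Magnitude: penalize only the part of the jump that exceeds the desired step.
    overshoot = max(0.0, offered_len - desired_len)
    return cosine * math.exp(-overshoot / overshoot_tolerance)


# Axis order: energy, valence, stress, depth.
current = (35.0, 25.0, 50.0, 80.0)      # the hollow state from the example
target = (45.0, 35.0, 40.0, 80.0)       # assumed "gentle re-engagement" target
folk_ballad = (45.0, 40.0, 35.0, 75.0)  # hypothetical signatures
pop_anthem = (90.0, 90.0, 20.0, 70.0)
more_hollow = (30.0, 20.0, 55.0, 85.0)

for name, sig in [("folk", folk_ballad), ("pop", pop_anthem), ("hollow", more_hollow)]:
    print(name, round(directional_score(current, target, sig), 3))
```

In this toy setup the ballad scores highest, the anthem points the right way but is discounted almost to zero for overshooting, and the “more hollow” track scores negative because it moves away from the target.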

Why This Feels Different in Practice

Users describe Sonome’s recommendations with words like “surprisingly fitting” or “exactly what I needed.” They can’t articulate why it works better than other apps. The reason is that most music apps reinforce where you are. Sonome recognizes where you are, then gently suggests where you might want to be.

There’s no explicit “therapy mode.” There’s no clinical interface. The psychology is invisible, embedded in the matching logic. Users just experience the result: music that seems to understand not just their present, but their potential next state.

The Engineering Behind the Psychology

Making this work at scale required solving a specific technical challenge. Emotional analysis via large language models is powerful but slow and expensive. Running full psychological inference for every recommendation would make real-time interaction impossible.

Our solution was to pre-compute. We analyzed hundreds of genres and tracks through the lens of our four-dimensional model, distilling each into a compact “emotional signature.” The full analysis might consume 50,000 tokens. The pre-computed signature uses roughly 3,000.

At recommendation time, the system performs a fast local scan across these signatures. The heavy psychological lifting happens once, offline. The real-time matching is lightweight directional math. The user gets millisecond-level responses powered by deep psychological modeling.

The Broader Implication

Directional matching represents a shift in how we think about AI and emotion.

Most emotional AI tries to detect what you feel. We think the more valuable challenge is understanding what you need to feel next. Detection is observation. Direction is care.

This philosophy extends beyond music. Any domain where AI interfaces with human emotion — wellness apps, creative tools, conversational agents — faces the same choice: mirror the user, or guide them? Mirrors are safer. Guides are more useful.

We’ve chosen to build a guide. The technology is complex. The idea is simple: good recommendations don’t just understand where you are. They know where you could go, and they find the gentlest path to get there.

Sonome is an AI music platform that translates human stories into emotionally resonant compositions. This article describes the psychological architecture behind our recommendation engine.

Try: www.sonome.online

How Sonome Understands Emotions: A Technical Deep Dive

Xufuji — Sat, 23 May 2026 12:44:42 GMT

At the heart of Sonome lies a deceptively simple question: how do you teach a machine to feel what a human feels? Not just classify text into “happy” or “sad” buckets, but truly grasp the messy, contradictory, beautiful complexity of human emotion. This is the story of how we built the GlobalEmotionAnalyzer.

The Three-Layer Architecture

We designed the system around a hybrid approach that combines the depth of large language models with the reliability of structured engineering.

The top layer is DeepSeek, a powerful LLM that performs true semantic understanding across 18+ languages. We do not ask it to simply label text. Instead, we prompt it to analyze emotional complexity, detect contradictory feelings, and even write a short narrative capturing the emotional atmosphere in the user’s own voice. This is what we call “emotionNarrative”: a one- or two-sentence description that bridges raw analysis and artistic creation.

The middle layer is our normalization engine. LLMs are creative, which is both their strength and their weakness. They might return “joyful,” “elated,” or “blissful” when we need a single standardized tag. We built a synonym mapping table that funnels 30+ edge labels into 11 core emotions. Then we apply a hard whitelist check. If the model returns something completely unexpected, the system falls back to “calm” rather than crashing the downstream music pipeline.

The bottom layer is a multi-language keyword fallback system. If the API is unavailable or the user is offline, we can still perform analysis using keyword density matching across Chinese, English, Japanese, Korean, Spanish, Portuguese, and more. This ensures Sonome never goes silent.
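A keyword fallback of this kind can be very small. The sketch below assumes naive substring counting and a tiny, made-up keyword table covering three emotions; the real lists, languages, and matching rules are not shown in this article, so every entry here is hypothetical.

```python
from typing import Dict, Tuple

# Hypothetical keyword table; a real one would cover all labels and more languages.
KEYWORDS: Dict[str, Tuple[str, ...]] = {
    "happy": ("happy", "joy", "开心", "嬉しい", "feliz"),
    "sad": ("sad", "cry", "难过", "悲しい", "triste"),
    "lonely": ("lonely", "alone", "孤独", "寂しい", "solo"),
}


def keyword_fallback(text: str, default: str = "calm") -> str:
    """Pick the emotion whose keywords occur most often; use default if none match."""
    lowered = text.lower()
    counts = {
        emotion: sum(lowered.count(word) for word in words)
        for emotion, words in KEYWORDS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else default


print(keyword_fallback("I feel so alone and lonely tonight"))
print(keyword_fallback("今天很开心"))
print(keyword_fallback("The meeting is at noon"))
```

Substring counting is crude (it would count “joy” inside “enjoy”), which is one reason a weighted or token-aware version is worth exploring.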

Why Eleven Emotions?

We deliberately constrained the output space to eleven standard labels: happy, sad, angry, calm, romantic, energetic, nostalgic, mysterious, anxious, hopeful, and lonely.

Notice that “lonely” stands alone. Many systems would map it to “sad,” but we kept it separate because loneliness and sadness demand different musical languages. Sadness calls for heavy, sinking arrangements. Loneliness needs space, reverb, and quiet echoes. This distinction matters when your end product is music, not just a sentiment score.
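The normalization layer and the eleven-label whitelist fit in a few lines. This is a minimal sketch: the eleven labels and the “joyful / elated / blissful” examples come from this article, while the other synonym entries and the function name normalize_emotion are assumptions added for illustration.

```python
from typing import Dict, FrozenSet

CORE_EMOTIONS: FrozenSet[str] = frozenset({
    "happy", "sad", "angry", "calm", "romantic", "energetic",
    "nostalgic", "mysterious", "anxious", "hopeful", "lonely",
})

# Partial, illustrative synonym table; the real one maps 30+ edge labels.
SYNONYMS: Dict[str, str] = {
    "joyful": "happy",
    "elated": "happy",
    "blissful": "happy",
    "melancholy": "sad",     # assumed entry
    "furious": "angry",      # assumed entry
    "isolated": "lonely",    # assumed entry
}

FALLBACK = "calm"


def normalize_emotion(raw: object) -> str:
    """Map a free-form model label onto the whitelist, falling back to calm."""
    if not isinstance(raw, str):
        return FALLBACK  # e.g. the model returned null or a number
    label = raw.strip().lower()
    label = SYNONYMS.get(label, label)
    return label if label in CORE_EMOTIONS else FALLBACK


print(normalize_emotion("  Elated "))
print(normalize_emotion("lonely"))
print(normalize_emotion("existential dread"))
```

Note that “lonely” passes through unchanged instead of being folded into “sad,” which mirrors the design choice described above.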

Energy Level: The Hidden Dimension

Beyond the primary emotion, we ask the model to classify energy into three levels. Low energy captures the quiet contentment of eating ice cream on a balcony at sunset. High energy captures the raw explosion of winning a championship. Medium energy covers everything in between.

This dimension is crucial because two stories can both be “happy” yet require completely different musical treatments. One needs a gentle acoustic guitar. The other needs a full electronic drop.

Emotional Complexity: Simple, Mixed, or Conflicted

We reject the idea that humans feel one thing at a time. A story can be “sad but peaceful,” “angry yet hopeful,” or “happy with underlying anxiety.” We ask the model to classify this as simple, mixed, or conflicted.

For music generation, this changes everything. A simple emotion gets a straightforward arrangement. A mixed emotion layers secondary feelings as harmonic color. A conflicted emotion might require unconventional chord progressions that hold tension without resolving it.

From Analysis to Music

The final output is not just a label. It is a complete emotional profile: primary emotion, intensity score, energy level, scene keywords, dominant tone, complexity classification, and a narrative description. This profile feeds directly into our music generation pipeline.

The primary emotion determines the musical key and mode. Intensity controls arrangement density. Energy level sets the tempo. Scene keywords trigger sound effects and ambient textures. The narrative description becomes part of the lyric prompt. Together, they transform a user’s story into a personalized soundtrack.
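As a rough illustration of that mapping, the sketch below turns a profile into a small “music brief.” The field names, the key and tempo tables, and the layer formula are all invented for the example; only the idea of which profile field drives which musical parameter comes from the description above. The dominant-tone field is left out for brevity.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical lookup tables; real values would be tuned by ear.
KEY_BY_EMOTION: Dict[str, str] = {
    "happy": "C major", "sad": "A minor", "lonely": "E minor", "calm": "F major",
}
TEMPO_BY_ENERGY: Dict[str, int] = {"low": 70, "medium": 100, "high": 128}


@dataclass
class EmotionProfile:
    primary_emotion: str
    intensity: float          # assumed range 0.0 to 1.0
    energy_level: str         # "low", "medium", or "high"
    scene_keywords: List[str]
    complexity: str           # "simple", "mixed", or "conflicted"
    narrative: str


def to_music_brief(profile: EmotionProfile) -> Dict[str, object]:
    """Translate an emotional profile into generation parameters."""
    return {
        "key": KEY_BY_EMOTION.get(profile.primary_emotion, "C major"),
        "tempo_bpm": TEMPO_BY_ENERGY.get(profile.energy_level, 100),
        "layers": 2 + round(profile.intensity * 6),  # denser arrangement when intense
        "ambience": list(profile.scene_keywords),
        "lyric_prompt": profile.narrative,
    }


profile = EmotionProfile(
    primary_emotion="lonely",
    intensity=0.5,
    energy_level="low",
    scene_keywords=["rain", "empty street"],
    complexity="mixed",
    narrative="Quiet, a little adrift, listening to the rain.",
)
print(to_music_brief(profile))
```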

What We Are Still Improving

The current keyword fallback uses simple counting. We are exploring weighted matching where certain words carry stronger emotional signals. The energy classification currently relies on hardcoded keyword lists. We are moving toward a matrix-based system that derives energy from emotion combinations. Scene keyword extraction could benefit from semantic expansion rather than exact string matching. And for longer stories, we want to analyze emotional arcs over time to create music with genuine dynamic structure.

The Philosophy Behind the Code

We believe emotion is not data to be extracted. It is a language to be understood. The GlobalEmotionAnalyzer is not a classifier. It is a translator. It takes the messy, human act of feeling and translates it into a structured vocabulary that creative AI can understand and respond to.

The goal is not perfect accuracy. The goal is genuine resonance. When a user writes about quitting a toxic job, feeling scared but relieved, we want the music to capture that exact emotional texture. Nervous but liberating. Standing at the edge of a cliff you chose to jump from.

That is what Sonome is building. Not just music from text. Music from feeling.

We welcome feedback, suggestions, and wild ideas. If you see ways to improve the emotional modeling, the multilingual handling, or the music mapping logic, we would love to hear from you.

www.sonome.online

Sonome Technical Deep Dive: The “Repetition Engine” Behind Viral Lyrics

Xufuji — Sat, 23 May 2026 08:39:31 GMT

https://www.sonome.online

1. The Core Insight: Repetition Is Not a Bug, It’s the OS

Most lyric generators treat repetition as a creative choice. We treated it as a system requirement.

The problem we solved: amateur lyric generators produce text where every line is different. This violates the fundamental cognitive mechanism of pop music, which is predict-and-reward. Without repeated hooks, listeners cannot form memory anchors, which means no sing-along, which means no virality.

Our thesis is simple: looping is not decoration. It is the underlying operating system of popular music.

2. Three-Tier Validation Architecture

We built a validation pipeline that enforces repetition at three levels.

P0 is the RepetitionValidator. It enforces internal repetition within the chorus, at least 80% consistency between choruses, the existence of a Post-Chorus, and a Hook length within the 6 to 8 character limit. Failure at this level triggers a hard reject and retry.

P1 is the LengthValidator. It enforces 24 to 28 total lines, section length compliance, and tag standardization. This also triggers hard reject and retry on failure.

P2 combines LiteraryScore and ConversationalScore. It checks for cliche avoidance, imagery density, and naturalness. This is a soft threshold, requiring scores of 40 and 50 respectively.

The key design decision is that repetition validation sits above literary quality. A poetic lyric with no hook fails P0. A simple lyric with strong repetition passes.
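The ordering of the tiers can be sketched like this. The LyricReport structure and the triage function are hypothetical names, and the sketch assumes that “soft threshold” means a below-threshold P2 score produces a note instead of a rejection; only the tier order and the thresholds of 40 and 50 come from the description above.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LyricReport:
    repetition_ok: bool          # outcome of the P0 checks
    length_ok: bool              # outcome of the P1 checks
    literary_score: float        # P2, assumed 0-100 scale
    conversational_score: float  # P2, assumed 0-100 scale


def triage(
    report: LyricReport,
    literary_min: float = 40.0,
    conversational_min: float = 50.0,
) -> Tuple[str, List[str]]:
    """Apply hard gates first (P0, then P1), then collect soft P2 notes."""
    if not report.repetition_ok:
        return "reject", ["P0: repetition rules not met"]
    if not report.length_ok:
        return "reject", ["P1: length or section rules not met"]
    notes: List[str] = []
    if report.literary_score < literary_min:
        notes.append("P2: literary score below threshold")
    if report.conversational_score < conversational_min:
        notes.append("P2: conversational score below threshold")
    return "accept", notes


poetic_no_hook = LyricReport(False, True, 95.0, 90.0)
simple_with_hook = LyricReport(True, True, 30.0, 55.0)
print(triage(poetic_no_hook))
print(triage(simple_with_hook))
```

Because the P0 check runs first, the literary score of the first report is never even consulted, which is exactly the priority described above.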

3. The “Slide-In” Mechanism: Engineering Emotional Continuity

We identified a structural gap most generators ignore: the transition from Verse to Chorus.

The Slide-In Rule states that the last line of Verse N must share at least one core keyword or image with the first line of Chorus N.

This creates what cognitive scientists call processing fluency. The listener’s brain predicts the chorus before it arrives, triggering dopamine release when the prediction is confirmed. It feels natural because it is neurologically pre-activated.

Here is an example. In Verse 1, we establish a scene with “the convenience store at 3 AM,” then seed the keyword with “I walk in, like walking into your heart.” In the Chorus, “heart like falling leaves” completes the slide-in because “heart” returns.
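A check for the Slide-In Rule might look like the following. It is a sketch for English text only, assuming that “core keyword” can be approximated as any word outside a small stopword list; the stopword list and function names are hypothetical, and a real check would need proper tokenization for other languages.

```python
import re
from typing import FrozenSet, List, Set

# Hypothetical stopword list; function words do not count as shared imagery.
STOPWORDS: FrozenSet[str] = frozenset({
    "i", "a", "an", "the", "in", "into", "at", "of", "to", "and",
    "my", "your", "like", "is", "it",
})


def core_keywords(line: str) -> Set[str]:
    """Lowercased words of a line, minus stopwords."""
    return {w for w in re.findall(r"[a-z']+", line.lower()) if w not in STOPWORDS}


def slides_in(verse_lines: List[str], chorus_lines: List[str]) -> bool:
    """True if the verse's last line shares a core keyword with the chorus's first."""
    if not verse_lines or not chorus_lines:
        return False
    return bool(core_keywords(verse_lines[-1]) & core_keywords(chorus_lines[0]))


verse = ["The convenience store at 3 AM", "I walk in, like walking into your heart"]
print(slides_in(verse, ["Heart like falling leaves"]))
print(slides_in(verse, ["Rain keeps falling down"]))
```

In the first call the word “heart” carries over from verse to chorus, so the rule is satisfied. In the second call nothing carries over, and the transition would be rejected.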

4. Post-Chorus: The TikTok Weapon

We made Post-Chorus mandatory, not optional.

The reason is straightforward. Three to five seconds of non-lexical vocables like “La la la” or “Oh-oh-oh,” or core-word repetition, creates what we call earworm space. The brain continues processing the melody during the instrumental gap. Short-video algorithms favor tracks with identifiable 3-second hooks. Our fallback template always includes it, and generators that skip it fail validation.

5. The 80% Chorus Rule: Memory Reinforcement Over “Creativity”

Here is a controversial design choice: the second Chorus must preserve 80% or more of the first Chorus lyrics.

The rationale comes from three angles. Neuroscience tells us memory consolidation requires identical repetition, not variation. Listener behavior shows people sing along to what they recognize, not what surprises them. Commercial reality on TikTok and Spotify shows skip rates spike when the expected chorus does not arrive.

We allow only micro-variations, such as changing “like the wind” to “like stars,” to prevent monotony without breaking recognition.
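The 80% rule needs a similarity measure, and this article does not specify one. The sketch below assumes a simple word-overlap ratio: the number of words the two choruses share (counting repeats) divided by the length of the longer chorus. The function name and the sample lyrics are hypothetical.

```python
import re
from collections import Counter
from typing import List


def _words(text: str) -> List[str]:
    return re.findall(r"[a-z']+", text.lower())


def chorus_consistency(first: str, second: str) -> float:
    """Share of words preserved between two choruses, from 0.0 to 1.0."""
    a, b = _words(first), _words(second)
    if not a or not b:
        return 0.0
    shared = sum((Counter(a) & Counter(b)).values())  # multiset intersection
    return shared / max(len(a), len(b))


chorus_1 = "Heart like falling leaves\nHeart like falling leaves\nDrifting away like the wind"
chorus_2 = "Heart like falling leaves\nHeart like falling leaves\nDrifting away like stars"
unrelated = "Rain keeps pouring on the roof tonight"

print(chorus_consistency(chorus_1, chorus_2) >= 0.8)
print(chorus_consistency(chorus_1, unrelated) >= 0.8)
```

Here the micro-variation keeps 11 of 13 words, so the second chorus passes, while the unrelated text shares only a single word and fails.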

6. Hook Design: The 6–8 Character Constraint

We enforce a hard rule: the Chorus first line, which is the Hook, must be 6 to 8 Chinese characters or 3 to 5 English words.

This matters because of Miller’s Law, which states working memory holds 7 plus or minus 2 chunks. Shorter phrases have lower motor-planning cost for sing-along. Every Billboard number one chorus hook from 2020 to 2025 fits this constraint.

We actively block anti-patterns. A line like “The convenience store lights still glow at midnight” as a chorus opener is a verse line. It narrates. It does not anchor.
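The hook constraint is easy to approximate in code. This sketch assumes that a line containing any CJK ideograph is judged by character count and everything else by English word count; the Unicode range, the regular expressions, and the sample Chinese line are illustrative choices, not the validator’s actual rules.

```python
import re

CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")     # common CJK ideographs
ENGLISH_WORD = re.compile(r"[A-Za-z'’]+")


def hook_is_valid(line: str) -> bool:
    """6-8 Chinese characters, or 3-5 English words when no CJK text is present."""
    cjk_chars = CJK_CHAR.findall(line)
    if cjk_chars:
        return 6 <= len(cjk_chars) <= 8
    return 3 <= len(ENGLISH_WORD.findall(line)) <= 5


print(hook_is_valid("Heart like falling leaves"))
print(hook_is_valid("The convenience store lights still glow at midnight"))
print(hook_is_valid("心像落叶一样飘"))
```

The four-word line and the seven-character line pass. The eight-word narration is rejected as a hook, matching the anti-pattern above.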

7. From “Survival” to “Explosion”: The Quant Trader’s Mindset

This architecture reflects a deliberate mindset shift.

In versions 1 through 3, our mindset was survival. The goal was “do not produce garbage,” enforced with basic validation. From version 4 onward, we shifted to controlled explosion. The goal became “every output must be structurally viral,” with repetition as a non-negotiable requirement.

The same risk-control framework we applied to HFT systems (hard stops, position limits, and latency bounds) was translated to creative AI: hard constraints on structure, soft optimization on style.

8. Technical Stack

The flow works as follows. The user’s story and emotion go into language detection and emotion analysis. Then DeepSeek V4, driven by a system prompt with embedded repetition rules, generates the lyrics. The output goes through the P0 RepetitionValidator, then the P1 LengthValidator, then the P2 Quality Scorer. If it passes, we output the lyrics. If it fails, we send error feedback and retry up to 2 times. The final output uses Suno-compatible tags: Verse 1, Chorus, Post-Chorus, Bridge, and Outro.
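The generate, validate, retry loop can be sketched independently of any model. Everything below is a stand-in: fake_generate replaces the LLM call, fake_validate replaces the three validators, and the bracketed tag format is an assumption. The sketch reads “retry up to 2 times” as one initial attempt plus two retries.

```python
from typing import Callable, List, Optional


def generate_with_retry(
    generate: Callable[[Optional[str]], str],
    validate: Callable[[str], List[str]],
    max_retries: int = 2,
) -> Optional[str]:
    """Generate lyrics, feeding validation errors back into each retry."""
    feedback: Optional[str] = None
    for _attempt in range(max_retries + 1):  # first try plus the retries
        lyrics = generate(feedback)
        errors = validate(lyrics)
        if not errors:
            return lyrics
        feedback = "; ".join(errors)
    return None  # caller decides what to do, e.g. use a fallback template


seen_feedback: List[Optional[str]] = []


def fake_generate(feedback: Optional[str]) -> str:
    """Stand-in for the model: only adds a Post-Chorus once it is told to."""
    seen_feedback.append(feedback)
    base = "[Verse 1]\n...\n[Chorus]\n..."
    return base + "\n[Post-Chorus]\nla la la" if feedback else base


def fake_validate(lyrics: str) -> List[str]:
    """Stand-in for the validators: checks a single structural rule."""
    return [] if "[Post-Chorus]" in lyrics else ["missing Post-Chorus section"]


result = generate_with_retry(fake_generate, fake_validate)
print(result is not None, seen_feedback)
```

Passing the error text back into the next generation call is what makes the retry more than a blind re-roll: the second attempt knows exactly which rule the first one broke.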

We support 20 languages with localized literary terms, using unified English section tags for Suno API compatibility.

9. What This Means for Artists

Sonome does not write better lyrics. It writes structurally sound lyrics, the same way a quant does not predict the market but ensures the strategy survives all market regimes.

The output may not surprise you with poetic genius. It will surprise you with how quickly listeners remember it.

Bottom Line

We replaced the creativity-first lyric generation paradigm with a memory-first paradigm. The machine does not dream. It engineers recall.

https://www.sonome.online

How I Turned 50 Random Diary Entries into a Personal Album

Xufuji — Mon, 18 May 2026 14:17:35 GMT

Your scattered thoughts already sound like something. You just haven’t heard them yet.

I have a Notes app folder called “3am”. It’s where I dump things I can’t say out loud. Most entries are two sentences. Some are just a time and a feeling.

02:47. Can’t tell if I’m tired or sad.

Rain on AC unit. Smells like my grandmother’s house.

She said “we need to talk” and I said “ok” and now I’m here.

I never reread them. What’s the point? They’re not stories. They’re not poems. They’re just residue.

But last month I had an idea. What if these fragments weren’t dead weight? What if they were raw material?

I decided to turn 50 of them into songs. Not by writing music myself — I can’t play a chord. By feeding them into an AI that builds songs from stories.

This is how I accidentally made the most personal album I’ve ever heard.

Step 1: The Archaeology

I scrolled through two years of notes. The rule was simple: pick whatever made me pause. Not the most dramatic. Not the most literary. The ones where I remembered the temperature of the room.

Some were bizarrely mundane. The coffee shop changed their playlist. New one was too happy. I left.

Some were fragments of grief I never processed. Dad’s voicemail still works. I call sometimes just to hear the greeting.

Some were just observations. Two people on the subway, both wearing AirPods, both crying. Different stops.

I ended up with 53 entries. I cut 3 that felt too raw to share with anything, even an algorithm. That left 50.

Step 2: The Translation

Here’s what I learned. AI doesn’t need your diary entry verbatim. It needs the scene behind it.

So I rewrote each fragment into a short story. Four to eight sentences. Each sentence is one line. Just enough to build a room, a body, a sound, a consequence.

This was an original entry:

02:47. Can’t tell if I’m tired or sad.

This is what I fed Sonome:

It’s 2:47 AM.

I’m staring at the ceiling.

The ceiling fan is clicking.

I should sleep but I don’t know if I’m exhausted or if something is wrong that I can’t name.

The rain started ten minutes ago.

I didn’t notice until the AC unit began rattling.

Now the two sounds are fighting and neither is winning.

I’m still awake.

That’s it. Eight lines. One story. Paste and generate.

Step 3: The Factory

I used Sonome for this. Full disclosure: I built it. But I built it because I needed exactly this tool and it didn’t exist.

The process per entry is one step. Paste the story into the text box. Hit generate. Wait about 90 seconds. Get back a song with lyrics, vocals, instrumentation.

I did this in five sessions of ten entries each. Why not all at once? Because listening to ten AI-generated songs about your own private grief is a lot. I needed walks between sessions.

What surprised me:

The AI invented details I hadn’t written. One entry about my childhood kitchen included a line about the clock above the stove that never worked. We did have that clock. I never mentioned it. The AI inferred it from the atmosphere.

It caught tone shifts I didn’t explicitly flag. An entry that started nostalgic and turned bitter became a song that started in a major key and modulated to minor at the second verse. I didn’t ask for that.

Some songs were too accurate. I skipped one about my father’s voicemail because the AI made the vocalist sound too much like him. Technology shouldn’t be able to do that yet.

Step 4: The Curation

50 songs. About three hours of music. I needed to cut this down to something listenable.

I kept the ones where I forgot it was AI.

I kept the ones that made me feel seen.

I kept the ones with a lyric I wanted to quote.

I cut the ones that sounded AI-generic.

I cut the ones that made me feel exposed in a bad way.

I cut the ones where the chorus was trying too hard.

I ended up with 12 songs. A tight album.

The cut songs weren’t failures. They were drafts. I kept them in a folder called B-Sides because some of them might make sense in a year when I’m a different person.

Step 5: The Sequence

This was unexpectedly hard. I had to arrange 12 songs about my own life into an order that didn’t feel narcissistic.

What worked: chronological by emotional season, not by calendar date.

Ceiling Fan. Insomnia, not knowing what’s wrong.

Grandmother’s Kitchen. Nostalgia that turns into grief.

The Coffee Shop Left. Minor rejections, daily alienation.

Two People Crying. Recognizing strangers, recognizing yourself.

The Bench. Waiting for an ending.

Voicemail. The one about my father. I put it in the middle, not the end, because grief doesn’t resolve.

Airport at 5AM. Leaving, not arriving.

New Apartment Echo. Empty rooms, new beginnings that feel like endings.

Rain on AC Unit. The same entry as track one, but three years later, different weather.

Sunday Night Dread. The specific anxiety of Sunday at 11 PM.

The Last Shift. Working a job that doesn’t see you.

Invisible Rain. The subway man, who I realized was also me.

I called the album 3am after the Notes folder.

What It Sounds Like

I’m not going to describe the music in adjectives. Here’s what I’ll say.

I played it for a friend. She listened to the whole thing without speaking. At track 8 she said: “This is you? I didn’t know you had this in you.”

I said: “I didn’t either. I just had the notes.”

The Unexpected Part

I thought this would be a fun experiment. A blog post. A “look what AI can do now” demo.

Instead I found myself listening to my own album on repeat. Not because it’s perfect. Some transitions are jarring. Some lyrics are clunky. But because it’s the first time my unspoken thoughts have had harmony. They’ve had rhythm. They’ve existed outside my head in a form someone else can witness.

That’s not about AI. That’s about being heard. Even if the listener is yourself, three years later, finally understanding what you were trying to say.

How You Can Do This

You don’t need 50 diary entries. You need one moment you can’t forget.

Maybe it’s the text you didn’t send.

Maybe it’s the room you sat in after the phone call.

Maybe it’s the sound that made you realize something was over.

Write it down. Not as a poem. As a scene. What was the light? What was the temperature? What did your hands do?

Four to eight sentences. Each sentence on its own line. That’s all Sonome needs.

Paste it in. Hit generate. One step.

See what comes back.

It might be generic. It might be too accurate. It might be the first time you’ve heard yourself clearly.

Your Turn

I made one song from this album public. It’s the one about the subway. If you want to hear what an unwitnessed breakdown sounds like when it gets instruments:

sonome.online/s/VCO9foY0

The rest stay in my private playlist. Because some 3am thoughts are only for the person who wrote them.

But yours? Your Notes app, your journal pages, your scattered scraps of almost-feelings?

They’re already an album. You just haven’t pressed play yet.

www.sonome.online

Turn your story into a song. No musical skill required. Just something you need to hear outside your own head.

This article is part of the Invisible Rain series. Follow for the next one.

Invisible Rain

Xufuji — Mon, 18 May 2026 13:29:51 GMT

This morning on the subway, a man sat across from me. Thirty-ish, in a crisp suit, staring at his phone. Suddenly, tears started streaming down his face. Not sobbing — just silent, relentless tears. He fought them hard, wiping with the back of his hand, pretending to rub his eyes.

No one looked. Including me. The train pulled into his stop. He stood up, took a deep breath, and walked out with his back perfectly straight, like nothing had happened.

I don’t know what he was carrying. A layoff email. A doctor’s report. The weight of pretending to be okay for too long. All I know is that in this city, there are thousands like him — falling apart where no one can see, then standing up and walking on.

I couldn’t stop thinking about him. So I wrote it down. Then I realized — I didn’t just want to write it. I wanted to hear it. The hum of the train. The silence between stations. The sound of someone holding themselves together.

That’s when I used Sonome.

I pasted the story into a text box. Three minutes later, I had a song. Not a generic background track — something that actually felt like that subway ride. The cold fluorescent lights. The steel wheels. The moment when a stranger’s composure cracks, and nobody says a word.

The lyrics caught me off guard:

“You fall apart in morning trains and walk out clean.”

Exactly. That’s what he did. That’s what we all do.

What Sonome Actually Does

You write a story — a memory, a moment, something you can’t stop thinking about — and Sonome’s AI builds a song around it. It reads the emotional texture, not just the words. It knows the difference between “sad” and “numb,” between “lonely” and “resigned.”

The song it made for this story is called “Thismornin” (the system auto-truncated the title, which somehow fits the theme of things being cut short). Style tag: healing. Length: 3:26. Just long enough for a subway ride between two stations.

Here’s the link if you want to hear what an invisible breakdown sounds like when it becomes music:

sonome.online/s/VCO9foY0

Why This Matters

We live in an era where everyone is performing okay. Social media is a highlight reel. Work Slack is a performance. Even our Spotify playlists are curated to project a mood rather than process one.

But some moments don’t want to be curated. They want to be witnessed. Even if the only witness is an AI that turns your words into a melody you can cry to, alone, on a train.

Sonome doesn’t fix anything. It doesn’t give you advice, or toxic positivity, or a meditation app subscription. It just listens to what you wrote, and gives it back to you as sound. Sometimes that’s enough.

Try It

You have a story like this. Everyone does.

The email you didn’t send. The conversation that ended wrong. The person you passed on the street who looked like someone you used to know. The morning you sat in your car for ten minutes before walking into the office, because you weren’t ready to perform “fine” yet.

Write it down. Paste it into Sonome. See what comes back.

It might surprise you. It might be too accurate. It might be the first time that moment has been truly heard.

www.sonome.online

The man on the subway never knew I saw him. But now there’s a song about him. And maybe, in some small way, that means he was seen after all.