The Concept
As a musician, I view a playlist not just as a list of files, but as a narrative arc. As a data scientist, I see it as a time-series of feature vectors.
Traditional recommendation systems (like Spotify's algorithms) act as archivists: they dig through a finite catalog to find an existing song that mathematically resembles what you just heard. But what if the perfect next song doesn't exist yet?
GenRec4Music explores a paradigm shift from retrieval to creation. Instead of asking "What should I play next?", it asks: "What should I compose next?" It leverages generative AI (Suno) not to replace the artist, but to create a personalized, ephemeral soundscape tailored to the user's immediate listening context.
The Architecture: Translating "Vibe" into Code
The core challenge of this project isn't just generating music—it's translation. How do we convert a user's abstract listening history into a precise creative brief for an AI?
1. The Input: Listening as a Signal
We start with the user's recent history (e.g., the last 5 tracks). But we don't just look at metadata like "Genre: Rock." We need to understand the DNA of the sound.
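A minimal sketch of how a listening session might be represented, assuming a hypothetical schema (field names such as `track_id`, `valence`, and `mfcc_mean` are illustrative placeholders); in practice these values come from the Music4All-Onion dataset described in the next step.

```python
from dataclasses import dataclass, field

@dataclass
class SessionTrack:
    """One entry in the user's recent listening history (hypothetical schema)."""
    track_id: str            # dataset track identifier
    genre: str               # coarse metadata tag, e.g. "Shoegaze"
    valence: float           # 0.0 (negative) .. 1.0 (positive)
    arousal: float           # 0.0 (calm) .. 1.0 (energetic)
    tempo_bpm: float
    mfcc_mean: list[float] = field(default_factory=list)  # timbre summary

# The last N tracks form the signal that gets translated into a creative brief.
session = [
    SessionTrack("m4a_001", "Shoegaze", 0.35, 0.55, 92.0),
    SessionTrack("m4a_002", "Post-Rock", 0.30, 0.45, 84.0),
]
```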
2. The Decoder: Music4All-Onion Dataset
To bridge the gap between raw audio and text prompts, I use the Music4All-Onion dataset. This acts as the system's "Rosetta Stone": it lets me map a track's ID to deep, pre-computed features (a minimal decoding sketch follows this list):
- Acoustic Features: Mapping low-level signals (MFCC, Spectral Contrast) to textural descriptors (e.g., "lo-fi," "punchy," "distorted").
- Affective Features: Using Valence/Arousal scores to plot the emotional trajectory of the session (e.g., detecting a shift from "Energetic" to "Melancholic").
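A minimal sketch of this decoding step. The threshold on spectral contrast and the valence/arousal quadrant labels are illustrative assumptions, not values taken from the dataset; the point is the mapping from numeric features to vocabulary the prompt can use.

```python
def describe_track(features: dict) -> dict:
    """Map low-level features to textual descriptors (hypothetical thresholds)."""
    texture = []
    # Spectral contrast measures peak-vs-valley separation in the spectrum:
    # low values suggest a dense, noise-like "wall of sound"; high values read as punchy.
    if features["spectral_contrast_mean"] < 18.0:   # illustrative cutoff
        texture.append("lo-fi")
    else:
        texture.append("punchy")

    # Valence/arousal quadrant -> coarse mood label for the session trajectory.
    v, a = features["valence"], features["arousal"]
    if v >= 0.5 and a >= 0.5:
        mood = "Energetic"
    elif v >= 0.5:
        mood = "Serene"
    elif a >= 0.5:
        mood = "Tense"
    else:
        mood = "Melancholic"

    return {"texture": texture, "mood": mood}

print(describe_track({"spectral_contrast_mean": 14.2, "valence": 0.3, "arousal": 0.4}))
# -> {'texture': ['lo-fi'], 'mood': 'Melancholic'}
```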
3. The Prompt Engineer (The Translation Layer)
A custom Python pipeline aggregates these features into a "Vibe Vector," which is then procedurally translated into a rich semantic prompt (a sketch of this translation follows the example below).
- Input: A user listening to Shoegaze and Post-Rock.
- System Translation: "Genre: Dream Pop. Atmosphere: Ethereal, Wall of Sound. Instruments: Heavily reverbed guitars, buried vocals. Tempo: 85 BPM. Mood: Introspective."
- Action: This prompt is fed into the Suno API.
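A minimal sketch of the translation layer, assuming a hypothetical `vibe_vector` dict produced by the aggregation step; the final hand-off to the Suno API depends on the client and endpoint used, so it is left as a comment rather than a concrete call.

```python
def vibe_to_prompt(vibe: dict) -> str:
    """Render the aggregated Vibe Vector as a structured creative brief (hypothetical fields)."""
    return (
        f"Genre: {vibe['genre']}. "
        f"Atmosphere: {', '.join(vibe['atmosphere'])}. "
        f"Instruments: {', '.join(vibe['instruments'])}. "
        f"Tempo: {round(vibe['tempo_bpm'])} BPM. "
        f"Mood: {vibe['mood']}."
    )

vibe_vector = {
    "genre": "Dream Pop",
    "atmosphere": ["Ethereal", "Wall of Sound"],
    "instruments": ["Heavily reverbed guitars", "buried vocals"],
    "tempo_bpm": 85,
    "mood": "Introspective",
}

prompt = vibe_to_prompt(vibe_vector)
# The resulting brief is what gets submitted to the Suno API client (call not shown here).
print(prompt)
```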
4. The Output: Generative Audio
The system generates a brand-new, 30-to-60-second track that continues the sonic thread of the user's session—a song that never existed until that specific moment.
The Evaluation: Measuring the Ghost in the Machine
How do you validate a song that has no "ground truth"? I employ a dual-metric approach to ensure the system isn't just hallucinating noise:
- Sonic Fidelity (The Vibe Check): I use CLAP (Contrastive Language-Audio Pretraining) embeddings to calculate the cosine similarity between the generated track and the user's historical preferences. This measures stylistic coherence (see the sketch after this list).
- Lyrical/Thematic Alignment: An LLM-based evaluator analyzes the generated lyrics (if any) to ensure they match the sentiment extracted from the user's history (e.g., ensuring a "Sad" session doesn't generate "Happy" lyrics).
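A minimal sketch of the Sonic Fidelity metric, assuming CLAP audio embeddings have already been computed for both the session history and the generated clip (the random arrays below are placeholders for real embeddings); only the similarity aggregation is shown.

```python
import numpy as np

def sonic_fidelity(history_embeds: np.ndarray, generated_embed: np.ndarray) -> float:
    """Cosine similarity between the generated track's CLAP embedding and the
    centroid of the session's CLAP embeddings (history_embeds has shape [n_tracks, dim])."""
    centroid = history_embeds.mean(axis=0)
    num = float(np.dot(centroid, generated_embed))
    denom = float(np.linalg.norm(centroid) * np.linalg.norm(generated_embed)) + 1e-9
    return num / denom

# Placeholder arrays standing in for 512-dimensional CLAP embeddings.
rng = np.random.default_rng(0)
history = rng.normal(size=(5, 512))
generated = rng.normal(size=512)
print(f"stylistic coherence: {sonic_fidelity(history, generated):.3f}")
```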
Why This Matters
We are moving from an era of Consumption to an era of Co-Creation.
This project is my attempt to merge the two sides of my identity. It treats music data not as cold statistics, but as a blueprint for expression. It imagines a future where the algorithm is no longer a gatekeeper deciding what you hear, but a collaborator helping you articulate a feeling through sound.
(Links to GitHub repo & Demo - To Be Continued)
