I Asked Generative AI Music Platform Suno to Write a Podcast Theme…Things Got Weird
AI, Music, and Coherence: What Are We Trying To Accomplish?
In my last piece, “The Transparent Artist: Music in the Age of AI Disassembly,” I explored the threats that AI tools and generative AI music pose not only to artists but to society. A reader left a comment asking whether I had tried Suno.ai, a generative AI music engine. I had not, so I headed over to Suno to check it out.
From Suno.ai’s about us page:
Suno is building a future where anyone can make great music. Whether you’re a shower singer or a charting artist, we break barriers between you and the song you dream of making. No instrument needed, just imagination. From your mind to music.
I am inclined to write a 3,000 word think piece just on that, but I’ll spare you for now and focus on my original intent: asking Suno to make me a theme song for a hypothetical podcast — a technology podcast by two Silicon Valley dudes in their 20s — and analyze the result.
Before we dive in: I read Suno’s terms of service and created a pro account so that I could share this song with you as we explore its weirdness.
Inspired by an episode of the podcast Candy Ears titled “AI Experiment #4: Theme Songs with Charlie Harding,” where they created hypothetical podcasts and had Suno generate theme songs, I decided to conduct my own experiment. In the Candy Ears episode, Charlie Harding provided a critique of each AI-generated song. Following a similar framework, I created a hypothetical podcast and asked Suno to compose a theme song. Here’s what I came up with:
Unplugged Minds
The prompt: A 30 second theme song for a technology podcast by two Silicon Valley dudes in their 20's.
Lyrics:
[Verse]
We’re the kings of code, the masters of bytes
In the heart of Silicon Valley, we reach new heights (oh yeah)
Tech is our gospel, innovation our muse
We’re the dynamic duo, ready to make the news (oooh)
[Verse 2]
From gadgets to AI, we explore it all
Unraveling the mysteries, breaking down the walls (yeah)
Join us on this journey, as we unravel the tech
We’re the voices in your ear, come on, hit play and check (ooh-yeah)
[Chorus]
Welcome to the Unplugged Minds (ooh-yeah)
Where ideas collide, sparks ignite (oh-oh-oh)
We’re the Silicon Valley dudes, here to guide (ooh yeah)
Through the world of tech, side by side (ooh-yeah)
Ooh yeah…
To begin my analysis I gave the song a listen from start to end.
On my first listen I thought, huh, this melodic phrasing is kind of different and interesting, and the form seems a little non-traditional, but it seems to be the style of a dance-pop tune. So, I flipped over a research paper I was reading and decided to sketch it out.
I used to be a professional drummer, and this is the type of chart I would make while sitting on the couch in a studio, listening to the demo before heading into the tracking room to lay down a take. It was while making this chart that I realized something was a little off; a tad bit weird, you could say.
Weirdness #1
The song, firmly rooted in C-minor, begins with a two-beat melodic pickup leading into VERSE 1. This pickup consists of four descending 8th-notes, starting on an Eb and then landing on a G, which is the 5th of the opening C-minor chord. The basic chord progression for the song follows a i-VI-vii-i pattern.
As I sketched the first VERSE according to the provided lyrics, I noticed an awkwardness in its structure. Suno did not use the very common and standard 8- or 16-measure VERSE form. The 1st VERSE is an awkward 12 measures long. To my ear, the final 4 measures of the first VERSE sound like a tag you might find at the end of a BRIDGE. However, we’re not at the BRIDGE yet; in fact, we’ll never reach a BRIDGE in this song. We’re still in the 1st VERSE.
The 12-measure VERSE gave me pause. I had to think about this weird construction. After a few more listens through the whole piece, I realized that Suno most likely confused itself with the initial 8th-note melodic descending pickup before the start of the first VERSE. It appears to me that Suno conceptualized those first four 8th-notes as if they were four whole measures, making the awkward 12-measure 1st-VERSE a more standard 16-measure VERSE.
Could this be the case?
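The arithmetic behind that hunch can be sketched in a few lines. This is only an illustration of my hypothesis, assuming 4/4 time (my reading of the four-on-the-floor kick); the function names are made up for this example and say nothing about how Suno actually works internally.

```python
BEATS_PER_MEASURE = 4   # assumption: the song is in 4/4
EIGHTHS_PER_BEAT = 2    # two 8th-notes per quarter-note beat


def pickup_in_measures(eighth_notes: int) -> float:
    """How much of a measure a pickup of N 8th-notes really occupies."""
    return eighth_notes / (EIGHTHS_PER_BEAT * BEATS_PER_MEASURE)


def verse_length_if_pickup_miscounted(verse_measures: int, pickup_notes: int) -> int:
    """Hypothesis: each pickup note is treated as if it were a whole measure."""
    return verse_measures + pickup_notes


print(pickup_in_measures(4))                     # 0.5 (half a measure, i.e. two beats)
print(verse_length_if_pickup_miscounted(12, 4))  # 16 (a standard verse length)
```

Counted properly, the pickup is half a measure. Counted as four measures, it turns the odd 12-measure verse into a conventional 16.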
In a typical song, you have an INTRO, followed by a VERSE, then a second VERSE, a (PRE)CHORUS, and finally, a CHORUS with variations added along the way such as a BRIDGE, a SOLO section, an OUTRO, etc. In our culture of immediacy, the (PRE)CHORUS is often skipped, and the song goes straight into the CHORUS. More on that evolution here.
To recap: the first part of a typical song structure is:
INTRO-V1-V2-(PRE)CHORUS-CHORUS…
Now, let’s take a closer look at how Suno deviates from this structure in the first verse:
[Verse]
We’re the kings of code, the masters of bytes
In the heart of Silicon Valley, we reach new heights (oh yeah)
Tech is our gospel, innovation our muse
We’re the dynamic duo, ready to make the news (oooh)
As mentioned above, Suno appears confused because it takes what should be the 1st stanza of the VERSE and jams “We’re the kings of” into the 8th-note melodic pick-up. But Suno further complicates this because at the start of measure 9, it decides to drop a four-on-the-floor kick.
Adding the drums here is a big mistake as it makes it feel like we are starting VERSE-2 but we are not! Measure 9 is the continuation of the first VERSE!
Adding one more level of analysis to this, in measure 3 the Ab-major chord does not last all four beats. The Eb major of the fourth measure is anticipated, which is not weird in itself; however, that anticipation never happens again anywhere in the song. This lends support to my claim that Suno is confused by its use of the 8th-note pickup into the first measure and is treating these first four measures as an intro rather than part of the 1st VERSE.
This is a total mess! It’s not a delightful kind of weird, but rather an awkward and disjointed weirdness.
Nine measures into this song, just 14 seconds into the 120-second piece, Suno has already created multiple incoherences.
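For anyone who wants to check the timing, the elapsed seconds follow directly from the tempo, which I clock at 148 bpm later in this piece. Here is a minimal sketch, assuming 4/4 time; the function name is just for illustration.

```python
def measures_to_seconds(measures: int, bpm: float, beats_per_measure: int = 4) -> float:
    """Convert a count of measures to elapsed seconds at a steady tempo.

    beats_per_measure defaults to 4, which assumes 4/4 time.
    """
    total_beats = measures * beats_per_measure
    return total_beats * 60.0 / bpm


elapsed = measures_to_seconds(9, 148)
print(round(elapsed, 1))  # 14.6
```

Nine measures at 148 bpm comes out to roughly 14 and a half seconds, which matches what I hear on the recording.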
Weirdness #2
I’ve already mentioned that Suno goes sideways in measure 9. The introduction of the four-on-the-floor beat tells most human listeners that we have entered VERSE 2. However, the lyric ‘We’re the dynamic duo, ready to make the news (oooh)’ is the last line of the 1st VERSE!
When we finally get to VERSE 2, another oddity in Suno’s composition is revealed. The melody line shifts up a major 3rd, with the range expanding from Bb below middle-C to the Eb, to now the Eb to G above middle-C. But the melody reaches all the way up to the Bb an octave higher than we saw in the 1st VERSE. These changes in melody and range scream CHORUS to me.
Here is the song and lyrics again for a refresher. I added the main anchor notes for the melody in VERSE 2 for reference.
[Verse 2]
From (G) gadgets to AI (Eb), we (F) explore it all (Bb — octave)
Un(F)raveling(G) the mysteries (Eb), breaking (F) down the walls (Eb) (yeah)
Join us on this journey, as we unravel the tech
We’re the voices in your ear, come on, hit play and check (ooh-yeah)
I gotta say, this is just weird, man. The combination of the melody, range, and drum groove in the context of the song thus far — <shakes head> — it’s just terribly wrong. This unexpected shift is yet another example of Suno’s struggle to create a coherent song structure.
Weirdness #3
We have now finally left VERSE world and have entered the CHORUS. But something weird happens again, which should not surprise you at this point. The drums vanish from the CHORUS!
WHAT?!
OK, but there is more. Suno decides to repeat the intro 8th-note melodic pickup, with a slight alteration in the melody, like this: F-G-F-Eb-D-D-D, where that D is now acting as the 5th of the G-minor chord (see chart above for reference).
[Chorus]
Wel(F)come(G) to(F) the(Eb) Un(D)plugged(D) Minds (D) (ooh-yeah)
Where ideas collide, sparks ignite (oh-oh-oh)
We’re the Silicon Valley dudes, here to guide (ooh yeah)
Through the world of tech, side by side (ooh-yeah)
But those vanishing drums! The feel, the groove; what has happened here?! It’s just all wrong.
My body, my mind, my entire being — the 25 years of professional studio and touring work, the ~40 years of intensive listening to music, all scream at me: this is a (PRE)CHORUS! Only a disembodied entity that has never experienced the movement of a body to the rhythm of a beat could make a decision like this.
If you scroll up to the chart I made, you can see that I wrote Half-X(?) at the start of the CHORUS, meaning that it feels like we are going into a half-time feel for the chorus, which is really counterintuitive and not typical of the vast majority of pop songs. Going from an at-tempo feel with a backbeat on 2 and 4 right into half-time with no drums for the CHORUS is not something you hear often. I am sure it exists, though.
I searched Google, YouTube, and then a variety of AI platforms, asking if they could identify a pop song that has this sort of song construction. Every example they provided, once I listened, did not in fact contain this type of song construction. After checking the 5th example given by one of these generative AI chatbots, I stopped, because each one was wrong.
Do you know of a pop song that follows this formula that Suno used? If so, drop it in the comments as I want to give it a listen!
Weirdness #3.5: Lyrical Oddities
Let’s take a brief aside and look at the lyrics really quick:
Lyrics:
[Verse]
We’re the kings of code, the masters of bytes
In the heart of Silicon Valley, we reach new heights (oh yeah)
Tech is our gospel, innovation our muse
We’re the dynamic duo, ready to make the news (oooh)
[Verse 2]
From gadgets to AI, we explore it all
Unraveling the mysteries, breaking down the walls (yeah)
Join us on this journey, as we unravel the tech
We’re the voices in your ear, come on, hit play and check (ooh-yeah)
[Chorus]
Welcome to the Unplugged Minds (ooh-yeah)
Where ideas collide, sparks ignite (oh-oh-oh)
We’re the Silicon Valley dudes, here to guide (ooh yeah)
Through the world of tech, side by side (ooh-yeah)
So it didn’t hit me until now, but “Unplugged Minds” seems like the opposite of what we want from two dudes bringing you a podcast on the plugged-in world of technology and AI!
Also, “hit play and check;” ‘check’ what?!
The rhyme scheme holds in every stanza except for “Minds” and “ignite” in the CHORUS. That is the first and only time the rhyme scheme changes. Is that intentional? Is that good, bad, weird? I am nonplussed by it.
Overall, these lyrics are cheesy, which can be an aesthetically appropriate choice if that is what the podcast hosts are going for: a tongue-in-cheek expression of their tech podcast. But as a composer, I would initially go with a more literal, serious interpretation, or a playful one, rather than tongue-in-cheek humor.
These lyrics contribute to the overall weirdness of the song and further highlight Suno’s struggle to create a coherent and intuitive song.
Weirdness #4
OK, let’s talk about this INSTRUMENTAL-OUTRO.
Here is the song one more time so you don’t have to scroll up to listen again. The INSTRUMENTAL-OUTRO starts at 47 seconds:
Now that we’ve refreshed our memory of the outro, let’s examine why it doesn’t quite fit with the rest of the song.
First of all, I asked for a theme song, and making a third of the song an instrumental outro/loop is a poor compositional choice. Additionally, there is a style mismatch.
If we ignore the particulars of the song and focus on the overall vibe, we can broadly classify this song as dance pop (Suno classifies it as electronic upbeat pop). However, the instrumental section complicates this, as it features a repetitive, umpa umpa umpa electro-esque vibe that feels out of place in this type of dance-pop song.
On a style/vibe level, this instrumental outro does not match the rest of the piece. It feels like an afterthought rather than a coherent musical idea, making it a bizarre and unsatisfying conclusion to the song.
Weirdness #5
OK, this brings us to the final weirdness I am calling out. That is the tempo!
The tempo is 148 bpm. Those of you who are DJs or work in composing dance music should perk up at this tempo choice. 148 bpm combined with the four-on-the-floor style is unusually fast for a dance-pop song. Typically, dance-pop songs range from 115 to 130 bpm, with some exceptions reaching up to 140 bpm.
To check myself on this, I searched for dance-pop songs at 148 bpm on various music platforms such as Beatport, and I couldn’t find any. In fact, I was not able to find any dance-pop song at this tempo, because it is weird.
I even thought, maybe, just maybe you can argue this song falls into the style of dance known as hardstyle. The problems with associating this song with hardstyle stem from the fact that hardstyle starts at 150 bpm and above, and the actual timbres of the synths used are not the ones you hear in a dance-pop song like this.
So, overall, a very weird tempo choice. It is not something a human would typically do when composing in this style!
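To make the gap concrete, here is a small sketch that sorts a tempo into the rough bands I described above. The cutoffs are my own rules of thumb from this section, not an industry standard, and the function name is only for illustration.

```python
# Rule-of-thumb tempo bands from the discussion above (not an official standard).
DANCE_POP_TYPICAL = (115, 130)   # typical dance-pop range, in bpm
DANCE_POP_EXCEPTION_MAX = 140    # occasional faster exceptions
HARDSTYLE_MIN = 150              # hardstyle starts here and goes up


def describe_tempo(bpm: float) -> str:
    """Place a tempo into one of the rough bands, or report that it fits none."""
    low, high = DANCE_POP_TYPICAL
    if low <= bpm <= high:
        return "typical dance-pop"
    if high < bpm <= DANCE_POP_EXCEPTION_MAX:
        return "fast dance-pop exception"
    if bpm >= HARDSTYLE_MIN:
        return "hardstyle territory"
    return "outside all of these ranges"


print(describe_tempo(148))  # outside all of these ranges
```

At 148 bpm the song lands in the no-man's-land between the fastest dance-pop exceptions and the slowest hardstyle.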
Closing thoughts
The main takeaway from this experience is that generative AI text-to-music engines are not there yet. Once you start drilling down into the song construction, you immediately begin to unearth the incoherencies in the songs.
This reminds me of a similar feeling from my teenage years. You remember that period, when you think you know everything, and you are like, ‘this is easy, I can do that, what is the big deal?!’ Then you are tasked with actually doing that, and as soon as you are faced with doing it for real, you get that sinking feeling in your stomach. That feeling, it is a good feeling. That is what makes the world a better place. The unfortunate situation is, generative AI does not have feelings.