Spotify has bigger plans for the technology behind its new AI DJ feature after seeing positive consumer reaction to it. Launched just before the company’s Stream On event in LA last week, the AI DJ curates a personalized selection of music combined with spoken commentary delivered in a realistic-sounding, AI-generated voice. Under the hood, the feature leverages the latest AI technologies, including large language models and generative voice — all built on top of Spotify’s existing investments in personalization and machine learning.
These new tools don’t necessarily have to be limited to a single feature, Spotify believes, which is why it’s now experimenting with other uses of the technology.
Although the highlight of Spotify’s Stream On event was the mobile app’s overhaul, which now centers on TikTok-like discovery feeds for music, podcasts and audiobooks, the AI DJ is a key part of the streaming service’s new experience. Introduced to Spotify’s premium subscribers in the US and Canada in late February, the DJ aims to get to know users well enough that they can play whatever they want to hear with the press of a button.
With the app’s overhaul, the DJ appears at the top of the screen under the music subfeed for subscribers, serving as both a relaxing way to stream your favorite music and a means of getting free users to upgrade.
To create the commentary that accompanies the music the DJ is streaming, Spotify says it drew on the knowledge base and insights of its own in-house music experts. Using OpenAI’s generative AI technology, the DJ can then scale that commentary to the app’s end users. And unlike ChatGPT, which attempts to answer questions by distilling information from across the wider internet, Spotify’s more limited database of musical knowledge helps ensure that the DJ’s commentary is both relevant and accurate.
The actual music selections the DJ makes draw on Spotify’s existing understanding of a user’s preferences and interests, reflecting what would previously have been programmed into personalized playlists such as Discover Weekly.
The AI DJ’s voice, meanwhile, was generated using technology Spotify acquired from Sonantic last year and is modeled on the voice of Spotify’s Head of Cultural Partnerships, Xavier “X” Jernigan, the host of Spotify’s now-defunct morning show podcast “The Get Up.” The voice sounds remarkably realistic and not robotic at all. (During Spotify’s live event, Jernigan spoke alongside his AI double, and it was hard to tell the two apart. “I can listen to my voice all day,” he joked.)
“The reason it sounds so good – that’s actually the goal of the Sonantic technology, the team that we acquired. It’s about the emotion in the voice,” Spotify’s head of personalization, Ziad Sultan, told TechCrunch after Stream On wrapped. “When you listen to the AI DJ, you will hear where the pause for breathing is. You hear the different intonations. You hear enthusiasm for certain types of genres,” he says.
A natural-sounding AI voice isn’t new, of course – Google wowed the world years ago with Duplex, its own human-sounding AI creation. However, Duplex’s implementation drew criticism because the AI called businesses on behalf of the end user, initially without disclosing that it was not a real person. There shouldn’t be similar concerns here, given that Spotify labels the feature “AI DJ” up front.
To make Spotify’s AI voice sound natural, Jernigan went into the studio to produce high-quality voice recordings while working with voice technology experts. There, he was instructed to read different lines with different emotions, which were then fed into the AI model. Spotify wouldn’t say how long this process took or detail the specifics, noting that the technology is evolving and calling it its “secret sauce.”
“From this high-quality input, which has many different permutations, [Jernigan] doesn’t have to say anything anymore – now it’s purely AI-generated,” says Sultan of the generated voice. Despite this, Jernigan sometimes shows up in Spotify’s writers’ room to provide feedback on how he would read a line, to ensure he continues to have input.
Photo credit: Spotify screenshot
But while the AI DJ is built using a combination of Sonantic and OpenAI technology, Spotify is also investing in internal research to better understand the latest AI and large language models.
“We have a research team working on the latest language models,” Sultan tells TechCrunch. In fact, a few hundred people at the company work on personalization and machine learning. In the case of the AI DJ, the team uses OpenAI’s model, Sultan notes. “But in general we have a large research team that understands all the possibilities of large language models, generative voice and personalization. It’s moving quickly,” he says. “We want to be known for our AI expertise.”
However, Spotify may or may not use its own internal AI technology to power future developments. It may decide that it makes more sense to work with a partner, as is now the case with OpenAI. But it’s too early to tell.
“We publish papers all the time,” says Sultan. “We will invest in the latest technologies – as you can imagine, LLMs are one such technology in this industry. So we will continue to develop the know-how.”
With this foundational technology, Spotify could move into other areas involving AI, LLMs and generative AI. The company won’t yet say which consumer-facing products those could be. (We’ve heard that a ChatGPT-like chatbot is among the options being experimented with, but no launch decision has been made, as it’s one experiment among many.)
“We haven’t announced any specific plans as to when we might expand to new markets, new languages, etc. But it’s a technology that’s a platform. We can build on it, and we hope to share more as development progresses,” says Sultan.
According to Spotify, early consumer feedback on the AI DJ is promising
The company didn’t want to develop a full suite of AI products before it knew how consumers would react to the DJ. Would people want an AI DJ? Would they embrace the feature? None of this was clear. After all, Spotify’s voice assistant (“Hey Spotify”) had been shut down due to lack of adoption.
But there were early signs that the DJ feature could do well. Spotify had tested the product internally among employees before launch, and usage and re-engagement metrics were “very, very good.”
Public reception so far is in line with what Spotify saw internally, Sultan says. This means there is potential to develop future products on the same underlying foundation.
“People spend hours a day with this product… it helps them choose, it helps them discover, it tells them the next music to listen to and why… so the reaction – if you check different social media, you’ll see that it’s very positive and emotional,” says Sultan.
Additionally, Spotify shared that users who tuned in spent 25% of that day’s listening time with the DJ, and that more than half of first-time listeners use the feature again the very next day. These are early metrics, however, as the feature hasn’t fully rolled out across the US and Canada yet. But they show promise, the company believes.
“I think it’s an amazing step in building a relationship between truly valuable products and users,” says Sultan. However, he warns that the challenge will be “finding the right application and then building it correctly”.
“In this case, we said this is an AI DJ for music. That’s why we created the writers’ room. We put it in the hands of users so that it does exactly what it’s intended to do. It works great. But it’s definitely fun to dream about what else we could do and how soon we could do it,” he adds.