Don’t blame Zack Zalon for all of the job losses at iHeartMedia earlier this year.
Fingers began pointing Zalon’s way after the radio broadcaster implemented a technological shift to artificial intelligence to help its radio station clusters operate more efficiently. Subsequently, a large number of iHeart employees were let go.
Zalon is CEO and co-founder of Super Hi-Fi, an AI company that designs digital music solutions for the iHeartRadio streaming platform. That relationship drew scrutiny from some radio industry observers who speculated the broadcast giant’s infrastructure overhaul included the use of Super Hi-Fi’s MagicStitch technology, an “audio stitching” program capable of creating “human-like” segues between online music tracks in playlists.
“We are dealing only with the iHeartRadio streaming people,” Zalon said. “We are working on the innovation side, which is streaming-based. Not terrestrial radio.”
iHeartMedia’s massive reorganization included the creation of AI-enabled Centers of Excellence, according to a company press release at the time. The broadcaster pointed to the improvement of its technology backbone, in addition to strategic technology and platform acquisitions like Jelli, a programmatic ad platform; RadioJar, a cloud audio playout company; and Stuff Media, a podcasting firm.
Super Hi-Fi was not mentioned by name in the iHeartMedia announcement.
Bridging a Gap
“We are working with iHeart on a very deep level to bridge that gap between broadcast and digital. There is a lot of roadmap stuff to improve the audio experience,” Zalon told Radio World.
The radio business “seems like an underdog right now,” he said, “when actually radio is still the number one form of music consumption in America. Radio has a lot of great experiences and resources.”
However, it seems “broadcasters just don’t know how to view streaming and whether it is a threat or not. And streaming media people think radio is old technology and not all that valuable,” he said.
Zalon says broadcasters and media companies have been reaching out to him during the COVID-19 pandemic in search of opportunities to add efficiencies to technical operations via Super Hi-Fi’s technology platform.
“Broadcasters are searching for a way forward that brings together broadcast and digital and drives revenue and loyalty. Broadcasters have been talking to us about inserting our technology into the broadcast stack for the purposes of efficiency. And when I say efficiency I mean using the resources they could free up for the artistry of radio. Focusing on the curation, the production and the human voice, which makes radio so effective,” Zalon said.
Broadcasters are realizing, Zalon said, that some broadcast technology could be more efficient if AI assisted with things like placement decisions in their automation.
“Programmers are just lining things up in automation systems really, and that isn’t necessary anymore when AI can do it for you automatically. AI can make a lot of presentation decisions,” Zalon said.
“But AI isn’t a job killer. There hasn’t been a single service we have integrated into, iHeart included, that hasn’t utilized more human resources after figuring this out. When streaming audio works you need more people to curate music. You need more people to work with advertisers to inject commercials in the system. And produce those commercials.”
AI is not a replacement for people yet, he said, but an “enabler of human capabilities that has never existed before.” But Zalon does envision a day when computer-generated voices sound as real as a human voice and pop up on iHeartRadio streams.
Where the Energy Is
Zalon said Super Hi-Fi’s primary focus remains enabling new audio streaming experiences and bridging what he sees as the still-separate “silos of broadcast radio and digital.”
“We want to enhance experiences by taking the concepts of broadcast and engineering solutions. Streaming audio is where it’s going. Streaming media is fantastic. The sound quality is incredible. The personalization options are amazing. That is where all the energy is moving toward. We are interested in bridging the silos. Radio services will ultimately all be streaming when 5G is in the car.
“And when 5G is in the car what will be the point of connecting to a broadcast tower? Streaming is a technology not a technique. As technology evolves we think the technique should evolve as well. I think broadcasters are beginning to recognize that,” he said.
Zalon’s background is steeped in digital music experience, including building one of the earliest consumer digital music platforms, Radio Free Virgin, which was part of Richard Branson’s Virgin Group. At other points he has helped launch and design digital music services for CBS Radio, Sony Music, AOL Radio, Muve Music and Yahoo Launchcast.
Zalon handles the strategic direction of Super Hi-Fi, which he launched in 2018 with co-founder and Chief Technology Officer Brendon Cassidy. The AI company, based in Los Angeles, works with a variety of companies and has about 35 employees.
Digital music streaming’s lack of flow and production quality has always been an issue, Zalon said, with too many dead gaps in the music and a lack of emotion.
Super Hi-Fi and iHeartRadio announced their partnership in 2018 with a goal of creating intelligent audio transitions in the iHeartRadio app. MagicStitch is also deployed by Peloton and the recently launched Sonos Radio. Super Hi-Fi also just announced a partnership with Octave Group, which provides retail music entertainment in locations like Starbucks.
The patented MagicStitch system adds things like transitions, sonic leveling and gapless playback to the iHeartRadio digital stream, Zalon said.
“Radio is our inspiration. And I think one day radio owners will realize they hold the keys to digital listening experiences. They just haven’t activated them correctly. They have not seen them as assets but instead as liabilities. We see that totally the other way around,” Zalon said.
“Radio broadcasters have the tools and experience to create these incredible professional-sounding broadcast streams to make the digital music experience exciting. They have the tools to make the digital media experience stickier and more valuable than what is in the marketplace right now.”
Personalized and Scalable
Super Hi-Fi has developed a technology that can deliver that vision, Zalon said, via MagicStitch and its ability to be more than just a playlist with long gaps of silence.
The AI system consists of a layer of cloud services, APIs and components/reference implementations for major mobile and desktop environments, according to a press release. The results are personalized and scalable listening experiences (see sidebar at end of this article).
MagicStitch, to borrow a broadcast term, takes the dead air out of audio streaming, Zalon said during a recent demonstration of the digital platform. The technology “stitches” together transitions between songs as if done by a real human DJ.
“Our research is focused on understanding audio content to the same depth as a human. When we were building CBS Radio’s digital platform, we all thought the gaps in the music were terrible. Pandora was around at the time. They all sounded the same if you close your eyes. We thought what if we were to smartly use radio techniques to stitch songs together to improve the experience. Then we started thinking about segues and how many different combinations there could be and how to figure that out algorithmically.
“Well, we soon figured out it wasn’t possible at that time. The number of segue calculations was literally in the trillions. So we went on building these music services but they still didn’t sound quite right.”
Zalon said he and Cassidy realized it was impossible to write enough algorithms to solve the segue problem and instead began to focus on training artificial intelligence to do what radio DJs do. “For the AI to be smart enough to have the dexterity of a trained human DJ,” he said.
“Our belief is that it’s the techniques of radio, the music transitions, the voice branding and all of those other elements of radio that makes the digital product stand out.”
Music services like Spotify and Apple Music use a “cross-fade” function to help cut down on the gaps between tracks, Zalon says, but the problem is the platforms still don’t recognize the subtleness of the human touch.
“It’s not all mechanical. MagicStitch in real time calculates what it thinks is the perfect segue for any two tracks you might play back to back in a playlist. And uniquely for those two songs. MagicStitch reaches back to our cloud server and gets back the proper instruction and then aligns it down to the correct thousandth of a second. It considers rhythmic elements and lets the previous song play out the right way. Whatever it takes to make it sound radio worthy,” Zalon said.
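To make the contrast with a fixed cross-fade concrete, here is a minimal Python sketch of a per-pair segue lookup. It is an illustration only: the track names, the overlap values, the 5-second default and the function names are all assumptions made up for this example, and the article does not describe how MagicStitch actually computes or stores its instructions.

```python
from typing import Dict, Tuple

# A fixed cross-fade treats every pair of songs the same way (assumed value).
FIXED_CROSSFADE_MS = 5000

# Hypothetical per-pair overlaps in milliseconds, standing in for the
# instructions a player might fetch from a remote service.
PAIR_OVERLAP_MS: Dict[Tuple[str, str], int] = {
    ("quiet_ballad", "quiet_piano"): 1200,
    ("quiet_ballad", "dance_track"): 350,
}


def overlap_for(outgoing: str, incoming: str) -> int:
    """Return the pair-specific overlap, or the fixed cross-fade if none is known."""
    return PAIR_OVERLAP_MS.get((outgoing, incoming), FIXED_CROSSFADE_MS)


def next_track_start_ms(outgoing_duration_ms: int, outgoing: str, incoming: str) -> int:
    """Return the offset into the outgoing track at which the next track starts."""
    # Never overlap by more than the outgoing track's own length.
    overlap = min(overlap_for(outgoing, incoming), outgoing_duration_ms)
    return outgoing_duration_ms - overlap


# A 3:30 track (210,000 ms) followed by two different songs.
print(next_track_start_ms(210_000, "quiet_ballad", "dance_track"))    # 209650
print(next_track_start_ms(210_000, "quiet_ballad", "unknown_track"))  # 205000
```

The same outgoing song gets a different start point for each following song, which is the behavior Zalon describes. The fallback branch behaves like an ordinary cross-fade.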
However, MagicStitch does more than segues, Zalon says; it can also brand the digital stream much like radio does with the human voice.
“Music transition is the core of what we do. The next step was training MagicStitch to understand branding elements and the human voice with that same level of depth. It uses radio techniques like interview snippets that don’t step all over the music in an inappropriate way to build a personality into a streaming service,” he said. “Now we can assign the branding component based on listener preferences and interject them like broadcast radio does.”
MagicStitch can layer multiple elements into the stream, such as audio liners, commercials and branding messages, he said.
“It’s capable of delivering a seamless layered stream experience to a smart speaker,” Zalon said.
And the AI system gets smarter each time it performs a song segue, Zalon said. “The platform has a feedback loop so it is digesting a lot of machine learning advances all the time and understanding content better. So as the data grows and the more calculations you add, the more MagicStitch can represent in creative ways,” he said. “It essentially gets smarter with each audio transition.”
MagicStitch currently completes a billion streaming song transitions across multiple services each month, according to Super Hi-Fi data.
More From Zack Zalon
We asked Zalon further questions about how MagicStitch software works and about the company’s technology in general.
Radio World: What physical signal parameters are being measured and assessed about a particular music track to define the way that Super Hi-Fi handles that track?
Zack Zalon: For starters, I’ll share that we are gathering a tremendous amount of data on the audio files. Yes, we are collecting countless features, but we are also gathering some very unique attributes from our machine learning services, as well as from over 1 billion data points from commercial usage that we collect every month.
The amount of data that we collect on each file is actually larger (in storage terms) than the source file itself. There are literally millions of data points that we collect, and then the trick is to train the AI to actually use these data points.
RW: Exactly how is the “human touch” of a segue developed for each track?
Zalon: For us, the key is not data per se, it is the idea of context. Yes, we need data, and a lot of it. But the data for us is a means to an end. What we’re working toward is a perfect contextual understanding of the audio file so we can automatically make really artful, human-like decisions about how to handle that content.
How does a quiet song transition into another quiet song? How does that same song properly transition into a higher-energy song? Does having a female singer make a difference, does it change the way a listener will react to a specific song transition? Should it be different if there is an advertisement that comes afterward? Should there be talking over the song?
These are the questions that we have been tackling, and then working backward to modify the service to ensure that it understands — comprehends — the content with enough depth to be able to make the right choices, all day every day.
RW: Does the Super Hi-Fi algorithm analyze different segments of an audio track differently?
Zalon: More specifically, we are collecting all of the data points you asked about earlier, though we use LUFS as a measure, not LKFS. But we also have designed and developed dozens of proprietary analysis tools and associated proprietary data points to measure. Existing tools weren’t giving us the broad-based view of the content that we needed for the AI to work properly. Please note that we aren’t just looking at music files, we are also analyzing spoken word, sound effects, advertising (of numerous types), sonic logos, etc.
So using traditional music analysis techniques wouldn’t be sufficient. Also to be specific, we analyze the entire file, not just any one section, and we analyze the difference of each data point so we can build a richer base of understanding regarding that file, how it changes over time, and how it relates to the other files that we may be stitching around it.
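Zalon's mention of LUFS points to one calculation that is standard in loudness work: the gain offset needed to bring a file to a target loudness. The Python sketch below shows that arithmetic. The -14 LUFS default target and the function names are assumptions chosen for illustration, and nothing here is taken from Super Hi-Fi's proprietary analysis tools.

```python
def leveling_gain_db(measured_lufs: float, target_lufs: float = -14.0) -> float:
    """Gain in dB that moves a file from its measured loudness to the target.

    A difference between two LUFS values is a difference in decibels,
    so the required gain is simply target minus measured.
    """
    return target_lufs - measured_lufs


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)


# A loud master measured at -9.5 LUFS needs to come down by 4.5 dB
# to sit at the assumed -14 LUFS target.
gain_db = leveling_gain_db(-9.5)
print(gain_db)  # -4.5
multiplier = db_to_linear(gain_db)  # amplitude scale factor below 1.0
```

In a setup like the one Zalon describes, a value such as `gain_db` would travel as an instruction to the player rather than being baked into the audio file.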
RW: Does the AI system do any audio correction or modification of the tracks?
Zalon: We do not do any audio correction or modification. In fact, we don’t actually deliver any files. Our customers deliver the files, what we do is to send them a set of presentation instructions in real time that they use to create their experiences. Everything for us is about placement, as though it is being mixed by a human DJ at a broadcast radio station. But it is actually AI making all of the calculations and sending those to our customers as they are requested.
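A service that sends placement instructions rather than audio can be pictured with a small Python sketch. Each instruction says only how far an item should overlap the item before it, and the customer's player turns that into start times. The `Placement` fields, the content names and `build_timeline` are hypothetical, invented for this example; the article does not document the real instruction format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Placement:
    """Hypothetical placement instruction; no audio data is carried."""
    content_id: str
    duration_ms: int
    overlap_ms: int  # how long before the previous item ends this one starts


def build_timeline(placements: List[Placement]) -> List[Tuple[str, int]]:
    """Turn placement instructions into (content_id, start_ms) pairs."""
    timeline: List[Tuple[str, int]] = []
    previous_end_ms = 0
    for item in placements:
        # Start early by the requested overlap, but never before time zero.
        start_ms = max(0, previous_end_ms - item.overlap_ms)
        timeline.append((item.content_id, start_ms))
        previous_end_ms = start_ms + item.duration_ms
    return timeline


# A song, a short voice liner laid over its ending, then the next song.
instructions = [
    Placement("song_a", duration_ms=200_000, overlap_ms=0),
    Placement("voice_liner", duration_ms=4_000, overlap_ms=3_000),
    Placement("song_b", duration_ms=180_000, overlap_ms=1_500),
]
print(build_timeline(instructions))
# [('song_a', 0), ('voice_liner', 197000), ('song_b', 199500)]
```

In this simplified model the liner starts three seconds before the first song ends and the second song comes in underneath it, while the files themselves stay with the customer.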
RW: What really differentiates your AI from a cloud-based automation solution? There seem to be automation systems that can do the same right now. They have been stitching audio, liners and segues for decades. Is MagicStitching simply automation for the cloud?
Zalon: Today’s radio automation systems have some of these capabilities, like an Auto Jock, but they are very different from Super Hi-Fi. These radio systems do a great job of automating for a linear terrestrial broadcast, using specific human annotation points, such as segue points, added in on a very select number of content files, be they music, voice liners, or advertising.
Super Hi-Fi is built for the scale and breadth of today’s largest digital streaming services, where the number of content options are virtually limitless, and the number of personal experiences are just as broad. With our AI, the data is all analyzed and annotated with no human intervention, so our system understands an incredibly wide array of music features on literally tens of millions of content files. Each decision — whether it be a song segue, a voice liner, a podcast snippet, or an advertisement — is calculated in real time based on each specific set of content options and for each unique listener. This provides enormous flexibility and control, and allows large streaming music services to start delivering radio-like listening experiences without limiting the kind of unique, personalized experiences that consumers have come to expect.
So, in a way, the outputs of the experiences are somewhat similar. We are very influenced by how radio uses production techniques to create differentiation and to build amazing branded services. We’re just coming at it from a very different direction and for use in a very different way.
The best example of this is in a comparison of scale: On a broadcast radio station, you can expect there could be perhaps 10 “transition” moments per hour (segues, liners, etc.), which adds up to around 7,200 per month. Super Hi-Fi is currently generating over 1 billion transitions per month for our customers. That’s the equivalent of us powering 138,000 broadcast radio stations, 24/7, all in real time. Today’s radio automation systems are fantastic at what they do, but they just aren’t built for the same use case.
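The scale comparison can be checked with a few lines of Python. The 30-day month is an assumption (it is the figure that reproduces the 7,200 quoted), and 1 billion is used as a floor because the quote says “over 1 billion.”

```python
TRANSITIONS_PER_HOUR = 10             # figure quoted for one broadcast station
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30                   # assumption: a 30-day month
MONTHLY_TRANSITIONS = 1_000_000_000   # "over 1 billion" per month, used as a floor

per_station_per_month = TRANSITIONS_PER_HOUR * HOURS_PER_DAY * DAYS_PER_MONTH
equivalent_stations = MONTHLY_TRANSITIONS / per_station_per_month

print(per_station_per_month)       # 7200
print(round(equivalent_stations))  # 138889
```

The result, roughly 138,900 stations, is in line with the 138,000 figure in the quote, which appears to be rounded down.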
RW: Are you collaborating at all with RCS, a company owned by iHeartMedia? RCS has a cloud solution for radio automation.
Zalon: We have a ton of respect for RCS, they’re definitely top of their field. But again they are focused on radio automation, and that’s not what we do. We are enabling unique, radio-like experiences for digital music streaming services, and so our technologies are very different from one another. That said, there’s no reason why we couldn’t collaborate with them; in some ways I imagine we’re each very complementary to what the other does.
RW: You talk a lot about creating efficiencies with MagicStitch. What specifically do you add to the “broadcast stack”?
Zalon: When we talk of efficiencies, we are generally referring to the breadth of streaming music services. Imagine the difficulty of having to manually tag all 51 million music files that exist on today’s services. Imagine having to program the transition technology to handle hundreds of millions of listeners, and trillions of possible content combinations. It’s just not achievable without the kind of efficiencies that our AI provides. Now, I imagine that there are efficiencies available to radio broadcasters as well.
As an example I can state with confidence that we’re gathering vastly more data on each piece of content than any human would be able to assess. So that’s one specific example. But as to where we add value to the broadcast stack, I would guess that it would be different for each radio service, based specifically on their individual goals.
RW: If Super Hi-Fi AI can make placement and presentation decisions, what specific decisions does it make? Could the AI replace the need for radio broadcasters to schedule music and promos, or even commercials?
Zalon: Super Hi-Fi makes presentation and production decisions, but it doesn’t program music. I would guess that a radio broadcaster could use some automated programming technology, but humans seem to do a much better job of that. Our technology takes what has already been programmed and automates the presentation so it sounds amazing, with all of the segues perfectly designed for just that set of content, without human intervention.
RW: That said, talk of efficiencies typically means job losses in any business field. Where can Super Hi-Fi AI save broadcasters money? Can you give examples?
Zalon: I really can’t yet, as we don’t have any of those specific examples to give. Right now our customers are using Super Hi-Fi for next-generation streaming services, and in each of those cases our customers added employees. In other words they are using the efficiencies of our platform to grow listeners and revenue, not to drive cost savings.
Now, I imagine radio broadcasters could use our tools to save time and money, eliminating the need for anyone to add data to content or to align content in their radio automation services. But I think Super Hi-Fi is a more attractive option for broadcasters who want to use what they are already amazing at — incredible radio listening experiences — and to apply that to the next generation of listening. In other words, to take what they’re already doing but to do it across a new generation of listening platforms for a new generation of listeners. That’s where Super Hi-Fi really starts adding huge value.
RW: And those computer-generated voices you mention. When are those coming? Years or months? And how close are you to a solution?
Zalon: Great question. We aren’t a text-to-speech company, though we definitely keep our eye on the space. Amazon is doing some amazing things with their Polly service, and there are some very cool products that are in the early stages of commercial deployment. But let’s not forget that Bill Gates said in 1995 that the computer voice services would be amazing in five years, but here we are 25 years later and it still sounds computer generated. So it wouldn’t surprise me if it took another 25 years.