Descript Is a Powerful Audio Tool

The fields of radio broadcasting and podcasting are not quite siblings, but in many respects they’re not much farther apart than cousins. Equipment can overlap, and sometimes the same software is used.

Designers of products for podcasters often assume that an operator will be less experienced or skilled. That’s not necessarily the case, but innovations can make new software and hardware more intuitive for people not accustomed to technology.

One innovative piece of software is Descript.

Designed for macOS (10.11) or Windows 10 (and later), it’s an all-in-one offering that provides video editing, audio editing, auto transcription and many other features. It’s marketed for podcasting, video editing, screen recording and transcription applications.

Though the basic version is free, additional transcribing can cost $12 and up per month, with a discount for a year prepaid. There are “Creator,” “Pro” and “Enterprise” levels.

Descript’s website says its users include familiar media names such as Audible, WNYC, ESPN and iHeartMedia.

Editing on the fly

If someone ever designed editing software for your six-year-old child or for your grandma, this might be it, because it is intuitive — easy to use, smooth in editing.

It probably wouldn’t work well as a replacement for traditional craft editing software like Adobe’s Audition, Pro Tools or even Audacity. But it certainly can speed up podcast editing, whether video, audio or both. And it has many features that allow you to add background music, still images or even pre-recorded video from a file.

Whether you’re working with a video recording or just audio, the software is based on converting the recording to text, then editing based on a script.

This is where the magic happens. Once it’s in script form, you can edit it like any Word document. The audio/video editing will automatically follow what you’ve done with the text script.

This image from the Descript website shows advanced editing, including the insertion of additional content into the edit: video with a picture-in-picture effect, still images, background music and editing changes.

If you didn’t like something you said and you remove the text, the audio/video editing performs the matching function.
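To see why deleting text can drive the cut, it helps to picture each transcribed word carrying its own start and end time. The sketch below only illustrates that idea and is not Descript’s actual implementation; the Word class, the keep_ranges function and the sample timings are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One transcribed word with its position in the recording (hypothetical model)."""
    text: str
    start: float  # seconds from the start of the recording
    end: float    # seconds from the start of the recording


def keep_ranges(words, deleted):
    """Return the (start, end) time ranges to keep after deleting words.

    `deleted` is a set of indexes into `words`. Consecutive kept words are
    merged into one range, so the natural pause between them is preserved.
    """
    ranges = []
    for i, word in enumerate(words):
        if i in deleted:
            continue
        if ranges and (i - 1) not in deleted:
            # The previous word was kept too: extend the current range.
            ranges[-1] = (ranges[-1][0], word.end)
        else:
            # First kept word, or first word after a deletion: start a new range.
            ranges.append((word.start, word.end))
    return ranges


if __name__ == "__main__":
    # Sample timings are invented for illustration.
    script = [
        Word("I", 0.0, 0.2),
        Word("really", 0.25, 0.6),
        Word("um", 0.7, 1.0),
        Word("like", 1.1, 1.4),
        Word("radio", 1.45, 2.0),
    ]
    # Deleting "um" in the text leaves two ranges of audio to play back to back.
    print(keep_ranges(script, {2}))
```

In this simplified picture, an editor would play or render the kept ranges one after another, leaving the source recording itself untouched.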

Of even more interest is rearranging what you’ve said. Move words around in the script, and Descript will edit the recording to match the new order.

On testing this, I found that the “realism” of the delivery post-edit will depend on just how the word was spoken originally. So this can be a little “hit or miss,” though it’s still cool.

In my testing with ingested content and recordings, it did an impressive job, and my edits often produced a natural-sounding delivery.

There’s a function to “de-ummm,” “de-ahhh,” “de-errr,” “de-like” and “de-kinda,” for speakers who throw those fillers in their delivery, and it’s done automatically. It’s called “remove filler words.” There’s even a “shorten word gaps” function for cleaning up excessive pauses automatically.
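Both clean-up functions are easy to picture once the words are time-stamped. What follows is a rough sketch of the concept rather than how Descript does it; the filler list, the function names and the half-second default are assumptions made for illustration.

```python
# Hypothetical filler list; a real tool would need context to tell a filler
# "like" from a meaningful one ("I like radio").
FILLERS = {"um", "uh", "ah", "er", "like", "kinda"}


def drop_fillers(words, fillers=FILLERS):
    """Remove filler words from a list of (text, start, end) tuples."""
    return [
        w for w in words
        if w[0].lower().strip(".,!?") not in fillers
    ]


def shorten_gaps(words, max_gap=0.5):
    """Retime words so no pause between neighbors exceeds `max_gap` seconds.

    Everything after an overly long pause is shifted earlier by the excess.
    """
    retimed = []
    shift = 0.0       # total time removed so far
    prev_end = None   # original end time of the previous word
    for text, start, end in words:
        if prev_end is not None:
            gap = start - prev_end
            if gap > max_gap:
                shift += gap - max_gap
        retimed.append((text, start - shift, end - shift))
        prev_end = end
    return retimed


if __name__ == "__main__":
    # Invented timings: a 1.5-second pause between two words.
    take = [("hello", 0.0, 0.5), ("world", 2.0, 2.5)]
    print(shorten_gaps(take, max_gap=0.5))
```

A simple word list like this one would also strip every legitimate “like,” which is one reason an automatic pass still deserves a listen before publishing.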

Though it doesn’t offer all the features of craft audio and video editors, Descript does have features that we’re familiar with, including non-destructive multi-track editing, titles, transitions and key-frame animation, audio mixing and mastering (“rubber-banding audio levels”). And it allows you to export the project to pro applications like Pro Tools, Adobe Audition or Premiere, and Final Cut.

Another function of Descript allows for intuitive multi-user collaborative editing, so multiple people can work on a project at the same time.

For fun, I transferred into Descript some footage from a comedy show that included a singer with music and people speaking at the same time. The software did an admirable job in deciphering the spoken word with music under it. That’s not easy for speech-to-text conversion to do.

Some of the text conversion was funny or strange — Diet Sprite turned into Diet Striding — but when you consider that the text is simply your guideline for editing, it doesn’t hurt the editing aspect.

The video and audio editing take on a different feel with Descript. In fact, you more or less edit the text to edit the audio and video content.

Drag a still image into the text, and it edits itself into the video. Clicking on the image allows you to adjust the length of its appearance in the video. The same holds true for dragging audio and video content into the text script. It then places itself into the audio/video edit.

Since Descript works off the script, it provides timing marks and allows you to adjust edits based on time. For back-timing and producing an exact length show, this can help simplify time compression or expansion (to meet a timed window).
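The arithmetic behind back-timing and fitting a timed window is simple enough to sketch. The functions below are hypothetical helpers written for this illustration, not Descript features, and the durations are invented.

```python
def speed_factor(current_s, target_s):
    """Playback speed needed to fit `current_s` seconds into `target_s` seconds.

    A result above 1.0 means time compression; below 1.0 means expansion.
    """
    if current_s <= 0 or target_s <= 0:
        raise ValueError("durations must be positive")
    return current_s / target_s


def back_time(hard_out_s, segment_lengths_s):
    """Start time of each segment so the last one ends exactly at `hard_out_s`.

    Works backward from the hard out, subtracting each segment's length.
    """
    starts = []
    t = hard_out_s
    for length in reversed(segment_lengths_s):
        t -= length
        starts.append(t)
    return list(reversed(starts))


if __name__ == "__main__":
    # A 30:45 show that must fit a 30:00 window needs about 2.5% compression.
    print(speed_factor(30 * 60 + 45, 30 * 60))
    # Three segments of 10, 15 and 5 minutes that must end at the 30:00 mark.
    print(back_time(30 * 60, [600, 900, 300]))
```

Small speed changes of a few percent are usually the goal here; larger gaps are better closed by trimming content.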

There are auto functions to remove background noise, clean up audio, auto-level the sound and even process the sound. The video aspect of the software allows for titles and “lower thirds” (adding names, titles, etc.), plus effects and transitions.


Impressive, scary

A unique AI feature is Overdub. This opens a Pandora’s box of possibilities, good and bad.

Your voice is sampled, and this allows for something that is NOT possible with craft editing: Under “overdub,” you can type into the script new words or phrases that were not said during the recording, and they are then injected using AI in your own voice.

Yes, the computer generates your own voice and reasonably matches your true voice.

Here I’m testing the speech recognition aspect and creating overdub audio edits. Video is from a webcam.

Be aware that Descript actually uploads your voice to its servers to create the sample, a process the company says takes two to 24 hours. You also must read and record a disclaimer indicating this is, in fact, your voice and that the company is allowed to do this.

Is it convincing? Well, to a degree.

Like most AI sampling, it is a human voice that has been sampled and converted, but what I call “emotional inflection” is not, at least to this point, possible with the AI voices and sampling I’ve experienced.

People who do voice work will understand that emotion and inflection in the delivery of a script are critical to “the sell” of the copy, and at this point only a human truly understands the meaning of his/her words to convey that emotion.

Maybe someday the AI will recognize the meaning of the words and the true meaning of the sentence and somehow modify that delivery. But for now the jobs of voice-over people seem safe in that respect.

Of course, this means there is no need to go back in the studio to record new lines, as the AI overdub can be used for corrections. But you can feel the potential for a person’s voice to be “stolen,” despite well-intended precautions.

It’s interesting to see the many possibilities and uses for this software. It is a unique way to edit both audio and video.

To really comprehend exactly what it’s capable of doing, I recommend downloading it and playing with the software. You might find it helps change the way some of your talent edits and how you get content to the web or social media.

It certainly will make you wonder what’s next.

There’s a plethora of training videos and explanations about Descript on YouTube, and its website contains a lot of useful, impressive information.