Yamaha Vocaloid: Mr. Roboto Lives!

I first heard about Vocaloid singing synthesizer technology only days prior to the NAB show a year and a half ago, where I was delivering a panel talk on new audio software and techniques for radio production. As I had been fascinated with text-to-speech applications for a long time, the prospects of text-to-singing were quite magnetic.

Vocaloid from Zero-G Ltd. and Yamaha is software that creates up to 16 tracks of virtual singers, with total control over nuances such as volume, vibrato (the subtle pitch warble in actual singing voices), brightness, pitch bend and gender.

Product Capsule

THUMBS UP:

Extremely creative concept

Considerable control over vocal characteristics

Exports to WAV for use in other programs

THUMBS DOWN:

Still sounds very synthetic

Restrictive licensing

Requires effort to coax a good performance

PRICE: $329.95 or

This is by no means HAL 9000 droning through “Bicycle Built for Two.” There is a lot going on here, which makes it fascinating and irresistible to the mind of the warped production person. A computerized vocalist adds a new dimension to comedy bits and perhaps to vocal backgrounds in advertising, and you finally have that choir you always wanted to sing, “The traffic today is crappy!”

Keep in mind that the singing truly does sound synthetic at this point in the product’s evolution, and it takes considerable effort to get useful results out of Vocaloid.


The software was created and perfected by Yamaha, while the voice component was created by U.K.-based Zero-G Ltd. Zero-G also did the packaging and marketing, and now offers three Vocaloid packages featuring different male and female singers. The version I was given to work with was Leon, a virtual male soul vocalist based on the voice of a well-known session singer in the U.K.

Vocaloid singing synthesis is created in much the same way as concatenated speech. A word is entered as text and compared to existing words in a dictionary or analyzed by standard pronunciation rules. Phonemes, or small slices of spoken sounds, are retrieved from memory and assembled smoothly into syllables to form the desired words.

The difference here is that Vocaloid also works on the pitch and dynamics of each syllable, and does so without sounding “chipmunk-y” or harsh.
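The dictionary-then-rules lookup described above can be sketched in a few lines of Python. This is only a toy illustration: the dictionary entries, the phoneme symbols and the letter rules are invented for the example and are not Vocaloid's actual data or symbol set, and the function names are hypothetical.

```python
# Toy text-to-phoneme front end: dictionary lookup with a rule-based fallback.
# The entries, phoneme symbols, and letter rules are made up for illustration;
# they are NOT Vocaloid's real dictionary or symbol set.

PRONUNCIATIONS = {
    "walk": ["w", "O:", "k"],
    "away": ["@", "w", "eI"],
}

# Crude fallback for words missing from the dictionary: vowels are mapped
# through this table, consonants pass through unchanged.
LETTER_RULES = {"a": "{", "e": "e", "i": "I", "o": "Q", "u": "V"}


def to_phonemes(word):
    """Return a phoneme list for one word, preferring the dictionary."""
    key = word.lower()
    if key in PRONUNCIATIONS:
        return list(PRONUNCIATIONS[key])
    return [LETTER_RULES.get(ch, ch) for ch in key if ch.isalpha()]


def add_word(word, phonemes):
    """Register a custom pronunciation for a word the dictionary lacks."""
    PRONUNCIATIONS[word.lower()] = list(phonemes)


def lyric_to_phonemes(lyric):
    """Convert a whole lyric line into one flat phoneme sequence."""
    sequence = []
    for word in lyric.split():
        sequence.extend(to_phonemes(word))
    return sequence
```

A real synthesizer would then fetch a recorded sound for each phoneme and smooth the joins while applying pitch and dynamics; the sketch stops at the lookup stage.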

The process begins with the Sequence window (Fig. 1). The notes you wish your virtual vocalist to sing are drawn in with the Pencil tool, piano-roll fashion. As always with programs such as these, a little knowledge of music is necessary; after all, you need to know what pitches you want your computer to perform.

You may draw the note durations freehand, or use the Grid function to lay down preset lengths.
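For readers curious what a grid does to a freehand note, here is a minimal Python sketch of snapping note lengths to the nearest grid step. The sixteenth-note grid (0.25 beat) and the rule that a note never shrinks below one step are assumptions made for the example, not documented Vocaloid behavior.

```python
# Snap freehand note lengths (in beats) to a grid.
# The default grid size and the "at least one grid step" rule are assumptions
# for illustration, not documented Vocaloid behavior.

def snap_to_grid(length_beats, grid_beats=0.25):
    """Round a note length to the nearest grid step, never below one step."""
    steps = round(length_beats / grid_beats)
    return max(1, steps) * grid_beats


freehand = [0.9, 0.3, 1.6, 0.05]
snapped = [snap_to_grid(n) for n in freehand]
print(snapped)  # [1.0, 0.25, 1.5, 0.25]
```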

What you get is a screen full of notes and the uninspiring lyrics, “Ooh-ooh-ooh.” Now we add the lyrical content. Double-click each “ooh” in order, then type in each lyric you want generated.

Once done, click the Phoneme Transformation button (the one with the squashed-together “a” and “e”: æ). This converts your text into the data Vocaloid needs to create the performance.

Now and again, Vocaloid may come across a word not in its dictionary. Much as in MS Word, you may add new or customized words into the database, along with a pronunciation to which the program may refer.

Once the text is transformed properly, start your playback. Then make a disgusted face and wonder to yourself what you ever saw in this software to begin with: the singer sounds flat and lifeless.

You be the vocal coach

Remember, effort is needed to get satisfactory results from Vocaloid. All we have done up to this point is teach Vocaloid our song. Now we need to buff up its interpretation.

Click open the Icon Palette (Fig. 2). Here we have the opportunity to modify the attack of some notes, including accents and bendups (sliding up to the correct pitch). You may also decide on a level of vibrato, from a gentle Peabo Bryson warble to an over-the-top Andy Williams showstopper.

While you are here, set the dynamics, anywhere from pianissimo (very soft) all the way to fortissimo (very loud). Real singers vary their dynamics, and so should your Vocaloid performance. Grab an icon, drop it into the track and that’s it.
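Vibrato is, at heart, a slow periodic wobble of pitch around the written note. The Python sketch below models it as a sine wave measured in cents (hundredths of a semitone). The default rate and depth are assumed values chosen for illustration; the review does not describe how Vocaloid models vibrato internally, so this is a generic textbook formulation and not the program's method.

```python
import math


def vibrato_pitch(base_hz, t_seconds, rate_hz=5.5, depth_cents=30.0):
    """Instantaneous pitch of a note with sinusoidal vibrato applied.

    rate_hz and depth_cents are assumed illustrative defaults.
    """
    # Deviation swings between -depth_cents and +depth_cents.
    cents = depth_cents * math.sin(2 * math.pi * rate_hz * t_seconds)
    # 1200 cents make one octave, i.e. a doubling of frequency.
    return base_hz * 2 ** (cents / 1200)
```

A small depth gives the gentle warble; a large depth and faster rate push toward the showstopper end of the scale.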

Finally, bang open the Control Track and start making changes here to gender expression and brightness. As a voice goes higher and louder, both elements tend to change.

You may dig deeper into the Vocaloid library and make changes directly to the primary qualities of the singer, including resonance and harmonics – basically you may reshape your singer’s nasal and thoracic cavities for different timbres. One virtual vocalist can take on infinite characteristics.

Now that you have your lead singer in place, you can generate a second, third and fourth singer and beyond. A Mixer window can be opened to set the level and pan position of each voice.

Probably the least complicated way to use a Vocaloid performance in existing music is to figure out the tempo of the music bed you are using – easily done in many audio editors. Calculate the beats per minute (bpm) and apply that tempo to your Vocaloid performance.
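If your editor does not report tempo directly, the arithmetic is simple: count a number of beats, measure how long they take, and scale to one minute. A minimal Python sketch follows; the function names are hypothetical.

```python
def beats_per_minute(beat_count, seconds):
    """Tempo from a number of beats counted over a measured time span."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return beat_count * 60.0 / seconds


def seconds_per_beat(bpm):
    """Length of one beat at a given tempo."""
    return 60.0 / bpm


# Example: 32 beats counted across 16 seconds of the music bed.
tempo = beats_per_minute(32, 16.0)
print(tempo)                    # 120.0
print(seconds_per_beat(tempo))  # 0.5
```

Counting over a longer stretch of the bed reduces the effect of small timing errors at the start and end of the measurement.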

Render and export the completed Vocaloid project as a WAV file, which may then be opened up in a multitrack editor and mixed with the existing music bed.
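Under the hood, mixing the vocal with the bed amounts to adding the two signals sample by sample at chosen levels. The Python sketch below shows the idea on plain lists of 16-bit sample values. It assumes both tracks are mono and share a sample rate, the gain defaults are arbitrary, and a real multitrack editor does considerably more; reading and writing the actual WAV files could be handled with Python's standard `wave` module.

```python
def mix_tracks(vocal, bed, vocal_gain=1.0, bed_gain=0.5):
    """Sum two lists of 16-bit samples with gains, clipping to the valid range.

    Assumes mono tracks at the same sample rate; gains are illustrative.
    """
    length = max(len(vocal), len(bed))
    mixed = []
    for i in range(length):
        # Treat the shorter track as silence once it runs out.
        v = vocal[i] if i < len(vocal) else 0
        b = bed[i] if i < len(bed) else 0
        sample = int(round(v * vocal_gain + b * bed_gain))
        # Clip to the signed 16-bit range to avoid wrap-around distortion.
        mixed.append(max(-32768, min(32767, sample)))
    return mixed
```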

If you have software capable of doing MIDI sequencing and audio, such as Cubase, Cakewalk, Power Tracks or other such programs, Vocaloid may be used as a plug-in for the host program. The entire production may take place in one environment instead of two or more.

I mentioned earlier that Vocaloid requires lots of tweaking. It is not possible to simply type in your text and expect immediate results.

One project I tried was recreating the three-part vocal harmonies of “Walk Away Renee” by the Left Banke. These lyrics were entered to match the pitches in the Sequence window: “Just walk away Renee, you won’t see me follow you back home.”

What came out of the speakers had an almost stoned Californian affectation: “Just Walk A-weh Ren-neh, you won’t see me fol-leau yeu back heaum.”

Clearly a little editing was needed on the syllabic level.


The licensing agreement accompanying Vocaloid may throw you at first, as there are numerous restrictions to its use that may require separate licensing above and beyond the purchase of the product. These include animated cartoons and mobile ringtones, neither of which affects broadcast production to a great degree.

Of particular interest to us is the section requiring additional licensing for any commercially released recording that credits the singer as a machine or a specific technology rather than a human.

This means you can probably do all of the on-air routines featuring “Fred the Singing Computer” you want, but the moment you commit them to a CD and sell them – or even make them available for download on a sponsored webpage – you are subject to additional licensing through Yamaha.

Similarly, it is unlikely a Vocaloid performance is appropriate for the lead line in an advertising jingle … yet. It is still a new technology finding a foothold. Besides, there is a stipulation in the licensing that prohibits use that could be “harmful to the moral rights” of the original singer whose voice constitutes the phoneme library. For all we know, such uses might include fur dealers, adult video stores, online betting or tobacconists. Check on this before you run with any great ideas from the sales force.

The professional and hobbyist music industry is keeping a little distance between itself and Vocaloid, with other reviewers commenting that the software still sounds synthetic and not very believable.

For me, that is the charm. As much work as it takes to pull a worthwhile piece of production out of Vocaloid’s virtual craw, I get a plasticky, almost Jetson-esque quality in a rendered audio file that I can’t get any other way.

Creative minds will find ways to use this product for morning comedy bits and contests. How about starting a rumor that Michael Bolton has actually been retired since 1994 and that a look-alike robot has been standing in for him at concerts? Oh, you don’t believe us? Well, just listen to this (insert typical mushy Bolton ballad recreated in Vocaloid)!

Vocaloid may not be everybody’s ideal software. It won’t make your production flow any faster and won’t make your GM rich. But it is one more creative tool in the arsenal, and since Yamaha and Zero-G are constantly adding new vocal templates and upgrades, it has nowhere to go but up.

Try the demo, but don’t be disappointed if it doesn’t give you exactly what you want the first time.