VOCALOPEDIA

ボーカロイド百科事典

GUIDE

✰ = Source Link for citation, click to see where tidbits were taken from.


TABLE OF CONTENTS

1. What Is Vocaloid?
2. History
2 a. The Elvis Project
2 b. The Daisy Project
2 c. The Initial Start
2 d. The First Generation of Voices
3. How Does It Work?
4. Products
4 a. Mainline Products
4 b. Derivative Products
4 c. Voice Banks

WHAT IS VOCALOID?

Vocaloid (ボーカロイド, Bōkaroido) is a singing voice synthesizer software product. Its signal processing part was developed through a joint research project led by Kenmochi Hideki at the Pompeu Fabra University in Barcelona, Spain, in 2000 and was not originally intended to be a full commercial project. Backed by the Yamaha Corporation, the software was later developed into the commercial product "Vocaloid", which was released in 2004.


The software enables users to synthesize "singing" by typing in lyrics and melody and also "speech" by typing in the script of the required words. It uses synthesizing technology with specially recorded vocals of voice actors or singers. To create a song, the user must input the melody and lyrics. A piano roll type interface is used to input the melody and the lyrics can be entered on each note. The software can change the stress of the pronunciations, add effects such as vibrato, or change the dynamics and tone of the voice.


VOCALOID and its succeeding versions are voice synthesizing applications that have been sold commercially since 2004. VOCALOID began from the simple concept of synthesizing human vocals for singing. This was never an easy task for the programmers behind the software, but from that beginning VOCALOID™ has grown into a worldwide phenomenon, spawning new musicians, albums, figurines and even concerts.


Vocaloid has become a unique subculture of user-created content (music, illustrations, videos, software, games, cosplay, events) based on a shared love of voice synthesizers and their mascots. It's common for all vocal synthesizers to be referred to as "Vocaloids" (for example, Kasane Teto is commonly grouped in with them, though she is an UTAUloid); however, this is simply a catch-all, since Vocaloid is the most widely known synthesizer.


HISTORY

"The Elvis Project"

In the 20th century, the most successful vocal synthesizing attempt had been "Queen of the Night" from Mozart's opera The Magic Flute; this was made in 1984 by Yves Potard and Xavier Rodet using the CHANT synthesizer.
Jordi Bonada, a senior researcher at the Music Technology Group at Pompeu Fabra University in Barcelona, joined the university in 1997. Bonada worked on a research project requested by YAMAHA which contained some "interesting" ideas. He set about recording not just a song from a singer, but various ranges and pitches, in an attempt to build a model from which any song could be built. The project was codenamed "Elvis" and lasted two years. It did not become a product at the end of its development: because it was based on spectral morphing techniques, the project was too large, and each song required a professional singer behind it.


While it did not become a product, the "Elvis Project" helped establish that a series of phonetics in a wide range of pitches would help build a synthesizer based on any model.


"Daisy"

YAMAHA agreed to help the researchers start a fresh project; it was at this point that Kenmochi Hideki joined. The initial ideas came from him in Japan in 2000, with most of the research done at the Pompeu Fabra University and the core signal processing libraries developed in C++. YAMAHA itself was responsible for the product design and development of the actual product. It was pure collaborative research, and they did not think about selling it at that time.


At the time, synthesizers would take days to produce good-quality results, but the vocal would always sound inhuman and obviously generated by a machine or computer. They were expensive as well. This meant that while all other parts of music production could by then be fully recreated in a DAW, producing a good-quality vocal performance meant hiring a human vocalist. So, the aim of the project was to provide a fast, low-cost way of getting convincingly human-like vocals, giving producers full control of music production.


The VOCALOID™ project was originally codenamed "Daisy Project" ("DAISYプロジェクト" or "でいじぃぷろじぇくと"), a name taken from the song "Daisy Bell", and was at a prototype stage in March 2002. EpR [1] was developed as the first voice model, and it allowed the researchers to transform vocal timbres in a natural manner while preserving subtle detail. At first, "Daisy" could only say vowels like "ai" (love). Four months later, "Daisy" began to support consonants, with the first "complete word" being "asa" (morning).


Because YAMAHA itself could only provide limited vocals, they licensed the software out to various third-party studios. The first studio to join this project was Crypton Future Media, who were contacted in May 2002. YAMAHA then attempted to find English studios to support an English version, but the majority of responses were negative. The first studio to enter development was Zero-G, joining in the fall of 2002, with PowerFX also joining that year. Thus, both English and Japanese voicebanks began development.


"Daisy" was dropped as a name due to copyright conflicts. Despite attempts to change the name (such as translating it into Japanese), they ultimately could not register it.


The only four known vocals for "Daisy" were LEON, LOLA, HANAKO, and TARO. LEON and LOLA were the only ones ever shown to the public, and were released as official voicebanks for the final VOCALOID software. HANAKO and TARO would go on to be released as "MEIKO" and "KAITO".


The Initial Start

Kenmochi reported that the name of the software was very hard to decide on at the time, and "VOCALOID" had fallen into 3rd place as a choice of name. The name "VOCALOID" was chosen 2 or 3 weeks before its announcement, after the 2nd-choice name failed due to a copyright conflict with a piece of software in Belgium. "VOCALOID" is a portmanteau of the words "Vocal" and "Android" ("vocal android"). Kenmochi chose to announce the technology on February 26, 2003, a day before his birthday.


VOCALOID™ was originally designed to act as a replacement for a real singer. Many reviewers at the time of LEON and LOLA's release thought that "VOCALOID" was a bold effort, as human speech was a complex thing to recreate. VOCALOID was regarded as the first of its kind to tackle singing vocals.


The First Generation of Vocals

The first VOCALOIDs, LEON and LOLA, made their debut appearance and initial release at the NAMM Show on January 15, 2004. LEON and LOLA were then released in Japan by the studio Zero-G on March 3, 2004, both sold as a "Virtual Soul Vocalist". They were also demonstrated at the Zero-G Limited booth during Wired Nextfest and won the 2005 Electronic Musician Editor's Choice Award. Zero-G later released MIRIAM, with her voice provided by Miriam Stockley, in July 2004. Later that year, Crypton Future Media, Inc. handled the release of the first Japanese VOCALOID, MEIKO. It was during the period between MIRIAM's and MEIKO's respective releases that the first rival software, Cantor, was released to compete with VOCALOID, which at that point was known in the western hemisphere only through LEON, LOLA, and MIRIAM.


Notably, VOCALOID was released in 2004, towards the end of the "FLASH golden age" (FLASH黄金時代), a period known for the rise of Flash-based productions on Japanese websites (1998-2002/2005, end date arguable) and the birth of video sharing sites such as YouTube and Nico Nico Douga.


Though LEON, LOLA, MIRIAM, and MEIKO experienced good sales (MEIKO gaining sales of 3,000 in her first year in particular), KAITO initially failed commercially and sold just 500 units. Despite this, the software was overall successful and was followed by the VOCALOID2 engine.


By the close of the VOCALOID era, it was confirmed that three companies had joined production of the software: Crypton Future Media, Zero-G Ltd., and PowerFX. PowerFX, having been introduced to the software via LEON and LOLA's demonstrations at the 2002 NAMM Show, did not produce any vocals for this version of VOCALOID, instead making their entrance at the beginning of the VOCALOID2 era. It is known, however, that as early as 2003 they had a vocal in development for the engine under the name "JODIE", as well as a male vocal, "RONIE". JODIE and RONIE would go on to be released as "Sweet Ann" and "BIG AL".


TECHNOLOGY

How Does It Work?

Vocaloid's singing synthesis technology is generally categorized as concatenative synthesis in the frequency domain: it splices and processes vocal fragments extracted from human singing voices, in the form of a time-frequency representation. The Vocaloid system can produce realistic voices by adding vocal expressions such as vibrato to the score information. When Vocaloid was released in 2004, its synthesis technology was called "Frequency-domain Singing Articulation Splicing and Shaping" (周波数ドメイン歌唱アーティキュレーション接続法, Shūhasū-domain Kashō Articulation Setsuzoku-hō), although this name has no longer been used since the release of Vocaloid 2 in 2007. "Singing Articulation" is explained as "vocal expressions" such as vibrato, together with the vocal fragments necessary for singing. The Vocaloid and Vocaloid 2 synthesis engines are designed for singing, not reading text aloud, though software such as Vocaloid-flex and Voiceroid has been developed for that. The engines cannot naturally replicate singing expressions like hoarse voices or shouts.


The main parts of the Vocaloid 2 system are the Score Editor (Vocaloid 2 Editor), the Singer Library, and the Synthesis Engine. The Synthesis Engine receives score information from the Score Editor, selects appropriate samples from the Singer Library, and concatenates them to output synthesized voices. The Score Editor and the Synthesis Engine provided by Yamaha are essentially the same across different Vocaloid 2 products. If a Vocaloid 2 product is already installed, the user can enable another Vocaloid 2 product by adding its library. The system supports three languages, Japanese, Korean, and English, although other languages may be added in the future. It works standalone (playback and export to WAV) and as a ReWire application or a Virtual Studio Technology instrument (VSTi) accessible from a digital audio workstation (DAW).
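
The division of labour described above can be sketched roughly as follows. This is only an illustrative toy, not Yamaha's implementation: the class names, the fragment identifiers, and the use of plain lists of numbers as "audio" are all assumptions made for the example. It shows one engine serving any library that has been added to it.

```python
from dataclasses import dataclass, field


@dataclass
class SingerLibrary:
    """Hypothetical stand-in for a Singer Library: fragment id -> audio samples."""
    name: str
    samples: dict = field(default_factory=dict)


class SynthesisEngine:
    """Toy engine: one shared engine, any number of added libraries."""

    def __init__(self):
        self.libraries = {}

    def add_library(self, library):
        # Adding a library is what "enables" another singer in this sketch.
        self.libraries[library.name] = library

    def render(self, singer, fragment_ids):
        # Select each requested fragment from the chosen library and
        # concatenate the fragments in score order.
        library = self.libraries[singer]
        output = []
        for fragment_id in fragment_ids:
            output.extend(library.samples[fragment_id])
        return output


engine = SynthesisEngine()
engine.add_library(SingerLibrary("Singer A", {"x": [0.1], "y": [0.2, 0.3]}))
engine.add_library(SingerLibrary("Singer B", {"x": [0.5], "y": [0.6]}))
```

Calling engine.render("Singer A", ["x", "y"]) and engine.render("Singer B", ["x", "y"]) uses the same engine code with different sample data, which is the point of keeping the library separate from the editor and engine.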


The Score Editor is a piano roll style editor used to input notes, lyrics, and some expressions. When lyrics are entered, the editor automatically converts them into Vocaloid phonetic symbols using the built-in pronunciation dictionary. For unregistered words, the user can edit the phonetic symbols directly. The Score Editor offers various parameters for adding expression to singing voices, and the user is expected to adjust these parameters to best fit the synthesized tune. The editor supports ReWire and can be synchronized with a DAW. Real-time "playback" of songs with predefined lyrics using a MIDI keyboard is also supported.
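
The lyric conversion step can be pictured as a dictionary lookup with a manual fallback. The sketch below is an assumption-laden illustration, not the editor's actual logic: the dictionary contents, the function name, and the override mechanism are hypothetical, and only the symbols for "sing" are taken from this article.

```python
# Hypothetical pronunciation dictionary; entries are illustrative only and
# are not the dictionary shipped with any Vocaloid editor.
PRONUNCIATION_DICT = {
    "sing": ["s", "I", "N"],
}


def lyrics_to_phonemes(words, user_overrides=None):
    """Convert lyric words to phonetic symbols.

    Returns (converted, unregistered). Each entry of `converted` is a
    (word, symbols) pair, with symbols set to None for words that are not
    in the dictionary and have no user-supplied symbols.
    """
    user_overrides = user_overrides or {}
    converted = []
    unregistered = []
    for word in words:
        key = word.lower()
        if key in user_overrides:
            # Symbols typed in by the user take priority.
            converted.append((word, user_overrides[key]))
        elif key in PRONUNCIATION_DICT:
            converted.append((word, PRONUNCIATION_DICT[key]))
        else:
            converted.append((word, None))
            unregistered.append(word)
    return converted, unregistered
```

A word missing from the dictionary comes back in the second list, which corresponds to the case where the user has to enter the phonetic symbols by hand.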


Each Vocaloid licensee develops the Singer Library, a database of vocal fragments sampled from real people. The database must contain all possible combinations of phonemes of the target language, including diphones (a chain of two different phonemes) and sustained vowels, as well as polyphones with more than two phonemes if necessary. For example, the voice corresponding to the word "sing" can be synthesized by concatenating the sequence of diphones "#-s, s-I, I-N, N-#" (# indicating a voiceless, i.e. silent, segment) with the sustained vowel ī. The Vocaloid system changes the pitch of these fragments to fit the melody. To sound more natural, the library needs to store three or four different pitch ranges. Japanese requires 500 diphones per pitch, whereas English requires 2,500. Japanese has fewer diphones because it has fewer phonemes and most syllabic sounds are open syllables ending in a vowel. In Japanese, there are basically three patterns of diphones containing a consonant: voiceless-consonant, vowel-consonant, and consonant-vowel. English, on the other hand, has many closed syllables ending in a consonant, and therefore has consonant-consonant and consonant-voiceless diphones as well. Thus, more diphones need to be recorded for an English library than for a Japanese one. Due to this linguistic difference, a Japanese library is not suitable for singing in fluent English.
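
The "sing" example and the recording counts above can be reproduced with a few lines of code. This is a minimal sketch under stated assumptions: the function names are made up for illustration, the size estimate simply multiplies the per-pitch figures quoted above by the number of pitch ranges, and it ignores sustained vowels and polyphones.

```python
def to_diphones(phonemes, boundary="#"):
    """Split a phoneme sequence into overlapping diphones.

    The boundary symbol marks the silent segment before and after the word.
    """
    padded = [boundary] + list(phonemes) + [boundary]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def library_size(diphones_per_pitch, pitch_ranges):
    """Rough count of diphone recordings for a given number of pitch ranges."""
    return diphones_per_pitch * pitch_ranges


print(to_diphones(["s", "I", "N"]))  # ['#-s', 's-I', 'I-N', 'N-#']
print(library_size(500, 3), library_size(500, 4))    # Japanese: 1500 2000
print(library_size(2500, 3), library_size(2500, 4))  # English: 7500 10000
```

On these figures an English library needs roughly five times as many diphone recordings as a Japanese one at the same number of pitch ranges.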


The Synthesis Engine receives score information contained in dedicated MIDI messages, called Vocaloid MIDI, sent by the Score Editor. It adjusts the pitch and timbre of the selected samples in the frequency domain and splices them to synthesize singing voices. When Vocaloid runs as a VSTi accessible from a DAW, the bundled VST plug-in bypasses the Score Editor and sends these messages directly to the Synthesis Engine.
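
One way to picture the pitch adjustment is as a two-step choice: pick the recorded pitch range closest to the target note, then work out how far the sample has to be shifted. The sketch below only computes that target; it does not perform the frequency-domain processing itself. Representing pitches as MIDI note numbers, the three recorded pitches, and both function names are assumptions for illustration; the only fixed fact used is that one semitone corresponds to a frequency ratio of 2^(1/12).

```python
def choose_recorded_pitch(target_note, recorded_notes):
    """Pick the recorded pitch (as a MIDI note number) nearest to the target."""
    return min(recorded_notes, key=lambda note: abs(note - target_note))


def pitch_ratio(target_note, recorded_note):
    """Frequency ratio needed to move a sample from its recorded note to the target."""
    return 2.0 ** ((target_note - recorded_note) / 12.0)


# Hypothetical library recorded at three pitches (low, middle, high).
RECORDED_NOTES = [55, 62, 69]

source = choose_recorded_pitch(64, RECORDED_NOTES)
ratio = pitch_ratio(64, source)
```

Storing several pitch ranges keeps this ratio close to 1 for most notes, which fits the earlier remark that three or four ranges are needed for more natural results.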


PRODUCTS

Mainline Products

To date, 6 mainline VOCALOID synthesizers have been released. They can be purchased from the official website, as well as from several online and physical stores.


- VOCALOID (also known as "VOCALOID 1" or just "V1")


- VOCALOID 2 (or just "V2")


- VOCALOID 3 (or just "V3")


- VOCALOID 4 (or just "V4")


- VOCALOID 5 (or just "V5")


- VOCALOID 6 (or just "V6")


Derivative Products

YAMAHA has released many spin-off and derivative products for VOCALOID.


- VOCALOID-Flex


- iVOCALOID


- VocaloWitter


- VOCALOID First


- VOCALOID Editor for Cubase


- Unity with VOCALOID


- VOCALOID-AI


- NetVocaloid


- Mobile VOCALOID


- VOCALOID for Education


- VOCALOID for Education II


- eVOCALOID


- VOCALOID Keyboard


- VOCALOID-board


- Anizon VOCALOOP


- Charlie


- VocaListener


Voice Banks

At present, VOCALOID is home to 342 Voice Banks. This includes SE, NEO, updated releases, mobile releases, private licenses, derivative products, and cancelled voices. To keep this article brief, a complete list of voice banks is available on a separate page, which you can find here.

