
Do we know how to make a "voice print" and use it for a computer-generated voice?

Submitted by: @import:stackexchange-cs

Problem

Well, everything is in the title.

After all, Siri's voice was made from recordings of a real
person, so can I use software to make a computer-generated
voice sound like my voice, or like anybody else's?

Solution

See the Voice Transformation project from the Center for Speech Technology Research (CSTR) at the University of Edinburgh.

Thesis: Transforming Voice Quality and Intonation by Ben Gillet.

It doesn't do exactly what you are asking, but it does something similar: it transforms a recording of one speaker so that it sounds as if another speaker said it.

Abstract:

Voice transformation is the process of transforming the characteristics of speech uttered by a source speaker, such that a listener would believe the speech was uttered by a target speaker. In this thesis two aspects of the transformation problem are addressed: voice quality and intonation.

The voice quality transformation component of our system has two main parts corresponding to the two components of the source-filter model. The first component transforms the spectral envelope as represented by a linear prediction model. The transformation is achieved using a Gaussian mixture model, which is trained on aligned speech from source and target speakers. The second part of the system predicts the spectral detail from the transformed LPC parameters. A novel approach is proposed, which is based on a classifier and residual codebooks. The system has some similarities with earlier work by Kain; however, the work reported here is not restricted to speech spoken in a monotone and with mimicked prosody. Also, on the basis of a number of performance metrics it outperforms existing systems.
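To make the "Gaussian mixture model trained on aligned speech" step more concrete, here is a minimal sketch of joint-density GMM regression, the general family of spectral mapping that the Kain-style systems mentioned above belong to. It is not Gillet's exact method. It assumes the source and target frames are already time-aligned (e.g. by dynamic time warping) and that each frame is a fixed-length feature vector such as LPC-derived coefficients. The function names, the hand-written EM loop, and the parameter defaults are illustrative assumptions, and the residual-codebook stage from the abstract is not covered.

```python
import numpy as np


def log_gauss(x, mean, cov):
    """Log density of N(mean, cov) evaluated at each row of x."""
    d = x.shape[1]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.sum(diff * np.linalg.solve(cov, diff.T).T, axis=1)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + maha)


def fit_joint_gmm(x, y, n_components=2, n_iter=50, seed=0, reg=1e-6):
    """Fit a full-covariance GMM to joint vectors z = [x; y] with plain EM.

    x, y: arrays of shape (n_frames, dim), assumed already time-aligned.
    """
    z = np.hstack([x, y])
    n, d = z.shape
    rng = np.random.default_rng(seed)
    means = z[rng.choice(n, n_components, replace=False)]
    covs = np.array([np.cov(z.T) + reg * np.eye(d) for _ in range(n_components)])
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: responsibility of each component for each joint frame
        log_r = np.stack([np.log(weights[m]) + log_gauss(z, means[m], covs[m])
                          for m in range(n_components)], axis=1)
        log_r -= log_r.max(axis=1, keepdims=True)
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and full covariances
        nk = r.sum(axis=0) + 1e-12
        weights = nk / n
        means = (r.T @ z) / nk[:, None]
        for m in range(n_components):
            diff = z - means[m]
            covs[m] = (r[:, m, None] * diff).T @ diff / nk[m] + reg * np.eye(d)
    return weights, means, covs


def convert(x, weights, means, covs):
    """Map source frames x to predicted target frames via E[y | x]."""
    dx = x.shape[1]
    n_components = len(weights)
    # Posterior p(m | x) using only the source block of each component
    log_p = np.stack([np.log(weights[m])
                      + log_gauss(x, means[m, :dx], covs[m, :dx, :dx])
                      for m in range(n_components)], axis=1)
    log_p -= log_p.max(axis=1, keepdims=True)
    p = np.exp(log_p)
    p /= p.sum(axis=1, keepdims=True)
    y_hat = np.zeros((x.shape[0], means.shape[1] - dx))
    for m in range(n_components):
        mu_x, mu_y = means[m, :dx], means[m, dx:]
        s_xx, s_yx = covs[m, :dx, :dx], covs[m, dx:, :dx]
        # Per-component linear regression: mu_y + S_yx S_xx^{-1} (x - mu_x)
        cond = mu_y + np.linalg.solve(s_xx, (x - mu_x).T).T @ s_yx.T
        y_hat += p[:, m, None] * cond
    return y_hat
```

Each mixture component acts as a local linear regression from source to target features, and the posterior p(m | x) blends them smoothly; in practice one would more likely fit the GMM with an existing library than with a hand-rolled EM loop like this one.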

We also present a new method for the transformation of pitch contours from one speaker to another based on a small linguistically motivated parameter set. The system performs a piecewise-linear mapping using these parameters. A perceptual experiment clearly demonstrates that the presented system is at least as good as the existing technique for all speaker pairs, and that in many cases it is much better and almost as good as using the target pitch contour.
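The abstract does not say what the "linguistically motivated" parameters are, so the following is only a sketch of what a piecewise-linear pitch mapping looks like in general. It assumes each speaker is described by a few hypothetical pitch landmarks in Hz (here: floor, typical level, and top of range), and maps every voiced frame of the source F0 contour through the line segments joining corresponding landmarks. The function names and knot values are illustrative, and a real system might well work in log-F0 rather than Hz.

```python
from bisect import bisect_right


def piecewise_linear_map(value, src_knots, tgt_knots):
    """Map one value through the piecewise-linear function given by knot pairs.

    src_knots must be strictly increasing; values outside the knot range are
    extrapolated using the first or last segment.
    """
    if len(src_knots) != len(tgt_knots) or len(src_knots) < 2:
        raise ValueError("need at least two matching knots")
    # Find the segment [i, i+1] containing value, clamped to the end segments
    i = bisect_right(src_knots, value) - 1
    i = max(0, min(i, len(src_knots) - 2))
    x0, x1 = src_knots[i], src_knots[i + 1]
    y0, y1 = tgt_knots[i], tgt_knots[i + 1]
    return y0 + (value - x0) * (y1 - y0) / (x1 - x0)


def transform_contour(f0_hz, src_knots, tgt_knots):
    """Apply the mapping frame by frame; 0 marks an unvoiced frame and is kept."""
    return [piecewise_linear_map(f, src_knots, tgt_knots) if f > 0 else 0.0
            for f in f0_hz]


# Hypothetical landmarks: source (floor, typical, top) vs. target speaker
print(transform_contour([0, 100, 120, 160, 0, 210], [80, 120, 200], [160, 220, 350]))
# prints [0.0, 190.0, 220.0, 285.0, 0.0, 366.25]
```

Using a handful of landmarks instead of a single mean-and-variance shift lets different parts of the pitch range be stretched by different amounts, which is the kind of flexibility a small piecewise-linear parameter set provides.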

Context

StackExchange Computer Science Q#17926, answer score: 3
