Voice Recognition Tech Knows You Better Than Your Therapist


When a new song blasts onto our radios, we can often tell who’s singing despite not being told. Similarly, we can identify our friends and families by their voices alone.

But just how unique are our voices? And with over 7.3 billion people on the planet, what are the chances of one of them having the exact same voice as you?

There’s tons of vocal variety to start with

Your exact voice is a product of many different factors, influenced largely by your genetics, age, and sex.

The size and shape of your larynx (also known as your voice box); the size, shape, and tension of your vocal cords; and the pressure of the air that passes through them from the lungs all influence the sound of your voice.


On top of this, the shape and size of your body are also important factors, with the size of your chest cavity, the shape of your mouth opening, and the position, size, and shape of your tongue, palate, and lips all influencing your final, characteristic sound.

Then there’s the behavioral component of speech—the bit that other people mimic when doing impersonations of you. This is determined by the movements of your jaw and how you use your mouth and tongue to articulate words, colloquially termed your ‘accent.’

So with tiny variations in any of those factors capable of changing the pitch, volume, tone, or timbre of your voice, it’s safe to say your voice is unique.

Do you understand the words coming out of my mouth?

Speech recognition technology that turns speech into text or audio has been around for a while. Arguably the most famous speech recognition technology is Apple’s Siri, released in October 2011. While the current system is admittedly still frustrating (“I said Dan, not Dad, dammit! Oh, hi Dad”), the technology is rapidly improving.

Hey Siri?

Did you know Siri has a pretty good sense of humor? If you have an Apple iPhone or iPad, try asking Siri one of these questions to elicit a sassy response.

Say: “Beatbox”
Star Wars fan? Say: “Siri, I am your father”
What is zero divided by zero?
When is the world going to end?
What came first, the chicken or the egg?
OK Glass
Where is Elvis Presley?
I’m naked
What are you wearing?
I’m drunk
What is the meaning of life?
Do you follow the three laws of robotics?
I’m sleepy
What does Siri mean?

Researchers are now working on creating computers smart enough to engage in fluent conversations with users via artificial neural networks—computer systems modeled after the human brain.

While it may still be a while before we’re casually conversing with our computers, the keyboard is definitely going out of fashion. Soon we’ll instruct our TVs to “play Saw VI,” and they’ll quip back, “Seriously, haven’t you had enough of this torture porn yet?”


Emotionally aware computers are on the way

Today’s technology goes beyond simply recognizing what words are being said and instead focuses on how they’re said.

Our voices are like a canvas for our emotions. While humans can (with varying degrees of success) intuitively gauge emotions, researchers are attempting to define our emotions based on the spectral properties of our voices.

Sadness, for example, is characterized by lower intensity, less pitch variation, and more energy under 2000 Hz.
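
To make that last property concrete, here is a minimal Python sketch of how one might measure the share of a signal's energy that sits below 2000 Hz. The function names, the synthetic test tones, and the 8000 Hz sample rate are all illustrative assumptions rather than anything taken from the research described here, and a real system would work on recorded speech with a fast Fourier transform library instead of this slow hand-rolled transform.

```python
import cmath
import math

def low_band_energy_ratio(samples, sample_rate, cutoff_hz=2000.0):
    """Fraction of spectral energy below cutoff_hz, using a naive DFT."""
    n = len(samples)
    low = total = 0.0
    for k in range(1, n // 2):  # skip the DC bin, stay below Nyquist
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(samples))
        energy = abs(coeff) ** 2
        total += energy
        if k * sample_rate / n < cutoff_hz:
            low += energy
    return low / total if total else 0.0

def tone(freq_hz, amplitude, sample_rate=8000, n=400):
    """A pure sine tone, standing in for one component of a voice."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n)]

# Two made-up signals: one weighted toward low frequencies, one toward high.
dull = [a + b for a, b in zip(tone(500, 1.0), tone(3000, 0.2))]
bright = [a + b for a, b in zip(tone(500, 0.2), tone(3000, 1.0))]

dull_ratio = low_band_energy_ratio(dull, 8000)
bright_ratio = low_band_energy_ratio(bright, 8000)
print(f"dull: {dull_ratio:.2f}  bright: {bright_ratio:.2f}")
```

The first ratio should come out close to 1 and the second close to 0. On its own this number says nothing about sadness; it is just one of several measurements (alongside intensity and pitch variation) that a classifier could combine.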


Armed with this kind of knowledge, we could build a spectral library of emotions and move one step closer to digitally decoding our emotions.

Israeli company Beyond Verbal already develops applications and devices that analyze your emotions in real time.

The Empath app, for example, will decode your emotional state from a 30-second voice recording.


Technology like this has endless practical applications—monitoring a driver’s speech to detect fatigue or even creating emotionally aware computers that could apologize to you for freezing.

Some call centers now analyze the spectral properties of your voice, allowing them to recognize emotions, identify frustrated customers and direct them to an actual human customer service representative or alert a manager.

While this may seem a little pointless (“frustrated customers” on a telephone help line presumably includes everybody), the possibilities are endless.

So with all this voice technology, can our unique voices be used as a fingerprint?

Speaker recognition technology falls into the field of biometrics, the science of characterizing a person by their physical traits; fingerprints are the most common example.

Using sophisticated statistical algorithms to model the acoustic pattern of a voice, this branch of biometrics aims to match a voice sample with an individual.

Once only the realm of skilled dialectologists, speaker recognition is increasingly automated, and automated systems are now used by law enforcement agencies, including the FBI.


Speaker recognition technology is already reliable enough that the Australian Tax Office uses a voiceprint system for user authentication. The system, developed by US-based company Nuance Communications (the same company behind Apple’s Siri), matches a customer’s vocal password with one stored in their database. The process is a true-or-false binary decision problem; or, if you’re not so into data processing lingo, it’s one of pattern matching.
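
For the curious, the true-or-false decision can be sketched in a few lines of Python. This is not how Nuance's system works (the article does not describe its internals); it simply assumes that each voice has already been boiled down to a short list of numbers, compares two such lists with cosine similarity, and accepts the speaker if the score clears a threshold. The feature values, the function names, and the 0.95 threshold are all invented for illustration.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two feature vectors: 1.0 means they point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def verify_speaker(enrolled, sample, threshold=0.95):
    """Binary decision: does the sample match the enrolled voiceprint?"""
    return cosine_similarity(enrolled, sample) >= threshold

# Hypothetical voiceprints, each reduced to four made-up feature values.
enrolled = [0.82, 0.10, 0.45, 0.31]
same_speaker = [0.80, 0.12, 0.47, 0.29]   # small day-to-day variation
other_speaker = [0.15, 0.90, 0.05, 0.60]  # a different voice entirely

print(verify_speaker(enrolled, same_speaker))
print(verify_speaker(enrolled, other_speaker))
```

Where the threshold sits is the whole game: set it too low and impostors get in, set it too high and the real you gets locked out on a bad phone line.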


The system records your vocal waveform, including frequencies outside the range of the human ear, which encapsulates the duration, intensity, and pitch of your voice. It essentially turns your voice into a mathematical equation: your voiceprint.
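
As a rough illustration of turning a waveform into numbers, the Python sketch below pulls duration, intensity, and pitch out of a list of audio samples. It is a toy under stated assumptions: the function name is made up, the "voice" is a synthetic 200 Hz tone, intensity is taken as root-mean-square amplitude, and pitch is estimated with a simple autocorrelation search. Real voiceprint systems rely on far richer features than these three.

```python
import math

def describe_waveform(samples, sample_rate, min_hz=60, max_hz=400):
    """Return duration (s), RMS intensity, and an autocorrelation pitch estimate (Hz)."""
    duration = len(samples) / sample_rate
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))

    # Try every lag in the plausible pitch range and keep the one at which
    # the signal best lines up with a shifted copy of itself.
    best_lag, best_score = 0, 0.0
    for lag in range(sample_rate // max_hz, sample_rate // min_hz + 1):
        score = sum(samples[t] * samples[t + lag]
                    for t in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    pitch = sample_rate / best_lag if best_lag else 0.0

    return {"duration_s": duration, "intensity_rms": rms, "pitch_hz": pitch}

# A synthetic stand-in for a voice: 0.1 seconds of a 200 Hz tone.
SAMPLE_RATE = 8000
voice = [0.5 * math.sin(2 * math.pi * 200 * t / SAMPLE_RATE) for t in range(800)]

features = describe_waveform(voice, SAMPLE_RATE)
print(features)
```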

The technology does have its downsides though—it is susceptible to error due to background noise and recording quality variation, and it relies on quality pre-recorded voice samples.

One large positive, however, is that since the technology actually works by using your voice to model the shape of your vocal tract, it’s reliable even if you have a cold and sound horribly nasal. Your fake sick voice might fool your boss, but it won’t fool speaker recognition technology.

Taking a sick day might be harder with voice diagnostics

Researchers are paying increasing attention to voice’s potential use as a medical diagnostic tool by studying the vocal changes that accompany various disorders—and not just the croaky voice you get when you’ve got a cold.


One example is the importance of voice quality in the diagnosis and treatment of rheumatoid arthritis, an inflammatory joint disease. A 2015 study at the Medical University of Bialystok in Poland found that 44.44% of rheumatoid arthritis (RA) patients had coinciding voice quality disorders, and the percentage was higher in those suffering from more severe RA.

And a team at the Rhenish University of Applied Sciences Cologne is using voice recognition to detect sleepiness in air traffic controllers, as well as depression, alcohol intoxication, and more.

Noticeable voice changes may also accompany attention deficit hyperactivity disorder (ADHD). In fact, Berlin-based company AudioProfiling currently uses voice recordings and properties such as speech rhythm or the spacing of syllables to profile children with more than 90% accuracy.

Other promising areas of research include whether or not early voice changes could help detect the first stages of serious neurological diseases like Alzheimer’s and Parkinson’s disease.

Hopefully one day we’ll know enough to perform diagnostic tests cheaply over the telephone. Unfortunately, these kinds of tests only work for those lucky enough to have a voice.

Engineering voices

We all recognize the synthetic voice of Stephen Hawking. His voice characterizes him in pop culture more than his intellect, but does it really reflect him as a person?

Researchers like speech scientist Rupal Patel are now reverse engineering unique synthetic voices for people who have lost (or never had) the ability to speak.


Our voices help define our identities, so giving somebody a unique synthetic voice that fits with their vocal identity is a wonderful thing.

People who have lost the ability to talk properly may be unable to produce consonants and vowels, but they can often still generate vibrations and control pitch and volume (often called “source characteristics”) using their voice box.

By mixing the source characteristics of a person and the filter characteristics of a similar individual, speech scientists can reverse engineer that person’s voice and create a synthetic approximation of what they would really sound like. If computers now have the ability to talk, so too should all humans.
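
The source-and-filter idea can be sketched in Python. The code below is a textbook-style toy, not the method used by Patel or any other research group: it assumes the "source" is just a train of pulses at the recipient's pitch, and the "filter" is a single resonance standing in for a donor's vocal tract. Every name and number in it (the 200 Hz pitch, the 700 Hz resonance, the 8000 Hz sample rate) is an illustrative assumption.

```python
import math

def pulse_train(pitch_hz, sample_rate, n):
    """Source: one impulse per pitch period, a crude stand-in for voice box vibration."""
    period = round(sample_rate / pitch_hz)
    return [1.0 if t % period == 0 else 0.0 for t in range(n)]

def resonator(signal, formant_hz, bandwidth_hz, sample_rate):
    """Filter: a two-pole resonance, a crude stand-in for one vocal tract formant."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius, below 1 so it stays stable
    a1 = 2 * r * math.cos(2 * math.pi * formant_hz / sample_rate)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0  # the two previous output samples
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out

SAMPLE_RATE = 8000

# Pitch taken from the recipient, resonance taken from a hypothetical donor.
source = pulse_train(pitch_hz=200, sample_rate=SAMPLE_RATE, n=800)
blended = resonator(source, formant_hz=700, bandwidth_hz=100, sample_rate=SAMPLE_RATE)

print(len(blended), max(abs(v) for v in blended))
```

Swap in a different pitch and you change whose voice box is "driving" the sound; swap in a different resonance and you change whose mouth it seems to be coming out of.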

In a world where even our computers talk to us, we’re bombarded with a huge variety of voices, both real and artificial. Speech and speaker recognition technology is rapidly improving, and voiceprints are already being stored en masse around the world. But as with any super cool technological advance, it comes with the inevitable question of whether the downsides (in this case, the loss of privacy) are worth it.

“Our voice betrays an abundance of information about us, whether we want it to or not.”

Technology is constantly being developed that takes advantage of patterns in our speech, including for law enforcement and important medical diagnoses, but it’s a slippery slope. One day soon, we may have mobile apps that reveal the truth of a man’s answer to “does my bum look big in this,” detect lying spouses, or even help bosses cheat us out of hard-earned sick days. And nobody wants that.
