Sentiment analysis using speech


Sentiment analysis refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. The most commonly explored and easily available approaches use facial, text or speech recognition. A few of the common APIs can be found here. In speech recognition, sentiment analysis has been around for a long time under the name emotion analysis. It takes measurements such as pitch, shrillness, keywords in the speech and pace of speech as input parameters for analysing basic sentiments from speech and voice.


There are also many indirect approaches, such as speech-to-text conversion followed by sentence tone analysis, an example of which is this. This particular version uses both the speech-to-text service and the Tone Analyzer from IBM Watson. It considers only the words used in the speech; neither pitch nor shrillness is taken into account. Another issue is the accuracy of Watson's speech-to-text conversion, whereas CMUSphinx and the Google speech-to-text API perform better, and the Google product even takes dialect into account. A very crude command line tool that I built with Google speech-to-text and the Watson Tone Analyzer is available here.
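The indirect pipeline described above can be sketched roughly as follows. This is only an illustration: `TONE_LEXICON`, `analyse_tone`, `speech_sentiment` and `fake_transcribe` are made-up names, the tiny word list stands in for a real tone analyser, and the transcriber is a stub rather than a call to Watson, Google or CMUSphinx.

```python
from typing import Callable, Dict

# Hypothetical toy lexicon; a real tone analyser would use a trained model.
TONE_LEXICON: Dict[str, Dict[str, float]] = {
    "great": {"joy": 1.0},
    "thanks": {"joy": 0.5},
    "terrible": {"anger": 0.7, "sadness": 0.3},
    "hate": {"anger": 1.0},
}


def analyse_tone(text: str) -> Dict[str, float]:
    """Sum per-word tone weights and normalise them so they add up to 1."""
    totals: Dict[str, float] = {}
    for word in text.lower().split():
        for tone, weight in TONE_LEXICON.get(word.strip(".,!?"), {}).items():
            totals[tone] = totals.get(tone, 0.0) + weight
    norm = sum(totals.values())
    return {tone: value / norm for tone, value in totals.items()} if norm else {}


def speech_sentiment(audio: bytes, transcribe: Callable[[bytes], str]) -> Dict[str, float]:
    """Transcribe first, then score the transcript only.

    The audio bytes are never inspected after transcription, so pitch,
    pace and loudness cannot influence the result.
    """
    return analyse_tone(transcribe(audio))


def fake_transcribe(audio: bytes) -> str:
    """Stand-in for a speech-to-text service so the sketch runs offline."""
    return "I hate this terrible weather"


scores = speech_sentiment(b"\x00\x01", fake_transcribe)
print(scores)
```

Because the transcriber is passed in as a function, a real speech-to-text client could be swapped in without touching the scoring step.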

The problem with the approach mentioned above is that pitch is a major contributor to the expression of emotion, and it is completely lost in the speech-to-text conversion. The approach can be enhanced by also recording the speech as a signal; signal processing applied to that recording can then reveal the emotion hidden in the pattern of the signal.

One way to improve the above mechanism is to classify the emotion of the speech using statistics on fundamental frequency, energy contour, duration of silence and voice quality. Many techniques are used to improve the quality further, for example feeding log frequency power coefficients to a classifier (for example, a hidden Markov model). Another option is to train a neural network on a large database of phoneme-balanced words.
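As a rough sketch of the kind of statistics mentioned above, the snippet below computes a frame-wise energy contour, a crude fundamental frequency estimate from the autocorrelation peak, and the total duration of silence. The frame sizes, the silence threshold, the 75 to 400 Hz pitch search range and all function names are assumptions chosen for illustration, and the input is a synthetic tone rather than real speech.

```python
import math
from typing import Dict, List


def split_frames(signal: List[float], size: int, hop: int) -> List[List[float]]:
    """Cut the signal into overlapping frames of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def rms(frame: List[float]) -> float:
    """Root-mean-square energy of one frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def estimate_f0(frame: List[float], sample_rate: int,
                f_min: float = 75.0, f_max: float = 400.0) -> float:
    """Crude pitch estimate: the lag with the largest autocorrelation."""
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return sample_rate / best_lag if best_lag else 0.0


def prosody_stats(signal: List[float], sample_rate: int, frame_ms: int = 50,
                  hop_ms: int = 25, silence_rms: float = 0.01) -> Dict[str, float]:
    """Summarise pitch, energy and silence over the whole utterance."""
    size = int(sample_rate * frame_ms / 1000)
    hop = int(sample_rate * hop_ms / 1000)
    energies, pitches, silent_frames = [], [], 0
    for frame in split_frames(signal, size, hop):
        energy = rms(frame)
        energies.append(energy)
        if energy < silence_rms:
            silent_frames += 1          # too quiet: count as silence
        else:
            pitches.append(estimate_f0(frame, sample_rate))
    voiced = [p for p in pitches if p > 0]
    return {
        "f0_mean": sum(voiced) / len(voiced) if voiced else 0.0,
        "f0_range": max(voiced) - min(voiced) if voiced else 0.0,
        "energy_mean": sum(energies) / len(energies) if energies else 0.0,
        "silence_seconds": silent_frames * hop / sample_rate,
    }


def tone(freq: float, seconds: float, sample_rate: int, amplitude: float = 0.5) -> List[float]:
    """Synthetic sine wave used in place of recorded speech."""
    n = int(seconds * sample_rate)
    return [amplitude * math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)]


SAMPLE_RATE = 8000
# 0.5 s of a 200 Hz tone, 0.25 s of silence, then the tone again.
signal = tone(200.0, 0.5, SAMPLE_RATE) + [0.0] * (SAMPLE_RATE // 4) + tone(200.0, 0.5, SAMPLE_RATE)
stats = prosody_stats(signal, SAMPLE_RATE)
print(stats)
```

Statistics like these would then be the input features of whatever classifier is chosen; real speech would need a more robust pitch tracker than a plain autocorrelation peak.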

Thus, to get better sentiment analysis, it is important to combine spectro-temporal analysis of the speech with analysis of the phrases spoken.


Some of the challenges in doing so:

  • real-time dialogue exchange needs correct sampling at the correct time
  • the power gradient has to be adjusted to the environmental noise for spectral analysis
  • differences between dialects and languages have to be adjusted for
  • the variables have to be given the correct weights if a multi-mode approach is used
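For the weighting point in the list above, one simple option is a weighted average of the per-mode emotion scores (often called late fusion). The sketch below assumes each mode already produces scores that sum to 1; the function name `fuse`, the mode names, the emotion labels and the weights are all made up for illustration, and in practice the weights would have to be tuned on labelled data.

```python
from typing import Dict


def fuse(scores_by_mode: Dict[str, Dict[str, float]],
         weights: Dict[str, float]) -> Dict[str, float]:
    """Weighted average of per-mode emotion scores."""
    total_weight = sum(weights[mode] for mode in scores_by_mode)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    fused: Dict[str, float] = {}
    for mode, scores in scores_by_mode.items():
        for emotion, score in scores.items():
            fused[emotion] = fused.get(emotion, 0.0) + weights[mode] * score / total_weight
    return fused


# Hypothetical scores for one ambiguous phrase: the words lean towards
# agreement, while the pitch and energy lean towards anger.
scores_by_mode = {
    "text": {"agreement": 0.6, "anger": 0.4},
    "acoustic": {"agreement": 0.1, "anger": 0.9},
}
weights = {"text": 0.4, "acoustic": 0.6}

fused = fuse(scores_by_mode, weights)
label = max(fused, key=fused.get)
print(fused, label)
```

With these made-up numbers the acoustic evidence outweighs the words; shifting the weights towards the text mode moves the result back towards agreement.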


Different approaches have their own challenges. For instance, a speech-to-text model with word extraction cannot infer the right sentiment when someone says “oh right”: the same words may carry different emotions, as the speaker may be agreeing with you or may be angry.

