RIOT project log

Sentiment analysis using speech

Introduction

Sentiment analysis refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. The most commonly explored and easily available approaches use facial, text, or speech recognition; a few of the common APIs can be found here. Sentiment analysis has existed in speech recognition for a long time, where it is known as emotion analysis. It involves measuring pitch, shrillness, keywords in speech, pace of speech, and similar properties as the input parameters for analysing basic sentiments from speech and voice.

Methodologies

There are also indirect approaches, such as speech-to-text conversion followed by sentence tone analysis, an example of which can be found here. That particular version uses both the speech-to-text and the tone analyser services from IBM Watson. It considers only the words used in the speech; neither pitch nor shrillness is taken into account. Another issue is the accuracy of Watson's speech-to-text conversion, whereas CMUSphinx and the Google speech-to-text API are better performers; Google's product even accounts for dialect. A very crude custom-built command line interface tool that I built with Google speech-to-text and the Watson tone analyser is available here.
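
As a rough sketch of that kind of pipeline (not the exact tool linked above), the following assumes the SpeechRecognition and ibm-watson Python packages; the file name, API key, and service URL are placeholders:

    # Minimal sketch of the speech-to-text + tone-analysis pipeline described above.
    # Assumes the SpeechRecognition and ibm-watson packages; credentials, URL and
    # the WAV file name are placeholders.
    import speech_recognition as sr
    from ibm_watson import ToneAnalyzerV3
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    # 1. Transcribe the recording with the Google Web Speech API.
    recognizer = sr.Recognizer()
    with sr.AudioFile("utterance.wav") as source:          # placeholder file
        audio = recognizer.record(source)
    text = recognizer.recognize_google(audio)

    # 2. Send the transcript to the Watson Tone Analyzer.
    authenticator = IAMAuthenticator("YOUR_WATSON_API_KEY")    # placeholder credential
    tone_analyzer = ToneAnalyzerV3(version="2017-09-21", authenticator=authenticator)
    tone_analyzer.set_service_url("YOUR_WATSON_SERVICE_URL")   # placeholder URL

    result = tone_analyzer.tone({"text": text}, content_type="application/json").get_result()
    for tone in result["document_tone"]["tones"]:
        print(tone["tone_name"], tone["score"])

Note that, exactly as discussed above, this pipeline only ever sees the transcript: everything prosodic is discarded before the tone analyser runs.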

The problem with the approach mentioned above is that pitch is a major contributor to the expression of emotion, and it is completely lost in the speech-to-text conversion. The approach can be enhanced by also recording the speech as a signal; applying signal processing to that signal can then reveal the emotion masked in its pattern.
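
A minimal sketch of that signal-level step, assuming the librosa package (the WAV file name is a placeholder): the pitch and energy contours that transcription throws away can be read directly from the waveform.

    # Recover prosodic information from the raw signal. Assumes librosa;
    # "utterance.wav" is a placeholder file name.
    import librosa
    import numpy as np

    y, sr = librosa.load("utterance.wav", sr=16000)

    # Fundamental-frequency (pitch) contour via the pYIN estimator.
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )

    # Short-time energy contour.
    energy = librosa.feature.rms(y=y)[0]

    print("mean pitch (Hz):", np.nanmean(f0))
    print("pitch variability (Hz):", np.nanstd(f0))
    print("mean energy:", energy.mean())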

One way to improve on the above mechanism is to classify the emotion of the speech using statistics on fundamental frequency, energy contour, duration of silence, and voice quality. To further improve quality, many techniques are used: for example, log frequency power coefficients can be fed into classifiers (such as hidden Markov models). Another is to train a neural network over a large database of phoneme-balanced words.
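
A hedged sketch of that classification idea, not the exact method of any particular paper: log-mel band energies serve here as a rough stand-in for log frequency power coefficients, one Gaussian HMM per emotion is trained with hmmlearn, and a new utterance is assigned to the emotion whose model scores it highest. The training file lists are placeholders.

    # Frame-level features + per-emotion HMM classification. Assumes librosa and
    # hmmlearn; log-mel energies approximate LFPC; training data is a placeholder.
    import librosa
    import numpy as np
    from hmmlearn import hmm

    def lfpc_like_features(path, sr=16000, n_bands=12):
        """Log power in mel-spaced frequency bands for each frame (approximate LFPC)."""
        y, sr = librosa.load(path, sr=sr)
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_bands)
        return librosa.power_to_db(mel).T            # shape: (frames, n_bands)

    # Placeholder training sets: lists of WAV paths per emotion.
    training = {
        "angry":   ["angry_01.wav", "angry_02.wav"],
        "happy":   ["happy_01.wav", "happy_02.wav"],
        "neutral": ["neutral_01.wav", "neutral_02.wav"],
    }

    # One HMM per emotion, trained on the concatenated frame sequences.
    models = {}
    for emotion, paths in training.items():
        feats = [lfpc_like_features(p) for p in paths]
        lengths = [f.shape[0] for f in feats]
        X = np.vstack(feats)
        model = hmm.GaussianHMM(n_components=4, covariance_type="diag", n_iter=50)
        model.fit(X, lengths)
        models[emotion] = model

    # Classify a new utterance by the highest log-likelihood.
    test = lfpc_like_features("test_utterance.wav")
    prediction = max(models, key=lambda e: models[e].score(test))
    print("predicted emotion:", prediction)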

Thus, to get a better sentiment estimate it is important to combine spectro-temporal analysis of the speech signal with phrase-level analysis of its transcript.
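
One simple way to combine the two, sketched below with entirely hypothetical scores, is a weighted late fusion of the acoustic model's prediction and the phrase-level tone prediction. The score dictionaries and the weight are placeholders, not a specific published method.

    # Hypothetical late-fusion sketch: combine an acoustic emotion score with a
    # text-based sentiment score. Scores and the weight are made up for illustration.
    def fuse_scores(acoustic_scores, text_scores, acoustic_weight=0.6):
        """Weighted average of two {label: probability} dictionaries."""
        labels = set(acoustic_scores) | set(text_scores)
        return {
            label: acoustic_weight * acoustic_scores.get(label, 0.0)
                   + (1 - acoustic_weight) * text_scores.get(label, 0.0)
            for label in labels
        }

    # Example: the words sound positive but the prosody suggests anger.
    acoustic = {"angry": 0.7, "happy": 0.1, "neutral": 0.2}   # from the signal model
    textual  = {"angry": 0.1, "happy": 0.6, "neutral": 0.3}   # from the tone analyser
    fused = fuse_scores(acoustic, textual)
    print(max(fused, key=fused.get), fused)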

Challenges

Different approaches have their own challenges. For instance, the speech-to-text model with word extraction may not capture the right sentiment: someone saying "oh right" may have very different emotions behind the same words; they may be in agreement with you, or they may be angry.

Conclusion

Lexical analysis alone cannot resolve that kind of ambiguity, so a more reliable reading of the speaker's sentiment comes from pairing the phrase-level analysis of the transcript with spectro-temporal analysis of the speech signal itself.