RIOT project log

Sentiment analysis using speech

Introduction

Sentiment analysis refers to the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. The most commonly explored and easily available approaches use facial, text, or speech recognition; a few of the common APIs can be found here. Sentiment analysis has existed in speech recognition for a long time, where it is known as emotion analysis. It involves measuring pitch, shrillness, keywords in speech, pace of speech, and similar properties as the input parameters for analysing basic sentiments from speech and voice.

Methodologies

There are also indirect approaches, such as speech-to-text conversion followed by sentence tone analysis, an example of which can be found here. That particular version uses both the speech-to-text and the tone analyser services from IBM Watson. It considers only the words used in the speech; neither pitch nor shrillness is taken into account. Another issue is the accuracy of Watson's speech-to-text conversion, whereas CMUSphinx and the Google speech-to-text API are better performers; Google's product even accounts for dialect. A very crude custom-built command line interface tool that I built with Google speech-to-text and the Watson tone analyser is available here.
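
As a rough sketch of that kind of pipeline (not the exact tool linked above), the following assumes the SpeechRecognition and ibm-watson Python packages; the file name, API key, and service URL are placeholders:

    # Minimal sketch of the speech-to-text + tone-analysis pipeline described above.
    # Assumes the SpeechRecognition and ibm-watson packages; credentials, URL and
    # the WAV file name are placeholders.
    import speech_recognition as sr
    from ibm_watson import ToneAnalyzerV3
    from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

    # 1. Transcribe the recording with the Google Web Speech API.
    recognizer = sr.Recognizer()
    with sr.AudioFile("utterance.wav") as source:          # placeholder file
        audio = recognizer.record(source)
    text = recognizer.recognize_google(audio)

    # 2. Send the transcript to the Watson Tone Analyzer.
    authenticator = IAMAuthenticator("YOUR_WATSON_API_KEY")    # placeholder credential
    tone_analyzer = ToneAnalyzerV3(version="2017-09-21", authenticator=authenticator)
    tone_analyzer.set_service_url("YOUR_WATSON_SERVICE_URL")   # placeholder URL

    result = tone_analyzer.tone({"text": text}, content_type="application/json").get_result()
    for tone in result["document_tone"]["tones"]:
        print(tone["tone_name"], tone["score"])

Note that, exactly as discussed above, this pipeline only ever sees the transcript: everything prosodic is discarded before the tone analyser runs.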

The problem with the approach mentioned above is that pitch is a major contributor to the expression of emotion, and it is completely lost in the speech-to-text conversion. The approach can be enhanced by also recording the speech as a signal; applying signal processing to that signal can then reveal the emotion masked in its pattern.
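
A minimal sketch of that signal-level step, assuming the librosa package (the WAV file name is a placeholder): the pitch and energy contours that transcription throws away can be read directly from the waveform.

    # Recover prosodic information from the raw signal. Assumes librosa;
    # "utterance.wav" is a placeholder file name.
    import librosa
    import numpy as np

    y, sr = librosa.load("utterance.wav", sr=16000)

    # Fundamental-frequency (pitch) contour via the pYIN estimator.
    f0, voiced_flag, voiced_prob = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )

    # Short-time energy contour.
    energy = librosa.feature.rms(y=y)[0]

    print("mean pitch (Hz):", np.nanmean(f0))
    print("pitch variability (Hz):", np.nanstd(f0))
    print("mean energy:", energy.mean())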

One way to improve on the above mechanism is to classify the emotion of the speech using statistics on fundamental frequency, energy contour, duration of silence, and voice quality. To further improve quality, many techniques are used: for example, log frequency power coefficients can be fed into classifiers (such as hidden Markov models). Another is to train a neural network over a large database of phoneme-balanced words.
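
A hedged sketch of that classification idea, not the exact method of any particular paper: log-mel band energies serve here as a rough stand-in for log frequency power coefficients, one Gaussian HMM per emotion is trained with hmmlearn, and a new utterance is assigned to the emotion whose model scores it highest. The training file lists are placeholders.

    # Frame-level features + per-emotion HMM classification. Assumes librosa and
    # hmmlearn; log-mel energies approximate LFPC; training data is a placeholder.
    import librosa
    import numpy as np
    from hmmlearn import hmm

    def lfpc_like_features(path, sr=16000, n_bands=12):
        """Log power in mel-spaced frequency bands for each frame (approximate LFPC)."""
        y, sr = librosa.load(path, sr=sr)
        mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_bands)
        return librosa.power_to_db(mel).T            # shape: (frames, n_bands)

    # Placeholder training sets: lists of WAV paths per emotion.
    training = {
        "angry":   ["angry_01.wav", "angry_02.wav"],
        "happy":   ["happy_01.wav", "happy_02.wav"],
        "neutral": ["neutral_01.wav", "neutral_02.wav"],
    }

    # One HMM per emotion, trained on the concatenated frame sequences.
    models = {}
    for emotion, paths in training.items():
        feats = [lfpc_like_features(p) for p in paths]
        lengths = [f.shape[0] for f in feats]
        X = np.vstack(feats)
        model = hmm.GaussianHMM(n_components=4, covariance_type="diag", n_iter=50)
        model.fit(X, lengths)
        models[emotion] = model

    # Classify a new utterance by the highest log-likelihood.
    test = lfpc_like_features("test_utterance.wav")
    prediction = max(models, key=lambda e: models[e].score(test))
    print("predicted emotion:", prediction)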

Thus, to get a better sentiment estimate it is important to combine spectro-temporal analysis of the speech signal with phrase-level analysis of its transcript.
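
One simple way to combine the two, sketched below with entirely hypothetical scores, is a weighted late fusion of the acoustic model's prediction and the phrase-level tone prediction. The score dictionaries and the weight are placeholders, not a specific published method.

    # Hypothetical late-fusion sketch: combine an acoustic emotion score with a
    # text-based sentiment score. Scores and the weight are made up for illustration.
    def fuse_scores(acoustic_scores, text_scores, acoustic_weight=0.6):
        """Weighted average of two {label: probability} dictionaries."""
        labels = set(acoustic_scores) | set(text_scores)
        return {
            label: acoustic_weight * acoustic_scores.get(label, 0.0)
                   + (1 - acoustic_weight) * text_scores.get(label, 0.0)
            for label in labels
        }

    # Example: the words sound positive but the prosody suggests anger.
    acoustic = {"angry": 0.7, "happy": 0.1, "neutral": 0.2}   # from the signal model
    textual  = {"angry": 0.1, "happy": 0.6, "neutral": 0.3}   # from the tone analyser
    fused = fuse_scores(acoustic, textual)
    print(max(fused, key=fused.get), fused)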

Challenges

Different approaches have their own challenges. For instance, the speech-to-text model with word extraction may not capture the right sentiment: someone saying "oh right" may have very different emotions behind the same words; they may be in agreement with you, or they may be angry.

Conclusion

Lexical analysis alone cannot resolve that kind of ambiguity, so a more reliable reading of the speaker's sentiment comes from pairing the phrase-level analysis of the transcript with spectro-temporal analysis of the speech signal itself.