AI offers considerable potential for systems that can accurately identify emotions from people’s faces or voices. The most promising techniques use machine learning or deep learning to extract features from face images or speech recordings that are indicative of emotional state; once those features have been extracted, a classifier can be trained to map them to specific emotion categories.
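As a minimal sketch of that two-stage pipeline, the snippet below assumes the features have already been extracted (one feature vector per face image or utterance) and simply trains a standard classifier on top of them. The synthetic data, the 128-dimensional feature size, the seven emotion classes, and the choice of an SVM are all illustrative assumptions, not a fixed recipe.

```python
# Two-stage pipeline sketch: features are assumed to be already extracted
# (one row per face image or utterance); a classifier maps them to emotions.
# The data here is synthetic and purely illustrative.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 128))      # 500 samples, 128 extracted features (assumed)
y = rng.integers(0, 7, size=500)     # 7 emotion classes (assumed label set)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Standardise the features, then fit a standard SVM classifier on them.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```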
One promising approach is to use a convolutional neural network (CNN) to extract features from faces. CNNs are well suited to this task because they learn hierarchical spatial patterns directly from pixel data, from edges and textures up to facial configurations, and research has shown that the features they learn can be highly effective at predicting emotion categories.
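A minimal sketch of such a CNN is shown below, assuming 48x48 grayscale face crops (the format used by the public FER-2013 dataset) and seven emotion classes; the layer sizes and other hyperparameters are illustrative rather than tuned.

```python
# Minimal CNN sketch for facial emotion classification.
# Assumes 48x48 grayscale face crops and 7 emotion classes (illustrative).
import tensorflow as tf
from tensorflow.keras import layers, models

num_classes = 7

model = models.Sequential([
    layers.Input(shape=(48, 48, 1)),
    layers.Conv2D(32, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu", padding="same"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Training would then look something like:
# model.fit(train_images, train_labels, validation_split=0.1, epochs=20)
```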
Another approach is to use a long short-term memory (LSTM) network, a type of recurrent deep learning model designed for sequential data such as audio. Because emotional cues in speech unfold over time, in changes of pitch, energy, and speaking rate, for example, LSTMs have proved effective at modelling the temporal dynamics of emotional speech.
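The sketch below illustrates this idea, assuming each utterance has been converted to a padded sequence of MFCC frames and labelled with one of seven emotions; the frame count, feature dimension, and layer sizes are illustrative assumptions.

```python
# Minimal LSTM sketch for speech emotion classification.
# Assumes each utterance is a sequence of MFCC frames, zero-padded or
# truncated to 200 frames of 40 coefficients, with 7 classes (illustrative).
import tensorflow as tf
from tensorflow.keras import layers, models

max_frames, n_mfcc, num_classes = 200, 40, 7

model = models.Sequential([
    layers.Input(shape=(max_frames, n_mfcc)),
    layers.Masking(mask_value=0.0),        # ignore zero-padded frames
    layers.LSTM(128, return_sequences=True),
    layers.LSTM(64),                       # final state summarises the utterance
    layers.Dense(64, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# model.fit(mfcc_sequences, labels, validation_split=0.1, epochs=30)
```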
Both of these approaches hold promise for accurately identifying emotions from people’s faces or voices, but work remains to refine the algorithms and make them more robust. It is also important to note that emotions are complex phenomena that can be difficult to identify, so no single AI system is likely to recognise every emotion correctly all the time. Nevertheless, AI systems hold great promise for improving our ability to interpret and understand emotions.