Emotion Recognition
This guide demonstrates how to perform Emotion Recognition with the Phonexia Speech Platform 4 Virtual Appliance. You can find a high-level description in the Emotion Recognition article. The technology detects and classifies emotions in media files.
For testing, we'll be using the following 8 media files. You can download them all together in the audio_files.zip archive.
| Filename | Channel | Happy | Neutral | Sad | Angry |
|---|---|---|---|---|---|
| Barbara.wav | 0 | 79.9% | 19% | 0.6% | 0.5% |
| David.wav | 0 | 0.1% | 0.2% | 0% | 99.7% |
| David.wav | 1 | 9.9% | 63.9% | 0.1% | 26% |
| Jack.wav | 0 | 4.7% | 92.7% | 0% | 2.6% |
| Jack_Keith.wav | 0 | 5.7% | 85.6% | 0% | 8.7% |
| Jack_Keith.wav | 1 | 0.3% | 5.6% | 9.8% | 84.3% |
| Jiri.wav | 0 | 0.4% | 1.4% | 0% | 98.1% |
| Juan.wav | 0 | 13.3% | 47.6% | 38.6% | 0.5% |
| Laura_Marek.wav | 0 | 0.3% | 95.1% | 4.3% | 0.4% |
| Laura_Marek.wav | 1 | 0.1% | 99.9% | 0% | 0.1% |
| Steve.wav | 0 | 0.2% | 76.2% | 23.4% | 0.1% |
At the end of this guide, you'll find the full Python code example that combines all the steps discussed separately below. This guide should give you a comprehensive understanding of how to integrate Emotion Recognition into your own projects.
Prerequisites
Follow the prerequisites for setting up the Virtual Appliance and the Python environment, as described in the Task lifecycle code examples.
Run Emotion Recognition
To run Emotion Recognition for a single media file, start by sending a POST request to the /api/technology/emotion-recognition endpoint. The file parameter is the only mandatory one. In Python, you can do this as follows:
import requests

VIRTUAL_APPLIANCE_ADDRESS = "http://<virtual-appliance-address>:8000"  # Replace with your address
MEDIA_FILE_BASED_ENDPOINT_URL = f"{VIRTUAL_APPLIANCE_ADDRESS}/api/technology/emotion-recognition"

media_file = "Barbara.wav"

with open(media_file, mode="rb") as file:
    files = {"file": file}
    start_task_response = requests.post(
        url=MEDIA_FILE_BASED_ENDPOINT_URL,
        files=files,
    )

print(start_task_response.status_code)  # Should print '202'
If the task is successfully accepted, a 202 status code is returned together with a unique task ID in the response body. The task isn't processed immediately; it is only scheduled for processing. You can check the current task status by polling for the result.
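For example, you can read the task ID and the polling URL right after submitting the task. This is a minimal sketch: the task_id path in the response body is an assumption based on the result schema shown below, while the Location header is the same one used in the full example at the end of this guide.

task_id = start_task_response.json()["task"]["task_id"]  # Assumed body shape, mirroring the result schema below
polling_url = start_task_response.headers["Location"]  # URL to poll for the task status
print(f"Task {task_id} scheduled; poll {polling_url} for the result.")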
Polling
To obtain the final result, periodically query the task status until the task
state changes to done, failed or rejected. The general polling procedure
is described in detail in the
Task lifecycle code examples.
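As a minimal sketch, assuming the polling_url obtained above, the polling loop can look like this (the full example below wraps the same logic in a poll_result function):

import time

while True:
    polling_response = requests.get(polling_url)
    polling_response.raise_for_status()
    task_state = polling_response.json()["task"]["state"]
    if task_state in {"done", "failed", "rejected"}:
        break  # Final state reached; polling_response now holds the result.
    time.sleep(5)  # Wait 5 seconds between status checks.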
Result for Emotion Recognition
The result field of the task contains a channels list with an independent result for each channel, identified by its channel_number. The speech_length field indicates the duration of speech (in seconds) used for recognizing the emotions. The scores list contains probability values for the recognized emotions.
The task result has the following structure (the values shown are illustrative):
{
  "task": {
    "task_id": "123e4567-e89b-12d3-a456-426614174000",
    "state": "done"
  },
  "result": {
    "channels": [
      {
        "channel_number": 0,
        "speech_length": 13.5,
        "scores": [
          { "emotion": "HAPPY", "probability": 0.85 },
          { "emotion": "NEUTRAL", "probability": 0.1 },
          { "emotion": "SAD", "probability": 0.05 },
          { "emotion": "ANGRY", "probability": 0.0 }
        ]
      },
      {
        "channel_number": 1,
        "speech_length": 13.5,
        "scores": [
          { "emotion": "HAPPY", "probability": 0.75 },
          { "emotion": "NEUTRAL", "probability": 0.15 },
          { "emotion": "SAD", "probability": 0.1 },
          { "emotion": "ANGRY", "probability": 0.0 }
        ]
      }
    ]
  }
}
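Once the task is done, you will typically want to extract the most probable emotion per channel. The top_emotion helper below is hypothetical (not part of the API) and assumes result holds the result field of a finished task:

def top_emotion(channel):
    # Hypothetical helper: return the highest-probability emotion in one channel.
    best = max(channel["scores"], key=lambda score: score["probability"])
    return best["emotion"], best["probability"]

for channel in result["channels"]:
    emotion, probability = top_emotion(channel)
    print(f"Channel {channel['channel_number']}: {emotion} ({probability:.1%})")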
Full Python Code
Here is the full example of how to run the Emotion Recognition technology. The code is slightly adjusted and wrapped into functions for better readability. Refer to the Task lifecycle code examples for a generic code template applicable to all technologies.
import json
import time

import requests

VIRTUAL_APPLIANCE_ADDRESS = "http://<virtual-appliance-address>:8000"  # Replace with your address
MEDIA_FILE_BASED_ENDPOINT_URL = f"{VIRTUAL_APPLIANCE_ADDRESS}/api/technology/emotion-recognition"


def poll_result(polling_url, polling_interval=5):
    """Poll the task endpoint until processing completes."""
    while True:
        polling_task_response = requests.get(polling_url)
        polling_task_response.raise_for_status()
        polling_task_response_json = polling_task_response.json()
        task_state = polling_task_response_json["task"]["state"]
        if task_state in {"done", "failed", "rejected"}:
            break
        time.sleep(polling_interval)
    return polling_task_response


def run_media_based_task(media_file, params=None, config=None):
    """Create a media-based task and wait for results."""
    if params is None:
        params = {}
    if config is None:
        config = {}
    with open(media_file, mode="rb") as file:
        files = {"file": file}
        start_task_response = requests.post(
            url=MEDIA_FILE_BASED_ENDPOINT_URL,
            files=files,
            params=params,
            data={"config": json.dumps(config)},
        )
    start_task_response.raise_for_status()
    # The URL for polling the task status is returned in the Location header.
    polling_url = start_task_response.headers["Location"]
    task_result = poll_result(polling_url)
    return task_result.json()


# Run Emotion Recognition
media_files = [
    "Barbara.wav",
    "David.wav",
    "Jack.wav",
    "Jack_Keith.wav",
    "Jiri.wav",
    "Juan.wav",
    "Laura_Marek.wav",
    "Steve.wav",
]

for media_file in media_files:
    print(f"Running Emotion Recognition for file {media_file}.")
    media_file_based_task = run_media_based_task(media_file)
    media_file_based_task_result = media_file_based_task["result"]
    print(json.dumps(media_file_based_task_result, indent=2))