
Gender Identification

This guide demonstrates how to perform Gender Identification with Phonexia Speech Platform 4. You can find a high-level description in the Gender Identification article. The technology can identify gender in media files or in voiceprints. This guide will show you how to do both.

For testing, we'll be using the following 17 media files in various languages. You can download them all together in the audio_files.zip archive.

filename       gender    filename      gender    filename       gender
Adedewe.wav    male      Lenka.wav     female    Tatiana.wav    female
Dina.wav       female    Lubica.wav    female    Thida.wav      female
Fadimatu.wav   female    Luka.wav      male      Tuan.wav       male
Harry.wav      male      Nirav.wav     male      Xiang.wav      male
Juan.wav       male      Noam.wav      male      Zoltan.wav     male
Julia.wav      female    Obioma.wav    female

At the end of this guide, you'll find a full Python code example that combines all the steps discussed separately below. This guide should give you a comprehensive understanding of how to integrate Gender Identification into your own projects.

Prerequisites

Set up the Virtual Appliance and the Python environment by following the prerequisites described in the Task lifecycle code examples.

Run Gender Identification from file

To run Gender Identification for a single media file, you should start by sending a POST request to the /api/technology/gender-identification endpoint. file is the only mandatory parameter. In Python, you can do this as follows:

import requests

VIRTUAL_APPLIANCE_ADDRESS = "http://<virtual-appliance-address>:8000" # Replace with your address
MEDIA_FILE_BASED_ENDPOINT_URL = f"{VIRTUAL_APPLIANCE_ADDRESS}/api/technology/gender-identification"

media_file = "Adedewe.wav"

with open(media_file, mode="rb") as file:
    files = {"file": file}
    start_task_response = requests.post(
        url=MEDIA_FILE_BASED_ENDPOINT_URL,
        files=files,
    )
print(start_task_response.status_code)  # Should print '202'

If the task has been successfully accepted, the 202 code will be returned together with a unique task ID in the response body. The task isn't processed immediately, but only scheduled for processing. You can check the current task status by polling for the result.
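As a rough sketch, the accepted response can be turned into a polling URL as shown below. The helper name `polling_url_from_response` is hypothetical and introduced only for illustration; the `Location` header is the one read by the full example at the end of this guide. The stand-in response object exists only so the sketch runs without a Virtual Appliance.

```python
from types import SimpleNamespace


def polling_url_from_response(response):
    """Return the polling URL of an accepted task, or raise if it was not accepted.

    Hypothetical helper: works with any object exposing status_code and
    headers, such as a requests.Response.
    """
    if response.status_code != 202:
        raise RuntimeError(f"Task was not accepted (HTTP {response.status_code}).")
    return response.headers["Location"]


# Stand-in for a real response, used here only so the sketch runs offline.
# The URL is a placeholder, not a real endpoint.
fake_response = SimpleNamespace(
    status_code=202,
    headers={"Location": "<polling-url>"},
)
polling_url = polling_url_from_response(fake_response)
```

With a real request, you would pass `start_task_response` from the snippet above instead of the stand-in object.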

Polling

To obtain the final result, periodically query the task status until the task state changes to done, failed or rejected. The general polling procedure is described in detail in the Task lifecycle code examples.
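The polling loop can be sketched roughly as follows. This is a minimal sketch rather than the platform's reference implementation: `wait_for_task`, `fetch_task`, and the `timeout` parameter are hypothetical names introduced here. Only the terminal states (done, failed, rejected) and the `task.state` field come from this guide.

```python
import time

TERMINAL_STATES = {"done", "failed", "rejected"}


def wait_for_task(fetch_task, polling_interval=5.0, timeout=300.0):
    """Call fetch_task() repeatedly until the task reaches a terminal state.

    fetch_task is a hypothetical callable returning the parsed JSON body of
    the task status response, for example:
        lambda: requests.get(polling_url).json()
    """
    deadline = time.monotonic() + timeout
    while True:
        body = fetch_task()
        if body["task"]["state"] in TERMINAL_STATES:
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError("Task did not finish within the timeout.")
        time.sleep(polling_interval)
```

Passing the fetch step in as a callable keeps the loop independent of the HTTP client, and the timeout guards against waiting forever if a task never reaches a terminal state.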

Result for Gender Identification from file

The result field of the task contains information about individual input media channels which can be identified by their channel_number. The speech_length field shows how much speech was used for producing the probabilities which are shown separately for male and female gender in the scores object.

The following JSON shows the result of a successful Gender Identification task for the Adedewe.wav file. The gender was correctly identified as male with a probability of 0.99434.

{
  "task": {
    "task_id": "db414429-2b56-46b2-bf53-51e2a813b6da",
    "state": "done"
  },
  "result": {
    "channels": [
      {
        "channel_number": 0,
        "speech_length": 30.88,
        "scores": {
          "male": {
            "probability": 0.99434
          },
          "female": {
            "probability": 0.00566
          }
        }
      }
    ]
  }
}
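To turn the scores into a single label per channel, one option is to pick the gender with the highest probability. The following is a minimal sketch; `most_likely_gender` is a hypothetical helper name, and the input is the Adedewe.wav result shown above.

```python
def most_likely_gender(channel):
    """Return (label, probability) for the highest-scoring gender in one channel."""
    scores = channel["scores"]
    label = max(scores, key=lambda name: scores[name]["probability"])
    return label, scores[label]["probability"]


# Result for Adedewe.wav, copied from the JSON above
task = {
    "task": {"task_id": "db414429-2b56-46b2-bf53-51e2a813b6da", "state": "done"},
    "result": {
        "channels": [
            {
                "channel_number": 0,
                "speech_length": 30.88,
                "scores": {
                    "male": {"probability": 0.99434},
                    "female": {"probability": 0.00566},
                },
            }
        ]
    },
}

for channel in task["result"]["channels"]:
    label, probability = most_likely_gender(channel)
    print(f"Channel {channel['channel_number']}: {label} ({probability:.5f})")
# Prints: Channel 0: male (0.99434)
```

Iterating over all channels instead of reading only index 0 keeps the sketch usable for multi-channel media.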

Run Gender Identification from voiceprints

Gender identification can be performed on voiceprints extracted from media files with the Voiceprint Extraction technology. To run Voiceprint Extraction, follow the instructions in the Speaker Search technology guide.

For testing, we'll be using voiceprints extracted from the test recordings used in the Run Gender Identification from file section. You can find the voiceprints in the voiceprints.zip archive.

To run Gender Identification for a set of voiceprints, you should start by sending a POST request to the /api/technology/gender-identification-voiceprints endpoint. The list of voiceprints is the only mandatory field in the request body. In Python, you can do this as follows (each voiceprint is stored in a separate .vp file):

import requests

VIRTUAL_APPLIANCE_ADDRESS = "http://<virtual-appliance-address>:8000" # Replace with your address
VOICEPRINT_BASED_ENDPOINT_URL = f"{VIRTUAL_APPLIANCE_ADDRESS}/api/technology/gender-identification-voiceprints"

voiceprint_files = [
"Adedewe.vp",
"Dina.vp",
"Fadimatu.vp",
"Harry.vp",
"Juan.vp",
"Julia.vp",
"Lenka.vp",
"Lubica.vp",
"Luka.vp",
"Nirav.vp",
"Noam.vp",
"Obioma.vp",
"Tatiana.vp",
"Thida.vp",
"Tuan.vp",
"Xiang.vp",
"Zoltan.vp",
]

voiceprints = []
for voiceprint_file in voiceprint_files:
    with open(voiceprint_file) as f:
        voiceprints.append(f.read())

start_task_response = requests.post(
    url=VOICEPRINT_BASED_ENDPOINT_URL,
    json={"voiceprints": voiceprints},
)
print(start_task_response.status_code) # Should print '202'

Polling

To obtain the final result, periodically query the task status until the task state changes to done, failed or rejected. The general polling procedure is described in detail in the Task lifecycle code examples.

Result for Gender Identification from voiceprints

The result field of the task contains the following output (shortened for readability). The voiceprint_scores come in the same order as the input voiceprints. Notice that the result for Adedewe.vp is exactly the same as the result identified from the media file.

{
  "task": {
    "task_id": "26776c99-ac92-4df0-a0f3-3beb1498f4ee",
    "state": "done"
  },
  "result": {
    "voiceprint_scores": [
      {
        "speech_length": 30.88,
        "scores": {
          "male": {
            "probability": 0.99434
          },
          "female": {
            "probability": 0.00566
          }
        }
      },
      {
        "speech_length": 19.52,
        "scores": {
          "male": {
            "probability": 0.00115
          },
          "female": {
            "probability": 0.99885
          }
        }
      },
      {
        "speech_length": 23.84,
        "scores": {
          "male": {
            "probability": 0.1778
          },
          "female": {
            "probability": 0.8222
          }
        }
      },
      ...
    ]
  }
}
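Because the scores come back in input order, they can be paired with the voiceprint file names and checked against the expected genders from the table at the start of this guide. The sketch below uses only the three entries shown in the shortened output; `label_voiceprints` is a hypothetical helper name.

```python
def label_voiceprints(filenames, voiceprint_scores):
    """Map each voiceprint file name to its highest-scoring gender label."""
    if len(filenames) != len(voiceprint_scores):
        raise ValueError("Expected one score entry per input voiceprint.")
    labels = {}
    for filename, entry in zip(filenames, voiceprint_scores):
        scores = entry["scores"]
        labels[filename] = max(scores, key=lambda name: scores[name]["probability"])
    return labels


# First three inputs and their scores, copied from the output above
voiceprint_files = ["Adedewe.vp", "Dina.vp", "Fadimatu.vp"]
voiceprint_scores = [
    {"speech_length": 30.88,
     "scores": {"male": {"probability": 0.99434}, "female": {"probability": 0.00566}}},
    {"speech_length": 19.52,
     "scores": {"male": {"probability": 0.00115}, "female": {"probability": 0.99885}}},
    {"speech_length": 23.84,
     "scores": {"male": {"probability": 0.1778}, "female": {"probability": 0.8222}}},
]

# Expected genders taken from the table of test files
expected = {"Adedewe.vp": "male", "Dina.vp": "female", "Fadimatu.vp": "female"}

labels = label_voiceprints(voiceprint_files, voiceprint_scores)
for filename, label in labels.items():
    status = "correct" if label == expected[filename] else "incorrect"
    print(f"{filename}: {label} ({status})")
```

The length check is a defensive assumption: `zip` would otherwise silently drop entries if the number of scores ever differed from the number of inputs.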

Full Python Code

Here is the full example of how to run the Gender Identification technology with both media files and voiceprints as input data. The code is slightly adjusted and wrapped into functions. Refer to the Task lifecycle code examples for a generic code template applicable to all technologies.

The scores_from_file.json and scores_from_voiceprints.json files contain the results of the Gender Identification. Notice that the results are identical except for the filename extensions (wav vs vp) and the extra channel_number information in scores_from_file.json.

import json
import requests
import time

VIRTUAL_APPLIANCE_ADDRESS = "http://<virtual-appliance-address>:8000" # Replace with your address

MEDIA_FILE_BASED_ENDPOINT_URL = f"{VIRTUAL_APPLIANCE_ADDRESS}/api/technology/gender-identification"
VOICEPRINT_BASED_ENDPOINT_URL = f"{VIRTUAL_APPLIANCE_ADDRESS}/api/technology/gender-identification-voiceprints"


def poll_result(polling_url, polling_interval=5):
    """Poll the task endpoint until processing completes."""
    while True:
        polling_task_response = requests.get(polling_url)
        polling_task_response.raise_for_status()
        polling_task_response_json = polling_task_response.json()
        task_state = polling_task_response_json["task"]["state"]
        if task_state in {"done", "failed", "rejected"}:
            break
        time.sleep(polling_interval)
    return polling_task_response


def run_media_based_task(media_file, params=None, config=None):
    """Create a media-based task and wait for results."""
    if params is None:
        params = {}
    if config is None:
        config = {}

    with open(media_file, mode="rb") as file:
        files = {"file": file}
        start_task_response = requests.post(
            url=MEDIA_FILE_BASED_ENDPOINT_URL,
            files=files,
            params=params,
            data={"config": json.dumps(config)},
        )
    start_task_response.raise_for_status()
    polling_url = start_task_response.headers["Location"]
    task_result = poll_result(polling_url)
    return task_result.json()


def run_voiceprint_based_task(json_payload):
    """Create a voiceprint-based task and wait for results."""
    start_task_response = requests.post(
        url=VOICEPRINT_BASED_ENDPOINT_URL,
        json=json_payload,
    )
    start_task_response.raise_for_status()
    polling_url = start_task_response.headers["Location"]
    task_result = poll_result(polling_url)
    return task_result.json()


# Run Gender Identification from media files
media_files = [
"Adedewe.wav",
"Dina.wav",
"Fadimatu.wav",
"Harry.wav",
"Juan.wav",
"Julia.wav",
"Lenka.wav",
"Lubica.wav",
"Luka.wav",
"Nirav.wav",
"Noam.wav",
"Obioma.wav",
"Tatiana.wav",
"Thida.wav",
"Tuan.wav",
"Xiang.wav",
"Zoltan.wav",
]

media_file_based_results = {}
for media_file in media_files:
    print(f"Running Gender Identification for file {media_file}.")
    media_file_based_task = run_media_based_task(media_file)
    # The files are mono-channel, so we access the result in the first channel (index 0)
    media_file_based_task_result = media_file_based_task["result"]["channels"][0]
    media_file_based_results[media_file] = media_file_based_task_result
    print(f"The result for {media_file} is: {media_file_based_task_result}")

# Save the results to a file
with open("scores_from_file.json", "w") as output_file:
    json.dump(media_file_based_results, output_file, indent=2)


# Run Gender Identification from voiceprints
voiceprint_files = [
"Adedewe.vp",
"Dina.vp",
"Fadimatu.vp",
"Harry.vp",
"Juan.vp",
"Julia.vp",
"Lenka.vp",
"Lubica.vp",
"Luka.vp",
"Nirav.vp",
"Noam.vp",
"Obioma.vp",
"Tatiana.vp",
"Thida.vp",
"Tuan.vp",
"Xiang.vp",
"Zoltan.vp",
]

voiceprints = []
for voiceprint_file in voiceprint_files:
    with open(voiceprint_file) as f:
        voiceprints.append(f.read())

print(f"Running Gender Identification for {len(voiceprint_files)} voiceprints.")
voiceprint_based_task = run_voiceprint_based_task({"voiceprints": voiceprints})
voiceprint_based_task_result = voiceprint_based_task["result"]["voiceprint_scores"]

# Map the results to input voiceprint names
results_per_voiceprint = {
    filename: result
    for filename, result in zip(voiceprint_files, voiceprint_based_task_result)
}

print(f"The results are: {results_per_voiceprint}")

# Save the results to a file
with open("scores_from_voiceprints.json", "w") as output_file:
    json.dump(results_per_voiceprint, output_file, indent=2)
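
The statement that the two result files differ only in the filename extensions and the extra channel_number field can be checked with a short comparison. This is a minimal sketch under stated assumptions: `normalize` and `results_match` are hypothetical helper names, and the sample data below reuses the Adedewe values shown earlier instead of reading the real files.

```python
from pathlib import Path


def normalize(results):
    """Key results by base file name and drop the file-only channel_number field."""
    return {
        Path(filename).stem: {
            key: value for key, value in entry.items() if key != "channel_number"
        }
        for filename, entry in results.items()
    }


def results_match(file_results, voiceprint_results):
    """Return True if both result sets agree after normalization."""
    return normalize(file_results) == normalize(voiceprint_results)


# In-memory sample using the Adedewe values shown earlier in this guide.
# With real output, load both dictionaries with json.load() from
# scores_from_file.json and scores_from_voiceprints.json instead.
adedewe_scores = {"male": {"probability": 0.99434}, "female": {"probability": 0.00566}}
file_results = {
    "Adedewe.wav": {"channel_number": 0, "speech_length": 30.88, "scores": adedewe_scores}
}
voiceprint_results = {
    "Adedewe.vp": {"speech_length": 30.88, "scores": adedewe_scores}
}

print(results_match(file_results, voiceprint_results))  # Prints: True
```

The comparison uses exact equality on the probabilities. If results from different runs or platform versions are compared, a tolerance-based comparison may be more appropriate.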