Authenticity Verification

Experimental feature

Please note that Authenticity Verification is an experimental feature.
It is under development and may change in the future.

This guide demonstrates how to perform Authenticity Verification with Phonexia Speech Platform 4.

The Authenticity Verification technology enables you to determine if audio data is genuine. The technology is designed to offer various operations for verifying audio authenticity.

Replay Attack Detection high level description.
Audio Manipulation Detection high level description.

We encourage you to read the documentation for each operation to learn more about their features and capabilities.

Model versions

Note that all example results were obtained with the specific model versions listed below and may change in future releases.

Replay Attack Detection: beta:1.0.0
Audio Manipulation Detection: beta:1.0.0

In this guide, we'll be using the following media files. You can download them all together in the audio_files.zip archive.

filename	replay attack	audio manipulation
Bridget.wav	no	2 anomalies
Graham.wav	no	1 anomaly
Hans.wav	yes	no anomaly

At the end of this guide, you'll find the full Python code example that combines all the steps that will first be discussed separately. This guide should give you a comprehensive understanding of how to integrate Authenticity Verification into your own projects.

Prerequisites

Follow the prerequisites for setup of Virtual Appliance and Python environment as described in the Task lifecycle code examples.

Run Authenticity Verification (default scenario)

To run Authenticity Verification for a single media file, you should start by sending a POST request to the /api/technology/experimental/authenticity-verification endpoint. file is the only mandatory parameter. By default, all available operations are performed.

In Python, you can do this as follows:

import requests

VIRTUAL_APPLIANCE_ADDRESS = "http://<virtual-appliance-address>:8000"  # Replace with your address
MEDIA_FILE_BASED_ENDPOINT_URL = f"{VIRTUAL_APPLIANCE_ADDRESS}/api/technology/experimental/authenticity-verification"

media_file = "Bridget.wav"

with open(media_file, mode="rb") as file:
    files = {"file": file}
    start_task_response = requests.post(
        url=MEDIA_FILE_BASED_ENDPOINT_URL,
        files=files,
    )
print(start_task_response.status_code)  # Should print '202'

If the task has been successfully accepted, the 202 code will be returned together with a unique task ID in the response body. The task isn't processed immediately, but only scheduled for processing. You can check the current task status by polling for the result.

Polling

To obtain the final result, periodically query the task status until the task state changes to done, failed or rejected. The general polling procedure is described in detail in the Task lifecycle code examples.
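As a minimal sketch of the polling step, the loop below stops on any of the three final states and additionally gives up after a timeout. The fetch_task callable, the function name, and the timeout value are assumptions made for illustration and are not part of the platform API; in practice fetch_task would wrap a GET request to the polling URL.

```python
import time


def poll_until_finished(fetch_task, polling_interval=5.0, timeout=300.0):
    """Call fetch_task() until the task reaches a final state or the timeout expires.

    fetch_task is a hypothetical zero-argument callable that returns the parsed
    JSON body of the task, e.g. lambda: requests.get(polling_url).json().
    """
    final_states = {"done", "failed", "rejected"}
    deadline = time.monotonic() + timeout
    while True:
        task_json = fetch_task()
        if task_json["task"]["state"] in final_states:
            return task_json
        if time.monotonic() >= deadline:
            raise TimeoutError("Task did not finish within the timeout.")
        time.sleep(polling_interval)
```

Passing the fetching logic in as a callable keeps the loop independent of the HTTP client, which also makes it easy to exercise without a running Virtual Appliance.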

Result for Authenticity Verification (all operations)

The result field of the task contains a dictionary of results for each operation of Authenticity Verification. Each operation result contains a channels list of independent results for each channel from the input media file.

For our sample data, the task result should look as follows:

{
  "task": {
    "task_id": "330f9d36-04e2-4b78-b4da-79bdd61aa7db",
    "state": "done"
  },
  "result": {
    "replay_attack_detection": {
      "channels": [
        {
          "channel_number": 0,
          "score": -2.017822027206421
        }
      ]
    },
    "audio_manipulation_detection": {
      "channels": [
        {
          "channel_number": 0,
          "segments": [
            {
              "score": 2.4159951210021973,
              "start_time": 0,
              "end_time": 0.9675
            },
            {
              "score": 2.4164342880249023,
              "start_time": 5.16,
              "end_time": 6.129875
            }
          ]
        }
      ]
    }
  }
}

All operations return a log-likelihood ratio score, a real number ranging from -infinity to +infinity. The decision threshold for all of them is 0: suspicious files should have a score higher than 0, while genuine files should have a score lower than 0. The replay_attack_detection operation returns a single score per channel within the channels list. The audio_manipulation_detection operation returns a score per segment and, by default, only includes suspicious segments.
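The threshold rule can be applied to a result in a few lines. The following is a sketch only: the summarize_result helper and the shape of its output are assumptions chosen for illustration, while the input is the result for Bridget.wav shown earlier.

```python
DECISION_THRESHOLD = 0.0  # Scores above 0 are treated as suspicious


def summarize_result(result):
    """Return a per-channel verdict for each operation present in the result."""
    summary = {}
    replay = result.get("replay_attack_detection")
    if replay is not None:
        # One boolean per channel: True means a suspected replay attack
        summary["replay_attack_detection"] = {
            channel["channel_number"]: channel["score"] > DECISION_THRESHOLD
            for channel in replay["channels"]
        }
    manipulation = result.get("audio_manipulation_detection")
    if manipulation is not None:
        # One list of (start_time, end_time) pairs per channel
        summary["audio_manipulation_detection"] = {
            channel["channel_number"]: [
                (segment["start_time"], segment["end_time"])
                for segment in channel["segments"]
                if segment["score"] > DECISION_THRESHOLD
            ]
            for channel in manipulation["channels"]
        }
    return summary


# The "result" field returned for Bridget.wav
bridget_result = {
    "replay_attack_detection": {
        "channels": [{"channel_number": 0, "score": -2.017822027206421}]
    },
    "audio_manipulation_detection": {
        "channels": [
            {
                "channel_number": 0,
                "segments": [
                    {"score": 2.4159951210021973, "start_time": 0, "end_time": 0.9675},
                    {"score": 2.4164342880249023, "start_time": 5.16, "end_time": 6.129875},
                ],
            }
        ]
    },
}

bridget_summary = summarize_result(bridget_result)
print(bridget_summary)
```

For this file the summary reports no replay attack on channel 0 and two suspicious intervals, which matches the table of media files at the beginning of the guide.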

Each operation has a typical score range established with an evaluation dataset. While rare, scores may occasionally fall outside the typical range. Typical score ranges may change over time in future model versions. Consult the technology documentation for details: Replay Attack Detection documentation and Audio Manipulation Detection documentation.

Run Authenticity Verification (advanced usage)

The following scenarios can be used to fine-tune the Authenticity Verification process.

Specify required operations

As mentioned earlier, the default scenario runs all available operations. To run only a subset, send the POST request with the requested_operations query parameter, a list of strings containing audio_manipulation_detection, replay_attack_detection, or both. For example, the following snippet runs only Replay Attack Detection:

import requests

VIRTUAL_APPLIANCE_ADDRESS = "http://<virtual-appliance-address>:8000"  # Replace with your address
MEDIA_FILE_BASED_ENDPOINT_URL = f"{VIRTUAL_APPLIANCE_ADDRESS}/api/technology/experimental/authenticity-verification"

media_file = "Bridget.wav"

with open(media_file, mode="rb") as file:
    files = {"file": file}
    start_task_response = requests.post(
        url=MEDIA_FILE_BASED_ENDPOINT_URL,
        files=files,
        params={"requested_operations": ["replay_attack_detection"]},
    )
print(start_task_response.status_code)  # Should print '202'

Although the requests library accepts requested_operations as a Python list, the resulting HTTP request does not contain a list literal. Multiple values for a single query parameter are passed by repeating the parameter key, e.g., ?id=1&id=2&id=3.
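To see what the repeated-key form looks like for this parameter, the query string can be built with the Python standard library. This is only an illustration of the encoding; it uses urllib.parse.urlencode rather than requests, and it sends nothing to the Virtual Appliance.

```python
from urllib.parse import urlencode

params = {
    "requested_operations": [
        "replay_attack_detection",
        "audio_manipulation_detection",
    ]
}

# doseq=True emits one key=value pair per list item instead of encoding the list itself
query_string = urlencode(params, doseq=True)
print(query_string)
# requested_operations=replay_attack_detection&requested_operations=audio_manipulation_detection
```

The same form applies when the request is built with another HTTP client, such as curl, where the key has to be repeated by hand.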

Raw segments for Audio Manipulation Detection

By default, Audio Manipulation Detection applies additional logic to the output segments to produce a more user-friendly result. If you want to build your own detection logic, you can enable raw segmentation. The result will then include all segments, even if they are considered genuine, and the segments will be more granular (typically a few hundred milliseconds).

import json
import requests

VIRTUAL_APPLIANCE_ADDRESS = "http://<virtual-appliance-address>:8000"  # Replace with your address
MEDIA_FILE_BASED_ENDPOINT_URL = f"{VIRTUAL_APPLIANCE_ADDRESS}/api/technology/experimental/authenticity-verification"

media_file = "Bridget.wav"
config = {
    "audio_manipulation_detection": {
        "raw_segmentation": True
    },
}

with open(media_file, mode="rb") as file:
    files = {"file": file}
    start_task_response = requests.post(
        url=MEDIA_FILE_BASED_ENDPOINT_URL,
        files=files,
        data={"config": json.dumps(config)},
    )
print(start_task_response.status_code)  # Should print '202'
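One possible starting point for custom detection logic on top of raw segments is sketched below. It is not the logic the platform applies by default. It assumes that raw segments carry the same score, start_time, and end_time fields as the default output, and the merge_suspicious_segments helper, its max_gap parameter, and the sample values are made up for illustration.

```python
def merge_suspicious_segments(segments, threshold=0.0, max_gap=0.0):
    """Merge neighbouring segments scoring above threshold into longer intervals."""
    merged = []
    for segment in sorted(segments, key=lambda s: s["start_time"]):
        if segment["score"] <= threshold:
            continue  # Skip segments considered genuine
        if merged and segment["start_time"] - merged[-1]["end_time"] <= max_gap:
            # Extend the previous interval and keep its highest score
            merged[-1]["end_time"] = max(merged[-1]["end_time"], segment["end_time"])
            merged[-1]["score"] = max(merged[-1]["score"], segment["score"])
        else:
            merged.append(dict(segment))
    return merged


# Made-up raw segments, not actual output for any of the sample files
raw_segments = [
    {"score": 1.2, "start_time": 0.0, "end_time": 0.25},
    {"score": 2.0, "start_time": 0.25, "end_time": 0.5},
    {"score": -3.0, "start_time": 0.5, "end_time": 0.75},
    {"score": 0.5, "start_time": 0.75, "end_time": 1.0},
]

merged_segments = merge_suspicious_segments(raw_segments)
print(merged_segments)
```

Raising threshold makes the detection stricter, while a positive max_gap lets short genuine stretches be absorbed into the surrounding suspicious interval.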

Full Python code

Here is the full example of how to run the Authenticity Verification technology in the default configuration. The code is slightly adjusted and wrapped into functions for better readability. Refer to the Task lifecycle code examples for a generic code template, applicable to all technologies.

import json
import requests
import time

VIRTUAL_APPLIANCE_ADDRESS = "http://<virtual-appliance-address>:8000"  # Replace with your address

MEDIA_FILE_BASED_ENDPOINT_URL = f"{VIRTUAL_APPLIANCE_ADDRESS}/api/technology/experimental/authenticity-verification"


def poll_result(polling_url, polling_interval=5):
    """Poll the task endpoint until processing completes."""
    while True:
        polling_task_response = requests.get(polling_url)
        polling_task_response.raise_for_status()
        polling_task_response_json = polling_task_response.json()
        task_state = polling_task_response_json["task"]["state"]
        if task_state in {"done", "failed", "rejected"}:
            break
        time.sleep(polling_interval)
    return polling_task_response


def run_media_based_task(media_file, params=None, config=None):
    """Create a media-based task and wait for results."""
    if params is None:
        params = {}
    if config is None:
        config = {}

    with open(media_file, mode="rb") as file:
        files = {"file": file}
        start_task_response = requests.post(
            url=MEDIA_FILE_BASED_ENDPOINT_URL,
            files=files,
            params=params,
            data={"config": json.dumps(config)},
        )
        start_task_response.raise_for_status()
    polling_url = start_task_response.headers["Location"]
    task_result = poll_result(polling_url)
    return task_result.json()


# Run Authenticity Verification
media_files = ["Bridget.wav", "Graham.wav", "Hans.wav"]

for media_file in media_files:
    print(f"Running Authenticity Verification for file {media_file}.")
    media_file_based_task = run_media_based_task(media_file)
    media_file_based_task_result = media_file_based_task["result"]
    print(json.dumps(media_file_based_task_result, indent=2))
