Vector Database Integration

Vector databases are specialized database systems designed to efficiently store, index, and query high-dimensional vector embeddings. When combined with Phonexia's voiceprint-extraction microservice, they enable powerful speaker identification capabilities at scale, allowing you to perform fast similarity searches across millions of voiceprints.

What is a vector database?

A vector database is optimized for storing and querying vector embeddings, which are numerical representations of data in high-dimensional space. Unlike traditional databases that work with structured data (rows, columns, keys), vector databases excel at similarity search: finding the vectors that are closest to a query vector according to a distance metric.
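To make "similarity search" concrete, here is a minimal sketch of the exhaustive (brute-force) version of the operation, written in plain Python with toy 3-dimensional vectors. The function name `nearest_neighbors` and the sample data are illustrative assumptions; a vector database returns the same kind of result but avoids scanning every stored vector.

```python
import heapq
import math


def nearest_neighbors(query, vectors, k=3):
    """Return the k (distance, key) pairs closest to `query` by Euclidean distance."""
    scored = ((math.dist(query, vec), key) for key, vec in vectors.items())
    return heapq.nsmallest(k, scored)


# Toy 3-dimensional "embeddings"; real embeddings have far more dimensions.
vectors = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
}

result = nearest_neighbors([1.0, 0.0, 0.0], vectors, k=2)
print([key for _, key in result])  # keys of the two closest vectors
```

This scan costs one distance computation per stored vector, which is the cost that the indexes described later are designed to avoid.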

Key features

High-dimensional vector storage: Store vectors with hundreds or thousands of dimensions
Similarity search: Find nearest neighbors using various distance metrics (cosine, L2, inner product)
Scalability: Handle millions to billions of vectors efficiently
Fast approximate nearest neighbor (ANN) search: Trade a small amount of accuracy (recall) for a large gain in speed
ACID compliance: Ensure data integrity and reliability (in databases like PostgreSQL with pgvector)

Use cases for Speaker Identification

Vector databases unlock several powerful use cases when integrated with Speaker Identification technology:

1. Large-scale speaker search

Search for a specific speaker across millions of recordings in milliseconds. Instead of comparing a voiceprint against every entry in your database, vector databases use specialized indexes to reduce search time dramatically.

Example scenario: A law enforcement agency needs to find all occurrences of a suspect's voice across a database of 10 million recordings. With a suitably indexed vector database, such a search can typically complete in under a second, depending on hardware and index settings.

2. Speaker clustering and grouping

Identify groups of recordings from the same speaker, even when their identity is unknown. This is useful for:

Organizing large audio archives
Fraud detection (finding coordinated fraud attempts)
Customer behavior analysis
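As a rough illustration of grouping, the sketch below assigns each recording to the first group whose representative voiceprint lies within a cosine-distance threshold. This greedy scheme is a simplification chosen for brevity, not the method used by Phonexia or pgvector, and the function names, toy 2-dimensional vectors, and threshold value are all assumptions.

```python
import math


def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity); inputs must be non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def group_recordings(voiceprints, threshold):
    """Greedy grouping: a recording joins the first group whose first member
    (the representative) is within `threshold`; otherwise it starts a new group."""
    groups = []
    for name, vec in voiceprints.items():
        for group in groups:
            if cosine_distance(vec, voiceprints[group[0]]) <= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical toy voiceprints; real ones have many more dimensions.
voiceprints = {
    "rec1": [1.0, 0.0],
    "rec2": [0.98, 0.05],
    "rec3": [0.0, 1.0],
    "rec4": [0.05, 0.99],
}
groups = group_recordings(voiceprints, threshold=0.1)
print(groups)
```

The result depends on the processing order and on the threshold, so a production system would normally tune the threshold on labeled data or use a dedicated clustering algorithm.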

3. Real-time speaker verification

Build authentication systems that verify speaker identity in real-time by comparing incoming voiceprints against stored reference voiceprints in a vector database.
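A verification check can be sketched as a single distance lookup against the stored reference. The example assumes a `speakers` table like the one created in Step 3 of this guide and a DB-API cursor that uses `%s` placeholders (for example, one from psycopg); `verify_speaker`, `to_vector_literal`, and the `max_distance` threshold are hypothetical names and values that would need tuning for a real deployment.

```python
VERIFY_SQL = """
SELECT vector_voiceprint <=> %s::vector AS distance
FROM speakers
WHERE speaker_id = %s
"""


def to_vector_literal(values):
    """Format a sequence of floats as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def verify_speaker(cur, speaker_id, incoming, max_distance):
    """Return True if `incoming` is within `max_distance` (cosine distance)
    of the reference voiceprint stored for `speaker_id`."""
    cur.execute(VERIFY_SQL, (to_vector_literal(incoming), speaker_id))
    row = cur.fetchone()
    if row is None:
        return False  # no reference voiceprint enrolled for this speaker_id
    return row[0] <= max_distance
```

Computing the distance in SQL keeps the reference voiceprint inside the database, so only a single number travels back to the application.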

pgvector: PostgreSQL vector extension

pgvector is an open-source extension that adds vector similarity search capabilities to PostgreSQL. It's an excellent choice for production deployments because:

Familiar infrastructure: Use PostgreSQL, which you may already have deployed
ACID compliance: Get all PostgreSQL benefits (transactions, backups, replication)
Multiple indexing strategies: Choose between HNSW (better speed-recall trade-off, but slower builds and more memory) and IVFFlat (faster builds and less memory, but lower query performance)
Rich query capabilities: Combine vector search with traditional SQL queries and JOINs
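The two index types are created with ordinary `CREATE INDEX` statements. The sketch below builds those statements for cosine distance; the helper name `create_index_sql` is hypothetical, the default table and column names match the `speakers` table defined later in this guide, and the parameter values (`m = 16`, `ef_construction = 64`, `lists = 100`) are starting points taken from pgvector's defaults and examples rather than tuned recommendations.

```python
def create_index_sql(method, table="speakers", column="vector_voiceprint"):
    """Build a pgvector CREATE INDEX statement for cosine distance.

    `table` and `column` are interpolated directly, so pass trusted
    identifiers only (never user input).
    """
    if method == "hnsw":
        return (
            f"CREATE INDEX ON {table} USING hnsw ({column} vector_cosine_ops) "
            "WITH (m = 16, ef_construction = 64);"
        )
    if method == "ivfflat":
        # IVFFlat derives its lists from existing rows, so create it after loading data.
        return (
            f"CREATE INDEX ON {table} USING ivfflat ({column} vector_cosine_ops) "
            "WITH (lists = 100);"
        )
    raise ValueError(f"unsupported index method: {method!r}")


print(create_index_sql("hnsw"))
print(create_index_sql("ivfflat"))
```

The operator class (`vector_cosine_ops` here) has to match the distance operator used in queries; otherwise the planner cannot use the index.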

Distance metrics

pgvector supports multiple distance functions. For Phonexia voiceprints, cosine distance is recommended:

<=> - Cosine distance (recommended)
<-> - L2 distance (Euclidean)
<#> - Negative inner product (multiply the result by -1 to get the inner product)
<+> - L1 distance (Manhattan)
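For reference, the four operators correspond to the following calculations, shown here as a plain-Python sketch. The function names are illustrative; inside the database the computation is done by pgvector itself.

```python
import math


def cosine_distance(a, b):
    """Matches <=> : 1 - cosine similarity (inputs must be non-zero vectors)."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def l2_distance(a, b):
    """Matches <-> : Euclidean distance."""
    return math.dist(a, b)


def negative_inner_product(a, b):
    """Matches <#> : the inner product with its sign flipped."""
    return -sum(x * y for x, y in zip(a, b))


def l1_distance(a, b):
    """Matches <+> : sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


a, b = [1.0, 2.0], [3.0, 4.0]
print(cosine_distance(a, b), l2_distance(a, b),
      negative_inner_product(a, b), l1_distance(a, b))
```

All four return smaller values for more similar vectors, which is why a query can always sort in ascending order regardless of the operator used.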

Installation and setup

Step 1: Install the pgvector extension (Docker)

docker run -d \
  --name pgvector-db \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -e POSTGRES_DB=voiceprints \
  -p 5432:5432 \
  pgvector/pgvector:pg17

Step 2: Enable the extension

Connect to your PostgreSQL database and enable the extension:

CREATE EXTENSION IF NOT EXISTS vector;

Step 3: Create tables for voiceprints

Create a table to store speaker voiceprints. The vector dimension depends on your Phonexia model, so replace the 128 below with the dimension your model produces:

CREATE TABLE speakers (
    id SERIAL PRIMARY KEY,
    speaker_id VARCHAR(255) UNIQUE NOT NULL,
    vector_voiceprint VECTOR(128),
    speech_length_seconds FLOAT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
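Rows can then be added from application code. The sketch below assumes a DB-API cursor with `%s` placeholders (for example, one from psycopg) and passes the vector as a pgvector text literal cast with `::vector`; `insert_speaker`, `to_vector_literal`, and `expected_dim` are hypothetical names introduced for illustration.

```python
INSERT_SQL = """
INSERT INTO speakers (speaker_id, vector_voiceprint, speech_length_seconds)
VALUES (%s, %s::vector, %s)
"""


def to_vector_literal(values):
    """Format a sequence of floats as a pgvector text literal, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def insert_speaker(cur, speaker_id, values, speech_length_seconds, expected_dim=128):
    """Insert one voiceprint, checking the dimension against the column definition first."""
    if len(values) != expected_dim:
        raise ValueError(
            f"expected {expected_dim} dimensions, got {len(values)}"
        )
    cur.execute(
        INSERT_SQL,
        (speaker_id, to_vector_literal(values), speech_length_seconds),
    )
```

Checking the length in the application gives a clearer error than the one PostgreSQL raises when a vector does not match the declared column dimension.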

Step 4: Search for similar voiceprints

First, you need a vector voiceprint to search for. You can obtain one from the gRPC API; see the "Using vector voiceprints with gRPC API" section for details. Then you can search for similar voiceprints using the following query:

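-- wanted_vector_voiceprint is a placeholder for the query vector;
-- supply it as a bound parameter or a literal such as '[0.12, -0.03, ...]'::vector.
SELECT speaker_id, vector_voiceprint <=> wanted_vector_voiceprint AS distance
FROM speakers
ORDER BY distance ASC
LIMIT 10;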
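From Python, the same query can be run with the query vector bound as a parameter. This is a sketch under the assumption of a DB-API cursor with `%s` placeholders (for example, one from psycopg); `find_similar` is a hypothetical helper name.

```python
SEARCH_SQL = """
SELECT speaker_id, vector_voiceprint <=> %s::vector AS distance
FROM speakers
ORDER BY distance ASC
LIMIT %s
"""


def find_similar(cur, query_values, limit=10):
    """Return up to `limit` (speaker_id, distance) rows, closest first."""
    # pgvector accepts a text literal of the form '[v1,v2,...]' cast to vector.
    literal = "[" + ",".join(str(float(v)) for v in query_values) + "]"
    cur.execute(SEARCH_SQL, (literal, limit))
    return cur.fetchall()
```

With the container from Step 1, a connection would typically be opened with something like `psycopg.connect("postgresql://postgres:mysecretpassword@localhost:5432/voiceprints")`, and `find_similar(conn.cursor(), vector_voiceprint)` would return the ten closest speakers.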

Using vector voiceprints with gRPC API

Phonexia's Speaker Identification microservice provides two ways to obtain vector voiceprints optimized for vector databases:

Method 1: Extract with vector voiceprint enabled

Extract both the standard voiceprint and the vector voiceprint in a single request:

import grpc
from phonexia.grpc.technologies.speaker_identification.v1 import (
    speaker_identification_pb2 as sid_pb2,
    speaker_identification_pb2_grpc as sid_pb2_grpc
)
from phonexia.grpc.common import core_pb2

# Create channel and stub
channel = grpc.insecure_channel('localhost:8080')
stub = sid_pb2_grpc.VoiceprintExtractionStub(channel)

# Read audio file
with open('audio.wav', 'rb') as f:
    audio_data = f.read()

# Configure extraction with vector voiceprint enabled
config = sid_pb2.ExtractConfig(enable_vector_voiceprint=True)

# Create request
audio = core_pb2.Audio(content=audio_data)
request = sid_pb2.ExtractRequest(
    audio=audio,
    config=config
)

# Extract voiceprint
response = stub.Extract(iter([request]))

# Get vector voiceprint
vector_voiceprint = response.result.vector_voiceprint.values
speech_length = response.result.speech_length.seconds

print(f"Extracted vector {vector_voiceprint}")
print(f"Speech length: {speech_length} seconds")

Method 2: Convert existing voiceprint

Convert an existing standard voiceprint to vector format:

import grpc
from phonexia.grpc.common.core_pb2 import Voiceprint
from phonexia.grpc.technologies.speaker_identification.v1 import (
    speaker_identification_pb2 as sid_pb2,
    speaker_identification_pb2_grpc as sid_pb2_grpc
)

# Create channel and stub
channel = grpc.insecure_channel('localhost:8080')
stub = sid_pb2_grpc.VoiceprintConversionStub(channel)

# Load existing voiceprint
with open('david_1.vp', 'rb') as f:
    voiceprint_data = f.read()

# Create conversion request
voiceprint = Voiceprint(content=voiceprint_data)
request = sid_pb2.ConvertRequest(
    voiceprints=[voiceprint]
)

# Convert to vector format - this returns a streaming response
responses = stub.Convert(iter([request]))
vector_voiceprint = list(responses)[0].vector_voiceprints[0].values

print(f"Converted vector {vector_voiceprint}")

Conclusion

Integrating pgvector with Phonexia's voiceprint-extraction microservice enables powerful, scalable speaker identification capabilities. By leveraging vector databases, you can:

Perform sub-second searches across millions of voiceprints
Build sophisticated speaker recognition systems
Scale to meet enterprise and law enforcement requirements
Maintain data integrity with ACID compliance

For more information, see the pgvector documentation.
