Vector Database Integration
Vector databases are specialized database systems designed to efficiently store, index, and query high-dimensional vector embeddings. When combined with Phonexia's voiceprint-extraction microservice, they enable powerful speaker identification capabilities at scale, allowing you to perform fast similarity searches across millions of voiceprints.
What is a vector database?
A vector database is optimized for storing and querying vector embeddings: numerical representations of data in high-dimensional space. Unlike traditional databases, which are built around structured data (rows, columns, keys) and exact-match lookups, vector databases excel at similarity search, that is, finding vectors that are "close" to a query vector according to a distance metric.
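To make the idea of "closeness" concrete, the following is a minimal in-memory sketch of a similarity search using only the Python standard library. The names `nearest_neighbors` and `vectors` and the toy 3-dimensional data are illustrative assumptions; a vector database performs the same kind of ranking, but with indexes that avoid comparing against every stored vector.

```python
import heapq
import math


def nearest_neighbors(query, vectors, k=3):
    """Return the k (label, distance) pairs closest to query, nearest first.

    Uses Euclidean (L2) distance and compares against every stored vector.
    """
    scored = ((label, math.dist(query, vec)) for label, vec in vectors.items())
    return heapq.nsmallest(k, scored, key=lambda pair: pair[1])


# Toy 3-dimensional "embeddings"; real voiceprints have many more dimensions.
vectors = {
    "alice": [0.9, 0.1, 0.0],
    "bob": [0.0, 1.0, 0.0],
    "carol": [0.8, 0.2, 0.1],
}
query = [1.0, 0.0, 0.0]

top2 = nearest_neighbors(query, vectors, k=2)
for label, distance in top2:
    print(f"{label}: {distance:.3f}")
```

This exhaustive scan grows linearly with the number of stored vectors, which is the cost that approximate nearest neighbor indexes are designed to avoid.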
Key features
- High-dimensional vector storage: Store vectors with hundreds or thousands of dimensions
- Similarity search: Find nearest neighbors using various distance metrics (cosine, L2, inner product)
- Scalability: Handle millions to billions of vectors efficiently
- Fast approximate nearest neighbor (ANN) search: Trade a small amount of recall for significantly faster queries
- ACID compliance: Ensure data integrity and reliability (in databases like PostgreSQL with pgvector)
Use cases for Speaker Identification
Vector databases unlock several powerful use cases when integrated with Speaker Identification technology:
1. Large-scale speaker search
Search for a specific speaker across millions of recordings without scanning every entry. Instead of comparing a voiceprint against each stored voiceprint in turn, vector databases use specialized indexes to narrow the search to a small set of candidates, which reduces search time dramatically.
Example scenario: A law enforcement agency needs to find all occurrences of a suspect's voice across a database of 10 million recordings. With a suitable index and adequately sized hardware, a vector database can typically complete this search in under a second.
2. Speaker clustering and grouping
Identify groups of recordings from the same speaker, even when their identity is unknown. This is useful for:
- Organizing large audio archives
- Fraud detection (finding coordinated fraud attempts)
- Customer behavior analysis
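The grouping idea can be sketched as a simple greedy clustering over cosine distance. This is an in-memory illustration only, not how a vector database clusters data at scale; the function `group_by_speaker`, the recording ids, and the `max_distance` threshold of 0.3 are hypothetical and would need tuning against real voiceprints.

```python
import math


def cosine_distance(a, b):
    """Cosine distance: 0 for identical direction, 1 for orthogonal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def group_by_speaker(voiceprints, max_distance=0.3):
    """Greedily group recordings whose vectors lie close to a cluster's first member.

    max_distance is a hypothetical threshold chosen for this toy data.
    """
    representatives = []  # first vector seen for each cluster
    clusters = []         # recording ids per cluster, same order as representatives
    for recording_id, vector in voiceprints.items():
        for index, representative in enumerate(representatives):
            if cosine_distance(vector, representative) <= max_distance:
                clusters[index].append(recording_id)
                break
        else:
            # No existing cluster is close enough: start a new one.
            representatives.append(vector)
            clusters.append([recording_id])
    return clusters


# Toy 2-dimensional vectors standing in for voiceprints of unknown speakers.
recordings = {
    "call_001": [1.0, 0.0],
    "call_002": [0.0, 1.0],
    "call_003": [0.9, 0.1],
    "call_004": [0.1, 0.9],
}
groups = group_by_speaker(recordings)
print(groups)
```

The result depends on input order and on the threshold, so treat it as a starting point for experimentation rather than a production clustering method.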
3. Real-time speaker verification
Build authentication systems that verify speaker identity in real-time by comparing incoming voiceprints against stored reference voiceprints in a vector database.
pgvector: PostgreSQL vector extension
pgvector is an open-source extension that adds vector similarity search capabilities to PostgreSQL. It's an excellent choice for production deployments because:
- Familiar infrastructure: Use PostgreSQL, which you may already have deployed
- ACID compliance: Get all PostgreSQL benefits (transactions, backups, replication)
- Multiple indexing strategies: Choose between HNSW (better speed-recall tradeoff, but slower builds and higher memory use) and IVFFlat (faster builds and lower memory use)
- Rich query capabilities: Combine vector search with traditional SQL queries and JOINs
Distance metrics
pgvector supports multiple distance functions. For Phonexia voiceprints, cosine distance is recommended:
- <=>: Cosine distance (recommended)
- <->: L2 distance (Euclidean)
- <#>: Negative inner product
- <+>: L1 distance (Manhattan)
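For reference, the four operators correspond to the following calculations, shown here as a small standard-library Python sketch. The function names are illustrative; the arithmetic mirrors what each pgvector operator returns, including the sign flip on the inner product.

```python
import math


def cosine_distance(a, b):
    """<=> : 1 minus the cosine similarity of a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.hypot(*a) * math.hypot(*b))


def l2_distance(a, b):
    """<-> : Euclidean distance."""
    return math.dist(a, b)


def negative_inner_product(a, b):
    """<#> : inner product multiplied by -1, so smaller still means closer."""
    return -sum(x * y for x, y in zip(a, b))


def l1_distance(a, b):
    """<+> : Manhattan distance."""
    return sum(abs(x - y) for x, y in zip(a, b))


a = [3.0, 4.0]
b = [4.0, 3.0]
print("cosine:", cosine_distance(a, b))
print("L2:", l2_distance(a, b))
print("negative inner product:", negative_inner_product(a, b))
print("L1:", l1_distance(a, b))
```

Cosine distance ignores vector length and compares direction only, which is why it is a common choice for embeddings.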
Installation and setup
Step 1: Run PostgreSQL with the pgvector extension (Docker)
docker run -d \
  --name pgvector-db \
  -e POSTGRES_PASSWORD=mysecretpassword \
  -e POSTGRES_DB=voiceprints \
  -p 5432:5432 \
  pgvector/pgvector:pg17
Step 2: Enable the extension
Connect to your PostgreSQL database and enable the extension:
CREATE EXTENSION vector;
Step 3: Create tables for voiceprints
Create a table to store speaker voiceprints. The vector dimension depends on your Phonexia model, so adjust VECTOR(128) if your model produces vectors of a different size:
CREATE TABLE speakers (
    id SERIAL PRIMARY KEY,
    speaker_id VARCHAR(255) UNIQUE NOT NULL,
    vector_voiceprint VECTOR(128),
    speech_length_seconds FLOAT,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
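Once the table exists, rows can be inserted from application code. The sketch below assumes the third-party psycopg (version 3) driver and passes the vector in pgvector's text format (`[v1,v2,...]`) with an explicit `::vector` cast. The helpers `to_pgvector_literal`, `insert_params`, and `store_voiceprint` are hypothetical names introduced here, and `EXPECTED_DIM` must match the dimension declared in the table.

```python
import math

EXPECTED_DIM = 128  # must match VECTOR(128) in the speakers table

INSERT_SQL = """
INSERT INTO speakers (speaker_id, vector_voiceprint, speech_length_seconds)
VALUES (%s, %s::vector, %s)
"""


def to_pgvector_literal(values, expected_dim=EXPECTED_DIM):
    """Format a sequence of numbers as pgvector text input, e.g. '[0.1,0.2]'."""
    floats = [float(v) for v in values]
    if len(floats) != expected_dim:
        raise ValueError(f"expected {expected_dim} values, got {len(floats)}")
    if not all(math.isfinite(v) for v in floats):
        raise ValueError("vector contains NaN or infinite values")
    return "[" + ",".join(repr(v) for v in floats) + "]"


def insert_params(speaker_id, values, speech_length_seconds):
    """Build the parameter tuple for INSERT_SQL."""
    return (speaker_id, to_pgvector_literal(values), float(speech_length_seconds))


def store_voiceprint(conninfo, speaker_id, values, speech_length_seconds):
    """Insert one row; requires psycopg 3 and a reachable database."""
    import psycopg  # assumption: third-party driver, imported lazily

    # The connection context manager commits on success and closes the connection.
    with psycopg.connect(conninfo) as conn:
        conn.execute(INSERT_SQL, insert_params(speaker_id, values, speech_length_seconds))


if __name__ == "__main__":
    # Shows the parameters that would be sent; no database connection is made here.
    print(insert_params("david", [0.0] * EXPECTED_DIM, 12.5)[0])
```

Validating the dimension before the insert gives a clearer error than the one PostgreSQL raises when a vector does not match the column definition.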
Step 4: Search for similar voiceprints
First, you need a vector voiceprint to search for. You can obtain it from the gRPC API; see the "Using vector voiceprints with gRPC API" section for details. Then you can search for similar voiceprints using the following query, where wanted_vector_voiceprint stands for the query vector:
SELECT speaker_id, vector_voiceprint <=> wanted_vector_voiceprint AS distance
FROM speakers
ORDER BY distance ASC
LIMIT 10;
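In application code, the `wanted_vector_voiceprint` placeholder is usually supplied as a bound parameter. The following sketch assumes the third-party psycopg (version 3) driver; `search_params` and `find_similar` are hypothetical helper names. It also includes an optional HNSW index statement for cosine distance, which lets pgvector answer this query approximately instead of scanning the whole table.

```python
SEARCH_SQL = """
SELECT speaker_id, vector_voiceprint <=> %(query)s::vector AS distance
FROM speakers
ORDER BY distance ASC
LIMIT %(limit)s
"""

# Optional: approximate index matching the cosine distance operator (<=>).
CREATE_INDEX_SQL = """
CREATE INDEX IF NOT EXISTS speakers_voiceprint_hnsw
ON speakers USING hnsw (vector_voiceprint vector_cosine_ops)
"""


def search_params(query_vector, limit=10):
    """Build named parameters for SEARCH_SQL using pgvector's text format."""
    literal = "[" + ",".join(str(float(v)) for v in query_vector) + "]"
    return {"query": literal, "limit": int(limit)}


def find_similar(conninfo, query_vector, limit=10):
    """Return (speaker_id, distance) rows; requires psycopg 3 and a database."""
    import psycopg  # assumption: third-party driver, imported lazily

    with psycopg.connect(conninfo) as conn:
        return conn.execute(SEARCH_SQL, search_params(query_vector, limit)).fetchall()


if __name__ == "__main__":
    # Shows the bound parameters only; no database connection is made here.
    print(search_params([0.1, 0.2, 0.3], limit=5))
```

A typical call would look like `find_similar("postgresql://postgres:mysecretpassword@localhost:5432/voiceprints", vector_voiceprint)`, using the credentials from the Docker step. Smaller distances indicate more similar voiceprints.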
Using vector voiceprints with gRPC API
Phonexia's Speaker Identification microservice provides two ways to obtain vector voiceprints optimized for vector databases:
Method 1: Extract with vector voiceprint enabled
Extract both standard voiceprint and vector voiceprint in a single request:
import grpc
from phonexia.grpc.technologies.speaker_identification.v1 import (
    speaker_identification_pb2 as sid_pb2,
    speaker_identification_pb2_grpc as sid_pb2_grpc,
)
from phonexia.grpc.common import core_pb2
# Create channel and stub
channel = grpc.insecure_channel('localhost:8080')
stub = sid_pb2_grpc.VoiceprintExtractionStub(channel)
# Read audio file
with open('audio.wav', 'rb') as f:
    audio_data = f.read()
# Configure extraction with vector voiceprint enabled
config = sid_pb2.ExtractConfig(enable_vector_voiceprint=True)
# Create request
audio = core_pb2.Audio(content=audio_data)
request = sid_pb2.ExtractRequest(
    audio=audio,
    config=config,
)
# Extract voiceprint
response = stub.Extract(iter([request]))
# Get vector voiceprint
vector_voiceprint = response.result.vector_voiceprint.values
speech_length = response.result.speech_length.seconds
print(f"Extracted vector {vector_voiceprint}")
print(f"Speech length: {speech_length} seconds")
Method 2: Convert existing voiceprint
Convert an existing standard voiceprint to vector format:
import grpc
from phonexia.grpc.common.core_pb2 import Voiceprint
from phonexia.grpc.technologies.speaker_identification.v1 import (
    speaker_identification_pb2 as sid_pb2,
    speaker_identification_pb2_grpc as sid_pb2_grpc,
)
# Create channel and stub
channel = grpc.insecure_channel('localhost:8080')
stub = sid_pb2_grpc.VoiceprintConversionStub(channel)
# Load existing voiceprint
with open('david_1.vp', 'rb') as f:
    voiceprint_data = f.read()
# Create conversion request
voiceprint = Voiceprint(content=voiceprint_data)
request = sid_pb2.ConvertRequest(
    voiceprints=[voiceprint],
)
# Convert to vector format - this returns a streaming response
responses = stub.Convert(iter([request]))
vector_voiceprint = list(responses)[0].vector_voiceprints[0].values
print(f"Converted vector {vector_voiceprint}")
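The example above reads only the first vector from the first response. When several voiceprints are converted in one call, a small helper can gather every vector from the response stream. The sketch below relies only on the `vector_voiceprints` and `values` fields used above, and it assumes that results may arrive split across several responses. `collect_vectors` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the snippet runs without the Phonexia packages.

```python
from types import SimpleNamespace


def collect_vectors(responses):
    """Flatten a stream of conversion responses into a list of float lists."""
    vectors = []
    for response in responses:
        for vector_voiceprint in response.vector_voiceprints:
            # Copy the repeated field into a plain Python list.
            vectors.append(list(vector_voiceprint.values))
    return vectors


# Stand-in objects that mimic the response shape; not real gRPC messages.
fake_responses = [
    SimpleNamespace(vector_voiceprints=[SimpleNamespace(values=[0.1, 0.2])]),
    SimpleNamespace(
        vector_voiceprints=[
            SimpleNamespace(values=[0.3, 0.4]),
            SimpleNamespace(values=[0.5, 0.6]),
        ]
    ),
]

all_vectors = collect_vectors(fake_responses)
print(len(all_vectors), "vectors collected")
```

With a real stub, the call would be `collect_vectors(stub.Convert(iter([request])))`, and each resulting list can then be stored in the `speakers` table.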
Conclusion
Integrating pgvector with Phonexia's voiceprint-extraction microservice enables powerful, scalable speaker identification capabilities. By leveraging vector databases, you can:
- Perform sub-second searches across millions of voiceprints
- Build sophisticated speaker recognition systems
- Scale to meet enterprise and law enforcement requirements
- Maintain data integrity with ACID compliance
For more information, see the pgvector documentation.