Sean's Blog

Postgres as Vector DB - a benchmark

There is a flood of vector databases - which ones are actually useful? IMO, extending a relational DBMS that already provides ACID compliance and holds your existing datasets is for most use cases the ideal choice. Using a dedicated vector DB (Chroma, Vespa, Turbopuffer, LanceDB, Milvus etc.) only makes sense for narrow use cases where no complicated metadata filters are needed (e.g. just simple RAG) and data synchronisation is not an issue.

So let's have a look at how we can store and search vectors using Postgres - there are three notable extensions: pgvector, pgvectorscale and vectorchord.

PGVector

The standard extension for storing vectors, offering the common medium-scale ANN indices (HNSW & IvfFlat). Integration package for Python: https://github.com/pgvector/pgvector-python/
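
The extension needs to be enabled once per database before vector columns can be used:

CREATE EXTENSION IF NOT EXISTS vector;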

Store vectors (32-bit floats):

CREATE TABLE items (id bigserial PRIMARY KEY, embedding vector(1024));

Store half-precision (16-bit float) vectors:

CREATE TABLE items (id bigserial PRIMARY KEY, embedding halfvec(1024));
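
With either table, a plain (exact) nearest-neighbor query just orders by a distance operator - a minimal sketch using cosine distance (<=>) against the halfvec variant; %s is a client-side query parameter, as in the reranking query further below:

-- Exact top-10 nearest neighbors by cosine distance (no index, full scan)
SELECT id, embedding <=> %s::halfvec(1024) AS distance
FROM items
ORDER BY distance
LIMIT 10;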

HNSW

Hierarchical Navigable Small World (HNSW) is a popular graph-based ANN index - it delivers good retrieval quality and QPS, but at the cost of longer index build times and high RAM usage during the build (it crashed for 1 million vectors on my 16GB RAM machine).
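
A minimal sketch of building an HNSW index on the items table from above (m and ef_construction are pgvector's build parameters, shown with their defaults; use halfvec_cosine_ops for halfvec columns):

-- Give the index build more memory and parallel workers (helps with the RAM-hungry build)
SET maintenance_work_mem = '4GB';
SET max_parallel_maintenance_workers = 7;

-- Build the HNSW graph using cosine distance
CREATE INDEX ON items USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- Query-time accuracy/speed trade-off: higher ef_search = better recall, slower queries
SET hnsw.ef_search = 100;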

IvfFlat

Inverted File Flat (IvfFlat) is less RAM-hungry than HNSW, with a caveat: IvfFlat performance degrades as vectors are deleted and added (because the centroids are not updated), so the index needs regular rebuilds if new vectors are inserted regularly.

"As data gets inserted or deleted from the index, if the index is not rebuilt, the IVFFlat index in pgvector can return incorrect approximate nearest neighbors due to clustering centroids no longer fitting the data well"

IvfFlat Binary (with reranking)

Build an IvfFlat index on binarized vectors -> at query time: overfetch K x 10 nearest neighbors with the binary index, then rerank them using the float16/32 vectors to get a more accurate final top-K.

-- Mean threshold binarization (performs better than pgvector's binarizer)
CREATE OR REPLACE FUNCTION binary_quantize_mean(vec halfvec) 
RETURNS varbit AS $$
    SELECT string_agg(
        CASE WHEN val >= mean_val THEN '1' ELSE '0' END, 
        ''
    )::varbit
    FROM unnest(vec::real[]) AS val,
         (SELECT AVG(v) AS mean_val FROM unnest(vec::real[]) AS v) AS stats;
$$ LANGUAGE sql IMMUTABLE PARALLEL SAFE;

-- Step 1: Add a stored generated binary column
ALTER TABLE table_name 
ADD COLUMN vector_bin bit(512) 
GENERATED ALWAYS AS (binary_quantize_mean(vector)::bit(512)) STORED;

-- Step 2: Create index on the pre-computed binary column
CREATE INDEX IF NOT EXISTS table_name_idx_ivfflat_bin 
ON table_name 
USING ivfflat (vector_bin bit_hamming_ops) 
WITH (lists = 100);

-- Query top 100 NN using reranking
SELECT t.id
FROM (
  SELECT id
  FROM table_name
  ORDER BY vector_bin <~> binary_quantize_mean(%s::halfvec(512))::bit(512)
  LIMIT 1000
) c
JOIN table_name t USING(id)
ORDER BY t.vector <=> %s::halfvec(512)
LIMIT 100;

PGVectorScale

DiskANN

A promising ANN index that uses RAM and disk (needs a fast disk - SSD / NVMe) to scale to billions of vectors, offering low RAM usage while still providing decent QPS. The problem is that the pgvectorscale index-building implementation is single-core right now, leading to very long index build times.

PGVectorScale supports pre-filtering using bitfields, but it requires a manual meta-data table setup (complicated / bad dev UX).
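
Index creation itself is a one-liner - a minimal sketch following the pgvectorscale docs (check the current documentation for tuning parameters):

CREATE EXTENSION IF NOT EXISTS vectorscale CASCADE;

-- StreamingDiskANN index on the embedding column (cosine distance)
CREATE INDEX ON items USING diskann (embedding vector_cosine_ops);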

VectorChord

VChordRQ

A custom ANN index with excellent performance (combining IVF ANN index with RaBitQ quantization). Supports pre-filtering (easy to use).

By default VChord (like pgvector) copies all vectors into the index; this copying can be disabled to reduce disk usage, at the cost of slightly degraded query performance.

VChord also supports efficient range filters (limiting results by distance instead of K nearest neighbors).
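
Index creation is similar to pgvector - a sketch based on the VectorChord docs (the options block is TOML; the number of lists should scale with the number of vectors, and the exact options should be checked against the current documentation):

CREATE EXTENSION IF NOT EXISTS vchord CASCADE;

-- RaBitQ-quantized IVF index using cosine distance
CREATE INDEX ON items USING vchordrq (embedding vector_cosine_ops) WITH (options = $$
[build.internal]
lists = [1000]
spherical_centroids = true
$$);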

VChordG (DiskANN)

A novel addition (not production-ready yet): a custom implementation of the DiskANN index combined with RaBitQ quantization.

Benchmark

The benchmark should reflect realistic usage - right now it just measures index build and query times.

In the future I want to extend it:

  • measure insertion performance after the initial index is built
  • use real-data embedding vectors -> simulate data distribution shift (insert many vectors after the index was built)
  • simulate realistic complex SQL queries involving categorical and range filtering
  • benchmark vector scales: 100K, 1M, 10M, 100M, 1B

ANN Benchmark Results

For 450K text embeddings (1024D float32 vectors) we measure recall@100.

Method | Query Latency (ms) | Retrieval Recall | Speedup vs Baseline | Index Build Time (s) | Index Size (MB)
Baseline (Brute Force) | 1400.93 | 100.00% | 1.00x | - | -
VectorChord (vchordrq) | 468.64 | 100.00% | 2.99x | 1383.62 | 2229
pgvectorscale (DiskANN) | 6.39 | 2.00% | 219.22x | 550.29 | 254
pgvector (HNSW) | 611.54 | 100.00% | 2.29x | 1235.13 | 3555
pgvector (IVFFlat) | 411.62 | 100.00% | 3.40x | 968.53 | 3561

Note: At 450K vectors, all approximate indices show strong speedups. HNSW, IVFFlat, and VectorChord achieve ~100% recall with 2-3.5x speedups. DiskANN has the fastest build time and the best speedup (~220x), but with significantly lower recall (2%).

Optimizing Vector & Index Storage

When things scale up, one should strive for efficient vector storage using the following techniques (a small SQL sketch combining two of them follows the list):

  • Vector Dimensionality Reduction
    • Product Quantization (PQ): reduces both dimensionality and bit representation
    • Matryoshka Embeddings: superior performance vs PQ as the reduction is learned during training rather than applied post-training
    • PCA / UMAP
  • Scalar Quantization (Reduce bit representation per dimension)
    • FP16: half precision from FP32
    • binary: reduce to single bit
      • In pgvector: binary quantization will reduce any positive value to 1, and any zero or negative value to 0
      • Mean-based thresholding: binarize each dimension using its corpus-wide mean as the threshold, ensuring ~50/50 bit distribution per dimension for better information preservation and more discriminative binary codes.
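
A sketch combining Matryoshka truncation with FP16 scalar quantization using functionality pgvector already ships (subvector() and halfvec), assuming the items table from above holds 1024D float32 embeddings; embedding_512 is a hypothetical column name:

-- Keep only the first 512 Matryoshka dimensions and store them as float16
ALTER TABLE items
ADD COLUMN embedding_512 halfvec(512)
GENERATED ALWAYS AS (subvector(embedding, 1, 512)::halfvec(512)) STORED;

An index (e.g. vchordrq or the ivf+binary setup above) can then be built directly on embedding_512.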

We can see that using an IvfFlat index on the binary representation with a top-K overfetch factor of 10x, then reranking in higher precision (float32 or float16), results in excellent recall, low vector storage costs (float16) and very fast retrieval latencies.

A good trade-off seems to be using only the first 512D (Matryoshka dims.), storing them as float16 and using the ivf+binary(L500,P93,10x) index with float16 reranking: this delivers 13.59 ms retrieval latency at 93% recall. Float16 storage for 512D vectors is 322.3MB (a 4x reduction compared to 1024D float32) and the index size is only 26.4MB (~33x smaller than the vchordrq 1024D float16 index).

The reranking step can, however, complicate queries that use metadata filtering, so simply using a vchordrq index with 512D float16 vectors can also make sense.

The table below was computed on an 8 core Intel server using 300K CLIP Matryoshka text embeddings.

Dim | Dtype | Index | Latency (ms) | Recall (%) | Build (s) | Storage (MB) | Index (MB)
256 | float32 | vchordrq | 0.77 | 36.0 | 30.55 | 322.3 | 379.1
256 | float16 | vchordrq | 0.65 | 49.0 | 18.78 | 161.1 | 200.5
256 | float32 | ivfflat(L500,P31) | 48.94 | 88.0 | 8.59 | 322.3 | 337.1
256 | float16 | ivfflat(L500,P31) | 50.60 | 88.0 | 10.14 | 161.1 | 158.4
256 | float32 | hnsw+binary(ef1000,10x) | 9.25 | 83.0 | 228.73 | 322.3 | 156.2
256 | float16 | hnsw+binary(ef1000,10x) | 7.56 | 82.0 | 233.40 | 161.1 | 156.2
256 | float32 | ivf+binary(L500,P93,10x) | 13.54 | 83.0 | 2.06 | 322.3 | 17.3
256 | float16 | ivf+binary(L500,P93,10x) | 12.96 | 82.0 | 2.23 | 161.1 | 17.2
256 | float32 | exact-binary(10x) | 117.95 | 82.0 | 0.00 | 322.3 | 0.0
256 | float16 | exact-binary(10x) | 87.77 | 83.0 | 0.00 | 161.1 | 0.0
256 | float32 | exact-binary(1x) | 39.80 | 35.0 | 0.00 | 322.3 | 0.0
256 | float16 | exact-binary(1x) | 38.73 | 34.0 | 0.00 | 161.1 | 0.0
256 | float32 | exact | 49.61 | 88.0 | 0.00 | 322.3 | 0.0
256 | float16 | exact | 39.24 | 88.0 | 0.00 | 161.1 | 0.0
512 | float32 | vchordrq | 25.56 | 96.0 | 40.21 | 644.5 | 844.3
512 | float16 | vchordrq | 25.57 | 97.0 | 26.32 | 322.3 | 397.8
512 | float32 | ivfflat(L500,P31) | 29.54 | 96.0 | 17.79 | 644.5 | 783.9
512 | float16 | ivfflat(L500,P31) | 27.04 | 97.0 | 14.50 | 322.3 | 337.1
512 | float32 | hnsw+binary(ef1000,10x) | 15.13 | 94.0 | 209.95 | 644.5 | 160.4
512 | float16 | hnsw+binary(ef1000,10x) | 7.73 | 94.0 | 212.17 | 322.3 | 160.4
512 | float32 | ivf+binary(L500,P93,10x) | 21.15 | 93.0 | 3.87 | 644.5 | 26.3
512 | float16 | ivf+binary(L500,P93,10x) | 13.59 | 93.0 | 3.72 | 322.3 | 26.4
512 | float32 | exact-binary(10x) | 57.11 | 92.0 | 0.00 | 644.5 | 0.0
512 | float16 | exact-binary(10x) | 106.79 | 94.0 | 0.00 | 322.3 | 0.0
512 | float32 | exact-binary(1x) | 32.77 | 49.0 | 0.00 | 644.5 | 0.0
512 | float16 | exact-binary(1x) | 45.30 | 47.0 | 0.00 | 322.3 | 0.0
512 | float32 | exact | 324.18 | 96.0 | 0.00 | 644.5 | 0.0
512 | float16 | exact | 47.51 | 96.0 | 0.00 | 322.3 | 0.0
1024 | float32 | vchordrq | 29.76 | 100.0 | 68.99 | 1289.1 | 1465.4
1024 | float16 | vchordrq | 28.08 | 100.0 | 38.46 | 644.5 | 882.5
1024 | float32 | ivfflat(L500,P31) | 29.09 | 100.0 | 39.74 | 1289.1 | 2347.7
1024 | float16 | ivfflat(L500,P31) | 27.11 | 100.0 | 33.48 | 644.5 | 784.0
1024 | float32 | hnsw+binary(ef1000,10x) | 12.63 | 100.0 | 201.75 | 1289.1 | 181.0
1024 | float16 | hnsw+binary(ef1000,10x) | 13.32 | 100.0 | 200.80 | 644.5 | 181.0
1024 | float32 | ivf+binary(L500,P93,10x) | 16.60 | 100.0 | 6.61 | 1289.1 | 44.9
1024 | float16 | ivf+binary(L500,P93,10x) | 18.17 | 100.0 | 6.52 | 644.5 | 44.9
1024 | float32 | exact-binary(10x) | 59.38 | 100.0 | 0.00 | 1289.1 | 0.0
1024 | float16 | exact-binary(10x) | 62.07 | 100.0 | 0.00 | 644.5 | 0.0
1024 | float32 | exact-binary(1x) | 29.04 | 55.0 | 0.00 | 1289.1 | 0.0
1024 | float16 | exact-binary(1x) | 33.51 | 56.0 | 0.00 | 644.5 | 0.0
1024 | float32 | exact | 480.70 | 100.0 | 0.00 | 1289.1 | 0.0
1024 | float16 | exact | 337.50 | 100.0 | 0.00 | 644.5 | 0.0

Show me the code

Check out the code here

Conclusion

VectorChord is a good choice - providing superior performance and a better developer experience (pre-filtering and better default settings). The vchordrq index is the ideal choice for most use cases, as it delivers great performance and handles data distribution drift better than DiskANN indices. Using an ANN index only starts to make sense for large numbers of vectors (over 1 million).

References

#programming #machine-learning