Index format


This section documents the txtai index format. Each component is designed to ensure open access to the underlying data in a programmatic and platform-independent way.

If an underlying library has its own index format, that format is used. Otherwise, txtai persists content with MessagePack serialization.
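
To show why MessagePack output stays readable outside txtai, here is a minimal sketch of an encoder and decoder for a small subset of the MessagePack specification, written with only the Python standard library. It is not txtai's implementation, and the `pack` and `unpack` names are hypothetical; reading real index files would normally go through a complete MessagePack library. The supported subset (nil, booleans, non-negative integers up to 32 bits, strings under 32 bytes, arrays and maps with fewer than 16 entries) is an assumption made to keep the example short.

```python
import struct

# Hypothetical helpers covering only a small subset of the MessagePack spec.


def pack(obj) -> bytes:
    """Encode a value using a subset of MessagePack."""
    if obj is None:
        return b"\xc0"                                  # nil
    if isinstance(obj, bool):                           # check before int
        return b"\xc3" if obj else b"\xc2"              # true / false
    if isinstance(obj, int) and 0 <= obj <= 0xFFFFFFFF:
        if obj < 0x80:
            return bytes([obj])                         # positive fixint
        if obj <= 0xFF:
            return b"\xcc" + struct.pack(">B", obj)     # uint 8
        if obj <= 0xFFFF:
            return b"\xcd" + struct.pack(">H", obj)     # uint 16 (big-endian)
        return b"\xce" + struct.pack(">I", obj)         # uint 32 (big-endian)
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        if len(data) < 32:
            return bytes([0xA0 | len(data)]) + data     # fixstr
    if isinstance(obj, list) and len(obj) < 16:
        return bytes([0x90 | len(obj)]) + b"".join(pack(x) for x in obj)  # fixarray
    if isinstance(obj, dict) and len(obj) < 16:
        body = b"".join(pack(k) + pack(v) for k, v in obj.items())
        return bytes([0x80 | len(obj)]) + body          # fixmap
    raise TypeError(f"unsupported value for this sketch: {obj!r}")


def unpack(data: bytes):
    """Decode a single value and reject trailing bytes."""
    value, end = _decode(data, 0)
    if end != len(data):
        raise ValueError("trailing bytes after MessagePack value")
    return value


def _decode(data: bytes, i: int):
    """Decode the value starting at offset i; return (value, next offset)."""
    tag = data[i]
    i += 1
    if tag <= 0x7F:                                     # positive fixint
        return tag, i
    if 0x80 <= tag <= 0x8F:                             # fixmap
        result = {}
        for _ in range(tag & 0x0F):
            key, i = _decode(data, i)
            value, i = _decode(data, i)
            result[key] = value
        return result, i
    if 0x90 <= tag <= 0x9F:                             # fixarray
        items = []
        for _ in range(tag & 0x0F):
            item, i = _decode(data, i)
            items.append(item)
        return items, i
    if 0xA0 <= tag <= 0xBF:                             # fixstr
        n = tag & 0x1F
        return data[i:i + n].decode("utf-8"), i + n
    if tag == 0xC0:
        return None, i
    if tag in (0xC2, 0xC3):
        return tag == 0xC3, i
    if tag == 0xCC:
        return data[i], i + 1
    if tag == 0xCD:
        return struct.unpack_from(">H", data, i)[0], i + 2
    if tag == 0xCE:
        return struct.unpack_from(">I", data, i)[0], i + 4
    raise ValueError(f"unsupported MessagePack type byte: {tag:#x}")


print(pack([1, "a"]))  # b'\x92\x01\xa1a'
```

Because every type tag and length prefix is defined by the public specification, bytes written this way can be decoded by MessagePack implementations in other languages.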

To learn more about how these components work together, read the Index Guide and Query Guide.

ANN

Approximate Nearest Neighbor (ANN) index configuration for storing vector embeddings.

Component | Storage Format
Faiss | Local file format provided by library
Hnswlib | Local file format provided by library
Annoy | Local file format provided by library
NumPy | Local NumPy array files via np.save / np.load
Postgres via pgvector | Vector tables in a Postgres database
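
Because the NumPy backend relies on np.save / np.load, the stored vectors can be loaded by any NumPy program. The sketch below round-trips a small array through a `.npy` file and then runs an exact, brute-force dot-product search over it. The embeddings, the query, and the `embeddings.npy` file name are hypothetical, and the search is only an illustration of what the stored vectors support, not a description of txtai's internals.

```python
import os
import tempfile

import numpy as np

# Hypothetical embeddings: 4 vectors with 3 dimensions, L2-normalized so that
# a dot product equals cosine similarity.
embeddings = np.array(
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 0.0]],
    dtype=np.float32,
)
embeddings /= np.linalg.norm(embeddings, axis=1, keepdims=True)

with tempfile.TemporaryDirectory() as tmp:
    # Illustrative file name, not txtai's naming scheme.
    path = os.path.join(tmp, "embeddings.npy")
    np.save(path, embeddings)
    loaded = np.load(path)

# Exact search: score every stored vector against a normalized query.
query = np.array([1.0, 0.2, 0.0], dtype=np.float32)
query /= np.linalg.norm(query)
scores = loaded @ query

# Indices of the two best matches, highest score first.
top = np.argsort(-scores)[:2]
print(top.tolist())  # [0, 3]
```

An ANN index trades this exhaustive scan for an approximate structure, which is why the other backends rely on each library's own file format.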

Core

Core embeddings index files.

Component | Storage Format
Configuration | Embeddings index configuration stored as JSON
Index Ids | Embeddings index ids serialized with MessagePack. Only enabled when content storage (database) is disabled.
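
Since the configuration is plain JSON, the standard library is enough to inspect it. The sketch below writes and reads back a configuration file; the `config.json` file name and the three keys are illustrative assumptions, not a complete or authoritative schema.

```python
import json
import os
import tempfile

# Illustrative settings only; not a complete configuration schema.
config = {
    "path": "sentence-transformers/all-MiniLM-L6-v2",
    "content": True,
    "backend": "faiss",
}

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file name inside a saved index directory.
    path = os.path.join(tmp, "config.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(config, f, indent=2)

    # Any JSON parser can read the configuration back without txtai installed.
    with open(path, encoding="utf-8") as f:
        loaded = json.load(f)

print(loaded["backend"])  # faiss
```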

Database

Databases store metadata, text and binary content.

Component | Storage Format
SQLite | Local database files with SQLite
DuckDB | Local database files with DuckDB
Postgres | Postgres relational database via SQLAlchemy. Supports additional databases via this library.
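
A local SQLite database file can be opened with any SQLite client. The sketch below uses Python's built-in `sqlite3` module to create a small database, then reopens the file to list its tables and read rows. The `documents.db` file name and the `documents(id, text)` schema are hypothetical stand-ins, since txtai's actual table layout is not described here.

```python
import os
import sqlite3
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical database file and schema, created so the example runs.
    path = os.path.join(tmp, "documents.db")
    with sqlite3.connect(path) as conn:  # commits on successful exit
        conn.execute("CREATE TABLE documents (id TEXT PRIMARY KEY, text TEXT)")
        conn.executemany(
            "INSERT INTO documents VALUES (?, ?)",
            [("0", "first document"), ("1", "second document")],
        )
    conn.close()

    # Reopen the file with the standard sqlite3 module, as any tool could.
    conn = sqlite3.connect(path)
    tables = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
    ]
    rows = conn.execute("SELECT id, text FROM documents ORDER BY id").fetchall()
    conn.close()

print(tables, rows)
```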

Graph

Graph nodes and edges for an embeddings index.

Component | Storage Format
NetworkX | Nodes and edges exported to local file serialized with MessagePack
Postgres | Nodes and edges stored in a Postgres database. Supports additional databases.

Scoring

Sparse/keyword indexing.

Component | Storage Format
Local index | Metadata serialized with MessagePack. Terms stored in SQLite.
Postgres | Text indexed with Postgres Full Text Search (FTS)
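
Storing terms in SQLite makes a keyword index queryable with ordinary SQL. The sketch below builds a tiny inverted index (one row per term and document) in an in-memory SQLite database and ranks documents by summed term frequency. The `terms(term, uid, freq)` schema, the whitespace tokenizer, the `search` helper, and the frequency-only ranking are all hypothetical simplifications, not txtai's actual term schema or scoring function.

```python
import sqlite3
from collections import Counter

# Hypothetical documents keyed by integer id.
documents = {
    0: "fast vector search",
    1: "keyword search with sqlite",
    2: "vector databases",
}

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per (term, document) pair with its frequency.
conn.execute("CREATE TABLE terms (term TEXT, uid INTEGER, freq INTEGER)")
conn.execute("CREATE INDEX terms_term ON terms (term)")
for uid, text in documents.items():
    for term, freq in Counter(text.lower().split()).items():
        conn.execute("INSERT INTO terms VALUES (?, ?, ?)", (term, uid, freq))


def search(query):
    """Rank documents by summed frequency of the query terms (no IDF weighting)."""
    terms = query.lower().split()
    placeholders = ", ".join("?" for _ in terms)
    return conn.execute(
        f"SELECT uid, SUM(freq) AS score FROM terms WHERE term IN ({placeholders}) "
        "GROUP BY uid ORDER BY score DESC, uid",
        terms,
    ).fetchall()


results = search("vector search")
print(results)  # [(0, 2), (1, 1), (2, 1)]
```

A real sparse index would weight terms by how rare they are across the corpus; summed frequency is used here only to keep the storage idea visible.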