Configuration
The following describes available embeddings configuration. These parameters are set in the Embeddings constructor via either the config
parameter or as keyword arguments.
Configuration is designed to be optional and set only when needed. Out of the box, sensible defaults are picked to get up and running fast. For example:
from txtai.embeddings import Embeddings
embeddings = Embeddings()
Creates a new embeddings instance, using all-MiniLM-L6-v2 as the vector model, Faiss as the ANN index backend and content disabled.
from txtai.embeddings import Embeddings
embeddings = Embeddings(content=True)
Is the same as above except it adds in SQLite for content storage.
The following sections link to all the available configuration options.
ANN
The default vector index backend is Faiss.
Cloud
Embeddings databases can optionally be synced with cloud storage.
Database
Content storage is disabled by default. When enabled, SQLite is the default storage engine.
General
General configuration that doesn’t fit elsewhere.
Graph
An accomplying graph index can be created with an embeddings database. This enables topic modeling, path traversal and more. NetworkX is the default graph index.
Scoring
Sparse keyword indexing and word vectors term weighting.
Vectors
Vector search is enabled by converting text and other binary data into embeddings vectors. These vectors are then stored in an ANN index. The vector model is optional and a default model is used when not provided.