Semantic search and workflows powered by language models
txtai is an open-source platform for semantic search and workflows powered by language models.
Traditional search systems use keywords to find data. Semantic search understands natural language and identifies results that have the same meaning, even when they do not share the same keywords.
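To make the contrast concrete, here is a minimal sketch in plain Python. It does not use txtai or a real language model: the three-dimensional vectors are hand-made stand-ins for the embeddings a model would produce, and the names `keyword_search` and `semantic_search` are hypothetical.

```python
import math

# Hypothetical documents paired with hand-made "embedding" vectors.
# A real system would get these vectors from a language model; the
# numbers here only mimic the property that similar meanings get nearby vectors.
DOCUMENTS = {
    "The car would not start this morning": [0.9, 0.1, 0.0],
    "Stocks rallied after the earnings report": [0.0, 0.2, 0.9],
    "Fresh pasta recipe with basil": [0.1, 0.9, 0.1],
}


def keyword_search(query):
    """Return documents sharing at least one lowercase word with the query."""
    terms = set(query.lower().split())
    return [doc for doc in DOCUMENTS if terms & set(doc.lower().split())]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def semantic_search(query_vector, limit=1):
    """Rank documents by cosine similarity to the query vector."""
    ranked = sorted(DOCUMENTS, key=lambda doc: cosine(query_vector, DOCUMENTS[doc]), reverse=True)
    return ranked[:limit]


# "vehicle breakdown" shares no words with any document...
print(keyword_search("vehicle breakdown"))
# ...but its stand-in vector lies close to the car document.
print(semantic_search([0.8, 0.2, 0.1]))
```

The keyword lookup returns nothing, while the vector ranking still surfaces the car document because similarity is measured between vectors rather than shared words.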
txtai builds embeddings databases, which are a union of vector indexes and relational databases. This enables similarity search with SQL. Embeddings databases can stand on their own and/or serve as a powerful knowledge source for large language model (LLM) prompts.
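The idea of pairing a vector index with a relational database can be illustrated with the standard library alone. The sketch below is not txtai's implementation or query syntax: it stores hand-made stand-in vectors as JSON in an in-memory SQLite table and registers a hypothetical `similar()` SQL function that scores each row against a query vector, so one SQL statement can combine similarity ranking with an ordinary column filter.

```python
import json
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def build_database(rows):
    """Create an in-memory table holding text, a metadata column and a JSON-encoded vector."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, text TEXT, year INTEGER, vector TEXT)")
    db.executemany(
        "INSERT INTO documents (text, year, vector) VALUES (?, ?, ?)",
        [(text, year, json.dumps(vector)) for text, year, vector in rows],
    )
    return db


def search(db, query_vector, sql):
    """Run SQL with a hypothetical similar(vector) function bound to the query vector."""
    db.create_function("similar", 1, lambda stored: cosine(query_vector, json.loads(stored)))
    return db.execute(sql).fetchall()


# Stand-in vectors; a real embeddings database would compute them with a model.
db = build_database([
    ("Engine fails to start in cold weather", 2021, [0.9, 0.1, 0.0]),
    ("Battery replacement guide", 2019, [0.7, 0.3, 0.1]),
    ("Quarterly earnings beat estimates", 2022, [0.0, 0.1, 0.9]),
])

# Similarity ranking and a relational filter in one statement.
rows = search(
    db,
    [0.8, 0.2, 0.0],
    "SELECT text, year FROM documents WHERE year >= 2020 ORDER BY similar(vector) DESC LIMIT 1",
)
print(rows)
```

Scoring every row in Python is a brute-force scan chosen for clarity; a dedicated vector index such as those listed below exists precisely to avoid that full scan on large collections.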
Semantic workflows connect language models together to build intelligent applications.
Integrate vector search, conversational search, automatic summarization, transcription, translation and more.
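Conceptually, a workflow passes data through a sequence of steps, each of which could be backed by a different model. The sketch below shows only that chaining idea in plain Python; it is not txtai's workflow API, and the steps `summarize` and `label` are hypothetical rule-based placeholders standing in for model-powered pipelines such as summarization and labeling.

```python
from typing import Callable, Iterable, List

# A step takes a batch of items and returns a batch of results.
Step = Callable[[List[str]], List[str]]


def summarize(batch: List[str]) -> List[str]:
    """Placeholder for a summarization model: keep only the first sentence."""
    return [text.split(". ")[0].rstrip(".") + "." for text in batch]


def label(batch: List[str]) -> List[str]:
    """Placeholder for a labeling model: tag each text using a keyword rule."""
    return [("finance" if "earnings" in text.lower() else "other") + ": " + text for text in batch]


def workflow(steps: List[Step], data: Iterable[str], batch_size: int = 2) -> List[str]:
    """Run data through each step in order, processing batch_size items at a time."""
    items = list(data)
    results: List[str] = []
    for start in range(0, len(items), batch_size):
        batch = items[start:start + batch_size]
        for step in steps:
            batch = step(batch)
        results.extend(batch)
    return results


documents = [
    "Earnings rose 10% this quarter. Analysts expected less.",
    "The new library supports SQL queries. It also runs on Kubernetes.",
    "Rain is expected tomorrow.",
]
for line in workflow([summarize, label], documents):
    print(line)
```

Processing in batches keeps memory bounded for long inputs, and because every step shares the same list-in, list-out signature, a rule-based placeholder can later be swapped for a model-backed step without changing the chaining code.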
Summary of txtai features:
- 🔎 Similarity search with SQL, object storage, topic modeling, graph analysis, multiple vector index backends (Faiss, Annoy, Hnswlib) and support for external vector databases
- 📄 Create embeddings for text, documents, audio, images and video
- 💡 Pipelines powered by language models that run question-answering, labeling, transcription, translation, summarization, LLM prompts and more
- ↪️ Workflows to join pipelines together and aggregate business logic. txtai processes can be simple microservices or multi-model workflows.
- ⚙️ Build with Python or YAML. API bindings available for JavaScript, Java, Rust and Go.
- ☁️ Cloud-native architecture that scales out with container orchestration systems (e.g. Kubernetes)
The following applications are powered by txtai.
Application | Description |
---|---|
txtchat | Conversational search and workflows for all |
paperai | Semantic search and workflows for medical/scientific papers |
codequestion | Semantic search for developers |
tldrstory | Semantic search for headlines and story text |
txtai is built with Python 3.7+, Hugging Face Transformers, Sentence Transformers and FastAPI.