Examples
See below for a comprehensive series of example notebooks and applications covering txtai.
Semantic Search
Build semantic/similarity/vector/neural search applications.
Notebook | Description | |
---|---|---|
Introducing txtai ▶️ | Overview of the functionality provided by txtai | |
Build an Embeddings index with Hugging Face Datasets | Index and search Hugging Face Datasets | |
Build an Embeddings index from a data source | Index and search a data source with word embeddings | |
Add semantic search to Elasticsearch | Add semantic search to existing search systems | |
Similarity search with images | Embed images and text into the same space for search | |
Custom Embeddings SQL functions | Add user-defined functions to Embeddings SQL | |
Model explainability | Explainability for semantic search | |
Query translation | Domain-specific natural language queries with query translation | |
Build a QA database | Question matching with semantic search | |
Semantic Graphs | Explore topics, data connectivity and run network analysis | |
Topic Modeling with BM25 | Topic modeling backed by a BM25 index |
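
The core idea behind the semantic search notebooks above is ranking items by vector similarity. The block below is a minimal standard-library sketch of that idea, not txtai's implementation: the three-dimensional vectors are hand-written placeholders for what an embeddings model would produce, and the `cosine` and `search` helpers are hypothetical names introduced for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(index, query, limit=1):
    """Return (id, score) pairs for the `limit` vectors most similar to `query`."""
    scores = [(uid, cosine(vector, query)) for uid, vector in index.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:limit]

# Hypothetical toy "embeddings"; a real model would generate these from text or images.
index = {
    "weather": [0.9, 0.1, 0.0],
    "sports": [0.1, 0.9, 0.1],
    "finance": [0.0, 0.2, 0.9],
}

# The query vector points mostly in the "weather" direction.
results = search(index, [0.8, 0.2, 0.1], limit=2)
for uid, score in results:
    print(uid, round(score, 3))
```

A production index would replace the linear scan with an approximate nearest neighbor structure, but the ranking principle is the same.
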
LLM
LLM chains, retrieval augmented generation (RAG), chat with your data, pipelines and workflows that interface with large language models (LLMs).
Notebook | Description | |
---|---|---|
Prompt-driven search with LLMs | Embeddings-guided and Prompt-driven search with Large Language Models (LLMs) | |
Prompt templates and task chains | Build model prompts and connect tasks together with workflows | |
Build RAG pipelines with txtai | Guide on retrieval augmented generation including how to create citations | |
Integrate LLM frameworks | Integrate llama.cpp, LiteLLM and custom generation frameworks | |
Generate knowledge with Semantic Graphs and RAG | Knowledge exploration and discovery with Semantic Graphs and RAG | |
Build knowledge graphs with LLMs | Build knowledge graphs with LLM-driven entity extraction | |
Advanced RAG with graph path traversal | Graph path traversal to collect complex sets of data for advanced RAG | |
Advanced RAG with guided generation | Retrieval Augmented and Guided Generation | |
RAG with llama.cpp and external API services | RAG with additional vector and LLM frameworks | |
How RAG with txtai works | Create RAG processes, API services and Docker instances | |
Speech to Speech RAG ▶️ | Full cycle speech to speech workflow with RAG |
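
Retrieval augmented generation, as covered by the notebooks above, comes down to two steps: retrieve relevant context, then ask an LLM to answer using only that context. The following is a rough standard-library sketch under stated assumptions: retrieval is approximated with word overlap rather than an embeddings search, the prompt wording is invented for illustration, and `llm` is a hypothetical callable standing in for a real model.

```python
def tokens(text):
    """Lowercase word set with surrounding punctuation stripped."""
    return {word.strip(".,?!") for word in text.lower().split()}

def retrieve(documents, question, limit=2):
    """Rank documents by words shared with the question (stand-in for vector search)."""
    terms = tokens(question)
    scored = [(len(terms & tokens(doc)), doc) for doc in documents]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:limit]]

def build_prompt(question, context):
    """Bind the question to the retrieved context (illustrative wording only)."""
    lines = "\n".join(f"- {doc}" for doc in context)
    return (
        "Answer the question using only the context below.\n"
        f"Question: {question}\n"
        f"Context:\n{lines}"
    )

def rag(question, documents, llm):
    """Retrieve context, build the prompt, then call the (hypothetical) llm callable."""
    return llm(build_prompt(question, retrieve(documents, question)))

documents = [
    "txtai builds embeddings indexes for semantic search",
    "Cron expressions schedule recurring jobs",
    "Embeddings indexes can be stored in Postgres",
]
question = "Where can embeddings indexes be stored?"

context = retrieve(documents, question)
# An echo function replaces the LLM so the generated prompt can be inspected.
prompt = rag(question, documents, llm=lambda text: text)
print(prompt)
```

Keeping the retrieved passages as a separate list, as `retrieve` does here, is also what makes citations possible: the answer can be traced back to the passages that were placed in the prompt.
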
Pipelines
Transform data with language model backed pipelines.
Notebook | Description | |
---|---|---|
Extractive QA with txtai | Introduction to extractive question-answering with txtai | |
Extractive QA with Elasticsearch | Run extractive question-answering queries with Elasticsearch | |
Extractive QA to build structured data | Build structured datasets using extractive question-answering | |
Apply labels with zero shot classification | Use zero shot learning for labeling, classification and topic modeling | |
Building abstractive text summaries | Run abstractive text summarization | |
Extract text from documents | Extract text from PDF, Office, HTML and more | |
Text to speech generation | Generate speech from text | |
Transcribe audio to text | Convert audio files to text | |
Translate text between languages | Streamline machine translation and language detection | |
Generate image captions and detect objects | Captions and object detection for images | |
Near duplicate image detection | Identify duplicate and near-duplicate images |
Workflows
Efficiently process data at scale.
Notebook | Description | |
---|---|---|
Run pipeline workflows ▶️ | Simple yet powerful constructs to efficiently process data | |
Transform tabular data with composable workflows | Transform, index and search tabular data | |
Tensor workflows | Performant processing of large tensor arrays | |
Entity extraction workflows | Identify entity/label combinations | |
Workflow Scheduling | Schedule workflows with cron expressions | |
Push notifications with workflows | Generate and push notifications with workflows | |
Pictures are worth a thousand words | Generate webpage summary images with DALL-E mini | |
Run txtai with native code | Execute workflows in native code with the Python C API | |
Generative Audio | Storytelling with generative audio workflows |
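
The workflow notebooks above describe chaining tasks so that data is processed efficiently in batches. The sketch below illustrates that pattern with the standard library only. It is loosely modeled on the description here and is not txtai's workflow API: the `batches` and `workflow` functions and the default batch size are assumptions made for the example.

```python
from itertools import islice

def batches(elements, size):
    """Yield successive lists of up to `size` elements from any iterable."""
    iterator = iter(elements)
    while batch := list(islice(iterator, size)):
        yield batch

def workflow(tasks, elements, batch=2):
    """Run each batch through every task in order, streaming results as they finish."""
    for chunk in batches(elements, batch):
        for task in tasks:
            # Each task takes a list and returns a list, so tasks compose freely.
            chunk = task(chunk)
        yield from chunk

# Two hypothetical tasks: trim whitespace, then upper-case.
tasks = [
    lambda items: [item.strip() for item in items],
    lambda items: [item.upper() for item in items],
]

results = list(workflow(tasks, [" a ", "b ", " c"]))
print(results)
```

Because `workflow` is a generator, the input can be a stream larger than memory: only one batch is held at a time.
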
Model Training
Train NLP models.
Notebook | Description | |
---|---|---|
Train a text labeler | Build text sequence classification models | |
Train without labels | Use zero-shot classifiers to train new models | |
Train a QA model | Build and fine-tune question-answering models | |
Train a language model from scratch | Build new language models | |
Export and run models with ONNX | Export models with ONNX, run natively in JavaScript, Java and Rust | |
Export and run other machine learning models | Export and run models from scikit-learn, PyTorch and more |
API
Run distributed txtai, integrate with the API and cloud endpoints.
Notebook | Description | |
---|---|---|
API Gallery | Using txtai in JavaScript, Java, Rust and Go | |
Distributed embeddings cluster | Distribute an embeddings index across multiple data nodes | |
Embeddings in the Cloud | Load and use an embeddings index from the Hugging Face Hub | |
Custom API Endpoints | Extend the API with custom endpoints | |
API Authorization and Authentication | Add authorization, authentication and middleware dependencies to the API |
Architecture
Project architecture, data formats, external integrations, scale to production, benchmarks, and performance.
Notebook | Description | |
---|---|---|
Anatomy of a txtai index | Deep dive into the file formats behind a txtai embeddings index | |
Embeddings components | Composable search with vector, SQL and scoring components | |
Customize your own embeddings database | Ways to combine vector indexes with relational databases | |
Building an efficient sparse keyword index in Python | Fast and accurate sparse keyword indexing | |
Benefits of hybrid search | Improve accuracy with a combination of semantic and keyword search | |
External database integration | Store metadata in PostgreSQL, MariaDB, MySQL and more | |
All about vector quantization | Benchmarking scalar and product quantization methods | |
External vectorization | Vectorization with precomputed embeddings datasets and APIs | |
Integrate txtai with Postgres | Persist content, vectors and graph data in Postgres | |
Embeddings index format for open data access | Platform and programming language independent data storage with txtai |
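
Hybrid search, listed above, combines semantic (dense) and keyword (sparse) results. One common way to do that is to normalize both score sets and take a weighted average. The sketch below shows that approach with the standard library. It is an assumption-laden illustration: txtai's own scoring may differ, and the `normalize` and `hybrid` helpers, the scores and the 0.5 weight are all invented for the example.

```python
def normalize(scores):
    """Min-max scale a {id: score} mapping into the range [0, 1]."""
    if not scores:
        return {}
    low, high = min(scores.values()), max(scores.values())
    span = high - low
    return {uid: (score - low) / span if span else 1.0 for uid, score in scores.items()}

def hybrid(dense, sparse, weight=0.5):
    """Weighted average of normalized dense and sparse scores, best match first."""
    dense, sparse = normalize(dense), normalize(sparse)
    combined = {
        uid: weight * dense.get(uid, 0.0) + (1 - weight) * sparse.get(uid, 0.0)
        for uid in set(dense) | set(sparse)
    }
    return sorted(combined.items(), key=lambda pair: pair[1], reverse=True)

# Hypothetical scores: cosine similarities (dense) and BM25-style scores (sparse).
dense = {"a": 0.9, "b": 0.7, "c": 0.2}
sparse = {"b": 12.0, "c": 8.0, "d": 4.0}

results = hybrid(dense, sparse)
for uid, score in results:
    print(uid, round(score, 3))
```

Document `b` ranks first because it scores well on both sides, while `a` and `d` each appear in only one result set. Normalizing first matters because keyword and similarity scores are on different scales.
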
Releases
New functionality added in major releases.
Notebook | Description | |
---|---|---|
What’s new in txtai 4.0 | Content storage, SQL, object storage, reindex and compressed indexes | |
What’s new in txtai 6.0 | Sparse, hybrid and subindexes for embeddings, LLM improvements | |
What’s new in txtai 7.0 | Semantic graph 2.0, LoRA/QLoRA training and binary API support |
Applications
Series of example applications with txtai. Links to hosted versions on Hugging Face Spaces are also provided, when available.
Application | Description | |
---|---|---|
Basic similarity search | Basic similarity search example. Data from the original txtai demo. | 🤗 |
Baseball stats | Match historical baseball player stats using vector search. | 🤗 |
Benchmarks | Calculate performance metrics for the BEIR datasets. | Local run only |
Book search | Book similarity search application. Index book descriptions and query using natural language statements. | Local run only |
Image search | Image similarity search application. Index a directory of images and run searches to identify images similar to the input query. | 🤗 |
Retrieval Augmented Generation | RAG with txtai embeddings databases. Ask questions and get answers from LLMs bound by a context. | Local run only |
Summarize an article | Summarize an article. Workflow that extracts text from a webpage and builds a summary. | 🤗 |
Wiki search | Wikipedia search application. Queries Wikipedia API and summarizes the top result. | 🤗 |
Workflow builder | Build and execute txtai workflows. Connect summarization, text extraction, transcription, translation and similarity search pipelines together to run unified workflows. | 🤗 |