Examples

See below for a comprehensive series of example notebooks and applications covering txtai.

Semantic Search

Build semantic/similarity/vector/neural search applications.

| Notebook | Description | |
|---|---|---|
| Introducing txtai ▶️ | Overview of the functionality provided by txtai | Open In Colab |
| Build an Embeddings index with Hugging Face Datasets | Index and search Hugging Face Datasets | Open In Colab |
| Build an Embeddings index from a data source | Index and search a data source with word embeddings | Open In Colab |
| Add semantic search to Elasticsearch | Add semantic search to existing search systems | Open In Colab |
| Similarity search with images | Embed images and text into the same space for search | Open In Colab |
| Custom Embeddings SQL functions | Add user-defined functions to Embeddings SQL | Open In Colab |
| Model explainability | Explainability for semantic search | Open In Colab |
| Query translation | Domain-specific natural language queries with query translation | Open In Colab |
| Build a QA database | Question matching with semantic search | Open In Colab |
| Semantic Graphs | Explore topics, data connectivity and run network analysis | Open In Colab |
| Topic Modeling with BM25 | Topic modeling backed by a BM25 index | Open In Colab |
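
As a rough illustration of the idea these notebooks build on, vector search ranks items by how close their vectors are to a query vector. The sketch below is plain Python and not txtai's API: the `cosine` and `search` helpers and the three-dimensional vectors are hypothetical stand-ins for embeddings that a model would normally generate.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def search(query_vector, index, limit=3):
    """Return (id, score) pairs ranked by similarity to the query vector."""
    scores = [(uid, cosine(query_vector, vector)) for uid, vector in index.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:limit]

# Hypothetical 3-dimensional vectors standing in for model-generated embeddings
index = {
    "doc-weather": [0.9, 0.1, 0.0],
    "doc-sports": [0.1, 0.9, 0.1],
    "doc-finance": [0.0, 0.2, 0.9],
}

# The query vector points mostly along the first dimension
results = search([0.8, 0.2, 0.1], index, limit=2)
print(results)
```

A production index would typically replace the linear scan with an approximate nearest neighbor structure; the brute-force loop here only shows the ranking step.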

LLM

LLM chains, retrieval augmented generation (RAG), chat with your data, pipelines and workflows that interface with large language models (LLMs).

| Notebook | Description | |
|---|---|---|
| Prompt-driven search with LLMs | Embeddings-guided and Prompt-driven search with Large Language Models (LLMs) | Open In Colab |
| Prompt templates and task chains | Build model prompts and connect tasks together with workflows | Open In Colab |
| Build RAG pipelines with txtai | Guide on retrieval augmented generation including how to create citations | Open In Colab |
| Integrate LLM frameworks | Integrate llama.cpp, LiteLLM and custom generation frameworks | Open In Colab |
| Generate knowledge with Semantic Graphs and RAG | Knowledge exploration and discovery with Semantic Graphs and RAG | Open In Colab |
| Build knowledge graphs with LLMs | Build knowledge graphs with LLM-driven entity extraction | Open In Colab |
| Advanced RAG with graph path traversal | Graph path traversal to collect complex sets of data for advanced RAG | Open In Colab |
| Advanced RAG with guided generation | Retrieval Augmented and Guided Generation | Open In Colab |
| RAG with llama.cpp and external API services | RAG with additional vector and LLM frameworks | Open In Colab |
| How RAG with txtai works | Create RAG processes, API services and Docker instances | Open In Colab |
| Speech to Speech RAG ▶️ | Full cycle speech to speech workflow with RAG | Open In Colab |
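
The general shape of retrieval augmented generation can be sketched in a few lines: retrieve passages for a question, place them in a prompt as numbered context, then hand the prompt to a generator. The code below is an assumption-laden sketch in plain Python, not txtai's RAG pipeline. `build_prompt`, `rag`, `retrieve` and `generate` are hypothetical names, retrieval is a simple word-overlap ranking, and the generator just echoes the prompt in place of an LLM call.

```python
import re

def build_prompt(question, passages):
    """Build a prompt that limits the answer to numbered context passages."""
    context = "\n".join(f"[{i}] {text}" for i, text in enumerate(passages, start=1))
    return (
        "Answer the question using only the context below. "
        "Cite passages by their number.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )

def rag(question, retrieve, generate, limit=2):
    """Retrieve passages, build a prompt and pass it to a text generator."""
    passages = retrieve(question, limit)
    return generate(build_prompt(question, passages))

# Hypothetical stand-ins: a real application would plug in vector search and an LLM
documents = ["txtai is an embeddings database.", "Paris is the capital of France."]

def words(text):
    """Lowercased set of word tokens with punctuation removed."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question, limit):
    """Rank documents by the number of words shared with the question."""
    ranked = sorted(documents, key=lambda d: len(words(question) & words(d)), reverse=True)
    return ranked[:limit]

def generate(prompt):
    """Echo the prompt instead of calling a model."""
    return prompt

output = rag("What is txtai?", retrieve, generate, limit=1)
print(output)
```

Numbering the passages in the prompt is one simple way to let a model cite its sources, which is the idea behind the citations mentioned in the RAG guide.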

Pipelines

Transform data with language model backed pipelines.

| Notebook | Description | |
|---|---|---|
| Extractive QA with txtai | Introduction to extractive question-answering with txtai | Open In Colab |
| Extractive QA with Elasticsearch | Run extractive question-answering queries with Elasticsearch | Open In Colab |
| Extractive QA to build structured data | Build structured datasets using extractive question-answering | Open In Colab |
| Apply labels with zero shot classification | Use zero shot learning for labeling, classification and topic modeling | Open In Colab |
| Building abstractive text summaries | Run abstractive text summarization | Open In Colab |
| Extract text from documents | Extract text from PDF, Office, HTML and more | Open In Colab |
| Text to speech generation | Generate speech from text | Open In Colab |
| Transcribe audio to text | Convert audio files to text | Open In Colab |
| Translate text between languages | Streamline machine translation and language detection | Open In Colab |
| Generate image captions and detect objects | Captions and object detection for images | Open In Colab |
| Near duplicate image detection | Identify duplicate and near-duplicate images | Open In Colab |

Workflows

Efficiently process data at scale.

| Notebook | Description | |
|---|---|---|
| Run pipeline workflows ▶️ | Simple yet powerful constructs to efficiently process data | Open In Colab |
| Transform tabular data with composable workflows | Transform, index and search tabular data | Open In Colab |
| Tensor workflows | Performant processing of large tensor arrays | Open In Colab |
| Entity extraction workflows | Identify entity/label combinations | Open In Colab |
| Workflow Scheduling | Schedule workflows with cron expressions | Open In Colab |
| Push notifications with workflows | Generate and push notifications with workflows | Open In Colab |
| Pictures are worth a thousand words | Generate webpage summary images with DALL-E mini | Open In Colab |
| Run txtai with native code | Execute workflows in native code with the Python C API | Open In Colab |
| Generative Audio | Storytelling with generative audio workflows | Open In Colab |
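
One way to picture a workflow is as a chain of tasks applied to a stream of data one batch at a time, so large inputs never need to be held in memory at once. The following is a minimal sketch of that pattern in plain Python, not txtai's Workflow class. The `batches` and `workflow` helpers and the `strip` and `upper` tasks are hypothetical.

```python
from itertools import islice

def batches(elements, size):
    """Yield successive lists of up to `size` elements from any iterable."""
    iterator = iter(elements)
    while batch := list(islice(iterator, size)):
        yield batch

def workflow(tasks, elements, batch=2):
    """Run each batch through every task in order and yield the outputs."""
    for chunk in batches(elements, batch):
        for task in tasks:
            chunk = task(chunk)
        yield from chunk

# Hypothetical tasks: each takes a list of inputs and returns a list of outputs
def strip(texts):
    return [text.strip() for text in texts]

def upper(texts):
    return [text.upper() for text in texts]

results = list(workflow([strip, upper], [" a ", "b ", " c"], batch=2))
print(results)
```

Because `workflow` is a generator, results are produced as each batch finishes, and each task sees a whole batch, which suits models that process inputs in groups.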

Model Training

Train NLP models.

| Notebook | Description | |
|---|---|---|
| Train a text labeler | Build text sequence classification models | Open In Colab |
| Train without labels | Use zero-shot classifiers to train new models | Open In Colab |
| Train a QA model | Build and fine-tune question-answering models | Open In Colab |
| Train a language model from scratch | Build new language models | Open In Colab |
| Export and run models with ONNX | Export models with ONNX, run natively in JavaScript, Java and Rust | Open In Colab |
| Export and run other machine learning models | Export and run models from scikit-learn, PyTorch and more | Open In Colab |

API

Run distributed txtai, integrate with the API and cloud endpoints.

| Notebook | Description | |
|---|---|---|
| API Gallery | Using txtai in JavaScript, Java, Rust and Go | Open In Colab |
| Distributed embeddings cluster | Distribute an embeddings index across multiple data nodes | Open In Colab |
| Embeddings in the Cloud | Load and use an embeddings index from the Hugging Face Hub | Open In Colab |
| Custom API Endpoints | Extend the API with custom endpoints | Open In Colab |
| API Authorization and Authentication | Add authorization, authentication and middleware dependencies to the API | Open In Colab |

Architecture

Project architecture, data formats, external integrations, scale to production, benchmarks, and performance.

| Notebook | Description | |
|---|---|---|
| Anatomy of a txtai index | Deep dive into the file formats behind a txtai embeddings index | Open In Colab |
| Embeddings components | Composable search with vector, SQL and scoring components | Open In Colab |
| Customize your own embeddings database | Ways to combine vector indexes with relational databases | Open In Colab |
| Building an efficient sparse keyword index in Python | Fast and accurate sparse keyword indexing | Open In Colab |
| Benefits of hybrid search | Improve accuracy with a combination of semantic and keyword search | Open In Colab |
| External database integration | Store metadata in PostgreSQL, MariaDB, MySQL and more | Open In Colab |
| All about vector quantization | Benchmarking scalar and product quantization methods | Open In Colab |
| External vectorization | Vectorization with precomputed embeddings datasets and APIs | Open In Colab |
| Integrate txtai with Postgres | Persist content, vectors and graph data in Postgres | Open In Colab |
| Embeddings index format for open data access | Platform and programming language independent data storage with txtai | Open In Colab |
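
To give a sense of what sparse keyword scoring involves, here is a small in-memory BM25 scorer. It is a sketch under stated assumptions and not the implementation from the keyword index notebook: it uses whitespace tokenization, the common defaults `k1=1.2` and `b=0.75`, and a non-negative IDF variant. The `BM25` class and its method names are hypothetical.

```python
import math
from collections import Counter

class BM25:
    """Minimal in-memory BM25 scorer (Okapi formulation with a non-negative IDF)."""

    def __init__(self, documents, k1=1.2, b=0.75):
        self.k1, self.b = k1, b
        # Assumption: lowercase whitespace tokenization is good enough for a demo
        self.tokens = [doc.lower().split() for doc in documents]
        self.avgdl = sum(len(t) for t in self.tokens) / len(self.tokens)
        # Document frequency: number of documents containing each term
        self.df = Counter(term for t in self.tokens for term in set(t))

    def idf(self, term):
        """Inverse document frequency, higher for rarer terms."""
        n, total = self.df.get(term, 0), len(self.tokens)
        return math.log(1 + (total - n + 0.5) / (n + 0.5))

    def score(self, query, index):
        """BM25 score of one document for a query."""
        tokens = self.tokens[index]
        freqs = Counter(tokens)
        # Length normalization: longer than average documents are penalized
        norm = self.k1 * (1 - self.b + self.b * len(tokens) / self.avgdl)
        return sum(
            self.idf(term) * freqs[term] * (self.k1 + 1) / (freqs[term] + norm)
            for term in query.lower().split()
            if term in freqs
        )

    def search(self, query, limit=3):
        """Return (document index, score) pairs for matching documents."""
        scores = [(i, self.score(query, i)) for i in range(len(self.tokens))]
        ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
        return [(i, s) for i, s in ranked if s > 0][:limit]

documents = [
    "sparse keyword index",
    "dense vector index",
    "hybrid search combines sparse and dense scores",
]
results = BM25(documents).search("sparse keyword")
print(results)
```

A hybrid search setup would combine scores like these with vector similarity scores. How the two are weighted is a design choice this sketch leaves out.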

Releases

New functionality added in major releases.

| Notebook | Description | |
|---|---|---|
| What’s new in txtai 4.0 | Content storage, SQL, object storage, reindex and compressed indexes | Open In Colab |
| What’s new in txtai 6.0 | Sparse, hybrid and subindexes for embeddings, LLM improvements | Open In Colab |
| What’s new in txtai 7.0 | Semantic graph 2.0, LoRA/QLoRA training and binary API support | Open In Colab |

Applications

A series of example applications built with txtai. Links to hosted versions on Hugging Face Spaces are provided when available.

| Application | Description | |
|---|---|---|
| Basic similarity search | Basic similarity search example. Data from the original txtai demo. | 🤗 |
| Baseball stats | Match historical baseball player stats using vector search. | 🤗 |
| Benchmarks | Calculate performance metrics for the BEIR datasets. | Local run only |
| Book search | Book similarity search application. Index book descriptions and query using natural language statements. | Local run only |
| Image search | Image similarity search application. Index a directory of images and run searches to identify images similar to the input query. | 🤗 |
| Retrieval Augmented Generation | RAG with txtai embeddings databases. Ask questions and get answers from LLMs bound by a context. | Local run only |
| Summarize an article | Summarize an article. Workflow that extracts text from a webpage and builds a summary. | 🤗 |
| Wiki search | Wikipedia search application. Queries Wikipedia API and summarizes the top result. | 🤗 |
| Workflow builder | Build and execute txtai workflows. Connect summarization, text extraction, transcription, translation and similarity search pipelines together to run unified workflows. | 🤗 |