Embedding Functions GPU Support

By default, Chroma does not require a GPU to run its embedding functions. However, several embedding functions, especially those that run locally, can take advantage of a GPU if one is available.

Default Embedding Functions (Onnxruntime)

To use the default embedding function with GPU support, you need to install the onnxruntime-gpu package. You can install it with the following command:

    pip install onnxruntime-gpu

Note: To avoid conflicts, uninstall onnxruntime first (pip uninstall onnxruntime) or install onnxruntime-gpu in a separate environment.
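For example, a clean setup in a fresh virtual environment (the environment name chroma-gpu below is just an example) could look like this:

    python -m venv chroma-gpu
    source chroma-gpu/bin/activate
    pip install chromadb onnxruntime-gpu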

List available providers:

    import onnxruntime
    print(onnxruntime.get_available_providers())

Select the desired provider and set it as preferred before using the embedding function (in the example below, we use CUDAExecutionProvider):

    import time
    from chromadb.utils.embedding_functions import ONNXMiniLM_L6_V2

    ef = ONNXMiniLM_L6_V2(preferred_providers=['CUDAExecutionProvider'])

    docs = []
    for i in range(1000):
        docs.append(f"this is a document with id {i}")

    start_time = time.perf_counter()
    embeddings = ef(docs)
    end_time = time.perf_counter()
    print(f"Elapsed time: {end_time - start_time} seconds")
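If the same script should also run on machines without a GPU, you can choose the provider at runtime. A minimal sketch, assuming the rest of the example above stays the same:

    import onnxruntime
    from chromadb.utils.embedding_functions import ONNXMiniLM_L6_V2

    # Prefer the CUDA provider when available, otherwise fall back to the CPU provider
    available = onnxruntime.get_available_providers()
    provider = "CUDAExecutionProvider" if "CUDAExecutionProvider" in available else "CPUExecutionProvider"
    ef = ONNXMiniLM_L6_V2(preferred_providers=[provider])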

IMPORTANT OBSERVATION: In our observations, Sentence Transformers with the all-MiniLM-L6-v2 model outperforms onnxruntime with GPU support. In practical terms, on a Colab T4 GPU the onnxruntime example above runs in about 100s, whereas the equivalent Sentence Transformers example runs in about 1.8s.

Sentence Transformers

    import time
    from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction

    # This will download the model to your machine and set it up for GPU support
    ef = SentenceTransformerEmbeddingFunction(model_name="thenlper/gte-small", device="cuda")

    # Test with 10k documents
    docs = []
    for i in range(10000):
        docs.append(f"this is a document with id {i}")

    start_time = time.perf_counter()
    embeddings = ef(docs)
    end_time = time.perf_counter()
    print(f"Elapsed time: {end_time - start_time} seconds")
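Once constructed, the embedding function can be attached to a collection so that documents are embedded on the GPU at add time. A minimal sketch (the collection name gpu_docs and the sample documents are arbitrary):

    import chromadb
    from chromadb.utils.embedding_functions import SentenceTransformerEmbeddingFunction

    ef = SentenceTransformerEmbeddingFunction(model_name="thenlper/gte-small", device="cuda")

    client = chromadb.PersistentClient(path="my_local_data")
    collection = client.get_or_create_collection(name="gpu_docs", embedding_function=ef)

    # Documents are embedded by the GPU-backed embedding function when added
    collection.add(
        ids=["1", "2"],
        documents=["first document", "second document"],
    )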

Note: You can run the above example in Google Colab - see the notebook
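Before running it, you may want to confirm that PyTorch can actually see a CUDA device; a quick check using the standard torch API:

    import torch

    # Prints True when a CUDA-capable GPU is visible to PyTorch
    print(torch.cuda.is_available())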

OpenCLIP

Prior to PR #1806, we simply used the torch package to load the model and run it on the GPU.

    import chromadb
    from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction
    from chromadb.utils.data_loaders import ImageLoader
    import torch
    import os

    IMAGE_FOLDER = "images"

    torch.device("cuda")

    embedding_function = OpenCLIPEmbeddingFunction()
    image_loader = ImageLoader()

    client = chromadb.PersistentClient(path="my_local_data")
    collection = client.create_collection(
        name='multimodal_collection',
        embedding_function=embedding_function,
        data_loader=image_loader)

    image_uris = sorted([os.path.join(IMAGE_FOLDER, image_name) for image_name in os.listdir(IMAGE_FOLDER)])
    ids = [str(i) for i in range(len(image_uris))]
    collection.add(ids=ids, uris=image_uris)
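Once the images are added, the collection can also be queried with text, since OpenCLIP embeds text and images into the same space. A minimal sketch (the query string is just an example):

    # Find the images closest to a free-form text query
    results = collection.query(query_texts=["a photo of a dog"], n_results=3)
    print(results["ids"])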

After PR #1806:

    from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction

    embedding_function = OpenCLIPEmbeddingFunction(device="cuda")
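If the same script should also work on machines without a GPU, the device can be picked at runtime; a small sketch using the standard torch API:

    import torch
    from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction

    # Use the GPU when available, otherwise fall back to the CPU
    device = "cuda" if torch.cuda.is_available() else "cpu"
    embedding_function = OpenCLIPEmbeddingFunction(device=device)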
