Cross-Encoders Reranking

Work in Progress

This page is a work in progress and may not be complete.

For now, this page contains only a short snippet showing how to use a cross-encoder to rerank results returned from Chroma. A more detailed guide to the usefulness of cross-encoders and rerankers will follow.

Hugging Face Cross Encoders

from sentence_transformers import CrossEncoder
import numpy as np
import chromadb

client = chromadb.Client()
collection = client.get_or_create_collection("my_collection")

# add some documents
collection.add(ids=["doc1", "doc2", "doc3"], documents=["Hello, world!", "Hello, Chroma!", "Hello, Universe!"])

# query the collection
query = "Hello, world!"
results = collection.query(query_texts=[query], n_results=3)

model = CrossEncoder('cross-encoder/ms-marco-MiniLM-L-6-v2', max_length=512)

# rerank the results with original query and documents returned from Chroma
scores = model.predict([(query, doc) for doc in results["documents"][0]])

# get the highest scoring document
print(results["documents"][0][np.argmax(scores)])
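
The snippet above prints only the single best document. In practice you often want the whole result list reordered by cross-encoder score. The following is a minimal sketch of that idea, not part of Chroma or sentence-transformers: the `rerank` helper and the `toy_overlap_score` scorer are hypothetical names introduced here for illustration. The toy scorer stands in for a real model so that the example runs without downloading anything.

```python
from typing import Callable, List, Optional, Sequence, Tuple

import numpy as np


def rerank(
    query: str,
    documents: Sequence[str],
    predict: Callable[[List[Tuple[str, str]]], Sequence[float]],
    top_k: Optional[int] = None,
) -> List[Tuple[str, float]]:
    """Return (document, score) pairs sorted from highest to lowest score.

    `predict` is any callable that maps a list of (query, document) pairs
    to one relevance score per pair.
    """
    if not documents:
        return []
    pairs = [(query, doc) for doc in documents]
    scores = np.asarray(predict(pairs), dtype=float)
    # Negate the scores so argsort yields descending order; a stable sort
    # keeps the original (Chroma) order among documents with equal scores.
    order = np.argsort(-scores, kind="stable")
    if top_k is not None:
        order = order[:top_k]
    return [(documents[i], float(scores[i])) for i in order]


def toy_overlap_score(pairs: List[Tuple[str, str]]) -> List[float]:
    # Hypothetical stand-in for a cross-encoder: counts the lowercase,
    # whitespace-separated tokens shared by the query and the document.
    return [
        float(len(set(q.lower().split()) & set(d.lower().split())))
        for q, d in pairs
    ]


docs = ["Hello, world!", "Hello, Chroma!", "Hello, Universe!"]
ranked = rerank("Hello, world!", docs, toy_overlap_score, top_k=2)
for doc, score in ranked:
    print(f"{score:.1f}  {doc}")
```

With the objects from the snippet above, the same helper could be called as `rerank(query, results["documents"][0], model.predict)`, since `model.predict` accepts a list of (query, document) pairs. Note that `results["documents"]` holds one list per query text, so a multi-query call would need one `rerank` call per inner list.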

May 23, 2024