Time-based Queries

Filtering Documents By Timestamps

In the example below, we create a collection of 100 documents, each with a random timestamp from the last two weeks stored as integer epoch seconds in its metadata. We then query the collection for documents whose timestamp falls within the last week.

The example demonstrates how Chroma metadata can be leveraged to filter documents based on how recently they were added or updated.

    import datetime
    import random
    import uuid

    import chromadb

    now = datetime.datetime.now()
    two_weeks_ago = now - datetime.timedelta(days=14)

    # 100 random dates within the last two weeks, stored as integer epoch seconds
    dates = [
        two_weeks_ago + datetime.timedelta(days=random.randint(0, 14))
        for _ in range(100)
    ]
    dates = [int(date.timestamp()) for date in dates]


    def iso_date(epoch_seconds):
        """Convert epoch seconds to an ISO 8601 string."""
        return datetime.datetime.fromtimestamp(epoch_seconds).isoformat()


    client = chromadb.EphemeralClient()
    col = client.get_or_create_collection("test")
    col.add(
        ids=[str(uuid.uuid4()) for _ in range(100)],
        documents=[f"document {i}" for i in range(100)],
        metadatas=[{"date": date} for date in dates],
    )

    # Cast the cutoff to int so it has the same type as the stored metadata values
    one_week_ago = int((now - datetime.timedelta(days=7)).timestamp())
    res = col.get(where={"date": {"$gt": one_week_ago}})
    for metadata in res["metadatas"]:
        print(iso_date(metadata["date"]))
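
The same approach can be extended to a bounded time window, for example documents dated between two weeks and one week ago. The sketch below only builds the filter dictionary and uses nothing beyond the standard library. The helper names `to_epoch_seconds` and `time_window_filter` are hypothetical and not part of Chroma. It assumes timestamps are stored as integer epoch seconds under the `date` key, as in the example above, and that the filter is combined with Chroma's `$and`, `$gte`, and `$lt` operators.

```python
import datetime


def to_epoch_seconds(moment: datetime.datetime) -> int:
    """Convert a datetime to whole epoch seconds (an int, like the stored metadata)."""
    return int(moment.timestamp())


def time_window_filter(
    field: str, start: datetime.datetime, end: datetime.datetime
) -> dict:
    """Build a where filter matching start <= field < end (hypothetical helper)."""
    if start >= end:
        raise ValueError("start must be earlier than end")
    return {
        "$and": [
            {field: {"$gte": to_epoch_seconds(start)}},
            {field: {"$lt": to_epoch_seconds(end)}},
        ]
    }


now = datetime.datetime.now()
window = time_window_filter(
    "date",
    now - datetime.timedelta(days=14),
    now - datetime.timedelta(days=7),
)
print(window)
```

With the collection from the example above, the resulting dictionary would be passed as `col.get(where=window)`. Using a half-open interval (`$gte` on the start, `$lt` on the end) is a design choice that keeps adjacent windows from counting the same document twice.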

Ref: https://gist.github.com/tazarov/3c9301d22ab863dca0b6fb1e5e3511b1

November 29, 2023