Filters

Chroma provides two types of filters:

  • Metadata - filter documents based on their metadata, using the where clause in Collection.query() or Collection.get().
  • Document - filter documents based on their content, using where_document in Collection.query() or Collection.get().

Those familiar with MongoDB queries will find Chroma’s filters very similar.
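
The examples on this page all pass their filter to Collection.query(), but Collection.get() accepts the same where and where_document arguments and returns the matching records without running a similarity search. A minimal sketch, assuming collection is an existing Chroma collection; the helper name get_by_filter and the values in the usage comment are illustrative and not part of Chroma:

```python
def get_by_filter(collection, field, value, text):
    """Return records whose metadata `field` equals `value` and whose
    document text contains `text`. No query text or embedding is needed."""
    return collection.get(
        where={field: {"$eq": value}},
        where_document={"$contains": text},
    )

# Usage (hypothetical field name and search string):
# records = get_by_filter(collection, "metadata_field", "is_equal_to_this", "search_string")
```

Both filters may be supplied in the same call; either one can also be used alone.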

Metadata Filters

Equality

    results = collection.query(
        query_texts=["This is a query document"],
        n_results=2,
        where={"metadata_field": "is_equal_to_this"}
    )

Alternative syntax:

    results = collection.query(
        query_texts=["This is a query document"],
        n_results=2,
        where={"metadata_field": {"$eq": "is_equal_to_this"}}
    )
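
The two forms above express the same filter: a bare value is shorthand for an $eq expression. The helper below is a hypothetical illustration of that equivalence (expand_equality is not a Chroma function); it rewrites the shorthand into the explicit form and leaves everything else untouched.

```python
def expand_equality(where):
    """Rewrite {"field": value} as {"field": {"$eq": value}}."""
    expanded = {}
    for key, value in where.items():
        if key.startswith("$") or isinstance(value, dict):
            # Logical operators and explicit operator expressions are kept as-is.
            expanded[key] = value
        else:
            expanded[key] = {"$eq": value}
    return expanded
```

Writing the explicit form everywhere can make filters easier to build programmatically, since every condition then has the same shape.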

Inequality

    results = collection.query(
        query_texts=["This is a query document"],
        n_results=2,
        where={"metadata_field": {"$ne": "is_not_equal_to_this"}}
    )

Greater Than

The $gt operator supports only numerical values (int or float).

    results = collection.query(
        query_texts=["This is a query document"],
        n_results=2,
        where={"metadata_field": {"$gt": 5}}
    )

Greater Than or Equal

The $gte operator supports only numerical values (int or float).

    results = collection.query(
        query_texts=["This is a query document"],
        n_results=2,
        where={"metadata_field": {"$gte": 5.1}}
    )

Less Than

The $lt operator supports only numerical values (int or float).

    results = collection.query(
        query_texts=["This is a query document"],
        n_results=2,
        where={"metadata_field": {"$lt": 5}}
    )

Less Than or Equal

The $lte operator supports only numerical values (int or float).

    results = collection.query(
        query_texts=["This is a query document"],
        n_results=2,
        where={"metadata_field": {"$lte": 5.1}}
    )
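
Each comparison operator above tests a single bound. To express a range, two comparisons can be combined with the $and operator described later on this page. A minimal sketch; between is a hypothetical helper, not a Chroma function, and the half-open range (lower bound inclusive, upper bound exclusive) is this sketch's choice:

```python
def between(field, low, high):
    """Build a where filter matching low <= field < high."""
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(low, bool) or isinstance(high, bool):
        raise TypeError("bounds must be int or float, not bool")
    if not isinstance(low, (int, float)) or not isinstance(high, (int, float)):
        raise TypeError("bounds must be int or float")
    return {"$and": [{field: {"$gte": low}}, {field: {"$lt": high}}]}

# Usage: pass where=between("metadata_field", 5, 10) to query() or get().
```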

In

$in works on all data types: string, int, float, and bool.

The $in operator takes a list of values, all of which must be of the same type.

    results = collection.query(
        query_texts=["This is a query document"],
        n_results=2,
        where={"metadata_field": {"$in": ["value1", "value2"]}}
    )

Not In

$nin works on all data types: string, int, float, and bool.

The $nin operator takes a list of values, all of which must be of the same type.

    results = collection.query(
        query_texts=["This is a query document"],
        n_results=2,
        where={"metadata_field": {"$nin": ["value1", "value2"]}}
    )
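
Because $in and $nin require every list element to have the same type, it can help to check the list before sending the filter. The helper below is a hypothetical sketch (membership_filter is not part of Chroma); rejecting an empty list is an assumption of this sketch rather than something stated on this page.

```python
def membership_filter(field, values, negate=False):
    """Build a $in filter for `field`, or a $nin filter when negate=True."""
    values = list(values)
    if not values:
        raise ValueError("values must not be empty")
    first_type = type(values[0])
    if first_type not in (str, int, float, bool):
        raise TypeError("values must be str, int, float, or bool")
    # type() is compared directly so that True is not accepted in a list of ints.
    if any(type(v) is not first_type for v in values):
        raise TypeError("all values must have the same type")
    return {field: {"$nin" if negate else "$in": values}}

# Usage: where=membership_filter("metadata_field", ["value1", "value2"])
```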

Logical Operator: And

    results = collection.query(
        query_texts=["This is a query document"],
        n_results=2,
        where={"$and": [{"metadata_field1": "value1"}, {"metadata_field2": "value2"}]}
    )

Logical operators can be nested:

    results = collection.query(
        query_texts=["This is a query document"],
        n_results=2,
        where={
            "$and": [
                {"metadata_field1": "value1"},
                {"$or": [{"metadata_field2": "value2"}, {"metadata_field3": "value3"}]}
            ]
        }
    )

Logical Operator: Or

    results = collection.query(
        query_texts=["This is a query document"],
        n_results=2,
        where={"$or": [{"metadata_field1": "value1"}, {"metadata_field2": "value2"}]}
    )
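
To make the semantics of the metadata operators concrete, the following pure-Python sketch evaluates a where filter against a single metadata dict. It is an illustration only, not Chroma's implementation. The name matches_where is hypothetical, and the sketch makes two assumptions of its own: each filter level holds exactly one key, and a record that lacks the referenced field does not match.

```python
import operator

# Map each comparison operator to a Python predicate.
_COMPARATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}

def matches_where(metadata, where):
    """Return True if `metadata` satisfies the `where` filter."""
    (key, condition), = where.items()  # assumes exactly one key per level
    if key == "$and":
        return all(matches_where(metadata, sub) for sub in condition)
    if key == "$or":
        return any(matches_where(metadata, sub) for sub in condition)
    if key not in metadata:
        return False  # assumption of this sketch: a missing field never matches
    if not isinstance(condition, dict):
        condition = {"$eq": condition}  # a bare value is shorthand for $eq
    (op, operand), = condition.items()
    return _COMPARATORS[op](metadata[key], operand)

record = {"metadata_field1": "value1", "metadata_field3": "value3"}
nested = {"$and": [{"metadata_field1": "value1"},
                   {"$or": [{"metadata_field2": "value2"},
                            {"metadata_field3": "value3"}]}]}
print(matches_where(record, nested))  # True
```

The record passes the nested filter because the first $and branch matches and the second $or alternative (metadata_field3) matches.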

Document Filters

Contains

    results = collection.query(
        query_texts=["This is a query document"],
        n_results=2,
        where_document={"$contains": "search_string"}
    )

Not Contains

    results = collection.query(
        query_texts=["This is a query document"],
        n_results=2,
        where_document={"$not_contains": "search_string"}
    )

Logical Operator: And

    results = collection.query(
        query_texts=["This is a query document"],
        n_results=2,
        where_document={"$and": [{"$contains": "search_string1"}, {"$contains": "search_string2"}]}
    )

Logical operators can be nested:

    results = collection.query(
        query_texts=["This is a query document"],
        n_results=2,
        where_document={
            "$and": [
                {"$contains": "search_string1"},
                {"$or": [{"$not_contains": "search_string2"}, {"$not_contains": "search_string3"}]}
            ]
        }
    )

Logical Operator: Or

    results = collection.query(
        query_texts=["This is a query document"],
        n_results=2,
        where_document={"$or": [{"$not_contains": "search_string1"}, {"$not_contains": "search_string2"}]}
    )
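
The document operators can be read the same way as the metadata operators. The pure-Python sketch below evaluates a where_document filter against one document string. It is an illustration and not Chroma's implementation: matches_where_document is a hypothetical name, and plain case-sensitive substring matching is assumed.

```python
def matches_where_document(document, where_document):
    """Return True if `document` satisfies the `where_document` filter."""
    (op, operand), = where_document.items()  # assumes exactly one key per level
    if op == "$contains":
        return operand in document
    if op == "$not_contains":
        return operand not in document
    if op == "$and":
        return all(matches_where_document(document, sub) for sub in operand)
    if op == "$or":
        return any(matches_where_document(document, sub) for sub in operand)
    raise ValueError(f"unsupported operator: {op}")

text = "search_string1 appears in this document"
nested = {"$and": [{"$contains": "search_string1"},
                   {"$or": [{"$not_contains": "search_string2"},
                            {"$not_contains": "search_string3"}]}]}
print(matches_where_document(text, nested))  # True
```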

July 1, 2024