Filters
Chroma provides two types of filters:
- Metadata - filter documents based on metadata using
where
clause in either Collection.query()
or Collection.get()
- Document - filter documents based on document content using
where_document
in Collection.query()
or Collection.get()
.
Those familiar with MongoDB queries will find Chroma’s filters very similar.
Equality
results = collection.query(
query_texts=["This is a query document"],
n_results=2,
where={"metadata_field": "is_equal_to_this"}
)
Alternative syntax:
results = collection.query(
query_texts=["This is a query document"],
n_results=2,
where={"metadata_field": {"$eq": "is_equal_to_this"}}
)
Inequality
results = collection.query(
query_texts=["This is a query document"],
n_results=2,
where={"metadata_field": {"$ne": "is_not_equal_to_this"}}
)
Greater Than
Greater Than
The $gt
operator is only supported for numerical values - int or float values.
results = collection.query(
query_texts=["This is a query document"],
n_results=2,
where={"metadata_field": {"$gt": 5}}
)
Greater Than or Equal
Greater Than or Equal
The $gte
operator is only supported for numerical values - int or float values.
results = collection.query(
query_texts=["This is a query document"],
n_results=2,
where={"metadata_field": {"$gte": 5.1}}
)
Less Than
Less Than
The $lt
operator is only supported for numerical values - int or float values.
results = collection.query(
query_texts=["This is a query document"],
n_results=2,
where={"metadata_field": {"$lt": 5}}
)
Less Than or Equal
Less Than or Equal
The $lte
operator is only supported for numerical values - int or float values.
results = collection.query(
query_texts=["This is a query document"],
n_results=2,
where={"metadata_field": {"$lte": 5.1}}
)
In
In works on all data types - string, int, float, and bool.
In
The $in
operator is only supported for list of values of the same type.
results = collection.query(
query_texts=["This is a query document"],
n_results=2,
where={"metadata_field": {"$in": ["value1", "value2"]}}
)
Not In
Not In works on all data types - string, int, float, and bool.
Not In
The $nin
operator is only supported for list of values of the same type.
results = collection.query(
query_texts=["This is a query document"],
n_results=2,
where={"metadata_field": {"$nin": ["value1", "value2"]}}
)
Logical Operator: And
results = collection.query(
query_texts=["This is a query document"],
n_results=2,
where={"$and": [{"metadata_field1": "value1"}, {"metadata_field2": "value2"}]}
)
Logical Operators can be nested.
results = collection.query(
query_texts=["This is a query document"],
n_results=2,
where={"$and": [{"metadata_field1": "value1"}, {"$or": [{"metadata_field2": "value2"}, {"metadata_field3": "value3"}]}]}
)
Logical Operator: Or
results = collection.query(
query_texts=["This is a query document"],
n_results=2,
where={"$or": [{"metadata_field1": "value1"}, {"metadata_field2": "value2"}]}
)
Document Filters
Contains
results = collection.query(
query_texts=["This is a query document"],
n_results=2,
where_document={"$contains": "search_string"}
)
Not Contains
results = collection.query(
query_texts=["This is a query document"],
n_results=2,
where_document={"$not_contains": "search_string"}
)
Logical Operator: And
results = collection.query(
query_texts=["This is a query document"],
n_results=2,
where_document={"$and": [{"$contains": "search_string1"}, {"$contains": "search_string2"}]}
)
Logical Operators can be nested.
results = collection.query(
query_texts=["This is a query document"],
n_results=2,
where_document={"$and": [{"$contains": "search_string1"}, {"$or": [{"$not_contains": "search_string2"}, {"$not_contains": "search_string3"}]}]}
)
Logical Operator: Or
results = collection.query(
query_texts=["This is a query document"],
n_results=2,
where_document={"$or": [{"$not_contains": "search_string1"}, {"$not_contains": "search_string2"}]}
)
July 1, 2024