Naive Multi-tenancy Strategies

Single-node Chroma

The strategies below apply to single-node Chroma only. They require your app to act as both the PEP (Policy Enforcement Point) and the PDP (Policy Decision Point) for authorization. This is a naive approach to multi-tenancy and is probably not suited for production environments; however, it is a simple way to get started with multi-tenancy in Chroma.

Authorization

We are in the process of creating a series of articles on how to implement proper authorization in Chroma, leveraging an external service and Chroma's auth plugins. The first article in the series is available on Medium and will also be made available here soon.

Introduction

There are several multi-tenancy strategies available to users of Chroma. The right strategy depends on the needs of the user and the application. The strategies below apply to multi-user environments, but do not factor in partially shared resources such as groups or teams.

  • User-Per-Doc: In this scenario, the app maintains one or more collections, and each document in a collection is associated with a single user.
  • User-Per-Collection: In this scenario, the app maintains multiple collections and each collection is associated with a single user.
  • User-Per-Database: In this scenario, the app maintains multiple databases within a single tenant, and each database is associated with a single user.
  • User-Per-Tenant: In this scenario, the app maintains multiple tenants and each tenant is associated with a single user.
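The four strategies differ mainly in which Chroma identifier carries the user ID: a metadata field, a collection name, a database name, or a tenant name. The hypothetical helper below maps a user to that scope for each strategy. The naming patterns are assumptions chosen for illustration, and "default_tenant" and "default_database" are the values of Chroma's DEFAULT_TENANT and DEFAULT_DATABASE constants.

```python
def scope_for_user(strategy: str, user_id: str) -> dict:
    """Return where a user's data lives under each strategy (illustrative names)."""
    if strategy == "user-per-doc":
        # Shared collection; isolation depends on filtering every call by metadata.
        return {"collection": "my-collection", "where": {"user_id": user_id}}
    if strategy == "user-per-collection":
        # One collection per user inside the default tenant and database.
        return {"collection": f"user-collection-{user_id}"}
    if strategy == "user-per-database":
        # One database per user inside the default tenant.
        return {"tenant": "default_tenant", "database": f"db:{user_id}"}
    if strategy == "user-per-tenant":
        # One tenant per user, each with its own default database.
        return {"tenant": f"tenant_user:{user_id}", "database": "default_database"}
    raise ValueError(f"unknown strategy: {strategy!r}")
```

The further down the list, the coarser the isolation boundary: per-doc relies on every query carrying the right filter, while per-tenant relies only on opening the client with the right tenant.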

User-Per-Doc

The goal of this strategy is to grant users permission to access individual documents.

[Figure: User-Per-Doc strategy]

To implement this strategy, you need to add some form of user identification to each document that belongs to a user. In this example, we store it in a user_id metadata field.

    import chromadb

    client = chromadb.PersistentClient()
    collection = client.get_or_create_collection("my-collection")
    collection.add(
        documents=["This is document1", "This is document2"],
        # Each document records the ID of the user who owns it.
        metadatas=[{"user_id": "user1"}, {"user_id": "user2"}],
        ids=["doc1", "doc2"],
    )

At query time you will have to provide the user_id as a filter to your query like so:

    results = collection.query(
        query_texts=["This is a query document"],
        # where takes a dict, not a list: only user1's documents are searched.
        where={"user_id": "user1"},
    )

To implement this strategy successfully, your code must consistently add and filter on the user_id metadata to keep each user's data separate.
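One way to make that consistency hard to break is to route every add and query through two small helpers, so no call site builds user filters by hand. This is a sketch: `with_user_id` and `user_where` are hypothetical names, and the combined filter relies on Chroma's `$and` operator in `where`.

```python
from typing import Any, Dict, List, Optional

Where = Dict[str, Any]


def with_user_id(metadatas: List[Dict[str, Any]], user_id: str) -> List[Dict[str, Any]]:
    """Return copies of the metadatas with user_id set, overriding any caller value."""
    return [{**metadata, "user_id": user_id} for metadata in metadatas]


def user_where(user_id: str, where: Optional[Where] = None) -> Where:
    """Always restrict a query to user_id, AND-ing in an optional extra filter."""
    user_filter: Where = {"user_id": user_id}
    if not where:
        return user_filter
    return {"$and": [user_filter, where]}
```

Usage would look like `collection.add(..., metadatas=with_user_id(metadatas, user_id))` and `collection.query(..., where=user_where(user_id, {"category": "news"}))`. Because the user filter is always AND-ed in, a caller-supplied `{"user_id": "user2"}` can only narrow the results, never widen them. With no extra filter the helper returns a plain equality filter, which avoids a single-element `$and` list that Chroma may reject.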

Drawbacks:

  • Error-prone: A missing or incorrect filter can leak data across users.
  • Scalability: As the number of users and documents grows, filtering on metadata can become slow.

User-Per-Collection

The goal of this strategy is to grant a user access to all documents in a collection.

[Figure: User-Per-Collection strategy]

To implement this strategy, you need to create a collection for each user. In this example, the collection name is derived from the user_id.

    import chromadb

    client = chromadb.PersistentClient()
    user_id = "user1"
    # One collection per user. ":" is not allowed in collection names, so use "-".
    collection = client.get_or_create_collection(f"user-collection-{user_id}")
    collection.add(
        documents=["This is document1", "This is document2"],
        ids=["doc1", "doc2"],
    )

At query time, you fetch the user's own collection by name and query it; no metadata filter is needed:

    user_id = "user1"
    # Only this user's collection is searched.
    user_collection = client.get_collection(f"user-collection-{user_id}")
    results = user_collection.query(
        query_texts=["This is a query document"],
    )

To successfully implement this strategy your code needs to consistently create and query the correct collection for the user.

Drawbacks:

  • Error-prone: Using the wrong collection name can leak data across users.
  • Shared document search: If some documents should be shared by all users, you have to keep them in a separate collection and let users query that shared collection as well.
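Searching a shared collection alongside the user's own means running two queries and merging the hits. The sketch below merges the results for a single query text by ascending distance, assuming both collections use the same embedding function and distance metric (otherwise the distances are not comparable). It expects the dictionaries returned by `collection.query(..., include=["documents", "distances"])`; `merge_query_results` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple

Hit = Tuple[str, str, float, str]  # (id, document, distance, source)


def merge_query_results(
    user_res: Dict[str, Any], shared_res: Dict[str, Any], n_results: int
) -> List[Hit]:
    """Merge hits for the first query text from two collections, closest first."""
    hits: List[Hit] = []
    for res, source in ((user_res, "user"), (shared_res, "shared")):
        # Chroma returns one inner list per query text; take the first.
        ids, docs, dists = res["ids"][0], res["documents"][0], res["distances"][0]
        # Tag each hit with its source, since IDs may repeat across collections.
        hits.extend((i, d, dist, source) for i, d, dist in zip(ids, docs, dists))
    hits.sort(key=lambda hit: hit[2])
    return hits[:n_results]
```

Keeping the `source` tag also lets the app decide, for example, whether a shared hit may be shown to every user.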

User-Per-Database

The goal of this strategy is to associate a user with a single database, thus granting them access to all collections and documents within that database.

[Figure: User-Per-Database strategy]

    import chromadb
    from chromadb import DEFAULT_TENANT
    from chromadb import Settings

    adminClient = chromadb.AdminClient(Settings(
        is_persistent=True,
        persist_directory="multitenant",
    ))

    # For a remote Chroma server:
    #
    # adminClient = chromadb.AdminClient(Settings(
    #     chroma_api_impl="chromadb.api.fastapi.FastAPI",
    #     chroma_server_host="localhost",
    #     chroma_server_http_port="8000",
    # ))

    def get_or_create_db_for_user(user_id):
        database = f"db:{user_id}"
        try:
            adminClient.get_database(database, DEFAULT_TENANT)
        except Exception:
            # The database does not exist yet, so create it in the default tenant.
            adminClient.create_database(database, DEFAULT_TENANT)
        return DEFAULT_TENANT, database

    user_id = "user_John"
    tenant, database = get_or_create_db_for_user(user_id)
    # Replace with chromadb.HttpClient for a remote Chroma server.
    client = chromadb.PersistentClient(path="multitenant", tenant=tenant, database=database)
    collection = client.get_or_create_collection("user_collection")
    collection.add(
        documents=["This is document1", "This is document2"],
        ids=["doc1", "doc2"],
    )

In the above code, we do the following:

  • We get or create a database for each user in DEFAULT_TENANT using chromadb.AdminClient.
  • We create a PersistentClient for the user, scoped to the tenant and database returned by get_or_create_db_for_user.
  • We get or create a collection and add data to it.

Drawbacks:

  • This strategy requires consistent management of tenants and databases and their use in the client application.
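One way to keep tenant and database handling consistent is to resolve them in exactly one place. The hypothetical `UserClientRegistry` below takes a scope resolver (such as `get_or_create_db_for_user` from the example above) and a client factory, and caches one client per user so the get-or-create round trip happens once per user per process. It is a sketch and is not thread-safe as written.

```python
from typing import Callable, Dict, Generic, Tuple, TypeVar

C = TypeVar("C")


class UserClientRegistry(Generic[C]):
    """Resolve each user's (tenant, database) in one place; cache one client per user.

    Not thread-safe as written; guard client_for with a lock if needed.
    """

    def __init__(
        self,
        resolve_scope: Callable[[str], Tuple[str, str]],
        make_client: Callable[[str, str], C],
    ) -> None:
        self._resolve_scope = resolve_scope  # e.g. get_or_create_db_for_user
        self._make_client = make_client      # e.g. builds a PersistentClient
        self._clients: Dict[str, C] = {}

    def client_for(self, user_id: str) -> C:
        """Return the cached client for user_id, creating it on first use."""
        if user_id not in self._clients:
            tenant, database = self._resolve_scope(user_id)
            self._clients[user_id] = self._make_client(tenant, database)
        return self._clients[user_id]
```

With the code above it could be wired up as `UserClientRegistry(get_or_create_db_for_user, lambda t, d: chromadb.PersistentClient(path="multitenant", tenant=t, database=d))`, after which request handlers only ever call `registry.client_for(user_id)`.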

User-Per-Tenant

The goal of this strategy is to associate a user with a single tenant, thus granting them access to all databases, collections, and documents within that tenant.

[Figure: User-Per-Tenant strategy]

    import chromadb
    from chromadb import DEFAULT_DATABASE
    from chromadb import Settings

    adminClient = chromadb.AdminClient(Settings(
        chroma_api_impl="chromadb.api.segment.SegmentAPI",
        is_persistent=True,
        persist_directory="multitenant",
    ))

    # For a remote Chroma server:
    #
    # adminClient = chromadb.AdminClient(Settings(
    #     chroma_api_impl="chromadb.api.fastapi.FastAPI",
    #     chroma_server_host="localhost",
    #     chroma_server_http_port="8000",
    # ))

    def get_or_create_tenant_for_user(user_id):
        tenant_id = f"tenant_user:{user_id}"
        try:
            adminClient.get_tenant(tenant_id)
        except Exception:
            # The tenant does not exist yet: create it along with a default database.
            adminClient.create_tenant(tenant_id)
            adminClient.create_database(DEFAULT_DATABASE, tenant_id)
        return tenant_id, DEFAULT_DATABASE

    user_id = "user1"
    tenant, database = get_or_create_tenant_for_user(user_id)
    # Replace with chromadb.HttpClient for a remote Chroma server.
    client = chromadb.PersistentClient(path="multitenant", tenant=tenant, database=database)
    collection = client.get_or_create_collection("user_collection")
    collection.add(
        documents=["This is document1", "This is document2"],
        ids=["doc1", "doc2"],
    )

In the above code, we do the following:

  • We get or create a tenant for each user, containing DEFAULT_DATABASE, using chromadb.AdminClient.
  • We create a PersistentClient for the user, scoped to the tenant and database returned by get_or_create_tenant_for_user.
  • We get or create a collection and add data to it.

Drawbacks:

  • This strategy requires consistent management of tenants and databases and their use in the client application.

April 29, 2024