Graph

Enable graph storage via the graph parameter. This component requires the graph extras package.

When enabled, a graph network is built from the embeddings index. Graph nodes are synced with each embeddings index operation (index/upsert/delete). Graph edges are then created by querying the embeddings index once each of these calls completes.
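As a rough illustration, a configuration that enables the graph component might look like the dictionary below. Only the `graph` key comes from this section; the `path` and `content` entries and the model name are assumptions about the surrounding embeddings configuration.

```python
# Sketch of an embeddings configuration with graph storage enabled.
# Only the "graph" key is described in this section; "path" and
# "content" are assumed settings for the surrounding embeddings index.
config = {
    "path": "sentence-transformers/all-MiniLM-L6-v2",  # assumed model name
    "content": True,                                   # assumed setting
    "graph": {
        "backend": "networkx",  # documented default backend
    },
}

# With the graph extras installed, a dictionary like this would
# typically be handed to the embeddings constructor (assumption):
#   from txtai import Embeddings
#   embeddings = Embeddings(**config)
print(config["graph"])
```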

backend

  backend: networkx|rdbms|custom

Sets the graph backend. Defaults to networkx.

Add a custom graph storage engine by setting this parameter to the engine's fully resolvable class string.
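To make "fully resolvable class string" concrete, the sketch below loads a class from a dotted `package.module.Class` path. The `resolve` helper is hypothetical and is not the library's own loader; a standard library class stands in for a custom graph backend.

```python
import importlib


def resolve(path):
    """Hypothetical helper: load a class from a dotted path like 'package.module.Class'."""
    # Split "package.module.Class" into module path and class name
    module, _, name = path.rpartition(".")
    return getattr(importlib.import_module(module), name)


# A standard library class used as a stand-in for a custom graph backend
cls = resolve("collections.OrderedDict")
print(cls.__name__)
```

A string is resolvable in this sense when its module can be imported from the running environment and the class name exists in that module.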

The rdbms backend has the following additional settings.

rdbms

  url: database url connection string, alternatively can be set via the GRAPH_URL environment variable
  nodes: table to store node data, defaults to `nodes`
  edges: table to store edge data, defaults to `edges`
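The settings above can be sketched as a small lookup with the documented defaults. The `rdbms_settings` helper is hypothetical, the SQLite URL is an assumed example value, and giving the configured `url` precedence over `GRAPH_URL` is an assumption rather than something this section states.

```python
import os


def rdbms_settings(graph):
    """Hypothetical helper: apply the documented rdbms defaults to a graph config."""
    return {
        # Assumption: an explicit url wins, GRAPH_URL is the fallback
        "url": graph.get("url", os.environ.get("GRAPH_URL")),
        "nodes": graph.get("nodes", "nodes"),  # default node table
        "edges": graph.get("edges", "edges"),  # default edge table
    }


# Assumed example URL; any database connection string would go here
graph = {"backend": "rdbms", "url": "sqlite:///graph.db"}
print(rdbms_settings(graph))
```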

batchsize

  batchsize: int

Batch query size, used to query embeddings index - defaults to 256.

limit

  limit: int

Maximum number of results to return per embeddings query - defaults to 15.

minscore

  minscore: float

Minimum score required to consider embeddings query matches - defaults to 0.1.

approximate

  approximate: boolean

When true, queries only run for nodes without edges - defaults to true.
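The toy function below shows one way `batchsize`, `limit`, `minscore` and `approximate` could interact when edges are built. It is not the library's implementation: `build_edges` and `search` are hypothetical names, and treating `minscore` as an inclusive threshold is an assumption.

```python
def build_edges(nodes, search, existing, batchsize=256, limit=15,
                minscore=0.1, approximate=True):
    """
    Toy edge builder. `search(batch, limit)` is a hypothetical function that
    returns, for each node id in `batch`, a list of (id, score) matches.
    `existing` maps a node id to the set of neighbors it already has.
    """
    # approximate=True: only query nodes that have no edges yet
    pending = [n for n in nodes if not (approximate and existing.get(n))]

    edges = []
    # Query the index in batches of `batchsize` nodes
    for start in range(0, len(pending), batchsize):
        batch = pending[start:start + batchsize]
        for node, matches in zip(batch, search(batch, limit)):
            # Keep at most `limit` matches, skipping self matches and low scores
            for match, score in matches[:limit]:
                if match != node and score >= minscore:
                    edges.append((node, match, score))
    return edges


# Made-up similarity results standing in for an embeddings index
scores = {
    "a": [("a", 1.0), ("b", 0.8), ("c", 0.05)],
    "b": [("b", 1.0), ("a", 0.8)],
    "c": [("c", 1.0), ("a", 0.05)],
}


def search(batch, limit):
    return [scores[n][:limit] for n in batch]


existing = {"b": {"a"}}  # node "b" already has an edge
print(build_edges(["a", "b", "c"], search, existing, batchsize=2))
```

In this example node `b` is skipped because it already has an edge, and the matches scored 0.05 fall below `minscore`, so only the edge from `a` to `b` is produced.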

topics

  topics:
    algorithm: community detection algorithm (string), options are louvain (default), greedy, lpa
    level: controls number of topics (string), options are best (default) or first
    resolution: controls number of topics (int), larger values create more topics, defaults to 100
    labels: scoring index method used to build topic labels (string), options are bm25 (default), tfidf, sif
    terms: number of frequent terms to use for topic labels (int), defaults to 4
    stopwords: optional list of stop words to exclude from topic labels
    categories: optional list of categories used to group topics, allows granular topics with broad categories grouping topics

Enables topic modeling. Defaults are tuned so that in most cases these values don’t need to be changed (except for categories). These parameters are available for advanced use cases where one wants full control over the community detection process.
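A sketch of a topics configuration, written as a Python dictionary, could look like this. The values are the documented defaults; the stop words and category names are made-up examples, and nesting `topics` under the `graph` key is an assumption based on this section's layout.

```python
# Topic modeling settings with the documented defaults spelled out
topics = {
    "algorithm": "louvain",
    "level": "best",
    "resolution": 100,
    "labels": "bm25",
    "terms": 4,
    "stopwords": ["the", "and"],                        # assumed example values
    "categories": ["health", "technology", "finance"],  # assumed example values
}

# Assumed nesting: topics sits inside the graph configuration
config = {"graph": {"topics": topics}}

# Since the defaults rarely need changing, a typical configuration
# might only set categories
minimal = {"graph": {"topics": {"categories": topics["categories"]}}}
print(minimal)
```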