Methods

API

Bases: Application

Base API template. The API is an extended txtai application, adding the ability to cluster API instances together.

Downstream applications can extend this base template to add/modify functionality.

Source code in txtai/api/base.py

```python
class API(Application):
    """
    Base API template. The API is an extended txtai application, adding the ability to cluster API instances together.

    Downstream applications can extend this base template to add/modify functionality.
    """

    def __init__(self, config, loaddata=True):
        super().__init__(config, loaddata)

        # Embeddings cluster
        self.cluster = None
        if self.config.get("cluster"):
            self.cluster = Cluster(self.config["cluster"])

    # pylint: disable=W0221
    def search(self, query, limit=None, weights=None, index=None, parameters=None, graph=False, request=None):
        # When search is invoked via the API, limit is set from the request
        # When search is invoked directly, limit is set using the method parameter
        limit = self.limit(request.query_params.get("limit") if request and hasattr(request, "query_params") else limit)
        weights = self.weights(request.query_params.get("weights") if request and hasattr(request, "query_params") else weights)
        index = request.query_params.get("index") if request and hasattr(request, "query_params") else index
        parameters = request.query_params.get("parameters") if request and hasattr(request, "query_params") else parameters
        graph = request.query_params.get("graph") if request and hasattr(request, "query_params") else graph

        # Decode parameters
        parameters = json.loads(parameters) if parameters and isinstance(parameters, str) else parameters

        if self.cluster:
            return self.cluster.search(query, limit, weights, index, parameters, graph)

        return super().search(query, limit, weights, index, parameters, graph)

    def batchsearch(self, queries, limit=None, weights=None, index=None, parameters=None, graph=False):
        if self.cluster:
            return self.cluster.batchsearch(queries, self.limit(limit), weights, index, parameters, graph)

        return super().batchsearch(queries, limit, weights, index, parameters, graph)

    def add(self, documents):
        """
        Adds a batch of documents for indexing.

        Downstream applications can override this method to also store full documents in an external system.

        Args:
            documents: list of {id: value, text: value}

        Returns:
            unmodified input documents
        """

        if self.cluster:
            self.cluster.add(documents)
        else:
            super().add(documents)

        return documents

    def index(self):
        """
        Builds an embeddings index for previously batched documents.
        """

        if self.cluster:
            self.cluster.index()
        else:
            super().index()

    def upsert(self):
        """
        Runs an embeddings upsert operation for previously batched documents.
        """

        if self.cluster:
            self.cluster.upsert()
        else:
            super().upsert()

    def delete(self, ids):
        """
        Deletes from an embeddings index. Returns list of ids deleted.

        Args:
            ids: list of ids to delete

        Returns:
            ids deleted
        """

        if self.cluster:
            return self.cluster.delete(ids)

        return super().delete(ids)

    def reindex(self, config, function=None):
        """
        Recreates this embeddings index using config. This method only works if document content storage is enabled.

        Args:
            config: new config
            function: optional function to prepare content for indexing
        """

        if self.cluster:
            self.cluster.reindex(config, function)
        else:
            super().reindex(config, function)

    def count(self):
        """
        Total number of elements in this embeddings index.

        Returns:
            number of elements in embeddings index
        """

        if self.cluster:
            return self.cluster.count()

        return super().count()

    def limit(self, limit):
        """
        Parses the number of results to return from the request. Allows a range of 1-250, with a default of 10.

        Args:
            limit: limit parameter

        Returns:
            bounded limit
        """

        # Return between 1 and 250 results, defaults to 10
        return max(1, min(250, int(limit) if limit else 10))

    def weights(self, weights):
        """
        Parses the weights parameter from the request.

        Args:
            weights: weights parameter

        Returns:
            weights
        """

        return float(weights) if weights else weights
```
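The `limit` helper clamps any requested result count into the 1-250 range, defaulting to 10 when no value is supplied. A minimal standalone sketch of that bounding logic (the function name is illustrative, not part of txtai):

```python
def bound_limit(limit, floor=1, ceiling=250, default=10):
    """Clamp a requested result count, mirroring the API.limit logic."""
    # Falsy values (None, 0, "") fall back to the default, then clamp into range
    return max(floor, min(ceiling, int(limit) if limit else default))

print(bound_limit(None))   # 10
print(bound_limit("500"))  # 250
print(bound_limit("25"))   # 25
```

Note that string values from query parameters pass through `int()`, so non-numeric input raises `ValueError` here just as it would in the API.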

add(documents)

Adds a batch of documents for indexing.

Downstream applications can override this method to also store full documents in an external system.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| documents | list of {id: value, text: value} | required |

Returns:

unmodified input documents

Source code in txtai/api/base.py

```python
def add(self, documents):
    """
    Adds a batch of documents for indexing.

    Downstream applications can override this method to also store full documents in an external system.

    Args:
        documents: list of {id: value, text: value}

    Returns:
        unmodified input documents
    """

    if self.cluster:
        self.cluster.add(documents)
    else:
        super().add(documents)

    return documents
```

addobject(data, uid, field)

Helper method that builds a batch of object documents.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| data | object content | required |
| uid | optional list of corresponding uids | required |
| field | optional field to set | required |

Returns:

documents

Source code in txtai/app/base.py

```python
def addobject(self, data, uid, field):
    """
    Helper method that builds a batch of object documents.

    Args:
        data: object content
        uid: optional list of corresponding uids
        field: optional field to set

    Returns:
        documents
    """

    # Raise error if index is not writable
    if not self.config.get("writable"):
        raise ReadOnlyError("Attempting to add documents to a read-only index (writable != True)")

    documents = []
    for x, content in enumerate(data):
        if field:
            row = {"id": uid[x], field: content} if uid else {field: content}
        elif uid:
            row = (uid[x], content)
        else:
            row = content

        documents.append(row)

    return self.add(documents)
```
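The row-building step above produces a different shape depending on which of `uid` and `field` are set. The following standalone sketch (hypothetical function name, no txtai dependency) isolates that branching:

```python
def build_rows(data, uid=None, field=None):
    """Mirror addobject's row construction for each uid/field combination."""
    rows = []
    for x, content in enumerate(data):
        if field:
            # Dict rows when a target field is set, with an id when uids are given
            row = {"id": uid[x], field: content} if uid else {field: content}
        elif uid:
            # (id, content) tuples when only uids are given
            row = (uid[x], content)
        else:
            # Raw content otherwise
            row = content
        rows.append(row)
    return rows

print(build_rows([b"image-bytes"], uid=["doc1"], field="object"))
```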

batchexplain(queries, texts=None, limit=10)

Explains the importance of each input token in text for a list of queries.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| queries | list of queries | required |
| texts | optional list of text, otherwise runs search queries | None |
| limit | optional limit if texts is None | 10 |

Returns:

list of dict per input text per query where a higher token score represents higher importance relative to the query

Source code in txtai/app/base.py

```python
def batchexplain(self, queries, texts=None, limit=10):
    """
    Explains the importance of each input token in text for a list of queries.

    Args:
        queries: list of queries
        texts: optional list of text, otherwise runs search queries
        limit: optional limit if texts is None

    Returns:
        list of dict per input text per query where a higher token score represents higher importance relative to the query
    """

    if self.embeddings:
        with self.lock:
            return self.embeddings.batchexplain(queries, texts, limit)

    return None
```

batchsimilarity(queries, texts)

Computes the similarity between list of queries and list of text. Returns a list of {id: value, score: value} sorted by highest score per query, where id is the index in texts.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| queries | queries text | required |
| texts | list of text | required |

Returns:

list of {id: value, score: value} per query

Source code in txtai/app/base.py

```python
def batchsimilarity(self, queries, texts):
    """
    Computes the similarity between list of queries and list of text. Returns a list
    of {id: value, score: value} sorted by highest score per query, where id is the
    index in texts.

    Args:
        queries: queries text
        texts: list of text

    Returns:
        list of {id: value, score: value} per query
    """

    # Use similarity instance if available otherwise fall back to embeddings model
    if "similarity" in self.pipelines:
        return [[{"id": uid, "score": float(score)} for uid, score in r] for r in self.pipelines["similarity"](queries, texts)]
    if self.embeddings:
        return [[{"id": uid, "score": float(score)} for uid, score in r] for r in self.embeddings.batchsimilarity(queries, texts)]

    return None
```
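`batchsimilarity` wraps raw per-query `(id, score)` pairs into JSON-friendly dicts. A self-contained sketch of that formatting step (the function name is illustrative; the underlying pipeline already returns pairs sorted by score, the explicit sort here just keeps the sketch standalone):

```python
def format_results(batch):
    """Format per-query (id, score) pairs into {id, score} dicts sorted by score."""
    return [
        [{"id": uid, "score": float(score)}
         for uid, score in sorted(r, key=lambda pair: pair[1], reverse=True)]
        for r in batch
    ]

print(format_results([[(0, 0.2), (1, 0.9)]]))
```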

batchtransform(texts, category=None, index=None)

Transforms list of text into embeddings arrays.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| texts | list of text | required |
| category | category for instruction-based embeddings | None |
| index | index name, if applicable | None |

Returns:

embeddings arrays

Source code in txtai/app/base.py

```python
def batchtransform(self, texts, category=None, index=None):
    """
    Transforms list of text into embeddings arrays.

    Args:
        texts: list of text
        category: category for instruction-based embeddings
        index: index name, if applicable

    Returns:
        embeddings arrays
    """

    if self.embeddings:
        return [[float(x) for x in result] for result in self.embeddings.batchtransform(texts, category, index)]

    return None
```

count()

Total number of elements in this embeddings index.

Returns:

number of elements in embeddings index

Source code in txtai/api/base.py

```python
def count(self):
    """
    Total number of elements in this embeddings index.

    Returns:
        number of elements in embeddings index
    """

    if self.cluster:
        return self.cluster.count()

    return super().count()
```

delete(ids)

Deletes from an embeddings index. Returns list of ids deleted.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| ids | list of ids to delete | required |

Returns:

ids deleted

Source code in txtai/api/base.py

```python
def delete(self, ids):
    """
    Deletes from an embeddings index. Returns list of ids deleted.

    Args:
        ids: list of ids to delete

    Returns:
        ids deleted
    """

    if self.cluster:
        return self.cluster.delete(ids)

    return super().delete(ids)
```

explain(query, texts=None, limit=10)

Explains the importance of each input token in text for a query.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| query | query text | required |
| texts | optional list of text, otherwise runs search query | None |
| limit | optional limit if texts is None | 10 |

Returns:

list of dict per input text where a higher token score represents higher importance relative to the query

Source code in txtai/app/base.py

```python
def explain(self, query, texts=None, limit=10):
    """
    Explains the importance of each input token in text for a query.

    Args:
        query: query text
        texts: optional list of text, otherwise runs search query
        limit: optional limit if texts is None

    Returns:
        list of dict per input text where a higher token score represents higher importance relative to the query
    """

    if self.embeddings:
        with self.lock:
            return self.embeddings.explain(query, texts, limit)

    return None
```

extract(queue, texts=None)

Extracts answers to input questions.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| queue | list of {name: value, query: value, question: value, snippet: value} | required |
| texts | optional list of text | None |

Returns:

list of {name: value, answer: value}

Source code in txtai/app/base.py

```python
def extract(self, queue, texts=None):
    """
    Extracts answers to input questions.

    Args:
        queue: list of {name: value, query: value, question: value, snippet: value}
        texts: optional list of text

    Returns:
        list of {name: value, answer: value}
    """

    if self.embeddings and "extractor" in self.pipelines:
        # Get extractor instance
        extractor = self.pipelines["extractor"]

        # Run extractor and return results as dicts
        return extractor(queue, texts)

    return None
```

index()

Builds an embeddings index for previously batched documents.

Source code in txtai/api/base.py

```python
def index(self):
    """
    Builds an embeddings index for previously batched documents.
    """

    if self.cluster:
        self.cluster.index()
    else:
        super().index()
```

label(text, labels)

Applies a zero shot classifier to text using a list of labels. Returns a list of {id: value, score: value} sorted by highest score, where id is the index in labels.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| text | text|list | required |
| labels | list of labels | required |

Returns:

list of {id: value, score: value} per text element

Source code in txtai/app/base.py

```python
def label(self, text, labels):
    """
    Applies a zero shot classifier to text using a list of labels. Returns a list of
    {id: value, score: value} sorted by highest score, where id is the index in labels.

    Args:
        text: text|list
        labels: list of labels

    Returns:
        list of {id: value, score: value} per text element
    """

    if "labels" in self.pipelines:
        # Text is a string
        if isinstance(text, str):
            return [{"id": uid, "score": float(score)} for uid, score in self.pipelines["labels"](text, labels)]

        # Text is a list
        return [[{"id": uid, "score": float(score)} for uid, score in result] for result in self.pipelines["labels"](text, labels)]

    return None
```

pipeline(name, *args, **kwargs)

Generic pipeline execution method.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| name | pipeline name | required |
| args | pipeline positional arguments | () |
| kwargs | pipeline keyword arguments | {} |

Returns:

pipeline results

Source code in txtai/app/base.py

```python
def pipeline(self, name, *args, **kwargs):
    """
    Generic pipeline execution method.

    Args:
        name: pipeline name
        args: pipeline positional arguments
        kwargs: pipeline keyword arguments

    Returns:
        pipeline results
    """

    # Backwards compatible with previous pipeline function arguments
    args = args[0] if args and len(args) == 1 and isinstance(args[0], tuple) else args

    if name in self.pipelines:
        return self.pipelines[name](*args, **kwargs)

    return None
```
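The backwards-compatibility step above treats a single tuple argument as the full positional argument list. A standalone sketch of that dispatch (function and pipeline names are illustrative):

```python
def run_pipeline(pipelines, name, *args, **kwargs):
    """Dispatch to a named callable, unpacking a single legacy tuple argument."""
    # Backwards compatible: pipeline("x", (a, b)) behaves like pipeline("x", a, b)
    args = args[0] if args and len(args) == 1 and isinstance(args[0], tuple) else args
    if name in pipelines:
        return pipelines[name](*args, **kwargs)
    return None

pipelines = {"upper": lambda text: text.upper()}
print(run_pipeline(pipelines, "upper", "hello"))     # HELLO
print(run_pipeline(pipelines, "upper", ("hello",)))  # legacy tuple form, also HELLO
```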

reindex(config, function=None)

Recreates this embeddings index using config. This method only works if document content storage is enabled.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| config | new config | required |
| function | optional function to prepare content for indexing | None |

Source code in txtai/api/base.py

```python
def reindex(self, config, function=None):
    """
    Recreates this embeddings index using config. This method only works if document content storage is enabled.

    Args:
        config: new config
        function: optional function to prepare content for indexing
    """

    if self.cluster:
        self.cluster.reindex(config, function)
    else:
        super().reindex(config, function)
```

similarity(query, texts)

Computes the similarity between query and list of text. Returns a list of {id: value, score: value} sorted by highest score, where id is the index in texts.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| query | query text | required |
| texts | list of text | required |

Returns:

list of {id: value, score: value}

Source code in txtai/app/base.py

```python
def similarity(self, query, texts):
    """
    Computes the similarity between query and list of text. Returns a list of
    {id: value, score: value} sorted by highest score, where id is the index
    in texts.

    Args:
        query: query text
        texts: list of text

    Returns:
        list of {id: value, score: value}
    """

    # Use similarity instance if available otherwise fall back to embeddings model
    if "similarity" in self.pipelines:
        return [{"id": uid, "score": float(score)} for uid, score in self.pipelines["similarity"](query, texts)]
    if self.embeddings:
        return [{"id": uid, "score": float(score)} for uid, score in self.embeddings.similarity(query, texts)]

    return None
```

transform(text, category=None, index=None)

Transforms text into embeddings arrays.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| text | input text | required |
| category | category for instruction-based embeddings | None |
| index | index name, if applicable | None |

Returns:

embeddings array

Source code in txtai/app/base.py

```python
def transform(self, text, category=None, index=None):
    """
    Transforms text into embeddings arrays.

    Args:
        text: input text
        category: category for instruction-based embeddings
        index: index name, if applicable

    Returns:
        embeddings array
    """

    if self.embeddings:
        return [float(x) for x in self.embeddings.transform(text, category, index)]

    return None
```

upsert()

Runs an embeddings upsert operation for previously batched documents.

Source code in txtai/api/base.py

```python
def upsert(self):
    """
    Runs an embeddings upsert operation for previously batched documents.
    """

    if self.cluster:
        self.cluster.upsert()
    else:
        super().upsert()
```

wait()

Closes threadpool and waits for completion.

Source code in txtai/app/base.py

```python
def wait(self):
    """
    Closes threadpool and waits for completion.
    """

    if self.pool:
        self.pool.close()
        self.pool.join()
        self.pool = None
```
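The close-then-join pattern in `wait()` can be exercised with Python's standard `ThreadPool`; `Worker` below is a hypothetical stand-in, not a txtai class:

```python
from multiprocessing.pool import ThreadPool

class Worker:
    """Minimal sketch of the close-then-join pattern used by wait()."""

    def __init__(self):
        self.pool = ThreadPool(2)

    def wait(self):
        # Close the pool to new work, block until queued tasks finish, then clear it
        if self.pool:
            self.pool.close()
            self.pool.join()
            self.pool = None

worker = Worker()
results = worker.pool.map(lambda x: x * x, [1, 2, 3])
worker.wait()
print(results, worker.pool)  # [1, 4, 9] None
```

Clearing `self.pool` afterwards makes `wait()` idempotent: a second call is a no-op.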

workflow(name, elements)

Executes a workflow.

Parameters:

| Name | Description | Default |
| --- | --- | --- |
| name | workflow name | required |
| elements | elements to process | required |

Returns:

processed elements

Source code in txtai/app/base.py

```python
def workflow(self, name, elements):
    """
    Executes a workflow.

    Args:
        name: workflow name
        elements: elements to process

    Returns:
        processed elements
    """

    if hasattr(elements, "__len__") and hasattr(elements, "__getitem__"):
        # Convert to tuples and return as a list since input is sized
        elements = [tuple(element) if isinstance(element, list) else element for element in elements]
    else:
        # Convert to tuples and return as a generator since input is not sized
        elements = (tuple(element) if isinstance(element, list) else element for element in elements)

    # Execute workflow
    return self.workflows[name](elements)
```
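The input-normalization step in `workflow` decides between eager and lazy processing based on whether the input is sized. A standalone sketch (function name is illustrative):

```python
def normalize_elements(elements):
    """Mirror workflow's input handling: sized inputs become lists, streams stay lazy."""
    if hasattr(elements, "__len__") and hasattr(elements, "__getitem__"):
        # Sized input (e.g. a list): convert list elements to tuples, return a list
        return [tuple(e) if isinstance(e, list) else e for e in elements]

    # Unsized input (e.g. a generator): return a generator to avoid buffering the stream
    return (tuple(e) if isinstance(e, list) else e for e in elements)

print(normalize_elements([["a", 1], "b"]))  # [('a', 1), 'b']
```

Passing an iterator such as `iter([["a", 1]])` takes the lazy branch, since iterators lack `__len__` and `__getitem__`.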