

The image hash pipeline generates perceptual image hashes, which can be used to detect near-duplicate images. This method is not backed by machine learning models and is not intended to find conceptually similar images.


The following shows a simple example using this pipeline.

    from txtai.pipeline import ImageHash

    # Create and run pipeline
    ihash = ImageHash()
    ihash("path to image file")
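Near-duplicate detection usually comes down to comparing two hashes bit by bit. The following is a minimal sketch, not part of txtai: it assumes the pipeline's default hex string output, and the helper names `hamming` and `near_duplicates`, the threshold of 4 bits, and the sample hash values are all made up for illustration.

```python
def hamming(hash1: str, hash2: str) -> int:
    """Counts the differing bits between two equal-length hex hash strings."""
    if len(hash1) != len(hash2):
        raise ValueError("hashes must have the same length")
    return bin(int(hash1, 16) ^ int(hash2, 16)).count("1")


def near_duplicates(hashes: dict, threshold: int = 4) -> list:
    """Returns pairs of names whose hashes differ by at most threshold bits."""
    names = sorted(hashes)
    return [
        (a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
        if hamming(hashes[a], hashes[b]) <= threshold
    ]


# Made-up 16-character hex hashes standing in for pipeline output
hashes = {
    "a.jpg": "ffd8e0c0c0e0f0ff",
    "b.jpg": "ffd8e0c0c0e0f0fe",
    "c.jpg": "0123456789abcdef",
}
pairs = near_duplicates(hashes)
print(pairs)
```

A smaller threshold flags only very close matches. The right value depends on the hashing algorithm and hash size, so it is worth tuning on real data.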

See the link below for a more detailed example.

Near duplicate image detection: identify duplicate and near-duplicate images

Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.


    # Create pipeline using lower case class name
    imagehash:

    # Run pipeline with workflow
    workflow:
      imagehash:
        tasks:
          - action: imagehash

Run with Workflows

    from txtai.app import Application

    # Create and run pipeline with workflow
    app = Application("config.yml")
    list(app.workflow("imagehash", ["path to image file"]))

Run with API

    CONFIG=config.yml uvicorn "txtai.api:app" &

    curl \
      -X POST "http://localhost:8000/workflow" \
      -H "Content-Type: application/json" \
      -d '{"name":"imagehash", "elements":["path to image file"]}'
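The same request can be issued from Python. This is a sketch using only the standard library; it assumes the API is running at the address used in the curl command above, and `build_request` and `run_workflow` are hypothetical helper names, not part of txtai.

```python
import json
import urllib.request


def build_request(name, elements, url="http://localhost:8000/workflow"):
    """Builds the POST request that mirrors the curl command."""
    body = json.dumps({"name": name, "elements": elements}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_workflow(name, elements, url="http://localhost:8000/workflow"):
    """Sends the request and decodes the JSON response (requires a running API)."""
    with urllib.request.urlopen(build_request(name, elements, url)) as response:
        return json.loads(response.read())


# Building the request does not contact the server
request = build_request("imagehash", ["path to image file"])
print(request.get_method(), request.full_url)
```

Calling `run_workflow("imagehash", ["path to image file"])` would send the request once the API process is up.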


Python documentation for the pipeline.

__init__(algorithm="average", size=8, strings=True)

Creates an ImageHash pipeline.

Parameters:

  algorithm: image hashing algorithm (average, perceptual, difference, wavelet, color)

  size: hash size

  strings: outputs hex strings if True (default), otherwise the pipeline returns numpy arrays
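To give a feel for what the algorithm and hash size settings control, here is a simplified sketch of the general idea behind average hashing. It is an illustration only and not the code the pipeline runs: it works on an already-shrunk grid of grayscale values, and the function name `average_hash` and the sample grid are made up.

```python
def average_hash(pixels):
    """Hashes a square grid of grayscale values: one bit per pixel, set when above the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = "".join("1" if value > mean else "0" for value in flat)

    # Pack the bits into a zero-padded hex string
    width = (len(bits) + 3) // 4
    return f"{int(bits, 2):0{width}x}"


# Hypothetical 4x4 grayscale grid; a real hash would first shrink the image to size x size
grid = [
    [200, 200, 10, 10],
    [200, 200, 10, 10],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
digest = average_hash(grid)
print(digest)  # cc33

# A uniformly brighter copy of the grid produces the same hash
brighter = [[value + 20 for value in row] for row in grid]
print(average_hash(brighter) == digest)  # True
```

A larger grid yields more bits per hash, which is the trade-off a hash size setting exposes: more detail per image against longer hashes to store and compare.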


Source code in txtai/pipeline/image/

    def __init__(self, algorithm="average", size=8, strings=True):
        """
        Creates an ImageHash pipeline.

        Args:
            algorithm: image hashing algorithm (average, perceptual, difference, wavelet, color)
            size: hash size
            strings: outputs hex strings if True (default), otherwise the pipeline returns numpy arrays
        """

        # PIL is a module-level flag that is falsy when the optional image dependencies are missing
        if not PIL:
            raise ImportError('ImageHash pipeline is not available - install "pipeline" extra to enable')

        self.algorithm = algorithm
        self.size = size
        self.strings = strings

__call__(images)

Generates perceptual image hashes.

Parameters:

  images: image|list

Returns:

  list of hashes

Source code in txtai/pipeline/image/

    def __call__(self, images):
        """
        Generates perceptual image hashes.

        Args:
            images: image|list

        Returns:
            list of hashes
        """

        # Convert single element to list
        values = [images] if not isinstance(images, list) else images

        # Open images if file strings (Image is PIL.Image, imported at module level)
        values = [Image.open(image) if isinstance(image, str) else image for image in values]

        # Convert images to hashes
        hashes = [self.ihash(image) for image in values]

        # Return single element if single element passed in
        return hashes[0] if not isinstance(images, list) else hashes