ImageHash

The image hash pipeline generates perceptual image hashes, which can be used to detect near-duplicate images. This method is not backed by machine learning models and is not intended to find conceptually similar images.
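One common way to use perceptual hashes for near-duplicate detection is to count the bits that differ between two hashes (the Hamming distance). The following is a minimal sketch, assuming the pipeline's default hex string output. The helper names `hamming_distance` and `is_near_duplicate` and the threshold of 5 bits are illustrative assumptions, not part of txtai.

```python
def hamming_distance(first, second):
    """Counts the differing bits between two equal-length hex hash strings."""
    if len(first) != len(second):
        raise ValueError("hashes must have the same length")

    # XOR leaves a 1 bit wherever the two hashes disagree
    return bin(int(first, 16) ^ int(second, 16)).count("1")


def is_near_duplicate(first, second, threshold=5):
    """Treats hashes as near-duplicates when few bits differ (threshold is an assumed value)."""
    return hamming_distance(first, second) <= threshold
```

A distance of 0 means the hashes are identical. A suitable threshold depends on the hash size and the dataset, so it should be tuned rather than taken from this sketch.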

Example

The following shows a simple example using this pipeline.

    from txtai.pipeline import ImageHash

    # Create and run pipeline
    ihash = ImageHash()
    ihash("path to image file")

See the link below for a more detailed example.

Notebook                        Description
Near duplicate image detection  Identify duplicate and near-duplicate images

Configuration-driven example

Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.

config.yml

    # Create pipeline using lower case class name
    imagehash:

    # Run pipeline with workflow
    workflow:
      imagehash:
        tasks:
          - action: imagehash
Run with Workflows

    from txtai import Application

    # Create and run pipeline with workflow
    app = Application("config.yml")
    list(app.workflow("imagehash", ["path to image file"]))

Run with API

    CONFIG=config.yml uvicorn "txtai.api:app" &

    curl \
      -X POST "http://localhost:8000/workflow" \
      -H "Content-Type: application/json" \
      -d '{"name":"imagehash", "elements":["path to image file"]}'

Methods

Python documentation for the pipeline.

__init__(algorithm='average', size=8, strings=True)

Creates an ImageHash pipeline.

Parameters:

Name       Type  Description                                                                                 Default
algorithm        image hashing algorithm (average, perceptual, difference, wavelet, color)                   'average'
size             hash size                                                                                   8
strings          outputs hex strings if True (default), otherwise the pipeline returns numpy arrays          True

Source code in txtai/pipeline/image/imagehash.py

    def __init__(self, algorithm="average", size=8, strings=True):
        """
        Creates an ImageHash pipeline.

        Args:
            algorithm: image hashing algorithm (average, perceptual, difference, wavelet, color)
            size: hash size
            strings: outputs hex strings if True (default), otherwise the pipeline returns numpy arrays
        """

        if not PIL:
            raise ImportError('ImageHash pipeline is not available - install pipeline extra to enable')

        self.algorithm = algorithm
        self.size = size
        self.strings = strings
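To give a feel for what the `average` algorithm and the `size` parameter mean, here is a rough pure-Python sketch of the general average-hash idea: compare each pixel of a small grayscale image to the mean brightness and pack the resulting bits into a hex string. This is not the pipeline's actual implementation, which relies on imaging libraries. The function name `average_hash_sketch` is hypothetical, and the input is assumed to be already converted to grayscale and resized to `size` x `size`.

```python
def average_hash_sketch(pixels):
    """
    Illustrative average hash over a square 2D list of grayscale values.
    Assumes the image was already resized to size x size (step omitted here).
    """
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)

    # One bit per pixel: 1 if brighter than the mean, otherwise 0
    bits = "".join("1" if value > mean else "0" for value in flat)

    # Pack the bits into a zero-padded hex string
    width = (len(bits) + 3) // 4
    return f"{int(bits, 2):0{width}x}"


# 4x4 image with a bright top half and a dark bottom half
example = [[200] * 4, [200] * 4, [10] * 4, [10] * 4]
print(average_hash_sketch(example))  # ff00
```

With `size=8`, a hash of this kind has 64 bits, which corresponds to a 16-character hex string.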

__call__(images)

Generates perceptual image hashes.

Parameters:

Name    Type  Description  Default
images        image|list   required

Returns:

Type  Description
      list of hashes

Source code in txtai/pipeline/image/imagehash.py

    def __call__(self, images):
        """
        Generates perceptual image hashes.

        Args:
            images: image|list

        Returns:
            list of hashes
        """

        # Convert single element to list
        values = [images] if not isinstance(images, list) else images

        # Open images if file strings
        values = [Image.open(image) if isinstance(image, str) else image for image in values]

        # Convert images to hashes
        hashes = [self.ihash(image) for image in values]

        # Return single element if single element passed in
        return hashes[0] if not isinstance(images, list) else hashes
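Because the pipeline returns one hash per input when given a list, the output can be paired back with the input paths. The sketch below groups paths that share an identical hash, which is one simple way to surface exact hash collisions. The helper `group_by_hash` is a hypothetical name, not a txtai function, and the hash values in the example are made up for illustration.

```python
from collections import defaultdict


def group_by_hash(paths, hashes):
    """Groups paths with identical hashes, keeping only groups with more than one path."""
    groups = defaultdict(list)
    for path, value in zip(paths, hashes):
        groups[value].append(path)

    return [group for group in groups.values() if len(group) > 1]


paths = ["a.jpg", "b.jpg", "c.jpg"]

# With txtai installed, hashes could come from the pipeline:
#   hashes = ImageHash()(paths)
# Made-up values are used here so the sketch runs on its own
hashes = ["ffff0000ffff0000", "ffff0000ffff0000", "00ff00ff00ff00ff"]

print(group_by_hash(paths, hashes))  # [['a.jpg', 'b.jpg']]
```

Exact matching only catches images whose hashes are identical. Catching looser near-duplicates would need a bit-distance comparison instead.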