ImageHash
The image hash pipeline generates perceptual image hashes, which can be used to detect near-duplicate images. This method is not backed by machine learning models and is not intended to find conceptually similar images.
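To give a sense of what a perceptual hash captures, the following is a minimal pure-Python sketch of an average hash, one of the algorithms this pipeline offers. It is not txtai's implementation. It assumes the image has already been resized and converted to a grayscale grid of integer pixel values, and the function name `average_hash` is hypothetical.

```python
# Minimal sketch of an "average" perceptual hash (aHash), not txtai's code.
# Assumption: the image was already resized to size x size pixels and converted
# to grayscale, so it is passed in as a grid of ints (0-255).
from typing import List


def average_hash(pixels: List[List[int]]) -> List[bool]:
    """Returns one bit per pixel: True if the pixel is brighter than the mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    return [value > mean for value in flat]


# Tiny 2x2 example: two bright pixels and two dark pixels
print(average_hash([[10, 200], [220, 30]]))  # [False, True, True, False]

# The same grid brightened by 10 keeps the same bits
print(average_hash([[20, 210], [230, 40]]))  # [False, True, True, False]
```

Because each bit only records whether a pixel is above the mean, small global changes such as uniform brightening leave the hash unchanged.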
Example
The following shows a simple example using this pipeline.
from txtai.pipeline import ImageHash
# Create and run pipeline
ihash = ImageHash()
ihash("path to image file")
See the link below for a more detailed example.
Notebook | Description |
---|---|
Near duplicate image detection | Identify duplicate and near-duplicate images |
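Near-duplicate detection usually compares hashes by Hamming distance, the number of bits that differ. The sketch below assumes hex string output (`strings=True`) and hashes of equal length; the helper names and the threshold of 5 bits are illustrative assumptions, not values taken from txtai or the notebook.

```python
# Sketch: comparing two hex hash strings, such as the ones ImageHash returns
# when strings=True. The threshold is an illustrative assumption.


def hamming_distance(hash1: str, hash2: str) -> int:
    """Counts the bits that differ between two equal-length hex strings."""
    if len(hash1) != len(hash2):
        raise ValueError("hashes must have the same length")
    # XOR leaves a 1 bit wherever the two hashes disagree
    return bin(int(hash1, 16) ^ int(hash2, 16)).count("1")


def is_near_duplicate(hash1: str, hash2: str, threshold: int = 5) -> bool:
    """Treats two images as near duplicates if their hashes differ in few bits."""
    return hamming_distance(hash1, hash2) <= threshold


print(hamming_distance("ffff0000ffff0000", "ffff0000ffff0001"))  # 1
print(is_near_duplicate("ffff0000ffff0000", "0000ffff0000ffff"))  # False
```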
Configuration-driven example
Pipelines are run with Python or configuration. Pipelines can be instantiated in configuration using the lower case name of the pipeline. Configuration-driven pipelines are run with workflows or the API.
config.yml
# Create pipeline using lower case class name
imagehash:

# Run pipeline with workflow
workflow:
  imagehash:
    tasks:
      - action: imagehash
Run with Workflows
from txtai.app import Application
# Create and run pipeline with workflow
app = Application("config.yml")
list(app.workflow("imagehash", ["path to image file"]))
Run with API
CONFIG=config.yml uvicorn "txtai.api:app" &
curl \
-X POST "http://localhost:8000/workflow" \
-H "Content-Type: application/json" \
-d '{"name":"imagehash", "elements":["path to image file"]}'
Methods
Python documentation for the pipeline.
__init__(self, algorithm='average', size=8, strings=True)
Creates an ImageHash pipeline.
Parameters:
Name | Description | Default |
---|---|---|
algorithm | image hashing algorithm (average, perceptual, difference, wavelet, color) | 'average' |
size | hash size | 8 |
strings | outputs hex strings if True (default), otherwise the pipeline returns numpy arrays | True |
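The `size` and `strings` parameters interact. Assuming a hash made of size x size bits (as with average hashing), each hex character encodes 4 bits, so the default size of 8 gives 64 bits, or 16 hex characters. The hypothetical `bits_to_hex` helper below sketches that packing; the exact encoding txtai uses may differ.

```python
# Sketch: packing a size x size grid of hash bits into a hex string.
# Assumption: the hash has size * size bits (as in average hashing); this is an
# illustration, not txtai's exact encoding.
from typing import List


def bits_to_hex(bits: List[bool]) -> str:
    """Packs bits (most significant first) into a zero-padded hex string."""
    value = 0
    for bit in bits:
        value = (value << 1) | int(bit)
    # Each hex character encodes 4 bits
    width = (len(bits) + 3) // 4
    return format(value, f"0{width}x")


size = 8
bits = [i % 2 == 0 for i in range(size * size)]
print(len(bits_to_hex(bits)))  # 16
print(bits_to_hex(bits))  # aaaaaaaaaaaaaaaa
```

Under the same assumption, `size=16` would give 256 bits, or 64 hex characters.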
Source code in txtai/pipeline/image/imagehash.py
def __init__(self, algorithm="average", size=8, strings=True):
    """
    Creates an ImageHash pipeline.

    Args:
        algorithm: image hashing algorithm (average, perceptual, difference, wavelet, color)
        size: hash size
        strings: outputs hex strings if True (default), otherwise the pipeline returns numpy arrays
    """

    if not PIL:
        raise ImportError('ImageHash pipeline is not available - install "pipeline" extra to enable')

    self.algorithm = algorithm
    self.size = size
    self.strings = strings
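The `algorithm` parameter selects among different hashing methods. For contrast with an average hash, here is a pure-Python sketch of a difference hash, which records whether each pixel is brighter than its left neighbour. It assumes a grayscale grid already resized to size rows by size + 1 columns, and `difference_hash` is a hypothetical name, not part of txtai.

```python
# Sketch of a "difference" hash (dHash) on a grayscale grid, not txtai's code.
# Assumption: the image was resized to size rows x (size + 1) columns, so each
# row yields size comparisons between horizontally adjacent pixels.
from typing import List


def difference_hash(pixels: List[List[int]]) -> List[bool]:
    """Returns True wherever a pixel is brighter than its left neighbour."""
    return [row[x + 1] > row[x] for row in pixels for x in range(len(row) - 1)]


# 2 rows x 3 columns -> 2 x 2 = 4 bits
print(difference_hash([[10, 20, 5], [50, 40, 60]]))  # [True, False, False, True]
```

Since it compares neighbouring pixels instead of a global mean, a difference hash encodes local gradients rather than overall brightness patterns.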
__call__(self, images)
Generates perceptual image hashes.
Parameters:
Name | Description | Default |
---|---|---|
images | image or list of images; string values are opened as image file paths | required |
Returns:
Description |
---|
list of hashes, or a single hash if a single image is passed in |
Source code in txtai/pipeline/image/imagehash.py
def __call__(self, images):
    """
    Generates perceptual image hashes.

    Args:
        images: image|list

    Returns:
        list of hashes
    """

    # Convert single element to list
    values = [images] if not isinstance(images, list) else images

    # Open images if file strings
    values = [Image.open(image) if isinstance(image, str) else image for image in values]

    # Convert images to hashes
    hashes = [self.ihash(image) for image in values]

    # Return single element if single element passed in
    return hashes[0] if not isinstance(images, list) else hashes
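Because `__call__` returns a list of hashes when given a list of images, its output can be paired with the inputs, for example with `dict(zip(paths, ihash(paths)))` when `strings=True`, and then grouped. The sketch below uses a simple greedy grouping by Hamming distance; the file names, hash values, threshold and `group_duplicates` helper are all hypothetical.

```python
# Sketch: grouping near-duplicate images from hex hashes keyed by file path.
# Paths, hashes and the threshold are hypothetical; this is not part of txtai.
from typing import Dict, List


def hamming_distance(hash1: str, hash2: str) -> int:
    """Counts the bits that differ between two hex strings."""
    return bin(int(hash1, 16) ^ int(hash2, 16)).count("1")


def group_duplicates(hashes: Dict[str, str], threshold: int = 5) -> List[List[str]]:
    """Greedily adds each image to the first group whose first hash is close enough."""
    groups: List[List[str]] = []
    for path, value in hashes.items():
        for group in groups:
            if hamming_distance(hashes[group[0]], value) <= threshold:
                group.append(path)
                break
        else:
            # No existing group matched, so start a new one
            groups.append([path])
    return groups


hashes = {
    "a.jpg": "ffff0000ffff0000",
    "b.jpg": "ffff0000ffff0001",
    "c.jpg": "0000ffff0000ffff",
}
print(group_duplicates(hashes))  # [['a.jpg', 'b.jpg'], ['c.jpg']]
```

Comparing only against each group's first member keeps the sketch short; it can split chains of gradually changing images that a pairwise or clustering approach would merge.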