Fluent Interface
Jina provides a simple fluent interface for Document that allows one to process (often preprocess) a Document object by chaining methods. For example, to read an image file as a numpy.ndarray, resize it, normalize it, and then store it to another file, one can simply do:
from jina import Document

d = (
    Document(uri='apple.png')
    .load_uri_to_image_blob()
    .set_image_blob_shape((64, 64))
    .set_image_blob_normalization()
    .dump_image_blob_to_file('apple1.png')
)
Original apple.png
Processed apple1.png
Important
Note that chaining methods always modify the original Document in place. That means the above example is equivalent to:
from jina import Document

d = Document(uri='apple.png')
(d.load_uri_to_image_blob()
  .set_image_blob_shape((64, 64))
  .set_image_blob_normalization()
  .dump_image_blob_to_file('apple1.png'))
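The in-place behavior follows from how a fluent interface is usually built: each method mutates the object and then returns that same object. The following is a minimal sketch of that pattern, not Jina's actual implementation; the class `TinyDoc` and its methods are hypothetical names used only for illustration.

```python
class TinyDoc:
    """Hypothetical stand-in for a Document; not Jina's actual class."""

    def __init__(self, text=''):
        self.text = text

    def strip(self):
        self.text = self.text.strip()  # mutate the object in place
        return self  # returning self is what makes chaining possible

    def lower(self):
        self.text = self.text.lower()  # mutate the object in place
        return self


d = TinyDoc('  Hello World  ')
result = d.strip().lower()

# The chain hands back the very same object, so `d` itself has changed.
print(result is d)
print(d.text)
```

Because every call returns the original object, assigning the result of a chain to a new name does not create a copy.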
Parallelization
The fluent interface is especially useful when processing a large DocumentArray or DocumentArrayMemmap. One can leverage map() to parallelize the work and speed things up considerably.
The following example compares the time taken to preprocess ~6000 image Documents.
from jina import DocumentArray
from jina.logging.profile import TimeContext

docs = DocumentArray.from_files('*.jpg')


def foo(d):
    # Load the image, normalize it, then move the channel axis from -1 to 0.
    return (d.load_uri_to_image_blob()
             .set_image_blob_normalization()
             .set_image_blob_channel_axis(-1, 0))


# Parallel map with a process backend
with TimeContext('map-process'):
    for d in docs.map(foo, backend='process'):
        pass

# Parallel map with a thread backend
with TimeContext('map-thread'):
    for d in docs.map(foo, backend='thread'):
        pass

# Sequential baseline
with TimeContext('for-loop'):
    for d in docs:
        foo(d)
map-process ... map-process takes 5 seconds (5.55s)
map-thread ... map-thread takes 10 seconds (10.28s)
for-loop ... for-loop takes 18 seconds (18.52s)
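To see why a parallel map can beat a plain loop, the idea can be reproduced with the standard library alone. The sketch below is a generic illustration and makes no claim about how Jina's map() is implemented; `slow_step` is a hypothetical stand-in for a preprocessing function that spends most of its time waiting on I/O.

```python
import time
from multiprocessing.pool import ThreadPool


def slow_step(x):
    """Hypothetical stand-in for an I/O-bound preprocessing function."""
    time.sleep(0.01)  # simulates waiting on disk or network
    return x * 2


items = list(range(20))

# Sequential baseline: items are processed one after another.
start = time.perf_counter()
sequential = [slow_step(x) for x in items]
t_loop = time.perf_counter() - start

# Thread pool: several items are in flight at once.
# imap yields results in the same order as the inputs.
start = time.perf_counter()
with ThreadPool(4) as pool:
    parallel = list(pool.imap(slow_step, items))
t_pool = time.perf_counter() - start

print(f'for-loop: {t_loop:.2f}s, thread pool: {t_pool:.2f}s')
```

Threads tend to help when the work is dominated by waiting, while a process pool is usually the better fit for CPU-heavy work, which matches the ordering of the timings reported above.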
Methods
All the following methods can be chained.
Convert
Provide helper functions for Document to support conversion between blob, text and buffer.
convert_blob_to_buffer()
convert_buffer_to_blob()
convert_uri_to_datauri()
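For readers unfamiliar with data URIs, the sketch below shows what turning a file into a base64 data URI involves, using only the standard library. It is an illustration of the format, not Jina's implementation of convert_uri_to_datauri(); the helper `file_to_datauri` is a hypothetical name.

```python
import base64
import mimetypes
import os
import tempfile


def file_to_datauri(path):
    """Hypothetical helper: encode a local file as a base64 data URI."""
    mime, _ = mimetypes.guess_type(path)
    mime = mime or 'application/octet-stream'  # fallback for unknown types
    with open(path, 'rb') as f:
        payload = base64.b64encode(f.read()).decode('ascii')
    return f'data:{mime};base64,{payload}'


# Write a small temporary file so the example is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'hello.txt')
    with open(path, 'w') as f:
        f.write('hello')
    datauri = file_to_datauri(path)

print(datauri)
```

A data URI embeds the content itself, so it stays usable after the original file has been removed.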
TextData
Provide helper functions for Document to support text data.
convert_blob_to_text()
convert_text_to_blob()
dump_text_to_datauri()
load_uri_to_text()
ImageData
Provide helper functions for Document to support image data.
convert_buffer_to_image_blob()
convert_image_blob_to_buffer()
convert_image_blob_to_sliding_windows()
convert_image_blob_to_uri()
dump_image_blob_to_file()
load_uri_to_image_blob()
set_image_blob_channel_axis()
set_image_blob_inv_normalization()
set_image_blob_normalization()
set_image_blob_shape()
AudioData
Provide helper functions for Document to support audio data.
dump_audio_blob_to_file()
load_uri_to_audio_blob()
BufferData
Provide helper functions for Document to handle binary data.
dump_buffer_to_datauri()
load_uri_to_buffer()
DumpFile
Provide helper functions for Document to dump content to a file.
dump_buffer_to_file()
dump_uri_to_file()
ContentProperty
Provide helper functions for Document to allow universal content property access.
dump_content_to_datauri()
VideoData
Provide helper functions for Document to support video data.
dump_video_blob_to_file()
load_uri_to_video_blob()
SingletonSugar
Provide sugary syntax for Document by inheriting methods from DocumentArray.
embed()
match()
MeshData
Provide helper functions for Document to support 3D mesh data and point clouds.
load_uri_to_point_cloud_blob()