Binary Data

OrientDB natively handles binary data, namely BLOBs. However, there are some considerations to take into account depending on the type of binary data, its size, the kind of usage, and so on.

Sometimes it’s better to store binary records in a different path than the default database directory, to benefit from a faster disk (such as an SSD) or simply to parallelize I/O if the OS and hardware configuration allow it.

In this case create a new cluster in a different path:

  db.addCluster("physical", "binary", "/mnt/ssd", "binary");

All the records in the cluster binary will reside in files created under the directory /mnt/ssd.
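
Once the cluster exists, records can be saved directly into it. A minimal sketch, assuming an open document database and the cluster created as above (the field values are illustrative):

  ODocument doc = new ODocument();
  doc.field("name", "test.pdf");
  // the optional cluster-name argument targets the cluster created above
  doc.save("binary");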

Techniques

Store on file system and save the path in the document

This is the simplest way to handle binary data: store the files on the file system and keep only the path in the document to retrieve them.

Example:

  ODocument doc = new ODocument();
  doc.field("binary", "/usr/local/orientdb/binary/test.pdf");
  doc.save();
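
Reading the file back is then a plain file-system operation. A hedged sketch, assuming the document has been loaded by its RID and that the field is named binary as above:

  ODocument doc = database.load(rid);   // rid is assumed to be known
  String path = doc.field("binary");
  // read the file content from the stored path (java.nio, Java 7+)
  byte[] content = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(path));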

Pros:

  • Easy to write
  • 100% delegated to the File System

Cons:

  • Binary data can’t be automatically distributed using the OrientDB cluster

Store it as a Document field

The ODocument class can manage binary data in the form of byte[] (byte array). Example:

  ODocument doc = new ODocument();
  doc.field("binary", "Binary data".getBytes());
  doc.save();
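
Reading the field back returns the byte array directly (a one-line sketch, assuming the document has been loaded):

  byte[] content = doc.field("binary");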

This is the easiest way to keep binary data inside the database, but it’s not really efficient with large BLOBs because the binary content is serialized in Base64. This means a waste of space (33% more) plus a run-time cost for marshalling and unmarshalling.
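
The 33% figure follows from the encoding itself: Base64 emits 4 output bytes for every 3 input bytes. A quick illustration with the standard java.util.Base64 encoder (Java 8+; this is not OrientDB’s internal code path, just the same arithmetic):

  byte[] raw = new byte[3000000];                            // 3,000,000 bytes of binary content
  int encodedLength = java.util.Base64.getEncoder().encode(raw).length;
  System.out.println(encodedLength);                         // 4000000: one third larger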

Also be aware that once the binary data reaches a certain size (10 MB in some recent testing), the database’s performance can decrease significantly. If this occurs, switch to the ORecordBytes approach described below.

Pros:

  • Easy to write

Cons:

  • Waste of space +33%
  • Run-time cost of marshalling/unmarshalling
  • Significant performance decrease once the binary data grows large

Store it with ORecordBytes

The ORecordBytes class is a record implementation able to store binary content without conversions (see above). This is the fastest way to handle binary data with OrientDB, but it needs a separate record. This technique also offers the highest performance when storing and retrieving large binary records.

Example:

  ORecordBytes record = new ORecordBytes("Binary data".getBytes());
  record.save();

Since this is a separate record, the best way to reference it is to link it from a Document record. Example:

  ORecordBytes record = new ORecordBytes("Binary data".getBytes());
  ODocument doc = new ODocument();
  doc.field("id", 12345);
  doc.field("binary", record);
  doc.save();

In this way you can access the binary data by traversing the binary field of the parent document record:

  ORecordBytes record = doc.field("binary");
  byte[] content = record.toStream();

You can manipulate the buffer directly and save it back by calling setDirty() on the record:

  byte[] content = record.toStream();
  content[0] = 0;
  record.setDirty();
  record.save();

Or you can replace the content with another byte[]:

  byte[] content = record.toStream();
  byte[] newContent = new byte[content.length * 2];
  System.arraycopy(content, 0, newContent, 0, content.length);
  record.fromStream(newContent);
  record.setDirty();
  record.save();

The ORecordBytes class can also work with Java streams:

  ORecordBytes record = new ORecordBytes().fromInputStream(in);
  record.toOutputStream(out);
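
As a usage sketch, here is a round trip through files (the file names are illustrative; only the two stream methods above are OrientDB calls):

  // load a file into a new binary record and persist it
  InputStream in = new BufferedInputStream(new FileInputStream("input.pdf"));
  ORecordBytes record = new ORecordBytes().fromInputStream(in);
  in.close();
  record.save();

  // write the record content back out to another file
  OutputStream out = new BufferedOutputStream(new FileOutputStream("output.pdf"));
  record.toOutputStream(out);
  out.close();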

Pros:

  • Fast and compact solution

Cons:

  • Slightly complex management

Large content: split into multiple ORecordBytes

OrientDB can store up to 2 GB as a single record’s content, but there are other limits on network buffers and file sizes that you may need to tune to reach that 2 GB ceiling.

However, managing big chunks of binary data means holding big byte[] structures in RAM, which can cause the JVM to go out of memory. Many users have reported that splitting the binary data into chunks is the best solution.

Continuing from the last example, we can keep not a single reference to one ORecordBytes record but multiple references: a one-to-many relationship. For this purpose the LINKLIST type fits perfectly because it maintains the order of the chunks.

To prevent OrientDB from caching large records in memory, use the massive insert intent and keep the RIDs in the collection, not the entire records.

Example of storing a file’s content in OrientDB:

  database.declareIntent(new OIntentMassiveInsert());
  List<ORID> chunks = new ArrayList<ORID>();
  InputStream in = new BufferedInputStream(new FileInputStream(file));
  while (in.available() > 0) {
    final ORecordBytes chunk = new ORecordBytes();
    // READ THE REMAINING DATA, BUT NOT MORE THAN 8K
    chunk.fromInputStream(in, 8192);
    // SAVE THE CHUNK TO GET ITS REFERENCE (IDENTITY) AND FREE IT FROM MEMORY
    database.save(chunk);
    // SAVE ITS REFERENCE INTO THE COLLECTION
    chunks.add(chunk.getIdentity());
  }
  // SAVE THE COLLECTION OF REFERENCES IN A NEW DOCUMENT
  ODocument record = new ODocument();
  record.field("data", chunks);
  database.save(record);
  database.declareIntent(null);

Example of reading back the file content:

  record.setLazyLoad(false);
  for (OIdentifiable id : (List<OIdentifiable>) record.field("data")) {
    ORecordBytes chunk = (ORecordBytes) id.getRecord();
    chunk.toOutputStream(out);
    chunk.unload();
  }
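
If you need the whole content in memory as a single byte[] rather than streamed to out, a ByteArrayOutputStream can collect the chunks. A sketch, with the caveat that it reintroduces the large in-memory buffer this section tries to avoid:

  ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  record.setLazyLoad(false);
  for (OIdentifiable id : (List<OIdentifiable>) record.field("data")) {
    ORecordBytes chunk = (ORecordBytes) id.getRecord();
    chunk.toOutputStream(buffer);   // append this chunk to the in-memory buffer
    chunk.unload();                 // release the chunk record from memory
  }
  byte[] content = buffer.toByteArray();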

Pros:

  • Fastest and most compact solution

Cons:

  • More complex management

Conclusion

What to use?

  • Do you have small binary data? Store it as a document field
  • Do you want maximum performance and better use of space? Store it with ORecordBytes
  • Do you have large binary objects? Store them with ORecordBytes, splitting the content across multiple records