About storage

TLDR

  • If the database does not shrink after you delete documents or indexes, this is expected behavior. You are not losing space: MeiliSearch keeps it for performance reasons.
  • For optimal performance, you should have as much RAM as the space MeiliSearch takes on disk.

MeiliSearch is a database. It stores the indexed documents along with the data needed to perform lightning-fast search.

Writing a database is hard, and we do not want to reinvent the wheel, so MeiliSearch uses a storage engine under the hood. Using a storage engine allows MeiliSearch to focus on improving search relevancy and search performance while abstracting away the complicated task of creating, reading, and updating documents on disk and in memory.

LMDB

The storage engine of MeiliSearch is LMDB. LMDB is a transactional key-value store written in C that was developed for OpenLDAP, and it has ACID properties.

We chose LMDB after trying MeiliSearch with Sled and RocksDB, with varying degrees of success. We decided to move on with LMDB because it offers the best combination of performance and stability for MeiliSearch.

Memory mapping

LMDB stores its data in a memory-mapped file. All data fetched from LMDB is returned straight from the memory map, which means there is no memory allocation or memory copy during data fetches.
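
To make the idea concrete, here is a minimal Python sketch built on the standard `mmap` module. It does not use LMDB or its API, and the helper name `read_from_map` and the file contents are made up for illustration. It only shows that a slice of a memory map can be read through a `memoryview` without copying the bytes into a new buffer first.

```python
import mmap
import os
import tempfile


def read_from_map(path, offset, length):
    """Read `length` bytes at `offset` through a read-only memory map."""
    with open(path, "rb") as f:
        # Map the whole file; the OS loads pages lazily when they are accessed.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # The memoryview slice points into the map: no bytes are copied here.
            with memoryview(mapped) as whole, whole[offset:offset + length] as view:
                # Copy only at the very end, so the result outlives the map.
                return bytes(view)


if __name__ == "__main__":
    # Hypothetical file contents, used only for this demonstration.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"doc-1:Carmencita;doc-2:Star Wars")
        path = tmp.name
    try:
        print(read_from_map(path, 6, 10))
    finally:
        os.remove(path)
```

The final `bytes(view)` is a deliberate copy so the value stays valid after the map is closed; a storage engine that keeps the map open for the lifetime of a read can hand out the view itself.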

All documents stored on disk are automatically loaded into memory when MeiliSearch asks for them. This ensures LMDB will always make the best use of the available RAM to retrieve the documents.

For the best performance, it is recommended to provide the same amount of RAM as the size the database takes on disk, so all the data structures can fit in memory.
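
As a rough way to check this recommendation on your own machine, the sketch below compares the on-disk size of a database directory with the total physical RAM. Several things here are assumptions: the `./data.ms` path should be replaced with your actual database directory, the helper names are made up, and `physical_ram_bytes` relies on POSIX `sysconf` values that are available on Linux but not on every platform. It also compares against total RAM, not the RAM that is currently free.

```python
import os


def directory_size_bytes(path):
    """Sum the sizes of all regular files under `path`."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def physical_ram_bytes():
    """Total physical RAM via POSIX sysconf (assumption: Linux or similar)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


def ram_covers_database(db_path):
    """True when total RAM is at least the on-disk size of the database."""
    return physical_ram_bytes() >= directory_size_bytes(db_path)


if __name__ == "__main__":
    db_path = "./data.ms"  # assumption: point this at your database directory
    print("database size (bytes):", directory_size_bytes(db_path))
    print("RAM covers database:", ram_covers_database(db_path))
```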

Understanding LMDB

Choosing LMDB comes with pros and cons. To understand them, we need some insight into how LMDB affects database size and memory usage. This is well explained in a blog post on LMDB, which we summarize here.

Database size

When entries are freed from the database (in our case, when documents are removed from MeiliSearch), no disk space is released. The space previously used by the entry is marked as free for LMDB but is not made available to the operating system.
Unlike other storage engines, LMDB chooses this design for performance reasons, as it removes the need for a compaction phase.

As a result, you may see that the disk space occupied by LMDB, and therefore by MeiliSearch, keeps growing even if you are deleting indexes or documents. This is normal behavior. Note that the database will not grow again when you write data after deleting indexes or documents, since LMDB can reuse the space it marked as free.
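
The behavior described above can be illustrated with a toy model. This is a deliberately simplified sketch, not LMDB's actual implementation: the `ToyPageFile` class is hypothetical and stores one entry per page, whereas the real engine is considerably more involved. It only shows why a file with a free list never shrinks on deletion and stops growing while freed pages are available.

```python
class ToyPageFile:
    """Toy model of a page file with a free list (not LMDB's real code)."""

    def __init__(self):
        self.pages_in_file = 0  # file size in pages; it only ever grows
        self.free_pages = []    # page numbers released by deletions
        self.entries = {}       # key -> page number holding that entry

    def put(self, key):
        if key in self.entries:
            return
        if self.free_pages:
            page = self.free_pages.pop()  # reuse freed space first
        else:
            page = self.pages_in_file     # otherwise extend the file
            self.pages_in_file += 1
        self.entries[key] = page

    def delete(self, key):
        page = self.entries.pop(key)
        self.free_pages.append(page)      # marked free, not returned to the OS


if __name__ == "__main__":
    db = ToyPageFile()
    for i in range(100):
        db.put(f"doc-{i}")
    for i in range(40):
        db.delete(f"doc-{i}")
    print("pages after deleting 40 entries:", db.pages_in_file)
    for i in range(40):
        db.put(f"new-{i}")
    print("pages after writing 40 new entries:", db.pages_in_file)
```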

Memory usage

Since LMDB is memory-mapped, the operating system decides how much real memory is actually allocated to MeiliSearch.

Thus, if you run MeiliSearch as a standalone program on a server, LMDB will use as much RAM as it can.
If you run MeiliSearch alongside other programs, the OS will manage memory based on each program's needs, which makes MeiliSearch quite flexible when used in development.

TIP

Virtual memory != Real memory
Virtual memory is the memory a program requests from the OS. It is not the memory the program will actually use.

In this case, MeiliSearch will always ask for a memory map of 200 GB. This is the virtual memory MeiliSearch requests from the OS. As the measurements below show, the amount of real memory used in RAM is much smaller.
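
The difference can be sketched with Python's standard `mmap` module. This example is unrelated to MeiliSearch's own code: the `reserve_and_touch` helper and the 256 MiB size are arbitrary choices for illustration. It reserves a large anonymous mapping and writes to only a few pages. On operating systems with demand paging, such as Linux, only the pages that are touched need real memory, so the second number it returns is an estimate of real usage, not a measurement.

```python
import mmap


def reserve_and_touch(map_size, pages_to_touch):
    """Reserve `map_size` bytes of virtual memory and write to a few pages."""
    with mmap.mmap(-1, map_size) as region:  # anonymous mapping, no file behind it
        for i in range(pages_to_touch):
            # The first write to a page is what makes the OS back it with RAM.
            region[i * mmap.PAGESIZE] = 1
        virtual_bytes = len(region)
    touched_bytes = pages_to_touch * mmap.PAGESIZE
    return virtual_bytes, touched_bytes


if __name__ == "__main__":
    virtual, touched = reserve_and_touch(256 * 1024 * 1024, 4)
    print("virtual memory requested (bytes):", virtual)
    print("estimated memory actually touched (bytes):", touched)
```

To observe the real figures for a running process, tools such as `ps` or `top` report the virtual size and the resident size as separate columns.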

Measured disk usage

We took some measurements using the default movies.json dataset from the getting started guide.
This dataset is an 8.6 MB JSON file containing 19,553 documents.
When we index this file in MeiliSearch, LMDB takes 122 MB of disk space.

Raw JSON    MeiliSearch database size on disk    Real memory size    Private memory size        Virtual memory size
8.6 MB      122 MB (≃ raw data × 14)             ≃ 6.3 MB            120 MB (≃ size on disk)    204 GB (memory map)

That means this dataset is using 6.3 MB of RAM and 122 MB of disk space.
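
The ratio quoted in the table can be reproduced from the figures above. The sketch below is plain arithmetic on those measurements. The extrapolation helper `estimate_disk_mb` is an assumption added for illustration: the ratio was measured on one dataset only and may well differ for other documents or settings, so treat its result as a rough guess at best.

```python
RAW_JSON_MB = 8.6    # size of movies.json, from the measurements above
DISK_MB = 122        # size of the database on disk after indexing
DOCUMENTS = 19_553   # number of documents in the dataset


def amplification(disk_mb, raw_mb):
    """How many times larger the database is than the raw JSON."""
    return disk_mb / raw_mb


def disk_kb_per_document(disk_mb, documents):
    """Average disk space per document, in KB (1 MB taken as 1000 KB)."""
    return disk_mb * 1000 / documents


def estimate_disk_mb(raw_mb, factor):
    """Naive extrapolation (assumption: the factor carries over to other data)."""
    return raw_mb * factor


if __name__ == "__main__":
    factor = amplification(DISK_MB, RAW_JSON_MB)
    print("disk / raw ratio:", round(factor, 1))
    print("disk KB per document:", round(disk_kb_per_document(DISK_MB, DOCUMENTS), 2))
    print("rough guess for 100 MB of raw JSON (MB):", round(estimate_disk_mb(100, factor)))
```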