This documentation is for Dovecot v1.x; see wiki2 for v2.x documentation.

Dovecot's index files

The basic idea behind Dovecot's index files is that they make reading the mailboxes a lot faster. The index files consist of the main index, the cache file and the transaction log, each described below.

Each mailbox has its own separate index files. If the index files are disabled, the same structures are still kept in memory, except that the cache file is disabled completely (because the client probably won't fetch the same data twice within a connection).

If index files are missing, Dovecot creates them automatically when the mailbox is opened. If at any point creating or growing a file fails with a "not enough disk space" error, the indexes are transparently moved to memory for the rest of the session.
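The fallback to memory can be pictured with a small sketch. This is not Dovecot's code: the IndexStorage class and its opener argument are hypothetical names used only for illustration, and the sketch assumes the "not enough disk space" condition shows up as an OSError with errno ENOSPC.

```python
import errno
import io


class IndexStorage:
    """Hypothetical index writer that moves to memory when the disk is full."""

    def __init__(self, opener):
        # `opener` is any callable that returns a writable binary file object.
        self._opener = opener
        self._file = None
        self.in_memory = False

    def write(self, data: bytes) -> None:
        try:
            if self._file is None:
                self._file = self._opener()
            self._file.write(data)
        except OSError as exc:
            if exc.errno != errno.ENOSPC:
                raise  # only the disk-full case is handled here
            # Switch to an in-memory buffer for the rest of the session.
            # A real implementation would also carry over what was already
            # written; this sketch only keeps the data written from now on.
            self._file = io.BytesIO()
            self.in_memory = True
            self._file.write(data)

    def memory_contents(self) -> bytes:
        """Return the buffered bytes, or b'' if the index is still on disk."""
        return self._file.getvalue() if self.in_memory else b""
```

The caller keeps calling write() the same way before and after the switch, which is what "transparently" means here.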

See Design/Indexes for more technical information on how the index files are handled.

Main index

The main index contains the following information for each message:

This is the same information that most other IMAP servers keep in memory while the mailbox is open, but Dovecot has the advantage of keeping the information permanently stored so it's easy to get it when opening the mailbox.

The index file's header also contains some summary information, such as how many messages exist, how many of them are unseen and how many are marked with the \Deleted flag. Opening mailboxes and answering IMAP STATUS commands can usually be done simply by reading the required information from the index file's header. This is why these operations are extremely fast with Dovecot compared to other servers that don't use an equivalent index file.
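As a rough illustration of why header counters make STATUS cheap, the sketch below keeps the counters up to date on every change so that answering a query never scans the messages. The class and field names (IndexHeader, MainIndex, messages_count and so on) are assumptions made for this example and do not describe Dovecot's actual on-disk format.

```python
from dataclasses import dataclass

SEEN = "\\Seen"
DELETED = "\\Deleted"


@dataclass
class IndexHeader:
    # Hypothetical summary counters kept in the index header.
    messages_count: int = 0
    unseen_count: int = 0
    deleted_count: int = 0


class MainIndex:
    def __init__(self):
        self.header = IndexHeader()
        self.records = {}  # uid -> set of flags

    def append(self, uid, flags=()):
        flags = set(flags)
        self.records[uid] = flags
        self.header.messages_count += 1
        if SEEN not in flags:
            self.header.unseen_count += 1
        if DELETED in flags:
            self.header.deleted_count += 1

    def set_flags(self, uid, flags):
        old, new = self.records[uid], set(flags)
        # Adjust the counters by the difference between old and new flags.
        self.header.unseen_count += (SEEN in old) - (SEEN in new)
        self.header.deleted_count += (DELETED in new) - (DELETED in old)
        self.records[uid] = new

    def status(self):
        # Answered from the header alone; no per-message work is needed.
        h = self.header
        return {"MESSAGES": h.messages_count, "UNSEEN": h.unseen_count}
```

The cost of status() stays constant no matter how many messages the mailbox holds.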

Mailbox synchronization

The main index's header also contains mailbox syncing state:

The index file is synchronized against the mailbox only if the syncing information changes.
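The check can be sketched as a comparison between the state recorded in the header and the mailbox's current state. Which values Dovecot actually records is not listed here, so the use of the modification time and size below is only an assumption for illustration, as are the names SyncState, read_sync_state and needs_sync.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncState:
    # Assumed syncing state: modification time and size of the mailbox file.
    mtime_ns: int
    size: int


def read_sync_state(path):
    """Read the current state of the mailbox from the filesystem."""
    st = os.stat(path)
    return SyncState(st.st_mtime_ns, st.st_size)


def needs_sync(path, recorded):
    """Return True only if the mailbox changed since `recorded` was stored."""
    return read_sync_state(path) != recorded
```

When needs_sync() returns False, the index can be used as it is and the mailbox itself does not have to be read.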

Cache file

The cache file may contain the following information for messages:

IMAP clients can work in many different ways. There are basically two types:

  1. Online clients that ask for the same information multiple times (e.g. webmails, Pine)
  2. Offline clients that usually first download some of the interesting message headers and only after that the message bodies (possibly automatically, or possibly only when the user opens the mail). Most IMAP clients behave like this.

The cache file is extremely helpful with type 1 clients. The first time a client requests message headers or some other metadata, they're stored in the cache file. The second time the client asks for the same information, Dovecot can get it quickly from the cache file instead of opening the message and parsing the headers.

For type 2 clients the cache file is helpful if the user runs multiple clients or if the data was cached while the message was being saved (Dovecot v1.1+ can do this). Some of the information is helpful in any case: for example, the message's virtual size must be known when downloading the message. Without the virtual size in the cache, Dovecot first has to read the whole message to calculate it.

Only the mailbox metadata that clients have asked for earlier is stored in the cache file. This allows Dovecot to adapt to different clients' needs and still not waste disk space (and cause extra disk I/O!) on fields that clients never need.
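The adaptive behaviour can be sketched as a cache that remembers which fields have been requested and stores only those. AdaptiveCache, its parse_field callback and the on_save hook are hypothetical names for this example; on_save stands in for caching data while a message is being saved.

```python
class AdaptiveCache:
    """Hypothetical cache that stores only fields clients have asked for."""

    def __init__(self, parse_field):
        # parse_field(uid, field) is the slow path: open and parse the message.
        self._parse_field = parse_field
        self.wanted_fields = set()
        self._store = {}
        self.parse_calls = 0

    def _parse(self, uid, field):
        self.parse_calls += 1
        return self._parse_field(uid, field)

    def fetch(self, uid, field):
        key = (uid, field)
        if key not in self._store:
            # First request: parse the message and remember the field.
            self.wanted_fields.add(field)
            self._store[key] = self._parse(uid, field)
        return self._store[key]

    def on_save(self, uid):
        # Pre-populate only the fields some client has asked for earlier.
        for field in self.wanted_fields:
            key = (uid, field)
            if key not in self._store:
                self._store[key] = self._parse(uid, field)

    def is_cached(self, uid, field):
        return (uid, field) in self._store
```

A field that no client has ever requested is never parsed or stored, so it costs neither disk space nor I/O.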

Dovecot can cache fields either permanently or temporarily. Temporarily cached fields are dropped from the cache file after about a week. Dovecot uses two rules to determine when data should be cached permanently instead of temporarily:

  1. Client accessed messages in non-sequential order within this session. This most likely means it doesn't have a local cache.
  2. Client accessed a message older than one week.

Design/Indexes/Cache explains the reasons for these rules.
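The two rules above can be written down as a small decision function. This is only one reading of them: "non-sequential" is taken to mean that a message with a lower UID was accessed after one with a higher UID, and the function name and argument layout are assumptions for this example.

```python
from datetime import datetime, timedelta

ONE_WEEK = timedelta(weeks=1)


def cache_permanently(accesses, now):
    """Decide whether a session's fields should be cached permanently.

    `accesses` is the list of (uid, received_date) pairs in the order the
    client accessed the messages during this session.
    """
    uids = [uid for uid, _ in accesses]
    # Rule 1: messages were accessed in non-sequential order.
    non_sequential = any(later < earlier for earlier, later in zip(uids, uids[1:]))
    # Rule 2: a message older than one week was accessed.
    old_message = any(now - received > ONE_WEEK for _, received in accesses)
    return non_sequential or old_message
```

If neither rule matches, the fields would be cached only temporarily and dropped after about a week.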

Transaction log

All changes to the main index go through the transaction log first. This has two advantages when the mailbox is accessed using multiple simultaneous connections:

  1. It allows getting a list of changes quickly so that IMAP clients can be notified of the changes. An alternative would be to do a comparison of two index mappings, which is what most other IMAP servers do.
  2. The mmap_disable=yes implementation relies on the transaction log. Instead of re-reading the whole main index file after each change, it's necessary to read only a few bytes from the transaction log.
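Both advantages come from the same pattern: each connection remembers how far it has read in the log and later reads only what was appended after that point. The sketch below shows the pattern with an in-memory list; TransactionLog and Session are hypothetical names, and the real log is a file with its own record format.

```python
class TransactionLog:
    """Hypothetical append-only log shared by all connections."""

    def __init__(self):
        self._records = []

    def append(self, change):
        self._records.append(change)

    def read_since(self, offset):
        # Return the records after `offset` together with the new offset.
        return self._records[offset:], len(self._records)


class Session:
    """One connection that tracks its own position in the log."""

    def __init__(self, log):
        self._log = log
        self._offset = 0

    def poll(self):
        # Only records appended since the last poll are read.
        changes, self._offset = self._log.read_since(self._offset)
        return changes
```

A session that polls after another connection made one change reads exactly one record, regardless of how large the mailbox is.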

In Dovecot v1.1+ the transaction log plays an even more important role. The main index file is updated only "once in a while" to reduce disk writes, so it is common to first read the main index and then apply new changes from the transaction log on top of that. With empty mailboxes (e.g. download+delete POP3 users) it would even be possible to delete the whole main index and keep only the transaction log (although this isn't done currently).
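Reading the main index and then applying the newer log records on top of it can be sketched as follows. The record layout (operation, UID, flags) and the open_view function are assumptions made for this example, not Dovecot's actual format.

```python
def open_view(snapshot, snapshot_log_offset, log):
    """Build the current mailbox view from an older snapshot plus the log.

    `snapshot` maps uid -> flags as of `snapshot_log_offset` records into
    `log`; records after that offset have not been written to it yet.
    """
    view = {uid: set(flags) for uid, flags in snapshot.items()}
    for op, uid, flags in log[snapshot_log_offset:]:
        if op == "append":
            view[uid] = set(flags)
        elif op == "flags":
            view[uid] = set(flags)
        elif op == "expunge":
            view.pop(uid, None)
    return view
```

With an empty snapshot and an offset of 0 the view is rebuilt from the log alone, which corresponds to the case of keeping only the transaction log.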

None: IndexFiles (last edited 2009-11-23 16:43:09 by c)