MVStore file format documentation.

8c0bb119 · Thomas Mueller · 831e6937 · 8c0bb119
--- a/h2/src/docsrc/html/mvstore.html
+++ b/h2/src/docsrc/html/mvstore.html
@@ -478,15 +478,56 @@ it is recommended to use it together with the MVCC mode

 <h2 id="fileFormat">File Format</h2>
 <p>
-The data is stored in one file. The file contains two file headers (to be safe), 
-and a number of chunks. The file headers are one block each; a block is 4096 bytes.
+The data is stored in one file. 
+The file contains two file headers (for safety), and a number of chunks. 
+The file headers are one block each; a block is 4096 bytes.
 Each chunk is at least one block, but typically 200 blocks or more.
-There might be a number of free blocks in front of every chunk.
+Data is stored in the chunks as
+<a href="https://en.wikipedia.org/wiki/Log-structured_file_system">log-structured storage</a>.
 There is one chunk for every version.
 </p>
 <pre>
 [ file header 1 ] [ file header 2 ] [ chunk ] [ chunk ] ... [ chunk ]
 </pre>
+<p>
+Each chunk contains a number of B-tree pages.
+As an example, the following code:
+</p>
+<pre>
+MVStore s = MVStore.open(fileName);
+MVMap<Integer, String> map = s.openMap("data");
+for (int i = 0; i < 400; i++) {
+    map.put(i, "Hello");
+}
+s.commit();
+for (int i = 0; i < 100; i++) {
+    map.put(0, "Hi");
+}
+s.commit();
+s.close();
+</pre>
+<p>
+will result in the following two chunks (excluding metadata):
+</p>
+<p>
+<b>Chunk 1:</b>
+</p>
+<ul><li>Page 1: leaf with 140 entries (keys 0 - 139)
+</li><li>Page 2: leaf with 260 entries (keys 140 - 399)
+</li><li>Page 3: node with 2 entries pointing to pages 1 and 2 (the root)
+</li></ul>
+<p>
+<b>Chunk 2:</b>
+</p>
+<ul><li>Page 4: leaf with 140 entries (keys 0 - 139)
+</li><li>Page 5: node with 2 entries pointing to pages 4 and 2 (the root)
+</li></ul>
+<p>
+That means each chunk contains the changes of one version:
+the new versions of the changed pages and of their parent pages,
+recursively, up to the root page. Pages in subsequent chunks refer to
+pages in earlier chunks.
+</p>

 <h3>File Header</h3>
 <p>
@@ -573,11 +614,11 @@ to be stored in the next chunk, and the number of live pages in the old chunk is
 This mechanism is called copy-on-write, and is similar to how the
 <a href="https://en.wikipedia.org/wiki/Btrfs">Btrfs</a> file system works.
 Chunks without live pages are marked as free, so the space can be re-used by more recent chunks.
-Because not all chunks are of the same size, there can be some unused space in front of a chunk
+Because not all chunks are of the same size, there can be a number of free blocks in front of a chunk
 for some time (until a small chunk is written or the chunks are compacted).
 There is a <a href="http://stackoverflow.com/questions/13650134/after-how-many-seconds-are-file-system-write-buffers-typically-flushed">
 delay of 45 seconds</a> (by default) before a free chunk is overwritten,
-to ensure new versions are persisted first, as hard disks sometimes re-order write operations.
+to ensure new versions are persisted first.
 </p>
 <p>
 How the newest chunk is located when opening a store:
@@ -613,10 +654,10 @@ and <a href="https://en.wikipedia.org/wiki/Variable-length_quantity">variable si
 </li><li>len (variable size int): The number of keys in the page.
 </li><li>type (byte): The page type (0 for leaf page, 1 for internal node;
    plus 2 if the page data is compressed).
-</li><li>keys (byte array): All keys, stored depending on the data type.
 </li><li>children (array of long; internal nodes only): The position of the children.
 </li><li>childCounts (array of variable size long; internal nodes only):
    The total number of entries for the given child page.
+</li><li>keys (byte array): All keys, stored depending on the data type.
 </li><li>values (byte array; leaf pages only): All values, stored depending on the data type.
 </li></ul>
 <p>