Commit 96d60f94 authored by Thomas Mueller

MVStore: store root pages first; better size estimation for serialized objects; option to compress database content.
Parent: 2705a8c5
......@@ -266,6 +266,7 @@ Serialization is pluggable. The default serialization currently supports many co
and uses Java serialization for other objects. The following classes are currently directly supported:
<code>Boolean, Byte, Short, Character, Integer, Long, Float, Double, BigInteger, BigDecimal,
String, UUID, Date</code> and arrays (both primitive arrays and object arrays).
For serialized objects, the size estimate is adjusted using an exponential moving average.
</p><p>
Parameterized data types are supported
(for example one could build a string data type that limits the length).
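The moving-average adjustment mentioned above is small enough to sketch in full. The 10000-byte starting value and the 1/16 smoothing factor below are taken from the ObjectDataType change in this commit; the wrapper class itself is illustrative only:

// Illustrative sketch: the estimate starts at a fixed guess and converges
// toward the sizes actually observed when objects are serialized.
class SerializedSizeEstimator {
    private int averageSize = 10000; // initial estimate, as in SerializedObjectType

    int estimate() {
        return averageSize;
    }

    void record(int observedSize) {
        // exponential moving average: 1/16 of the new size, 15/16 of the old
        averageSize = (observedSize + 15 * averageSize) / 16;
    }
}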
......@@ -511,14 +512,14 @@ will result in the following two chunks (excluding metadata):
</p>
<p>
<b>Chunk 1:</b><br />
- Page 1: leaf with 140 entries (keys 0 - 139)<br />
- Page 2: leaf with 260 entries (keys 140 - 399)<br />
- Page 3: (root) node with 2 entries pointing to page 1 and 2<br />
- Page 1: (root) node with 2 entries pointing to page 2 and 3<br />
- Page 2: leaf with 140 entries (keys 0 - 139)<br />
- Page 3: leaf with 260 entries (keys 140 - 399)<br />
</p>
<p>
<b>Chunk 2:</b><br />
- Page 4: leaf with 140 entries (keys 0 - 139)<br />
- Page 5: (root) node with 2 entries pointing to page 4 and 2<br />
- Page 4: (root) node with 2 entries pointing to page 3 and 5<br />
- Page 5: leaf with 140 entries (keys 0 - 139)<br />
</p>
<p>
That means each chunk contains the changes of one version:
......@@ -650,7 +651,7 @@ and <a href="https://en.wikipedia.org/wiki/Variable-length_quantity">variable si
</li><li>mapId (variable size int): The id of the map this page belongs to.
</li><li>len (variable size int): The number of keys in the page.
</li><li>type (byte): The page type (0 for leaf page, 1 for internal node;
plus 2 if the page data is compressed).
plus 2 if the keys and values are compressed).
</li><li>children (array of long; internal nodes only): The position of the children.
</li><li>childCounts (array of variable size long; internal nodes only):
The total number of entries for the given child page.
......@@ -658,8 +659,11 @@ and <a href="https://en.wikipedia.org/wiki/Variable-length_quantity">variable si
</li><li>values (byte array; leaf pages only): All values, stored depending on the data type.
</li></ul>
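A minimal sketch of reading these fields in order may help. The class and method names are illustrative; the pageLength and checksum fields that precede mapId (written as placeholders by putInt(0).putShort((byte) 0) in the code below) are skipped here, and DataUtils.readVarLong is assumed to mirror the putVarLong used elsewhere in this commit:

import java.nio.ByteBuffer;
import org.h2.mvstore.DataUtils;

// Hypothetical reader following the field order documented above.
class PageHeaderSketch {
    static void read(ByteBuffer buff) {
        int mapId = DataUtils.readVarInt(buff);  // id of the map
        int len = DataUtils.readVarInt(buff);    // number of keys
        int type = buff.get();                   // 0 leaf, 1 node, +2 compressed
        if ((type & 1) == 1) {                   // internal node
            long[] children = new long[len + 1];
            for (int i = 0; i <= len; i++) {
                children[i] = buff.getLong();    // child page positions
            }
            long[] childCounts = new long[len + 1];
            for (int i = 0; i <= len; i++) {
                // total number of entries below each child
                childCounts[i] = DataUtils.readVarLong(buff);
            }
        }
        // keys (and, for leaf pages, values) follow, read by the data type
    }
}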
<p>
Even though this is not required by the file format, each B-tree is stored
"upside down", that means the leaf pages first, then the internal nodes, and lastly the root page.
Even though this is not required by the file format, pages are stored in the following order: for each map, the root page is stored first, then the internal nodes (if there are any), and then the leaf pages.
This should speed up reads on media where sequential reads are faster than random access.
The metadata map is stored at the end of a chunk.
</p>
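The Page changes in this commit implement the root-first order with a two-pass write: a parent is written before its children with placeholder child pointers, which are patched once the children have been stored and their positions are known. A simplified sketch of that flow, in Java form; write and writeChildren follow the diff below, while the accessor names are assumed for illustration:

// Simplified recursion mirroring writeUnsavedRecursive in the diff below.
void writeRecursive(Page page, Chunk chunk, WriteBuffer buff) {
    // write(...) now returns the position just after the type byte,
    // which is where the (still incomplete) children array was written
    int patch = page.write(chunk, buff);
    if (!page.isLeaf()) {
        for (int i = 0; i < page.getChildCount(); i++) {
            // children are stored after the parent, so their final
            // positions are only known now
            writeRecursive(page.getChildPage(i), chunk, buff);
        }
        int old = buff.position();
        buff.position(patch);        // jump back to the placeholder array
        page.writeChildren(buff);    // overwrite it with the real positions
        buff.position(old);          // and continue where we left off
    }
}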
<p>
Pointers to pages are stored as a long, using a special format:
......@@ -680,7 +684,7 @@ and when a page is marked as removed, the live maximum length is adjusted.
This makes it possible to estimate the amount of free space within a block, in addition to the number of free pages.
</p>
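For illustration, such a pointer can be unpacked with the DataUtils helpers. getPageChunkId and getPageOffset appear in this commit; getPageMaxLength and getPageType are assumed from the same class:

import org.h2.mvstore.DataUtils;

class PagePositionSketch {
    static String describe(long pos) {
        int chunkId = DataUtils.getPageChunkId(pos);  // chunk containing the page
        int offset = DataUtils.getPageOffset(pos);    // byte offset within the chunk
        int maxLen = DataUtils.getPageMaxLength(pos); // decoded maximum length
        int type = DataUtils.getPageType(pos);        // leaf or internal node
        return "chunk " + chunkId + " offset " + offset +
                " maxLen " + maxLen + " type " + type;
    }
}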
<p>
The total number of entries in a child nodes is kept to allow efficient range counting,
The total number of entries in child nodes is kept to allow efficient range counting,
lookup by index, and skip operations.
The pages form a <a href="http://www.chiark.greenend.org.uk/~sgtatham/algorithms/cbtree.html">counted B-tree</a>.
</p>
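As a sketch of what the counts enable, lookup by index can skip entire subtrees without visiting them. The accessors below are assumed for illustration; only the counts array itself appears in the diff:

// Hypothetical lookup-by-index in a counted B-tree: descend into the
// child whose cumulative entry count covers the requested index.
Object getByIndex(Page p, long index) {
    while (!p.isLeaf()) {
        int i = 0;
        while (index >= p.getCounts(i)) {
            index -= p.getCounts(i);  // skip this whole subtree
            i++;
        }
        p = p.getChildPage(i);
    }
    return p.getValue((int) index);
}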
......
......@@ -327,6 +327,13 @@ public class DbSettings extends SettingsBase {
* Use the MVStore storage engine.
*/
public final boolean mvStore = get("MV_STORE", false);
/**
* Database setting <code>COMPRESS</code>
* (default: false).<br />
* Compress data when storing.
*/
public final boolean compressData = get("COMPRESS", false);
private DbSettings(HashMap<String, String> s) {
super(s);
......
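A usage sketch for the new setting; the database path is illustrative, and the URL flag matches the benchmark change further below:

import java.sql.Connection;
import java.sql.DriverManager;

class CompressUsageSketch {
    public static void main(String[] args) throws Exception {
        // COMPRESS=TRUE enables DbSettings.compressData for this database
        Connection conn = DriverManager.getConnection(
                "jdbc:h2:/tmp/test;MV_STORE=TRUE;COMPRESS=TRUE", "sa", "");
        conn.close();
    }
}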
......@@ -51,12 +51,7 @@ MVStore:
- maybe change the length code to have lower gaps
- improve memory calculation for transient and cache
especially for large pages (when using the StreamStore)
- automated 'kill process' and 'power failure' test
- update checkstyle
- feature to auto-compact from time to time and on close
- test and possibly improve compact operation (for large dbs)
- possibly split chunk metadata into immutable and mutable
- compact: avoid processing pages using a counting bloom filter
......@@ -75,7 +70,6 @@ MVStore:
- serialization for lists, sets, sorted sets, maps, sorted maps
- maybe rename 'rollback' to 'revert' to distinguish from transactions
- support other compression algorithms (deflate, LZ4,...)
- support opening (existing) maps by id
- remove features that are not really needed; simplify the code
possibly using a separate layer or tools
(retainVersion?)
......@@ -110,6 +104,8 @@ MVStore:
or use a small page size for metadata
- data type "string": maybe use prefix compression for keys
- test chunk id rollover
- feature to auto-compact from time to time and on close
- compact very small chunks
*/
......@@ -790,7 +786,7 @@ public class MVStore {
* @param pos the position
* @return the chunk
*/
Chunk getChunk(long pos) {
private Chunk getChunk(long pos) {
int chunkId = DataUtils.getPageChunkId(pos);
Chunk c = chunks.get(chunkId);
if (c == null) {
......
......@@ -113,10 +113,6 @@ public class MVStoreTool {
pageSize);
p += pageSize;
remaining--;
if (compressed) {
continue;
}
String[] keys = new String[entries];
long[] children = null;
long[] counts = null;
if (node) {
......@@ -130,10 +126,13 @@ public class MVStoreTool {
counts[i] = s;
}
}
String[] keys = new String[entries];
if (mapId == 0) {
for (int i = 0; i < entries; i++) {
String k = StringDataType.INSTANCE.read(chunk);
keys[i] = k;
if (!compressed) {
for (int i = 0; i < entries; i++) {
String k = StringDataType.INSTANCE.read(chunk);
keys[i] = k;
}
}
if (node) {
// meta map node
......@@ -151,7 +150,7 @@ public class MVStoreTool {
keys[entries],
DataUtils.getPageChunkId(cp),
DataUtils.getPageOffset(cp));
} else {
} else if (!compressed) {
// meta map leaf
String[] values = new String[entries];
for (int i = 0; i < entries; i++) {
......
......@@ -113,7 +113,7 @@ public class Page {
* @param version the version
* @return the new page
*/
public static Page createEmpty(MVMap<?, ?> map, long version) {
static Page createEmpty(MVMap<?, ?> map, long version) {
return create(map, version,
0, EMPTY_OBJECT_ARRAY, EMPTY_OBJECT_ARRAY,
0, null, null, null,
......@@ -355,7 +355,7 @@ public class Page {
* @param at the split index
* @return the page with the entries after the split index
*/
public Page split(int at) {
Page split(int at) {
return isLeaf() ? splitLeaf(at) : splitNode(at);
}
......@@ -500,7 +500,7 @@ public class Page {
* @param index the index
* @param total the new value
*/
public void setCounts(int index, long total) {
private void setCounts(int index, long total) {
if (total != counts[index]) {
if ((sharedFlags & SHARED_COUNTS) != 0) {
counts = Arrays.copyOf(counts, counts.length);
......@@ -748,17 +748,6 @@ public class Page {
keyCount = len;
int type = buff.get();
boolean node = (type & 1) == DataUtils.PAGE_TYPE_NODE;
boolean compressed = (type & DataUtils.PAGE_COMPRESSED) != 0;
if (compressed) {
Compressor compressor = map.getStore().getCompressor();
int lenAdd = DataUtils.readVarInt(buff);
int compLen = pageLength + start - buff.position();
byte[] comp = DataUtils.newBytes(compLen);
buff.get(comp);
int l = compLen + lenAdd;
buff = ByteBuffer.allocate(l);
compressor.expand(comp, 0, compLen, buff.array(), buff.arrayOffset(), l);
}
if (node) {
childCount = len + 1;
children = new long[len + 1];
......@@ -775,6 +764,17 @@ public class Page {
}
totalCount = total;
}
// only the keys and values are compressed; the header and the child
// pointers read above are always stored uncompressed
boolean compressed = (type & DataUtils.PAGE_COMPRESSED) != 0;
if (compressed) {
Compressor compressor = map.getStore().getCompressor();
int lenAdd = DataUtils.readVarInt(buff);
int compLen = pageLength + start - buff.position();
byte[] comp = DataUtils.newBytes(compLen);
buff.get(comp);
int l = compLen + lenAdd;
buff = ByteBuffer.allocate(l);
compressor.expand(comp, 0, compLen, buff.array(), buff.arrayOffset(), l);
}
map.getKeyType().read(buff, keys, len, true);
if (!node) {
values = new Object[len];
......@@ -789,8 +789,9 @@ public class Page {
*
* @param chunk the chunk
* @param buff the target buffer
* @return the position of the buffer just after the type
*/
private void write(Chunk chunk, WriteBuffer buff) {
private int write(Chunk chunk, WriteBuffer buff) {
int start = buff.position();
int len = keyCount;
int type = children != null ? DataUtils.PAGE_TYPE_NODE
......@@ -798,17 +799,16 @@ public class Page {
buff.putInt(0).
putShort((byte) 0).
putVarInt(map.getId()).
putVarInt(len).
put((byte) type);
int compressStart = buff.position();
putVarInt(len);
// remember where the type byte is written, so that it can be patched
// if the keys and values end up being stored compressed
int typePos = buff.position();
buff.put((byte) type);
if (type == DataUtils.PAGE_TYPE_NODE) {
for (int i = 0; i <= len; i++) {
buff.putLong(children[i]);
}
writeChildren(buff);
for (int i = 0; i <= len; i++) {
buff.putVarLong(counts[i]);
}
}
int compressStart = buff.position();
map.getKeyType().write(buff, keys, len, true);
if (type == DataUtils.PAGE_TYPE_LEAF) {
map.getValueType().write(buff, values, len, false);
......@@ -818,13 +818,13 @@ public class Page {
Compressor compressor = map.getStore().getCompressor();
int expLen = buff.position() - compressStart;
byte[] exp = new byte[expLen];
buff.position(compressStart).
get(exp);
buff.position(compressStart).get(exp);
byte[] comp = new byte[exp.length * 2];
int compLen = compressor.compress(exp, exp.length, comp, 0);
if (compLen + DataUtils.getVarIntLen(compLen - expLen) < expLen) {
buff.position(compressStart - 1).
put((byte) (type + DataUtils.PAGE_COMPRESSED)).
buff.position(typePos).
put((byte) (type + DataUtils.PAGE_COMPRESSED));
buff.position(compressStart).
putVarInt(expLen - compLen).
put(comp, 0, compLen);
}
......@@ -847,6 +847,14 @@ public class Page {
chunk.maxLenLive += max;
chunk.pageCount++;
chunk.pageCountLive++;
// the children array starts just after the type byte; the caller
// patches it once the final child positions are known
return typePos + 1;
}
private void writeChildren(WriteBuffer buff) {
int len = keyCount;
for (int i = 0; i <= len; i++) {
buff.putLong(children[i]);
}
}
/**
......@@ -861,6 +869,7 @@ public class Page {
// already stored before
return;
}
// write this page before its children (root page first) and remember
// where the child pointers need to be patched
int patch = write(chunk, buff);
if (!isLeaf()) {
int len = childCount;
for (int i = 0; i < len; i++) {
......@@ -870,8 +879,11 @@ public class Page {
children[i] = p.getPos();
}
}
// the children are stored now: go back and patch the pointers
int old = buff.position();
buff.position(patch);
writeChildren(buff);
buff.position(old);
}
write(chunk, buff);
}
/**
......
......@@ -77,6 +77,11 @@ public class MVTableEngine implements TableEngine {
}
builder.encryptionKey(password);
}
if (db.getSettings().compressData) {
builder.compressData();
// use a larger page split size to improve the compression ratio
builder.pageSplitSize(64 * 1024);
}
builder.backgroundExceptionHandler(new UncaughtExceptionHandler() {
@Override
......
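When the MVStore is used directly (without the SQL layer), the same options are available on the builder, as the new testCompressed below also shows; the file name here is illustrative:

import org.h2.mvstore.MVStore;

class BuilderCompressSketch {
    public static void main(String[] args) {
        // a larger page split size gives the compressor more redundant
        // data per page, matching the choice in MVTableEngine above
        MVStore s = new MVStore.Builder()
                .fileName("/tmp/data.h3")
                .compressData()
                .pageSplitSize(64 * 1024)
                .open();
        s.close();
    }
}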
......@@ -1471,6 +1471,8 @@ public class ObjectDataType implements DataType {
* The type for serialized objects.
*/
class SerializedObjectType extends AutoDetectDataType {
// running estimate of the serialized size, adjusted on each write
int averageSize = 10000;
SerializedObjectType(ObjectDataType base) {
super(base, TYPE_SERIALIZED_OBJECT);
......@@ -1508,7 +1510,7 @@ public class ObjectDataType implements DataType {
public int getMemory(Object obj) {
DataType t = getType(obj);
if (t == this) {
return 1000;
return averageSize;
}
return t.getMemory(obj);
}
......@@ -1521,6 +1523,10 @@ public class ObjectDataType implements DataType {
return;
}
byte[] data = serialize(obj);
int size = data.length;
// adjust the average size
// using an exponential moving average
averageSize = (size + 15 * averageSize) / 16;
buff.put((byte) TYPE_SERIALIZED_OBJECT).putVarInt(data.length)
.put(data);
}
......
......@@ -10,6 +10,7 @@ import java.sql.PreparedStatement;
import java.sql.Statement;
import java.util.Random;
import org.h2.store.FileLister;
import org.h2.store.fs.FileUtils;
import org.h2.test.TestBase;
......@@ -42,6 +43,8 @@ public class TestBenchmark extends TestBase {
}
private void test(boolean mvStore) throws Exception {
// testInsertSelect(mvStore);
// testBinary(mvStore);
testCreateIndex(mvStore);
}
......@@ -51,7 +54,7 @@ public class TestBenchmark extends TestBase {
Statement stat;
String url = "mvstore";
if (mvStore) {
url += ";MV_STORE=TRUE;MV_STORE=TRUE";
url += ";MV_STORE=TRUE"; // ;COMPRESS=TRUE";
}
url = getURL(url, true);
......@@ -84,7 +87,11 @@ public class TestBenchmark extends TestBase {
System.out.println((System.currentTimeMillis() - start) + " "
+ (mvStore ? "mvstore" : "default"));
conn.createStatement().execute("shutdown compact");
conn.close();
for (String f : FileLister.getDatabaseFiles(getBaseDir(), "mvstore", true)) {
System.out.println(" " + f + " " + FileUtils.size(f));
}
}
private void testBinary(boolean mvStore) throws Exception {
......@@ -93,7 +100,7 @@ public class TestBenchmark extends TestBase {
Statement stat;
String url = "mvstore";
if (mvStore) {
url += ";MV_STORE=TRUE;MV_STORE=TRUE";
url += ";MV_STORE=TRUE";
}
url = getURL(url, true);
......@@ -143,7 +150,7 @@ public class TestBenchmark extends TestBase {
Statement stat;
String url = "mvstore";
if (mvStore) {
url += ";MV_STORE=TRUE;LOG=0";
url += ";MV_STORE=TRUE;LOG=0;COMPRESS=TRUE";
}
url = getURL(url, true);
......@@ -176,7 +183,11 @@ public class TestBenchmark extends TestBase {
System.out.println((System.currentTimeMillis() - start) + " "
+ (mvStore ? "mvstore" : "default"));
conn.createStatement().execute("shutdown compact");
conn.close();
for (String f : FileLister.getDatabaseFiles(getBaseDir(), "mvstore", true)) {
System.out.println(" " + f + " " + FileUtils.size(f));
}
}
......
......@@ -49,6 +49,7 @@ public class TestMVStore extends TestBase {
public void test() throws Exception {
FileUtils.deleteRecursive(getBaseDir(), true);
FileUtils.createDirectories(getBaseDir());
testCompressed();
testFileFormatExample();
testMaxChunkLength();
testCacheInfo();
......@@ -102,6 +103,23 @@ public class TestMVStore extends TestBase {
// longer running tests
testLargerThan2G();
}
private void testCompressed() {
String fileName = getBaseDir() + "/testCompressed.h3";
MVStore s = new MVStore.Builder().fileName(fileName).compressData().open();
MVMap<String, String> map = s.openMap("data");
String data = "xxxxxxxxxx";
for (int i = 0; i < 400; i++) {
map.put(data + i, data);
}
s.close();
s = new MVStore.Builder().fileName(fileName).open();
map = s.openMap("data");
for (int i = 0; i < 400; i++) {
assertEquals(data, map.get(data + i));
}
s.close();
}
private void testFileFormatExample() {
String fileName = getBaseDir() + "/testFileFormatExample.h3";
......@@ -116,7 +134,7 @@ public class TestMVStore extends TestBase {
}
s.commit();
s.close();
// MVStoreTool.dump(fileName);
// ;MVStoreTool.dump(fileName);
}
private void testMaxChunkLength() {
......