Commit 8537973a authored by Thomas Mueller

MVStore: support concurrent transactions (PostgreSQL read-committed)

Parent 82884531
...@@ -24,10 +24,21 @@ MVStore ...@@ -24,10 +24,21 @@ MVStore
Store Builder</a><br /> Store Builder</a><br />
<a href="#r_tree"> <a href="#r_tree">
R-Tree</a><br /> R-Tree</a><br />
<a href="#features"> <a href="#features">
Features</a><br /> Features</a><br />
- <a href="#maps">Maps</a><br />
- <a href="#versions">Versions</a><br />
- <a href="#transactions">Transactions</a><br />
- <a href="#inMemory">In-Memory Performance and Usage</a><br />
- <a href="#dataTypes">Pluggable Data Types</a><br />
- <a href="#blob">BLOB Support</a><br />
- <a href="#pluggableMap">R-Tree and Pluggable Map Implementations</a><br />
- <a href="#caching">Concurrent Operations and Caching</a><br />
- <a href="#logStructured">Log Structured Storage</a><br />
- <a href="#fileSystem">File System Abstraction, File Locking and Online Backup</a><br />
- <a href="#encryption">Encrypted Files</a><br />
- <a href="#tools">Tools</a><br />
- <a href="#exceptionHandling">Exception Handling</a><br />
<a href="#differences"> <a href="#differences">
Similar Projects and Differences to Other Storage Engines</a><br /> Similar Projects and Differences to Other Storage Engines</a><br />
<a href="#current_state"> <a href="#current_state">
...@@ -45,8 +56,7 @@ But it can be also directly within an application, without using JDBC or SQL. ...@@ -45,8 +56,7 @@ But it can be also directly within an application, without using JDBC or SQL.
</li><li>Both file-based persistence and in-memory operation are supported. </li><li>Both file-based persistence and in-memory operation are supported.
</li><li>It is intended to be fast, simple to use, and small. </li><li>It is intended to be fast, simple to use, and small.
</li><li>Old versions of the data can be read concurrently with all other operations. </li><li>Old versions of the data can be read concurrently with all other operations.
</li><li>Transaction are supported (currently only one transaction at a time). </li><li>Transaction are supported.
</li><li>Transactions (even if they are persisted) can be rolled back.
</li><li>The tool is very modular. It supports pluggable data types / serialization, </li><li>The tool is very modular. It supports pluggable data types / serialization,
pluggable map implementations (B-tree, R-tree, concurrent B-tree currently), BLOB storage, pluggable map implementations (B-tree, R-tree, concurrent B-tree currently), BLOB storage,
and a file system abstraction to support encrypted files and zip files. and a file system abstraction to support encrypted files and zip files.
...@@ -166,7 +176,7 @@ The minimum number of dimensions is 1, the maximum is 255. ...@@ -166,7 +176,7 @@ The minimum number of dimensions is 1, the maximum is 255.
<h2 id="features">Features</h2> <h2 id="features">Features</h2>
<h3>Maps</h3> <h3 id="maps">Maps</h3>
<p> <p>
Each store supports a set of named maps. Each store supports a set of named maps.
A map is sorted by key, and supports the common lookup operations, A map is sorted by key, and supports the common lookup operations,
...@@ -186,13 +196,13 @@ of the index, and the value of the map is the primary key of the table (for non- ...@@ -186,13 +196,13 @@ of the index, and the value of the map is the primary key of the table (for non-
the key of the map must also contain the primary key). the key of the map must also contain the primary key).
</p> </p>
<h3>Versions / Transactions</h3> <h3 id="versions">Versions</h3>
<p> <p>
Multiple versions are supported. Multiple versions are supported.
A version is a snapshot of all the data of all maps at a given point in time. A version is a snapshot of all the data of all maps at a given point in time.
A transaction is a number of actions between two versions. A transaction is a number of actions between two versions.
</p><p> </p><p>
Versions / transactions are not immediately persisted; instead, only the version counter is incremented. Versions are not immediately persisted; instead, only the version counter is incremented.
If there is a change after switching to a new version, a snapshot of the old version is kept in memory, If there is a change after switching to a new version, a snapshot of the old version is kept in memory,
so that it can still be read. so that it can still be read.
</p><p> </p><p>
...@@ -203,7 +213,23 @@ This behavior is also called COW (copy on write). ...@@ -203,7 +213,23 @@ This behavior is also called COW (copy on write).
Rollback is supported (rollback to any old in-memory version or an old persisted version). Rollback is supported (rollback to any old in-memory version or an old persisted version).
</p> </p>
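The copy-on-write behavior described in the Versions section can be illustrated with a minimal sketch. It is not MVStore code: the class `CopyOnWriteSketch`, its fixed-size integer pages, and all method names are hypothetical, and a flat page table stands in for the B-tree. It only shows that creating a version copies nothing, that a page is copied on the first write after a snapshot, and that rollback reverts to a stored page table.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical illustration of page-level copy-on-write; not the MVStore API. */
class CopyOnWriteSketch {
    static final int PAGE_SIZE = 4;

    // One frozen page table per version; page arrays are shared between versions.
    private final List<int[][]> versions = new ArrayList<>();
    private int[][] pages;   // page table of the current (writable) version
    private boolean[] owned; // true if the page was already copied for the current version

    CopyOnWriteSketch(int pageCount) {
        pages = new int[pageCount][PAGE_SIZE];
        owned = new boolean[pageCount];
        Arrays.fill(owned, true); // nothing is shared before the first snapshot
    }

    /** Freezes the current state and returns its version number. No page is copied here. */
    public int incrementVersion() {
        versions.add(pages);
        pages = pages.clone(); // shallow copy: the new table points to the same pages
        owned = new boolean[pages.length];
        return versions.size() - 1;
    }

    /** Writes a value; the affected page is copied on the first write after a snapshot. */
    public void set(int index, int value) {
        int p = index / PAGE_SIZE;
        if (!owned[p]) {
            pages[p] = pages[p].clone();
            owned[p] = true;
        }
        pages[p][index % PAGE_SIZE] = value;
    }

    public int get(int index) {
        return pages[index / PAGE_SIZE][index % PAGE_SIZE];
    }

    /** Reads from an old version while the current version keeps changing. */
    public int getAt(int version, int index) {
        return versions.get(version)[index / PAGE_SIZE][index % PAGE_SIZE];
    }

    /** Reverts the current state to the given version and drops newer versions. */
    public void rollbackTo(int version) {
        pages = versions.get(version).clone();
        owned = new boolean[pages.length];
        versions.subList(version + 1, versions.size()).clear();
    }
}
```

In this sketch, the cost of a snapshot is one shallow copy of the page table; the real store works on tree pages, so the details differ.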
<h3>In-Memory Performance and Usage</h3> <h3 id="transactions">Transactions</h3>
<p>
The multi-version support is the basis for the transaction support.
In the simple case, when only one transaction is open at a time,
rolling back the transaction only requires to revert to an old version.
</p><p>
To support multiple concurrent open transactions, a transaction utility is included,
the <code>TransactionStore</code>.
This utility stores the changed entries in a separate map, similar to a transaction log
(except that only the key of a changed row is stored,
and the entries of a transaction are removed when the transaction is committed).
The storage overhead of this utility is very small compared to the overhead of a regular transaction log.
The tool supports PostgreSQL style "read committed" transaction isolation.
There is no limit on the size of a transaction (the log is not kept in memory).
</p>
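The idea of keeping uncommitted changes traceable through a separate undo log can be sketched as follows. This is an illustration under stated assumptions, not the `TransactionStore` API: the class `UndoLogSketch` and all its methods are hypothetical, the log is held in memory, and for simplicity it also records the previous value (the text above says the real utility stores only the key). Failing a write on a key changed by another open transaction is likewise an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of an undo log with read-committed reads; not the H2 API. */
class UndoLogSketch {

    /** A value plus the id of the open transaction that wrote it (0 means committed). */
    static final class Versioned {
        final int txId;
        final String value;

        Versioned(int txId, String value) {
            this.txId = txId;
            this.value = value;
        }
    }

    private final TreeMap<String, Versioned> data = new TreeMap<>();
    // Undo log: transaction id -> (key -> committed value before the first change, or null)
    private final Map<Integer, Map<String, String>> undoLog = new HashMap<>();
    private int nextTxId = 1;

    public int begin() {
        int id = nextTxId++;
        undoLog.put(id, new HashMap<>());
        return id;
    }

    public void put(int txId, String key, String value) {
        Versioned current = data.get(key);
        if (current != null && current.txId != 0 && current.txId != txId) {
            throw new IllegalStateException("Key changed by open transaction " + current.txId);
        }
        Map<String, String> log = undoLog.get(txId);
        if (!log.containsKey(key)) {
            // Remember the committed value only once per key and transaction.
            log.put(key, current == null ? null : current.value);
        }
        data.put(key, new Versioned(txId, value));
    }

    /** Read committed: own changes are visible, uncommitted changes of others are not. */
    public String get(int txId, String key) {
        Versioned current = data.get(key);
        if (current == null) {
            return null;
        }
        if (current.txId == 0 || current.txId == txId) {
            return current.value;
        }
        return undoLog.get(current.txId).get(key);
    }

    /** Marks the changed entries as committed and removes the transaction's log entries. */
    public void commit(int txId) {
        for (String key : undoLog.remove(txId).keySet()) {
            data.put(key, new Versioned(0, data.get(key).value));
        }
    }

    /** Restores the logged values and removes the transaction's log entries. */
    public void rollback(int txId) {
        for (Map.Entry<String, String> e : undoLog.remove(txId).entrySet()) {
            if (e.getValue() == null) {
                data.remove(e.getKey());
            } else {
                data.put(e.getKey(), new Versioned(0, e.getValue()));
            }
        }
    }
}
```

Because the log entries of a transaction are dropped on commit and on rollback, the log in this sketch only grows with the number of keys changed by transactions that are still open.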
<h3 id="inMemory">In-Memory Performance and Usage</h3>
<p> <p>
Performance of in-memory operations is comparable with <code>java.util.TreeMap</code> Performance of in-memory operations is comparable with <code>java.util.TreeMap</code>
(many operations are actually faster), but usually slower than <code>java.util.HashMap</code>. (many operations are actually faster), but usually slower than <code>java.util.HashMap</code>.
...@@ -220,7 +246,7 @@ If a file name is specified, all operations occur in memory (with the same ...@@ -220,7 +246,7 @@ If a file name is specified, all operations occur in memory (with the same
performance characteristics) until data is persisted. performance characteristics) until data is persisted.
</p> </p>
<h3>Pluggable Data Types</h3> <h3 id="dataTypes">Pluggable Data Types</h3>
<p> <p>
Serialization is pluggable. The default serialization currently supports many common data types, Serialization is pluggable. The default serialization currently supports many common data types,
and uses Java serialization for other objects. The following classes are currently directly supported: and uses Java serialization for other objects. The following classes are currently directly supported:
...@@ -236,7 +262,7 @@ Also, there is no inherent limit to the number of maps and chunks. ...@@ -236,7 +262,7 @@ Also, there is no inherent limit to the number of maps and chunks.
Due to using a log structured storage, there is no special case handling for large keys or pages. Due to using a log structured storage, there is no special case handling for large keys or pages.
</p> </p>
<h3>BLOB Support</h3> <h3 id="blob">BLOB Support</h3>
<p> <p>
There is a mechanism that stores large binary objects by splitting them into smaller blocks. There is a mechanism that stores large binary objects by splitting them into smaller blocks.
This allows to store objects that don't fit in memory. This allows to store objects that don't fit in memory.
...@@ -244,7 +270,7 @@ Streaming as well as random access reads on such objects are supported. ...@@ -244,7 +270,7 @@ Streaming as well as random access reads on such objects are supported.
This tool is written on top of the store (only using the map interface). This tool is written on top of the store (only using the map interface).
</p> </p>
<h3>R-Tree and Pluggable Map Implementations</h3> <h3 id="pluggableMap">R-Tree and Pluggable Map Implementations</h3>
<p> <p>
The map implementation is pluggable. The map implementation is pluggable.
In addition to the default <code>MVMap</code> (multi-version map), In addition to the default <code>MVMap</code> (multi-version map),
...@@ -252,7 +278,7 @@ there is a multi-version R-tree map implementation ...@@ -252,7 +278,7 @@ there is a multi-version R-tree map implementation
for spatial operations (contain and intersection; nearest neighbor is not yet implemented). for spatial operations (contain and intersection; nearest neighbor is not yet implemented).
</p> </p>
<h3>Concurrent Operations and Caching</h3> <h3 id="caching">Concurrent Operations and Caching</h3>
<p> <p>
The default map implementation supports concurrent reads on old versions of the data. The default map implementation supports concurrent reads on old versions of the data.
All such read operations can occur in parallel. Concurrent reads from the page cache, All such read operations can occur in parallel. Concurrent reads from the page cache,
...@@ -281,7 +307,7 @@ the map could be split into multiple maps in different stores ('sharding'). ...@@ -281,7 +307,7 @@ the map could be split into multiple maps in different stores ('sharding').
The plan is to add such a mechanism later when needed. The plan is to add such a mechanism later when needed.
</p> </p>
<h3>Log Structured Storage</h3> <h3 id="logStructured">Log Structured Storage</h3>
<p> <p>
Changes are buffered in memory, and once enough changes have accumulated, Changes are buffered in memory, and once enough changes have accumulated,
they are written in one continuous disk write operation. they are written in one continuous disk write operation.
...@@ -327,7 +353,7 @@ But temporarily, disk space usage might actually be a bit higher than for a regu ...@@ -327,7 +353,7 @@ But temporarily, disk space usage might actually be a bit higher than for a regu
as disk space is not immediately re-used (there are no in-place updates). as disk space is not immediately re-used (there are no in-place updates).
</p> </p>
<h3>File System Abstraction, File Locking and Online Backup</h3> <h3 id="fileSystem">File System Abstraction, File Locking and Online Backup</h3>
<p> <p>
The file system is pluggable (the same file system abstraction is used as H2 uses). The file system is pluggable (the same file system abstraction is used as H2 uses).
The file can be encrypted using an encrypting file system. The file can be encrypted using an encrypting file system.
...@@ -347,7 +373,7 @@ new data is always appended at the end of the file. ...@@ -347,7 +373,7 @@ new data is always appended at the end of the file.
Then, the file can be copied (the file handle is available to the application). Then, the file can be copied (the file handle is available to the application).
</p> </p>
<h3>Encrypted Files</h3> <h3 id="encryption">Encrypted Files</h3>
<p> <p>
File encryption ensures the data can only be read with the correct password. File encryption ensures the data can only be read with the correct password.
Data can be encrypted as follows: Data can be encrypted as follows:
...@@ -378,12 +404,12 @@ The following algorithms and settings are used: ...@@ -378,12 +404,12 @@ The following algorithms and settings are used:
Only little more than one AES-128 round per block is needed. Only little more than one AES-128 round per block is needed.
</li></ul> </li></ul>
<h3>Tools</h3> <h3 id="tools">Tools</h3>
<p> <p>
There is a tool (<code>MVStoreTool</code>) to dump the contents of a file. There is a tool (<code>MVStoreTool</code>) to dump the contents of a file.
</p> </p>
<h3>Exception Handling</h3> <h3 id="exceptionHandling">Exception Handling</h3>
<p> <p>
This tool does not throw checked exceptions. This tool does not throw checked exceptions.
Instead, unchecked exceptions are thrown if needed. Instead, unchecked exceptions are thrown if needed.
......
...@@ -1208,87 +1208,90 @@ Limit on the complexity of SQL statements. Statements of the following form will ...@@ -1208,87 +1208,90 @@ Limit on the complexity of SQL statements. Statements of the following form will
There is no limit for the following entities, except the memory and storage capacity: maximum identifier length (table name, column name, and so on); maximum number of tables, columns, indexes, triggers, and other database objects; maximum statement length, number of parameters per statement, tables per statement, expressions in order by, group by, having, and so on; maximum rows per query; maximum columns per table, columns per index, indexes per table, lob columns per table, and so on; maximum row length, index row length, select row length; maximum length of a varchar column, decimal column, literal in a statement. There is no limit for the following entities, except the memory and storage capacity: maximum identifier length (table name, column name, and so on); maximum number of tables, columns, indexes, triggers, and other database objects; maximum statement length, number of parameters per statement, tables per statement, expressions in order by, group by, having, and so on; maximum rows per query; maximum columns per table, columns per index, indexes per table, lob columns per table, and so on; maximum row length, index row length, select row length; maximum length of a varchar column, decimal column, literal in a statement.
@advanced_1403_li @advanced_1403_li
Querying from the metadata tables is slow if there are many tables (thousands).
@advanced_1404_li
For limitations on data types, see the documentation of the respective Java data type or the data type documentation of this database. For limitations on data types, see the documentation of the respective Java data type or the data type documentation of this database.
@advanced_1404_h2 @advanced_1405_h2
Glossary and Links Glossary and Links
@advanced_1405_th @advanced_1406_th
Term Term
@advanced_1406_th @advanced_1407_th
Description Description
@advanced_1407_td @advanced_1408_td
AES-128 AES-128
@advanced_1408_td @advanced_1409_td
A block encryption algorithm. See also: <a href="http://en.wikipedia.org/wiki/Advanced_Encryption_Standard">Wikipedia: AES</a> A block encryption algorithm. See also: <a href="http://en.wikipedia.org/wiki/Advanced_Encryption_Standard">Wikipedia: AES</a>
@advanced_1409_td @advanced_1410_td
Birthday Paradox Birthday Paradox
@advanced_1410_td @advanced_1411_td
Describes the higher than expected probability that two persons in a room have the same birthday. Also valid for randomly generated UUIDs. See also: <a href="http://en.wikipedia.org/wiki/Birthday_paradox">Wikipedia: Birthday Paradox</a> Describes the higher than expected probability that two persons in a room have the same birthday. Also valid for randomly generated UUIDs. See also: <a href="http://en.wikipedia.org/wiki/Birthday_paradox">Wikipedia: Birthday Paradox</a>
@advanced_1411_td @advanced_1412_td
Digest Digest
@advanced_1412_td @advanced_1413_td
Protocol to protect a password (but not to protect data). See also: <a href="http://www.faqs.org/rfcs/rfc2617.html">RFC 2617: HTTP Digest Access Authentication</a> Protocol to protect a password (but not to protect data). See also: <a href="http://www.faqs.org/rfcs/rfc2617.html">RFC 2617: HTTP Digest Access Authentication</a>
@advanced_1413_td @advanced_1414_td
GCJ GCJ
@advanced_1414_td @advanced_1415_td
Compiler for Java. <a href="http://gcc.gnu.org/java">GNU Compiler for the Java</a> and <a href="http://www.dobysoft.com/products/nativej">NativeJ (commercial)</a> Compiler for Java. <a href="http://gcc.gnu.org/java">GNU Compiler for the Java</a> and <a href="http://www.dobysoft.com/products/nativej">NativeJ (commercial)</a>
@advanced_1415_td @advanced_1416_td
HTTPS HTTPS
@advanced_1416_td @advanced_1417_td
A protocol to provide security to HTTP connections. See also: <a href="http://www.ietf.org/rfc/rfc2818.txt">RFC 2818: HTTP Over TLS</a> A protocol to provide security to HTTP connections. See also: <a href="http://www.ietf.org/rfc/rfc2818.txt">RFC 2818: HTTP Over TLS</a>
@advanced_1417_td @advanced_1418_td
Modes of Operation Modes of Operation
@advanced_1418_a @advanced_1419_a
Wikipedia: Block cipher modes of operation Wikipedia: Block cipher modes of operation
@advanced_1419_td @advanced_1420_td
Salt Salt
@advanced_1420_td @advanced_1421_td
Random number to increase the security of passwords. See also: <a href="http://en.wikipedia.org/wiki/Key_derivation_function">Wikipedia: Key derivation function</a> Random number to increase the security of passwords. See also: <a href="http://en.wikipedia.org/wiki/Key_derivation_function">Wikipedia: Key derivation function</a>
@advanced_1421_td @advanced_1422_td
SHA-256 SHA-256
@advanced_1422_td @advanced_1423_td
A cryptographic one-way hash function. See also: <a href="http://en.wikipedia.org/wiki/SHA_family">Wikipedia: SHA hash functions</a> A cryptographic one-way hash function. See also: <a href="http://en.wikipedia.org/wiki/SHA_family">Wikipedia: SHA hash functions</a>
@advanced_1423_td @advanced_1424_td
SQL Injection SQL Injection
@advanced_1424_td @advanced_1425_td
A security vulnerability where an application embeds SQL statements or expressions in user input. See also: <a href="http://en.wikipedia.org/wiki/SQL_injection">Wikipedia: SQL Injection</a> A security vulnerability where an application embeds SQL statements or expressions in user input. See also: <a href="http://en.wikipedia.org/wiki/SQL_injection">Wikipedia: SQL Injection</a>
@advanced_1425_td @advanced_1426_td
Watermark Attack Watermark Attack
@advanced_1426_td @advanced_1427_td
Security problem of certain encryption programs where the existence of certain data can be proven without decrypting. For more information, search in the internet for 'watermark attack cryptoloop' Security problem of certain encryption programs where the existence of certain data can be proven without decrypting. For more information, search in the internet for 'watermark attack cryptoloop'
@advanced_1427_td @advanced_1428_td
SSL/TLS SSL/TLS
@advanced_1428_td @advanced_1429_td
Secure Sockets Layer / Transport Layer Security. See also: <a href="http://java.sun.com/products/jsse/">Java Secure Socket Extension (JSSE)</a> Secure Sockets Layer / Transport Layer Security. See also: <a href="http://java.sun.com/products/jsse/">Java Secure Socket Extension (JSSE)</a>
@advanced_1429_td @advanced_1430_td
XTEA XTEA
@advanced_1430_td @advanced_1431_td
A block encryption algorithm. See also: <a href="http://en.wikipedia.org/wiki/XTEA">Wikipedia: XTEA</a> A block encryption algorithm. See also: <a href="http://en.wikipedia.org/wiki/XTEA">Wikipedia: XTEA</a>
@build_1000_h1 @build_1000_h1
...@@ -3848,7 +3851,7 @@ Ignore Unknown Settings ...@@ -3848,7 +3851,7 @@ Ignore Unknown Settings
Changing Other Settings when Opening a Connection Changing Other Settings when Opening a Connection
@features_1389_p @features_1389_p
In addition to the settings already described, other database settings can be passed in the database URL. Adding <code>;setting=value</code> at the end of a database URL is the same as executing the statement <code>SET setting value</code> just after connecting. For a list of supported settings, see <a href="grammar.html">SQL Grammar</a>. In addition to the settings already described, other database settings can be passed in the database URL. Adding <code>;setting=value</code> at the end of a database URL is the same as executing the statement <code>SET setting value</code> just after connecting. For a list of supported settings, see <a href="grammar.html">SQL Grammar</a> or the <a href="../javadoc/org/h2/constant/DbSettings.html">DbSettings</a> javadoc.
@features_1390_h2 @features_1390_h2
Custom File Access Mode Custom File Access Mode
...@@ -6799,301 +6802,346 @@ MVStore ...@@ -6799,301 +6802,346 @@ MVStore
@mvstore_1005_a @mvstore_1005_a
Features Features
@mvstore_1006_a @mvstore_1006_div
- <a href="#maps">Maps</a>
@mvstore_1007_div
- <a href="#versions">Versions</a>
@mvstore_1008_div
- <a href="#transactions">Transactions</a>
@mvstore_1009_div
- <a href="#inMemory">In-Memory Performance and Usage</a>
@mvstore_1010_div
- <a href="#dataTypes">Pluggable Data Types</a>
@mvstore_1011_div
- <a href="#blob">BLOB Support</a>
@mvstore_1012_div
- <a href="#pluggableMap">R-Tree and Pluggable Map Implementations</a>
@mvstore_1013_div
- <a href="#caching">Concurrent Operations and Caching</a>
@mvstore_1014_div
- <a href="#logStructured">Log Structured Storage</a>
@mvstore_1015_div
- <a href="#fileSystem">File System Abstraction, File Locking and Online Backup</a>
@mvstore_1016_div
- <a href="#encryption">Encrypted Files</a>
@mvstore_1017_div
- <a href="#tools">Tools</a>
@mvstore_1018_div
- <a href="#exceptionHandling">Exception Handling</a>
@mvstore_1019_a
Similar Projects and Differences to Other Storage Engines Similar Projects and Differences to Other Storage Engines
@mvstore_1007_a @mvstore_1020_a
Current State Current State
@mvstore_1008_a @mvstore_1021_a
Requirements Requirements
@mvstore_1009_h2 @mvstore_1022_h2
Overview Overview
@mvstore_1010_p @mvstore_1023_p
The MVStore is work in progress, and is planned to be the next storage subsystem of H2. But it can be also directly within an application, without using JDBC or SQL. The MVStore is work in progress, and is planned to be the next storage subsystem of H2. But it can be also directly within an application, without using JDBC or SQL.
@mvstore_1011_li @mvstore_1024_li
MVStore stands for "multi-version store". MVStore stands for "multi-version store".
@mvstore_1012_li @mvstore_1025_li
Each store contains a number of maps (using the <code>java.util.Map</code> interface). Each store contains a number of maps (using the <code>java.util.Map</code> interface).
@mvstore_1013_li @mvstore_1026_li
Both file-based persistence and in-memory operation are supported. Both file-based persistence and in-memory operation are supported.
@mvstore_1014_li @mvstore_1027_li
It is intended to be fast, simple to use, and small. It is intended to be fast, simple to use, and small.
@mvstore_1015_li @mvstore_1028_li
Old versions of the data can be read concurrently with all other operations. Old versions of the data can be read concurrently with all other operations.
@mvstore_1016_li @mvstore_1029_li
Transaction are supported (currently only one transaction at a time). Transaction are supported.
@mvstore_1017_li
Transactions (even if they are persisted) can be rolled back.
@mvstore_1018_li @mvstore_1030_li
The tool is very modular. It supports pluggable data types / serialization, pluggable map implementations (B-tree, R-tree, concurrent B-tree currently), BLOB storage, and a file system abstraction to support encrypted files and zip files. The tool is very modular. It supports pluggable data types / serialization, pluggable map implementations (B-tree, R-tree, concurrent B-tree currently), BLOB storage, and a file system abstraction to support encrypted files and zip files.
@mvstore_1019_h2 @mvstore_1031_h2
Example Code Example Code
@mvstore_1020_p @mvstore_1032_p
The following sample code show how to create a store, open a map, add some data, and access the current and an old version: The following sample code show how to create a store, open a map, add some data, and access the current and an old version:
@mvstore_1021_h2 @mvstore_1033_h2
Store Builder Store Builder
@mvstore_1022_p @mvstore_1034_p
The <code>MVStore.Builder</code> provides a fluid interface to build a store if more complex configuration options are used. The following code contains all supported configuration options: The <code>MVStore.Builder</code> provides a fluid interface to build a store if more complex configuration options are used. The following code contains all supported configuration options:
@mvstore_1023_li @mvstore_1035_li
cacheSizeMB: the cache size in MB. cacheSizeMB: the cache size in MB.
@mvstore_1024_li @mvstore_1036_li
compressData: compress the data when storing. compressData: compress the data when storing.
@mvstore_1025_li @mvstore_1037_li
encryptionKey: the encryption key for file encryption. encryptionKey: the encryption key for file encryption.
@mvstore_1026_li @mvstore_1038_li
fileName: the name of the file, for file based stores. fileName: the name of the file, for file based stores.
@mvstore_1027_li @mvstore_1039_li
readOnly: open the file in read-only mode. readOnly: open the file in read-only mode.
@mvstore_1028_li @mvstore_1040_li
writeBufferSize: the size of the write buffer in MB. writeBufferSize: the size of the write buffer in MB.
@mvstore_1029_li @mvstore_1041_li
writeDelay: the maximum delay until committed changes are stored (unless stored explicitly). writeDelay: the maximum delay until committed changes are stored (unless stored explicitly).
@mvstore_1030_h2 @mvstore_1042_h2
R-Tree R-Tree
@mvstore_1031_p @mvstore_1043_p
The <code>MVRTreeMap</code> is an R-tree implementation that supports fast spatial queries. It can be used as follows: The <code>MVRTreeMap</code> is an R-tree implementation that supports fast spatial queries. It can be used as follows:
@mvstore_1032_p @mvstore_1044_p
The default number of dimensions is 2. To use a different number of dimensions, call <code>new MVRTreeMap.Builder&lt;String&gt;().dimensions(3)</code>. The minimum number of dimensions is 1, the maximum is 255. The default number of dimensions is 2. To use a different number of dimensions, call <code>new MVRTreeMap.Builder&lt;String&gt;().dimensions(3)</code>. The minimum number of dimensions is 1, the maximum is 255.
@mvstore_1033_h2 @mvstore_1045_h2
Features Features
@mvstore_1034_h3 @mvstore_1046_h3
Maps Maps
@mvstore_1035_p @mvstore_1047_p
Each store supports a set of named maps. A map is sorted by key, and supports the common lookup operations, including access to the first and last key, iterate over some or all keys, and so on. Each store supports a set of named maps. A map is sorted by key, and supports the common lookup operations, including access to the first and last key, iterate over some or all keys, and so on.
@mvstore_1036_p @mvstore_1048_p
Also supported, and very uncommon for maps, is fast index lookup: the keys of the map can be accessed like a list (get the key at the given index, get the index of a certain key). That means getting the median of two keys is trivial, and range of keys can be counted very quickly. The iterator supports fast skipping. This is possible because internally, each map is organized in the form of a counted B+-tree. Also supported, and very uncommon for maps, is fast index lookup: the keys of the map can be accessed like a list (get the key at the given index, get the index of a certain key). That means getting the median of two keys is trivial, and range of keys can be counted very quickly. The iterator supports fast skipping. This is possible because internally, each map is organized in the form of a counted B+-tree.
@mvstore_1037_p @mvstore_1049_p
In database terms, a map can be used like a table, where the key of the map is the primary key of the table, and the value is the row. A map can also represent an index, where the key of the map is the key of the index, and the value of the map is the primary key of the table (for non-unique indexes, the key of the map must also contain the primary key). In database terms, a map can be used like a table, where the key of the map is the primary key of the table, and the value is the row. A map can also represent an index, where the key of the map is the key of the index, and the value of the map is the primary key of the table (for non-unique indexes, the key of the map must also contain the primary key).
@mvstore_1038_h3 @mvstore_1050_h3
Versions / Transactions Versions
@mvstore_1039_p @mvstore_1051_p
Multiple versions are supported. A version is a snapshot of all the data of all maps at a given point in time. A transaction is a number of actions between two versions. Multiple versions are supported. A version is a snapshot of all the data of all maps at a given point in time. A transaction is a number of actions between two versions.
@mvstore_1040_p @mvstore_1052_p
Versions / transactions are not immediately persisted; instead, only the version counter is incremented. If there is a change after switching to a new version, a snapshot of the old version is kept in memory, so that it can still be read. Versions are not immediately persisted; instead, only the version counter is incremented. If there is a change after switching to a new version, a snapshot of the old version is kept in memory, so that it can still be read.
@mvstore_1041_p @mvstore_1053_p
Old persisted versions are readable until the old data was explicitly overwritten. Creating a snapshot is fast: only the pages that are changed after a snapshot are copied. This behavior is also called COW (copy on write). Old persisted versions are readable until the old data was explicitly overwritten. Creating a snapshot is fast: only the pages that are changed after a snapshot are copied. This behavior is also called COW (copy on write).
@mvstore_1042_p @mvstore_1054_p
Rollback is supported (rollback to any old in-memory version or an old persisted version). Rollback is supported (rollback to any old in-memory version or an old persisted version).
@mvstore_1043_h3 @mvstore_1055_h3
Transactions
@mvstore_1056_p
The multi-version support is the basis for the transaction support. In the simple case, when only one transaction is open at a time, rolling back the transaction only requires to revert to an old version.
@mvstore_1057_p
To support multiple concurrent open transactions, a transaction utility is included, the <code>TransactionStore</code>. This utility stores the changed entries in a separate map, similar to a transaction log (except that only the key of a changed row is stored, and the entries of a transaction are removed when the transaction is committed). The storage overhead of this utility is very small compared to the overhead of a regular transaction log. The tool supports PostgreSQL style "read committed" transaction isolation. There is no limit on the size of a transaction (the log is not kept in memory).
@mvstore_1058_h3
In-Memory Performance and Usage In-Memory Performance and Usage
@mvstore_1044_p @mvstore_1059_p
Performance of in-memory operations is comparable with <code>java.util.TreeMap</code> (many operations are actually faster), but usually slower than <code>java.util.HashMap</code>. Performance of in-memory operations is comparable with <code>java.util.TreeMap</code> (many operations are actually faster), but usually slower than <code>java.util.HashMap</code>.
@mvstore_1045_p @mvstore_1060_p
The memory overhead for large maps is slightly better than for the regular map implementations, but there is a higher overhead per map. For maps with less than 25 entries, the regular map implementations use less memory on average. The memory overhead for large maps is slightly better than for the regular map implementations, but there is a higher overhead per map. For maps with less than 25 entries, the regular map implementations use less memory on average.
@mvstore_1046_p @mvstore_1061_p
If no file name is specified, the store operates purely in memory. Except for persisting data, all features are supported in this mode (multi-versioning, index lookup, R-tree and so on). If a file name is specified, all operations occur in memory (with the same performance characteristics) until data is persisted. If no file name is specified, the store operates purely in memory. Except for persisting data, all features are supported in this mode (multi-versioning, index lookup, R-tree and so on). If a file name is specified, all operations occur in memory (with the same performance characteristics) until data is persisted.
@mvstore_1047_h3 @mvstore_1062_h3
Pluggable Data Types Pluggable Data Types
@mvstore_1048_p @mvstore_1063_p
Serialization is pluggable. The default serialization currently supports many common data types, and uses Java serialization for other objects. The following classes are currently directly supported: <code>Boolean, Byte, Short, Character, Integer, Long, Float, Double, BigInteger, BigDecimal, String, UUID, Date</code> and arrays (both primitive arrays and object arrays). Serialization is pluggable. The default serialization currently supports many common data types, and uses Java serialization for other objects. The following classes are currently directly supported: <code>Boolean, Byte, Short, Character, Integer, Long, Float, Double, BigInteger, BigDecimal, String, UUID, Date</code> and arrays (both primitive arrays and object arrays).
@mvstore_1049_p @mvstore_1064_p
Parameterized data types are supported (for example one could build a string data type that limits the length for some reason). Parameterized data types are supported (for example one could build a string data type that limits the length for some reason).
@mvstore_1050_p @mvstore_1065_p
The storage engine itself does not have any length limits, so that keys, values, pages, and chunks can be very big (as big as fits in memory). Also, there is no inherent limit to the number of maps and chunks. Due to using a log structured storage, there is no special case handling for large keys or pages. The storage engine itself does not have any length limits, so that keys, values, pages, and chunks can be very big (as big as fits in memory). Also, there is no inherent limit to the number of maps and chunks. Due to using a log structured storage, there is no special case handling for large keys or pages.
@mvstore_1051_h3 @mvstore_1066_h3
BLOB Support BLOB Support
@mvstore_1052_p @mvstore_1067_p
There is a mechanism that stores large binary objects by splitting them into smaller blocks. This allows storing objects that don't fit in memory. Streaming as well as random access reads on such objects are supported. This tool is written on top of the store (only using the map interface). There is a mechanism that stores large binary objects by splitting them into smaller blocks. This allows storing objects that don't fit in memory. Streaming as well as random access reads on such objects are supported. This tool is written on top of the store (only using the map interface).
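The block-splitting idea can be sketched with a plain <code>java.util.TreeMap</code> standing in for a map opened from a store. This is an illustration only, not the MVStore BLOB implementation: the class name <code>BlockBlobSketch</code>, the tiny block size, and the key scheme (block index as key) are all assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.TreeMap;

// Hypothetical sketch: a large object stored as numbered blocks in a map.
final class BlockBlobSketch {
    // Assumed block size, kept tiny so the behavior is easy to follow.
    static final int BLOCK_SIZE = 4;

    // Block index -> block content; stands in for a map of the store.
    private final TreeMap<Long, byte[]> blocks = new TreeMap<>();
    private long length;

    // Consumes the stream one block at a time and stores each block
    // under its index.
    void put(InputStream in) {
        blocks.clear();
        length = 0;
        byte[] buffer = new byte[BLOCK_SIZE];
        long index = 0;
        try {
            int n;
            while ((n = in.readNBytes(buffer, 0, BLOCK_SIZE)) > 0) {
                blocks.put(index++, Arrays.copyOf(buffer, n));
                length += n;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    long length() {
        return length;
    }

    int blockCount() {
        return blocks.size();
    }

    // Random access read: only the block that contains the position
    // is looked up.
    int readByte(long pos) {
        if (pos < 0 || pos >= length) {
            throw new IndexOutOfBoundsException("pos=" + pos);
        }
        byte[] block = blocks.get(pos / BLOCK_SIZE);
        return block[(int) (pos % BLOCK_SIZE)] & 0xff;
    }
}
```

Because every block is an ordinary map entry, the same code path works whether the backing map is in memory or persisted; <code>InputStream.readNBytes</code> requires Java 9 or later.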
@mvstore_1053_h3 @mvstore_1068_h3
R-Tree and Pluggable Map Implementations R-Tree and Pluggable Map Implementations
@mvstore_1054_p @mvstore_1069_p
The map implementation is pluggable. In addition to the default <code>MVMap</code> (multi-version map), there is a multi-version R-tree map implementation for spatial operations (contain and intersection; nearest neighbor is not yet implemented). The map implementation is pluggable. In addition to the default <code>MVMap</code> (multi-version map), there is a multi-version R-tree map implementation for spatial operations (contain and intersection; nearest neighbor is not yet implemented).
@mvstore_1055_h3 @mvstore_1070_h3
Concurrent Operations and Caching Concurrent Operations and Caching
@mvstore_1056_p @mvstore_1071_p
The default map implementation supports concurrent reads on old versions of the data. All such read operations can occur in parallel. Concurrent reads from the page cache, as well as concurrent reads from the file system are supported. The default map implementation supports concurrent reads on old versions of the data. All such read operations can occur in parallel. Concurrent reads from the page cache, as well as concurrent reads from the file system are supported.
@mvstore_1057_p @mvstore_1072_p
Storing changes can occur concurrently to modifying the data, as it operates on a snapshot. Storing changes can occur concurrently to modifying the data, as it operates on a snapshot.
@mvstore_1058_p @mvstore_1073_p
Caching is done on the page level. The page cache is a concurrent LIRS cache, which should be resistant against scan operations. Caching is done on the page level. The page cache is a concurrent LIRS cache, which should be resistant against scan operations.
@mvstore_1059_p @mvstore_1074_p
The default map implementation does not support concurrent modification operations on a map (the same as <code>HashMap</code> and <code>TreeMap</code>). Similar to those classes, the map tries to detect concurrent modification. The default map implementation does not support concurrent modification operations on a map (the same as <code>HashMap</code> and <code>TreeMap</code>). Similar to those classes, the map tries to detect concurrent modification.
@mvstore_1060_p @mvstore_1075_p
With the <code>MVMapConcurrent</code> implementation, read operations even on the newest version can happen concurrently with all other operations, without risk of corruption. This comes with slightly reduced speed in single threaded mode, the same as with other <code>ConcurrentHashMap</code> implementations. Write operations first read the relevant area from disk to memory (this can happen concurrently), and only then modify the data. The in-memory part of write operations is synchronized. With the <code>MVMapConcurrent</code> implementation, read operations even on the newest version can happen concurrently with all other operations, without risk of corruption. This comes with slightly reduced speed in single threaded mode, the same as with other <code>ConcurrentHashMap</code> implementations. Write operations first read the relevant area from disk to memory (this can happen concurrently), and only then modify the data. The in-memory part of write operations is synchronized.
@mvstore_1061_p @mvstore_1076_p
For fully scalable concurrent write operations to a map (in-memory and to disk), the map could be split into multiple maps in different stores ('sharding'). The plan is to add such a mechanism later when needed. For fully scalable concurrent write operations to a map (in-memory and to disk), the map could be split into multiple maps in different stores ('sharding'). The plan is to add such a mechanism later when needed.
@mvstore_1062_h3 @mvstore_1077_h3
Log Structured Storage Log Structured Storage
@mvstore_1063_p @mvstore_1078_p
Changes are buffered in memory, and once enough changes have accumulated, they are written in one continuous disk write operation. (According to a test, write throughput of a common SSD gets higher the larger the block size, until a block size of 2 MB, and then does not further increase.) By default, committed changes are automatically written once every second in a background thread, even if only little data was changed. Changes can also be written explicitly by calling <code>store()</code>. To avoid out of memory, uncommitted changes are also written when needed, however they are rolled back when closing the store, or at the latest (when the store was not correctly closed) when opening the store. Changes are buffered in memory, and once enough changes have accumulated, they are written in one continuous disk write operation. (According to a test, write throughput of a common SSD gets higher the larger the block size, until a block size of 2 MB, and then does not further increase.) By default, committed changes are automatically written once every second in a background thread, even if only little data was changed. Changes can also be written explicitly by calling <code>store()</code>. To avoid out of memory, uncommitted changes are also written when needed, however they are rolled back when closing the store, or at the latest (when the store was not correctly closed) when opening the store.
@mvstore_1064_p @mvstore_1079_p
When storing, all changed pages are serialized, optionally compressed using the LZF algorithm, and written sequentially to a free area of the file. Each such change set is called a chunk. All parent pages of the changed B-trees are stored in this chunk as well, so that each chunk also contains the root of each changed map (which is the entry point to read this version of the data). There is no separate index: all data is stored as a list of pages. Per store, there is one additional map that contains the metadata (the list of maps, where the root page of each map is stored, and the list of chunks). When storing, all changed pages are serialized, optionally compressed using the LZF algorithm, and written sequentially to a free area of the file. Each such change set is called a chunk. All parent pages of the changed B-trees are stored in this chunk as well, so that each chunk also contains the root of each changed map (which is the entry point to read this version of the data). There is no separate index: all data is stored as a list of pages. Per store, there is one additional map that contains the metadata (the list of maps, where the root page of each map is stored, and the list of chunks).
@mvstore_1065_p @mvstore_1080_p
There are usually two write operations per chunk: one to store the chunk data (the pages), and one to update the file header (so it points to the latest chunk). If the chunk is appended at the end of the file, the file header is only written at the end of the chunk. There are usually two write operations per chunk: one to store the chunk data (the pages), and one to update the file header (so it points to the latest chunk). If the chunk is appended at the end of the file, the file header is only written at the end of the chunk.
@mvstore_1066_p @mvstore_1081_p
There is no transaction log, no undo log, and there are no in-place updates (however unused chunks are overwritten by default). There is no transaction log, no undo log, and there are no in-place updates (however unused chunks are overwritten by default).
@mvstore_1067_p @mvstore_1082_p
Old data is kept for at least 45 seconds (configurable), so that there are no explicit sync operations required to guarantee data consistency, but an application can also sync explicitly when needed. To reuse disk space, the chunks with the lowest amount of live data are compacted (the live data is simply stored again in the next chunk). To improve data locality and disk space usage, the plan is to automatically defragment and compact data. Old data is kept for at least 45 seconds (configurable), so that there are no explicit sync operations required to guarantee data consistency, but an application can also sync explicitly when needed. To reuse disk space, the chunks with the lowest amount of live data are compacted (the live data is simply stored again in the next chunk). To improve data locality and disk space usage, the plan is to automatically defragment and compact data.
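How a compaction candidate might be chosen can be sketched as follows. This is a hedged illustration, not the MVStore algorithm: the <code>Chunk</code> fields and the method <code>pickForCompaction</code> are hypothetical, and ranking by the ratio of live bytes to total bytes is an assumption about what "lowest amount of live data" means.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of picking the chunk with the least live data.
final class CompactionSketch {

    static final class Chunk {
        final int id;
        final long totalBytes;
        final long liveBytes;

        Chunk(int id, long totalBytes, long liveBytes) {
            this.id = id;
            this.totalBytes = totalBytes;
            this.liveBytes = liveBytes;
        }

        // Fraction of the chunk that is still referenced (assumed metric).
        double fillRate() {
            return totalBytes == 0 ? 0 : (double) liveBytes / totalBytes;
        }
    }

    // Returns the chunk with the lowest fill rate, or empty if there
    // are no chunks. Its live pages would then be written again as
    // part of the next chunk, after which the old chunk can be reused.
    static Optional<Chunk> pickForCompaction(List<Chunk> chunks) {
        return chunks.stream().min(Comparator.comparingDouble(Chunk::fillRate));
    }
}
```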
@mvstore_1068_p @mvstore_1083_p
Compared to traditional storage engines (that use a transaction log, undo log, and main storage area), the log structured storage is simpler, more flexible, and typically needs fewer disk operations per change, as data is only written once instead of two or three times, and because the B-tree pages are always full (they are stored next to each other) and can be easily compressed. But temporarily, disk space usage might actually be a bit higher than for a regular database, as disk space is not immediately re-used (there are no in-place updates). Compared to traditional storage engines (that use a transaction log, undo log, and main storage area), the log structured storage is simpler, more flexible, and typically needs fewer disk operations per change, as data is only written once instead of two or three times, and because the B-tree pages are always full (they are stored next to each other) and can be easily compressed. But temporarily, disk space usage might actually be a bit higher than for a regular database, as disk space is not immediately re-used (there are no in-place updates).
@mvstore_1069_h3 @mvstore_1084_h3
File System Abstraction, File Locking and Online Backup File System Abstraction, File Locking and Online Backup
@mvstore_1070_p @mvstore_1085_p
The file system is pluggable (the same file system abstraction is used as H2 uses). The file can be encrypted using an encrypting file system. Other file system implementations support reading from a compressed zip or jar file. The file system is pluggable (the same file system abstraction is used as H2 uses). The file can be encrypted using an encrypting file system. Other file system implementations support reading from a compressed zip or jar file.
@mvstore_1071_p @mvstore_1086_p
Each store may only be opened once within a JVM. When opening a store, the file is locked in exclusive mode, so that the file can only be changed from within one process. Files can be opened in read-only mode, in which case a shared lock is used. Each store may only be opened once within a JVM. When opening a store, the file is locked in exclusive mode, so that the file can only be changed from within one process. Files can be opened in read-only mode, in which case a shared lock is used.
@mvstore_1072_p @mvstore_1087_p
The persisted data can be backed up to a different file at any time, even during write operations (online backup). To do that, automatic disk space reuse needs to be first disabled, so that new data is always appended at the end of the file. Then, the file can be copied (the file handle is available to the application). The persisted data can be backed up to a different file at any time, even during write operations (online backup). To do that, automatic disk space reuse needs to be first disabled, so that new data is always appended at the end of the file. Then, the file can be copied (the file handle is available to the application).
@mvstore_1073_h3 @mvstore_1088_h3
Encrypted Files Encrypted Files
@mvstore_1074_p @mvstore_1089_p
File encryption ensures the data can only be read with the correct password. Data can be encrypted as follows: File encryption ensures the data can only be read with the correct password. Data can be encrypted as follows:
@mvstore_1075_p @mvstore_1090_p
The following algorithms and settings are used: The following algorithms and settings are used:
@mvstore_1076_li @mvstore_1091_li
The password char array is cleared after use, to reduce the risk that the password is stolen even if the attacker has access to the main memory. The password char array is cleared after use, to reduce the risk that the password is stolen even if the attacker has access to the main memory.
@mvstore_1077_li @mvstore_1092_li
The password is hashed according to the PBKDF2 standard, using the SHA-256 hash algorithm. The password is hashed according to the PBKDF2 standard, using the SHA-256 hash algorithm.
@mvstore_1078_li @mvstore_1093_li
The length of the salt is 64 bits, so that an attacker can not use a pre-calculated password hash table (rainbow table). It is generated using a cryptographically secure random number generator. The length of the salt is 64 bits, so that an attacker can not use a pre-calculated password hash table (rainbow table). It is generated using a cryptographically secure random number generator.
@mvstore_1079_li @mvstore_1094_li
To speed up opening an encrypted store on Android, the number of PBKDF2 iterations is 10. The higher the value, the better the protection against brute-force password cracking attacks, but the slower opening a file becomes. To speed up opening an encrypted store on Android, the number of PBKDF2 iterations is 10. The higher the value, the better the protection against brute-force password cracking attacks, but the slower opening a file becomes.
@mvstore_1080_li @mvstore_1095_li
The file itself is encrypted using the standardized disk encryption mode XTS-AES. Only little more than one AES-128 round per block is needed. The file itself is encrypted using the standardized disk encryption mode XTS-AES. Only little more than one AES-128 round per block is needed.
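The key derivation settings listed above (PBKDF2 with SHA-256, a 64-bit salt, 10 iterations, clearing the password after use) can be reproduced with the standard JDK classes. This is a sketch under stated assumptions, not the code H2 uses: the class name <code>KeyDerivationSketch</code> is hypothetical, the JDK algorithm <code>PBKDF2WithHmacSHA256</code> is assumed to correspond to the variant described, and the 128-bit output length is chosen here only to match AES-128. The XTS-AES file encryption itself is not shown.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Hypothetical sketch of the password hashing step; not the H2 code.
final class KeyDerivationSketch {
    static final int SALT_BYTES = 8;   // 64-bit salt, as stated in the text
    static final int ITERATIONS = 10;  // iteration count stated in the text
    static final int KEY_BITS = 128;   // assumed output size (AES-128 key)

    // Generates a salt with a cryptographically secure random generator.
    static byte[] newSalt() {
        byte[] salt = new byte[SALT_BYTES];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // Derives a key from the password and clears the password array
    // afterwards, whether or not the derivation succeeded.
    static byte[] deriveKey(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            SecretKeyFactory factory =
                    SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            return factory.generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Key derivation failed", e);
        } finally {
            spec.clearPassword();        // clears the copy held by the spec
            Arrays.fill(password, '\0'); // clears the caller's array
        }
    }
}
```

The same password and salt always yield the same key, so the salt has to be stored with the file; a caller that needs the password again must keep its own copy, because the array passed in is zeroed.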
@mvstore_1081_h3 @mvstore_1096_h3
Tools Tools
@mvstore_1082_p @mvstore_1097_p
There is a tool (<code>MVStoreTool</code>) to dump the contents of a file. There is a tool (<code>MVStoreTool</code>) to dump the contents of a file.
@mvstore_1083_h3 @mvstore_1098_h3
Exception Handling Exception Handling
@mvstore_1084_p @mvstore_1099_p
This tool does not throw checked exceptions. Instead, unchecked exceptions are thrown if needed. The error message always contains the version of the tool. The following exceptions can occur: This tool does not throw checked exceptions. Instead, unchecked exceptions are thrown if needed. The error message always contains the version of the tool. The following exceptions can occur:
@mvstore_1085_code @mvstore_1100_code
IllegalStateException IllegalStateException
@mvstore_1086_li @mvstore_1101_li
if a map was already closed or an IO exception occurred, for example if the file was locked, is already closed, could not be opened or closed, if reading or writing failed, if the file is corrupt, or if there is an internal error in the tool. if a map was already closed or an IO exception occurred, for example if the file was locked, is already closed, could not be opened or closed, if reading or writing failed, if the file is corrupt, or if there is an internal error in the tool.
@mvstore_1087_code @mvstore_1102_code
IllegalArgumentException IllegalArgumentException
@mvstore_1088_li @mvstore_1103_li
if a method was called with an illegal argument. if a method was called with an illegal argument.
@mvstore_1089_code @mvstore_1104_code
UnsupportedOperationException UnsupportedOperationException
@mvstore_1090_li @mvstore_1105_li
if a method was called that is not supported, for example trying to modify a read-only map or view. if a method was called that is not supported, for example trying to modify a read-only map or view.
@mvstore_1091_code @mvstore_1106_code
ConcurrentModificationException ConcurrentModificationException
@mvstore_1092_li @mvstore_1107_li
if the object is modified concurrently. if the object is modified concurrently.
@mvstore_1093_h2 @mvstore_1108_h2
Similar Projects and Differences to Other Storage Engines Similar Projects and Differences to Other Storage Engines
@mvstore_1094_p @mvstore_1109_p
Unlike similar storage engines like LevelDB and Kyoto Cabinet, the MVStore is written in Java and can easily be embedded in a Java and Android application. Unlike similar storage engines like LevelDB and Kyoto Cabinet, the MVStore is written in Java and can easily be embedded in a Java and Android application.
@mvstore_1095_p @mvstore_1110_p
The MVStore is somewhat similar to the Berkeley DB Java Edition because it is also written in Java, and is also a log structured storage, but the H2 license is more liberal. The MVStore is somewhat similar to the Berkeley DB Java Edition because it is also written in Java, and is also a log structured storage, but the H2 license is more liberal.
@mvstore_1096_p @mvstore_1111_p
Like SQLite, the MVStore keeps all data in one file. Unlike SQLite, the MVStore uses a log structured storage. The plan is to make the MVStore both easier to use and faster than SQLite. In a recent (very simple) test, the MVStore was about twice as fast as SQLite on Android. Like SQLite, the MVStore keeps all data in one file. Unlike SQLite, the MVStore uses a log structured storage. The plan is to make the MVStore both easier to use and faster than SQLite. In a recent (very simple) test, the MVStore was about twice as fast as SQLite on Android.
@mvstore_1097_p @mvstore_1112_p
The API of the MVStore is similar to MapDB (previously known as JDBM) from Jan Kotek, and some code is shared between MapDB and JDBM. However, unlike MapDB, the MVStore uses a log structured storage. The MVStore does not have a record size limit. The API of the MVStore is similar to MapDB (previously known as JDBM) from Jan Kotek, and some code is shared between MapDB and JDBM. However, unlike MapDB, the MVStore uses a log structured storage. The MVStore does not have a record size limit.
@mvstore_1098_h2 @mvstore_1113_h2
Current State Current State
@mvstore_1099_p @mvstore_1114_p
The code is still experimental at this stage. The API as well as the behavior may partially change. Features may be added and removed (even though the main features will stay). The code is still experimental at this stage. The API as well as the behavior may partially change. Features may be added and removed (even though the main features will stay).
@mvstore_1100_h2 @mvstore_1115_h2
Requirements Requirements
@mvstore_1101_p @mvstore_1116_p
The MVStore is included in the latest H2 jar file. The MVStore is included in the latest H2 jar file.
@mvstore_1102_p @mvstore_1117_p
There are no special requirements to use it. The MVStore should run on any JVM as well as on Android. There are no special requirements to use it. The MVStore should run on any JVM as well as on Android.
@mvstore_1103_p @mvstore_1118_p
To build just the MVStore (without the database engine), run: To build just the MVStore (without the database engine), run:
@mvstore_1104_p @mvstore_1119_p
This will create the file <code>bin/h2mvstore-1.3.170.jar</code> (about 130 KB). This will create the file <code>bin/h2mvstore-1.3.170.jar</code> (about 130 KB).
@performance_1000_h1 @performance_1000_h1
......
...@@ -1208,87 +1208,90 @@ SSL/TLS Connections ...@@ -1208,87 +1208,90 @@ SSL/TLS Connections
#There is no limit for the following entities, except the memory and storage capacity: maximum identifier length (table name, column name, and so on); maximum number of tables, columns, indexes, triggers, and other database objects; maximum statement length, number of parameters per statement, tables per statement, expressions in order by, group by, having, and so on; maximum rows per query; maximum columns per table, columns per index, indexes per table, lob columns per table, and so on; maximum row length, index row length, select row length; maximum length of a varchar column, decimal column, literal in a statement. #There is no limit for the following entities, except the memory and storage capacity: maximum identifier length (table name, column name, and so on); maximum number of tables, columns, indexes, triggers, and other database objects; maximum statement length, number of parameters per statement, tables per statement, expressions in order by, group by, having, and so on; maximum rows per query; maximum columns per table, columns per index, indexes per table, lob columns per table, and so on; maximum row length, index row length, select row length; maximum length of a varchar column, decimal column, literal in a statement.
@advanced_1403_li @advanced_1403_li
#Querying from the metadata tables is slow if there are many tables (thousands).
@advanced_1404_li
#For limitations on data types, see the documentation of the respective Java data type or the data type documentation of this database. #For limitations on data types, see the documentation of the respective Java data type or the data type documentation of this database.
@advanced_1404_h2 @advanced_1405_h2
Glossary and Links Glossary and Links
@advanced_1405_th @advanced_1406_th
Term Term
@advanced_1406_th @advanced_1407_th
Description Description
@advanced_1407_td @advanced_1408_td
AES-128 AES-128
@advanced_1408_td @advanced_1409_td
#A block encryption algorithm. See also: <a href="http://en.wikipedia.org/wiki/Advanced_Encryption_Standard">Wikipedia: AES</a> #A block encryption algorithm. See also: <a href="http://en.wikipedia.org/wiki/Advanced_Encryption_Standard">Wikipedia: AES</a>
@advanced_1409_td @advanced_1410_td
Birthday Paradox Birthday Paradox
@advanced_1410_td @advanced_1411_td
#Describes the higher than expected probability that two persons in a room have the same birthday. Also valid for randomly generated UUIDs. See also: <a href="http://en.wikipedia.org/wiki/Birthday_paradox">Wikipedia: Birthday Paradox</a> #Describes the higher than expected probability that two persons in a room have the same birthday. Also valid for randomly generated UUIDs. See also: <a href="http://en.wikipedia.org/wiki/Birthday_paradox">Wikipedia: Birthday Paradox</a>
@advanced_1411_td @advanced_1412_td
Digest Digest
@advanced_1412_td @advanced_1413_td
#Protocol to protect a password (but not to protect data). See also: <a href="http://www.faqs.org/rfcs/rfc2617.html">RFC 2617: HTTP Digest Access Authentication</a> #Protocol to protect a password (but not to protect data). See also: <a href="http://www.faqs.org/rfcs/rfc2617.html">RFC 2617: HTTP Digest Access Authentication</a>
@advanced_1413_td @advanced_1414_td
GCJ GCJ
@advanced_1414_td @advanced_1415_td
#Compiler for Java. <a href="http://gcc.gnu.org/java">GNU Compiler for the Java</a> and <a href="http://www.dobysoft.com/products/nativej">NativeJ (commercial)</a> #Compiler for Java. <a href="http://gcc.gnu.org/java">GNU Compiler for the Java</a> and <a href="http://www.dobysoft.com/products/nativej">NativeJ (commercial)</a>
@advanced_1415_td @advanced_1416_td
HTTPS HTTPS
@advanced_1416_td @advanced_1417_td
#A protocol to provide security to HTTP connections. See also: <a href="http://www.ietf.org/rfc/rfc2818.txt">RFC 2818: HTTP Over TLS</a> #A protocol to provide security to HTTP connections. See also: <a href="http://www.ietf.org/rfc/rfc2818.txt">RFC 2818: HTTP Over TLS</a>
@advanced_1417_td @advanced_1418_td
Modes of Operation Modes of Operation
@advanced_1418_a @advanced_1419_a
#Wikipedia: Block cipher modes of operation #Wikipedia: Block cipher modes of operation
@advanced_1419_td @advanced_1420_td
Salt Salt
@advanced_1420_td @advanced_1421_td
#Random number to increase the security of passwords. See also: <a href="http://en.wikipedia.org/wiki/Key_derivation_function">Wikipedia: Key derivation function</a> #Random number to increase the security of passwords. See also: <a href="http://en.wikipedia.org/wiki/Key_derivation_function">Wikipedia: Key derivation function</a>
@advanced_1421_td @advanced_1422_td
SHA-256 SHA-256
@advanced_1422_td @advanced_1423_td
#A cryptographic one-way hash function. See also: <a href="http://en.wikipedia.org/wiki/SHA_family">Wikipedia: SHA hash functions</a> #A cryptographic one-way hash function. See also: <a href="http://en.wikipedia.org/wiki/SHA_family">Wikipedia: SHA hash functions</a>
@advanced_1423_td @advanced_1424_td
SQL Injection SQL Injection
@advanced_1424_td @advanced_1425_td
#A security vulnerability where an application embeds SQL statements or expressions in user input. See also: <a href="http://en.wikipedia.org/wiki/SQL_injection">Wikipedia: SQL Injection</a> #A security vulnerability where an application embeds SQL statements or expressions in user input. See also: <a href="http://en.wikipedia.org/wiki/SQL_injection">Wikipedia: SQL Injection</a>
@advanced_1425_td @advanced_1426_td
Watermark Attack Watermark Attack
@advanced_1426_td @advanced_1427_td
#Security problem of certain encryption programs where the existence of certain data can be proven without decrypting. For more information, search in the internet for 'watermark attack cryptoloop' #Security problem of certain encryption programs where the existence of certain data can be proven without decrypting. For more information, search in the internet for 'watermark attack cryptoloop'
@advanced_1427_td @advanced_1428_td
SSL/TLS SSL/TLS
@advanced_1428_td @advanced_1429_td
#Secure Sockets Layer / Transport Layer Security. See also: <a href="http://java.sun.com/products/jsse/">Java Secure Socket Extension (JSSE)</a> #Secure Sockets Layer / Transport Layer Security. See also: <a href="http://java.sun.com/products/jsse/">Java Secure Socket Extension (JSSE)</a>
@advanced_1429_td @advanced_1430_td
XTEA XTEA
@advanced_1430_td @advanced_1431_td
#A block encryption algorithm. See also: <a href="http://en.wikipedia.org/wiki/XTEA">Wikipedia: XTEA</a> #A block encryption algorithm. See also: <a href="http://en.wikipedia.org/wiki/XTEA">Wikipedia: XTEA</a>
@build_1000_h1 @build_1000_h1
...@@ -3848,7 +3851,7 @@ jdbc:h2:mem: ...@@ -3848,7 +3851,7 @@ jdbc:h2:mem:
Changing Other Settings when Opening a Connection Changing Other Settings when Opening a Connection
@features_1389_p @features_1389_p
# In addition to the settings already described, other database settings can be passed in the database URL. Adding <code>;setting=value</code> at the end of a database URL is the same as executing the statement <code>SET setting value</code> just after connecting. For a list of supported settings, see <a href="grammar.html">SQL Grammar</a>. # In addition to the settings already described, other database settings can be passed in the database URL. Adding <code>;setting=value</code> at the end of a database URL is the same as executing the statement <code>SET setting value</code> just after connecting. For a list of supported settings, see <a href="grammar.html">SQL Grammar</a> or the <a href="../javadoc/org/h2/constant/DbSettings.html">DbSettings</a> javadoc.
@features_1390_h2 @features_1390_h2
Custom File Access Mode Custom File Access Mode
...@@ -6799,301 +6802,346 @@ H2 Database Engine ...@@ -6799,301 +6802,346 @@ H2 Database Engine
@mvstore_1005_a @mvstore_1005_a
# Features # Features
@mvstore_1006_a @mvstore_1006_div
# - <a href="#maps">Maps</a>
@mvstore_1007_div
# - <a href="#versions">Versions</a>
@mvstore_1008_div
# - <a href="#transactions">Transactions</a>
@mvstore_1009_div
# - <a href="#inMemory">In-Memory Performance and Usage</a>
@mvstore_1010_div
# - <a href="#dataTypes">Pluggable Data Types</a>
@mvstore_1011_div
# - <a href="#blob">BLOB Support</a>
@mvstore_1012_div
# - <a href="#pluggableMap">R-Tree and Pluggable Map Implementations</a>
@mvstore_1013_div
# - <a href="#caching">Concurrent Operations and Caching</a>
@mvstore_1014_div
# - <a href="#logStructured">Log Structured Storage</a>
@mvstore_1015_div
# - <a href="#fileSystem">File System Abstraction, File Locking and Online Backup</a>
@mvstore_1016_div
# - <a href="#encryption">Encrypted Files</a>
@mvstore_1017_div
# - <a href="#tools">Tools</a>
@mvstore_1018_div
# - <a href="#exceptionHandling">Exception Handling</a>
@mvstore_1019_a
# Similar Projects and Differences to Other Storage Engines # Similar Projects and Differences to Other Storage Engines
@mvstore_1007_a @mvstore_1020_a
# Current State # Current State
@mvstore_1008_a @mvstore_1021_a
# Requirements # Requirements
@mvstore_1009_h2 @mvstore_1022_h2
#Overview #Overview
@mvstore_1010_p @mvstore_1023_p
# The MVStore is work in progress, and is planned to be the next storage subsystem of H2. But it can also be used directly within an application, without using JDBC or SQL. # The MVStore is work in progress, and is planned to be the next storage subsystem of H2. But it can also be used directly within an application, without using JDBC or SQL.
@mvstore_1011_li @mvstore_1024_li
#MVStore stands for "multi-version store". #MVStore stands for "multi-version store".
@mvstore_1012_li @mvstore_1025_li
#Each store contains a number of maps (using the <code>java.util.Map</code> interface). #Each store contains a number of maps (using the <code>java.util.Map</code> interface).
@mvstore_1013_li @mvstore_1026_li
#Both file-based persistence and in-memory operation are supported. #Both file-based persistence and in-memory operation are supported.
@mvstore_1014_li @mvstore_1027_li
#It is intended to be fast, simple to use, and small. #It is intended to be fast, simple to use, and small.
@mvstore_1015_li @mvstore_1028_li
#Old versions of the data can be read concurrently with all other operations. #Old versions of the data can be read concurrently with all other operations.
@mvstore_1016_li @mvstore_1029_li
#Transactions are supported (currently only one transaction at a time). #Transactions are supported.
@mvstore_1017_li
#Transactions (even if they are persisted) can be rolled back.
@mvstore_1018_li @mvstore_1030_li
#The tool is very modular. It supports pluggable data types / serialization, pluggable map implementations (B-tree, R-tree, concurrent B-tree currently), BLOB storage, and a file system abstraction to support encrypted files and zip files. #The tool is very modular. It supports pluggable data types / serialization, pluggable map implementations (B-tree, R-tree, concurrent B-tree currently), BLOB storage, and a file system abstraction to support encrypted files and zip files.
@mvstore_1019_h2 @mvstore_1031_h2
#Example Code #Example Code
@mvstore_1020_p @mvstore_1032_p
# The following sample code shows how to create a store, open a map, add some data, and access the current and an old version: # The following sample code shows how to create a store, open a map, add some data, and access the current and an old version:
@mvstore_1021_h2 @mvstore_1033_h2
#Store Builder #Store Builder
@mvstore_1022_p @mvstore_1034_p
# The <code>MVStore.Builder</code> provides a fluent interface to build a store if more complex configuration options are used. The following code contains all supported configuration options: # The <code>MVStore.Builder</code> provides a fluent interface to build a store if more complex configuration options are used. The following code contains all supported configuration options:
@mvstore_1023_li @mvstore_1035_li
#cacheSizeMB: the cache size in MB. #cacheSizeMB: the cache size in MB.
@mvstore_1024_li @mvstore_1036_li
#compressData: compress the data when storing. #compressData: compress the data when storing.
@mvstore_1025_li @mvstore_1037_li
#encryptionKey: the encryption key for file encryption. #encryptionKey: the encryption key for file encryption.
@mvstore_1026_li @mvstore_1038_li
#fileName: the name of the file, for file based stores. #fileName: the name of the file, for file based stores.
@mvstore_1027_li @mvstore_1039_li
#readOnly: open the file in read-only mode. #readOnly: open the file in read-only mode.
@mvstore_1028_li @mvstore_1040_li
#writeBufferSize: the size of the write buffer in MB. #writeBufferSize: the size of the write buffer in MB.
@mvstore_1029_li @mvstore_1041_li
#writeDelay: the maximum delay until committed changes are stored (unless stored explicitly). #writeDelay: the maximum delay until committed changes are stored (unless stored explicitly).
@mvstore_1030_h2 @mvstore_1042_h2
#R-Tree #R-Tree
@mvstore_1031_p @mvstore_1043_p
# The <code>MVRTreeMap</code> is an R-tree implementation that supports fast spatial queries. It can be used as follows: # The <code>MVRTreeMap</code> is an R-tree implementation that supports fast spatial queries. It can be used as follows:
@mvstore_1032_p @mvstore_1044_p
# The default number of dimensions is 2. To use a different number of dimensions, call <code>new MVRTreeMap.Builder&lt;String&gt;().dimensions(3)</code>. The minimum number of dimensions is 1, the maximum is 255. # The default number of dimensions is 2. To use a different number of dimensions, call <code>new MVRTreeMap.Builder&lt;String&gt;().dimensions(3)</code>. The minimum number of dimensions is 1, the maximum is 255.
@mvstore_1033_h2 @mvstore_1045_h2
特徴 特徴
@mvstore_1034_h3 @mvstore_1046_h3
#Maps #Maps
@mvstore_1035_p @mvstore_1047_p
# Each store supports a set of named maps. A map is sorted by key, and supports the common lookup operations, including access to the first and last key, iterate over some or all keys, and so on. # Each store supports a set of named maps. A map is sorted by key, and supports the common lookup operations, including access to the first and last key, iterate over some or all keys, and so on.
@mvstore_1036_p @mvstore_1048_p
# Also supported, and very uncommon for maps, is fast index lookup: the keys of the map can be accessed like a list (get the key at the given index, get the index of a certain key). That means getting the median of two keys is trivial, and a range of keys can be counted very quickly. The iterator supports fast skipping. This is possible because internally, each map is organized in the form of a counted B+-tree. # Also supported, and very uncommon for maps, is fast index lookup: the keys of the map can be accessed like a list (get the key at the given index, get the index of a certain key). That means getting the median of two keys is trivial, and a range of keys can be counted very quickly. The iterator supports fast skipping. This is possible because internally, each map is organized in the form of a counted B+-tree.
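The index-based access described above can be illustrated with a minimal sketch. It uses a plain sorted array rather than a counted B+-tree, so it only shows the semantics of the lookups, not how the MVStore implements them. The class and method names are hypothetical and are not the MVStore API.

```java
import java.util.Arrays;

// Illustration only: a sorted array stands in for the counted B+-tree.
final class IndexLookupSketch {
    private final String[] keys; // sorted, assumed distinct

    IndexLookupSketch(String... keys) {
        this.keys = keys.clone();
        Arrays.sort(this.keys);
    }

    // Get the key at the given index, as if the key set were a list.
    String getKey(int index) {
        return keys[index];
    }

    // Get the index of a key; negative if the key is not present.
    int getKeyIndex(String key) {
        return Arrays.binarySearch(keys, key);
    }

    // Count the keys in the half-open range [from, to) without iterating.
    int countRange(String from, String to) {
        return insertionPoint(to) - insertionPoint(from);
    }

    // The key halfway between two existing keys.
    String medianBetween(String a, String b) {
        return keys[(getKeyIndex(a) + getKeyIndex(b)) / 2];
    }

    private int insertionPoint(String key) {
        int i = Arrays.binarySearch(keys, key);
        return i < 0 ? -i - 1 : i;
    }
}
```

In a counted B+-tree each inner node stores the number of entries below it, so the same operations take logarithmic time even while the map is being modified, which a sorted array cannot offer.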
@mvstore_1037_p @mvstore_1049_p
# In database terms, a map can be used like a table, where the key of the map is the primary key of the table, and the value is the row. A map can also represent an index, where the key of the map is the key of the index, and the value of the map is the primary key of the table (for non-unique indexes, the key of the map must also contain the primary key). # In database terms, a map can be used like a table, where the key of the map is the primary key of the table, and the value is the row. A map can also represent an index, where the key of the map is the key of the index, and the value of the map is the primary key of the table (for non-unique indexes, the key of the map must also contain the primary key).
@mvstore_1038_h3 @mvstore_1050_h3
#Versions / Transactions #Versions
@mvstore_1039_p @mvstore_1051_p
# Multiple versions are supported. A version is a snapshot of all the data of all maps at a given point in time. A transaction is a number of actions between two versions. # Multiple versions are supported. A version is a snapshot of all the data of all maps at a given point in time. A transaction is a number of actions between two versions.
@mvstore_1040_p @mvstore_1052_p
# Versions / transactions are not immediately persisted; instead, only the version counter is incremented. If there is a change after switching to a new version, a snapshot of the old version is kept in memory, so that it can still be read. # Versions are not immediately persisted; instead, only the version counter is incremented. If there is a change after switching to a new version, a snapshot of the old version is kept in memory, so that it can still be read.
@mvstore_1041_p @mvstore_1053_p
# Old persisted versions are readable until the old data was explicitly overwritten. Creating a snapshot is fast: only the pages that are changed after a snapshot are copied. This behavior is also called COW (copy on write). # Old persisted versions are readable until the old data was explicitly overwritten. Creating a snapshot is fast: only the pages that are changed after a snapshot are copied. This behavior is also called COW (copy on write).
@mvstore_1042_p @mvstore_1054_p
# Rollback is supported (rollback to any old in-memory version or an old persisted version). # Rollback is supported (rollback to any old in-memory version or an old persisted version).
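The version behavior described above can be sketched in a few lines. This is a simplified model under stated assumptions: it copies the whole map on the first write after a version switch, whereas the MVStore copies only the changed pages. The class name and its methods are hypothetical and only loosely mirror the store's terminology.

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;

// Simplified copy-on-write model: whole-map copies instead of page copies.
final class VersionedMapSketch {
    // snapshots.get(v) is the frozen content as it was when version v ended
    private final TreeMap<Long, TreeMap<String, String>> snapshots = new TreeMap<>();
    private TreeMap<String, String> current = new TreeMap<>();
    private long version;
    private boolean shared; // true while 'current' is also held as a snapshot

    long currentVersion() {
        return version;
    }

    // Switching versions only records a reference and bumps the counter.
    long incrementVersion() {
        snapshots.put(version, current);
        shared = true;
        return ++version;
    }

    // The first change after a version switch copies the data, so the
    // snapshot of the old version stays readable.
    void put(String key, String value) {
        if (shared) {
            current = new TreeMap<>(current);
            shared = false;
        }
        current.put(key, value);
    }

    String get(String key) {
        return current.get(key);
    }

    // Read-only view of an old version.
    SortedMap<String, String> openVersion(long v) {
        TreeMap<String, String> snap = v == version ? current : snapshots.get(v);
        if (snap == null) {
            throw new IllegalArgumentException("Unknown version " + v);
        }
        return Collections.unmodifiableSortedMap(snap);
    }

    // Discard everything that happened after version v ended.
    void rollbackTo(long v) {
        TreeMap<String, String> snap = snapshots.get(v);
        if (snap == null) {
            throw new IllegalArgumentException("Unknown version " + v);
        }
        snapshots.tailMap(v, true).clear();
        current = new TreeMap<>(snap);
        shared = false;
        version = v;
    }
}
```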
@mvstore_1043_h3 @mvstore_1055_h3
#Transactions
@mvstore_1056_p
# The multi-version support is the basis for the transaction support. In the simple case, when only one transaction is open at a time, rolling back the transaction only requires reverting to an old version.
@mvstore_1057_p
# To support multiple concurrent open transactions, a transaction utility is included, the <code>TransactionStore</code>. This utility stores the changed entries in a separate map, similar to a transaction log (except that only the key of a changed row is stored, and the entries of a transaction are removed when the transaction is committed). The storage overhead of this utility is very small compared to the overhead of a regular transaction log. The tool supports PostgreSQL style "read committed" transaction isolation. There is no limit on the size of a transaction (the log is not kept in memory).
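The idea of a small undo log combined with "read committed" isolation can be sketched as follows. This is not the <code>TransactionStore</code> implementation: the class, its methods, and the way the last committed value is kept next to the uncommitted one are assumptions made for illustration. As in the description above, the log holds only the keys a transaction changed, and its entries are removed on commit or rollback.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of read-committed isolation with a key-only undo log.
final class ReadCommittedSketch {
    // txId == 0 means committed; otherwise 'committed' is the last committed value.
    private static final class Versioned {
        final long txId;
        final String value;
        final String committed;

        Versioned(long txId, String value, String committed) {
            this.txId = txId;
            this.value = value;
            this.committed = committed;
        }
    }

    private final Map<String, Versioned> data = new TreeMap<>();
    private final Map<Long, List<String>> undoLog = new HashMap<>();
    private long lastTxId;

    long begin() {
        long id = ++lastTxId;
        undoLog.put(id, new ArrayList<>());
        return id;
    }

    void put(long tx, String key, String value) {
        Versioned old = data.get(key);
        if (old != null && old.txId != 0 && old.txId != tx) {
            throw new IllegalStateException("Entry is locked by transaction " + old.txId);
        }
        boolean firstChange = old == null || old.txId != tx;
        String committed = old == null ? null : (firstChange ? old.value : old.committed);
        if (firstChange) {
            undoLog.get(tx).add(key); // only the key is logged
        }
        data.put(key, new Versioned(tx, value, committed));
    }

    // A transaction sees its own changes and the committed changes of others.
    String get(long tx, String key) {
        Versioned v = data.get(key);
        if (v == null) {
            return null;
        }
        return v.txId == 0 || v.txId == tx ? v.value : v.committed;
    }

    void commit(long tx) {
        for (String key : undoLog.remove(tx)) {
            data.put(key, new Versioned(0, data.get(key).value, null));
        }
    }

    void rollback(long tx) {
        for (String key : undoLog.remove(tx)) {
            String committed = data.get(key).committed;
            if (committed == null) {
                data.remove(key);
            } else {
                data.put(key, new Versioned(0, committed, null));
            }
        }
    }
}
```

The sketch keeps everything in memory and does not model removals; it only shows why a reader in another transaction never observes an uncommitted value.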
@mvstore_1058_h3
#In-Memory Performance and Usage #In-Memory Performance and Usage
@mvstore_1044_p @mvstore_1059_p
# Performance of in-memory operations is comparable with <code>java.util.TreeMap</code> (many operations are actually faster), but usually slower than <code>java.util.HashMap</code>. # Performance of in-memory operations is comparable with <code>java.util.TreeMap</code> (many operations are actually faster), but usually slower than <code>java.util.HashMap</code>.
@mvstore_1045_p @mvstore_1060_p
# The memory overhead for large maps is slightly better than for the regular map implementations, but there is a higher overhead per map. For maps with fewer than 25 entries, the regular map implementations use less memory on average. # The memory overhead for large maps is slightly better than for the regular map implementations, but there is a higher overhead per map. For maps with fewer than 25 entries, the regular map implementations use less memory on average.
@mvstore_1046_p @mvstore_1061_p
# If no file name is specified, the store operates purely in memory. Except for persisting data, all features are supported in this mode (multi-versioning, index lookup, R-tree and so on). If a file name is specified, all operations occur in memory (with the same performance characteristics) until data is persisted. # If no file name is specified, the store operates purely in memory. Except for persisting data, all features are supported in this mode (multi-versioning, index lookup, R-tree and so on). If a file name is specified, all operations occur in memory (with the same performance characteristics) until data is persisted.
@mvstore_1047_h3 @mvstore_1062_h3
#Pluggable Data Types #Pluggable Data Types
@mvstore_1048_p @mvstore_1063_p
# Serialization is pluggable. The default serialization currently supports many common data types, and uses Java serialization for other objects. The following classes are currently directly supported: <code>Boolean, Byte, Short, Character, Integer, Long, Float, Double, BigInteger, BigDecimal, String, UUID, Date</code> and arrays (both primitive arrays and object arrays). # Serialization is pluggable. The default serialization currently supports many common data types, and uses Java serialization for other objects. The following classes are currently directly supported: <code>Boolean, Byte, Short, Character, Integer, Long, Float, Double, BigInteger, BigDecimal, String, UUID, Date</code> and arrays (both primitive arrays and object arrays).
@mvstore_1049_p @mvstore_1064_p
# Parameterized data types are supported (for example one could build a string data type that limits the length for some reason). # Parameterized data types are supported (for example one could build a string data type that limits the length for some reason).
@mvstore_1050_p @mvstore_1065_p
# The storage engine itself does not have any length limits, so that keys, values, pages, and chunks can be very big (as big as fits in memory). Also, there is no inherent limit to the number of maps and chunks. Due to using a log structured storage, there is no special case handling for large keys or pages. # The storage engine itself does not have any length limits, so that keys, values, pages, and chunks can be very big (as big as fits in memory). Also, there is no inherent limit to the number of maps and chunks. Due to using a log structured storage, there is no special case handling for large keys or pages.
@mvstore_1051_h3 @mvstore_1066_h3
#BLOB Support #BLOB Support
@mvstore_1052_p @mvstore_1067_p
# There is a mechanism that stores large binary objects by splitting them into smaller blocks. This allows storing objects that don't fit in memory. Streaming as well as random access reads on such objects are supported. This tool is written on top of the store (only using the map interface). # There is a mechanism that stores large binary objects by splitting them into smaller blocks. This allows storing objects that don't fit in memory. Streaming as well as random access reads on such objects are supported. This tool is written on top of the store (only using the map interface).
@mvstore_1053_h3 @mvstore_1068_h3
#R-Tree and Pluggable Map Implementations #R-Tree and Pluggable Map Implementations
@mvstore_1054_p @mvstore_1069_p
# The map implementation is pluggable. In addition to the default <code>MVMap</code> (multi-version map), there is a multi-version R-tree map implementation for spatial operations (contain and intersection; nearest neighbor is not yet implemented). # The map implementation is pluggable. In addition to the default <code>MVMap</code> (multi-version map), there is a multi-version R-tree map implementation for spatial operations (contain and intersection; nearest neighbor is not yet implemented).
@mvstore_1055_h3 @mvstore_1070_h3
#Concurrent Operations and Caching #Concurrent Operations and Caching
@mvstore_1056_p @mvstore_1071_p
# The default map implementation supports concurrent reads on old versions of the data. All such read operations can occur in parallel. Concurrent reads from the page cache, as well as concurrent reads from the file system are supported. # The default map implementation supports concurrent reads on old versions of the data. All such read operations can occur in parallel. Concurrent reads from the page cache, as well as concurrent reads from the file system are supported.
@mvstore_1057_p @mvstore_1072_p
# Storing changes can occur concurrently to modifying the data, as it operates on a snapshot. # Storing changes can occur concurrently to modifying the data, as it operates on a snapshot.
@mvstore_1058_p @mvstore_1073_p
# Caching is done on the page level. The page cache is a concurrent LIRS cache, which should be resistant against scan operations. # Caching is done on the page level. The page cache is a concurrent LIRS cache, which should be resistant against scan operations.
@mvstore_1059_p @mvstore_1074_p
# The default map implementation does not support concurrent modification operations on a map (the same as <code>HashMap</code> and <code>TreeMap</code>). Similar to those classes, the map tries to detect concurrent modification. # The default map implementation does not support concurrent modification operations on a map (the same as <code>HashMap</code> and <code>TreeMap</code>). Similar to those classes, the map tries to detect concurrent modification.
@mvstore_1060_p @mvstore_1075_p
# With the <code>MVMapConcurrent</code> implementation, read operations even on the newest version can happen concurrently with all other operations, without risk of corruption. This comes with slightly reduced speed in single threaded mode, the same as with other <code>ConcurrentHashMap</code> implementations. Write operations first read the relevant area from disk to memory (this can happen concurrently), and only then modify the data. The in-memory part of write operations is synchronized. # With the <code>MVMapConcurrent</code> implementation, read operations even on the newest version can happen concurrently with all other operations, without risk of corruption. This comes with slightly reduced speed in single threaded mode, the same as with other <code>ConcurrentHashMap</code> implementations. Write operations first read the relevant area from disk to memory (this can happen concurrently), and only then modify the data. The in-memory part of write operations is synchronized.
@mvstore_1061_p @mvstore_1076_p
# For fully scalable concurrent write operations to a map (in-memory and to disk), the map could be split into multiple maps in different stores ('sharding'). The plan is to add such a mechanism later when needed. # For fully scalable concurrent write operations to a map (in-memory and to disk), the map could be split into multiple maps in different stores ('sharding'). The plan is to add such a mechanism later when needed.
@mvstore_1062_h3 @mvstore_1077_h3
#Log Structured Storage #Log Structured Storage
@mvstore_1063_p @mvstore_1078_p
# Changes are buffered in memory, and once enough changes have accumulated, they are written in one continuous disk write operation. (According to a test, write throughput of a common SSD gets higher the larger the block size, until a block size of 2 MB, and then does not further increase.) By default, committed changes are automatically written once every second in a background thread, even if only little data was changed. Changes can also be written explicitly by calling <code>store()</code>. To avoid out of memory, uncommitted changes are also written when needed, however they are rolled back when closing the store, or at the latest (when the store was not correctly closed) when opening the store. # Changes are buffered in memory, and once enough changes have accumulated, they are written in one continuous disk write operation. (According to a test, write throughput of a common SSD gets higher the larger the block size, until a block size of 2 MB, and then does not further increase.) By default, committed changes are automatically written once every second in a background thread, even if only little data was changed. Changes can also be written explicitly by calling <code>store()</code>. To avoid out of memory, uncommitted changes are also written when needed, however they are rolled back when closing the store, or at the latest (when the store was not correctly closed) when opening the store.
@mvstore_1064_p @mvstore_1079_p
# When storing, all changed pages are serialized, optionally compressed using the LZF algorithm, and written sequentially to a free area of the file. Each such change set is called a chunk. All parent pages of the changed B-trees are stored in this chunk as well, so that each chunk also contains the root of each changed map (which is the entry point to read this version of the data). There is no separate index: all data is stored as a list of pages. Per store, there is one additional map that contains the metadata (the list of maps, where the root page of each map is stored, and the list of chunks). # When storing, all changed pages are serialized, optionally compressed using the LZF algorithm, and written sequentially to a free area of the file. Each such change set is called a chunk. All parent pages of the changed B-trees are stored in this chunk as well, so that each chunk also contains the root of each changed map (which is the entry point to read this version of the data). There is no separate index: all data is stored as a list of pages. Per store, there is one additional map that contains the metadata (the list of maps, where the root page of each map is stored, and the list of chunks).
@mvstore_1065_p @mvstore_1080_p
# There are usually two write operations per chunk: one to store the chunk data (the pages), and one to update the file header (so it points to the latest chunk). If the chunk is appended at the end of the file, the file header is only written at the end of the chunk. # There are usually two write operations per chunk: one to store the chunk data (the pages), and one to update the file header (so it points to the latest chunk). If the chunk is appended at the end of the file, the file header is only written at the end of the chunk.
@mvstore_1066_p @mvstore_1081_p
# There is no transaction log, no undo log, and there are no in-place updates (however unused chunks are overwritten by default). # There is no transaction log, no undo log, and there are no in-place updates (however unused chunks are overwritten by default).
@mvstore_1067_p @mvstore_1082_p
# Old data is kept for at least 45 seconds (configurable), so that there are no explicit sync operations required to guarantee data consistency, but an application can also sync explicitly when needed. To reuse disk space, the chunks with the lowest amount of live data are compacted (the live data is simply stored again in the next chunk). To improve data locality and disk space usage, the plan is to automatically defragment and compact data. # Old data is kept for at least 45 seconds (configurable), so that there are no explicit sync operations required to guarantee data consistency, but an application can also sync explicitly when needed. To reuse disk space, the chunks with the lowest amount of live data are compacted (the live data is simply stored again in the next chunk). To improve data locality and disk space usage, the plan is to automatically defragment and compact data.
@mvstore_1068_p @mvstore_1083_p
# Compared to traditional storage engines (that use a transaction log, undo log, and main storage area), the log structured storage is simpler, more flexible, and typically needs fewer disk operations per change, as data is only written once instead of two or three times, and because the B-tree pages are always full (they are stored next to each other) and can be easily compressed. But temporarily, disk space usage might actually be a bit higher than for a regular database, as disk space is not immediately re-used (there are no in-place updates). # Compared to traditional storage engines (that use a transaction log, undo log, and main storage area), the log structured storage is simpler, more flexible, and typically needs fewer disk operations per change, as data is only written once instead of two or three times, and because the B-tree pages are always full (they are stored next to each other) and can be easily compressed. But temporarily, disk space usage might actually be a bit higher than for a regular database, as disk space is not immediately re-used (there are no in-place updates).
@mvstore_1069_h3 @mvstore_1084_h3
#File System Abstraction, File Locking and Online Backup #File System Abstraction, File Locking and Online Backup
@mvstore_1070_p @mvstore_1085_p
# The file system is pluggable (the same file system abstraction is used as H2 uses). The file can be encrypted using an encrypting file system. Other file system implementations support reading from a compressed zip or jar file. # The file system is pluggable (the same file system abstraction is used as H2 uses). The file can be encrypted using an encrypting file system. Other file system implementations support reading from a compressed zip or jar file.
@mvstore_1071_p @mvstore_1086_p
# Each store may only be opened once within a JVM. When opening a store, the file is locked in exclusive mode, so that the file can only be changed from within one process. Files can be opened in read-only mode, in which case a shared lock is used. # Each store may only be opened once within a JVM. When opening a store, the file is locked in exclusive mode, so that the file can only be changed from within one process. Files can be opened in read-only mode, in which case a shared lock is used.
@mvstore_1072_p @mvstore_1087_p
# The persisted data can be backed up to a different file at any time, even during write operations (online backup). To do that, automatic disk space reuse needs to be first disabled, so that new data is always appended at the end of the file. Then, the file can be copied (the file handle is available to the application). # The persisted data can be backed up to a different file at any time, even during write operations (online backup). To do that, automatic disk space reuse needs to be first disabled, so that new data is always appended at the end of the file. Then, the file can be copied (the file handle is available to the application).
@mvstore_1073_h3 @mvstore_1088_h3
#Encrypted Files #Encrypted Files
@mvstore_1074_p @mvstore_1089_p
# File encryption ensures the data can only be read with the correct password. Data can be encrypted as follows: # File encryption ensures the data can only be read with the correct password. Data can be encrypted as follows:
@mvstore_1075_p @mvstore_1090_p
# The following algorithms and settings are used: # The following algorithms and settings are used:
@mvstore_1076_li @mvstore_1091_li
#The password char array is cleared after use, to reduce the risk that the password is stolen even if the attacker has access to the main memory. #The password char array is cleared after use, to reduce the risk that the password is stolen even if the attacker has access to the main memory.
@mvstore_1077_li @mvstore_1092_li
#The password is hashed according to the PBKDF2 standard, using the SHA-256 hash algorithm. #The password is hashed according to the PBKDF2 standard, using the SHA-256 hash algorithm.
@mvstore_1078_li @mvstore_1093_li
#The length of the salt is 64 bits, so that an attacker can not use a pre-calculated password hash table (rainbow table). It is generated using a cryptographically secure random number generator. #The length of the salt is 64 bits, so that an attacker can not use a pre-calculated password hash table (rainbow table). It is generated using a cryptographically secure random number generator.
@mvstore_1079_li @mvstore_1094_li
#To speed up opening encrypted stores on Android, the number of PBKDF2 iterations is 10. The higher the value, the better the protection against brute-force password cracking attacks, but the slower opening a file becomes. #To speed up opening encrypted stores on Android, the number of PBKDF2 iterations is 10. The higher the value, the better the protection against brute-force password cracking attacks, but the slower opening a file becomes.
@mvstore_1080_li @mvstore_1095_li
#The file itself is encrypted using the standardized disk encryption mode XTS-AES. Only little more than one AES-128 round per block is needed. #The file itself is encrypted using the standardized disk encryption mode XTS-AES. Only little more than one AES-128 round per block is needed.
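The password handling listed above (PBKDF2 with SHA-256, a 64-bit random salt, 10 iterations, clearing the password array) can be sketched with the standard Java crypto API. This is an illustration, not the code the MVStore uses: the class name, the method names, and the 256-bit key length in the usage note are assumptions, and the standard algorithm name <code>PBKDF2WithHmacSHA256</code> requires Java 8 or later.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Hypothetical helper; not part of the MVStore.
final class KeyDerivationSketch {

    // 64-bit salt from a cryptographically secure random number generator.
    static byte[] newSalt() {
        byte[] salt = new byte[8];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // Derive key material from the password and clear the password afterwards.
    static byte[] deriveKey(char[] password, byte[] salt, int iterations, int keyBits) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, iterations, keyBits);
        try {
            SecretKeyFactory factory = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256");
            return factory.generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            // rethrow unchecked, in line with the exception handling described for the store
            throw new IllegalStateException("Key derivation failed", e);
        } finally {
            spec.clearPassword();
            Arrays.fill(password, '\0');
        }
    }
}
```

A call such as <code>deriveKey(password, salt, 10, 256)</code> would yield 32 bytes of key material. The salt has to be stored alongside the encrypted data so that the same key can be derived when the file is opened again.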
@mvstore_1081_h3 @mvstore_1096_h3
#Tools #Tools
@mvstore_1082_p @mvstore_1097_p
# There is a tool (<code>MVStoreTool</code>) to dump the contents of a file. # There is a tool (<code>MVStoreTool</code>) to dump the contents of a file.
@mvstore_1083_h3 @mvstore_1098_h3
#Exception Handling #Exception Handling
@mvstore_1084_p @mvstore_1099_p
# This tool does not throw checked exceptions. Instead, unchecked exceptions are thrown if needed. The error message always contains the version of the tool. The following exceptions can occur: # This tool does not throw checked exceptions. Instead, unchecked exceptions are thrown if needed. The error message always contains the version of the tool. The following exceptions can occur:
@mvstore_1085_code @mvstore_1100_code
#IllegalStateException #IllegalStateException
@mvstore_1086_li @mvstore_1101_li
# if a map was already closed or an IO exception occurred, for example if the file was locked, is already closed, could not be opened or closed, if reading or writing failed, if the file is corrupt, or if there is an internal error in the tool. # if a map was already closed or an IO exception occurred, for example if the file was locked, is already closed, could not be opened or closed, if reading or writing failed, if the file is corrupt, or if there is an internal error in the tool.
@mvstore_1087_code @mvstore_1102_code
#IllegalArgumentException #IllegalArgumentException
@mvstore_1088_li @mvstore_1103_li
# if a method was called with an illegal argument. # if a method was called with an illegal argument.
@mvstore_1089_code @mvstore_1104_code
#UnsupportedOperationException #UnsupportedOperationException
@mvstore_1090_li @mvstore_1105_li
# if a method was called that is not supported, for example trying to modify a read-only map or view. # if a method was called that is not supported, for example trying to modify a read-only map or view.
@mvstore_1091_code @mvstore_1106_code
#ConcurrentModificationException #ConcurrentModificationException
@mvstore_1092_li @mvstore_1107_li
# if the object is modified concurrently. # if the object is modified concurrently.
@mvstore_1093_h2 @mvstore_1108_h2
#Similar Projects and Differences to Other Storage Engines #Similar Projects and Differences to Other Storage Engines
@mvstore_1094_p @mvstore_1109_p
# Unlike similar storage engines like LevelDB and Kyoto Cabinet, the MVStore is written in Java and can easily be embedded in a Java and Android application. # Unlike similar storage engines like LevelDB and Kyoto Cabinet, the MVStore is written in Java and can easily be embedded in a Java and Android application.
@mvstore_1095_p @mvstore_1110_p
# The MVStore is somewhat similar to the Berkeley DB Java Edition because it is also written in Java, and is also a log structured storage, but the H2 license is more liberal. # The MVStore is somewhat similar to the Berkeley DB Java Edition because it is also written in Java, and is also a log structured storage, but the H2 license is more liberal.
@mvstore_1096_p @mvstore_1111_p
# Like SQLite, the MVStore keeps all data in one file. Unlike SQLite, the MVStore uses a log structured storage. The plan is to make the MVStore both easier to use and faster than SQLite. In a recent (very simple) test, the MVStore was about twice as fast as SQLite on Android. # Like SQLite, the MVStore keeps all data in one file. Unlike SQLite, the MVStore uses a log structured storage. The plan is to make the MVStore both easier to use and faster than SQLite. In a recent (very simple) test, the MVStore was about twice as fast as SQLite on Android.
@mvstore_1097_p @mvstore_1112_p
# The API of the MVStore is similar to MapDB (previously known as JDBM) from Jan Kotek, and some code is shared between MapDB and JDBM. However, unlike MapDB, the MVStore uses a log structured storage. The MVStore does not have a record size limit. # The API of the MVStore is similar to MapDB (previously known as JDBM) from Jan Kotek, and some code is shared between MapDB and JDBM. However, unlike MapDB, the MVStore uses a log structured storage. The MVStore does not have a record size limit.
@mvstore_1098_h2 @mvstore_1113_h2
#Current State #Current State
@mvstore_1099_p @mvstore_1114_p
# The code is still experimental at this stage. The API as well as the behavior may partially change. Features may be added and removed (even though the main features will stay). # The code is still experimental at this stage. The API as well as the behavior may partially change. Features may be added and removed (even though the main features will stay).
@mvstore_1100_h2 @mvstore_1115_h2
必要条件 必要条件
@mvstore_1101_p @mvstore_1116_p
# The MVStore is included in the latest H2 jar file. # The MVStore is included in the latest H2 jar file.
@mvstore_1102_p @mvstore_1117_p
# There are no special requirements to use it. The MVStore should run on any JVM as well as on Android. # There are no special requirements to use it. The MVStore should run on any JVM as well as on Android.
@mvstore_1103_p @mvstore_1118_p
# To build just the MVStore (without the database engine), run: # To build just the MVStore (without the database engine), run:
@mvstore_1104_p @mvstore_1119_p
# This will create the file <code>bin/h2mvstore-1.3.170.jar</code> (about 130 KB). # This will create the file <code>bin/h2mvstore-1.3.170.jar</code> (about 130 KB).
@performance_1000_h1 @performance_1000_h1
......
...@@ -401,34 +401,35 @@ advanced_1399_li=The maximum number of rows per table is 2^64. ...@@ -401,34 +401,35 @@ advanced_1399_li=The maximum number of rows per table is 2^64.
advanced_1400_li=Main memory requirements\: The larger the database, the more main memory is required. With the version 1.1 storage mechanism, the minimum main memory required for a 12 GB database was around 240 MB. With the current page store, the minimum main memory required is much lower, around 1 MB for each 8 GB database file size. advanced_1400_li=Main memory requirements\: The larger the database, the more main memory is required. With the version 1.1 storage mechanism, the minimum main memory required for a 12 GB database was around 240 MB. With the current page store, the minimum main memory required is much lower, around 1 MB for each 8 GB database file size.
advanced_1401_li=Limit on the complexity of SQL statements. Statements of the following form will result in a stack overflow exception\: advanced_1401_li=Limit on the complexity of SQL statements. Statements of the following form will result in a stack overflow exception\:
advanced_1402_li=There is no limit for the following entities, except the memory and storage capacity\: maximum identifier length (table name, column name, and so on); maximum number of tables, columns, indexes, triggers, and other database objects; maximum statement length, number of parameters per statement, tables per statement, expressions in order by, group by, having, and so on; maximum rows per query; maximum columns per table, columns per index, indexes per table, lob columns per table, and so on; maximum row length, index row length, select row length; maximum length of a varchar column, decimal column, literal in a statement. advanced_1402_li=There is no limit for the following entities, except the memory and storage capacity\: maximum identifier length (table name, column name, and so on); maximum number of tables, columns, indexes, triggers, and other database objects; maximum statement length, number of parameters per statement, tables per statement, expressions in order by, group by, having, and so on; maximum rows per query; maximum columns per table, columns per index, indexes per table, lob columns per table, and so on; maximum row length, index row length, select row length; maximum length of a varchar column, decimal column, literal in a statement.
advanced_1403_li=For limitations on data types, see the documentation of the respective Java data type or the data type documentation of this database. advanced_1403_li=Querying from the metadata tables is slow if there are many tables (thousands).
advanced_1404_h2=Glossary and Links advanced_1404_li=For limitations on data types, see the documentation of the respective Java data type or the data type documentation of this database.
advanced_1405_th=Term advanced_1405_h2=Glossary and Links
advanced_1406_th=Description advanced_1406_th=Term
advanced_1407_td=AES-128 advanced_1407_th=Description
advanced_1408_td=A block encryption algorithm. See also\: <a href\="http\://en.wikipedia.org/wiki/Advanced_Encryption_Standard">Wikipedia\: AES</a> advanced_1408_td=AES-128
advanced_1409_td=Birthday Paradox advanced_1409_td=A block encryption algorithm. See also\: <a href\="http\://en.wikipedia.org/wiki/Advanced_Encryption_Standard">Wikipedia\: AES</a>
advanced_1410_td=Describes the higher than expected probability that two persons in a room have the same birthday. Also valid for randomly generated UUIDs. See also\: <a href\="http\://en.wikipedia.org/wiki/Birthday_paradox">Wikipedia\: Birthday Paradox</a> advanced_1410_td=Birthday Paradox
advanced_1411_td=Digest advanced_1411_td=Describes the higher than expected probability that two persons in a room have the same birthday. Also valid for randomly generated UUIDs. See also\: <a href\="http\://en.wikipedia.org/wiki/Birthday_paradox">Wikipedia\: Birthday Paradox</a>
advanced_1412_td=Protocol to protect a password (but not to protect data). See also\: <a href\="http\://www.faqs.org/rfcs/rfc2617.html">RFC 2617\: HTTP Digest Access Authentication</a> advanced_1412_td=Digest
advanced_1413_td=GCJ advanced_1413_td=Protocol to protect a password (but not to protect data). See also\: <a href\="http\://www.faqs.org/rfcs/rfc2617.html">RFC 2617\: HTTP Digest Access Authentication</a>
advanced_1414_td=Compiler for Java. <a href\="http\://gcc.gnu.org/java">GNU Compiler for the Java</a> and <a href\="http\://www.dobysoft.com/products/nativej">NativeJ (commercial)</a> advanced_1414_td=GCJ
advanced_1415_td=HTTPS advanced_1415_td=Compiler for Java. <a href\="http\://gcc.gnu.org/java">GNU Compiler for the Java</a> and <a href\="http\://www.dobysoft.com/products/nativej">NativeJ (commercial)</a>
advanced_1416_td=A protocol to provide security to HTTP connections. See also\: <a href\="http\://www.ietf.org/rfc/rfc2818.txt">RFC 2818\: HTTP Over TLS</a> advanced_1416_td=HTTPS
advanced_1417_td=Modes of Operation advanced_1417_td=A protocol to provide security to HTTP connections. See also\: <a href\="http\://www.ietf.org/rfc/rfc2818.txt">RFC 2818\: HTTP Over TLS</a>
advanced_1418_a=Wikipedia\: Block cipher modes of operation advanced_1418_td=Modes of Operation
advanced_1419_td=Salt advanced_1419_a=Wikipedia\: Block cipher modes of operation
advanced_1420_td=Random number to increase the security of passwords. See also\: <a href\="http\://en.wikipedia.org/wiki/Key_derivation_function">Wikipedia\: Key derivation function</a> advanced_1420_td=Salt
advanced_1421_td=SHA-256 advanced_1421_td=Random number to increase the security of passwords. See also\: <a href\="http\://en.wikipedia.org/wiki/Key_derivation_function">Wikipedia\: Key derivation function</a>
advanced_1422_td=A cryptographic one-way hash function. See also\: <a href\="http\://en.wikipedia.org/wiki/SHA_family">Wikipedia\: SHA hash functions</a> advanced_1422_td=SHA-256
advanced_1423_td=SQL Injection advanced_1423_td=A cryptographic one-way hash function. See also\: <a href\="http\://en.wikipedia.org/wiki/SHA_family">Wikipedia\: SHA hash functions</a>
advanced_1424_td=A security vulnerability where an application embeds SQL statements or expressions in user input. See also\: <a href\="http\://en.wikipedia.org/wiki/SQL_injection">Wikipedia\: SQL Injection</a> advanced_1424_td=SQL Injection
advanced_1425_td=Watermark Attack advanced_1425_td=A security vulnerability where an application embeds SQL statements or expressions in user input. See also\: <a href\="http\://en.wikipedia.org/wiki/SQL_injection">Wikipedia\: SQL Injection</a>
advanced_1426_td=Security problem of certain encryption programs where the existence of certain data can be proven without decrypting. For more information, search in the internet for 'watermark attack cryptoloop' advanced_1426_td=Watermark Attack
advanced_1427_td=SSL/TLS advanced_1427_td=Security problem of certain encryption programs where the existence of certain data can be proven without decrypting. For more information, search in the internet for 'watermark attack cryptoloop'
advanced_1428_td=Secure Sockets Layer / Transport Layer Security. See also\: <a href\="http\://java.sun.com/products/jsse/">Java Secure Socket Extension (JSSE)</a> advanced_1428_td=SSL/TLS
advanced_1429_td=XTEA advanced_1429_td=Secure Sockets Layer / Transport Layer Security. See also\: <a href\="http\://java.sun.com/products/jsse/">Java Secure Socket Extension (JSSE)</a>
advanced_1430_td=A block encryption algorithm. See also\: <a href\="http\://en.wikipedia.org/wiki/XTEA">Wikipedia\: XTEA</a> advanced_1430_td=XTEA
advanced_1431_td=A block encryption algorithm. See also\: <a href\="http\://en.wikipedia.org/wiki/XTEA">Wikipedia\: XTEA</a>
build_1000_h1=Build build_1000_h1=Build
build_1001_a=\ Portability build_1001_a=\ Portability
build_1002_a=\ Environment build_1002_a=\ Environment
...@@ -1281,7 +1282,7 @@ features_1385_p=\ Backslashes within the init script (for example within a runsc ...@@ -1281,7 +1282,7 @@ features_1385_p=\ Backslashes within the init script (for example within a runsc
features_1386_h2=Ignore Unknown Settings features_1386_h2=Ignore Unknown Settings
features_1387_p=\ Some applications (for example OpenOffice.org Base) pass some additional parameters when connecting to the database. Why those parameters are passed is unknown. The parameters <code>PREFERDOSLIKELINEENDS</code> and <code>IGNOREDRIVERPRIVILEGES</code> are such examples; they are simply ignored to improve the compatibility with OpenOffice.org. If an application passes other parameters when connecting to the database, usually the database throws an exception saying the parameter is not supported. It is possible to ignore such parameters by adding <code>;IGNORE_UNKNOWN_SETTINGS\=TRUE</code> to the database URL. features_1387_p=\ Some applications (for example OpenOffice.org Base) pass some additional parameters when connecting to the database. Why those parameters are passed is unknown. The parameters <code>PREFERDOSLIKELINEENDS</code> and <code>IGNOREDRIVERPRIVILEGES</code> are such examples; they are simply ignored to improve the compatibility with OpenOffice.org. If an application passes other parameters when connecting to the database, usually the database throws an exception saying the parameter is not supported. It is possible to ignore such parameters by adding <code>;IGNORE_UNKNOWN_SETTINGS\=TRUE</code> to the database URL.
features_1388_h2=Changing Other Settings when Opening a Connection features_1388_h2=Changing Other Settings when Opening a Connection
features_1389_p=\ In addition to the settings already described, other database settings can be passed in the database URL. Adding <code>;setting\=value</code> at the end of a database URL is the same as executing the statement <code>SET setting value</code> just after connecting. For a list of supported settings, see <a href\="grammar.html">SQL Grammar</a>. features_1389_p=\ In addition to the settings already described, other database settings can be passed in the database URL. Adding <code>;setting\=value</code> at the end of a database URL is the same as executing the statement <code>SET setting value</code> just after connecting. For a list of supported settings, see <a href\="grammar.html">SQL Grammar</a> or the <a href\="../javadoc/org/h2/constant/DbSettings.html">DbSettings</a> javadoc.
features_1390_h2=Custom File Access Mode features_1390_h2=Custom File Access Mode
features_1391_p=\ Usually, the database opens the database file with the access mode <code>rw</code>, meaning read-write (except for read only databases, where the mode <code>r</code> is used). To open a database in read-only mode if the database file is not read-only, use <code>ACCESS_MODE_DATA\=r</code>. Also supported are <code>rws</code> and <code>rwd</code>. This setting must be specified in the database URL\: features_1391_p=\ Usually, the database opens the database file with the access mode <code>rw</code>, meaning read-write (except for read only databases, where the mode <code>r</code> is used). To open a database in read-only mode if the database file is not read-only, use <code>ACCESS_MODE_DATA\=r</code>. Also supported are <code>rws</code> and <code>rwd</code>. This setting must be specified in the database URL\:
features_1392_p=\ For more information see <a href\="advanced.html\#durability_problems">Durability Problems</a>. On many operating systems the access mode <code>rws</code> does not guarantee that the data is written to the disk. features_1392_p=\ For more information see <a href\="advanced.html\#durability_problems">Durability Problems</a>. On many operating systems the access mode <code>rws</code> does not guarantee that the data is written to the disk.
...@@ -2265,105 +2266,120 @@ mvstore_1002_a=\ Example Code ...@@ -2265,105 +2266,120 @@ mvstore_1002_a=\ Example Code
mvstore_1003_a=\ Store Builder mvstore_1003_a=\ Store Builder
mvstore_1004_a=\ R-Tree mvstore_1004_a=\ R-Tree
mvstore_1005_a=\ Features mvstore_1005_a=\ Features
mvstore_1006_a=\ Similar Projects and Differences to Other Storage Engines mvstore_1006_div=\ - <a href\="\#maps">Maps</a>
mvstore_1007_a=\ Current State mvstore_1007_div=\ - <a href\="\#versions">Versions</a>
mvstore_1008_a=\ Requirements mvstore_1008_div=\ - <a href\="\#transactions">Transactions</a>
mvstore_1009_h2=Overview mvstore_1009_div=\ - <a href\="\#inMemory">In-Memory Performance and Usage</a>
mvstore_1010_p=\ The MVStore is a work in progress, and is planned to be the next storage subsystem of H2. But it can also be used directly within an application, without using JDBC or SQL. mvstore_1010_div=\ - <a href\="\#dataTypes">Pluggable Data Types</a>
mvstore_1011_li=MVStore stands for "multi-version store". mvstore_1011_div=\ - <a href\="\#blob">BLOB Support</a>
mvstore_1012_li=Each store contains a number of maps (using the <code>java.util.Map</code> interface). mvstore_1012_div=\ - <a href\="\#pluggableMap">R-Tree and Pluggable Map Implementations</a>
mvstore_1013_li=Both file-based persistence and in-memory operation are supported. mvstore_1013_div=\ - <a href\="\#caching">Concurrent Operations and Caching</a>
mvstore_1014_li=It is intended to be fast, simple to use, and small. mvstore_1014_div=\ - <a href\="\#logStructured">Log Structured Storage</a>
mvstore_1015_li=Old versions of the data can be read concurrently with all other operations. mvstore_1015_div=\ - <a href\="\#fileSystem">File System Abstraction, File Locking and Online Backup</a>
mvstore_1016_li=Transactions are supported (currently only one transaction at a time). mvstore_1016_div=\ - <a href\="\#encryption">Encrypted Files</a>
mvstore_1017_li=Transactions (even if they are persisted) can be rolled back. mvstore_1017_div=\ - <a href\="\#tools">Tools</a>
mvstore_1018_li=The tool is very modular. It supports pluggable data types / serialization, pluggable map implementations (B-tree, R-tree, concurrent B-tree currently), BLOB storage, and a file system abstraction to support encrypted files and zip files. mvstore_1018_div=\ - <a href\="\#exceptionHandling">Exception Handling</a>
mvstore_1019_h2=Example Code mvstore_1019_a=\ Similar Projects and Differences to Other Storage Engines
mvstore_1020_p=\ The following sample code shows how to create a store, open a map, add some data, and access the current and an old version\: mvstore_1020_a=\ Current State
mvstore_1021_h2=Store Builder mvstore_1021_a=\ Requirements
mvstore_1022_p=\ The <code>MVStore.Builder</code> provides a fluent interface to build a store if more complex configuration options are used. The following code contains all supported configuration options\: mvstore_1022_h2=Overview
mvstore_1023_li=cacheSizeMB\: the cache size in MB. mvstore_1023_p=\ The MVStore is a work in progress, and is planned to be the next storage subsystem of H2. But it can also be used directly within an application, without using JDBC or SQL.
mvstore_1024_li=compressData\: compress the data when storing. mvstore_1024_li=MVStore stands for "multi-version store".
mvstore_1025_li=encryptionKey\: the encryption key for file encryption. mvstore_1025_li=Each store contains a number of maps (using the <code>java.util.Map</code> interface).
mvstore_1026_li=fileName\: the name of the file, for file based stores. mvstore_1026_li=Both file-based persistence and in-memory operation are supported.
mvstore_1027_li=readOnly\: open the file in read-only mode. mvstore_1027_li=It is intended to be fast, simple to use, and small.
mvstore_1028_li=writeBufferSize\: the size of the write buffer in MB. mvstore_1028_li=Old versions of the data can be read concurrently with all other operations.
mvstore_1029_li=writeDelay\: the maximum delay until committed changes are stored (unless stored explicitly). mvstore_1029_li=Transactions are supported.
mvstore_1030_h2=R-Tree mvstore_1030_li=The tool is very modular. It supports pluggable data types / serialization, pluggable map implementations (B-tree, R-tree, concurrent B-tree currently), BLOB storage, and a file system abstraction to support encrypted files and zip files.
mvstore_1031_p=\ The <code>MVRTreeMap</code> is an R-tree implementation that supports fast spatial queries. It can be used as follows\: mvstore_1031_h2=Example Code
mvstore_1032_p=\ The default number of dimensions is 2. To use a different number of dimensions, call <code>new MVRTreeMap.Builder&lt;String&gt;().dimensions(3)</code>. The minimum number of dimensions is 1, the maximum is 255. mvstore_1032_p=\ The following sample code shows how to create a store, open a map, add some data, and access the current and an old version\:
mvstore_1033_h2=Features mvstore_1033_h2=Store Builder
mvstore_1034_h3=Maps mvstore_1034_p=\ The <code>MVStore.Builder</code> provides a fluent interface to build a store if more complex configuration options are used. The following code contains all supported configuration options\:
mvstore_1035_p=\ Each store supports a set of named maps. A map is sorted by key, and supports the common lookup operations, including access to the first and last key, iterate over some or all keys, and so on. mvstore_1035_li=cacheSizeMB\: the cache size in MB.
mvstore_1036_p=\ Also supported, and very uncommon for maps, is fast index lookup\: the keys of the map can be accessed like a list (get the key at the given index, get the index of a certain key). That means getting the median of two keys is trivial, and range of keys can be counted very quickly. The iterator supports fast skipping. This is possible because internally, each map is organized in the form of a counted B+-tree. mvstore_1036_li=compressData\: compress the data when storing.
mvstore_1037_p=\ In database terms, a map can be used like a table, where the key of the map is the primary key of the table, and the value is the row. A map can also represent an index, where the key of the map is the key of the index, and the value of the map is the primary key of the table (for non-unique indexes, the key of the map must also contain the primary key). mvstore_1037_li=encryptionKey\: the encryption key for file encryption.
mvstore_1038_h3=Versions / Transactions mvstore_1038_li=fileName\: the name of the file, for file based stores.
mvstore_1039_p=\ Multiple versions are supported. A version is a snapshot of all the data of all maps at a given point in time. A transaction is a number of actions between two versions. mvstore_1039_li=readOnly\: open the file in read-only mode.
mvstore_1040_p=\ Versions / transactions are not immediately persisted; instead, only the version counter is incremented. If there is a change after switching to a new version, a snapshot of the old version is kept in memory, so that it can still be read. mvstore_1040_li=writeBufferSize\: the size of the write buffer in MB.
mvstore_1041_p=\ Old persisted versions are readable until the old data was explicitly overwritten. Creating a snapshot is fast\: only the pages that are changed after a snapshot are copied. This behavior is also called COW (copy on write). mvstore_1041_li=writeDelay\: the maximum delay until committed changes are stored (unless stored explicitly).
mvstore_1042_p=\ Rollback is supported (rollback to any old in-memory version or an old persisted version). mvstore_1042_h2=R-Tree
mvstore_1043_h3=In-Memory Performance and Usage mvstore_1043_p=\ The <code>MVRTreeMap</code> is an R-tree implementation that supports fast spatial queries. It can be used as follows\:
mvstore_1044_p=\ Performance of in-memory operations is comparable with <code>java.util.TreeMap</code> (many operations are actually faster), but usually slower than <code>java.util.HashMap</code>. mvstore_1044_p=\ The default number of dimensions is 2. To use a different number of dimensions, call <code>new MVRTreeMap.Builder&lt;String&gt;().dimensions(3)</code>. The minimum number of dimensions is 1, the maximum is 255.
mvstore_1045_p=\ The memory overhead for large maps is slightly better than for the regular map implementations, but there is a higher overhead per map. For maps with less than 25 entries, the regular map implementations use less memory on average. mvstore_1045_h2=Features
mvstore_1046_p=\ If no file name is specified, the store operates purely in memory. Except for persisting data, all features are supported in this mode (multi-versioning, index lookup, R-tree and so on). If a file name is specified, all operations occur in memory (with the same performance characteristics) until data is persisted. mvstore_1046_h3=Maps
mvstore_1047_h3=Pluggable Data Types mvstore_1047_p=\ Each store supports a set of named maps. A map is sorted by key, and supports the common lookup operations, including access to the first and last key, iterate over some or all keys, and so on.
mvstore_1048_p=\ Serialization is pluggable. The default serialization currently supports many common data types, and uses Java serialization for other objects. The following classes are currently directly supported\: <code>Boolean, Byte, Short, Character, Integer, Long, Float, Double, BigInteger, BigDecimal, String, UUID, Date</code> and arrays (both primitive arrays and object arrays). mvstore_1048_p=\ Also supported, and very uncommon for maps, is fast index lookup\: the keys of the map can be accessed like a list (get the key at the given index, get the index of a certain key). That means getting the median of two keys is trivial, and range of keys can be counted very quickly. The iterator supports fast skipping. This is possible because internally, each map is organized in the form of a counted B+-tree.
mvstore_1049_p=\ Parameterized data types are supported (for example one could build a string data type that limits the length for some reason). mvstore_1049_p=\ In database terms, a map can be used like a table, where the key of the map is the primary key of the table, and the value is the row. A map can also represent an index, where the key of the map is the key of the index, and the value of the map is the primary key of the table (for non-unique indexes, the key of the map must also contain the primary key).
mvstore_1050_p=\ The storage engine itself does not have any length limits, so that keys, values, pages, and chunks can be very big (as big as fits in memory). Also, there is no inherent limit to the number of maps and chunks. Due to using a log structured storage, there is no special case handling for large keys or pages. mvstore_1050_h3=Versions
mvstore_1051_h3=BLOB Support mvstore_1051_p=\ Multiple versions are supported. A version is a snapshot of all the data of all maps at a given point in time. A transaction is a number of actions between two versions.
mvstore_1052_p=\ There is a mechanism that stores large binary objects by splitting them into smaller blocks. This allows to store objects that don't fit in memory. Streaming as well as random access reads on such objects are supported. This tool is written on top of the store (only using the map interface). mvstore_1052_p=\ Versions are not immediately persisted; instead, only the version counter is incremented. If there is a change after switching to a new version, a snapshot of the old version is kept in memory, so that it can still be read.
mvstore_1053_h3=R-Tree and Pluggable Map Implementations mvstore_1053_p=\ Old persisted versions are readable until the old data was explicitly overwritten. Creating a snapshot is fast\: only the pages that are changed after a snapshot are copied. This behavior is also called COW (copy on write).
mvstore_1054_p=\ The map implementation is pluggable. In addition to the default <code>MVMap</code> (multi-version map), there is a multi-version R-tree map implementation for spatial operations (contain and intersection; nearest neighbor is not yet implemented). mvstore_1054_p=\ Rollback is supported (rollback to any old in-memory version or an old persisted version).
mvstore_1055_h3=Concurrent Operations and Caching mvstore_1055_h3=Transactions
mvstore_1056_p=\ The default map implementation supports concurrent reads on old versions of the data. All such read operations can occur in parallel. Concurrent reads from the page cache, as well as concurrent reads from the file system are supported. mvstore_1056_p=\ The multi-version support is the basis for the transaction support. In the simple case, when only one transaction is open at a time, rolling back the transaction only requires to revert to an old version.
mvstore_1057_p=\ Storing changes can occur concurrently to modifying the data, as it operates on a snapshot. mvstore_1057_p=\ To support multiple concurrent open transactions, a transaction utility is included, the <code>TransactionStore</code>. This utility stores the changed entries in a separate map, similar to a transaction log (except that only the key of a changed row is stored, and the entries of a transaction are removed when the transaction is committed). The storage overhead of this utility is very small compared to the overhead of a regular transaction log. The tool supports PostgreSQL style "read committed" transaction isolation. There is no limit on the size of a transaction (the log is not kept in memory).
mvstore_1058_p=\ Caching is done on the page level. The page cache is a concurrent LIRS cache, which should be resistant against scan operations. mvstore_1058_h3=In-Memory Performance and Usage
mvstore_1059_p=\ The default map implementation does not support concurrent modification operations on a map (the same as <code>HashMap</code> and <code>TreeMap</code>). Similar to those classes, the map tries to detect concurrent modification. mvstore_1059_p=\ Performance of in-memory operations is comparable with <code>java.util.TreeMap</code> (many operations are actually faster), but usually slower than <code>java.util.HashMap</code>.
mvstore_1060_p=\ With the <code>MVMapConcurrent</code> implementation, read operations even on the newest version can happen concurrently with all other operations, without risk of corruption. This comes with slightly reduced speed in single threaded mode, the same as with other <code>ConcurrentHashMap</code> implementations. Write operations first read the relevant area from disk to memory (this can happen concurrently), and only then modify the data. The in-memory part of write operations is synchronized. mvstore_1060_p=\ The memory overhead for large maps is slightly better than for the regular map implementations, but there is a higher overhead per map. For maps with less than 25 entries, the regular map implementations use less memory on average.
mvstore_1061_p=\ For fully scalable concurrent write operations to a map (in-memory and to disk), the map could be split into multiple maps in different stores ('sharding'). The plan is to add such a mechanism later when needed. mvstore_1061_p=\ If no file name is specified, the store operates purely in memory. Except for persisting data, all features are supported in this mode (multi-versioning, index lookup, R-tree and so on). If a file name is specified, all operations occur in memory (with the same performance characteristics) until data is persisted.
mvstore_1062_h3=Log Structured Storage mvstore_1062_h3=Pluggable Data Types
mvstore_1063_p=\ Changes are buffered in memory, and once enough changes have accumulated, they are written in one continuous disk write operation. (According to a test, write throughput of a common SSD gets higher the larger the block size, until a block size of 2 MB, and then does not further increase.) By default, committed changes are automatically written once every second in a background thread, even if only little data was changed. Changes can also be written explicitly by calling <code>store()</code>. To avoid out of memory, uncommitted changes are also written when needed, however they are rolled back when closing the store, or at the latest (when the store was not correctly closed) when opening the store. mvstore_1063_p=\ Serialization is pluggable. The default serialization currently supports many common data types, and uses Java serialization for other objects. The following classes are currently directly supported\: <code>Boolean, Byte, Short, Character, Integer, Long, Float, Double, BigInteger, BigDecimal, String, UUID, Date</code> and arrays (both primitive arrays and object arrays).
mvstore_1064_p=\ When storing, all changed pages are serialized, optionally compressed using the LZF algorithm, and written sequentially to a free area of the file. Each such change set is called a chunk. All parent pages of the changed B-trees are stored in this chunk as well, so that each chunk also contains the root of each changed map (which is the entry point to read this version of the data). There is no separate index\: all data is stored as a list of pages. Per store, there is one additional map that contains the metadata (the list of maps, where the root page of each map is stored, and the list of chunks). mvstore_1064_p=\ Parameterized data types are supported (for example one could build a string data type that limits the length for some reason).
mvstore_1065_p=\ There are usually two write operations per chunk\: one to store the chunk data (the pages), and one to update the file header (so it points to the latest chunk). If the chunk is appended at the end of the file, the file header is only written at the end of the chunk. mvstore_1065_p=\ The storage engine itself does not have any length limits, so that keys, values, pages, and chunks can be very big (as big as fits in memory). Also, there is no inherent limit to the number of maps and chunks. Due to using a log structured storage, there is no special case handling for large keys or pages.
mvstore_1066_p=\ There is no transaction log, no undo log, and there are no in-place updates (however unused chunks are overwritten by default). mvstore_1066_h3=BLOB Support
mvstore_1067_p=\ Old data is kept for at least 45 seconds (configurable), so that there are no explicit sync operations required to guarantee data consistency, but an application can also sync explicitly when needed. To reuse disk space, the chunks with the lowest amount of live data are compacted (the live data is simply stored again in the next chunk). To improve data locality and disk space usage, the plan is to automatically defragment and compact data. mvstore_1067_p=\ There is a mechanism that stores large binary objects by splitting them into smaller blocks. This allows to store objects that don't fit in memory. Streaming as well as random access reads on such objects are supported. This tool is written on top of the store (only using the map interface).
mvstore_1068_p=\ Compared to traditional storage engines (that use a transaction log, undo log, and main storage area), the log structured storage is simpler, more flexible, and typically needs less disk operations per change, as data is only written once instead of twice or 3 times, and because the B-tree pages are always full (they are stored next to each other) and can be easily compressed. But temporarily, disk space usage might actually be a bit higher than for a regular database, as disk space is not immediately re-used (there are no in-place updates). mvstore_1068_h3=R-Tree and Pluggable Map Implementations
mvstore_1069_h3=File System Abstraction, File Locking and Online Backup mvstore_1069_p=\ The map implementation is pluggable. In addition to the default <code>MVMap</code> (multi-version map), there is a multi-version R-tree map implementation for spatial operations (contain and intersection; nearest neighbor is not yet implemented).
mvstore_1070_p=\ The file system is pluggable (the same file system abstraction is used as H2 uses). The file can be encrypted using an encrypting file system. Other file system implementations support reading from a compressed zip or jar file. mvstore_1070_h3=Concurrent Operations and Caching
mvstore_1071_p=\ Each store may only be opened once within a JVM. When opening a store, the file is locked in exclusive mode, so that the file can only be changed from within one process. Files can be opened in read-only mode, in which case a shared lock is used. mvstore_1071_p=\ The default map implementation supports concurrent reads on old versions of the data. All such read operations can occur in parallel. Concurrent reads from the page cache, as well as concurrent reads from the file system are supported.
mvstore_1072_p=\ The persisted data can be backed up to a different file at any time, even during write operations (online backup). To do that, automatic disk space reuse needs to be first disabled, so that new data is always appended at the end of the file. Then, the file can be copied (the file handle is available to the application). mvstore_1072_p=\ Storing changes can occur concurrently to modifying the data, as it operates on a snapshot.
mvstore_1073_h3=Encrypted Files mvstore_1073_p=\ Caching is done on the page level. The page cache is a concurrent LIRS cache, which should be resistant against scan operations.
mvstore_1074_p=\ File encryption ensures the data can only be read with the correct password. Data can be encrypted as follows\: mvstore_1074_p=\ The default map implementation does not support concurrent modification operations on a map (the same as <code>HashMap</code> and <code>TreeMap</code>). Similar to those classes, the map tries to detect concurrent modification.
mvstore_1075_p=\ The following algorithms and settings are used\: mvstore_1075_p=\ With the <code>MVMapConcurrent</code> implementation, read operations even on the newest version can happen concurrently with all other operations, without risk of corruption. This comes with slightly reduced speed in single threaded mode, the same as with other <code>ConcurrentHashMap</code> implementations. Write operations first read the relevant area from disk to memory (this can happen concurrently), and only then modify the data. The in-memory part of write operations is synchronized.
mvstore_1076_li=The password char array is cleared after use, to reduce the risk that the password is stolen even if the attacker has access to the main memory. mvstore_1076_p=\ For fully scalable concurrent write operations to a map (in-memory and to disk), the map could be split into multiple maps in different stores ('sharding'). The plan is to add such a mechanism later when needed.
mvstore_1077_li=The password is hashed according to the PBKDF2 standard, using the SHA-256 hash algorithm. mvstore_1077_h3=Log Structured Storage
mvstore_1078_li=The length of the salt is 64 bits, so that an attacker can not use a pre-calculated password hash table (rainbow table). It is generated using a cryptographically secure random number generator. mvstore_1078_p=\ Changes are buffered in memory, and once enough changes have accumulated, they are written in one continuous disk write operation. (According to a test, write throughput of a common SSD gets higher the larger the block size, until a block size of 2 MB, and then does not further increase.) By default, committed changes are automatically written once every second in a background thread, even if only little data was changed. Changes can also be written explicitly by calling <code>store()</code>. To avoid out of memory, uncommitted changes are also written when needed, however they are rolled back when closing the store, or at the latest (when the store was not correctly closed) when opening the store.
mvstore_1079_li=To speed up opening an encrypted store on Android, the number of PBKDF2 iterations is 10. The higher the value, the better the protection against brute-force password cracking attacks, but the slower opening a file becomes. mvstore_1079_p=\ When storing, all changed pages are serialized, optionally compressed using the LZF algorithm, and written sequentially to a free area of the file. Each such change set is called a chunk. All parent pages of the changed B-trees are stored in this chunk as well, so that each chunk also contains the root of each changed map (which is the entry point to read this version of the data). There is no separate index\: all data is stored as a list of pages. Per store, there is one additional map that contains the metadata (the list of maps, where the root page of each map is stored, and the list of chunks).
mvstore_1080_li=The file itself is encrypted using the standardized disk encryption mode XTS-AES. Only little more than one AES-128 round per block is needed. mvstore_1080_p=\ There are usually two write operations per chunk\: one to store the chunk data (the pages), and one to update the file header (so it points to the latest chunk). If the chunk is appended at the end of the file, the file header is only written at the end of the chunk.
mvstore_1081_h3=Tools mvstore_1081_p=\ There is no transaction log, no undo log, and there are no in-place updates (however unused chunks are overwritten by default).
mvstore_1082_p=\ There is a tool (<code>MVStoreTool</code>) to dump the contents of a file. mvstore_1082_p=\ Old data is kept for at least 45 seconds (configurable), so that there are no explicit sync operations required to guarantee data consistency, but an application can also sync explicitly when needed. To reuse disk space, the chunks with the lowest amount of live data are compacted (the live data is simply stored again in the next chunk). To improve data locality and disk space usage, the plan is to automatically defragment and compact data.
mvstore_1083_h3=Exception Handling mvstore_1083_p=\ Compared to traditional storage engines (that use a transaction log, undo log, and main storage area), the log structured storage is simpler, more flexible, and typically needs less disk operations per change, as data is only written once instead of twice or 3 times, and because the B-tree pages are always full (they are stored next to each other) and can be easily compressed. But temporarily, disk space usage might actually be a bit higher than for a regular database, as disk space is not immediately re-used (there are no in-place updates).
mvstore_1084_p=\ This tool does not throw checked exceptions. Instead, unchecked exceptions are thrown if needed. The error message always contains the version of the tool. The following exceptions can occur\: mvstore_1084_h3=File System Abstraction, File Locking and Online Backup
mvstore_1085_code=IllegalStateException mvstore_1085_p=\ The file system is pluggable (the same file system abstraction is used as H2 uses). The file can be encrypted using an encrypting file system. Other file system implementations support reading from a compressed zip or jar file.
mvstore_1086_li=\ if a map was already closed or an IO exception occurred, for example if the file was locked, is already closed, could not be opened or closed, if reading or writing failed, if the file is corrupt, or if there is an internal error in the tool. mvstore_1086_p=\ Each store may only be opened once within a JVM. When opening a store, the file is locked in exclusive mode, so that the file can only be changed from within one process. Files can be opened in read-only mode, in which case a shared lock is used.
mvstore_1087_code=IllegalArgumentException mvstore_1087_p=\ The persisted data can be backed up to a different file at any time, even during write operations (online backup). To do that, automatic disk space reuse needs to be first disabled, so that new data is always appended at the end of the file. Then, the file can be copied (the file handle is available to the application).
mvstore_1088_li=\ if a method was called with an illegal argument. mvstore_1088_h3=Encrypted Files
mvstore_1089_code=UnsupportedOperationException mvstore_1089_p=\ File encryption ensures the data can only be read with the correct password. Data can be encrypted as follows\:
mvstore_1090_li=\ if a method was called that is not supported, for example trying to modify a read-only map or view. mvstore_1090_p=\ The following algorithms and settings are used\:
mvstore_1091_code=ConcurrentModificationException mvstore_1091_li=The password char array is cleared after use, to reduce the risk that the password is stolen even if the attacker has access to the main memory.
mvstore_1092_li=\ if the object is modified concurrently. mvstore_1092_li=The password is hashed according to the PBKDF2 standard, using the SHA-256 hash algorithm.
mvstore_1093_h2=Similar Projects and Differences to Other Storage Engines mvstore_1093_li=The length of the salt is 64 bits, so that an attacker can not use a pre-calculated password hash table (rainbow table). It is generated using a cryptographically secure random number generator.
mvstore_1094_p=\ Unlike similar storage engines like LevelDB and Kyoto Cabinet, the MVStore is written in Java and can easily be embedded in a Java and Android application. mvstore_1094_li=To speed up opening an encrypted store on Android, the number of PBKDF2 iterations is 10. The higher the value, the better the protection against brute-force password cracking attacks, but the slower opening a file becomes.
mvstore_1095_p=\ The MVStore is somewhat similar to the Berkeley DB Java Edition because it is also written in Java, and is also a log structured storage, but the H2 license is more liberal. mvstore_1095_li=The file itself is encrypted using the standardized disk encryption mode XTS-AES. Only little more than one AES-128 round per block is needed.
mvstore_1096_p=\ Like SQLite, the MVStore keeps all data in one file. Unlike SQLite, the MVStore uses a log structured storage. The plan is to make the MVStore both easier to use as well as faster than SQLite. In a recent (very simple) test, the MVStore was about twice as fast as SQLite on Android. mvstore_1096_h3=Tools
mvstore_1097_p=\ The API of the MVStore is similar to MapDB (previously known as JDBM) from Jan Kotek, and some code is shared between MapDB and JDBM. However, unlike MapDB, the MVStore uses a log structured storage. The MVStore does not have a record size limit. mvstore_1097_p=\ There is a tool (<code>MVStoreTool</code>) to dump the contents of a file.
mvstore_1098_h2=Current State mvstore_1098_h3=Exception Handling
mvstore_1099_p=\ The code is still experimental at this stage. The API as well as the behavior may partially change. Features may be added and removed (even though the main features will stay). mvstore_1099_p=\ This tool does not throw checked exceptions. Instead, unchecked exceptions are thrown if needed. The error message always contains the version of the tool. The following exceptions can occur\:
mvstore_1100_h2=Requirements mvstore_1100_code=IllegalStateException
mvstore_1101_p=\ The MVStore is included in the latest H2 jar file. mvstore_1101_li=\ if a map was already closed or an IO exception occurred, for example if the file was locked, is already closed, could not be opened or closed, if reading or writing failed, if the file is corrupt, or if there is an internal error in the tool.
mvstore_1102_p=\ There are no special requirements to use it. The MVStore should run on any JVM as well as on Android. mvstore_1102_code=IllegalArgumentException
mvstore_1103_p=\ To build just the MVStore (without the database engine), run\: mvstore_1103_li=\ if a method was called with an illegal argument.
mvstore_1104_p=\ This will create the file <code>bin/h2mvstore-1.3.170.jar</code> (about 130 KB). mvstore_1104_code=UnsupportedOperationException
mvstore_1105_li=\ if a method was called that is not supported, for example trying to modify a read-only map or view.
mvstore_1106_code=ConcurrentModificationException
mvstore_1107_li=\ if the object is modified concurrently.
mvstore_1108_h2=Similar Projects and Differences to Other Storage Engines
mvstore_1109_p=\ Unlike similar storage engines like LevelDB and Kyoto Cabinet, the MVStore is written in Java and can easily be embedded in a Java and Android application.
mvstore_1110_p=\ The MVStore is somewhat similar to the Berkeley DB Java Edition because it is also written in Java, and is also a log structured storage, but the H2 license is more liberal.
mvstore_1111_p=\ Like SQLite, the MVStore keeps all data in one file. Unlike SQLite, the MVStore uses a log structured storage. The plan is to make the MVStore both easier to use as well as faster than SQLite. In a recent (very simple) test, the MVStore was about twice as fast as SQLite on Android.
mvstore_1112_p=\ The API of the MVStore is similar to MapDB (previously known as JDBM) from Jan Kotek, and some code is shared between MapDB and JDBM. However, unlike MapDB, the MVStore uses a log structured storage. The MVStore does not have a record size limit.
mvstore_1113_h2=Current State
mvstore_1114_p=\ The code is still experimental at this stage. The API as well as the behavior may partially change. Features may be added and removed (even though the main features will stay).
mvstore_1115_h2=Requirements
mvstore_1116_p=\ The MVStore is included in the latest H2 jar file.
mvstore_1117_p=\ There are no special requirements to use it. The MVStore should run on any JVM as well as on Android.
mvstore_1118_p=\ To build just the MVStore (without the database engine), run\:
mvstore_1119_p=\ This will create the file <code>bin/h2mvstore-1.3.170.jar</code> (about 130 KB).
performance_1000_h1=Performance performance_1000_h1=Performance
performance_1001_a=\ Performance Comparison performance_1001_a=\ Performance Comparison
performance_1002_a=\ PolePosition Benchmark performance_1002_a=\ PolePosition Benchmark
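The "Encrypted Files" entries in the documentation hunk above list the password handling settings: PBKDF2 with SHA-256, a 64-bit salt from a secure random number generator, 10 iterations, and clearing the password array after use. The following is a minimal sketch of those settings using the JDK's standard `SecretKeyFactory`, not the code H2 itself uses. The class name `Pbkdf2Sketch`, the method names, and the 128-bit key length are assumptions made for illustration.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;

// Hypothetical helper, not part of H2: derives a key from a password
// with the settings named in the documentation text.
final class Pbkdf2Sketch {
    static final int ITERATIONS = 10; // value stated in the documentation
    static final int KEY_BITS = 128;  // assumption: one AES-128 key

    // 64-bit salt from a cryptographically secure random number generator
    static byte[] newSalt() {
        byte[] salt = new byte[8];
        new SecureRandom().nextBytes(salt);
        return salt;
    }

    // Derives the key and clears the caller's password array afterwards.
    // Only unchecked exceptions are thrown, matching the documented style.
    static byte[] deriveKey(char[] password, byte[] salt) {
        PBEKeySpec spec = new PBEKeySpec(password, salt, ITERATIONS, KEY_BITS);
        try {
            return SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256")
                    .generateSecret(spec).getEncoded();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Key derivation failed", e);
        } finally {
            spec.clearPassword();
            Arrays.fill(password, (char) 0);
        }
    }
}
```

Deriving the key twice with the same password and salt yields the same bytes, which is what allows a store to be reopened later. A higher iteration count slows down both legitimate opens and brute-force attempts.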
......
...@@ -1011,9 +1011,9 @@ public class MVMap<K, V> extends AbstractMap<K, V> ...@@ -1011,9 +1011,9 @@ public class MVMap<K, V> extends AbstractMap<K, V>
Page newest = null; Page newest = null;
// need to copy because it can change // need to copy because it can change
Page r = root; Page r = root;
if (version >= r.getVersion() && if (version >= r.getVersion() &&
(r.getVersion() >= 0 || (r.getVersion() >= 0 ||
version <= createVersion || version <= createVersion ||
store.getFile() == null)) { store.getFile() == null)) {
newest = r; newest = r;
} else { } else {
......
...@@ -95,6 +95,7 @@ TODO: ...@@ -95,6 +95,7 @@ TODO:
- to save space when persisting very small transactions, - to save space when persisting very small transactions,
-- use a transaction log where only the deltas are stored -- use a transaction log where only the deltas are stored
- serialization for lists, sets, sets, sorted sets, maps, sorted maps - serialization for lists, sets, sets, sorted sets, maps, sorted maps
- maybe rename 'rollback' to 'revert'
*/ */
......
/*
* Copyright 2004-2013 H2 Group. Multiple-Licensed under the H2 License,
* Version 1.0, and under the Eclipse Public License, Version 1.0
* (http://h2database.com/html/license.html).
* Initial Developer: H2 Group
*/
package org.h2.mvstore;
import java.util.Map;
/**
* A store that supports concurrent transactions.
*/
public class TransactionStore {
/**
* The store.
*/
final MVStore store;
/**
* The map of open transactions.
* Key: transactionId, value: baseVersion.
*/
final MVMap<Long, Long> openTransactions;
/**
* The undo log.
* Key: [ transactionId, logId ], value: [ baseVersion, mapId, key ].
*/
final MVMap<long[], Object[]> undoLog;
/**
* The lock timeout in milliseconds. 0 means timeout immediately.
*/
long lockTimeout;
/**
* The transaction settings. The entry "lastTransaction" contains the last transaction id.
*/
private final MVMap<String, String> settings;
private long lastTransactionId;
/**
* Create a new transaction store.
*
* @param store the store
*/
public TransactionStore(MVStore store) {
this.store = store;
settings = store.openMap("settings");
openTransactions = store.openMap("openTransactions",
new MVMapConcurrent.Builder<Long, Long>());
// TODO one undo log per transaction to speed up commit
// (alternative: add a range delete operation for maps)
undoLog = store.openMap("undoLog",
new MVMapConcurrent.Builder<long[], Object[]>());
init();
}
private void init() {
String s = settings.get("lastTransaction");
if (s != null) {
lastTransactionId = Long.parseLong(s);
}
Long t = openTransactions.lastKey();
if (t != null) {
if (t.longValue() > lastTransactionId) {
throw DataUtils.newIllegalStateException("Last transaction not stored");
}
// TODO rollback all old, stored transactions (if there are any)
}
}
/**
* Close the transaction store.
*/
public synchronized void close() {
settings.put("lastTransaction", "" + lastTransactionId);
}
/**
* Begin a new transaction.
*
* @return the transaction
*/
public synchronized Transaction begin() {
long baseVersion = store.getCurrentVersion();
store.incrementVersion();
long transactionId = lastTransactionId++;
if (lastTransactionId % 32 == 0) {
// parentheses are needed: without them, "32" is appended as text
settings.put("lastTransaction", "" + (lastTransactionId + 32));
}
openTransactions.put(transactionId, baseVersion);
return new Transaction(this, transactionId);
}
/**
* Commit a transaction.
*
* @param transactionId the transaction id
* @param maxLogId the last log id
*/
void commit(long transactionId, long maxLogId) {
// TODO commit should be much faster
store.incrementVersion();
for (long logId = 0; logId < maxLogId; logId++) {
Object[] op = undoLog.get(new long[] {
transactionId, logId });
int mapId = ((Integer) op[1]).intValue();
Map<String, String> meta = store.getMetaMap();
String m = meta.get("map." + mapId);
String mapName = DataUtils.parseMap(m).get("name");
MVMap<Object, Object[]> map = store.openMap(mapName);
Object key = op[2];
Object[] value = map.get(key);
if (value == null) {
// already removed
} else if (value[2] == null) {
// remove the value
map.remove(key);
}
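// the undo log key is [ transactionId, logId ], not the log id alone
undoLog.remove(new long[] { transactionId, logId });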
}
openTransactions.remove(transactionId);
store.commit();
}
/**
* Roll a transaction back.
*
* @param transactionId the transaction id
* @param maxLogId the last log id
*/
void rollback(long transactionId, long maxLogId) {
rollbackTo(transactionId, maxLogId, 0);
openTransactions.remove(transactionId);
store.commit();
}
/**
* Roll back to an old savepoint.
*
* @param transactionId the transaction id
* @param maxLogId the last log id
* @param toLogId the log id to roll back to
*/
void rollbackTo(long transactionId, long maxLogId, long toLogId) {
store.incrementVersion();
for (long logId = maxLogId - 1; logId >= toLogId; logId--) {
Object[] op = undoLog.get(new long[] {
transactionId, logId });
int mapId = ((Integer) op[1]).intValue();
Map<String, String> meta = store.getMetaMap();
String m = meta.get("map." + mapId);
String mapName = DataUtils.parseMap(m).get("name");
MVMap<Object, Object[]> map = store.openMap(mapName);
Object key = op[2];
Object[] value = map.get(key);
if (value != null) {
Long oldVersion = (Long) value[1];
if (oldVersion == null) {
// this transaction added the value
map.remove(key);
} else if (oldVersion < map.getCreateVersion()) {
map.remove(key);
} else {
// this transaction updated the value
MVMap<Object, Object[]> mapOld = map
.openVersion(oldVersion);
Object[] old = mapOld.get(key);
if (old == null) {
map.remove(key);
} else {
map.put(key, old);
}
}
}
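// the undo log key is [ transactionId, logId ], not the log id alone
undoLog.remove(new long[] { transactionId, logId });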
}
store.commit();
}
/**
* A transaction.
*/
public static class Transaction {
/**
* The transaction store.
*/
final TransactionStore store;
/**
* The transaction id.
*/
final long transactionId;
private long logId;
private boolean closed;
Transaction(TransactionStore store, long transactionId) {
this.store = store;
this.transactionId = transactionId;
}
/**
* Create a new savepoint.
*
* @return the savepoint id
*/
public long setSavepoint() {
store.store.incrementVersion();
return logId;
}
/**
* Add a log entry.
*
* @param baseVersion the old version
* @param mapId the map id
* @param key the key
*/
void log(long baseVersion, int mapId, Object key) {
long[] undoKey = { transactionId, logId++ };
Object[] log = new Object[] { baseVersion, mapId, key };
store.undoLog.put(undoKey, log);
}
/**
* Open a data map.
*
* @param <K> the key type
* @param <V> the value type
* @param name the name of the map
* @return the transaction map
*/
public <K, V> TransactionMap<K, V> openMap(String name) {
return new TransactionMap<K, V>(this, name);
}
/**
* Commit the transaction. Afterwards, this transaction is closed.
*/
public void commit() {
closed = true;
store.commit(transactionId, logId);
}
/**
* Roll the transaction back. Afterwards, this transaction is closed.
*/
public void rollback() {
closed = true;
store.rollback(transactionId, logId);
}
/**
* Roll back to the given savepoint.
*
* @param savepointId the savepoint id
*/
public void rollbackToSavepoint(long savepointId) {
store.rollbackTo(transactionId, this.logId, savepointId);
this.logId = savepointId;
}
/**
* Check whether this transaction is still open.
*/
void checkOpen() {
if (closed) {
throw DataUtils.newIllegalStateException("Transaction is closed");
}
}
}
/**
* A map that supports transactions.
*
* @param <K> the key type
* @param <V> the value type
*/
public static class TransactionMap<K, V> {
private Transaction transaction;
/**
* The newest version of the data.
* Key: key.
* Value: { transactionId, oldVersion, value }
*/
private final MVMap<K, Object[]> map;
private final int mapId;
TransactionMap(Transaction transaction, String name) {
this.transaction = transaction;
map = transaction.store.store.openMap(name);
mapId = map.getId();
}
/**
* Get the size of the map as seen by this transaction.
*
* @return the size
*/
public long size() {
// TODO this method is very slow
long size = 0;
Cursor<K> cursor = map.keyIterator(null);
while (cursor.hasNext()) {
K key = cursor.next();
if (get(key) != null) {
size++;
}
}
return size;
}
private void checkOpen() {
transaction.checkOpen();
}
/**
* Update the value for the given key. If the row is locked, this method
* will retry until the row could be updated or until a lock timeout.
*
* @param key the key
* @param value the new value (null to remove the row)
* @throws IllegalStateException if a lock timeout occurs
*/
public void put(K key, V value) {
checkOpen();
long start = 0;
while (true) {
boolean ok = tryPut(key, value);
if (ok) {
return;
}
// an uncommitted transaction:
// wait until it is committed, or until the lock timeout
long timeout = transaction.store.lockTimeout;
if (timeout == 0) {
throw DataUtils.newIllegalStateException("Lock timeout");
}
if (start == 0) {
start = System.currentTimeMillis();
} else {
long t = System.currentTimeMillis() - start;
if (t > timeout) {
throw DataUtils.newIllegalStateException("Lock timeout");
}
// TODO use wait/notify instead
try {
Thread.sleep(1);
} catch (InterruptedException e) {
// ignore
}
}
}
}
/**
* Try to update the value for the given key. This will fail if the row
* is locked by another transaction (that means, if another open
* transaction added or updated the row).
*
* @param key the key
* @param value the new value
* @return whether the value could be updated
*/
public boolean tryPut(K key, V value) {
Object[] current = map.get(key);
long oldVersion = transaction.store.store.getCurrentVersion() - 1;
Object[] newValue = { transaction.transactionId, oldVersion, value };
if (current == null) {
// a new value
newValue[1] = null;
Object[] old = map.putIfAbsent(key, newValue);
if (old == null) {
transaction.log(oldVersion, mapId, key);
return true;
}
return false;
}
long tx = ((Long) current[0]).longValue();
if (tx == transaction.transactionId) {
// added or updated by this transaction
if (map.replace(key, current, newValue)) {
if (current[1] == null) {
transaction.log(oldVersion, mapId, key);
} else {
long c = (Long) current[1];
if (c != oldVersion) {
transaction.log(oldVersion, mapId, key);
}
}
return true;
}
// strange, somebody overwrote the value
// even though the change was not committed
return false;
}
// added or updated by another transaction
Long base = transaction.store.openTransactions.get(tx);
if (base == null) {
// the transaction is committed:
// overwrite the value
if (map.replace(key, current, newValue)) {
transaction.log(oldVersion, mapId, key);
return true;
}
// somebody else was faster
return false;
}
// the transaction is not yet committed
return false;
}
/**
* Get the value for the given key.
*
* @param key the key
* @return the value or null
*/
@SuppressWarnings("unchecked")
public V get(K key) {
checkOpen();
MVMap<K, Object[]> m = map;
while (true) {
Object[] data = m.get(key);
long tx;
if (data == null) {
// doesn't exist or deleted by a committed transaction
return null;
}
tx = ((Long) data[0]).longValue();
if (tx == transaction.transactionId) {
// added by this transaction
return (V) data[2];
}
// added or updated by another transaction
Long base = transaction.store.openTransactions.get(tx);
if (base == null) {
// it is committed
return (V) data[2];
}
tx = ((Long) data[0]).longValue();
// get the value before the uncommitted transaction
if (data[1] == null) {
// a new entry
return null;
}
long oldVersion = (Long) data[1];
m = map.openVersion(oldVersion);
}
}
}
}
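The read-committed behavior implemented by `TransactionMap.get` and `tryPut` above can be illustrated with a much smaller model. The following is a simplified, single-threaded sketch and not the commit's implementation. It assumes that each row stores the previously committed value inline (the field `before`) instead of looking it up in an older map version, and it leaves out the undo log, rollback and lock timeouts. All names in it are hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of read-committed visibility.
// Not thread-safe; meant only to show the decision logic.
final class ReadCommittedSketch {
    // One stored row: the writer, the value visible before the write,
    // and the new value (null means the row was removed).
    static final class Entry {
        final long txId;
        final String before;
        final String value;

        Entry(long txId, String before, String value) {
            this.txId = txId;
            this.before = before;
            this.value = value;
        }
    }

    private final Map<String, Entry> rows = new HashMap<>();
    private final Set<Long> open = new HashSet<>();
    private long nextTxId;

    long begin() {
        long id = nextTxId++;
        open.add(id);
        return id;
    }

    void commit(long txId) {
        // once the writer is no longer open, its rows count as committed
        open.remove(txId);
    }

    String get(long txId, String key) {
        Entry e = rows.get(key);
        if (e == null) {
            return null;
        }
        if (e.txId == txId || !open.contains(e.txId)) {
            // own change, or change of a committed transaction
            return e.value;
        }
        // uncommitted change of another transaction: read the old value
        return e.before;
    }

    boolean tryPut(long txId, String key, String value) {
        Entry e = rows.get(key);
        if (e == null) {
            rows.put(key, new Entry(txId, null, value));
            return true;
        }
        if (e.txId == txId) {
            // repeated update by the same transaction keeps the old value
            rows.put(key, new Entry(txId, e.before, value));
            return true;
        }
        if (open.contains(e.txId)) {
            // the row is locked by another open transaction
            return false;
        }
        rows.put(key, new Entry(txId, e.value, value));
        return true;
    }
}
```

In this model a second transaction does not see a row added by the first one until the first one commits, and its own `tryPut` on that row fails in the meantime. This is the same pattern the test `testConcurrentTransactionsReadCommitted` checks against the real classes.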
...@@ -114,7 +114,7 @@ import org.h2.test.store.TestMVTableEngine; ...@@ -114,7 +114,7 @@ import org.h2.test.store.TestMVTableEngine;
import org.h2.test.store.TestObjectDataType; import org.h2.test.store.TestObjectDataType;
import org.h2.test.store.TestSpinLock; import org.h2.test.store.TestSpinLock;
import org.h2.test.store.TestStreamStore; import org.h2.test.store.TestStreamStore;
import org.h2.test.store.TestTransactionMap; import org.h2.test.store.TestTransactionStore;
import org.h2.test.synth.TestBtreeIndex; import org.h2.test.synth.TestBtreeIndex;
import org.h2.test.synth.TestCrashAPI; import org.h2.test.synth.TestCrashAPI;
import org.h2.test.synth.TestDiskFull; import org.h2.test.synth.TestDiskFull;
...@@ -689,7 +689,7 @@ kill -9 `jps -l | grep "org.h2.test." | cut -d " " -f 1` ...@@ -689,7 +689,7 @@ kill -9 `jps -l | grep "org.h2.test." | cut -d " " -f 1`
new TestObjectDataType().runTest(this); new TestObjectDataType().runTest(this);
new TestSpinLock().runTest(this); new TestSpinLock().runTest(this);
new TestStreamStore().runTest(this); new TestStreamStore().runTest(this);
new TestTransactionMap().runTest(this); new TestTransactionStore().runTest(this);
// unit // unit
new TestAutoReconnect().runTest(this); new TestAutoReconnect().runTest(this);
......
...@@ -733,7 +733,7 @@ public class TestMVStore extends TestBase { ...@@ -733,7 +733,7 @@ public class TestMVStore extends TestBase {
assertEquals("[10, 11, 12, 13, 14, 50, 100, 90, 91, 92]", list.toString()); assertEquals("[10, 11, 12, 13, 14, 50, 100, 90, 91, 92]", list.toString());
s.close(); s.close();
} }
private void testOldVersion() { private void testOldVersion() {
MVStore s; MVStore s;
for (int op = 0; op <= 1; op++) { for (int op = 0; op <= 1; op++) {
......
...@@ -12,25 +12,23 @@ import java.sql.ResultSet; ...@@ -12,25 +12,23 @@ import java.sql.ResultSet;
import java.sql.SQLException; import java.sql.SQLException;
import java.sql.Statement; import java.sql.Statement;
import java.util.ArrayList; import java.util.ArrayList;
import java.util.Map;
import java.util.Random; import java.util.Random;
import org.h2.mvstore.Cursor;
import org.h2.mvstore.DataUtils;
import org.h2.mvstore.MVMap;
import org.h2.mvstore.MVMapConcurrent;
import org.h2.mvstore.MVStore; import org.h2.mvstore.MVStore;
import org.h2.mvstore.TransactionStore;
import org.h2.mvstore.TransactionStore.Transaction;
import org.h2.mvstore.TransactionStore.TransactionMap;
import org.h2.test.TestBase; import org.h2.test.TestBase;
import org.h2.util.New; import org.h2.util.New;
/** /**
* Test concurrent transactions. * Test concurrent transactions.
*/ */
public class TestTransactionMap extends TestBase { public class TestTransactionStore extends TestBase {
/** /**
* Run just this test. * Run just this test.
* *
* @param a ignored * @param a ignored
*/ */
public static void main(String... a) throws Exception { public static void main(String... a) throws Exception {
...@@ -43,13 +41,13 @@ public class TestTransactionMap extends TestBase { ...@@ -43,13 +41,13 @@ public class TestTransactionMap extends TestBase {
testSingleConnection(); testSingleConnection();
testCompareWithPostgreSQL(); testCompareWithPostgreSQL();
} }
private void testSavepoint() throws Exception { private void testSavepoint() throws Exception {
MVStore s = MVStore.open(null); MVStore s = MVStore.open(null);
TransactionalStore ts = new TransactionalStore(s); TransactionStore ts = new TransactionStore(s);
Transaction tx; Transaction tx;
TransactionalMap<String, String> m; TransactionMap<String, String> m;
tx = ts.begin(); tx = ts.begin();
m = tx.openMap("test"); m = tx.openMap("test");
m.put("1", "Hello"); m.put("1", "Hello");
...@@ -61,25 +59,26 @@ public class TestTransactionMap extends TestBase { ...@@ -61,25 +59,26 @@ public class TestTransactionMap extends TestBase {
m.put("1", "Hi"); m.put("1", "Hi");
m.put("2", "."); m.put("2", ".");
m.put("3", null); m.put("3", null);
tx.rollbackTo(logId); tx.rollbackToSavepoint(logId);
assertEquals("Hallo", m.get("1")); assertEquals("Hallo", m.get("1"));
assertNull(m.get("2")); assertNull(m.get("2"));
assertEquals("!", m.get("3")); assertEquals("!", m.get("3"));
tx.rollback(); tx.rollback();
tx = ts.begin(); tx = ts.begin();
m = tx.openMap("test"); m = tx.openMap("test");
assertNull(m.get("1")); assertNull(m.get("1"));
assertNull(m.get("2")); assertNull(m.get("2"));
assertNull(m.get("3")); assertNull(m.get("3"));
ts.close();
s.close(); s.close();
} }
private void testCompareWithPostgreSQL() throws Exception { private void testCompareWithPostgreSQL() throws Exception {
ArrayList<Statement> statements = New.arrayList(); ArrayList<Statement> statements = New.arrayList();
ArrayList<Transaction> transactions = New.arrayList(); ArrayList<Transaction> transactions = New.arrayList();
ArrayList<TransactionalMap<Integer, String>> maps = New.arrayList(); ArrayList<TransactionMap<Integer, String>> maps = New.arrayList();
int connectionCount = 3, opCount = 1000, rowCount = 10; int connectionCount = 3, opCount = 1000, rowCount = 10;
try { try {
Class.forName("org.postgresql.Driver"); Class.forName("org.postgresql.Driver");
...@@ -96,9 +95,9 @@ public class TestTransactionMap extends TestBase { ...@@ -96,9 +95,9 @@ public class TestTransactionMap extends TestBase {
"drop table if exists test"); "drop table if exists test");
statements.get(0).execute( statements.get(0).execute(
"create table test(id int primary key, name varchar(255))"); "create table test(id int primary key, name varchar(255))");
MVStore s = MVStore.open(null); MVStore s = MVStore.open(null);
TransactionalStore ts = new TransactionalStore(s); TransactionStore ts = new TransactionStore(s);
for (int i = 0; i < connectionCount; i++) { for (int i = 0; i < connectionCount; i++) {
Statement stat = statements.get(i); Statement stat = statements.get(i);
// 100 ms to avoid blocking (the test is single threaded) // 100 ms to avoid blocking (the test is single threaded)
...@@ -108,25 +107,25 @@ public class TestTransactionMap extends TestBase { ...@@ -108,25 +107,25 @@ public class TestTransactionMap extends TestBase {
c.setAutoCommit(false); c.setAutoCommit(false);
Transaction transaction = ts.begin(); Transaction transaction = ts.begin();
transactions.add(transaction); transactions.add(transaction);
TransactionalMap<Integer, String> map; TransactionMap<Integer, String> map;
map = transaction.openMap("test"); map = transaction.openMap("test");
maps.add(map); maps.add(map);
} }
StringBuilder buff = new StringBuilder(); StringBuilder buff = new StringBuilder();
Random r = new Random(1); Random r = new Random(1);
try { try {
for (int i = 0; i < opCount; i++) { for (int i = 0; i < opCount; i++) {
int connIndex = r.nextInt(connectionCount); int connIndex = r.nextInt(connectionCount);
Statement stat = statements.get(connIndex); Statement stat = statements.get(connIndex);
Transaction transaction = transactions.get(connIndex); Transaction transaction = transactions.get(connIndex);
TransactionalMap<Integer, String> map = maps.get(connIndex); TransactionMap<Integer, String> map = maps.get(connIndex);
if (transaction == null) { if (transaction == null) {
transaction = ts.begin(); transaction = ts.begin();
map = transaction.openMap("test"); map = transaction.openMap("test");
transactions.set(connIndex, transaction); transactions.set(connIndex, transaction);
maps.set(connIndex, map); maps.set(connIndex, map);
// read all data, to get a snapshot // read all data, to get a snapshot
ResultSet rs = stat.executeQuery( ResultSet rs = stat.executeQuery(
"select * from test order by id"); "select * from test order by id");
...@@ -224,15 +223,17 @@ public class TestTransactionMap extends TestBase { ...@@ -224,15 +223,17 @@ public class TestTransactionMap extends TestBase {
for (Statement stat : statements) { for (Statement stat : statements) {
stat.getConnection().close(); stat.getConnection().close();
} }
ts.close();
s.close();
} }
public void testConcurrentTransactionsReadCommitted() { private void testConcurrentTransactionsReadCommitted() {
MVStore s = MVStore.open(null); MVStore s = MVStore.open(null);
TransactionalStore ts = new TransactionalStore(s); TransactionStore ts = new TransactionStore(s);
Transaction tx1, tx2; Transaction tx1, tx2;
TransactionalMap<String, String> m1, m2; TransactionMap<String, String> m1, m2;
tx1 = ts.begin(); tx1 = ts.begin();
m1 = tx1.openMap("test"); m1 = tx1.openMap("test");
...@@ -250,14 +251,14 @@ public class TestTransactionMap extends TestBase { ...@@ -250,14 +251,14 @@ public class TestTransactionMap extends TestBase {
// start new transaction to read old data // start new transaction to read old data
tx2 = ts.begin(); tx2 = ts.begin();
m2 = tx2.openMap("test"); m2 = tx2.openMap("test");
// start transaction tx1, update/delete/add // start transaction tx1, update/delete/add
tx1 = ts.begin(); tx1 = ts.begin();
m1 = tx1.openMap("test"); m1 = tx1.openMap("test");
m1.put("1", "Hallo"); m1.put("1", "Hallo");
m1.put("2", null); m1.put("2", null);
m1.put("3", "!"); m1.put("3", "!");
assertEquals("Hello", m2.get("1")); assertEquals("Hello", m2.get("1"));
assertEquals("World", m2.get("2")); assertEquals("World", m2.get("2"));
assertNull(m2.get("3")); assertNull(m2.get("3"));
...@@ -267,11 +268,11 @@ public class TestTransactionMap extends TestBase { ...@@ -267,11 +268,11 @@ public class TestTransactionMap extends TestBase {
assertEquals("Hallo", m2.get("1")); assertEquals("Hallo", m2.get("1"));
assertNull(m2.get("2")); assertNull(m2.get("2"));
assertEquals("!", m2.get("3")); assertEquals("!", m2.get("3"));
tx1 = ts.begin(); tx1 = ts.begin();
m1 = tx1.openMap("test"); m1 = tx1.openMap("test");
m1.put("2", "World"); m1.put("2", "World");
assertNull(m2.get("2")); assertNull(m2.get("2"));
assertFalse(m2.tryPut("2", null)); assertFalse(m2.tryPut("2", null));
assertFalse(m2.tryPut("2", "Welt")); assertFalse(m2.tryPut("2", "Welt"));
...@@ -282,29 +283,30 @@ public class TestTransactionMap extends TestBase { ...@@ -282,29 +283,30 @@ public class TestTransactionMap extends TestBase {
m1.put("2", null); m1.put("2", null);
assertNull(m2.get("2")); assertNull(m2.get("2"));
tx1.commit(); tx1.commit();
tx1 = ts.begin(); tx1 = ts.begin();
m1 = tx1.openMap("test"); m1 = tx1.openMap("test");
assertNull(m1.get("2")); assertNull(m1.get("2"));
m1.put("2", "World"); m1.put("2", "World");
m1.put("2", "Welt"); m1.put("2", "Welt");
tx1.rollback(); tx1.rollback();
tx1 = ts.begin(); tx1 = ts.begin();
m1 = tx1.openMap("test"); m1 = tx1.openMap("test");
assertNull(m1.get("2")); assertNull(m1.get("2"));
ts.close();
s.close(); s.close();
} }
public void testSingleConnection() { private void testSingleConnection() {
MVStore s = MVStore.open(null); MVStore s = MVStore.open(null);
TransactionalStore ts = new TransactionalStore(s); TransactionStore ts = new TransactionStore(s);
Transaction tx; Transaction tx;
TransactionalMap<String, String> m; TransactionMap<String, String> m;
// add, rollback // add, rollback
tx = ts.begin(); tx = ts.begin();
m = tx.openMap("test"); m = tx.openMap("test");
...@@ -317,7 +319,7 @@ public class TestTransactionMap extends TestBase { ...@@ -317,7 +319,7 @@ public class TestTransactionMap extends TestBase {
m = tx.openMap("test"); m = tx.openMap("test");
assertNull(m.get("1")); assertNull(m.get("1"));
assertNull(m.get("2")); assertNull(m.get("2"));
// add, commit // add, commit
tx = ts.begin(); tx = ts.begin();
m = tx.openMap("test"); m = tx.openMap("test");
...@@ -330,7 +332,7 @@ public class TestTransactionMap extends TestBase { ...@@ -330,7 +332,7 @@ public class TestTransactionMap extends TestBase {
m = tx.openMap("test"); m = tx.openMap("test");
assertEquals("Hello", m.get("1")); assertEquals("Hello", m.get("1"));
assertEquals("World", m.get("2")); assertEquals("World", m.get("2"));
// update+delete+insert, rollback // update+delete+insert, rollback
tx = ts.begin(); tx = ts.begin();
m = tx.openMap("test"); m = tx.openMap("test");
...@@ -363,347 +365,8 @@ public class TestTransactionMap extends TestBase { ...@@ -363,347 +365,8 @@ public class TestTransactionMap extends TestBase {
assertNull(m.get("2")); assertNull(m.get("2"));
assertEquals("!", m.get("3")); assertEquals("!", m.get("3"));
ts.close();
s.close(); s.close();
} }
/**
* A store that supports concurrent transactions.
*/
static class TransactionalStore {
final MVStore store;
/**
* The transaction settings. The entry "lastTransaction" holds the last transaction id.
*/
final MVMap<String, String> settings;
// key: transactionId, value: baseVersion
final MVMap<Long, Long> openTransactions;
// key: [ transactionId, logId ], value: [ baseVersion, mapId, key ]
final MVMap<long[], Object[]> undoLog;
long lastTransactionId;
/**
* The lock timeout in milliseconds. 0 means timeout immediately.
*/
long lockTimeout;
TransactionalStore(MVStore store) {
this.store = store;
settings = store.openMap("settings");
openTransactions = store.openMap("openTransactions",
new MVMapConcurrent.Builder<Long, Long>());
// TODO one undo log per transaction to speed up commit
// (alternative: add a range delete operation for maps)
undoLog = store.openMap("undoLog",
new MVMapConcurrent.Builder<long[], Object[]>());
}
synchronized void init() {
String s = settings.get("lastTransaction");
if (s != null) {
lastTransactionId = Long.parseLong(s);
}
Long t = openTransactions.lastKey();
if (t != null) {
if (t.longValue() > lastTransactionId) {
throw DataUtils.newIllegalStateException("Last transaction not stored");
}
// TODO rollback all old, stored transactions (if there are any)
}
}
synchronized void close() {
settings.put("lastTransaction", "" + lastTransactionId);
}
synchronized Transaction begin() {
long baseVersion = store.getCurrentVersion();
store.incrementVersion();
long transactionId = lastTransactionId++;
if (lastTransactionId % 32 == 0) {
settings.put("lastTransaction", "" + (lastTransactionId + 32));
}
openTransactions.put(transactionId, baseVersion);
return new Transaction(this, transactionId);
}
public void commit(long transactionId, long maxLogId) {
// TODO commit should be much faster
store.incrementVersion();
for (long logId = 0; logId < maxLogId; logId++) {
Object[] op = undoLog.get(new long[] {
transactionId, logId });
int mapId = ((Integer) op[1]).intValue();
Map<String, String> meta = store.getMetaMap();
String m = meta.get("map." + mapId);
String mapName = DataUtils.parseMap(m).get("name");
MVMap<Object, Object[]> map = store.openMap(mapName);
Object key = op[2];
Object[] value = map.get(key);
if (value == null) {
// already removed
} else if (value[2] == null) {
// remove the value
map.remove(key);
}
undoLog.remove(new long[] { transactionId, logId });
}
openTransactions.remove(transactionId);
store.commit();
}
public void rollback(long transactionId, long maxLogId) {
rollbackTo(transactionId, maxLogId, 0);
openTransactions.remove(transactionId);
store.commit();
}
public void rollbackTo(long transactionId, long maxLogId, long toLogId) {
store.incrementVersion();
for (long logId = maxLogId - 1; logId >= toLogId; logId--) {
Object[] op = undoLog.get(new long[] {
transactionId, logId });
int mapId = ((Integer) op[1]).intValue();
Map<String, String> meta = store.getMetaMap();
String m = meta.get("map." + mapId);
String mapName = DataUtils.parseMap(m).get("name");
MVMap<Object, Object[]> map = store.openMap(mapName);
Object key = op[2];
Object[] value = map.get(key);
if (value != null) {
Long oldVersion = (Long) value[1];
if (oldVersion == null) {
// this transaction added the value
map.remove(key);
} else if (oldVersion < map.getCreateVersion()) {
map.remove(key);
} else {
// this transaction updated the value
MVMap<Object, Object[]> mapOld = map
.openVersion(oldVersion);
Object[] old = mapOld.get(key);
if (old == null) {
map.remove(key);
} else {
map.put(key, old);
}
}
}
undoLog.remove(new long[] { transactionId, logId });
}
store.commit();
}
}
/**
* A transaction.
*/
static class Transaction {
final TransactionalStore store;
final long transactionId;
long logId;
private boolean closed;
Transaction(TransactionalStore store, long transactionId) {
this.store = store;
this.transactionId = transactionId;
}
public long setSavepoint() {
store.store.incrementVersion();
return logId;
}
void log(long baseVersion, int mapId, Object key) {
long[] undoKey = { transactionId, logId++ };
Object[] log = new Object[] { baseVersion, mapId, key };
store.undoLog.put(undoKey, log);
}
<K, V> TransactionalMap<K, V> openMap(String name) {
return new TransactionalMap<K, V>(this, name);
}
void commit() {
closed = true;
store.commit(transactionId, logId);
}
void rollback() {
closed = true;
store.rollback(transactionId, logId);
}
public void rollbackTo(long logId) {
store.rollbackTo(transactionId, this.logId, logId);
this.logId = logId;
}
void checkOpen() {
if (closed) {
throw DataUtils.newIllegalStateException("Transaction is closed");
}
}
}
/**
* A map that supports transactions.
*
* @param <K> the key type
* @param <V> the value type
*/
static class TransactionalMap<K, V> {
private Transaction transaction;
/**
* The newest version of the data.
* Key: key.
* Value: { transactionId, oldVersion, value }
*/
private final MVMap<K, Object[]> map;
private final int mapId;
TransactionalMap(Transaction transaction, String name) {
this.transaction = transaction;
map = transaction.store.store.openMap(name);
mapId = map.getId();
}
public long size() {
// TODO this method is very slow
long size = 0;
Cursor<K> cursor = map.keyIterator(null);
while (cursor.hasNext()) {
K key = cursor.next();
if (get(key) != null) {
size++;
}
}
return size;
}
private void checkOpen() {
transaction.checkOpen();
}
void put(K key, V value) {
checkOpen();
long start = 0;
while (true) {
boolean ok = tryPut(key, value);
if (ok) {
return;
}
// an uncommitted transaction:
// wait until it is committed, or until the lock timeout
long timeout = transaction.store.lockTimeout;
if (timeout == 0) {
throw DataUtils.newIllegalStateException("Lock timeout");
}
if (start == 0) {
start = System.currentTimeMillis();
} else {
long t = System.currentTimeMillis() - start;
if (t > timeout) {
throw DataUtils.newIllegalStateException("Lock timeout");
}
try {
Thread.sleep(1);
} catch (InterruptedException e) {
// ignore
}
}
}
}
public boolean tryPut(K key, V value) {
Object[] current = map.get(key);
long oldVersion = transaction.store.store.getCurrentVersion() - 1;
Object[] newValue = { transaction.transactionId, oldVersion, value };
if (current == null) {
// a new value
newValue[1] = null;
Object[] old = map.putIfAbsent(key, newValue);
if (old == null) {
transaction.log(oldVersion, mapId, key);
return true;
}
return false;
}
long tx = ((Long) current[0]).longValue();
if (tx == transaction.transactionId) {
// added or updated by this transaction
if (map.replace(key, current, newValue)) {
if (current[1] == null) {
transaction.log(oldVersion, mapId, key);
} else {
long c = (Long) current[1];
if (c != oldVersion) {
transaction.log(oldVersion, mapId, key);
}
}
return true;
}
// strange, somebody overwrote the value
// even though the change was not committed
return false;
}
// added or updated by another transaction
Long base = transaction.store.openTransactions.get(tx);
if (base == null) {
// the transaction is committed:
// overwrite the value
if (map.replace(key, current, newValue)) {
transaction.log(oldVersion, mapId, key);
return true;
}
// somebody else was faster
return false;
}
// the transaction is not yet committed
return false;
}
@SuppressWarnings("unchecked")
V get(K key) {
checkOpen();
MVMap<K, Object[]> m = map;
while (true) {
Object[] data = m.get(key);
long tx;
if (data == null) {
// doesn't exist or deleted by a committed transaction
return null;
}
tx = ((Long) data[0]).longValue();
if (tx == transaction.transactionId) {
// added by this transaction
return (V) data[2];
}
// added or updated by another transaction
Long base = transaction.store.openTransactions.get(tx);
if (base == null) {
// it is committed
return (V) data[2];
}
tx = ((Long) data[0]).longValue();
// get the value before the uncommitted transaction
if (data[1] == null) {
// a new entry
return null;
}
long oldVersion = (Long) data[1];
m = map.openVersion(oldVersion);
}
}
}
} }
...@@ -723,4 +723,4 @@ versioning sector survives goes ssd ambiguity sizing perspective jumps ...@@ -723,4 +723,4 @@ versioning sector survives goes ssd ambiguity sizing perspective jumps
incompressible distinguished factories throughput vectors tripodi cracking incompressible distinguished factories throughput vectors tripodi cracking
brown tweak pbkdf sharding ieee galois otterstrom sharded hruda argaul gaul brown tweak pbkdf sharding ieee galois otterstrom sharded hruda argaul gaul
simo unpredictable overtakes conditionally decreases warned coupled spin simo unpredictable overtakes conditionally decreases warned coupled spin
unsynchronized reality cores effort slice addleman koskela ville unsynchronized reality cores effort slice addleman koskela ville blocking seen
\ No newline at end of file \ No newline at end of file