Commit fa2f2473 authored by Thomas Mueller

Documentation.

Parent 0fec2501
......@@ -18,7 +18,13 @@ Change Log
<h1>Change Log</h1>
<h2>Next Version (unreleased)</h2>
<ul><li>Improved error detection when starting a server with invalid arguments,
<ul><li>MVCC: the probability of lock timeouts is now lower if multiple threads try to update the same rows.
</li><li>Building only the documentation (without compiling all classes) didn't work, specifically: ./build.sh clean javadocImpl.
</li><li>Documentation on how data is stored internally and how indexes work (in the performance section).
</li><li>Some people reported NullPointerException in FileObjectDiskMapped.
The most likely explanation is that multiple threads access the same object at the same time.
Therefore, the public methods in this class are now synchronized.
</li><li>Improved error detection when starting a server with invalid arguments,
such as "-tcpPort=9091" or "-tcpPort 9091" (as one parameter) instead of "-tcpPort", "9091".
</li><li>The function STRINGDECODE ignored characters after a non-escaped double quote.
This is no longer the case.
......
......@@ -28,6 +28,8 @@ Performance
Application Profiling</a><br />
<a href="#database_profiling">
Database Profiling</a><br />
<a href="#storage_and_indexes">
How Data is Stored and How Indexes Work</a><br />
<a href="#explain_plan">
Statement Execution Plans</a><br />
<a href="#fast_import">
......@@ -609,6 +611,116 @@ following profiling data (results vary):
-- 0% 100% 0 1 0 SET TRACE_LEVEL_FILE 3;
</pre>
<h2 id="storage_and_indexes">How Data is Stored and How Indexes Work</h2>
<p>
Internally, each row in a table is identified by a unique number, the row id.
The rows of a table are stored with the row id as the key.
The row id is a number of type long.
If a table has a single column primary key of type <code>INT</code> or <code>BIGINT</code>,
then the value of this column is the row id, otherwise the database generates the row id automatically.
The row id can be accessed in a (non-standard) way, using the <code>_ROWID_</code> pseudo-column:
</p>
<pre>
CREATE TABLE ADDRESS(FIRST_NAME VARCHAR, NAME VARCHAR, CITY VARCHAR, PHONE VARCHAR);
INSERT INTO ADDRESS VALUES('John', 'Miller', 'Berne', '123 456 789');
INSERT INTO ADDRESS VALUES('Philip', 'Jones', 'Berne', '123 012 345');
SELECT _ROWID_, * FROM ADDRESS;
</pre>
<p>
The data is stored in the database as follows:
</p>
<table>
<tr><th>_ROWID_</th><th>FIRST_NAME</th><th>NAME</th><th>CITY</th><th>PHONE</th></tr>
<tr><td>1</td><td>John</td><td>Miller</td><td>Berne</td><td>123 456 789</td></tr>
<tr><td>2</td><td>Philip</td><td>Jones</td><td>Berne</td><td>123 012 345</td></tr>
</table>
<p>
Access by row id is fast because the data is sorted by this key.
If the query condition does not contain the row id (and if no other index can be used), then all rows of the table are scanned.
A table scan iterates over all rows in the table, in the order of the row id.
To find out what strategy the database uses to retrieve the data, use <code>EXPLAIN SELECT</code>:
</p>
<pre>
SELECT * FROM ADDRESS WHERE NAME = 'Miller';
EXPLAIN SELECT PHONE FROM ADDRESS WHERE NAME = 'Miller';
SELECT
PHONE
FROM PUBLIC.ADDRESS
/* PUBLIC.ADDRESS.tableScan */
WHERE NAME = 'Miller';
</pre>
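<p>
The difference between access by row id and a table scan can be sketched with a small conceptual model.
The Python code below is only an illustration, not the database's actual storage code:
the names <code>ROWS</code>, <code>get_by_row_id</code>, and <code>table_scan</code> are hypothetical,
and the table is assumed to be a list of rows kept sorted by row id.
</p>

```python
# Conceptual model only: this is not the database's real storage code.
# The table is modelled as (row_id, row) pairs kept sorted by row id.
from bisect import bisect_left

ROWS = [
    (1, {"FIRST_NAME": "John", "NAME": "Miller", "CITY": "Berne", "PHONE": "123 456 789"}),
    (2, {"FIRST_NAME": "Philip", "NAME": "Jones", "CITY": "Berne", "PHONE": "123 012 345"}),
]
ROW_IDS = [row_id for row_id, _ in ROWS]


def get_by_row_id(row_id):
    """Binary search on the sorted row ids; returns the row or None."""
    pos = bisect_left(ROW_IDS, row_id)
    if pos < len(ROW_IDS) and ROW_IDS[pos] == row_id:
        return ROWS[pos][1]
    return None


def table_scan(column, value):
    """Visit every row in row id order and keep the matching ones."""
    return [(row_id, row) for row_id, row in ROWS if row[column] == value]
```

<p>
In this model, <code>get_by_row_id(2)</code> needs only a binary search,
while <code>table_scan("NAME", "Miller")</code> has to look at every row, which corresponds to the <code>tableScan</code> plan above.
</p>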
<p>
Internally, an index is essentially a table that contains the indexed column(s), plus the row id:
</p>
<pre>
CREATE INDEX INDEX_PLACE ON ADDRESS(CITY, NAME, FIRST_NAME);
</pre>
<p>
In the index, the data is sorted by the indexed columns.
So this index contains the following data:
</p>
<table>
<tr><th>CITY</th><th>NAME</th><th>FIRST_NAME</th><th>_ROWID_</th></tr>
<tr><td>Berne</td><td>Jones</td><td>Philip</td><td>2</td></tr>
<tr><td>Berne</td><td>Miller</td><td>John</td><td>1</td></tr>
</table>
<p>
When the database uses an index to query the data, it searches the index for the given data,
and (if required) reads the remaining columns in the main data table (retrieved using the row id).
An index on city, name, and first name makes it possible to quickly find rows when the city, name, and first name are known.
If only the city and name, or only the city is known, then the index is also used.
This index is also used when reading all rows, sorted by the indexed columns.
However, if only the first name is known, then this index is not used:
</p>
<pre>
EXPLAIN SELECT PHONE FROM ADDRESS WHERE CITY = 'Berne' AND NAME = 'Miller' AND FIRST_NAME = 'John';
SELECT
PHONE
FROM PUBLIC.ADDRESS
/* PUBLIC.INDEX_PLACE: FIRST_NAME = 'John'
AND CITY = 'Berne'
AND NAME = 'Miller'
*/
WHERE (FIRST_NAME = 'John')
AND ((CITY = 'Berne')
AND (NAME = 'Miller'));
EXPLAIN SELECT PHONE FROM ADDRESS WHERE CITY = 'Berne';
SELECT
PHONE
FROM PUBLIC.ADDRESS
/* PUBLIC.INDEX_PLACE: CITY = 'Berne' */
WHERE CITY = 'Berne';
EXPLAIN SELECT * FROM ADDRESS ORDER BY CITY, NAME, FIRST_NAME;
SELECT
ADDRESS.FIRST_NAME,
ADDRESS.NAME,
ADDRESS.CITY,
ADDRESS.PHONE
FROM PUBLIC.ADDRESS
/* PUBLIC.INDEX_PLACE */
ORDER BY 3, 2, 1
/* index sorted */;
EXPLAIN SELECT PHONE FROM ADDRESS WHERE FIRST_NAME = 'John';
SELECT
PHONE
FROM PUBLIC.ADDRESS
/* PUBLIC.ADDRESS.tableScan */
WHERE FIRST_NAME = 'John';
</pre>
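<p>
Why the index helps only when the leading columns are known can be sketched as follows.
This is a conceptual Python model under stated assumptions, not the database's implementation:
<code>INDEX_PLACE</code> is assumed to be a sorted list of (city, name, first name, row id) tuples,
and <code>index_lookup</code> is a hypothetical helper.
</p>

```python
from bisect import bisect_left

# Hypothetical in-memory copy of INDEX_PLACE:
# (CITY, NAME, FIRST_NAME, _ROWID_) tuples, sorted by the indexed columns.
INDEX_PLACE = sorted([
    ("Berne", "Miller", "John", 1),
    ("Berne", "Jones", "Philip", 2),
])


def index_lookup(*prefix):
    """Return the row ids of entries whose leading columns equal the prefix.

    Works for (city,), (city, name), or (city, name, first_name).
    An empty prefix returns all row ids in index order.
    """
    # Binary search for the first entry that is not smaller than the prefix.
    pos = bisect_left(INDEX_PLACE, prefix)
    row_ids = []
    # Matching entries are adjacent because the index is sorted.
    while pos < len(INDEX_PLACE) and INDEX_PLACE[pos][:len(prefix)] == prefix:
        row_ids.append(INDEX_PLACE[pos][-1])
        pos += 1
    return row_ids
```

<p>
For example, <code>index_lookup("Berne", "Miller")</code> finds its starting point with a binary search.
No such shortcut exists when only the first name is known, because entries with the same first name
are not adjacent in this sort order, so every entry would have to be checked.
</p>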
<p>
If your application often queries the table for a phone number, then it makes sense to create
an additional index on it, which then contains the following data:
</p>
<table>
<tr><th>PHONE</th><th>_ROWID_</th></tr>
<tr><td>123 012 345</td><td>2</td></tr>
<tr><td>123 456 789</td><td>1</td></tr>
</table>
<h2 id="explain_plan">Statement Execution Plans</h2>
<p>
The SQL statement <code>EXPLAIN</code> displays the indexes and optimizations the database uses for a statement.
......
......@@ -557,6 +557,7 @@ See also <a href="build.html#providing_patches">Providing Patches</a>.
</li><li>Compatibility with IBM DB2: SQL cursors.
</li><li>Single-column primary key values are always stored explicitly. This is not required.
</li><li>Compatibility with MySQL: support CREATE TABLE TEST(NAME VARCHAR(255) CHARACTER SET UTF8).
</li><li>CALL is incompatible with other databases because it returns a result set, so that CallableStatement.execute() returns true.
</li></ul>
<h2>Not Planned</h2>
......