performance.html 27.5 KB
Newer Older
1 2
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<!--
Thomas Mueller's avatar
Thomas Mueller committed
3
Copyright 2004-2009 H2 Group. Multiple-Licensed under the H2 License, Version 1.0,
4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27
and under the Eclipse Public License, Version 1.0
(http://h2database.com/html/license.html).
Initial Developer: H2 Group
-->
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head><meta http-equiv="Content-Type" content="text/html;charset=utf-8" /><title>
Performance
</title><link rel="stylesheet" type="text/css" href="stylesheet.css" />
<!-- [search] { -->
<script type="text/javascript" src="navigation.js"></script>
</head><body onload="frameMe();">
<table class="content"><tr class="content"><td class="content"><div class="contentDiv">
<!-- } -->

<h1>Performance</h1>
<a href="#performance_comparison">
    Performance Comparison</a><br />
<a href="#poleposition_benchmark">
    PolePosition Benchmark</a><br />
<a href="#application_profiling">
    Application Profiling</a><br />
<a href="#database_profiling">
    Database Profiling</a><br />
<a href="#database_performance_tuning">
28
    Database Performance Tuning</a><br />
Thomas Mueller's avatar
Thomas Mueller committed
29 30
<a href="#fast_import">
    Fast Database Import</a><br />
31

Thomas Mueller's avatar
Thomas Mueller committed
32
<h2 id="performance_comparison">Performance Comparison</h2>
33
<p>
34
In many cases H2 is faster than other
35 36 37 38 39 40 41
(open source and not open source) database engines.
Please note this is mostly a single connection benchmark run on one computer.
</p>

<h3>Embedded</h3>
<table border="1" class="bar">
<tr><th>Test Case</th><th>Unit</th><th>H2</th><th>HSQLDB</th><th>Derby</th></tr>
Thomas Mueller's avatar
Thomas Mueller committed
42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59
<tr><td>Simple: Init</td><td>ms</td><td>547</td><td>532</td><td>2594</td></tr>
<tr><td>Simple: Query (random)</td><td>ms</td><td>250</td><td>391</td><td>1515</td></tr>
<tr><td>Simple: Query (sequential)</td><td>ms</td><td>188</td><td>313</td><td>1406</td></tr>
<tr><td>Simple: Update (random)</td><td>ms</td><td>812</td><td>1750</td><td>17704</td></tr>
<tr><td>Simple: Delete (sequential)</td><td>ms</td><td>203</td><td>250</td><td>8843</td></tr>
<tr><td>Simple: Memory Usage</td><td>MB</td><td>7</td><td>11</td><td>11</td></tr>
<tr><td>BenchA: Init</td><td>ms</td><td>578</td><td>719</td><td>3328</td></tr>
<tr><td>BenchA: Transactions</td><td>ms</td><td>3047</td><td>2406</td><td>12907</td></tr>
<tr><td>BenchA: Memory Usage</td><td>MB</td><td>10</td><td>15</td><td>10</td></tr>
<tr><td>BenchB: Init</td><td>ms</td><td>2141</td><td>2406</td><td>11562</td></tr>
<tr><td>BenchB: Transactions</td><td>ms</td><td>1125</td><td>1375</td><td>3625</td></tr>
<tr><td>BenchB: Memory Usage</td><td>MB</td><td>9</td><td>11</td><td>8</td></tr>
<tr><td>BenchC: Init</td><td>ms</td><td>688</td><td>594</td><td>4500</td></tr>
<tr><td>BenchC: Transactions</td><td>ms</td><td>1906</td><td>64062</td><td>6047</td></tr>
<tr><td>BenchC: Memory Usage</td><td>MB</td><td>11</td><td>17</td><td>11</td></tr>
<tr><td>Executed statements</td><td>#</td><td>322929</td><td>322929</td><td>322929</td></tr>
<tr><td>Total time</td><td>ms</td><td>11485</td><td>74798</td><td>74031</td></tr>
<tr><td>Statements per second</td><td>#</td><td>28117</td><td>4317</td><td>4362</td></tr>
60 61 62 63 64
</table>

<h3>Client-Server</h3>
<table border="1" class="bar">
<tr><th>Test Case</th><th>Unit</th><th>H2</th><th>HSQLDB</th><th>Derby</th><th>PostgreSQL</th><th>MySQL</th></tr>
Thomas Mueller's avatar
Thomas Mueller committed
65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82
<tr><td>Simple: Init</td><td>ms</td><td>2782</td><td>2656</td><td>5625</td><td>4563</td><td>3484</td></tr>
<tr><td>Simple: Query (random)</td><td>ms</td><td>3093</td><td>2703</td><td>6688</td><td>4812</td><td>3860</td></tr>
<tr><td>Simple: Query (sequential)</td><td>ms</td><td>2969</td><td>2594</td><td>6437</td><td>4719</td><td>3625</td></tr>
<tr><td>Simple: Update (random)</td><td>ms</td><td>2969</td><td>3531</td><td>18250</td><td>5953</td><td>5125</td></tr>
<tr><td>Simple: Delete (sequential)</td><td>ms</td><td>1047</td><td>1250</td><td>6875</td><td>2485</td><td>2390</td></tr>
<tr><td>Simple: Memory Usage</td><td>MB</td><td>7</td><td>11</td><td>14</td><td>0</td><td>0</td></tr>
<tr><td>BenchA: Init</td><td>ms</td><td>2250</td><td>2453</td><td>6031</td><td>4328</td><td>3625</td></tr>
<tr><td>BenchA: Transactions</td><td>ms</td><td>10250</td><td>9016</td><td>21484</td><td>15609</td><td>11172</td></tr>
<tr><td>BenchA: Memory Usage</td><td>MB</td><td>10</td><td>15</td><td>10</td><td>0</td><td>1</td></tr>
<tr><td>BenchB: Init</td><td>ms</td><td>9500</td><td>10672</td><td>22609</td><td>19609</td><td>13406</td></tr>
<tr><td>BenchB: Transactions</td><td>ms</td><td>2734</td><td>2656</td><td>3875</td><td>4688</td><td>2531</td></tr>
<tr><td>BenchB: Memory Usage</td><td>MB</td><td>10</td><td>11</td><td>11</td><td>1</td><td>1</td></tr>
<tr><td>BenchC: Init</td><td>ms</td><td>1860</td><td>1484</td><td>6890</td><td>2219</td><td>3438</td></tr>
<tr><td>BenchC: Transactions</td><td>ms</td><td>9046</td><td>63266</td><td>18641</td><td>11703</td><td>7421</td></tr>
<tr><td>BenchC: Memory Usage</td><td>MB</td><td>12</td><td>17</td><td>13</td><td>0</td><td>1</td></tr>
<tr><td>Executed statements</td><td>#</td><td>322929</td><td>322929</td><td>322929</td><td>322929</td><td>322929</td></tr>
<tr><td>Total time</td><td>ms</td><td>48500</td><td>102281</td><td>123405</td><td>80688</td><td>60077</td></tr>
<tr><td>Statements per second</td><td>#</td><td>6658</td><td>3157</td><td>2616</td><td>4002</td><td>5375</td></tr>
83 84 85 86 87 88
</table>

<h3>Benchmark Results and Comments</h3>

<h4>H2</h4>
<p>
Thomas Mueller's avatar
Thomas Mueller committed
89
Version 1.1.114 (2009-06-01) was used for the test.
90 91 92 93 94 95
For simpler operations, the performance of H2 is about the same as for HSQLDB.
For more complex queries, the query optimizer is very important.
However H2 is not very fast in every case, certain kind of queries may still be slow.
One situation where is H2 is slow is large result sets, because they are buffered to
disk if more than a certain number of records are returned.
The advantage of buffering is, there is no limit on the result set size.
Thomas Mueller's avatar
Thomas Mueller committed
96
The open/close time is almost fixed, because of the file locking protocol: the engine waits
Thomas Mueller's avatar
Thomas Mueller committed
97
some time after opening a database to ensure the database files are not opened by another process.
98 99 100 101 102 103
</p>

<h4>HSQLDB</h4>
<p>
Version 1.8.0.10 was used for the test.
Cached tables are used in this test (hsqldb.default_table_type=cached),
104
and the write delay is 1 second (<code>SET WRITE_DELAY 1</code>).
105 106 107 108
HSQLDB is fast when using simple operations.
HSQLDB is very slow in the last test (BenchC: Transactions), probably because is has a bad query optimizer.
One query where HSQLDB is slow is a two-table join:
</p>
109
<pre>
110
SELECT COUNT(DISTINCT S_I_ID) FROM ORDER_LINE, STOCK
111
WHERE OL_W_ID=? AND OL_D_ID=? AND OL_O_ID&lt;? AND OL_O_ID&gt;=?
112 113 114 115
AND S_W_ID=? AND S_I_ID=OL_I_ID AND S_QUANTITY&lt;?
</pre>
<p>
The PolePosition benchmark also shows that the query optimizer does not do a very good job for some queries.
Thomas Mueller's avatar
Thomas Mueller committed
116 117
Another disadvantage of HSQLDB is the slow startup / shutdown time (currently not listed) when using bigger databases.
The reason is, a backup of the whole data is made whenever the database is opened or closed.
118 119 120 121 122 123
</p>

<h4>Derby</h4>
<p>
Version 10.4.2.0 was used for the test. Derby is clearly the slowest embedded database in this test.
This seems to be a structural problem, because all operations are really slow.
Thomas Mueller's avatar
Thomas Mueller committed
124
It will be hard for the developers of Derby to improve the performance to a reasonable level.
Thomas Mueller's avatar
Thomas Mueller committed
125
A few problems have been identified: leaving autocommit on is a problem for Derby.
126
If it is switched off during the whole test, the results are about 20% better for Derby.
127
Derby supports a testing mode (system property <code>derby.system.durability=test</code>) where durability is
128 129 130 131 132 133 134 135
disabled. According to the documentation, this setting should be used for testing only,
as the database may not recover after a crash. Enabling this setting improves performance
by a factor of 2.6 (embedded mode) or 1.4 (server mode). Even if enabled, Derby is still less
than half as fast as H2 in default mode.
</p>

<h4>PostgreSQL</h4>
<p>
Thomas Mueller's avatar
Thomas Mueller committed
136
Version 8.3.7 was used for the test.
137
The following options where changed in <code>postgresql.conf:
138
fsync = off, commit_delay = 1000</code>.
139 140 141 142 143 144 145
PostgreSQL is run in server mode. It looks like the base performance is slower than
MySQL, the reason could be the network layer.
The memory usage number is incorrect, because only the memory usage of the JDBC driver is measured.
</p>

<h4>MySQL</h4>
<p>
Thomas Mueller's avatar
Thomas Mueller committed
146
Version 5.1.34-community was used for the test.
147
MySQL was run with the InnoDB backend.
148 149
The setting <code>innodb_flush_log_at_trx_commit</code>
(found in the <code>my.ini</code> file) was set to 0. Otherwise (and by default), MySQL is really slow
150 151 152 153 154
(around 140 statements per second in this test) because it tries to flush the data to disk for each commit.
For small transactions (when autocommit is on) this is really slow.
But many use cases use small or relatively small transactions.
Too bad this setting is not listed in the configuration wizard,
and it always overwritten when using the wizard.
155
You need to change this setting manually in the file <code>my.ini</code>, and then restart the service.
156 157 158 159 160 161 162 163 164 165 166 167 168 169 170 171 172 173 174 175
The memory usage number is incorrect, because only the memory usage of the JDBC driver is measured.
</p>

<h4>Firebird</h4>
<p>
Firebird 1.5 (default installation) was tested, but the results are not published currently.
It is possible to run the performance test with the Firebird database,
and any information on how to configure Firebird for higher performance are welcome.
</p>

<h4>Why Oracle / MS SQL Server / DB2 are Not Listed</h4>
<p>
The license of these databases does not allow to publish benchmark results.
This doesn't mean that they are fast. They are in fact quite slow,
and need a lot of memory. But you will need to test this yourself.
SQLite was not tested because the JDBC driver doesn't support transactions.
</p>

<h3>About this Benchmark</h3>

176 177 178 179
<h4>How to Run</h4>
<p>
This test was executed as follows:
</p>
180
<pre>
181 182 183 184 185 186 187 188 189
build benchmark
</pre>

<h4>Separate Process per Database</h4>
<p>
For each database, a new process is started, to ensure the previous test does not impact
the current test.
</p>

190 191 192 193 194 195 196 197
<h4>Number of Connections</h4>
<p>
This is mostly a single-connection benchmark.
BenchB uses multiple connections; the other tests use one connection.
</p>

<h4>Real-World Tests</h4>
<p>
Thomas Mueller's avatar
Thomas Mueller committed
198 199
Good benchmarks emulate real-world use cases. This benchmark includes 4 test cases:
BenchSimple uses one table and many small updates / deletes.
200 201 202 203 204 205 206 207
BenchA is similar to the TPC-A test, but single connection / single threaded (see also: www.tpc.org).
BenchB is similar to the TPC-B test, using multiple connections (one thread per connection).
BenchC is similar to the TPC-C test, but single connection / single threaded.
</p>

<h4>Comparing Embedded with Server Databases</h4>
<p>
This is mainly a benchmark for embedded databases (where the application runs in the same
Thomas Mueller's avatar
Thomas Mueller committed
208
virtual machine as the database engine). However MySQL and PostgreSQL are not Java
209 210 211 212 213 214 215 216 217 218 219 220 221 222 223 224 225 226 227 228 229
databases and cannot be embedded into a Java application.
For the Java databases, both embedded and server modes are tested.
</p>

<h4>Test Platform</h4>
<p>
This test is run on Windows XP with the virus scanner switched off.
The VM used is Sun JDK 1.5.
</p>

<h4>Multiple Runs</h4>
<p>
When a Java benchmark is run first, the code is not fully compiled and
therefore runs slower than when running multiple times. A benchmark
should always run the same test multiple times and ignore the first run(s).
This benchmark runs three times, but only the last run is measured.
</p>

<h4>Memory Usage</h4>
<p>
It is not enough to measure the time taken, the memory usage is important as well.
Thomas Mueller's avatar
Thomas Mueller committed
230
Performance can be improved by using a bigger cache, but the amount of memory is limited.
231 232 233 234 235 236 237 238 239 240 241 242 243 244 245 246 247
HSQLDB tables are kept fully in memory by default; this benchmark
uses 'disk based' tables for all databases.
Unfortunately, it is not so easy to calculate the memory usage of PostgreSQL
and MySQL, because they run in a different process than the test. This benchmark currently
does not print memory usage of those databases.
</p>

<h4>Delayed Operations</h4>
<p>
Some databases delay some operations (for example flushing the buffers)
until after the benchmark is run. This benchmark waits between
each database tested, and each database runs in a different process (sequentially).
</p>

<h4>Transaction Commit / Durability</h4>
<p>
Durability means transaction committed to the database will not be lost.
248 249
Some databases (for example MySQL) try to enforce this by default by
calling <code>fsync()</code> to flush the buffers, but
250
most hard drives don't actually flush all data. Calling the method slows down transaction commit a lot,
251 252 253 254 255 256 257 258 259 260 261 262 263 264 265 266 267 268 269 270 271 272 273
but doesn't always make data durable. When comparing the results, it is important to
think about the effect. Many database suggest to 'batch' operations when possible.
This benchmark switches off autocommit when loading the data, and calls commit after each 1000
inserts. However many applications need 'short' transactions at runtime (a commit after each update).
This benchmark commits after each update / delete in the simple benchmark, and after each
business transaction in the other benchmarks. For databases that support delayed commits,
a delay of one second is used.
</p>

<h4>Using Prepared Statements</h4>
<p>
Wherever possible, the test cases use prepared statements.
</p>

<h4>Currently Not Tested: Startup Time</h4>
<p>
The startup time of a database engine is important as well for embedded use.
This time is not measured currently.
Also, not tested is the time used to create a database and open an existing database.
Here, one (wrapper) connection is opened at the start,
and for each step a new connection is opened and then closed.
</p>

Thomas Mueller's avatar
Thomas Mueller committed
274
<h2 id="poleposition_benchmark">PolePosition Benchmark</h2>
275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 295 296 297 298 299 300 301
<p>
The PolePosition is an open source benchmark. The algorithms are all quite simple.
It was developed / sponsored by db4o.
</p>
<table border="1" class="bar">
<tr><th>Test Case</th><th>Unit</th><th>H2</th><th>HSQLDB</th><th>MySQL</th></tr>
<tr><td>Melbourne write</td><td>ms</td><td>369</td><td>249</td><td>2022</td></tr>
<tr><td>Melbourne read</td><td>ms</td><td>47</td><td>49</td><td>93</td></tr>
<tr><td>Melbourne read_hot</td><td>ms</td><td>24</td><td>43</td><td>95</td></tr>
<tr><td>Melbourne delete</td><td>ms</td><td>147</td><td>133</td><td>176</td></tr>
<tr><td>Sepang write</td><td>ms</td><td>965</td><td>1201</td><td>3213</td></tr>
<tr><td>Sepang read</td><td>ms</td><td>765</td><td>948</td><td>3455</td></tr>
<tr><td>Sepang read_hot</td><td>ms</td><td>789</td><td>859</td><td>3563</td></tr>
<tr><td>Sepang delete</td><td>ms</td><td>1384</td><td>1596</td><td>6214</td></tr>
<tr><td>Bahrain write</td><td>ms</td><td>1186</td><td>1387</td><td>6904</td></tr>
<tr><td>Bahrain query_indexed_string</td><td>ms</td><td>336</td><td>170</td><td>693</td></tr>
<tr><td>Bahrain query_string</td><td>ms</td><td>18064</td><td>39703</td><td>41243</td></tr>
<tr><td>Bahrain query_indexed_int</td><td>ms</td><td>104</td><td>134</td><td>678</td></tr>
<tr><td>Bahrain update</td><td>ms</td><td>191</td><td>87</td><td>159</td></tr>
<tr><td>Bahrain delete</td><td>ms</td><td>1215</td><td>729</td><td>6812</td></tr>
<tr><td>Imola retrieve</td><td>ms</td><td>198</td><td>194</td><td>4036</td></tr>
<tr><td>Barcelona write</td><td>ms</td><td>413</td><td>832</td><td>3191</td></tr>
<tr><td>Barcelona read</td><td>ms</td><td>119</td><td>160</td><td>1177</td></tr>
<tr><td>Barcelona query</td><td>ms</td><td>20</td><td>5169</td><td>101</td></tr>
<tr><td>Barcelona delete</td><td>ms</td><td>388</td><td>319</td><td>3287</td></tr>
<tr><td>Total</td><td>ms</td><td>26724</td><td>53962</td><td>87112</td></tr>
</table>
302 303 304
<p>
There are a few problems with the PolePosition test:
</p>
305 306 307
<ul><li>
HSQLDB uses in-memory tables by default while H2 uses persistent tables. The HSQLDB version
included in PolePosition does not support changing this, so you need to replace
308 309
<code>poleposition-0.20/lib/hsqldb.jar</code> with a newer version (for example
<code>hsqldb-1.8.0.7.jar</code>),
310
and then use the setting
311 312
<code>hsqldb.connecturl=jdbc:hsqldb:file:data/hsqldb/dbbench2;hsqldb.default_table_type=cached;sql.enforce_size=true</code>
in the file <code>Jdbc.properties</code>.
313
</li><li>HSQLDB keeps the database open between tests, while H2 closes the database (losing all the cache).
314
To change that, use the database URL <code>jdbc:h2:file:data/h2/dbbench;DB_CLOSE_DELAY=-1</code>
315
</li><li>The amount of cache memory is quite important, specially for the PolePosition test.
316 317
Unfortunately, the PolePosition test does not take this into account.
</li></ul>
318

Thomas Mueller's avatar
Thomas Mueller committed
319
<h2 id="application_profiling">Application Profiling</h2>
320 321 322

<h3>Analyze First</h3>
<p>
323 324 325
Before trying to optimize performance, it is important to understand where the problem is (what part of the application is slow).
Blind optimization or optimization based on guesses should be avoided, because usually it is not an efficient strategy.
There are various ways to analyze an application. Sometimes two implementations can be compared using
326
<code>System.currentTimeMillis()</code>. But this does not work for complex applications with many modules, and for memory problems.
327 328
</p>
<p>
329 330
A good tool to measure both memory usage and performance is the
<a href="http://www.yourkit.com">YourKit Java Profiler</a>.
331 332 333 334
</p>
<p>
A simple way to profile an application is to use the built-in profiling tool of java. Example:
</p>
335
<pre>
336 337 338
java -Xrunhprof:cpu=samples,depth=16 com.acme.Test
</pre>
<p>
339
Unfortunately, it is only possible to profile the application from start to end. Another solution is to create
340 341
a number of full thread dumps. To do that, first run <code>jps -l</code> to get the process id, and then
run <code>jstack &lt;pid&gt;</code> or <code>kill -QUIT &lt;pid&gt;</code> (Linux) or press
342
Ctrl+C (Windows).
343 344
</p>

Thomas Mueller's avatar
Thomas Mueller committed
345
<h2 id="database_profiling">Database Profiling</h2>
346
<p>
347 348
The <code>ConvertTraceFile</code> tool generates SQL statement statistics at the end of the SQL script file.
The format used is similar to the profiling data generated when using <code>java -Xrunhprof</code>.
349 350
As an example, execute the the following script using the H2 Console:
</p>
351
<pre>
352 353 354 355 356 357 358
SET TRACE_LEVEL_FILE 3;
DROP TABLE IF EXISTS TEST;
CREATE TABLE TEST(ID INT PRIMARY KEY, NAME VARCHAR(255));
@LOOP 1000 INSERT INTO TEST VALUES(?, ?);
SET TRACE_LEVEL_FILE 0;
</pre>
<p>
359
Now convert the <code>.trace.db</code> file using the <code>ConvertTraceFile</code> tool:
360
</p>
361
<pre>
362
java -cp h2*.jar org.h2.tools.ConvertTraceFile
Thomas Mueller's avatar
Thomas Mueller committed
363
    -traceFile "~/test.trace.db" -script "~/test.sql"
364 365
</pre>
<p>
366
The generated file <code>test.sql</code> will contain the SQL statements as well as the
367 368
following profiling data (results vary):
</p>
369
<pre>
370 371 372 373 374 375 376 377 378 379 380 381 382
-----------------------------------------
-- SQL Statement Statistics
-- time: total time in milliseconds (accumulated)
-- count: how many times the statement ran
-- result: total update count or row count
-----------------------------------------
-- self accu    time   count  result sql
--  62%  62%     158    1000    1000 INSERT INTO TEST VALUES(?, ?);
--  37% 100%      93       1       0 CREATE TABLE TEST(ID INT PRIMARY KEY, NAME VARCHAR(255));
--   0% 100%       0       1       0 DROP TABLE IF EXISTS TEST;
--   0% 100%       0       1       0 SET TRACE_LEVEL_FILE 3;
</pre>

Thomas Mueller's avatar
Thomas Mueller committed
383
<h2 id="database_performance_tuning">Database Performance Tuning</h2>
384

385 386
<h3>Keep Connections Open or Use a Connection Pool</h3>
<p>
Thomas Mueller's avatar
Thomas Mueller committed
387
If your application opens and closes connections a lot (for example, for each request),
388 389 390 391
you should consider using a  <a href="tutorial.html#connection_pool">connection pool</a>.
Opening a connection using <code>DriverManager.getConnection</code> is specially slow
if the database is closed. By default the database is closed if the last connection is closed.
</p><p>
Thomas Mueller's avatar
Thomas Mueller committed
392 393 394
If you open and close connections a lot but don't want to use a connection pool,
consider keeping a 'sentinel' connection open for as long as the application runs,
or use delayed database closing. See also
395 396 397
<a href="features.html#closing_a_database">Closing a database</a>.
</p>

398 399
<h3>Use a Modern JVM</h3>
<p>
Thomas Mueller's avatar
Thomas Mueller committed
400
Newer JVMs are faster. Upgrading to the latest version of your JVM can provide a "free" boost to performance.
401
Switching from the default Client JVM to the Server JVM using the <code>-server</code> command-line
402
option improves performance at the cost of a slight increase in start-up time.
403 404
</p>

405 406 407 408
<h3>Virus Scanners</h3>
<p>
Some virus scanners scan files every time they are accessed.
It is very important for performance that database files are not scanned for viruses.
409
The database engine never interprets the data stored in the files as programs,
410 411
that means even if somebody would store a virus in a database file, this would
be harmless (when the virus does not run, it cannot spread).
412
Some virus scanners allow to exclude files by suffix. Ensure files ending with <code>.db</code> are not scanned.
413 414 415 416
</p>

<h3>Using the Trace Options</h3>
<p>
417
If the performance hot spots are in the database engine, in many cases the performance
418 419 420 421 422 423 424 425
can be optimized by creating additional indexes, or changing the schema. Sometimes the
application does not directly generate the SQL statements, for example if an O/R mapping tool
is used. To view the SQL statements and JDBC API calls, you can use the trace options.
For more information, see <a href="features.html#trace_options">Using the Trace Options</a>.
</p>

<h3>Index Usage</h3>
<p>
426 427 428
This database uses indexes to improve the performance of
<code>SELECT, UPDATE, DELETE</code>.
If a column is used in the <code>WHERE</code> clause of a query, and if an index exists on this column,
429 430
then the index can be used. Multi-column indexes are used if all or the first columns of the index are used.
Both equality lookup and range scans are supported.
431 432
Indexes are used to order result sets, but only if the condition uses the same index or no index at all.
The results are sorted in memory if required.
433 434
Indexes are created automatically for primary key and unique constraints.
Indexes are also created for foreign key constraints, if required.
435
For other columns, indexes need to be created manually using the <code>CREATE INDEX</code> statement.
436 437 438 439 440 441 442 443 444 445 446 447 448 449 450 451 452
</p>

<h3>Optimizer</h3>
<p>
This database uses a cost based optimizer. For simple and queries and queries with medium complexity
(less than 7 tables in the join), the expected cost (running time) of all possible plans is calculated,
and the plan with the lowest cost is used. For more complex queries, the algorithm first tries
all possible combinations for the first few tables, and the remaining tables added using a greedy algorithm
(this works well for most joins). Afterwards a genetic algorithm is used to test at most 2000 distinct plans.
Only left-deep plans are evaluated.
</p>

<h3>Expression Optimization</h3>
<p>
After the statement is parsed, all expressions are simplified automatically if possible. Operations
are evaluated only once if all parameters are constant. Functions are also optimized, but only
if the function is constant (always returns the same result for the same parameter values).
453
If the <code>WHERE</code> clause is always false, then the table is not accessed at all.
454 455 456 457 458
</p>

<h3>COUNT(*) Optimization</h3>
<p>
If the query only counts all rows of a table, then the data is not accessed.
459 460
However, this is only possible if no <code>WHERE</code> clause is used, that means it only works for
queries of the form <code>SELECT COUNT(*) FROM table</code>.
461 462 463 464 465 466
</p>

<h3>Updating Optimizer Statistics / Column Selectivity</h3>
<p>
When executing a query, at most one index per joined table can be used.
If the same table is joined multiple times, for each join only one index is used.
467 468
Example: for the query
<code>SELECT * FROM TEST T1, TEST T2 WHERE T1.NAME='A' AND T2.ID=T1.ID</code>,
469 470 471
two index can be used, in this case the index on NAME for T1 and the index on ID for T2.
</p><p>
If a table has multiple indexes, sometimes more than one index could be used.
472 473
Example: if there is a table <code>TEST(ID, NAME, FIRSTNAME)</code> and an index on each column,
then two indexes could be used for the query <code>SELECT * FROM TEST WHERE NAME='A' AND FIRSTNAME='B'</code>,
474 475 476 477 478 479
the index on NAME or the index on FIRSTNAME. It is not possible to use both indexes at the same time.
Which index is used depends on the selectivity of the column. The selectivity describes the 'uniqueness' of
values in a column. A selectivity of 100 means each value appears only once, and a selectivity of 1 means
the same value appears in many or most rows. For the query above, the index on NAME should be used
if the table contains more distinct names than first names.
</p><p>
480
The SQL statement <code>ANALYZE</code> can be used to automatically estimate the selectivity of the columns in the tables.
481 482 483
This command should be run from time to time to improve the query plans generated by the optimizer.
</p>

484 485 486 487 488 489 490
<h3>In-Memory (Hash) Indexes</h3>
<p>
Using in-memory indexes, specially in-memory hash indexes, can speed up
queries and data manipulation.
</p>
<p>In-memory indexes are automatically used
for in-memory databases, but can also be created for persistent databases
491
using <code>CREATE MEMORY TABLE</code>. In many cases,
492 493 494 495 496
the rows itself will also be kept in-memory. Please note this may cause memory
problems for large tables.
</p>
<p>
In-memory hash indexes are backed by a hash table and are usually faster than
497 498 499 500
regular indexes. However, hash indexes only supports direct lookup (<code>WHERE ID = ?</code>)
but not range scan (<code>WHERE ID &lt; ?</code>). To use hash indexes, use HASH as in:
<code>CREATE UNIQUE HASH INDEX</code> and
<code>CREATE TABLE ...(ID INT PRIMARY KEY HASH,...)</code>.
501 502
</p>

503 504
<h3>Optimization Examples</h3>
<p>
505
See <code>src/test/org/h2/samples/optimizations.sql</code> for a few examples of queries
506 507 508
that benefit from special optimizations built into the database.
</p>

509 510 511 512 513 514
<h3>Cache Size and Type</h3>
<p>
By default the cache size of H2 is quite small. Consider using a larger cache size, or enable
the second level soft reference cache. See also <a href="features.html#cache_settings">Cache Settings</a>.
</p>

515 516 517
<h3>Data Types</h3>
<p>
Each data type has different storage and performance characteristics:
Thomas Mueller's avatar
Thomas Mueller committed
518
</p>
519 520
<ul><li>The <code>DECIMAL/NUMERIC</code> type is slower
    and requires more storage than the <code>REAL</code> and <code>DOUBLE</code> types.
521
</li><li>Text types are slower to read, write, and compare than numeric types and generally require more storage.
522 523 524 525 526 527 528 529
</li><li>See <a href="advanced.html#large_objects">Large Objects</a> for information on
    <code>BINARY</code> vs. <code>BLOB</code>
    and <code>VARCHAR</code> vs. <code>CLOB</code> performance.
</li><li>Parsing and formatting takes longer for the
    <code>TIME</code>, <code>DATE</code>, and
    <code>TIMESTAMP</code> types than the numeric types.
</li><li><code>SMALLINT/TINYINT/BOOLEAN</code> are not significantly smaller or faster
    to work with than <code>INTEGER</code> in most modes.
Thomas Mueller's avatar
Thomas Mueller committed
530
</li></ul>
531

Thomas Mueller's avatar
Thomas Mueller committed
532 533 534 535 536 537 538 539 540 541 542 543 544 545
<h3>Sorted Insert Optimization</h3>
<p>
To reduce disk space usage and speed up table creation, an
optimization for sorted inserts is available. When used, b-tree pages
are split at the insertion point. To use this optimization, add <code>SORTED</code>
before the <code>SELECT</code> statement:
</p>
<pre>
CREATE TABLE TEST(ID INT PRIMARY KEY, NAME VARCHAR) AS
    SORTED SELECT X, SPACE(100) FROM SYSTEM_RANGE(1, 100);
INSERT INTO TEST
    SORTED SELECT X, SPACE(100) FROM SYSTEM_RANGE(101, 200);
</pre>

Thomas Mueller's avatar
Thomas Mueller committed
546
<h2 id="fast_import">Fast Database Import</h2>
Thomas Mueller's avatar
Thomas Mueller committed
547 548 549
<p>
To speed up large imports, consider using the following options temporarily:
</p>
550 551 552 553
<ul><li><code>SET CACHE_SIZE</code> (a large cache is faster)
</li><li><code>SET LOCK_MODE 0</code> (disable locking)
</li><li><code>SET LOG 0</code> (disable the transaction log)
</li><li><code>SET UNDO_LOG 0</code> (disable the session undo log)
Thomas Mueller's avatar
Thomas Mueller committed
554 555 556
</li></ul>
<p>
These options can be set in the database URL:
557
<code>jdbc:h2:~/test;CACHE_SIZE=65536;LOCK_MODE=0;LOG=0;UNDO_LOG=0</code>.
Thomas Mueller's avatar
Thomas Mueller committed
558 559 560
Most of those options are not recommended for regular use, that means you need to reset them after use.
</p>

561
<!-- [close] { --></div></td></tr></table><!-- } --><!-- analytics --></body></html>