h2database / Commits / 912789cb

Commit 912789cb, authored 10 years ago by Thomas Mueller
Parent: ee9a2829

Reduced memory usage for reading; improved documentation.

Showing 1 changed file with 99 additions and 66 deletions (+99 −66):

h2/src/tools/org/h2/dev/hash/MinimalPerfectHash.java @ 912789cb
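
For context before the diff: the class turns a set of integer keys into a compact description, and later evaluates the hash function directly from that description. A minimal usage sketch follows; the MinimalPerfectHash(byte[] desc) constructor and get(int x) appear in the diff below, while the static generate method and its exact signature are assumptions based on its Javadoc in this file ("Generate the minimal perfect hash function data from the given set of integers"):

    import java.util.HashSet;
    import java.util.Set;

    import org.h2.dev.hash.MinimalPerfectHash;

    public class MphExample {
        public static void main(String[] args) {
            // Keys to index; any set of distinct ints works.
            Set<Integer> keys = new HashSet<>();
            for (int i = 0; i < 1000; i++) {
                keys.add(i * i);
            }
            // Assumed API: build the compressed description of the function.
            byte[] desc = MinimalPerfectHash.generate(keys);

            // Reading side (the part this commit optimizes): reconstruct the
            // function and map every key to a distinct slot in [0, size).
            MinimalPerfectHash hash = new MinimalPerfectHash(desc);
            Set<Integer> slots = new HashSet<>();
            for (int k : keys) {
                int h = hash.get(k);
                assert h >= 0 && h < keys.size();
                slots.add(h);
            }
            assert slots.size() == keys.size(); // minimal and perfect
        }
    }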
@@ -15,30 +15,30 @@ import java.util.zip.Inflater;
 /**
  * A minimal perfect hash function tool. It needs about 2.0 bits per key.
  * <p>
- * Generating the hash function takes about 2.5 second per million keys with 8
- * cores (multithreaded).
- * <p>
  * The algorithm is recursive: sets that contain no or only one entry are not
- * processed as no conflicts are possible. Sets that contain between 2 and 12
- * entries, a number of hash functions are tested to check if they can store the
- * data without conflict. If no function was found, and for larger sets, the set
- * is split into a (possibly high) number of smaller set, which are processed
- * recursively.
+ * processed as no conflicts are possible. For sets that contain between 2 and
+ * 12 entries, a number of hash functions are tested to check if they can store
+ * the data without conflict. If no function was found, and for larger sets, the
+ * set is split into a (possibly high) number of smaller set, which are
+ * processed recursively. The average size of a top-level bucket is about 216
+ * entries, and the maximum recursion level is typically 5.
  * <p>
  * At the end of the generation process, the data is compressed using a general
- * purpose compression tool (Deflate / Huffman coding). The uncompressed data is
- * around 2.2 bits per key. With arithmetic coding, about 1.9 bits per key are
- * needed.
+ * purpose compression tool (Deflate / Huffman coding) down to 2.0 bits per key.
+ * The uncompressed data is around 2.2 bits per key. With arithmetic coding,
+ * about 1.9 bits per key are needed. Generating the hash function takes about
+ * 2.5 second per million keys with 8 cores (multithreaded). At the expense of
+ * processing time, a lower number of bits per key would be possible (for
+ * example 1.85 bits per key with 33000 keys, using 10 seconds generation time,
+ * with Huffman coding). The algorithm automatically scales with the number of
+ * available CPUs (using as many threads as there are processors).
  * <p>
- * The algorithm automatically scales with the number of available CPUs (using
- * as many threads as there are processors).
- * <p>
- * At the expense of processing time, a lower number of bits per key would be
- * possible (for example 1.85 bits per key with 33000 keys, using 10 seconds
- * generation time, with Huffman coding).
+ * The memory usage to efficiently calculate hash values is around 2.5 bits per
+ * key (the space needed for the uncompressed description, plus 8 bytes for
+ * every top-level bucket).
  * <p>
- * In-place updating of the hash table is possible in theory, by patching the
- * hash function description. This is not implemented.
+ * In-place updating of the hash table is not implemented but possible in
+ * theory, by patching the hash function description.
  */
 public class MinimalPerfectHash {
@@ -87,14 +87,14 @@ public class MinimalPerfectHash {
     private final byte[] data;
 
     /**
-     * The offset of the result of the hash function at the given offset within
-     * the data array. Used for calculating the hash of a key.
+     * The size up to the given top-level bucket in the data array. Used to
+     * speed up calculating the hash of a key.
      */
-    private final int[] plus;
+    private final int[] topSize;
 
     /**
-     * The position of the given top-level bucket in the data array (in case
-     * this bucket needs to be skipped). Used for calculating the hash of a key.
+     * The position of the given top-level bucket in the data array. Used to
+     * speed up calculating the hash of a key.
      */
     private final int[] topPos;
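
How the description bytes are laid out can be read off the traversal code in the rest of this diff: each node begins with a varint n. n < 2 is a leaf holding n keys; 2 <= n < SPLIT_MANY splits the set into n child descriptions that follow; n == SPLIT_MANY means the actual split count follows as a second varint; and n > SPLIT_MANY encodes which tested hash function stores a small set (decoded via getSize and getOffset). This summary is our reading of the code, not text from the commit.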
@@ -105,51 +105,51 @@ public class MinimalPerfectHash {
      */
     public MinimalPerfectHash(byte[] desc) {
         byte[] b = data = expand(desc);
-        plus = new int[data.length];
-        for (int pos = 0, p = 0; pos < data.length;) {
-            plus[pos] = p;
-            int n = readVarInt(b, pos);
-            pos += getVarIntLength(b, pos);
-            if (n < 2) {
-                p += n;
-            } else if (n > SPLIT_MANY) {
-                int size = getSize(n);
-                p += size;
-            } else if (n == SPLIT_MANY) {
-                pos += getVarIntLength(b, pos);
-            }
-        }
         if (b[0] == SPLIT_MANY) {
             int split = readVarInt(b, 1);
+            topSize = new int[split];
             topPos = new int[split];
             int pos = 1 + getVarIntLength(b, 1);
+            int sizeSum = 0;
             for (int i = 0; i < split; i++) {
+                topSize[i] = sizeSum;
                 topPos[i] = pos;
-                pos = read(pos);
+                int start = pos;
+                pos = getNextPos(pos);
+                sizeSum += getSizeSum(start, pos);
             }
         } else {
+            topSize = null;
             topPos = null;
         }
     }
 
     /**
-     * Calculate the hash from the key.
+     * Calculate the hash value for the given key.
      *
      * @param x the key
-     * @return the hash
+     * @return the hash value
      */
     public int get(int x) {
         return get(0, x, 0);
     }
 
+    /**
+     * Get the hash value for the given key, starting at a certain position
+     * and level.
+     *
+     * @param pos the start position
+     * @param x the key
+     * @param level the level
+     * @return the hash value
+     */
     private int get(int pos, int x, int level) {
         int n = readVarInt(data, pos);
         if (n < 2) {
-            return plus[pos];
+            return 0;
         } else if (n > SPLIT_MANY) {
             int size = getSize(n);
             int offset = getOffset(n, size);
-            return plus[pos] + hash(x, level, offset, size);
+            return hash(x, level, offset, size);
         }
         pos++;
         int split;
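
This constructor change is where the saving in the commit message comes from: the old code allocated plus = new int[data.length], four bytes for every byte of the uncompressed description. The new code keeps only two int arrays with one entry per top-level bucket; at the documented average of about 216 keys per bucket, those 8 bytes per bucket cost roughly 64 / 216 ≈ 0.3 bits per key, which together with the ~2.2-bit uncompressed description gives the "around 2.5 bits per key" now quoted in the class Javadoc.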
@@ -160,14 +160,66 @@ public class MinimalPerfectHash {
             split = n;
         }
         int h = hash(x, level, 0, split);
+        int s;
         if (level == 0 && topPos != null) {
+            s = topSize[h];
             pos = topPos[h];
         } else {
+            int start = pos;
             for (int i = 0; i < h; i++) {
-                pos = read(pos);
+                pos = getNextPos(pos);
             }
+            s = getSizeSum(start, pos);
         }
-        return get(pos, x, level + 1);
+        return s + get(pos, x, level + 1);
     }
 
+    /**
+     * Get the position of the next sibling.
+     *
+     * @param pos the position of this branch
+     * @return the position of the next sibling
+     */
+    private int getNextPos(int pos) {
+        int n = readVarInt(data, pos);
+        pos += getVarIntLength(data, pos);
+        if (n < 2 || n > SPLIT_MANY) {
+            return pos;
+        }
+        int split;
+        if (n == SPLIT_MANY) {
+            split = readVarInt(data, pos);
+            pos += getVarIntLength(data, pos);
+        } else {
+            split = n;
+        }
+        for (int i = 0; i < split; i++) {
+            pos = getNextPos(pos);
+        }
+        return pos;
+    }
+
+    /**
+     * The sum of the sizes between the start and end position.
+     *
+     * @param start the start position
+     * @param end the end position (excluding)
+     * @return the sizes
+     */
+    private int getSizeSum(int start, int end) {
+        int s = 0;
+        for (int pos = start; pos < end;) {
+            int n = readVarInt(data, pos);
+            pos += getVarIntLength(data, pos);
+            if (n < 2) {
+                s += n;
+            } else if (n > SPLIT_MANY) {
+                s += getSize(n);
+            } else if (n == SPLIT_MANY) {
+                pos += getVarIntLength(data, pos);
+            }
+        }
+        return s;
+    }
+
     private static void writeSizeOffset(ByteArrayOutputStream out, int size,
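
Both new helpers advance through the description by decoding a varint and then skipping its encoded length via readVarInt and getVarIntLength, which are defined elsewhere in the class and not shown in this diff. As a hypothetical sketch only — assuming the common 7-bits-per-byte encoding where a set high bit means another byte follows; the class's actual encoding may differ:

    // Hypothetical varint helpers, for following the traversal code above.
    // Assumption: 7 data bits per byte, high bit set = continuation.
    static int readVarInt(byte[] d, int pos) {
        int x = 0;
        for (int shift = 0; shift < 32; shift += 7) {
            byte b = d[pos++];
            x |= (b & 0x7f) << shift;
            if (b >= 0) {
                break; // high bit clear: last byte of this varint
            }
        }
        return x;
    }

    static int getVarIntLength(byte[] d, int pos) {
        int len = 1;
        while (d[pos++] < 0) { // high bit set: more bytes follow
            len++;
        }
        return len;
    }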
@@ -188,25 +240,6 @@ public class MinimalPerfectHash {
         return 0;
     }
 
-    private int read(int pos) {
-        int n = readVarInt(data, pos);
-        pos += getVarIntLength(data, pos);
-        if (n < 2 || n > SPLIT_MANY) {
-            return pos;
-        }
-        int split;
-        if (n == SPLIT_MANY) {
-            split = readVarInt(data, pos);
-            pos += getVarIntLength(data, pos);
-        } else {
-            split = n;
-        }
-        for (int i = 0; i < split; i++) {
-            pos = read(pos);
-        }
-        return pos;
-    }
-
     /**
      * Generate the minimal perfect hash function data from the given set of
      * integers.
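
Note that the read method removed here has the same body as the new getNextPos above, with the recursive call renamed to match; the sibling-skipping traversal itself is unchanged and merely gained a descriptive name and Javadoc.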
@@ -340,7 +373,7 @@ public class MinimalPerfectHash {
      * @return the hash (a value between 0, including, and the size, excluding)
      */
     private static int hash(int x, int level, int offset, int size) {
-        x += level * 16 + offset;
+        x += level + offset * 16;
         x = ((x >>> 16) ^ x) * 0x45d9f3b;
         x = ((x >>> 16) ^ x) * 0x45d9f3b;
         x = (x >>> 16) ^ x;
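
The seed change in hash is subtle: level * 16 + offset aliases pairs such as (level 1, offset 0) and (level 0, offset 16), while with level + offset * 16 distinct (level, offset) pairs yield distinct seeds as long as level stays below 16 — and the updated Javadoc states the recursion level is typically at most 5. This rationale is our reading of the change; the commit itself does not spell it out.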