Skip to content

Commit 049abba

Browse files
[PUBLISHER] Merge #155
1 parent 8e4122a commit 049abba

File tree

1 file changed

+252
-0
lines changed

1 file changed

+252
-0
lines changed
Lines changed: 252 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,252 @@
1+
---
2+
title: 傲来大数据方向HBase组优化日报-Day7
3+
date: 2025-05-14 14:01:42 +08:00
4+
filename: 2025-05-14-EulixOS-HBase-Day7
5+
categories:
6+
- Study
7+
- HBase
8+
tags:
9+
- BigData
10+
- DataBase
11+
- EulixOS
12+
dir: Study/HBase
13+
share: true
14+
---
15+
尝试使用JNI修改布隆过滤器部分组件,验证可行性
16+
17+
以下命令分别是在MacOS和Linux平台上生成对应动态链接库的命令,后续可能考虑要通过cmake构建动态链接库。
18+
19+
```shell
20+
clang++ -shared -std=c++11 \
21+
-I"$JAVA_HOME/include" \
22+
-I"$JAVA_HOME/include/darwin" \
23+
-o hbase-server/src/main/native/libhbase_bloom_filter_jni.dylib \
24+
hbase-server/src/main/native/BloomFilterUtilJni.cpp
25+
```
26+
27+
```
28+
g++ -shared -std=c++11 -fPIC \
29+
-I"$JAVA_HOME/include" \
30+
-I"$JAVA_HOME/include/linux" \
31+
-o hbase-server/src/main/native/libhbase_bloom_filter_jni.so \
32+
hbase-server/src/main/native/BloomFilterUtilJni.cpp
33+
```
34+
35+
如果java代码没有涉及到更改的话,只更改cpp代码是不需要用mvn重新构建的,只需要在此运行上述命令即可。(毕竟动态链接库)
36+
37+
在hbase-server的pom.xml中添加头文件自动生成的字段(第一个plugin包裹),还需要动态的指定库文件的path(第二个plugin包裹)(重新生成头文件需要先用hbase-server里的pom clean一下 再compile)
38+
39+
```xml
40+
<plugin>
41+
<groupId>org.apache.maven.plugins</groupId>
42+
<artifactId>maven-compiler-plugin</artifactId>
43+
<configuration>
44+
<compilerArgs>
45+
<arg>-h</arg>
46+
<arg>/root/hb/hbase-2.5.11/hbase-server/src/main/native</arg> <!-- 或者希望的输出目录 -->
47+
</compilerArgs>
48+
</configuration>
49+
</plugin>
50+
<plugin>
51+
<groupId>org.apache.maven.plugins</groupId>
52+
<artifactId>maven-surefire-plugin</artifactId>
53+
<configuration>
54+
<argLine>-Djava.library.path=/root/hb/hbase-2.5.11/hbase-server/src/main/native</argLine>
55+
<!-- 或者 -->
56+
<!-- <systemPropertyVariables>
57+
<java.library.path>/root/hb/hbase-2.5.11/hbase-server/src/main/native</java.library.path>
58+
</systemPropertyVariables> -->
59+
</configuration>
60+
</plugin>
61+
```
62+
63+
64+
```shell
65+
//conf/hbase-env.sh
66+
export HBASE_OPTS="-Djava.library.path=/root/hb/hbase-2.5.11/hbase-server/src/main/native"
67+
```
68+
69+
记得加上我们的共享链接库路径,不然运行起来会报如下的错误
70+
71+
```shell
72+
ERROR: Zookeeper GET could not be completed in 10000 ms
73+
```
74+
75+
76+
记得指定rootdir和tmpdir,方便每次测试完之后清除数据,不然在多版本测试中会出现缓存没更新导致的错误。
77+
78+
```xml
79+
<property>
80+
<name>hbase.tmp.dir</name>
81+
<value>/root/hb/hbase-2.5.11/hbase-data/tmp</value>
82+
</property>
83+
<property>
84+
<name>hbase.master.hostname</name>
85+
<value>localhost</value>
86+
</property>
87+
<property>
88+
<name>hbase.regionserver.hostname</name>
89+
<value>localhost</value>
90+
</property>
91+
<property>
92+
<name>hbase.rootdir</name>
93+
<value>file:///root/hb/hbase-2.5.11/hbase-data/hbase</value>
94+
</property>
95+
```
96+
97+
同样的,也需要指定HBase运行的hostname,不然在docker中运行,很可能会错误的解析hostname为docker内的网卡配置。
98+
##
99+
100+
### 坑点 1:HBase 服务注册的主机名与实际监听地址不一致或客户端无法正确访问
101+
102+
- 具体表现与报错信息:
103+
104+
1. HBase Performance Evaluation (PE) 工具启动后,长时间卡住,最终报错 hbase:meta,,1 is not online on legion-eulix,...。
105+
106+
```shell
107+
2025-05-14T09:07:54,037 INFO [main] client.RpcRetryingCallerImpl: ion: hbase:meta,,1 is not online on localhost,16020,1747213592524
108+
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegionByEncodedName(HRegionServer.java:3539)
109+
at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:3517)
110+
at org.apache.hadoop.hbase.regionserver.RSRpcServices.getRegion(RSRpcServices.java:1489)
111+
at org.apache.hadoop.hbase.regionserver.RSRpcServices.get(RSRpcServices.java:2556)
112+
at org.apache.hadoop.hbase.shaded.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:45002)
113+
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:415)
114+
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:124)
115+
at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:102)
116+
at org.apache.hadoop.hbase.ipc.RpcHandler.run(RpcHandler.java:82)
117+
, details=row 'TestTable' on table 'hbase:meta' at region=hbase:meta,,1.1588230740, hostname=legion-eulix,16020,1747209674622, seqNum=-1, see https://s.apache.org/timeout
118+
```
119+
120+
121+
2. 后续的错误日志显示客户端尝试连接 Legion-Eulix/172.17.0.2:16020 但连接被拒绝 (Connection refused)。
122+
123+
```shell
124+
2025-05-14T09:16:44,494 WARN [RPCClient-NioEventLoopGroup-1-1] ipc.NettyRpcConnection: Exception encountered while connecting to the server Legion-Eulix:16020
125+
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: Legion-Eulix/172.17.0.2:16020
126+
Caused by: java.net.ConnectException: Connection refused
127+
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method) ~[?:1.8.0_452]
128+
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716) ~[?:1.8.0_452]
129+
at org.apache.hbase.thirdparty.io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:337) ~[hbase-shaded-netty-4.1.5.jar:?]
130+
at org.apache.hbase.thirdparty.io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:334) ~[hbase-shaded-netty-4.1.5.jar:?]
131+
at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:776) ~[hbase-shaded-netty-4.1.5.jar:?]
132+
at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724) ~[hbase-shaded-netty-4.1.5.jar:?]
133+
at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650) ~[hbase-shaded-netty-4.1.5.jar:?]
134+
at org.apache.hbase.thirdparty.io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[hbase-shaded-netty-4.1.5.jar:?]
135+
at org.apache.hbase.thirdparty.io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[hbase-shaded-netty-4.1.5.jar:?]
136+
at org.apache.hbase.thirdparty.io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[hbase-shaded-netty-4.1.5.jar:?]
137+
at org.apache.hbase.thirdparty.io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[hbase-shaded-netty-4.1.5.jar:?]
138+
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_452]
139+
```
140+
141+
142+
3. netstat 命令显示 HBase RegionServer (Java 进程) 监听在 172.17.0.2:16020。
143+
144+
```shell
145+
root@Legion-Eulix [09:14:39] [~/hb/hbase-cmp]
146+
-> # netstat -npl|grep 16020
147+
tcp6 0 0 172.17.0.2:16020 :::* LISTEN 121562/java
148+
```
149+
150+
- 原因分析:
151+
152+
- HBase Master 和 RegionServer 在启动时,如果没有在 conf/hbase-site.xml 中明确配置 hbase.master.hostname 和 hbase.regionserver.hostname,它们会尝试自动检测主机名。在这个案例中,它们可能将主机名解析为 legion-eulix,对应 IP 172.17.0.2,并以此地址向 Zookeeper 注册。
153+
154+
- 客户端从 Zookeeper 获取到的是 legion-eulix:16020 这个地址。
155+
156+
- 如果 HBase 服务实际监听的 IP 接口与客户端尝试连接的 IP 不符,或者该 IP 对于客户端不可达(例如,172.17.0.2 是一个内部 Docker/WSL 网络 IP,而客户端从另一个网络视角访问),就会导致连接失败。
157+
158+
- 解决方案:
159+
160+
1. 在 conf/hbase-site.xml 文件中,明确指定 HBase Master 和 RegionServer 绑定的主机名。对于单机部署,推荐使用 localhost:
161+
162+
```xml
163+
<property>
164+
<name>hbase.master.hostname</name>
165+
<value>localhost</value>
166+
</property>
167+
<property>
168+
<name>hbase.regionserver.hostname</name>
169+
<value>localhost</value>
170+
</property>
171+
```
172+
173+
174+
2. 彻底停止并重启 HBase 服务 (./bin/stop-hbase.sh 后接 ./bin/start-hbase.sh),以确保 HBase 使用新的配置启动并向 Zookeeper 注册。
175+
176+
### 坑点 2:Zookeeper 中存在陈旧的 HBase元数据
177+
178+
- 具体表现与报错信息:
179+
180+
- 即使 HBase 服务通过 netstat 确认已在 127.0.0.1:16000 (Master) 和 127.0.0.1:16020 (RegionServer) 上正确监听,PE 工具的错误日志依然显示它尝试连接旧的、错误的主机名 Legion-Eulix/172.17.0.2:16020。
181+
182+
```shell
183+
2025-05-14T09:16:44,494 WARN [RPCClient-NioEventLoopGroup-1-1] ipc.NettyRpcConnection: Exception encountered while connecting to the server Legion-Eulix:16020
184+
org.apache.hbase.thirdparty.io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: Legion-Eulix/172.17.0.2:16020
185+
```
186+
187+
- 错误详情中仍然包含旧的主机名信息:hostname=legion-eulix,16020,1747209674622。
188+
189+
- 原因分析:
190+
191+
- 客户端通过 Zookeeper 查找 hbase:meta Region 的位置。
192+
193+
- 如果 HBase 服务在配置更改(例如,从 legion-eulix 改为 localhost)后重启,但 Zookeeper 中关于 hbase:meta Region Server 的信息没有被正确更新或清除,Zookeeper 就会向客户端返回这些陈旧的、错误的地址信息。
194+
195+
- 解决方案:
196+
197+
1. 清理 HBase 的临时数据目录 hbase.tmp.dir (您配置的路径为 /root/hb/hbase-cmp/hbase-data/tmp) 和数据目录 hbase.rootdir(如果这是全新的测试并且可以容忍数据丢失)。这有助于确保一个完全干净的启动。
198+
199+
### 坑点 3:测试成功前出现 "NoNode for /hbase/hbaseid" 或 "No meta znode available"
200+
201+
- 具体表现与报错信息:
202+
203+
在最终成功的测试运行日志中,开头部分出现了类似以下的警告:
204+
```shell
205+
2025-05-14T09:18:28,818 WARN [main] client.ConnectionImplementation: Retrieve cluster id failed
206+
java.util.concurrent.ExecutionException: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/hbaseid
207+
at java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:357) ~[?:1.8.0_452]
208+
at java.util.concurrent.CompletableFuture.get(CompletableFuture.java:1908) ~[?:1.8.0_452]
209+
at org.apache.hadoop.hbase.client.ConnectionImplementation.retrieveClusterId(ConnectionImplementation.java:666) ~[hbase-client-2.5.11.jar:2.5.11]
210+
at org.apache.hadoop.hbase.client.ConnectionImplementation.(ConnectionImplementation.java:325) ~[hbase-client-2.5.11.jar:2.5.11]
211+
at org.apache.hadoop.hbase.client.ConnectionImplementation.(ConnectionImplementation.java:272) ~[hbase-client-2.5.11.jar:2.5.11]
212+
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_452]
213+
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_452]
214+
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_452]
215+
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_452]
216+
at org.apache.hadoop.hbase.client.ConnectionFactory.lambda$null$0(ConnectionFactory.java:233) ~[hbase-client-2.5.11.jar:2.5.11]
217+
at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_452]
218+
at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_452]
219+
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1938) ~[hadoop-common-2.10.2.jar:?]
220+
at org.apache.hadoop.hbase.security.User$SecureHadoopUser.runAs(User.java:328) ~[hbase-common-2.5.11.jar:2.5.11]
221+
at org.apache.hadoop.hbase.client.ConnectionFactory.lambda$createConnection$1(ConnectionFactory.java:232) ~[hbase-client-2.5.11.jar:2.5.11]
222+
at org.apache.hadoop.hbase.trace.TraceUtil.trace(TraceUtil.java:216) ~[hbase-common-2.5.11.jar:2.5.11]
223+
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:218) ~[hbase-client-2.5.11.jar:2.5.11]
224+
at org.apache.hadoop.hbase.client.ConnectionFactory.createConnection(ConnectionFactory.java:131) ~[hbase-client-2.5.11.jar:2.5.11]
225+
at org.apache.hadoop.hbase.PerformanceEvaluation.runTest(PerformanceEvaluation.java:2589) ~[hbase-mapreduce-2.5.11-tests.jar:2.5.11]
226+
at org.apache.hadoop.hbase.PerformanceEvaluation.run(PerformanceEvaluation.java:3137) ~[hbase-mapreduce-2.5.11-tests.jar:2.5.11]
227+
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) ~[hadoop-common-2.10.2.jar:?]
228+
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) ~[hadoop-common-2.10.2.jar:?]
229+
at org.apache.hadoop.hbase.PerformanceEvaluation.main(PerformanceEvaluation.java:3171) ~[hbase-mapreduce-2.5.11-tests.jar:2.5.11]
230+
Caused by: org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/hbaseid
231+
at org.apache.zookeeper.KeeperException.create(KeeperException.java:118) ~[zookeeper-3.8.4.jar:3.8.4]
232+
at org.apache.zookeeper.KeeperException.create(KeeperException.java:54) ~[zookeeper-3.8.4.jar:3.8.4]
233+
at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient$ZKTask$1.exec(ReadOnlyZKClient.java:185) ~[hbase-client-2.5.11.jar:2.5.11]
234+
at org.apache.hadoop.hbase.zookeeper.ReadOnlyZKClient.run(ReadOnlyZKClient.java:386) ~[hbase-client-2.5.11.jar:2.5.11]
235+
at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_452]
236+
2025-05-14T09:18:33,158 INFO [main] client.RpcRetryingCallerImpl: Call exception, tries=6, retries=16, started=4206 ms ago, cancelled=false, msg=No meta znode available, details=row 'TestTable' on table 'hbase:meta' at null, see https://s.apache.org/timeout
237+
```
238+
- 原因分析:
239+
240+
- 这些信息通常出现在 HBase Master 正在初始化或刚刚完成 Zookeeper 状态重建的过程中。当客户端(PE 工具)尝试连接时,HBase Master 可能尚未完全在 Zookeeper 中创建 /hbase/hbaseid(集群唯一标识)或 /hbase/meta-region-server(指向 hbase:meta 表所在的 RegionServer)等关键 znode。
241+
242+
- 这在 HBase 刚启动或 Zookeeper 数据被清理后首次启动时是正常的,客户端会进行重试。
243+
244+
- 解决方案:
245+
246+
- 这通常不是一个需要用户直接干预的“错误”,而是 HBase 启动过程中的一个暂时状态。客户端的重试机制最终会成功连接。
247+
248+
- 确保 HBase Master 有足够的时间完成其在 Zookeeper 中的初始化过程。在自动化脚本中,可以在启动 HBase 后加入短暂的等待时间,然后再运行客户端程序。
249+
250+
### 验证JNI正确的被调用
251+
252+
在CPP文件中添加一些DEBUG信息,然后再去看LOG,如果有对应LOG产生,那么便可说明正确的被启用。

0 commit comments

Comments
 (0)