Description
Hi Everyone,
I'm using the cp-kafka-connect-base:7.50 Docker image with the kafka-connect-s3:10.5.13 plugin installed inside it. Writing data in Parquet format from a Kafka topic to S3 works fine.
However, when I set parquet.codec: zstd to write the data compressed, the connector fails with the following stack trace:
"org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:618)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:336)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:237)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:206)\n\tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:204)\n\tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:259)\n\tat org.apache.kafka.connect.runtime.isolation.Plugins.lambda$withClassLoader$1(Plugins.java:181)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: java.lang.RuntimeException: native zStandard library not available: this version of libhadoop was built without zstd support.\n\tat org.apache.hadoop.io.compress.ZStandardCodec.checkNativeCodeLoaded(ZStandardCodec.java:65)\n\tat org.apache.hadoop.io.compress.ZStandardCodec.getCompressorType(ZStandardCodec.java:153)\n\tat org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)\n\tat org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:168)\n\tat org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.<init>(CodecFactory.java:144)\n\tat org.apache.parquet.hadoop.CodecFactory.createCompressor(CodecFactory.java:206)\n\tat org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:189)\n\tat org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:287)\n\tat org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:564)\n\tat io.confluent.connect.s3.format.parquet.ParquetRecordWriterProvider$1.write(ParquetRecordWriterProvider.java:102)\n\tat io.confluent.connect.s3.format.S3RetriableRecordWriter.write(S3RetriableRecordWriter.java:51)\n\tat io.confluent.connect.s3.format.KeyValueHeaderRecordWriterProvider$1.write(KeyValueHeaderRecordWriterProvider.java:114)\n\tat io.confluent.connect.s3.TopicPartitionWriter.writeRecord(TopicPartitionWriter.java:592)\n\tat io.confluent.connect.s3.TopicPartitionWriter.checkRotationOrAppend(TopicPartitionWriter.java:327)\n\tat io.confluent.connect.s3.TopicPartitionWriter.executeState(TopicPartitionWriter.java:267)\n\tat io.confluent.connect.s3.TopicPartitionWriter.write(TopicPartitionWriter.java:218)\n\tat io.confluent.connect.s3.S3SinkTask.put(S3SinkTask.java:244)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:587)\n\t...
I see a similar issue was already raised earlier: #570
Questions:
- Looking at the pom.xml, it mentions that Hadoop version 3.3.6 is used, and the libhadoop.so.1.0.0 from that version is supposedly already zstd-compatible. Why, then, are we facing this issue with the latest version?
- Even after replicating the solution mentioned by one of the users (github-louis-fruleux), I'm still facing the same issue.
P.S. What I did was:
- Downloaded https://dlcdn.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
- Extracted it and copied libhadoop.so.1.0.0 into the Docker image (roughly as shown below)
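Roughly the commands for that step (the local destination paths are just illustrative, chosen to match the COPY sources in the Dockerfile below):

wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -xzf hadoop-3.3.6.tar.gz
# the prebuilt native libraries sit under lib/native/ in the binary tarball
cp hadoop-3.3.6/lib/native/libhadoop.so.1.0.0 ./libhadoop.so.1.0.0
cp hadoop-3.3.6/lib/native/libhadoop.so.1.0.0 ./scripts/libhadoop.so.1.0.0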
- Dockerfile:

FROM confluentinc/cp-kafka-connect-base:7.50
ENV CONNECT_PLUGIN_PATH=/usr/share/java,/usr/share/confluent-hub-components
RUN confluent-hub install --no-prompt confluentinc/kafka-connect-s3:10.5.13
COPY libhadoop.so.1.0.0 /usr/lib64/libhadoop.so.1.0.0
COPY scripts/libhadoop.so.1.0.0 /usr/lib64/libhadoop.so
RUN ls -ltr /usr/lib64/
ENV KAFKA_OPTS="${KAFKA_OPTS} -Djava.library.path=/usr/lib64/"
RUN echo "KAFKA_OPTS value: $KAFKA_OPTS"
CMD ["/etc/confluent/docker/run"]
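I build the image with something like the following; a quick sanity check on whether the copied library was actually built with zstd support could look like this (assuming ldd is available in the image, and connect-s3-zstd is just a placeholder tag):

docker build -t connect-s3-zstd .
# if the native library was built with zstd support, it should link against libzstd
docker run --rm --entrypoint ldd connect-s3-zstd /usr/lib64/libhadoop.so.1.0.0 | grep -i zstd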
How should I fix this issue?