Unable to use Zstd compression with parquet  #764

@kartik18

Description

Hi Everyone,

I'm using the cp-kafka-connect-base:7.50 docker image with the kafka-connect-s3:10.5.13 plugin installed inside it. We can write data in Parquet format from a Kafka topic to S3.
However, when I set the configuration parquet.codec: zstd to write the data compressed, the task fails with the following stack trace:
"org.apache.kafka.connect.errors.ConnectException: Exiting WorkerSinkTask due to unrecoverable exception.\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:618)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:336)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:237)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:206)\n\tat org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:204)\n\tat org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:259)\n\tat org.apache.kafka.connect.runtime.isolation.Plugins.lambda$withClassLoader$1(Plugins.java:181)\n\tat java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)\n\tat java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)\n\tat java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)\n\tat java.base/java.lang.Thread.run(Thread.java:829)\nCaused by: java.lang.RuntimeException: native zStandard library not available: this version of libhadoop was built without zstd support.\n\tat org.apache.hadoop.io.compress.ZStandardCodec.checkNativeCodeLoaded(ZStandardCodec.java:65)\n\tat org.apache.hadoop.io.compress.ZStandardCodec.getCompressorType(ZStandardCodec.java:153)\n\tat org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:150)\n\tat org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:168)\n\tat org.apache.parquet.hadoop.CodecFactory$HeapBytesCompressor.<init>(CodecFactory.java:144)\n\tat org.apache.parquet.hadoop.CodecFactory.createCompressor(CodecFactory.java:206)\n\tat org.apache.parquet.hadoop.CodecFactory.getCompressor(CodecFactory.java:189)\n\tat org.apache.parquet.hadoop.ParquetWriter.<init>(ParquetWriter.java:287)\n\tat org.apache.parquet.hadoop.ParquetWriter$Builder.build(ParquetWriter.java:564)\n\tat io.confluent.connect.s3.format.parquet.ParquetRecordWriterProvider$1.write(ParquetRecordWriterProvider.java:102)\n\tat io.confluent.connect.s3.format.S3RetriableRecordWriter.write(S3RetriableRecordWriter.java:51)\n\tat io.confluent.connect.s3.format.KeyValueHeaderRecordWriterProvider$1.write(KeyValueHeaderRecordWriterProvider.java:114)\n\tat io.confluent.connect.s3.TopicPartitionWriter.writeRecord(TopicPartitionWriter.java:592)\n\tat io.confluent.connect.s3.TopicPartitionWriter.checkRotationOrAppend(TopicPartitionWriter.java:327)\n\tat io.confluent.connect.s3.TopicPartitionWriter.executeState(TopicPartitionWriter.java:267)\n\tat io.confluent.connect.s3.TopicPartitionWriter.write(TopicPartitionWriter.java:218)\n\tat io.confluent.connect.s3.S3SinkTask.put(S3SinkTask.java:244)\n\tat org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:587)\n\t...

I see there is already an earlier issue about this: #570

Questions:

  1. Looking at the pom.xml file, Hadoop version 3.3.6 is used, and its libhadoop.so.1.0.0 is supposed to be zstd compatible already. Why, then, are we facing this issue with the latest version?
  2. Even after replicating the solution mentioned by one of the users (github-louis-fruleux), I'm still facing the same issue.
    P.S. What I did was:
  • Downloaded https://dlcdn.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
  • Extracted it and copied libhadoop.so.1.0.0 into the docker image
  • Built the image with the following Dockerfile:

```dockerfile
FROM confluentinc/cp-kafka-connect-base:7.50

ENV CONNECT_PLUGIN_PATH=/usr/share/java,/usr/share/confluent-hub-components
RUN confluent-hub install --no-prompt confluentinc/kafka-connect-s3:10.5.13

COPY libhadoop.so.1.0.0 /usr/lib64/libhadoop.so.1.0.0
COPY scripts/libhadoop.so.1.0.0 /usr/lib64/libhadoop.so
RUN ls -ltr /usr/lib64/

ENV KAFKA_OPTS="${KAFKA_OPTS} -Djava.library.path=/usr/lib64/"
RUN echo "KAFKA_OPTS value: $KAFKA_OPTS"

CMD ["/etc/confluent/docker/run"]
```
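To narrow down whether the copied library is actually being picked up, I put together a small diagnostic along these lines (a sketch, not part of the original setup; it assumes the Hadoop jars shipped with the connector plugin are on the classpath, and the class name CheckZstd is just a placeholder):

```java
import org.apache.hadoop.util.NativeCodeLoader;

// Prints whether libhadoop was found on java.library.path and, if so,
// whether that particular build was compiled with zstd bindings.
// This is the same check that fails in the stack trace above.
public class CheckZstd {
    public static void main(String[] args) {
        System.out.println("java.library.path = "
                + System.getProperty("java.library.path"));

        boolean loaded = NativeCodeLoader.isNativeCodeLoaded();
        System.out.println("libhadoop loaded: " + loaded);

        if (loaded) {
            // Calling buildSupportsZstd() without libhadoop loaded would
            // throw UnsatisfiedLinkError, hence the guard above.
            System.out.println("build supports zstd: "
                    + NativeCodeLoader.buildSupportsZstd());
        }
    }
}
```

If it prints `libhadoop loaded: false`, the library is not being found on `java.library.path` (or, if I understand the native build correctly, a runtime dependency such as libzstd.so.1 is missing from the image); if it loads but reports no zstd support, the copied binary itself was built without zstd, which is exactly what the exception says.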

How should I fix this issue?
