Retry on S3 multipart upload failure #855

@timchenko-a

Connector version: 10.5.23
The issue looks very similar to #81, but that one was marked as resolved a long time ago.

Sometimes connector tasks fail with the following exception:

org.apache.kafka.connect.errors.ConnectException: We encountered an internal error. Please try again. (Service: Amazon S3; Status Code: 200; Error Code: InternalError; Request ID: W5B15QEASQ408SWZ; S3 Extended Request ID: Q0405gasPFau+6r5Jod0XJgj84dDHs6A1p7iHzEtSObprUKBab/KbdFyDGj9RN3BBXPz++jEgRPlGUfyHJezQDkl5q56pp4ec4ikEQQ1z4U=; Proxy: null)
	at io.confluent.connect.s3.util.S3ErrorUtils.throwConnectException(S3ErrorUtils.java:84)
	at io.confluent.connect.s3.format.S3RetriableRecordWriter.commit(S3RetriableRecordWriter.java:62)
	at java.base/java.util.Optional.ifPresent(Optional.java:178)
	at io.confluent.connect.s3.format.KeyValueHeaderRecordWriterProvider$1.commit(KeyValueHeaderRecordWriterProvider.java:139)
	at io.confluent.connect.s3.TopicPartitionWriter.commitFile(TopicPartitionWriter.java:736)
	at io.confluent.connect.s3.TopicPartitionWriter.commitFiles(TopicPartitionWriter.java:705)
	at io.confluent.connect.s3.TopicPartitionWriter.commitOnTimeIfNoData(TopicPartitionWriter.java:424)
	at io.confluent.connect.s3.TopicPartitionWriter.write(TopicPartitionWriter.java:242)
	at io.confluent.connect.s3.S3SinkTask.put(S3SinkTask.java:258)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.deliverMessages(WorkerSinkTask.java:606)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:345)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:247)
	at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:216)
	at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:226)
	at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:281)
	at org.apache.kafka.connect.runtime.isolation.Plugins.lambda$withClassLoader$1(Plugins.java:238)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
	at java.base/java.lang.Thread.run(Thread.java:840)
Caused by: java.io.IOException: We encountered an internal error. Please try again. (Service: Amazon S3; Status Code: 200; Error Code: InternalError; Request ID: W5B15QEASQ408SWZ; S3 Extended Request ID: Q0405gasPFau+6r5Jod0XJgj84dDHs6A1p7iHzEtSObprUKBab/KbdFyDGj9RN3BBXPz++jEgRPlGUfyHJezQDkl5q56pp4ec4ikEQQ1z4U=; Proxy: null)
	at io.confluent.connect.s3.storage.S3OutputStream.handleAmazonExceptions(S3OutputStream.java:255)
	at io.confluent.connect.s3.storage.S3OutputStream.access$800(S3OutputStream.java:53)
	at io.confluent.connect.s3.storage.S3OutputStream$MultipartUpload.complete(S3OutputStream.java:317)
	at io.confluent.connect.s3.storage.S3OutputStream.commit(S3OutputStream.java:175)
	at io.confluent.connect.s3.format.avro.AvroRecordWriterProvider$1.commit(AvroRecordWriterProvider.java:111)
	at io.confluent.connect.s3.format.S3RetriableRecordWriter.commit(S3RetriableRecordWriter.java:60)
	... 19 more
Caused by: com.amazonaws.services.s3.model.AmazonS3Exception: We encountered an internal error. Please try again. (Service: Amazon S3; Status Code: 200; Error Code: InternalError; Request ID: W5B15QEASQ408SWZ; S3 Extended Request ID: Q0405gasPFau+6r5Jod0XJgj84dDHs6A1p7iHzEtSObprUKBab/KbdFyDGj9RN3BBXPz++jEgRPlGUfyHJezQDkl5q56pp4ec4ikEQQ1z4U=; Proxy: null), S3 Extended Request ID: Q0405gasPFau+6r5Jod0XJgj84dDHs6A1p7iHzEtSObprUKBab/KbdFyDGj9RN3BBXPz++jEgRPlGUfyHJezQDkl5q56pp4ec4ikEQQ1z4U=
	at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser$CompleteMultipartUploadHandler.doEndElement(XmlResponsesSaxParser.java:1915)
	at com.amazonaws.services.s3.model.transform.AbstractHandler.endElement(AbstractHandler.java:52)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.endElement(AbstractSAXParser.java:618)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanEndElement(XMLDocumentFragmentScannerImpl.java:1728)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl$FragmentContentDriver.next(XMLDocumentFragmentScannerImpl.java:2899)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(XMLDocumentScannerImpl.java:605)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(XMLNSDocumentScannerImpl.java:112)
	at java.xml/com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(XMLDocumentFragmentScannerImpl.java:542)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:889)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(XML11Configuration.java:825)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(XMLParser.java:141)
	at java.xml/com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1224)
	at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseXmlInputStream(XmlResponsesSaxParser.java:168)
	at com.amazonaws.services.s3.model.transform.XmlResponsesSaxParser.parseCompleteMultipartUploadResponse(XmlResponsesSaxParser.java:503)
	at com.amazonaws.services.s3.model.transform.Unmarshallers$CompleteMultipartUploadResultUnmarshaller.unmarshall(Unmarshallers.java:316)
	at com.amazonaws.services.s3.model.transform.Unmarshallers$CompleteMultipartUploadResultUnmarshaller.unmarshall(Unmarshallers.java:313)
	at com.amazonaws.services.s3.internal.S3XmlResponseHandler.handle(S3XmlResponseHandler.java:62)
	at com.amazonaws.services.s3.internal.ResponseHeaderHandlerChain.handle(ResponseHeaderHandlerChain.java:44)
	at com.amazonaws.services.s3.internal.ResponseHeaderHandlerChain.handle(ResponseHeaderHandlerChain.java:30)
	at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:69)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1795)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleSuccessResponse(AmazonHttpClient.java:1477)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1384)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1157)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:814)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:781)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:755)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:715)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:697)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:561)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:541)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5558)
	at com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:5505)
	at com.amazonaws.services.s3.AmazonS3Client.completeMultipartUpload(AmazonS3Client.java:3695)
	at io.confluent.connect.s3.storage.S3OutputStream$MultipartUpload.lambda$complete$0(S3OutputStream.java:318)
	at io.confluent.connect.s3.storage.S3OutputStream.handleAmazonExceptions(S3OutputStream.java:253)
	... 24 more

Restarting the failed tasks resolves the issue.
Related connector config:

s3.part.retries = 6
s3.part.size = 5242880
s3.retry.backoff.ms = 5000

Any idea why the retry might not fire?
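
For illustration, the trace shows the failure surfacing from the CompleteMultipartUpload call with `Status Code: 200` and `Error Code: InternalError`. One possible explanation, not confirmed here, is that retry logic keyed only on the HTTP status would not treat a 200 response as retriable. The sketch below shows a retry predicate that also looks at the error code. It is a minimal sketch, not the connector's code: `S3LikeException`, `isRetriable`, and `callWithRetries` are hypothetical names, and `S3LikeException` is a stand-in for `AmazonS3Exception` that carries only the two fields visible in the trace.

```java
import java.util.concurrent.Callable;

public class CompleteUploadRetrySketch {

    /** Hypothetical stand-in for AmazonS3Exception (status code and error code only). */
    static class S3LikeException extends RuntimeException {
        final int statusCode;
        final String errorCode;

        S3LikeException(String message, int statusCode, String errorCode) {
            super(message);
            this.statusCode = statusCode;
            this.errorCode = errorCode;
        }
    }

    /** Assumption: 5xx responses and an InternalError code (even under HTTP 200) are retriable. */
    static boolean isRetriable(S3LikeException e) {
        return e.statusCode >= 500 || "InternalError".equals(e.errorCode);
    }

    /** Runs the call, retrying up to maxRetries times with a fixed backoff between attempts. */
    static <T> T callWithRetries(Callable<T> call, int maxRetries, long backoffMs) throws Exception {
        int retries = 0;
        while (true) {
            try {
                return call.call();
            } catch (S3LikeException e) {
                // Give up on non-retriable errors or once the retry budget is spent.
                if (!isRetriable(e) || retries >= maxRetries) {
                    throw e;
                }
                retries++;
                Thread.sleep(backoffMs);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated "complete" call: fails twice with a 200/InternalError, then succeeds.
        String result = callWithRetries(() -> {
            if (calls[0]++ < 2) {
                throw new S3LikeException(
                        "We encountered an internal error. Please try again.", 200, "InternalError");
            }
            return "completed";
        }, 6, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

In `main`, the retry count of 6 mirrors `s3.part.retries` from the config above; the 10 ms backoff is shortened for the demo (the config uses `s3.retry.backoff.ms = 5000`).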
