600 million record table load failure in Airbyte with MSSQL connector #66554
Unanswered · 0 replies
mithleshcsahu asked this question in Connector Questions
Hi Airbyte Team,
I am using Airbyte Open Source (v1.4.0) on a Kubernetes cluster, running one job in Full Refresh | Overwrite mode with the MSSQL source connector for a table of roughly 600 million records. The job was originally scheduled on a pod with a 13 Gi memory limit and failed; we have since raised the limit to 42 Gi of memory and 4000m CPU, and it still fails. The main issue is that the job consumes all available memory, fails after loading part of the data (around 100 million records), and then restarts from zero on the retry attempt.
The same table in another region, which has about 50 million records, also failed on the 13 Gi pod but loaded successfully with the 42 Gi pod. So adding more memory will obviously make this sync succeed eventually, but if we later have to load billions of records we cannot keep adding resources. We need a way to move N records within a fixed resource limit.
I tried changing values in both the source and destination configs, but memory usage still climbs to the maximum.
Source config: MSSQL with CDC enabled.
Size of queue: reduced from 10,000 → 5,000 → 2,000 → 1,000 → 500, and it is now set to only 100.
Destination config: S3 bucket using Parquet format (the sketch after this list shows roughly how these options fit together).
Block size: tried 512, 256, 128, 65, 32, 16.
Compression codec: Snappy.
Dictionary page size (KB): 1024 → 512 → 256 → 128.
Page size: 1024 → 512 → 256 → 128.
Max padding size (KB): 16 → 8.
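To make that combination concrete, here is a minimal sketch of the Parquet settings I am describing, written as a plain Python dict. The field names and units (block_size_mb, page_size_kb, and so on) are my approximation of how the S3 destination exposes these options, so treat them as illustrative rather than exact:

```python
# Minimal sketch of the Parquet format settings described above.
# Field names and units are approximations of the S3 destination's
# Parquet options; treat them as illustrative, not exact.
parquet_format = {
    "format_type": "Parquet",
    "compression_codec": "SNAPPY",
    "block_size_mb": 16,             # row-group size; started at 512, stepped down to 16
    "page_size_kb": 128,             # stepped down from 1024
    "dictionary_page_size_kb": 128,  # stepped down from 1024
    "max_padding_size_kb": 8,        # stepped down from 16
}

# A Parquet writer generally buffers on the order of one row group (block)
# in memory per open stream before flushing, so the writer-side buffer is
# roughly block size x number of concurrently open streams.
open_streams = 1  # this sync moves a single table
approx_writer_buffer_mb = parquet_format["block_size_mb"] * open_streams
print(f"approx. Parquet writer buffer: ~{approx_writer_buffer_mb} MB")
```

Even at the smallest block size, the writer-side buffer for a single stream should be small compared to the pod limit, which is why I do not understand where the rest of the memory is going.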
Replication job pod configuration (a rough comparison against these limits is sketched below):
CPU: request = 500m, limit = 4000m
Memory: request = 1.5Gi, limit = 42Gi
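For what it is worth, here is the back-of-envelope arithmetic I am using to compare those limits against what the tuned buffers should need. The average record size is a guess, so the numbers are only indicative:

```python
# Back-of-envelope comparison of the tuned buffer sizes against the pod limit.
# The average record size is a guess; every figure is indicative only.
MIB = 1024 ** 2
GIB = 1024 ** 3

pod_memory_limit = 42 * GIB

# Source side: the internal queue holds at most `queue_size` records in flight.
queue_size = 100                 # final value after stepping down from 10,000
avg_record_bytes = 2 * 1024      # assumed ~2 KiB per record (hypothetical)
queue_buffer = queue_size * avg_record_bytes

# Destination side: roughly one Parquet row group buffered per open stream.
block_size_mb = 16
open_streams = 1
parquet_buffer = block_size_mb * MIB * open_streams

print(f"queue buffer   : {queue_buffer / MIB:10.2f} MiB")
print(f"parquet buffer : {parquet_buffer / MIB:10.2f} MiB")
print(f"pod limit      : {pod_memory_limit / GIB:10.2f} GiB")
# These buffers together are far below the 42 GiB limit, yet the pod still
# exhausts its memory, so the growth appears to come from something these
# knobs do not control.
```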
Please guide me on how to load this 600-million-record table into the S3 bucket successfully.