600 million record table load failure in Airbyte with MSSQL connector #66554
Unanswered · 0 replies
mithleshcsahu asked this question in Connector Questions
Hi Airbyte Team,
I am using Airbyte Open Source (v1.4.0) on a Kubernetes cluster, running one job in Full Refresh | Overwrite mode with the MSSQL source connector for a table of roughly 600 million records. The job was originally scheduled on a pod with a 13 Gi memory limit and failed; we have since raised the limit to 42 Gi of memory and 4000m CPU, and it still fails. The main issue is that the job consumes all available memory, fails after loading part of the data (around 100 million records), and then restarts from zero on the retry attempt.
The same table in another region, which has about 50 million records, also failed on the 13 Gi pod but loaded successfully with the 42 Gi pod. So adding more memory will obviously make this sync succeed eventually, but if we later have to load billions of records we cannot keep adding resources. We need a way to move N records within a fixed resource limit.
I tried changing values in both the source and destination configs, but memory usage still climbs to the maximum.
Source config: MSSQL with CDC enabled.
Size of queue: reduced from 10,000 → 5,000 → 2,000 → 1,000 → 500, and it is now set to only 100.
Destination config: S3 bucket using Parquet format (the sketch after this list shows roughly how these options fit together).
Block size: tried 512, 256, 128, 65, 32, 16.
Compression codec: Snappy.
Dictionary page size (KB): 1024 → 512 → 256 → 128.
Page size: 1024 → 512 → 256 → 128.
Max padding size (KB): 16 → 8.
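To make that combination concrete, here is a minimal sketch of the Parquet settings I am describing, written as a plain Python dict. The field names and units (block_size_mb, page_size_kb, and so on) are my approximation of how the S3 destination exposes these options, so treat them as illustrative rather than exact:

```python
# Minimal sketch of the Parquet format settings described above.
# Field names and units are approximations of the S3 destination's
# Parquet options; treat them as illustrative, not exact.
parquet_format = {
    "format_type": "Parquet",
    "compression_codec": "SNAPPY",
    "block_size_mb": 16,             # row-group size; started at 512, stepped down to 16
    "page_size_kb": 128,             # stepped down from 1024
    "dictionary_page_size_kb": 128,  # stepped down from 1024
    "max_padding_size_kb": 8,        # stepped down from 16
}

# A Parquet writer generally buffers on the order of one row group (block)
# in memory per open stream before flushing, so the writer-side buffer is
# roughly block size x number of concurrently open streams.
open_streams = 1  # this sync moves a single table
approx_writer_buffer_mb = parquet_format["block_size_mb"] * open_streams
print(f"approx. Parquet writer buffer: ~{approx_writer_buffer_mb} MB")
```

Even at the smallest block size, the writer-side buffer for a single stream should be small compared to the pod limit, which is why I do not understand where the rest of the memory is going.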
Replication job pod configuration (a rough comparison against these limits is sketched below):
CPU: request = 500m, limit = 4000m
Memory: request = 1.5Gi, limit = 42Gi
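For what it is worth, here is the back-of-envelope arithmetic I am using to compare those limits against what the tuned buffers should need. The average record size is a guess, so the numbers are only indicative:

```python
# Back-of-envelope comparison of the tuned buffer sizes against the pod limit.
# The average record size is a guess; every figure is indicative only.
MIB = 1024 ** 2
GIB = 1024 ** 3

pod_memory_limit = 42 * GIB

# Source side: the internal queue holds at most `queue_size` records in flight.
queue_size = 100                 # final value after stepping down from 10,000
avg_record_bytes = 2 * 1024      # assumed ~2 KiB per record (hypothetical)
queue_buffer = queue_size * avg_record_bytes

# Destination side: roughly one Parquet row group buffered per open stream.
block_size_mb = 16
open_streams = 1
parquet_buffer = block_size_mb * MIB * open_streams

print(f"queue buffer   : {queue_buffer / MIB:10.2f} MiB")
print(f"parquet buffer : {parquet_buffer / MIB:10.2f} MiB")
print(f"pod limit      : {pod_memory_limit / GIB:10.2f} GiB")
# These buffers together are far below the 42 GiB limit, yet the pod still
# exhausts its memory, so the growth appears to come from something these
# knobs do not control.
```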
Please guide me on how to load this 600-million-record table into the S3 bucket successfully.