Replies: 1 comment 1 reply
Maybe it's worth having a look at DuckLake's data inlining feature.
Problem
Transaction throughput is largely limited by latency to object storage. Transaction files and manifests need to be written sequentially, and when there are conflicts we need to go through a retry loop. In practice, we can only achieve 1-4 transactions per second on object stores.
When there are writers across different processes, this is unavoidable. But often we have many writers in the same process: a server handling requests to Lance, or the same embedded client writing to a local dataset. If we can buffer and combine transactions within each process, we can reduce the physical TPS and the number of concurrent commits that conflict with each other.
Idea
Instead of eagerly writing out files, we can buffer writes and commits in memory, then flush periodically. Flushing consists of merging all buffered transactions into one and committing that merged transaction. The flush period can be short (1-2 seconds), so we can wait to return success to the caller until after the flush has happened. Thus, even though we only perform one physical commit per second, we can handle many more logical transactions from users.
To put it another way, if a user does N appends in a single process concurrently, we should be able to coalesce those in memory and write them as a single transaction.
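As a rough sketch of how this could work (this is not Lance's actual API; Transaction, merge, and commit_to_store below are hypothetical stand-ins for the real transaction type, CommitBuilder-style merging, and the object-store commit path):

```rust
use std::sync::mpsc::{channel, Receiver, Sender};
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, Clone)]
struct Transaction {
    appended_rows: u64, // stand-in for the real operation payload
}

// Merge all buffered transactions into one (the real version would use
// CommitBuilder::commit_batch() / ConflictResolver).
fn merge(txs: &[Transaction]) -> Transaction {
    Transaction {
        appended_rows: txs.iter().map(|t| t.appended_rows).sum(),
    }
}

// Pretend this writes one transaction file + manifest to object storage.
fn commit_to_store(tx: &Transaction) {
    println!("physical commit: {} rows in one transaction", tx.appended_rows);
}

fn spawn_committer(rx: Receiver<(Transaction, Sender<()>)>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        // Block for the first transaction; exit once all senders are dropped.
        while let Ok(first) = rx.recv() {
            let mut batch = vec![first];
            // Buffer whatever else arrives within the flush period (1s here).
            let deadline = Instant::now() + Duration::from_secs(1);
            while let Ok(next) =
                rx.recv_timeout(deadline.saturating_duration_since(Instant::now()))
            {
                batch.push(next);
            }
            let txs: Vec<Transaction> = batch.iter().map(|(t, _)| t.clone()).collect();
            commit_to_store(&merge(&txs)); // one physical commit for N logical ones
            // Only now report success back to each caller.
            for (_, done) in batch {
                let _ = done.send(());
            }
        }
    })
}

fn main() {
    let (tx, rx) = channel();
    let committer = spawn_committer(rx);
    // Simulate 8 concurrent single-row appends from the same process.
    let writers: Vec<_> = (0..8)
        .map(|_| {
            let tx = tx.clone();
            thread::spawn(move || {
                let (done_tx, done_rx) = channel();
                tx.send((Transaction { appended_rows: 1 }, done_tx)).unwrap();
                done_rx.recv().unwrap(); // success only after the shared flush
            })
        })
        .collect();
    for w in writers {
        w.join().unwrap();
    }
    drop(tx); // lets the committer thread exit
    committer.join().unwrap();
}
```

Here eight logical appends produce one physical commit, and each caller's latency is bounded by the flush period plus a single object-store round trip.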
Example
Consider 3 single-row upsert operations (that affect different rows).
Currently, each has to:
1. Write a data file
2. Write a deletion file
3. Write a transaction file
4. Write a new manifest
That is 12 sequential object-store writes for 3 logical upserts. (Actually it's worse than this: during conflict resolution we need to merge these deletion files again, so two of these transactions will have to rewrite their deletion files, for a total of 5 deletion files written. But we ignore this for simplicity.)
If we buffered and merged the transactions, all three upserts could share a single transaction file and a single manifest, and (with the deferrals described below) a single merged deletion file as well.
This means the following changes, described in the sections below:
Where to put the buffer
Attached to a Session object, we can hold a central committer that holds the buffer.
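A minimal sketch of that shape, assuming a hypothetical BufferingCommitter like the one sketched above (this is not the existing Session API):

```rust
use std::sync::Arc;

struct BufferingCommitter; // placeholder for the buffer + flush loop

struct Session {
    committer: Arc<BufferingCommitter>,
}

impl Session {
    fn new() -> Self {
        Session { committer: Arc::new(BufferingCommitter) }
    }

    // Every dataset or writer opened through the same Session clones the same
    // handle, so all of their transactions land in one shared buffer.
    fn committer(&self) -> Arc<BufferingCommitter> {
        Arc::clone(&self.committer)
    }
}
```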
Merging transactions
We currently have basic facilities for merging transactions; see CommitBuilder::commit_batch(). We can use the ConflictResolver to make this work for upsert, update, and delete transactions as well.
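For transactions that carry deletion vectors, the merge could look roughly like the sketch below, which unions the deleted-row bitmaps and concatenates the new fragments. UpsertTxn is a hypothetical stand-in; the real implementation would go through CommitBuilder::commit_batch() and the ConflictResolver:

```rust
// roaring is the bitmap crate Lance already uses for deletion vectors.
use roaring::RoaringBitmap;

struct UpsertTxn {
    deleted_rows: RoaringBitmap, // rows replaced by this upsert
    new_fragments: Vec<String>,  // stand-in for new data file references
}

fn merge_upserts(txs: Vec<UpsertTxn>) -> UpsertTxn {
    let mut merged = UpsertTxn {
        deleted_rows: RoaringBitmap::new(),
        new_fragments: Vec::new(),
    };
    for tx in txs {
        // This simple union is only valid because the buffered upserts touch
        // disjoint rows; overlapping rows need ConflictResolver-style logic.
        merged.deleted_rows |= tx.deleted_rows;
        merged.new_fragments.extend(tx.new_fragments);
    }
    merged
}
```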
Deferring deletion file writes
If we write out deletion files and then have to read them back and merge them later, this can slow us down. One option we should add is deferring the deletion file write(s) until the commit step. The RoaringBitmaps can just be passed to the committer, just like the affected_rows parameter.
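A small sketch of the deferred write at commit time, assuming the bitmaps have already been merged; serialize_into is the roaring crate's standard serialization, everything else is illustrative:

```rust
use roaring::RoaringBitmap;

// Serialize the merged deletion bitmap once, during the commit step, instead
// of writing (and later re-reading and rewriting) one file per transaction.
fn encode_deletion_file(merged: &RoaringBitmap) -> std::io::Result<Vec<u8>> {
    let mut buf = Vec::new();
    merged.serialize_into(&mut buf)?;
    Ok(buf) // the committer uploads these bytes alongside the manifest
}
```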
Deferring data file writes
The other problem that comes with allowing higher-throughput writes is that we will create many small files. Unless compaction can keep up, this will make scans slower with each new write, undermining the overall throughput gain for upserts.
To mitigate this, we should consider deferring writing data files as well. The object store writer we pass to the file writer could handle this: for example, if the data written is < 5 MB, instead of doing the upload it could just return an in-memory copy of the data to be written. This could be written later during the commit step, or compacted together with other deferred writes.
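A sketch of that write path, using the 5 MB threshold suggested above; WriteOutcome and write_data_file are hypothetical names, not an existing API:

```rust
const DEFER_THRESHOLD: usize = 5 * 1024 * 1024; // 5 MB, as suggested above

enum WriteOutcome {
    // Small write: hand the bytes back to be uploaded later, possibly
    // compacted together with other deferred writes first.
    Deferred { path: String, data: Vec<u8> },
    // Large write: uploaded to the object store immediately, as today.
    Uploaded { path: String },
}

fn write_data_file(path: String, data: Vec<u8>) -> WriteOutcome {
    if data.len() < DEFER_THRESHOLD {
        WriteOutcome::Deferred { path, data }
    } else {
        // Real code would do the multipart upload here; elided.
        WriteOutcome::Uploaded { path }
    }
}
```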
Drawback
This will mess with the transaction history
Users who want to see a record of each individual write won't find it reflected in the new versions, since many logical transactions collapse into one. However, users who want high-throughput writes likely don't care as much about this, so it could make sense to control the behavior with a setting.
Long-term, this could be addressed by the design in #3734, which separates the idea of user-facing transactions and the Lance-level transactions.