perf: implement batch processing in iterateEvalTree#406

Open
cheb0 wants to merge 4 commits into 329-batching-1 from 329-batching-iterate-eval-tree

Collaborator

@cheb0 cheb0 commented Apr 22, 2026

Description

Continuation of #390

  • iterateEvalTree now works with batches of lids and requests mids and rids in batches
  • fixes stopwatch measurements for the get_mid step
  • the array-based hist map is decoupled into its own struct
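
A minimal sketch of the batched flow described in the bullets above, not the PR's actual code. The types `LID`, `MID`, `batchSource` and the helpers `lookupMIDs` and `countInRange` are hypothetical stand-ins; the real code works with `node.LID`, `seq.MID` and the eval tree.

```go
package main

import "fmt"

// LID and MID are simplified stand-ins for node.LID and seq.MID (assumption).
type (
	LID uint32
	MID uint64
)

// batchSource is a hypothetical stand-in for the eval tree: each call to Next
// returns the next batch of LIDs, reusing the caller's buffer.
type batchSource struct {
	lids []LID
	pos  int
}

// Next refills out (up to its capacity) and returns it.
// An empty result means the source is exhausted.
func (s *batchSource) Next(out []LID) []LID {
	out = out[:0]
	for s.pos < len(s.lids) && len(out) < cap(out) {
		out = append(out, s.lids[s.pos])
		s.pos++
	}
	return out
}

// lookupMIDs resolves a whole batch of LIDs in one call instead of one call per LID.
func lookupMIDs(table map[LID]MID, lids []LID, out []MID) []MID {
	out = out[:0]
	for _, lid := range lids {
		out = append(out, table[lid])
	}
	return out
}

// countInRange drains the source batch by batch and counts MIDs within [from, to].
func countInRange(src *batchSource, table map[LID]MID, from, to MID, batchCap int) int {
	lidBuf := make([]LID, 0, batchCap)
	midBuf := make([]MID, 0, batchCap)
	total := 0
	for {
		lidBuf = src.Next(lidBuf)
		if len(lidBuf) == 0 {
			return total
		}
		midBuf = lookupMIDs(table, lidBuf, midBuf)
		for _, mid := range midBuf {
			if mid >= from && mid <= to {
				total++
			}
		}
	}
}

func main() {
	table := map[LID]MID{1: 10, 2: 20, 3: 30, 4: 40, 5: 50}
	src := &batchSource{lids: []LID{1, 2, 3, 4, 5}}
	fmt.Println(countInRange(src, table, 20, 40, 2))
}
```

The two buffers are allocated once and reused for every batch, which is the property that makes the per-batch lookup cheaper than a per-LID call.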

I measured both patches together (this PR combined with #390) against main, using bitpack encoding in both branches. Small ordinary searches show no benefit; dense analytic queries improve noticeably.

For our k6 benchmark seq-db-hist.js: 2.3 sec => 650 ms
For seq-db-aggs.js: 6.1 sec => 4.7 sec
Hist over _all_ (warm query) (3 prod fractions): ~37 ms => ~15 ms

Part of #329


  • I have read and followed all requirements in CONTRIBUTING.md;
  • I used LLM/AI assistance to make this pull request;

@cheb0 cheb0 changed the base branch from main to 329-batching-1 April 22, 2026 11:55
@cheb0
Collaborator Author

cheb0 commented Apr 22, 2026

@seqbenchbot up main search-keyword-exact-match-warm

@seqbenchbot
Collaborator

seqbenchbot commented Apr 22, 2026

Nice, @cheb0 <(-^,^-)=b!

Your request was successfully served.
Identificator for your ongoing benchmark - e8eefca9.

Here is a list of helpful links:

  • Take a look at Grafana dashboard;
  • Live-tailing logs are also available;

Have a great time!

@codecov-commenter

Codecov Report

❌ Patch coverage is 78.76712% with 31 lines in your changes missing coverage. Please review.
✅ Project coverage is 70.58%. Comparing base (da8604a) to head (74748b8).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| frac/processor/search.go | 71.26% | 22 Missing and 3 partials ⚠️ |
| frac/sealed/seqids/provider.go | 70.00% | 2 Missing and 1 partial ⚠️ |
| frac/sealed_index.go | 72.72% | 2 Missing and 1 partial ⚠️ |

Additional details and impacted files
@@                Coverage Diff                 @@
##           329-batching-1     #406      +/-   ##
==================================================
- Coverage           71.54%   70.58%   -0.97%     
==================================================
  Files                 220      221       +1     
  Lines               16568    20423    +3855     
==================================================
+ Hits                11854    14415    +2561     
- Misses               3840     5128    +1288     
- Partials              874      880       +6     

@cheb0
Collaborator Author

cheb0 commented Apr 22, 2026

@seqbenchbot down e8eefca9

@seqbenchbot
Collaborator

seqbenchbot commented Apr 22, 2026

Nice, @cheb0 <(-^,^-)=b!

The benchmark with identificator e8eefca9 was finished.
I've prepared a summary for you. Click on Show summary button to see it:

Show summary
Query: bulk (type: warm)

| Metric | base | comp | diff |
| --- | --- | --- | --- |
| mean (ms) | 65.92 | 67.00 | +1.65% |
| stddev (ms) | 25.11 | 26.77 | +6.60% |
| p(50) (ms) | 58.00 | 60.00 | +3.45% |
| p(95) (ms) | 118.00 | 124.00 | +5.08% |
| p(99) (ms) | 157.50 | 165.00 | +4.76% |
| iterations | 2450.00 | 2450.00 | 0.00% |

Query: service:payment-backend-eu AND k8s_namespace:prod (type: warm)

| Metric | base | comp | diff |
| --- | --- | --- | --- |
| mean (ms) | 130.57 | 129.30 | -0.97% |
| stddev (ms) | 115.82 | 112.04 | -3.26% |
| p(50) (ms) | 114.00 | 115.00 | +0.88% |
| p(95) (ms) | 324.00 | 319.00 | -1.54% |
| p(99) (ms) | 669.50 | 642.50 | -4.03% |
| iterations | 8339.00 | 8363.00 | +0.29% |

Have a great time!

@cheb0 cheb0 marked this pull request as ready for review April 23, 2026 14:19
@eguguchkin eguguchkin requested review from dkharms and forshev April 27, 2026 11:03
for _, lid := range lids {
	rawLid := lid.Unpack()
	blockIdx := p.table.GetIDBlockIndexByLID(rawLid)
	if p.midCache.blockIndex != int(blockIdx) {
Collaborator

nit: fillMIDs already performs this check internally. Did you add it here to avoid the function call?
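
For readers without the diff at hand, here is a rough sketch of the pattern being discussed: a one-block cache whose loader guards against reloading, plus a duplicate guard in the hot loop. Everything here (`provider`, `midCache`, `blockCap`, the `loads` counter, integer-division block lookup) is an illustrative assumption and not the real `Provider` code.

```go
package main

import "fmt"

// blockCap is a hypothetical number of IDs per block.
const blockCap = 4

// midCache holds the values of the most recently loaded block.
type midCache struct {
	blockIndex int
	vals       []uint64
	loads      int // counts real block loads, for illustration only
}

type provider struct {
	blocks [][]uint64 // stand-in for on-disk ID blocks
	cache  midCache
}

func newProvider(blocks [][]uint64) *provider {
	return &provider{blocks: blocks, cache: midCache{blockIndex: -1}}
}

// fillMIDs loads a block into the cache unless it is already cached.
func (p *provider) fillMIDs(blockIdx int) {
	if p.cache.blockIndex == blockIdx {
		return
	}
	p.cache.vals = p.blocks[blockIdx]
	p.cache.blockIndex = blockIdx
	p.cache.loads++
}

// MIDs resolves a batch of LIDs. The check before fillMIDs duplicates the
// guard inside it; its only effect is to skip the call when the block is hot.
func (p *provider) MIDs(lids []uint32, out []uint64) []uint64 {
	for _, lid := range lids {
		blockIdx := int(lid) / blockCap
		if p.cache.blockIndex != blockIdx {
			p.fillMIDs(blockIdx)
		}
		out = append(out, p.cache.vals[int(lid)%blockCap])
	}
	return out
}

func main() {
	p := newProvider([][]uint64{{10, 11, 12, 13}, {20, 21, 22, 23}})
	fmt.Println(p.MIDs([]uint32{0, 1, 5, 6, 3}, nil), p.cache.loads)
}
```

Whether the outer check pays off depends on whether the compiler already inlines the loader, so a benchmark would be needed to justify keeping both guards.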

Comment thread frac/processor/search.go
// Get MIDs
if needMids > 0 {
	timerMID.Start()
	mids = idsIndex.GetMIDs(lidsSlice[0:needMids], mids[:0])
Collaborator

nit: technically we can omit the lower bound if it equals 0

lidsSlice[:needMids]
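
A tiny check of the equivalence mentioned in this nit: in a Go slice expression a missing low index defaults to zero, so both forms yield the same slice. The helper `prefixes` is made up for the demonstration.

```go
package main

import "fmt"

// prefixes returns the first n elements twice: once with an explicit lower
// bound of 0 and once with the lower bound omitted.
func prefixes(s []int, n int) (explicit, implicit []int) {
	return s[0:n], s[:n]
}

func main() {
	a, b := prefixes([]int{1, 2, 3, 4}, 2)
	// Same length, same capacity, same backing array.
	fmt.Println(len(a) == len(b), cap(a) == cap(b), &a[0] == &b[0])
}
```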

@cheb0 cheb0 added the performance Features or improvements that positively affect seq-db performance label May 12, 2026
return seq.MID(p.midCache.GetValByLID(uint32(lid))), nil
}

func (p *Provider) MIDs(lids []node.LID, out []seq.MID) ([]seq.MID, error) {
Member

Why does Provider have a method for retrieving a batch of MIDs, but no similar method for RIDs?

Comment thread frac/processor/search.go
defer searchBuffersPool.Put(buffers)
mids := buffers.mids
rids := buffers.rids
lidsBuffer := buffers.lids
Member

Shouldn't you reset the buffers, since the slices are reused?
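
One possible way to address this, sketched under assumptions: truncate every pooled slice to length zero when it is taken from the pool, so stale data from a previous search can never be observed. The struct layout, element types, capacity of 64, and the helpers `reset`, `acquireBuffers` and `releaseBuffers` are hypothetical; only the name `searchBuffersPool` comes from the diff.

```go
package main

import (
	"fmt"
	"sync"
)

// searchBuffers is a simplified stand-in for the pooled struct in the diff.
type searchBuffers struct {
	mids []uint64
	rids []uint64
	lids []uint32
}

// reset truncates every slice to length zero while keeping its capacity.
func (b *searchBuffers) reset() {
	b.mids = b.mids[:0]
	b.rids = b.rids[:0]
	b.lids = b.lids[:0]
}

var searchBuffersPool = sync.Pool{
	New: func() any {
		return &searchBuffers{
			mids: make([]uint64, 0, 64),
			rids: make([]uint64, 0, 64),
			lids: make([]uint32, 0, 64),
		}
	},
}

// acquireBuffers always hands out empty slices, whether the object is new
// or was returned to the pool by an earlier caller.
func acquireBuffers() *searchBuffers {
	b := searchBuffersPool.Get().(*searchBuffers)
	b.reset()
	return b
}

func releaseBuffers(b *searchBuffers) {
	searchBuffersPool.Put(b)
}

func main() {
	b := acquireBuffers()
	b.mids = append(b.mids, 1, 2, 3)
	releaseBuffers(b)

	again := acquireBuffers()
	fmt.Println(len(again.mids), len(again.rids), len(again.lids))
	releaseBuffers(again)
}
```

Passing `buf[:0]` at each call site, as the `GetMIDs` call quoted in an earlier thread does with `mids[:0]`, has the same effect for that particular slice; a central reset simply makes the guarantee independent of the call sites.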

Comment thread frac/processor/search.go
lidsBuf := lidsBuf{
	lids: make([]node.LID, 0, consts.LIDBlockCap),
}
return searchBuffers{
Member

It's better to return a pointer here; otherwise there will be unnecessary allocations, since the value is returned as any.
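
To illustrate the point, here is a sketch contrasting the two pool styles. The struct and the names `valuePool`, `pointerPool`, `useValue` and `usePointer` are hypothetical. The underlying issue is general Go behavior: converting a multi-word struct to `any` generally copies it to the heap, whereas a pointer fits into the interface value directly (staticcheck reports the value form as SA6002).

```go
package main

import (
	"fmt"
	"sync"
)

// searchBuffers is a simplified stand-in for the pooled struct in the diff.
type searchBuffers struct {
	mids []uint64
	rids []uint64
}

// valuePool stores the struct by value: every Put has to box a copy of the
// struct into an interface value.
var valuePool = sync.Pool{
	New: func() any {
		return searchBuffers{mids: make([]uint64, 0, 64), rids: make([]uint64, 0, 64)}
	},
}

// pointerPool stores a pointer: Put and Get only move the pointer around.
var pointerPool = sync.Pool{
	New: func() any {
		return &searchBuffers{mids: make([]uint64, 0, 64), rids: make([]uint64, 0, 64)}
	},
}

func useValue() int {
	b := valuePool.Get().(searchBuffers)
	b.mids = append(b.mids[:0], 1, 2, 3)
	n := len(b.mids)
	valuePool.Put(b) // boxes a copy of the struct
	return n
}

func usePointer() int {
	b := pointerPool.Get().(*searchBuffers)
	b.mids = append(b.mids[:0], 1, 2, 3)
	n := len(b.mids)
	pointerPool.Put(b) // no copy of the struct is made
	return n
}

func main() {
	fmt.Println(useValue(), usePointer())
}
```

A second, subtler effect of the value form: with `defer pool.Put(buffers)` the argument is evaluated when the `defer` statement runs, so slices that grew later via `append` on local copies are not what ends up back in the pool.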

Comment thread frac/processor/search.go
Comment on lines +237 to +238
filterMIDs := sw.Timer("filter_mids")
updateHist := sw.Timer("update_hist")
Member

Suggested change
- filterMIDs := sw.Timer("filter_mids")
- updateHist := sw.Timer("update_hist")
+ timerFilterMIDs := sw.Timer("filter_mids")
+ timerUpdateHist := sw.Timer("update_hist")

Comment thread frac/processor/search.go
}

type LIDsIter interface {
Lids(out []node.LID) []node.LID
Member

Suggested change
LIDs(out []node.LID) []node.LID

Comment thread frac/processor/search.go
Member

I'll leave it here since it is out of scope of this diff.

Take a look at https://github.com/ozontech/seq-db/blob/329-batching-iterate-eval-tree/frac/sealed/lids/iterator_desc.go#L121-L131 -- I guess you've introduced code duplication while performing rebase.

Comment thread frac/processor/search.go
return total, ids, hist, aggs, nil
}

func filterOutOfRangeMIDs(params SearchParams, mids []seq.MID, lidsSlice []node.LID) ([]seq.MID, []node.LID) {
Member

I am not sure what purpose this function serves.

As I understand it, we can never iterate over a seq.LID whose seq.ID lies outside the user-requested range [from; to]. This is guaranteed because getLIDsBorders computes minLID and maxLID, and all iterators use those as boundaries.

Am I missing something?
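
Since the body of the function is not quoted in this thread, the following is only a guess at what a function with that name and signature might do: compact two parallel slices in place, keeping the positions whose MID lies inside the requested range. The `searchParams` struct with `From` and `To` fields is an assumption, and the sketch does not settle whether such filtering is needed at all.

```go
package main

import "fmt"

// MID and LID are simplified stand-ins for seq.MID and node.LID (assumption).
type (
	MID uint64
	LID uint32
)

// searchParams is a hypothetical, reduced stand-in for SearchParams.
type searchParams struct {
	From, To MID
}

// filterOutOfRangeMIDs compacts mids and lids in place, keeping only the
// positions whose MID lies in [From, To]. Both slices must have equal length.
func filterOutOfRangeMIDs(params searchParams, mids []MID, lids []LID) ([]MID, []LID) {
	kept := 0
	for i, mid := range mids {
		if mid < params.From || mid > params.To {
			continue
		}
		// kept never exceeds i, so entries not yet visited are never overwritten.
		mids[kept] = mid
		lids[kept] = lids[i]
		kept++
	}
	return mids[:kept], lids[:kept]
}

func main() {
	mids := []MID{5, 10, 15, 20, 25}
	lids := []LID{1, 2, 3, 4, 5}
	m, l := filterOutOfRangeMIDs(searchParams{From: 10, To: 20}, mids, lids)
	fmt.Println(m, l)
}
```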

Comment thread frac/processor/search.go
buffers := searchBuffersPool.Get().(searchBuffers)
defer searchBuffersPool.Put(buffers)
mids := buffers.mids
rids := buffers.rids
Member

@dkharms dkharms May 12, 2026

Starting a petition to protect Vim users and their descendants — we require spaces. This is how we navigate code. Thank you for your cooperation.

Maybe something like this?

	var (
		total  int
		lastID seq.ID
		ids    seq.IDSources
	)

	buffers := searchBuffersPool.Get().(searchBuffers)
	defer searchBuffersPool.Put(buffers)

Comment thread frac/processor/search.go
}
// limit how much we drain from eval tree for one-by-one flow. ignored for batched flow
need = min(need, maxLidsToDrain)
needLids = min(needLids, maxLidsToDrain)
Member

@dkharms dkharms May 12, 2026

Maybe we can move the whole calculation of limits, offsets, etc. into the batch itself? I mean something like:

if ok {
	evalTreeIter = func(need int, _ lidsBuf) LIDsIter {
		// batched flow: just get a batch and return
		return batchNode.NextBatch().Trim(need)
		// Or return batchNode.NextBatch(need)
	}
} else {
	...
}

func (b LIDBatch) Trim(k int) LIDBatch {
	b.lids = b.lids[:min(k, len(b.lids))]
	return b
}
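
A self-contained variant of the `Trim` idea above that can be run in isolation. The `LIDBatch` layout, the `Len` helper and the clamp for negative `k` are assumptions added for the sketch; the built-in `min` and `max` require Go 1.21 or newer.

```go
package main

import "fmt"

// LID is a simplified stand-in for node.LID (assumption).
type LID uint32

// LIDBatch is a hypothetical batch type wrapping a slice of LIDs.
type LIDBatch struct {
	lids []LID
}

// Trim returns a batch limited to at most k LIDs. The receiver is a value,
// so the original batch keeps its length; both share one backing array.
func (b LIDBatch) Trim(k int) LIDBatch {
	k = max(k, 0) // guard added in this sketch: a negative k yields an empty batch
	b.lids = b.lids[:min(k, len(b.lids))]
	return b
}

// Len reports the number of LIDs in the batch.
func (b LIDBatch) Len() int {
	return len(b.lids)
}

func main() {
	batch := LIDBatch{lids: []LID{1, 2, 3}}
	fmt.Println(batch.Trim(2).Len(), batch.Trim(10).Len(), batch.Len())
}
```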

@eguguchkin eguguchkin modified the milestones: v0.74.0, v0.72.0 May 18, 2026
Labels

performance Features or improvements that positively affect seq-db performance

6 participants