Conversation


@jeswr jeswr commented Apr 4, 2022

This acts as PR 1 defined in #44 (comment).

The updated and cleaned-up implementation is approximately 2x faster than the original strategy of pre-compiling the transformation.

This is ready for review @RubenVerborgh @jacoscaz

@jeswr jeswr marked this pull request as draft April 4, 2022 14:54

coveralls commented Apr 4, 2022

Coverage Status

Coverage remained the same at 100.0% when pulling bc5d1cc on feat/faster-trasnforming into 358608f on main.

@jeswr jeswr requested a review from RubenVerborgh April 4, 2022 15:33
@jeswr jeswr marked this pull request as ready for review April 4, 2022 15:33

@jacoscaz jacoscaz left a comment


@jeswr this is an amazing take on the work done so far when it comes to synchronous transformations. Hats off to you!


jacoscaz commented Apr 4, 2022

Performance-wise, the following test takes ~210 ms on my machine (MacBook Pro, Apple Silicon, M1, 16 GB) when using the main branch and ~23 ms when using this branch, a 9.3x improvement.

import { ArrayIterator } from 'asynciterator';

let i = 0;
const arr = new Array(200000).fill(true).map(() => i++);

const iterator = new ArrayIterator(arr)
  .map((item: number) => item)
  .map((item: number) => item)
  .map((item: number) => item)
  .map((item: number) => item)
  .map((item: number) => item)
  .map((item: number) => item)
  .map((item: number) => item)
  .map((item: number) => item)
  .map((item: number) => item)
  .map((item: number) => item);

const now = Date.now();
iterator.on('data', () => {}).on('end', () => {
  console.log('elapsed', Date.now() - now);
});

@RubenVerborgh RubenVerborgh marked this pull request as draft April 4, 2022 21:35

@RubenVerborgh RubenVerborgh left a comment


Thanks for your work!
The ideas are definitely good, but a significant amount of polishing is required.

This is a very tricky library; it is used in performance-critical code that is sensitive to race conditions. So the code has to have a certain simplicity at all places, such that developers do not have to think or wonder too much. We don't just need to write for the now, but for long-term maintenance. That 2% gain really doesn't matter if it leads to harder maintainability.

So now that we have the performance gains, let's focus on how to implement this as simply as possible; in particular the two transformation classes, which I think should become one. Then every method has to become really simple, and essentially only contain the implementation of one idea.


jeswr commented Apr 4, 2022

Thanks for your work! The ideas are definitely good, but a significant amount of polishing is required.

Of course!

That 2% gain really doesn't matter if it leads to harder maintainability.

If it were indeed 2% I would agree, but it's a matter of 2x (1 sec vs. 0.5 sec to apply 500 maps to 1 million elements) in terms of using this indexed-loop structure rather than 'compiling' the function. That said, we can just write the compiled function using this nested loop and see similar results.
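Roughly, the contrast between the two strategies reads like this (a hypothetical illustration with made-up function names, not the library's actual code):

```javascript
// "Compiling": compose all map functions into a single closure up front.
function compileMaps(fns) {
  return fns.reduce((composed, fn) => (item) => fn(composed(item)), (item) => item);
}

// "Indexed loop": keep the functions in an array and loop over them per item.
function applyMaps(fns, item) {
  for (let i = 0; i < fns.length; i++)
    item = fns[i](item);
  return item;
}

const fns = [(x) => x + 1, (x) => x * 2];
console.log(compileMaps(fns)(3)); // 8
console.log(applyMaps(fns, 3));   // 8
```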

So now that we have the performance gains, let's focus on how to implement this as simply as possible; in particular the two transformation classes, which I think should become one. Then every method has to become really simple, and essentially only contain the implementation of one idea.

Sounds good :)


RubenVerborgh commented Apr 4, 2022

Thanks, @jeswr!

If it was indeed 2% I would agree

It was just a fictitious 2% as an example, but I see the confusion I created 🙂

In general, I meant to say: we've got the perf now. Let's look at approaching that same perf with the clearest code possible, and I don't mind sacrificing a couple of percentage points for long-term maintainability. That is: unless there is a report of an actual performance problem, we should not prematurely optimize. For instance, 500 maps is a case I've never seen in practice; and whereas it is good to scale, scaling to such a high number of transformations has so far not been necessary.


jeswr commented Apr 5, 2022

For instance, 500 maps is a case I've never seen in practice;

Good point, and this number of maps was unnecessary to see non-trivial perf gains.

For the code below, on a Dell XPS 15 with 32 GB RAM, we get the following results (@jacoscaz - I don't think you were using a large enough array to see the full effect):

code     time (ms)  speedup
this PR  223        21.2x
#57      308        15.4x
main     4736       1x

import { ArrayIterator } from './dist/asynciterator.js'

let i = 0;
const arr = new Array(2000000).fill(true).map(() => i++);

const iterator = new ArrayIterator(arr)
  .map((item) => item)
  .map((item) => item)
  .map((item) => item)
  .map((item) => item)
  .map((item) => item)
  .map((item) => item)
  .map((item) => item)
  .map((item) => item)
  .map((item) => item)
  .map((item) => item);

const now = Date.now();
iterator.on('data', () => {}).on('end', () => {
  console.log('elapsed', Date.now() - now);
});

@jeswr jeswr marked this pull request as ready for review April 5, 2022 02:11
@RubenVerborgh

Okay, we're onto something!

For benchmarks, perhaps rather than one 2000000-element iterator, create 100 20000-element iterators or so. Numbers might or might not be different.

Once we have decided on the implementation strategy (cfr. other PRs) and the code has matured, we can go ahead with this.


jacoscaz commented Apr 6, 2022

Here's a test that goes through 20_000 items 100 times, each time applying 2 map() and 1 filter() transformations.

import { ArrayIterator } from 'asynciterator';

let i = 0;
const arr = new Array(20_000).fill(true).map(() => i++);

const now = Date.now();
let times = 100;

const loop = () => {
  if (times === 0) {
    console.log('elapsed', Date.now() - now);
    return;
  }
  const iterator = new ArrayIterator(arr)
    .map((item: number) => item)
    .map((item: number) => item)
    .filter((item: number) => item % 2 === 0)
  ;
  iterator.on('data', () => {}).on('end', () => {
    times -= 1;
    loop();
  });
};

loop();

On my machine (MacBook Pro, Apple Silicon, M1, 16 GB) this takes ~770 ms with the current main and ~90 ms with this branch, so roughly 8.5 times faster.


jeswr commented Apr 6, 2022

How does #59 compare on your machine @jacoscaz ?


jacoscaz commented Apr 6, 2022

@jeswr with the same test as my previous comment, #59 takes ~110 ms, so roughly 1.2 times slower than this branch.

A 20% difference over the lifespan of an engine like Comunica can have significant effects, but I agree with @RubenVerborgh that there's a line after which maintainability must be favored over raw performance. I'd be happy with either branch, really.


jeswr commented Apr 6, 2022

I'd imagine that the 20% slowdown is caused by either 9e64c82 or f03b541; so I would prefer to still use the other branch and just revert by a commit or two.

@jacoscaz sorry to be a pain, but my machine is a bit too consumed with other tasks to do good perf testing, so are you able to work out which commit is the problem?


jacoscaz commented Apr 6, 2022

Perf at the last 3 commits of #59:

  • 5bbc92b ~98 - 100 ms, 1.1x slower
  • f03b541 ~100 - 105 ms, still roughly 1.1x slower
  • 9e64c82 ~105 - 115 ms, getting closer to 1.2x slower

I've run the test 5 times per commit. While the results are pretty consistent when it comes to which commit goes faster, the relative difference between one commit and the next is not as consistent. With these latest numbers I would definitely go for whichever branch is more maintainable.

@RubenVerborgh

^ Let's not draw conclusions from 100 ms though; we'll need tests that run a couple of seconds, preferably with a warm-up.
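A harness along these lines could do it (a hypothetical sketch; the function and parameter names here are made up, and `run` stands in for any synchronous workload under test):

```javascript
// Run a warm-up phase first so the JIT has optimized the hot paths,
// then repeat the workload until a minimum wall-clock duration is reached.
function bench(run, { warmupRuns = 10, minDurationMs = 2000 } = {}) {
  for (let i = 0; i < warmupRuns; i++)
    run(); // warm-up: results are discarded
  const start = Date.now();
  let iterations = 0;
  while (Date.now() - start < minDurationMs) {
    run();
    iterations++;
  }
  return (Date.now() - start) / iterations; // mean ms per iteration
}
```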

@RubenVerborgh

See #59.

}

read(): T | null {
  const item = this.source.read();

Owner

Should probably be super.read() so all checks (are we not closed?) are still done.
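The suggested change might look something like this (a hypothetical illustration; the class and field names are assumptions, not the library's actual code):

```javascript
// A base class whose read() guards internal state (e.g. "are we closed?").
class BaseIterator {
  constructor(source) {
    this.source = source;
    this.closed = false;
  }
  read() {
    return this.closed ? null : this.source.read();
  }
}

// The subclass delegates to super.read() instead of this.source.read(),
// so the base class's checks still run before the item is transformed.
class MappingIterator extends BaseIterator {
  constructor(source, map) {
    super(source);
    this.map = map;
  }
  read() {
    const item = super.read(); // rather than this.source.read()
    return item === null ? null : this.map(item);
  }
}
```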
