-
Notifications
You must be signed in to change notification settings - Fork 182
feat: Improve unstable size analysis support #935
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
feat: Improve unstable size analysis support #935
Conversation
✅ Deploy Preview for salsa-rs canceled.
|
CodSpeed Performance ReportMerging #935 will degrade performances by 5.56%Comparing Summary
Benchmarks breakdown
|
How exactly does this regress performance? |
I think it would be great if we can split this PR. 3. is fairly uncontroversial. The use cases for 1 and 2 are less clear to me. For 1. can you give me an example on what this is uesd for? I'm asking because adding extra arguments makes using it in combination with For 2. It prefer an enum as argument rather than a bool. It makes it more expressive what |
For example, I want to analyze memory usage in rust-analyzer per type and not per query (it is more convenient to draw conclusions this way). This provides a way to do that and more (e.g. what I'm doing now, checking the effect of rust-lang/rust-analyzer#18403) with minimum effort. The disturbance you described is minimal because you can have a single method
Fair enough. |
89c8a47
to
0e29abf
Compare
Edited to use an enum instead of bool. |
I don't think I fully understand the use case. The table in the linked issue is grouped by query? I think we also have an API to iterate over all memos. Could your impelmentation use that instead? |
I want e.g. if I have Currently I have nowhere to store this attribute information (besides a static, which is really ugly). This is what I want to change. |
When using this in rust-analyzer I noticed there is still no way to have |
I see. This seems very specific. Have you considered using a thread local in your code? That should give you the same without that we need to change salsa for this less common case. |
I can do that, but it will be cleaner if it will be a parameter to the function. And this enables a lot of patterns (basically, you can apply any operation to the whole database, you can even select the operation at runtime should there be such need), and the cost is minimal - if you don't use this feature, you need to replace And FWIW, even with And in general, passing an addition data parameter to a callback is a common pattern (especially in C, where there are no closures, but here we can't use closures so it's back to the C idiom). |
I'm not oposed to having such a function that allows analysing the entire db. It only feels to me that this goes beyond what |
The design seems perfect to me. The alternative design will be to have a trait for that analysis that is called automatically (if it is implemented, via autoref specialization or explicitly specifying) but that means it'll be foreign for the Salsa user, at least for my framework of size analysis this will cause major trouble, as I need a lot of impls for third-party types. The naming... I agree it is not ideal. In fact, I originally renamed this function to |
Either way. I suggest splitting the PR. I'm on board with 2 and 3 but still think that 1 is looking for a generic traversal logic rather than what get size provides but I'm interested to hear from others. |
0e29abf
to
9d0acfb
Compare
Okay. I've omitted 1 from the PR and will open a Zulip discussion. |
@ibraheemdev would you mind taking a look at the changes? |
This looks fine to me, but it could use some tests for both the panicking behaviour and the new |
@@ -89,6 +92,12 @@ macro_rules! setup_input_struct { | |||
|
|||
type Revisions = [$zalsa::Revision; $N]; | |||
type Durabilities = [$zalsa::Durability; $N]; | |||
|
|||
$( | |||
fn heap_size(value: &Self::Fields, _panic_if_missing: $zalsa::PanicIfHeapSizeMissing) -> usize { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of threading the PanicIfHeapSizeMissing
everywhere, you could add a has_heap_size
method to Ingredient
and do a single check that all ingredients return true
if PanicIfHeapSizeMissing::Yes
is set. You'd need a HAS_HEAP_SIZE
constant on all the configurations, but that should be simple to add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or you could have Ingredient::memory_usage
return whether or not the heap size was accounted for, and have Configuration::heap_size
default to None
instead of 0
. Either way seems fine.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually like the current way more. It is simpler (especially if we want to show the name of the problematic ingredient), and the threading is not that bad. Do you have a preference?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine if you think that's easier, just an idea. It might be nice to add an is_yes
helper method to convert to a boolean though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or you could have Ingredient::memory_usage return whether or not the heap size was accounted for,
I'd prefer this approach as it's more flexible (if I understand it correctly). It leaves the decision of what to do if heap size isn't implemented to the caller. ty includes the memory analysis in production builts where panicking isn't an option, but we probably want to log a message if a heap size implementation is missing instead. This isn't possible with PanicIfHeapSizeMissing
but is possible if the function returns Option
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think @ibraheemdev meant to expose that, just as an internal implementation detail.
If we do expose that, how do we? Do we change IngredientInfo::size_of_fields
to Option<usize>
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I had in mind but I see the API works slightly different than I thought. My assumption was that it returns an Iteratror
over all memos without doing any aggregation but MemoInfo
does a first level of aggregation. @ibraheemdev is this something we could change (return a list of SlotInfo
instead).
I also think this requires splitting size_of_fields
into two:
fields_stack_size
which returnsusize
fields_heap_size
which returnsOption<usize>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the memo aggregation a problem here? If we split size_of_fields
I think that should be sufficient to detect whether or not the heap size was tracked for a given memo type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the memo aggregation a problem here?
The API technically allows for one memo to return Some
but None
for another (both of the same ingredient) which is a bit awkward to handle. This shouldn't be possible but it isn't something we can express easily in the current API.
9d0acfb
to
dc4d5c7
Compare
I added tests, but I can't make sense of the numbers. |
tests/memory-usage.rs
Outdated
let input3 = MyInput::new(&db, 3); | ||
let input1 = MyInput::new(&db, String::with_capacity(100)); | ||
let input2 = MyInput::new(&db, String::with_capacity(100)); | ||
let input3 = MyInput::new(&db, String::with_capacity(100)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should make the fields different to ensure they aren't all interned to the same ID (the interned/tracked structs store the field not the entire input).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay I did that, but I still can't make sense of the numbers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which part doesn't make sense? It looks fine to me. The reason for the metadata size increase is alignment (previously the u32
field could fit into the padding).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it 522? It should be 400, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
capacity
can be larger than the String::length
. You Can use shrink_to_fit
to shrink the String
to its length which should get you closer to the expected value
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The capacity is actually exact here, it just also includes the stack size of the strings: 250+150+50+(24*3) = 522
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, got it.
1. Include an option `panic_if_missing` that will panic if there is an ingredient with no `heap_size()` defined, to ensure coverage. 2. Add `heap_size()` to tracked structs, interneds an inputs.
dc4d5c7
to
d0ceeb4
Compare
panic_if_missing
that will panic if there is an ingredient with noheap_size()
defined, to ensure coverage.heap_size()
to tracked structs, interneds an inputs.