Skip to content

[WIP] Add include_for_find support via 'expand' #877

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

NickLaMuro
Copy link
Member

@NickLaMuro NickLaMuro commented Jul 28, 2020

Note: Requires #874 to be fully effective.

Allows for opt in of :include_for_find option for Rbac::Filterer via the expand request parameter.

Include for find will effectively add a .include to the base query done in collection_search, which in cases like this:

/api/vms?expand=resources,hardware&attributes=num_cpu,name

Can avoid an N+1 on the associated resource (in the above case, hardware).

Example response

For the above query

{
    "actions": [
        {
            "href": "http://localhost:3000/api/vms",
            "method": "post",
            "name": "query"
        },
        {
            "href": "http://localhost:3000/api/vms",
            "method": "post",
            "name": "edit"
        },
        {
            "href": "http://localhost:3000/api/vms",
            "method": "post",
            "name": "add_lifecycle_event"
        },
				...
    ],
    "count": 123,
    "links": {
        "first": "http://localhost:3000/api/vms?expand=resources,hardware&attributes=num_cpu,name&offset=0",
        "last": "http://localhost:3000/api/vms?expand=resources,hardware&attributes=num_cpu,name&offset=0",
        "self": "http://localhost:3000/api/vms?expand=resources,hardware&attributes=num_cpu,name&offset=0"
    },
    "name": "vms",
    "pages": 1,
    "resources": [
        {
            "href": "http://localhost:3000/api/vms/101",
            "id": "101",
            "name": "cluster-vm-us-east-101",
            "num_cpu": 16
        },
        {
            "href": "http://localhost:3000/api/vms/103",
            "id": "102",
            "name": "cluster-vm-us-east-102",
            "num_cpu": 8
        },
        {
            "href": "http://localhost:3000/api/vms/103",
            "id": "103",
            "name": "cluster-vm-us-east-103",
            "num_cpu": 16
        },
        ...
    ],
    "subcount": 123
}

Performance Results

Not, that to gain any performance benefit from this PR using the /api/vm endpoint, #874 must be included as well.

Before

ms queries query (ms) rows
4241 1118 527.5 3147
2192 1116 460.4 1665
2132 1116 512.0 1665
2463 1116 469.7 1665
2592 1116 549.0 1665

After #874 only (not this PR)

ms queries query (ms) rows
2627 565 272.7 2594
925 563 171.4 1112
1078 563 217.9 1112
886 563 160.5 1112
1156 563 248.3 1112

After #874 + #877

ms queries query (ms) rows
1738 12 35 2041
260 10 9.8 559
262 10 9.2 559
257 10 9.6 559
259 10 10.0 559

Links

@@ -185,6 +185,7 @@ def collection_filterer(res, type, klass, is_subcollection = false)
options[:filter] = miq_expression if miq_expression
options[:offset] = params['offset'] if params['offset']
options[:limit] = params['limit'] if params['limit']
options[:include_for_find] = determine_include_for_find(klass)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😍

@Fryguy
Copy link
Member

Fryguy commented Jul 29, 2020

Nice! Do you have some performance comparisons?

@Fryguy Fryguy self-assigned this Jul 29, 2020
@jrafanie
Copy link
Member

Nice! Do you have some performance comparisons?

@NickLaMuro do we have to opt-in or something? I don't see improvements with applying this change on this request:

Before:
bundle exec miqperf benchmark -ac 5 "/api/vms?expand=resources&attributes=num_cpu,name"

/api/vms
|   ms | queries | query (ms) | rows |
| ---: |    ---: |       ---: | ---: |
| 2054 |    1116 |     1011.9 | 1665 |
| 2299 |    1116 |     1092.3 | 1665 |
| 2031 |    1116 |      987.9 | 1665 |
| 1952 |    1116 |      939.9 | 1665 |
| 1934 |    1116 |      943.0 | 1665 |

After:
(add expand=hardware)

bundle exec miqperf benchmark -ac 5 "/api/vms?expand=resources,hardware&attributes=num_cpu,name"

|   ms | queries | query (ms) | rows |
| ---: |    ---: |       ---: | ---: |
| 3798 |    1118 |     1092.4 | 3112 |
| 2091 |    1116 |     1019.4 | 1665 |
| 2332 |    1116 |     1090.9 | 1665 |
| 2054 |    1116 |      972.1 | 1665 |
| 1973 |    1116 |      933.7 | 1665 |
| 2082 |    1116 |      990.3 | 1665 |
| 2092 |    1116 |      973.7 | 1665 |
| 2089 |    1116 |      978.1 | 1665 |
| 2320 |    1116 |     1076.9 | 1665 |
| 2098 |    1116 |      984.0 | 1665 |

@abellotti
Copy link
Member

@jrafanie you won't see an improvement with that request (attributes=...) as this enhancement is for expanding subcollections. i.e. expand=... see method used https://github.com/ManageIQ/manageiq-api/pull/877/files#diff-8caecf6616f6fd1e2602842ad7dd51f0R108. Once this is expanded to also include virtual attributes and associations (via attributes), then you'd see the improvement with your query.

@NickLaMuro
Copy link
Member Author

NickLaMuro commented Jul 29, 2020

@abellotti no, I don't think that is correct.

The N+1 was happening because num_cpu is a virtual_delegate on hardware, and so it has to fetch it individually for each VM record when it gets here:

https://github.com/ManageIQ/manageiq-api/blob/master/app/controllers/api/base_controller/renderer.rb#L214

However, I think the issue is that @jrafanie is seeing I was probably testing locally with this patch also applied:

#874

So I think they were working together to get a better result overall, and without it, the N+1 causing a refetch of the VMs obliterates any .includes that might have been included early on in the .collection_search method.

This report output is a single run where I was modifying a few things in place:

$ bundle exec miqperf report --last
/api/vms
|   ms | queries | query (ms) | rows |
| ---: |    ---: |       ---: | ---: |
| 2453 |     565 |      193.8 | 2594 | <-
| 2852 |     566 |      208.6 | 2595 |   |-- Before
| 2930 |     566 |      240.7 | 2595 | <-
| 2290 |      13 |         79 | 2042 | <---- After

(ignore the ms in the above, since it had to do a reload on some code)

And the last one, I tweaked the .reject from being a != (incorrect) to == (correct), and that caused the queries to drop down to what they should be.


I will mark this PR as [WIP] while I double check my theory, and also to wait for #874 to be merge so I can add tests without conflicts

@abellotti
Copy link
Member

abellotti commented Jul 29, 2020

GET /api/vms/:id?expand=hardware does not return hardware. (hardware is not a subcollection)
GET /api/vms/:id?attributes=hardware will return hardware

If the intent is to know what to prefetch for a virtual attributes specified in attributes, looking at expand does not seem right as callers will not be specifying those via expand (only for well defined subcollections). Is there a way to find what to preload for a virtual attribute specified via attributes ?

@NickLaMuro NickLaMuro changed the title Add include_for_find support via 'expand' [WIP] Add include_for_find support via 'expand' Jul 29, 2020
@miq-bot miq-bot added the wip label Jul 29, 2020
@NickLaMuro
Copy link
Member Author

@abellotti Thanks, I figured you would have had some issue with how I was designing this 😄

If the intent is to know what to prefetch for a virtual attributes specified in attributes

Not really, or at least it wasn't my intent, though it could be.

The idea was more that this was overloading expand to allow for an opt-in for an .includes, without resorting to adding a new URL param (maybe called "includes=" or something similar). However, I could be convinced that it makes more sense to do that to avoid muddying the use case of expand= like I am doing here.

@NickLaMuro
Copy link
Member Author

NickLaMuro commented Jul 29, 2020

Is there a way to find what to preload for a virtual attribute specified via attributes ?

I can investigate what this would take more today, though, it is significantly harder to do with virtual_attributes than just a simple hardware.cpu_sockets, where the table name is right there when we need it. I don't know how one might do introspection on a virtual_attribute off hand, or if it is even possible.

This was the explicit approach, where a user has to opt in to do it, where the later would be more complex, but would be automatically applied.

@NickLaMuro
Copy link
Member Author

Nice! Do you have some performance comparisons?

@Fryguy Yeah, sorry, I mentioned some when responding to @jrafanie here:

#877 (comment)

But I will throw some more formal ones together and put them in the PR description

@NickLaMuro
Copy link
Member Author

NickLaMuro commented Jul 29, 2020

As LJ showed, without my patch from #874 in place, I got the following for performance numbers:

$ bundle exec miqperf benchmark -ac 5 "/api/vms?expand=resources,hardware&attributes=num_cpu,name"
D, [2020-07-29T15:16:48.277412 #13338] DEBUG -- : --> logging in...
D, [2020-07-29T15:16:54.638950 #13338] DEBUG -- : --> making GET request: /api/vms?expand=resources,hardware&attributes=num_cpu,name
D, [2020-07-29T15:16:59.451455 #13338] DEBUG -- : --> making GET request: /api/vms?expand=resources,hardware&attributes=num_cpu,name
D, [2020-07-29T15:17:02.135731 #13338] DEBUG -- : --> making GET request: /api/vms?expand=resources,hardware&attributes=num_cpu,name
D, [2020-07-29T15:17:04.765458 #13338] DEBUG -- : --> making GET request: /api/vms?expand=resources,hardware&attributes=num_cpu,name
D, [2020-07-29T15:17:07.750053 #13338] DEBUG -- : --> making GET request: /api/vms?expand=resources,hardware&attributes=num_cpu,name
$ bundle exec miqperf report --last
/api/vms
|   ms | queries | query (ms) | rows |
| ---: |    ---: |       ---: | ---: |
| 4241 |    1118 |      527.5 | 3147 |
| 2192 |    1116 |      460.4 | 1665 |
| 2132 |    1116 |      512.0 | 1665 |
| 2463 |    1116 |      469.7 | 1665 |
| 2592 |    1116 |      549.0 | 1665 |

However, after applying the patch manually to that branch in the API:

$ git apply <(curl -L https://github.com/ManageIQ/manageiq-api/pull/874.patch)
$ git apply <(curl -L https://github.com/ManageIQ/manageiq-api/pull/877.patch)

I got the numbers I expected:

$ bundle exec miqperf benchmark -ac 5 "/api/vms?expand=resources,hardware&attributes=num_cpu,name"
D, [2020-07-29T15:18:04.040054 #13731] DEBUG -- : --> logging in...
D, [2020-07-29T15:18:09.667021 #13731] DEBUG -- : --> making GET request: /api/vms?expand=resources,hardware&attributes=num_cpu,name
D, [2020-07-29T15:18:11.642624 #13731] DEBUG -- : --> making GET request: /api/vms?expand=resources,hardware&attributes=num_cpu,name
D, [2020-07-29T15:18:12.000060 #13731] DEBUG -- : --> making GET request: /api/vms?expand=resources,hardware&attributes=num_cpu,name
D, [2020-07-29T15:18:12.357015 #13731] DEBUG -- : --> making GET request: /api/vms?expand=resources,hardware&attributes=num_cpu,name
D, [2020-07-29T15:18:12.777287 #13731] DEBUG -- : --> making GET request: /api/vms?expand=resources,hardware&attributes=num_cpu,name
$ bundle exec miqperf report --last
/api/vms
|   ms | queries | query (ms) | rows |
| ---: |    ---: |       ---: | ---: |
| 1738 |      12 |         35 | 2041 |
|  260 |      10 |        9.8 |  559 |
|  262 |      10 |        9.2 |  559 |
|  257 |      10 |        9.6 |  559 |
|  259 |      10 |       10.0 |  559 |

@NickLaMuro NickLaMuro force-pushed the include-for-find-support branch from 98e33df to ddd28eb Compare July 30, 2020 23:54
@jrafanie
Copy link
Member

I've confirmed the patches on jansa improve the results as @NickLaMuro reported...

$ git apply <(curl -L https://github.com/ManageIQ/manageiq-api/pull/874.patch)
$ git apply <(curl -L https://github.com/ManageIQ/manageiq-api/pull/877.patch)

@NickLaMuro
Copy link
Member Author

FYI: I am working on getting specs put together for this patch, but there was also some concerns I raised about .includes being added in a way that could bypass Rbac, so most likely the "breaks" will be pumped on this PR to get that vetted before pushing forward on this one.

@NickLaMuro NickLaMuro force-pushed the include-for-find-support branch 2 times, most recently from d6e3f60 to d2270d4 Compare August 5, 2020 16:58
@gtanzillo gtanzillo mentioned this pull request Aug 5, 2020
8 tasks
@jrafanie
Copy link
Member

jrafanie commented Aug 7, 2020

@NickLaMuro tests are failing, can you take a look?

@jrafanie
Copy link
Member

jrafanie commented Aug 7, 2020

Also, is this still WIP now that #874 is merged?

@NickLaMuro
Copy link
Member Author

tests are failing, can you take a look?

Yes I can. I was looking into some other potential improvements all of yesterday, so I will give this a look today and finish anything that needs to be finished up.

@NickLaMuro
Copy link
Member Author

@jrafanie Okay, there are a few things here:

tests are failing, can you take a look?

So the test that is failing is the one that I wrote, and I am not sure if it is worth keeping now that it already isn't working.

Basically, I am testing the number of queries executed is what I expect. It is possible we could test in a different way, where we are only matching the SELECTS that we want test instead of what I was doing, which was ignoring the first N queries that generally on all requests (see :base_query_count in the new spec). But if we keep the existing methodology, I think the specs might be pretty brittle and cause failures like this more often than is valuable (though, it failing when you ADD query ).

However, before even answering that question...

Also, is this still WIP now that #874 is merged?

I forgot there was another reason for the WIP flag, and that was I wanted to have a larger discussion of whether or not this is the right architectural approach to take for this, or at least the one we want to take. I think somehow using :include_for_find is fine, but how the user interacts with that feature of RBac via expand I would argue is up for debate, which I have already talked about here:

#877 (comment)

I don't think that has been answered since though, so I think this is still WIP.

@abellotti
Copy link
Member

abellotti commented Aug 7, 2020

Re: #877 (comment) the concern being:

Since we're overloading the meaning of expand, i.e. expand (and return) a subcollection and now prefetch association, the problem that would emerge would be what if we have a subcollection named same as the association. The user specifying expand= would get the perf. boost, but will also get the subcollection returned (not what we want).

While we can go through all currently exposed subcollections (via OPTIONS /api/:collection) and verify that this is not currently a valid problem scenario for now, no guarantee that won't bite us later. Hopefully there's another option we can contemplate.

@NickLaMuro
Copy link
Member Author

NickLaMuro commented Aug 7, 2020

@abellotti

Hopefully there's another option we can contemplate.

Definitely, as I mentioned before:

without resorting to adding a new URL param (maybe called "includes=" or something similar)

Doing that is definitely another option, but this probably was meant to more be a [POC] PR (which I probably should have labeled it as) and show that doing :includes_for_find was possible to reduce some API call overhead.

I would be happy to look into doing just that in a different branch, assuming making such an addition is something you agree with, because I will already be re-working this PR again as a result, and I don't fancy doing it a third if you didn't like the proposed concept from the beginning.

Edit: "proposed concept" being adding a new URL param includes= to implement the functionality found in this PR.

@abellotti
Copy link
Member

My preference is of course us being able to auto-determine classes/relationships of virtual attributes to fetch, but short of that, I won't object to a new parameter includes= as our performance knob. @Fryguy @jrafanie thoughts ?

@NickLaMuro
Copy link
Member Author

NickLaMuro commented Aug 7, 2020

My preference is of course us being able to auto-determine classes/relationships of virtual attributes to fetch

Oh crap, I forgot you had mentioned that previously:

Is there a way to find what to preload for a virtual attribute specified via attributes ?

My bad.

I am fine with either approach, I just tend to favor "performance knobs" since there is less of a chance it is breaking existing functionality or user queries in doing so, and if they need to be backported, they can be (safely).

However, I am fine moving forward with an implicit approach as well.

Allows for opt in of `:include_for_find` option for `Rbac::Filterer` via
the `expand` request parameter.

Include for find will effectively add a `.include` to the base query
done in `collection_search`, which in cases like this:

    /api/vms?expand=resources,hardware&attributes=num_cpu,name

Can avoid an N+1 on the associated resource.
@NickLaMuro NickLaMuro force-pushed the include-for-find-support branch from d2270d4 to 886dd97 Compare August 8, 2020 01:32
@miq-bot
Copy link
Member

miq-bot commented Aug 8, 2020

Checked commit NickLaMuro@886dd97 with ruby 2.5.7, rubocop 0.69.0, haml-lint 0.28.0, and yamllint
5 files checked, 6 offenses detected

app/controllers/api/base_controller/parameters.rb

  • ❗ - Line 109, Col 21 - Layout/BlockAlignment - end at 109, 20 is not aligned with relations = @req.derived_include_for_find.select do |relation| at 107, 8.

spec/support/shared_examples/expand_resources_includes.rb

@Fryguy
Copy link
Member

Fryguy commented Aug 8, 2020

I admit not reading the whole thread, so apologies if I'm repeating, but offhand, there's no way an end user will know that adding hardware will speed up num_cpu, so that doesn't feel like a good interface.

This is what :uses is for. Under the covers, we can ask what num_cpu uses, and then tack that into the includes chain, or even better just include num_cpu directly, which in turn will include the uses clause. Either way we can know what to include without putting the onus on the end user.

I am generally against more knobs because that's just increases complexity for the end user. So I'd be against adding anything to the interface if we can auto-determine things. If we can't, then I'd be ok with it, but I'd rather be sure we really need it.

@jrafanie
Copy link
Member

yeah, I agree with @abellotti and @Fryguy in that if we correctly specify the uses clause, it would be best if the code makes "use" of it to optimize virtual columns. If that's not easy and we wanted a smaller change that's backportable, then, we can entertain other ideas but I don't see value in introducing an interface change via a new attribute such as includes if we have the possibility of doing this automatically and without user's mistakenly including the wrong associations.

@NickLaMuro
Copy link
Member Author

Opened up #887 which I think fits what everyone else would like out of this one, so closing this one.

@NickLaMuro NickLaMuro closed this Aug 31, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

API Performance
5 participants