[WIP] Add include_for_find support via 'expand' #877

NickLaMuro · 2020-07-28T22:42:01Z

Note: Requires #874 to be fully effective.

Allows for opt in of :include_for_find option for Rbac::Filterer via the expand request parameter.

Include for find will effectively add a .include to the base query done in collection_search, which in cases like this:

/api/vms?expand=resources,hardware&attributes=num_cpu,name

Can avoid an N+1 on the associated resource (in the above case, hardware).

Example response

For the above query

{
    "actions": [
        {
            "href": "http://localhost:3000/api/vms",
            "method": "post",
            "name": "query"
        },
        {
            "href": "http://localhost:3000/api/vms",
            "method": "post",
            "name": "edit"
        },
        {
            "href": "http://localhost:3000/api/vms",
            "method": "post",
            "name": "add_lifecycle_event"
        },
				...
    ],
    "count": 123,
    "links": {
        "first": "http://localhost:3000/api/vms?expand=resources,hardware&attributes=num_cpu,name&offset=0",
        "last": "http://localhost:3000/api/vms?expand=resources,hardware&attributes=num_cpu,name&offset=0",
        "self": "http://localhost:3000/api/vms?expand=resources,hardware&attributes=num_cpu,name&offset=0"
    },
    "name": "vms",
    "pages": 1,
    "resources": [
        {
            "href": "http://localhost:3000/api/vms/101",
            "id": "101",
            "name": "cluster-vm-us-east-101",
            "num_cpu": 16
        },
        {
            "href": "http://localhost:3000/api/vms/103",
            "id": "102",
            "name": "cluster-vm-us-east-102",
            "num_cpu": 8
        },
        {
            "href": "http://localhost:3000/api/vms/103",
            "id": "103",
            "name": "cluster-vm-us-east-103",
            "num_cpu": 16
        },
        ...
    ],
    "subcount": 123
}

Performance Results

Not, that to gain any performance benefit from this PR using the /api/vm endpoint, #874 must be included as well.

Before

ms	queries	query (ms)	rows
4241	1118	527.5	3147
2192	1116	460.4	1665
2132	1116	512.0	1665
2463	1116	469.7	1665
2592	1116	549.0	1665

After #874 only (not this PR)

ms	queries	query (ms)	rows
2627	565	272.7	2594
925	563	171.4	1112
1078	563	217.9	1112
886	563	160.5	1112
1156	563	248.3	1112

After #874 + #877

ms	queries	query (ms)	rows
1738	12	35	2041
260	10	9.8	559
262	10	9.2	559
257	10	9.6	559
259	10	10.0	559

Links

Partially fixes API Performance #880

abellotti · 2020-07-29T01:40:53Z

app/controllers/api/base_controller/renderer.rb

@@ -185,6 +185,7 @@ def collection_filterer(res, type, klass, is_subcollection = false)
        options[:filter] = miq_expression if miq_expression
        options[:offset] = params['offset'] if params['offset']
        options[:limit] = params['limit'] if params['limit']
+        options[:include_for_find] = determine_include_for_find(klass)


Fryguy · 2020-07-29T13:09:56Z

Nice! Do you have some performance comparisons?

app/controllers/api/base_controller/parameters.rb

jrafanie · 2020-07-29T16:12:09Z

Nice! Do you have some performance comparisons?

@NickLaMuro do we have to opt-in or something? I don't see improvements with applying this change on this request:

Before:
bundle exec miqperf benchmark -ac 5 "/api/vms?expand=resources&attributes=num_cpu,name"

/api/vms
|   ms | queries | query (ms) | rows |
| ---: |    ---: |       ---: | ---: |
| 2054 |    1116 |     1011.9 | 1665 |
| 2299 |    1116 |     1092.3 | 1665 |
| 2031 |    1116 |      987.9 | 1665 |
| 1952 |    1116 |      939.9 | 1665 |
| 1934 |    1116 |      943.0 | 1665 |

After:
(add expand=hardware)

bundle exec miqperf benchmark -ac 5 "/api/vms?expand=resources,hardware&attributes=num_cpu,name"

|   ms | queries | query (ms) | rows |
| ---: |    ---: |       ---: | ---: |
| 3798 |    1118 |     1092.4 | 3112 |
| 2091 |    1116 |     1019.4 | 1665 |
| 2332 |    1116 |     1090.9 | 1665 |
| 2054 |    1116 |      972.1 | 1665 |
| 1973 |    1116 |      933.7 | 1665 |
| 2082 |    1116 |      990.3 | 1665 |
| 2092 |    1116 |      973.7 | 1665 |
| 2089 |    1116 |      978.1 | 1665 |
| 2320 |    1116 |     1076.9 | 1665 |
| 2098 |    1116 |      984.0 | 1665 |

abellotti · 2020-07-29T16:18:18Z

@jrafanie you won't see an improvement with that request (attributes=...) as this enhancement is for expanding subcollections. i.e. expand=... see method used https://github.com/ManageIQ/manageiq-api/pull/877/files#diff-8caecf6616f6fd1e2602842ad7dd51f0R108. Once this is expanded to also include virtual attributes and associations (via attributes), then you'd see the improvement with your query.

NickLaMuro · 2020-07-29T16:33:05Z

@abellotti no, I don't think that is correct.

The N+1 was happening because num_cpu is a virtual_delegate on hardware, and so it has to fetch it individually for each VM record when it gets here:

https://github.com/ManageIQ/manageiq-api/blob/master/app/controllers/api/base_controller/renderer.rb#L214

However, I think the issue is that @jrafanie is seeing I was probably testing locally with this patch also applied:

#874

So I think they were working together to get a better result overall, and without it, the N+1 causing a refetch of the VMs obliterates any .includes that might have been included early on in the .collection_search method.

This report output is a single run where I was modifying a few things in place:

$ bundle exec miqperf report --last
/api/vms
|   ms | queries | query (ms) | rows |
| ---: |    ---: |       ---: | ---: |
| 2453 |     565 |      193.8 | 2594 | <-
| 2852 |     566 |      208.6 | 2595 |   |-- Before
| 2930 |     566 |      240.7 | 2595 | <-
| 2290 |      13 |         79 | 2042 | <---- After

(ignore the ms in the above, since it had to do a reload on some code)

And the last one, I tweaked the .reject from being a != (incorrect) to == (correct), and that caused the queries to drop down to what they should be.

I will mark this PR as [WIP] while I double check my theory, and also to wait for #874 to be merge so I can add tests without conflicts

abellotti · 2020-07-29T16:42:29Z

GET /api/vms/:id?expand=hardware does not return hardware. (hardware is not a subcollection)
GET /api/vms/:id?attributes=hardware will return hardware

If the intent is to know what to prefetch for a virtual attributes specified in attributes, looking at expand does not seem right as callers will not be specifying those via expand (only for well defined subcollections). Is there a way to find what to preload for a virtual attribute specified via attributes ?

NickLaMuro · 2020-07-29T16:51:57Z

@abellotti Thanks, I figured you would have had some issue with how I was designing this 😄

If the intent is to know what to prefetch for a virtual attributes specified in attributes

Not really, or at least it wasn't my intent, though it could be.

The idea was more that this was overloading expand to allow for an opt-in for an .includes, without resorting to adding a new URL param (maybe called "includes=" or something similar). However, I could be convinced that it makes more sense to do that to avoid muddying the use case of expand= like I am doing here.

NickLaMuro · 2020-07-29T16:54:18Z

Is there a way to find what to preload for a virtual attribute specified via attributes ?

I can investigate what this would take more today, though, it is significantly harder to do with virtual_attributes than just a simple hardware.cpu_sockets, where the table name is right there when we need it. I don't know how one might do introspection on a virtual_attribute off hand, or if it is even possible.

This was the explicit approach, where a user has to opt in to do it, where the later would be more complex, but would be automatically applied.

NickLaMuro · 2020-07-29T16:56:21Z

Nice! Do you have some performance comparisons?

@Fryguy Yeah, sorry, I mentioned some when responding to @jrafanie here:

#877 (comment)

But I will throw some more formal ones together and put them in the PR description

NickLaMuro · 2020-07-29T20:37:40Z

As LJ showed, without my patch from #874 in place, I got the following for performance numbers:

$ bundle exec miqperf benchmark -ac 5 "/api/vms?expand=resources,hardware&attributes=num_cpu,name"
D, [2020-07-29T15:16:48.277412 #13338] DEBUG -- : --> logging in...
D, [2020-07-29T15:16:54.638950 #13338] DEBUG -- : --> making GET request: /api/vms?expand=resources,hardware&attributes=num_cpu,name
D, [2020-07-29T15:16:59.451455 #13338] DEBUG -- : --> making GET request: /api/vms?expand=resources,hardware&attributes=num_cpu,name
D, [2020-07-29T15:17:02.135731 #13338] DEBUG -- : --> making GET request: /api/vms?expand=resources,hardware&attributes=num_cpu,name
D, [2020-07-29T15:17:04.765458 #13338] DEBUG -- : --> making GET request: /api/vms?expand=resources,hardware&attributes=num_cpu,name
D, [2020-07-29T15:17:07.750053 #13338] DEBUG -- : --> making GET request: /api/vms?expand=resources,hardware&attributes=num_cpu,name
$ bundle exec miqperf report --last
/api/vms
|   ms | queries | query (ms) | rows |
| ---: |    ---: |       ---: | ---: |
| 4241 |    1118 |      527.5 | 3147 |
| 2192 |    1116 |      460.4 | 1665 |
| 2132 |    1116 |      512.0 | 1665 |
| 2463 |    1116 |      469.7 | 1665 |
| 2592 |    1116 |      549.0 | 1665 |

However, after applying the patch manually to that branch in the API:

$ git apply <(curl -L https://github.com/ManageIQ/manageiq-api/pull/874.patch)
$ git apply <(curl -L https://github.com/ManageIQ/manageiq-api/pull/877.patch)

I got the numbers I expected:

$ bundle exec miqperf benchmark -ac 5 "/api/vms?expand=resources,hardware&attributes=num_cpu,name"
D, [2020-07-29T15:18:04.040054 #13731] DEBUG -- : --> logging in...
D, [2020-07-29T15:18:09.667021 #13731] DEBUG -- : --> making GET request: /api/vms?expand=resources,hardware&attributes=num_cpu,name
D, [2020-07-29T15:18:11.642624 #13731] DEBUG -- : --> making GET request: /api/vms?expand=resources,hardware&attributes=num_cpu,name
D, [2020-07-29T15:18:12.000060 #13731] DEBUG -- : --> making GET request: /api/vms?expand=resources,hardware&attributes=num_cpu,name
D, [2020-07-29T15:18:12.357015 #13731] DEBUG -- : --> making GET request: /api/vms?expand=resources,hardware&attributes=num_cpu,name
D, [2020-07-29T15:18:12.777287 #13731] DEBUG -- : --> making GET request: /api/vms?expand=resources,hardware&attributes=num_cpu,name
$ bundle exec miqperf report --last
/api/vms
|   ms | queries | query (ms) | rows |
| ---: |    ---: |       ---: | ---: |
| 1738 |      12 |         35 | 2041 |
|  260 |      10 |        9.8 |  559 |
|  262 |      10 |        9.2 |  559 |
|  257 |      10 |        9.6 |  559 |
|  259 |      10 |       10.0 |  559 |

jrafanie · 2020-07-31T15:57:25Z

I've confirmed the patches on jansa improve the results as @NickLaMuro reported...

$ git apply <(curl -L https://github.com/ManageIQ/manageiq-api/pull/874.patch)
$ git apply <(curl -L https://github.com/ManageIQ/manageiq-api/pull/877.patch)

NickLaMuro · 2020-08-04T15:09:55Z

FYI: I am working on getting specs put together for this patch, but there was also some concerns I raised about .includes being added in a way that could bypass Rbac, so most likely the "breaks" will be pumped on this PR to get that vetted before pushing forward on this one.

jrafanie · 2020-08-07T15:38:30Z

@NickLaMuro tests are failing, can you take a look?

jrafanie · 2020-08-07T15:38:50Z

Also, is this still WIP now that #874 is merged?

NickLaMuro · 2020-08-07T16:10:26Z

tests are failing, can you take a look?

Yes I can. I was looking into some other potential improvements all of yesterday, so I will give this a look today and finish anything that needs to be finished up.

NickLaMuro · 2020-08-07T17:31:15Z

@jrafanie Okay, there are a few things here:

tests are failing, can you take a look?

So the test that is failing is the one that I wrote, and I am not sure if it is worth keeping now that it already isn't working.

Basically, I am testing the number of queries executed is what I expect. It is possible we could test in a different way, where we are only matching the SELECTS that we want test instead of what I was doing, which was ignoring the first N queries that generally on all requests (see :base_query_count in the new spec). But if we keep the existing methodology, I think the specs might be pretty brittle and cause failures like this more often than is valuable (though, it failing when you ADD query ).

However, before even answering that question...

Also, is this still WIP now that #874 is merged?

I forgot there was another reason for the WIP flag, and that was I wanted to have a larger discussion of whether or not this is the right architectural approach to take for this, or at least the one we want to take. I think somehow using :include_for_find is fine, but how the user interacts with that feature of RBac via expand I would argue is up for debate, which I have already talked about here:

#877 (comment)

I don't think that has been answered since though, so I think this is still WIP.

abellotti · 2020-08-07T18:18:41Z

Re: #877 (comment) the concern being:

Since we're overloading the meaning of expand, i.e. expand (and return) a subcollection and now prefetch association, the problem that would emerge would be what if we have a subcollection named same as the association. The user specifying expand= would get the perf. boost, but will also get the subcollection returned (not what we want).

While we can go through all currently exposed subcollections (via OPTIONS /api/:collection) and verify that this is not currently a valid problem scenario for now, no guarantee that won't bite us later. Hopefully there's another option we can contemplate.

NickLaMuro · 2020-08-07T19:19:34Z

@abellotti

Hopefully there's another option we can contemplate.

Definitely, as I mentioned before:

without resorting to adding a new URL param (maybe called "includes=" or something similar)

Doing that is definitely another option, but this probably was meant to more be a [POC] PR (which I probably should have labeled it as) and show that doing :includes_for_find was possible to reduce some API call overhead.

I would be happy to look into doing just that in a different branch, assuming making such an addition is something you agree with, because I will already be re-working this PR again as a result, and I don't fancy doing it a third if you didn't like the proposed concept from the beginning.

Edit: "proposed concept" being adding a new URL param includes= to implement the functionality found in this PR.

abellotti · 2020-08-07T19:37:39Z

My preference is of course us being able to auto-determine classes/relationships of virtual attributes to fetch, but short of that, I won't object to a new parameter includes= as our performance knob. @Fryguy @jrafanie thoughts ?

NickLaMuro · 2020-08-07T19:42:45Z

My preference is of course us being able to auto-determine classes/relationships of virtual attributes to fetch

Oh crap, I forgot you had mentioned that previously:

Is there a way to find what to preload for a virtual attribute specified via attributes ?

[ref]

My bad.

I am fine with either approach, I just tend to favor "performance knobs" since there is less of a chance it is breaking existing functionality or user queries in doing so, and if they need to be backported, they can be (safely).

However, I am fine moving forward with an implicit approach as well.

Allows for opt in of `:include_for_find` option for `Rbac::Filterer` via the `expand` request parameter. Include for find will effectively add a `.include` to the base query done in `collection_search`, which in cases like this: /api/vms?expand=resources,hardware&attributes=num_cpu,name Can avoid an N+1 on the associated resource.

miq-bot · 2020-08-08T01:36:28Z

Checked commit NickLaMuro@886dd97 with ruby 2.5.7, rubocop 0.69.0, haml-lint 0.28.0, and yamllint
5 files checked, 6 offenses detected

app/controllers/api/base_controller/parameters.rb

❗ - Line 109, Col 21 - Layout/BlockAlignment - end at 109, 20 is not aligned with relations = @req.derived_include_for_find.select do |relation| at 107, 8.

spec/support/shared_examples/expand_resources_includes.rb

❗ - Line 17, Col 50 - Layout/ExtraSpacing - Unnecessary spacing detected.
❗ - Line 17, Col 52 - Layout/SpaceAroundOperators - Operator ? should be surrounded by a single space.
❗ - Line 17, Col 64 - Layout/ExtraSpacing - Unnecessary spacing detected.
❗ - Line 17, Col 66 - Layout/SpaceAroundOperators - Operator : should be surrounded by a single space.
❗ - Line 21, Col 14 - Style/BlockDelimiters - Avoid using {...} for multi-line blocks.

Fryguy · 2020-08-08T16:22:56Z

I admit not reading the whole thread, so apologies if I'm repeating, but offhand, there's no way an end user will know that adding hardware will speed up num_cpu, so that doesn't feel like a good interface.

This is what :uses is for. Under the covers, we can ask what num_cpu uses, and then tack that into the includes chain, or even better just include num_cpu directly, which in turn will include the uses clause. Either way we can know what to include without putting the onus on the end user.

I am generally against more knobs because that's just increases complexity for the end user. So I'd be against adding anything to the interface if we can auto-determine things. If we can't, then I'd be ok with it, but I'd rather be sure we really need it.

jrafanie · 2020-08-10T14:40:32Z

yeah, I agree with @abellotti and @Fryguy in that if we correctly specify the uses clause, it would be best if the code makes "use" of it to optimize virtual columns. If that's not easy and we wanted a smaller change that's backportable, then, we can entertain other ideas but I don't see value in introducing an interface change via a new attribute such as includes if we have the possibility of doing this automatically and without user's mistakenly including the wrong associations.

NickLaMuro · 2020-08-31T14:58:03Z

Opened up #887 which I think fits what everyone else would like out of this one, so closing this one.

NickLaMuro requested review from abellotti and bdunne as code owners July 28, 2020 22:42

abellotti reviewed Jul 29, 2020

View reviewed changes

Fryguy added the performance label Jul 29, 2020

Fryguy approved these changes Jul 29, 2020

View reviewed changes

Fryguy added the jansa/yes? label Jul 29, 2020

Fryguy self-assigned this Jul 29, 2020

Fryguy reviewed Jul 29, 2020

View reviewed changes

app/controllers/api/base_controller/parameters.rb Outdated Show resolved Hide resolved

NickLaMuro changed the title ~~Add include_for_find support via 'expand'~~ [WIP] Add include_for_find support via 'expand' Jul 29, 2020

miq-bot added the wip label Jul 29, 2020

NickLaMuro force-pushed the include-for-find-support branch from 98e33df to ddd28eb Compare July 30, 2020 23:54

NickLaMuro force-pushed the include-for-find-support branch 2 times, most recently from d6e3f60 to d2270d4 Compare August 5, 2020 16:58

gtanzillo mentioned this pull request Aug 5, 2020

API Performance #880

Closed

8 tasks

NickLaMuro force-pushed the include-for-find-support branch from d2270d4 to 886dd97 Compare August 8, 2020 01:32

NickLaMuro mentioned this pull request Aug 29, 2020

AR VirtualAttribute queries + introspection #887

Merged

3 tasks

NickLaMuro closed this Aug 31, 2020

[WIP] Add include_for_find support via 'expand' #877

[WIP] Add include_for_find support via 'expand' #877

Uh oh!

Conversation

NickLaMuro commented Jul 28, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Example response

Performance Results

Before

After #874 only (not this PR)

After #874 + #877

Links

Uh oh!

abellotti Jul 29, 2020

Choose a reason for hiding this comment

Uh oh!

Fryguy commented Jul 29, 2020

Uh oh!

Uh oh!

jrafanie commented Jul 29, 2020

Uh oh!

abellotti commented Jul 29, 2020

Uh oh!

NickLaMuro commented Jul 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

abellotti commented Jul 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NickLaMuro commented Jul 29, 2020

Uh oh!

NickLaMuro commented Jul 29, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NickLaMuro commented Jul 29, 2020

Uh oh!

NickLaMuro commented Jul 29, 2020 • edited by jrafanie Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jrafanie commented Jul 31, 2020

Uh oh!

NickLaMuro commented Aug 4, 2020

Uh oh!

jrafanie commented Aug 7, 2020

Uh oh!

jrafanie commented Aug 7, 2020

Uh oh!

NickLaMuro commented Aug 7, 2020

Uh oh!

NickLaMuro commented Aug 7, 2020

Uh oh!

abellotti commented Aug 7, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NickLaMuro commented Aug 7, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

abellotti commented Aug 7, 2020

Uh oh!

NickLaMuro commented Aug 7, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

miq-bot commented Aug 8, 2020

Uh oh!

Fryguy commented Aug 8, 2020 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jrafanie commented Aug 10, 2020

Uh oh!

NickLaMuro commented Aug 31, 2020

Uh oh!

Uh oh!

NickLaMuro commented Jul 28, 2020 •

edited

Loading

NickLaMuro commented Jul 29, 2020 •

edited

Loading

abellotti commented Jul 29, 2020 •

edited

Loading

NickLaMuro commented Jul 29, 2020 •

edited

Loading

NickLaMuro commented Jul 29, 2020 •

edited by jrafanie

Loading

abellotti commented Aug 7, 2020 •

edited

Loading

NickLaMuro commented Aug 7, 2020 •

edited

Loading

NickLaMuro commented Aug 7, 2020 •

edited

Loading

Fryguy commented Aug 8, 2020 •

edited

Loading