Skip to content

Fix SCT-3123 - 20 second chrome freeze#92

Draft
eapearson wants to merge 65 commits intomainfrom
fix-SCT-3123-20-second-chrome-freeze
Draft

Fix SCT-3123 - 20 second chrome freeze#92
eapearson wants to merge 65 commits intomainfrom
fix-SCT-3123-20-second-chrome-freeze

Conversation

@eapearson
Copy link
Contributor

@eapearson eapearson commented Sep 21, 2021

Fix PTV-3123: Navigator can “freeze” for 20 seconds when first opening

As reported in https://kbase-jira.atlassian.net/browse/SCT-3123, the Navigator can cause the Chrome browser to freeze for about 20 seconds when first loading.

Please read the ticket for analysis of the problem. The gist of it though is that Chrome will lock it's per-url (? ish) cache when there are 6 or more active connections for it. The cache lock will cause subsequent (or even existing?) connections to hang for 20 seconds - after 20 seconds something happens the locks are released. My task was not to fully understand this (I could speculate, e.g., that inactive keep-alive connections will be closed to unblock the cache, but I don't really know.)

Although the problem resolved itself after a period of time for the affected (internal) user, it did point to an existing issue with the Navigator - that it spawns 3 concurrent requests to auth. In production, where the problem was reported, CORS is invoked which causes each of the 3 requests to have a prefetch (HEAD) request. Now it may be a coincidence that this makes 6 requests, but this set of changes originall had the goal of tamping down these three requests to one.

The 3 concurrent connections are unnecessary from a web app point of view. Authentication need, and should, only be done once as the app starts.

So the goal of this set of fixes is to put that design in place.

It happens that, although limited in clock time to a couple of days of work, there are many necessary changes.

Changes

The first task was to determine why there were three auth calls upon startup.
The primary reason was that, as the Navigator started out as a Python app, when the Javascript web app was integrated, it was not done so holistically. Rather one part of the app was loaded into the header, another forming the main body. Both the header and main body handled authentication separately. In addition an internal component conducted its own authentication.

So the general solution was to hoist the authentication calls into a single location at the "top" of the web app, and provided the authentication information to the component tree.

As it happens, this design was also responsible for the primary ui anomaly also noted in the ticket -- that the login component did not show the user's avatar (rather the anonymous silhouette) and an empty, somewhat broken, menu, and the area which should show narratives was simply empty, with no message.

Dependencies in the project were very out of date, so, in stages, I updated them until they were completely current. This erased a great deal of technical debt and resolved a couple of security alerts. It required a modest set of changes for the major version updates.

Architectural Changes

The primary cause of the duplicate auth calls was that there were two basic entrypoints for the web app - the header and the body. Each of them did their own independent auth check.

I attribute this to the original design being Python-centric, but then subsequent development of the javascript front end (in TypeScript) did not entirely replace the Python layout and other markup supplied by one or more templates.

The Python layout was replaced with a React-based layout. This allows a single set of auth and user profile calls to be conducted as the app is loaded. It also allowed a lot of unused code to be removed.

At the same time, some of the async handling was improved by using an async state tracking interface. I've used this a lot in kbase-ui related work and it is very effective. Essentially an AsyncProcessState interface captures the initial, processing, error, and success states in a discrimated union generic type, in which the resulting value is available for the success state, and an error value for the error state.

There was another duplicate auth call invoked by a component, which was resolved by the usage of Context for auth information.

The propagation of auth and user profile information is via a React Context. This allows any component to handle auth state without any special custom coding, just standard Context.

Use react-select rather than custom dropdown for Sort

A custom dropdown existed for selection of narratives sort order. This control only seemed to different from a standard dropdown in that it included the label inside the dropdown. When this control broke during dependency update, I determined the best course of action was to simply use the same react-select control used elsewhere in the codebase.

The main advantage of this is that it offloads some cost for KBase:

  • maintenance for this control
  • testing
  • documentation

It is also consistent with the other dropdowns utilized in this project. The behavior of the custom dropdown was a little odd also. Due to the need to have the label inside the control, the actual dropdowns all had a large empty space on the left hand side, where the label appears in the primary control button.

Since we already depend on the upstream react-select 3rd party control, I don't see any cost to this change.

I've left it up for review whether this change in behavior is acceptable.

Authentication Handling

The behavior exhibited by the bug was to show an empty list of narratives, and the login button having an anonymous-user silhouette icon.

This itself was incorrect. It is the how the Navigator displayed when there was no authentication. In truth, the Navigator does not operate without authentication, so there is no valid display state for it.

Normally, authentication state is resolved very quickly - by detection of a browser cookie and a set of calls to the auth service. The end result is either a rendering of the Navigator if there is authentication, or a redirect to the kbase-ui login page if there isn't.

However, in this case, the auth service calls stalled for 20 seconds, so the assumption of a fast resolution of authentication state was broken.

Thus, this condition exposed the incorrect, unauthenticated view for around 20 seconds. Although fixing the 20 second pause does obviate this issue in a sense (since it would, in theory, always be quick and unnoticable), it does point to a design flaw that should be addressed.

There may be other reasons that the view would be exposed. For instance, the auth service or a network connection may be very slow. Even a flash of the intermediate state is undesirable and can be confusing to the end user.

The general solution is to refactor the app startup process to treat this as an asynchronous process which has the following states: none (initial), processing (loading), successful (authenticated or not), and error (some error encountered, e.g. service down.)

These states should be reflected in the Navigator ui during startup.

With this change, a slow evaluation of the authentication state (or frozen as in this case) will show a loading spinner in place of the Navigator. In the case of an authentication error, an error message is displayed.

A successful authentication query results in one of two outcomes.

if the authentication is not found (e.g. no session cookie, token expired, etc.), the Navigator redirects to the #login endpoint at kbase-ui.

If the authentication is found, the context is set up, and the navigator app is loaded.

The Navigator could be redesigned to operate well without authentication as well, but that would be an overall change in the inention of the design, which I took to be out of scope.

Miscellaneous Quality of Life Fixes

In the process of discovering and fixing these issues, I made several other changes to the development and testing experience I felt was necessary. There was also some reduction code and refactoring, especially to separate and normalize components, which helped with code clarity.

  • search input changed from type "text" to "search": this provides search-friendly features in browsers which support them (e.g. Safari), such as the search input clear button
  • added local development support with kbase-ui and documentation for it. This is important because the behavior of the app is different in local development and in a deployment. For instance:
    • lack of authentication will redirect to kbase-ui login in a deployment, but not in development;
    • a development configuration is utilized in local development mode, but normal deployment configuration can be used with kbase-ui;
  • added support for testing-library - this was required for some tests not supported by Enzyme, but has also become, for some time, the recommended testing library for React (well, recommended by Facebook).
  • Several components which were embedded within other components were broken out into independent files
  • Many, but not all, usages of inline styles replaced by imported stylesheets

Jira Ticket / Issue

Testing Instructions

  • Details for how to test the PR:
  • Tests pass locally and in GitHub Actions
  • Changes available by spinning up a local Navigator and navigating to X to see Y

Dev Checklist:

  • [-] My code follows the guidelines at https://sites.google.com/lbl.gov/trussresources/home?authuser=0
  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • [-] Any dependent changes have been merged and published in downstream modules
  • (JavaScript) I have run Prettier and ESLint on changed code manually or with a git precommit hook
  • [-] (Python) I have run Black and Flake8 on changed Python code manually or with a git precommit hook

Updating Version and Release Notes (if applicable)

…o prevent multiple calls for token; refactor tests appropriately; break some internal components into files to assist in the overall refactor
…t_server script contains other startup logic
…ates; just adds to the confusion in the codebase which led to this bug
…rom FB, CRA) testing library for React; and enzyme doesn't work with react-select.
…dropdown; ours failed with a React update, and it doesn't make sense to create (and have to support) this particular wheel, nor have a dropdown style different than the one already used elsewhere in this codebase.
…compass more of kbase-ui plugins, but never got very far.
…ly behavior on some browsers, e.g. a clear button.
@lgtm-com
Copy link

lgtm-com bot commented Sep 21, 2021

This pull request introduces 5 alerts and fixes 1 when merging 04a1c91 into 33ea5f7 - view on LGTM.com

new alerts:

  • 2 for Potentially inconsistent state update
  • 1 for Useless conditional
  • 1 for Unused variable, import, function or class
  • 1 for Unused or undefined state property

fixed alerts:

  • 1 for Unused variable, import, function or class

@coveralls
Copy link

coveralls commented Sep 22, 2021

Pull Request Test Coverage Report for Build 1343151276

  • 279 of 414 (67.39%) changed or added relevant lines in 35 files are covered.
  • 28 unchanged lines in 7 files lost coverage.
  • Overall coverage decreased (-3.7%) to 51.809%

Changes Missing Coverage Covered Lines Changed/Added Lines %
src/client/utils/AuthService.ts 21 22 95.45%
src/client/utils/NarrativeModel.ts 16 17 94.12%
src/client/components/dashboard/NarrativeList/ControlMenu/LinkOrgItem.tsx 5 7 71.43%
src/client/components/dashboard/NarrativeList/ControlMenu/Model.ts 0 2 0.0%
src/client/index.tsx 0 2 0.0%
src/client/utils/UserModel.ts 22 24 91.67%
src/client/components/LegacyHeader.tsx 8 11 72.73%
src/client/components/App.tsx 9 13 69.23%
src/client/components/global_header/AccountDropdown.tsx 2 6 33.33%
src/client/components/Main.tsx 16 21 76.19%
Files with Coverage Reduction New Missed Lines %
src/client/components/dashboard/NarrativeList/ControlMenu/LinkOrgItem.tsx 1 45.98%
src/client/components/dashboard/NarrativeList/ControlMenu/Model.ts 1 7.41%
src/client/components/dashboard/NarrativeList/ControlMenu/sharing/PermSearch.tsx 1 20.0%
src/client/components/dashboard/NarrativeList/Preview.tsx 2 13.79%
src/client/components/global_header/AccountDropdown.tsx 2 11.11%
src/client/components/dashboard/NarrativeList/Filters.tsx 5 62.3%
src/client/components/dashboard/NarrativeList/index.tsx 16 32.14%
Totals Coverage Status
Change from base Build 1233968275: -3.7%
Covered Lines: 781
Relevant Lines: 1333

💛 - Coveralls

@lgtm-com
Copy link

lgtm-com bot commented Sep 22, 2021

This pull request fixes 4 alerts when merging c96ed91 into 33ea5f7 - view on LGTM.com

fixed alerts:

  • 3 for Unused or undefined state property
  • 1 for Unused variable, import, function or class

@lgtm-com
Copy link

lgtm-com bot commented Sep 23, 2021

This pull request introduces 1 alert and fixes 4 when merging 5ef57ab into 33ea5f7 - view on LGTM.com

new alerts:

  • 1 for Unused variable, import, function or class

fixed alerts:

  • 3 for Unused or undefined state property
  • 1 for Unused variable, import, function or class

@lgtm-com
Copy link

lgtm-com bot commented Sep 23, 2021

This pull request introduces 1 alert and fixes 4 when merging b387835 into 33ea5f7 - view on LGTM.com

new alerts:

  • 1 for Unused variable, import, function or class

fixed alerts:

  • 3 for Unused or undefined state property
  • 1 for Unused variable, import, function or class

@lgtm-com
Copy link

lgtm-com bot commented Sep 23, 2021

This pull request introduces 1 alert and fixes 4 when merging 169916a into 33ea5f7 - view on LGTM.com

new alerts:

  • 1 for Unused variable, import, function or class

fixed alerts:

  • 3 for Unused or undefined state property
  • 1 for Unused variable, import, function or class

@lgtm-com
Copy link

lgtm-com bot commented Sep 23, 2021

This pull request fixes 4 alerts when merging f98fe26 into 33ea5f7 - view on LGTM.com

fixed alerts:

  • 3 for Unused or undefined state property
  • 1 for Unused variable, import, function or class

@eapearson
Copy link
Contributor Author

I'm going to keep plunking away to restore test coverage, but wanted to get the ball rolling on this rather large PR.

@dakotablair
Copy link
Collaborator

dakotablair commented Oct 5, 2021

I think it would be good to break this PR up as much as possible. I would ask that you isolate the code that fixes the bug in its own PR and make separate PRs for any architectural changes or miscellaneous fixes. From my perspective it is ideal that each PR be for one logical change. I would also note that before this PR the repo has 316 commits, and this PR alone has 61, so in terms of having an understandable history it would be nice if some of these could be squashed. All that being said, if you want to proceed with this PR as is then we will need to set up a time for a call so you can walk me though the changes.

@lgtm-com
Copy link

lgtm-com bot commented Oct 14, 2021

This pull request fixes 4 alerts when merging 077f092 into 33ea5f7 - view on LGTM.com

fixed alerts:

  • 3 for Unused or undefined state property
  • 1 for Unused variable, import, function or class

@eapearson
Copy link
Contributor Author

Paramvir and I met today, and he has approved of the primary user-visible change, the sort dropdown.
Just let me know what we need to do to close this out.
It has been in use on CI and narrative-dev for a month with no issues that I'm aware of.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants