Skip to content

Commit ceff726

Browse files
authored
Turbopack: don't crash the dev server when node_modules/next is briefly unresolvable (vercel#93877)
### What? When `node_modules/next` is briefly unresolvable mid-session (for example because pnpm is mid-install of a package with a `next` peer dependency, reshuffling its symlinks) Turbopack would: - Emit a `MissingNextFolderIssue` with severity `Fatal` - Propagate `anyhow!("Next.js package not found")` through ~10 levels of the build graph - Surface the failure at the napi boundary as a `TurbopackInternalError`, which the Next.js JS-side treats as fatal and shuts the dev server down The reported user impact was a dev session that became wedged and could not recover even after restarting `next dev`, because the persistent cache had stored the failed operation state. Users reported it as "stuck", "catastrophic", and required wiping `.next/` to recover. Unfortunately while i was able to reproduce a number of bad outcomes, i couldn't actually reproduce the issue of `next dev` getting stuck in a bad state. Still this PR will improve a number of apis and error messages See internal discussion: https://vercel.slack.com/archives/C046HAU4H7F/p1778792876550309 ### Why? A transient filesystem race during an install is the canonical recoverable error — once the install finishes, `next` is resolvable again. There is no reason this should bring down the dev server or poison the persistent cache. The bug had two contributing causes: 1. The error message was poor. It assumed the only cause was a mis-set `turbopack.root`, with the entire message in `title()`. It didn't acknowledge the install-race case or any of the other realistic causes (broken symlink, monorepo hoisting, removed `node_modules/next`, global install). Users had no way to know that the right response was to wait or refresh. 2. The Rust→napi boundary did not treat this as a recoverable error. Two specific code paths propagated `Err` instead of catching it: - `issue_filter_from_endpoint` in `endpoint.rs` calls `endpoint_op.connect().await?` — if the upstream endpoint resolution errors (as it does when `directory_tree_to_loader_tree` fails), this `?` propagates **before** the surrounding `strongly_consistent_catch_collectables` gets a chance to convert the failure into Issues. - `Project::hmr_version_state` is called bare from `project_hmr_events` at the napi boundary. There is no `_with_issues_operation` wrapper, so any error from the underlying `hmr_content`/`versioned_content_map.compute_entry` chain propagates straight to JS. ### How? Three changes: **1. Rewrite `MissingNextFolderIssue`** ([next_import_map.rs](crates/next-core/src/next_import_map.rs)) - Severity downgraded from `Fatal` to `Error`. `Fatal` triggers an explicit JS-side server shutdown in `print-build-errors.ts`; `Error` lets the dev server recover. - Title is a single line: `"Could not find the Next.js package (next/package.json)"`. - Description lists all realistic causes (concurrent install, missing/broken symlink, mis-set workspace root, monorepo hoisting, global install). - Doc link moved to `documentation_link()` so it doesn't pollute the inline message. **2. Make `issue_filter_from_endpoint` Err-tolerant** ([endpoint.rs](crates/next-napi-bindings/src/next_api/endpoint.rs)) - Wrap the `endpoint_op.connect().await` in a `match` and fall back to `IssueFilter::warnings_and_foreign_errors()` on `Err`. - This unblocks the existing `strongly_consistent_catch_collectables` recovery path, which then absorbs the upstream failure as Issues that surface through the napi boundary normally. **3. Add `hmr_version_state_with_issues_operation`** ([project.rs](crates/next-napi-bindings/src/next_api/project.rs)) - Mirrors the existing `hmr_update_with_issues_operation` pattern. - Drives `hmr_version_operation` as an `OperationVc`, catches any `Err` and falls back to `NotFoundVersion`, and collects issues from the operation via `peek_issues`. - The napi `project_hmr_events` handler now uses this wrapper instead of calling `Project::hmr_version_state` bare. Issues from both `hmr_version_state` and `hmr_update` are merged before being sent to JS. - Required lifting the inner `hmr_version_operation` out of `Project::hmr_version_state`'s closure to make it callable from the napi layer. A new regression test at [test/development/app-dir/concurrent-install/](test/development/app-dir/concurrent-install/) reproduces the failure by moving `node_modules/next` aside mid-HMR-session and asserts that the dev server does not emit `FATAL: An unexpected Turbopack error occurred` or `TurbopackInternalError`, and that the friendly Issue text surfaces. The test is skipped under `NEXT_SKIP_ISOLATE=1` since that mode doesn't produce a manipulable `node_modules`. Out of scope: after recovery, the dev server still returns 500 on subsequent requests until the user manually refreshes (the JS-side per-route error state isn't invalidated by Turbopack's successful recompile). That's a separate dev-server caching issue. <!-- NEXT_JS_LLM_PR -->
1 parent 55b2d3c commit ceff726

8 files changed

Lines changed: 306 additions & 70 deletions

File tree

crates/next-core/src/next_import_map.rs

Lines changed: 57 additions & 34 deletions
Original file line numberDiff line numberDiff line change
@@ -1195,56 +1195,79 @@ impl Issue for MissingNextFolderIssue {
11951195
}
11961196

11971197
fn severity(&self) -> IssueSeverity {
1198-
IssueSeverity::Fatal
1198+
// In theory this should be fatal (how can we ever recover from next missing when we are
1199+
// next), but we actually might be detecting an ephemeral scenario where 'next' is moving
1200+
// and we can recover.
1201+
IssueSeverity::Error
11991202
}
12001203

12011204
fn stage(&self) -> IssueStage {
12021205
IssueStage::Resolve
12031206
}
12041207

12051208
async fn title(&self) -> Result<StyledString> {
1206-
let system_path = match to_sys_path(self.path.clone()).await? {
1207-
Some(path) => path.to_str().unwrap_or("{unknown}").to_string(),
1208-
_ => "{unknown}".to_string(),
1209+
Ok(StyledString::Text(rcstr!(
1210+
"Could not find the Next.js package (next/package.json)"
1211+
)))
1212+
}
1213+
1214+
async fn description(&self) -> Result<Option<StyledString>> {
1215+
let context_path: RcStr = match to_sys_path(self.path.clone()).await? {
1216+
Some(path) => path.to_str().unwrap_or("{unknown}").into(),
1217+
_ => rcstr!("{unknown}"),
12091218
};
1210-
let root_path = match to_sys_path(self.root.clone()).await? {
1211-
Some(path) => path.to_str().unwrap_or("{unknown}").to_string(),
1212-
_ => "{unknown}".to_string(),
1219+
let root_path: RcStr = match to_sys_path(self.root.clone()).await? {
1220+
Some(path) => path.to_str().unwrap_or("{unknown}").into(),
1221+
_ => rcstr!("{unknown}"),
12131222
};
12141223

1215-
Ok(StyledString::Stack(vec![
1216-
StyledString::Line(vec![
1217-
StyledString::Text(
1218-
"Error: Next.js inferred your workspace root, but it may not be correct."
1219-
.into(),
1220-
),
1221-
]),
1224+
Ok(Some(StyledString::Stack(vec![
12221225
StyledString::Line(vec![
1223-
StyledString::Text("We couldn't find the Next.js package (".into()),
1224-
StyledString::Strong("next/package.json".into()),
1225-
StyledString::Text(") from the project directory: ".into()),
1226-
StyledString::Strong(system_path.into()),
1226+
StyledString::Text(rcstr!("Resolved from: ")),
1227+
StyledString::Strong(context_path),
12271228
]),
12281229
StyledString::Line(vec![
1229-
StyledString::Text("Filesystem root used for resolution: ".into()),
1230-
StyledString::Strong(root_path.into()),
1230+
StyledString::Text(rcstr!("Filesystem root used for resolution: ")),
1231+
StyledString::Strong(root_path),
12311232
]),
1233+
StyledString::Line(vec![StyledString::Text(rcstr!(""))]),
1234+
StyledString::Line(vec![StyledString::Text(rcstr!("Possible causes:"))]),
1235+
StyledString::Line(vec![StyledString::Text(rcstr!(
1236+
" - node_modules is being reorganized by a concurrent install (e.g. pnpm adding \
1237+
a package with a `next` peer dependency). This is transient and should clear \
1238+
once the install completes."
1239+
))]),
1240+
StyledString::Line(vec![StyledString::Text(rcstr!(
1241+
" - node_modules/next was removed, renamed, or has a broken symlink."
1242+
))]),
12321243
StyledString::Line(vec![
1233-
StyledString::Text("To fix this, set ".into()),
1234-
StyledString::Code("turbopack.root".into()),
1235-
StyledString::Text(
1236-
" in your Next.js config, or ensure the Next.js package is resolvable from the project directory.".into(),
1237-
),
1238-
]),
1239-
StyledString::Line(vec![
1240-
StyledString::Text("Note: For security and performance reasons, files outside of the project directory will not be compiled.".into()),
1241-
]),
1242-
StyledString::Line(vec![
1243-
StyledString::Text("See ".into()),
1244-
StyledString::Strong("https://nextjs.org/docs/app/api-reference/config/next-config-js/turbopack#root-directory".into()),
1245-
StyledString::Text(" for more information.".into()),
1244+
StyledString::Text(rcstr!(" - The workspace root is incorrect — see ")),
1245+
StyledString::Code(rcstr!("turbopack.root")),
1246+
StyledString::Text(rcstr!(
1247+
" in the Next.js config docs for how to configure it."
1248+
)),
12461249
]),
1247-
]))
1250+
StyledString::Line(vec![StyledString::Text(rcstr!(
1251+
" - In a monorepo, the Next.js package may only exist in a directory above the \
1252+
closest directory containing a package manager lockfile. The workspace root is \
1253+
detected by locating the nearest package manager lockfile."
1254+
))]),
1255+
StyledString::Line(vec![StyledString::Text(rcstr!(
1256+
" - Next.js is installed globally rather than as a project dependency. This is \
1257+
not supported; install it locally."
1258+
))]),
1259+
StyledString::Line(vec![StyledString::Text(rcstr!(""))]),
1260+
StyledString::Line(vec![StyledString::Text(rcstr!(
1261+
"Note: To ensure a hermetic build and a portable cache, files outside of the \
1262+
workspace root are not compiled."
1263+
))]),
1264+
])))
1265+
}
1266+
1267+
fn documentation_link(&self) -> RcStr {
1268+
rcstr!(
1269+
"https://nextjs.org/docs/app/api-reference/config/next-config-js/turbopack#root-directory"
1270+
)
12481271
}
12491272
}
12501273

crates/next-napi-bindings/src/next_api/endpoint.rs

Lines changed: 37 additions & 28 deletions
Original file line numberDiff line numberDiff line change
@@ -101,14 +101,22 @@ impl Deref for ExternalEndpoint {
101101

102102
/// Build an `IssueFilter` by reading the project from the endpoint's
103103
/// `OperationVc<OptionEndpoint>` and extracting ignore rules from its config.
104-
async fn issue_filter_from_endpoint(
105-
endpoint_op: OperationVc<OptionEndpoint>,
106-
) -> Result<Vc<IssueFilter>> {
107-
let endpoint_option = endpoint_op.connect().await?;
108-
if let Some(ep) = &*endpoint_option {
109-
Ok(ep.project().issue_filter())
110-
} else {
111-
Ok(IssueFilter::warnings_and_foreign_errors().cell())
104+
///
105+
/// If the upstream endpoint operation fails to resolve (e.g. because the build
106+
/// graph cannot be evaluated transiently — for example during a mid-session
107+
/// `node_modules` reshuffle), this falls back to a default filter rather than
108+
/// propagating the error. In this scenario we believe the caller will already be observing the
109+
/// same error
110+
async fn issue_filter_from_endpoint(endpoint_op: OperationVc<OptionEndpoint>) -> Vc<IssueFilter> {
111+
match endpoint_op.connect().await {
112+
Ok(endpoint_option) => {
113+
if let Some(ep) = &*endpoint_option {
114+
ep.project().issue_filter()
115+
} else {
116+
IssueFilter::warnings_and_foreign_errors().cell()
117+
}
118+
}
119+
Err(_) => IssueFilter::warnings_and_foreign_errors().cell(),
112120
}
113121
}
114122

@@ -124,7 +132,7 @@ async fn get_written_endpoint_with_issues_operation(
124132
endpoint_op: OperationVc<OptionEndpoint>,
125133
) -> Result<Vc<WrittenEndpointWithIssues>> {
126134
let write_to_disk_op = endpoint_write_to_disk_operation(endpoint_op);
127-
let filter = issue_filter_from_endpoint(endpoint_op).await?;
135+
let filter = issue_filter_from_endpoint(endpoint_op).await;
128136
let (written, issues, effects) =
129137
strongly_consistent_catch_collectables(write_to_disk_op, filter).await?;
130138
Ok(WrittenEndpointWithIssues {
@@ -229,25 +237,24 @@ async fn subscribe_issues_and_diags_operation(
229237
) -> Result<Vc<EndpointIssuesAndDiags>> {
230238
let changed_op = endpoint_server_changed_operation(endpoint_op);
231239

232-
if should_include_issues {
233-
let filter = issue_filter_from_endpoint(endpoint_op).await?;
234-
let (changed_value, issues, effects) =
235-
strongly_consistent_catch_collectables(changed_op, filter).await?;
236-
Ok(EndpointIssuesAndDiags {
237-
changed: changed_value,
238-
issues,
239-
effects,
240-
}
241-
.cell())
242-
} else {
243-
let changed_value = changed_op.read_strongly_consistent().await?;
244-
Ok(EndpointIssuesAndDiags {
245-
changed: Some(changed_value),
246-
issues: Arc::new(vec![]),
247-
effects: Arc::new(Effects::default()),
248-
}
249-
.cell())
240+
// Use catch-collectables in both branches so transient build-graph errors
241+
// (e.g. missing `node_modules/next` during a concurrent install) surface as
242+
// Issues rather than killing the subscription with a `TurbopackInternalError`.
243+
// When `should_include_issues` is false the caller doesn't need the Issue
244+
// payload, but we still need the catch path to avoid the FATAL.
245+
let filter = issue_filter_from_endpoint(endpoint_op).await;
246+
let (changed_value, issues, effects) =
247+
strongly_consistent_catch_collectables(changed_op, filter).await?;
248+
Ok(EndpointIssuesAndDiags {
249+
changed: changed_value,
250+
issues: if should_include_issues {
251+
issues
252+
} else {
253+
Arc::new(vec![])
254+
},
255+
effects,
250256
}
257+
.cell())
251258
}
252259

253260
#[tracing::instrument(level = "info", name = "get client-side endpoint changes", skip_all)]
@@ -265,11 +272,13 @@ pub fn endpoint_client_changed_subscribe(
265272
async move {
266273
let changed_op = endpoint_client_changed_operation(endpoint_op);
267274
// We don't capture issues and diagnostics here since we don't want to be
268-
// notified when they change
275+
// notified when they change. We also want errors to propagate so we don't use
276+
// strongly_consistent_catch_collectibles either.
269277
//
270278
// This must be a *read*, not just a resolve, because we need the root task created
271279
// by `subscribe` to re-run when the `Completion`'s value changes (via equality),
272280
// even if the cell id doesn't change.
281+
//
273282
let _ = changed_op.read_strongly_consistent().await?;
274283
Ok(())
275284
}

crates/next-napi-bindings/src/next_api/project.rs

Lines changed: 10 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -1821,6 +1821,10 @@ async fn hmr_update_with_issues_operation(
18211821
target: HmrTarget,
18221822
) -> Result<Vc<HmrUpdateWithIssues>> {
18231823
let update_op = project_hmr_update_operation(project, chunk_name, target, state);
1824+
// NOTE: we do not use `strongly_consistent_catch_collectables` here. The JS HMR
1825+
// consumers in `hot-reloader-turbopack.ts` (`subscribeToServerHmr` and
1826+
// `subscribeToClientHmrEvents`) rely on this read *throwing* on build-graph
1827+
// failures to trigger their recovery paths
18241828
let update = update_op.read_strongly_consistent().await?;
18251829
let filter = project.issue_filter();
18261830
let issues = get_issues(update_op, filter).await?;
@@ -1957,6 +1961,12 @@ async fn get_hmr_chunk_names_with_issues_operation(
19571961
target: HmrTarget,
19581962
) -> Result<Vc<HmrChunkNamesWithIssues>> {
19591963
let hmr_chunk_names_op = project_hmr_chunk_names_operation(container, target);
1964+
// Do NOT switch this to `strongly_consistent_catch_collectables`. The JS HMR
1965+
// chunk-names consumer in `hot-reloader-turbopack.ts` relies on this read
1966+
// *throwing* on build-graph failures so its outer `try` block exits the
1967+
// subscription loop. Swallowing the error and emitting an empty chunk-name
1968+
// list keeps the loop running but with stale state, and obscures the real
1969+
// failure from the dev server log.
19601970
let hmr_chunk_names = hmr_chunk_names_op.read_strongly_consistent().await?;
19611971
let filter = issue_filter_from_container(container);
19621972
let issues = get_issues(hmr_chunk_names_op, filter).await?;

packages/next/src/server/dev/turbopack-utils.ts

Lines changed: 8 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -260,7 +260,7 @@ export async function handleRouteType({
260260
// otherwise we don't known when to unsubscribe and this leaking
261261
hooks?.subscribeToChanges(
262262
serverKey,
263-
false,
263+
/** includeIssues=*/ false,
264264
route.dataEndpoint,
265265
() => {
266266
// Report the next compilation again
@@ -279,7 +279,7 @@ export async function handleRouteType({
279279
)
280280
hooks?.subscribeToChanges(
281281
clientKey,
282-
false,
282+
/** includeIssues=*/ false,
283283
route.htmlEndpoint,
284284
() => {
285285
return {
@@ -296,7 +296,7 @@ export async function handleRouteType({
296296
if (entrypoints.global.document) {
297297
hooks?.subscribeToChanges(
298298
getEntryKey('pages', 'server', '_document'),
299-
false,
299+
/** includeIssues=*/ false,
300300
entrypoints.global.document,
301301
() => {
302302
return {
@@ -354,7 +354,7 @@ export async function handleRouteType({
354354
// otherwise we don't known when to unsubscribe and this leaking
355355
hooks?.subscribeToChanges(
356356
key,
357-
true,
357+
/** includeIssues=*/ true,
358358
route.rscEndpoint,
359359
(change, hash) => {
360360
if (change.issues.some((issue) => issue.severity === 'error')) {
@@ -739,7 +739,7 @@ export async function handleEntrypoints({
739739
if (dev) {
740740
dev?.hooks.subscribeToChanges(
741741
key,
742-
false,
742+
/** includeIssues=*/ false,
743743
endpoint,
744744
async () => {
745745
const finishBuilding = dev.hooks.startBuilding(
@@ -876,7 +876,7 @@ export async function handlePagesErrorRoute({
876876
hooks.handleWrittenEndpoint(key, writtenEndpoint, false)
877877
hooks.subscribeToChanges(
878878
key,
879-
false,
879+
/** includeIssues=*/ false,
880880
entrypoints.global.app,
881881
() => {
882882
// There's a special case for this in `../client/page-bootstrap.ts`.
@@ -905,7 +905,7 @@ export async function handlePagesErrorRoute({
905905
hooks.handleWrittenEndpoint(key, writtenEndpoint, false)
906906
hooks.subscribeToChanges(
907907
key,
908-
false,
908+
/** includeIssues=*/ false,
909909
entrypoints.global.document,
910910
() => {
911911
return {
@@ -931,7 +931,7 @@ export async function handlePagesErrorRoute({
931931
hooks.handleWrittenEndpoint(key, writtenEndpoint, false)
932932
hooks.subscribeToChanges(
933933
key,
934-
false,
934+
/** includeIssues=*/ false,
935935
entrypoints.global.error,
936936
() => {
937937
// There's a special case for this in `../client/page-bootstrap.ts`.
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
export default function LateRoute() {
2+
return <p>late route</p>
3+
}
Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,8 @@
1+
import { ReactNode } from 'react'
2+
export default function Root({ children }: { children: ReactNode }) {
3+
return (
4+
<html>
5+
<body>{children}</body>
6+
</html>
7+
)
8+
}
Lines changed: 3 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,3 @@
1+
export default function Page() {
2+
return <p>hello world</p>
3+
}

0 commit comments

Comments
 (0)