Skip to content

[Clang] [Diagnostics] Simplify filenames that contain '..' #143520

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

Sirraide
Copy link
Member

@Sirraide Sirraide commented Jun 10, 2025

This can significantly shorten file paths to standard library headers, e.g. on my system, <ranges> is currently printed as

/usr/lib/gcc/x86_64-redhat-linux/15/../../../../include/c++/15/ranges

but with this change, we instead print

/usr/include/c++/15/ranges

This is of course just a heuristic, so there will definitely be paths that get longer as a result of this, but it helps with the standard library, and a lot of the diagnostics we print tend to originate in standard library headers (especially if you include notes listing overload candidates etc.). Update: We now always use whichever path ends up being shorter.

@AaronBallman pointed out that this might be problematic for network file systems since path resolution might take a while, so this is enabled only for paths that are part of a local filesystem.

The file names are cached in TextDiagnostic. While we could move it up into e.g. TextDiagnosticPrinter, DiagnosticsEngine, or maybe even the FileManager, to me this seems like something that we mainly care about when printing to the terminal (other diagnostics consumers probably don’t mind receiving the original file path). Moreover, this is already where we handle -fdiagnostics-absolute-paths or whatever that flag is called again.

@Sirraide Sirraide requested a review from AaronBallman June 10, 2025 12:43
@Sirraide Sirraide added the clang:diagnostics New/improved warning or error message in Clang, but not in clang-tidy or static analyzer label Jun 10, 2025
@llvmbot llvmbot added the clang Clang issues not falling into any other category label Jun 10, 2025
@llvmbot
Copy link
Member

llvmbot commented Jun 10, 2025

@llvm/pr-subscribers-clang

Author: None (Sirraide)

Changes

This can significantly shorten file paths to standard library headers, e.g. on my system, &lt;ranges&gt; is currently printed as

/usr/lib/gcc/x86_64-redhat-linux/15/../../../../include/c++/15/ranges

but with this change, we instead print

/usr/include/c++/15/ranges

This is of course just a heuristic, so there will definitely be paths that get longer as a result of this, but it helps with the standard library, and a lot of the diagnostics we print tend to originate in standard library headers (especially if you include notes listing overload candidates etc.).

@AaronBallman pointed out that this might be problematic for network file systems since path resolution might take a while, so this is enabled only for paths that are part of a local filesystem.

The file names are cached in TextDiagnostic. While we could move it up into e.g. TextDiagnosticPrinter, DiagnosticsEngine, or maybe even the FileManager, to me this seems like something that we mainly care about when printing to the terminal (other diagnostics consumers probably don’t mind receiving the original file path). Moreover, this is already where we handle -fdiagnostics-absolute-paths or whatever that flag is called again.


Full diff: https://github.com/llvm/llvm-project/pull/143520.diff

2 Files Affected:

  • (modified) clang/include/clang/Frontend/TextDiagnostic.h (+1)
  • (modified) clang/lib/Frontend/TextDiagnostic.cpp (+20-12)
diff --git a/clang/include/clang/Frontend/TextDiagnostic.h b/clang/include/clang/Frontend/TextDiagnostic.h
index e2e88d4d648a2..9c77bc3e00e19 100644
--- a/clang/include/clang/Frontend/TextDiagnostic.h
+++ b/clang/include/clang/Frontend/TextDiagnostic.h
@@ -35,6 +35,7 @@ namespace clang {
 class TextDiagnostic : public DiagnosticRenderer {
   raw_ostream &OS;
   const Preprocessor *PP;
+  llvm::StringMap<SmallString<128>> SimplifiedFileNameCache;
 
 public:
   TextDiagnostic(raw_ostream &OS, const LangOptions &LangOpts,
diff --git a/clang/lib/Frontend/TextDiagnostic.cpp b/clang/lib/Frontend/TextDiagnostic.cpp
index b9e681b52e509..edbad42b39950 100644
--- a/clang/lib/Frontend/TextDiagnostic.cpp
+++ b/clang/lib/Frontend/TextDiagnostic.cpp
@@ -738,12 +738,20 @@ void TextDiagnostic::printDiagnosticMessage(raw_ostream &OS,
 }
 
 void TextDiagnostic::emitFilename(StringRef Filename, const SourceManager &SM) {
-#ifdef _WIN32
-  SmallString<4096> TmpFilename;
-#endif
-  if (DiagOpts.AbsolutePath) {
-    auto File = SM.getFileManager().getOptionalFileRef(Filename);
-    if (File) {
+  auto File = SM.getFileManager().getOptionalFileRef(Filename);
+
+  // Try to simplify paths that contain '..' in any case since paths to
+  // standard library headers especially tend to get quite long otherwise.
+  // Only do that for local filesystems though to avoid slowing down
+  // compilation too much.
+  auto AlwaysSimplify = [&] {
+    return File->getName().contains("..") &&
+           llvm::sys::fs::is_local(File->getName());
+  };
+
+  if (File && (DiagOpts.AbsolutePath || AlwaysSimplify())) {
+    SmallString<128> &CacheEntry = SimplifiedFileNameCache[Filename];
+    if (CacheEntry.empty()) {
       // We want to print a simplified absolute path, i. e. without "dots".
       //
       // The hardest part here are the paths like "<part1>/<link>/../<part2>".
@@ -759,15 +767,15 @@ void TextDiagnostic::emitFilename(StringRef Filename, const SourceManager &SM) {
       // on Windows we can just use llvm::sys::path::remove_dots(), because,
       // on that system, both aforementioned paths point to the same place.
 #ifdef _WIN32
-      TmpFilename = File->getName();
-      llvm::sys::fs::make_absolute(TmpFilename);
-      llvm::sys::path::native(TmpFilename);
-      llvm::sys::path::remove_dots(TmpFilename, /* remove_dot_dot */ true);
-      Filename = StringRef(TmpFilename.data(), TmpFilename.size());
+      CacheEntry = File->getName();
+      llvm::sys::fs::make_absolute(CacheEntry);
+      llvm::sys::path::native(CacheEntry);
+      llvm::sys::path::remove_dots(CacheEntry, /* remove_dot_dot */ true);
 #else
-      Filename = SM.getFileManager().getCanonicalName(*File);
+      CacheEntry = SM.getFileManager().getCanonicalName(*File);
 #endif
     }
+    Filename = CacheEntry;
   }
 
   OS << Filename;

@Sirraide
Copy link
Member Author

This is of course just a heuristic, so there will definitely be paths that get longer as a result of this

Actually, it just occurred to me that we could just cache whichever path ends up being shorter, the original one or the resolved one.

@Sirraide
Copy link
Member Author

I’m not exactly sure how to test this change since this is not only platform-dependent but also path-dependent since we may end up producing absolute paths here.

@AaronBallman
Copy link
Collaborator

The file names are cached in TextDiagnostic. While we could move it up into e.g. TextDiagnosticPrinter, DiagnosticsEngine, or maybe even the FileManager, to me this seems like something that we mainly care about when printing to the terminal (other diagnostics consumers probably don’t mind receiving the original file path). Moreover, this is already where we handle -fdiagnostics-absolute-paths or whatever that flag is called again.

The downside to it being in TextDiagnostic is that consumers then all have to normalize the path themselves (some file system APIs on some systems are better about relative paths than others). If the paths are always equivalent, it might be kinder to pass the resolved path. WDYT?

@AaronBallman
Copy link
Collaborator

I’m not exactly sure how to test this change since this is not only platform-dependent but also path-dependent since we may end up producing absolute paths here.

I think this is a case where maybe we want to use unit tests. We have clang/unittests/Basic/DiagnosticTest.cpp or FileManagerTest.cpp already, so perhaps in one of those (depending on where the functionality ends up living)?

Comment on lines 747 to 750
auto AlwaysSimplify = [&] {
return File->getName().contains("..") &&
llvm::sys::fs::is_local(File->getName());
};
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure that is_local() isn't going to cause a performance issue by itself? I notice the Windows implementation looks like it could be expensive:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point; I’ll look into what MSDN has to say about those functions.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we can come up with a good stress test, I might be able to try it out for you on my Windows machine and report back timings for my setup.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might be able to generate something real quick.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@AaronBallman

@echo off

mkdir stress-test
cd stress-test

mkdir nested
for /l %%i in (1, 1, 1000) do (
    echo int _%%i = ""; > "%%i.h"
    echo #include "../%%i.h" >> "nested\stress-test.cpp"
)

Then just cd into stress-test\nested and compile stress-test.cpp (only tested with wine but it should work).

Copy link
Member Author

@Sirraide Sirraide Jun 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Make sure to compile with -ferror-limit=0 if you’re using my branch since this is a bit pointless otherwise ;Þ)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Without your patch, the average is consistently just shy of 7 seconds
With your patch, the average is kind of hard to trust; I've seen it as low as 5 seconds and as high as 9 seconds.

Overall, I think this means I'm either not testing what I think I'm testing or the performance changes aren't significant. If I was consistently getting 9 second times, I think that's demonstrate something tangible.

WDYT?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, on my system the difference was about 20ms (180ms vs 160ms), so I think there may be a performance difference, but it’s really not that bad. Also, this stress test is resolving 1000 different paths which is... hopefully a lot more than any user would ever see in a single compilation.

@Sirraide
Copy link
Member Author

The downside to it being in TextDiagnostic is that consumers then all have to normalize the path themselves (some file system APIs on some systems are better about relative paths than others). If the paths are always equivalent, it might be kinder to pass the resolved path. WDYT?

I mean, that’s also true I suppose; the only thing is then that we’d be normalising them twice if -fdiagnostics-absolute-paths is passed—unless we move the handling for that elsewhere as well, but now that’s dependent on the diagnostic options, so it probably shouldn’t be in FileManager—which leaves DiagnosticsEngine? But consumers don’t generally have access to the DiagnosticsEngine, so it’d have to be in the FileManager after all.

I guess we could always compute both the absolute and the ‘short’ path for a file whenever FileManager opens one so that they’re always available. But that might have some impact on performance (though I guess this is a perf/ branch already so we can try and see how it goes)?

@AaronBallman Thoughts?

@AaronBallman
Copy link
Collaborator

The downside to it being in TextDiagnostic is that consumers then all have to normalize the path themselves (some file system APIs on some systems are better about relative paths than others). If the paths are always equivalent, it might be kinder to pass the resolved path. WDYT?

I mean, that’s also true I suppose; the only thing is then that we’d be normalising them twice if -fdiagnostics-absolute-paths is passed—unless we move the handling for that elsewhere as well, but now that’s dependent on the diagnostic options, so it probably shouldn’t be in FileManager—which leaves DiagnosticsEngine? But consumers don’t generally have access to the DiagnosticsEngine, so it’d have to be in the FileManager after all.

We definitely don't want to normalize twice. Could we parameterize FileManager so we don't have to have it directly depend on diagnostic options?

I guess we could always compute both the absolute and the ‘short’ path for a file whenever FileManager opens one so that they’re always available. But that might have some impact on performance (though I guess this is a perf/ branch already so we can try and see how it goes)?

I think we could try it to see how it goes in terms of performance. Again, I think I'd be most worried about network builds -- I would expect a measurable different in performance even if there are no diagnostics issued just because we need the file information for SourceManager.

@Sirraide
Copy link
Member Author

We definitely don't want to normalize twice. Could we parameterize FileManager so we don't have to have it directly depend on diagnostic options?

One idea I just had is we could do something like:

enum class DiagnosticFileNameMode {
  Unmodified, // As specified by the user
  Canonical,  // Absolute path
  Short,      // Whichever is shorter
}

class FileManager {
  // ...
  StringRef getFileNameForDiagnostic(DiagnosticFileNameMode Mode);
};

And then have separate caches in FileManager for each kind of DiagnosticsFileNameMode and compute the corresponding file name lazily the first time it’s requested.

@AaronBallman
Copy link
Collaborator

We definitely don't want to normalize twice. Could we parameterize FileManager so we don't have to have it directly depend on diagnostic options?

One idea I just had is we could do something like:

enum class DiagnosticFileNameMode {
  Unmodified, // As specified by the user
  Canonical,  // Absolute path
  Short,      // Whichever is shorter
}

class FileManager {
  // ...
  StringRef getFileNameForDiagnostic(DiagnosticFileNameMode Mode);
};

And then have separate caches in FileManager for each kind of DiagnosticsFileNameMode and compute the corresponding file name lazily the first time it’s requested.

I think that approach makes sense! Thought "short path" means something different to those of us old enough to remember DOS 8.3 filenames. :-D

@Sirraide
Copy link
Member Author

I think that approach makes sense! Thought "short path" means something different to those of us old enough to remember DOS 8.3 filenames. :-D

Ha, those I’m not planning to add support for thankfully...

@Sirraide
Copy link
Member Author

Actually, we could also just put it in SourceManager because that already has a reference to the DiagnosticsEngine and then a single getNameForDiagnostic(StringRef Filename) function would do.

@llvmbot llvmbot added the clang:frontend Language frontend issues, e.g. anything involving "Sema" label Jun 10, 2025
@Sirraide
Copy link
Member Author

Actually, we could also just put it in SourceManager because that already has a reference to the DiagnosticsEngine and then a single getNameForDiagnostic(StringRef Filename) function would do.

I’ve done that. Also, SARIFDiagnostic::emitFilename() was just a copy-pasted version of TextDiagnostic::emitFilename(), so I’ve updated it to use the new function as well.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
clang:diagnostics New/improved warning or error message in Clang, but not in clang-tidy or static analyzer clang:frontend Language frontend issues, e.g. anything involving "Sema" clang Clang issues not falling into any other category
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants