
[clang] Improve nested name specifier AST representation #147835

Merged Aug 9, 2025 (2 commits)

@mizvekov (Contributor) commented Jul 9, 2025

This is a major change on how we represent nested name qualifications in the AST.

  • The nested name specifier itself and how it's stored is changed. The prefixes for types are handled within the type hierarchy, which makes canonicalization for them super cheap, no memory allocation required. Also translating a type into nested name specifier form becomes a no-op. An identifier is stored as a DependentNameType. The nested name specifier gains a lightweight handle class, to be used instead of passing around pointers, which is similar to what is implemented for TemplateName. There is still one free bit available, and this handle can be used within a PointerUnion and PointerIntPair, which should keep bit-packing aficionados happy.
  • The ElaboratedType node is removed. All type nodes to which it could previously apply can now store the elaborated keyword and name qualifier themselves, tail-allocating them only when present.
  • TagTypes can now point to the exact declaration found when producing these, as opposed to the previous situation of there only existing one TagType per entity. This increases the amount of type sugar retained, and can have several applications, for example in tracking module ownership, and other tools which care about source file origins, such as IWYU. These TagTypes are lazily allocated, in order to limit the increase in AST size.
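To illustrate the "lightweight handle" idea from the first bullet, here is a minimal, hypothetical sketch (not Clang's actual NestedNameSpecifier code): a one-word handle that stores a pointer plus a small kind tag in the pointer's low bits, in the spirit of llvm::PointerIntPair. It assumes every pointee is at least 4-byte aligned; the names SpecifierHandle, NamespaceNode and TypeNode are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical node kinds a specifier handle might point to (illustration only).
struct NamespaceNode { const char *name; };
struct TypeNode { const char *name; };

// One-word handle packing a pointer and a 2-bit kind tag into the pointer's
// low bits. Assumes every pointee is at least 4-byte aligned, so the two low
// bits of any valid pointer are zero and free to hold the tag.
class SpecifierHandle {
public:
  enum Kind : std::uintptr_t { Null = 0, Namespace = 1, Type = 2 };

  SpecifierHandle() : bits_(0) {}
  explicit SpecifierHandle(const NamespaceNode *n) : bits_(pack(n, Namespace)) {}
  explicit SpecifierHandle(const TypeNode *t) : bits_(pack(t, Type)) {}

  Kind kind() const { return static_cast<Kind>(bits_ & TagMask); }

  // Checked accessors: return nullptr when the handle holds another kind.
  const NamespaceNode *asNamespace() const {
    return kind() == Namespace ? static_cast<const NamespaceNode *>(ptr()) : nullptr;
  }
  const TypeNode *asType() const {
    return kind() == Type ? static_cast<const TypeNode *>(ptr()) : nullptr;
  }

  explicit operator bool() const { return kind() != Null; }
  // Handles compare by value: same pointer and same kind.
  bool operator==(SpecifierHandle o) const { return bits_ == o.bits_; }

private:
  static constexpr std::uintptr_t TagMask = 0x3;

  static std::uintptr_t pack(const void *p, Kind k) {
    auto raw = reinterpret_cast<std::uintptr_t>(p);
    assert((raw & TagMask) == 0 && "pointee must be at least 4-byte aligned");
    return raw | k;
  }
  const void *ptr() const {
    return reinterpret_cast<const void *>(bits_ & ~TagMask);
  }

  std::uintptr_t bits_;
};

static_assert(sizeof(SpecifierHandle) == sizeof(void *), "handle is one word");
```

Because the handle is a single word copied and compared by value, it can be passed around instead of a raw pointer at no extra cost.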

This patch offers a great performance benefit.

It greatly improves compilation time for stdexec. For one datapoint, for test_on2.cpp in that project, which is the slowest compiling test, this patch improves -c compilation time by about 7.2%, with the -fsyntax-only improvement being at ~12%.

This has great results on compile-time-tracker as well:
[image: compile-time-tracker results]

This patch also enables other optimizations in the future, and will reduce the performance impact of template specialization resugaring when that lands.

It also includes some other miscellaneous drive-by fixes.

About the review: yes, the patch is huge, sorry about that. Part of the reason is that I started with the nested name specifier part, before the ElaboratedType part, but that had a huge performance downside, as ElaboratedType is a big performance hog. I didn't have the steam to go back and change the patch after the fact.

There are also a lot of internal API changes, and it made sense to remove ElaboratedType in one go, versus removing it from one type at a time, as that would present much more churn to users. Also, giving the nested name specifier a different API prevents users from missing changes related to how prefixes now work; otherwise, existing code could still compile but not work.

How to review: The important changes are all in clang/include/clang/AST and clang/lib/AST, with also important changes in clang/lib/Sema/TreeTransform.h.

The rest and bulk of the changes are mostly consequences of the changes in API.

PS: TagType::getDecl is renamed to getOriginalDecl in this patch, just to make rebasing easier. I plan to rename it back after this lands.

Fixes #136624
Fixes #43179
Fixes #68670
Fixes #92757

@rupprecht (Collaborator)

Filed #154270 w/ another crash, which looks different from #153933 / #153996.

@AaronBallman (Collaborator)

The comments in this area are confusing me, FWIW:

  /// Stores the TagDecl associated with this type. The decl may point to any
  /// TagDecl that declares the entity.
  TagDecl *decl;

  ...

  TagDecl *getOriginalDecl() const { return decl; }

The function is called "get original decl" which implies you get the first declaration seen in the TU, but the data member it returns has a comment saying it may point to any declaration in the redeclaration chain. One of these is wrong, correct?

Yeah, that is describing what is valid from the point of view of the AST Node.

As a user of TagType, you can certainly create one which points to any declaration of an entity, and all of these nodes which point to a declaration of the same entity are the same type.

From the point of view of Sema, there are further rules on how these types are created: in normal day-to-day source code parsing, the declaration pointed to by a non-canonical TagType will be the one found by lookup at that point in the program.
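A toy model of the distinction described above, using hypothetical names (this is not Clang's TagType/TagDecl API): each sugared tag type remembers the exact declaration it was created from, while type identity is decided by the entity, approximated here by each declaration's canonical declaration.

```cpp
#include <cassert>
#include <string>

// Toy redeclaration: every declaration of an entity points back to the
// entity's canonical (first) declaration; nullptr means "I am canonical".
struct TagDeclModel {
  std::string name;
  const TagDeclModel *canonical;
  const TagDeclModel *getCanonical() const { return canonical ? canonical : this; }
};

// A non-canonical tag type remembers exactly which declaration lookup found.
struct TagTypeModel {
  const TagDeclModel *decl;
};

// Sugar-level comparison: distinguishes nodes that found different redeclarations.
bool sameDeclaration(const TagTypeModel &a, const TagTypeModel &b) {
  return a.decl == b.decl;
}

// Type identity: two tag types are the same type when they name the same entity.
bool sameType(const TagTypeModel &a, const TagTypeModel &b) {
  return a.decl->getCanonical() == b.decl->getCanonical();
}
```

Code that compares the stored declaration pointers directly asks the first question, while most semantic checks want the second.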

FWIW the name getOriginalDecl was picked to temporarily disambiguate from the behavior of the original getDecl that existed before the patch.

The difference in behavior is such that getDecl would always return the definition if that existed, otherwise it would return the very first declaration ever found by typename lookup when parsing a program, as there only existed one TagType per entity.
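A simplified sketch of the pre-patch getDecl behavior as described above, assuming a redeclaration chain stored in source order so that its first element stands in for "the very first declaration found by lookup". RedeclModel and legacyGetDecl are hypothetical names, not Clang code.

```cpp
#include <cassert>
#include <vector>

// Toy redeclaration: only its identity and whether it is a definition matter here.
struct RedeclModel {
  int id;
  bool isDefinition;
};

// Prefer the definition if one exists; otherwise fall back to the first
// declaration in the chain. Returns nullptr for an empty chain.
const RedeclModel *legacyGetDecl(const std::vector<RedeclModel> &chain) {
  for (const RedeclModel &d : chain)
    if (d.isDefinition)
      return &d;
  return chain.empty() ? nullptr : &chain.front();
}
```

Under this model the answer does not depend on which declaration a particular use found, which is exactly what changes once each TagType can point at its own declaration.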

The problem with keeping the name is that the behavior change meant that whenever I would rebase the patch, new users of getDecl would have popped up and it would be hard to make sure all uses of it were correct. By changing the name, I get a compilation error which would allow me to inspect and make the necessary changes.

Yeah, I think the plan is a reasonable one.

As I stated before, once this patch is settled and everyone has had a nice window to rebase their upstream, my plan is to submit another patch renaming getOriginalDecl back to getDecl.

Okay, if this is just a temporary oddity, that's fine. My big concern is that "original" implies "first" and that doesn't match the comment on what's returned. Because temporary measures have a tendency to ossify sometimes, it might make sense to add some more comments to clarify the situation. WDYT?

@carlosgalvezp (Contributor)

FYI bisecting #153770 leads to this patch

@mizvekov (Contributor, Author)

Okay, if this is just a temporary oddity, that's fine. My big concern is that "original" implies "first" and that doesn't match the comment on what's returned. Because temporary measures have a tendency to ossify sometimes, it might make sense to add some more comments to clarify the situation. WDYT?

Yeah sure, it would have been a good idea to add a FIXME explaining this; I will do that.

mizvekov added a commit that referenced this pull request Aug 19, 2025
…#153996)

In C++, it can be assumed the same linkage will be computed for all
redeclarations of an entity, and we have assertions to check this.

However, the linkage for a declaration can be requested in the middle of
deserialization, and at this point the redecl chain is not well formed,
as computation of the most recent declaration is deferred.

This patch makes that assertion work even in such conditions.

This fixes a regression introduced in
#147835, which was never
released, so there are no release notes for this.

Fixes #153933
llvm-sync bot pushed a commit to arm/arm-toolchain that referenced this pull request Aug 19, 2025
…consistency (#153996)

In C++, it can be assumed the same linkage will be computed for all
redeclarations of an entity, and we have assertions to check this.

However, the linkage for a declaration can be requested in the middle of
deserealization, and at this point the redecl chain is not well formed,
as computation of the most recent declaration is deferred.

This patch makes that assertion work even in such conditions.

This fixes a regression introduced in
llvm/llvm-project#147835, which was never
released, so there are no release notes for this.

Fixes #153933
@mizvekov (Contributor, Author)

Okay, if this is just a temporary oddity, that's fine. My big concern is that "original" implies "first" and that doesn't match the comment on what's returned. Because temporary measures have a tendency to ossify sometimes, it might make sense to add some more comments to clarify the situation. WDYT?

After this commit, the comment in clang/include/clang/AST/Decl.h for clang::TypeDecl::getTypeForDecl
isn't accurate any more: there is no ASTContext::getRecordType any more.

These are fixed by: #154395

mizvekov added a commit that referenced this pull request Aug 19, 2025
A SubstTemplateTypeParmPackType is one of the types which may appear
in a NestedNameSpecifier, so add it to the list of expected types in TreeTransform.

This fixes a regression introduced in #147835, which has never been released, so
there are no release notes.

Fixes #154270
mizvekov added a commit that referenced this pull request Aug 19, 2025
mizvekov added a commit that referenced this pull request Aug 19, 2025
Makes sure UnconventionalAssignOperatorCheck checks if the types reference the
same entity, not the exact declaration.

This adds a new matcher to support this check.

This fixes a regression introduced by #147835. Since this regression was never
released, there are no release notes.

Fixes #153770
mizvekov added a commit that referenced this pull request Aug 19, 2025
…54430)
mizvekov added a commit that referenced this pull request Aug 20, 2025
When building the base type for constructor initializer, the case of an
UnresolvedUsingType was not being handled.

For the non-dependent case, we are also skipping adding the UsingType, but
this is just missing information in the AST. A FIXME for this is added.

This fixes a regression introduced in #147835, which was never released, so
there are no release notes.

Fixes #154436
mizvekov added a commit that referenced this pull request Aug 20, 2025
jrguzman-ms pushed a commit to msft-mirror-aosp/platform.external.libchrome that referenced this pull request Aug 20, 2025
llvm/llvm-project#147835 makes minor diagnostic changes.

Accept the previous and new diagnostic messages.

Bug: 437910658
Change-Id: Iec24f4e20944465fe12a87539822c095bb88979f
Reviewed-on: https://chromium-review.googlesource.com/c/chromium/src/+/6842043
Reviewed-by: Nico Weber
Commit-Queue: Arthur Eubanks
Cr-Commit-Position: refs/heads/main@{#1500458}


CrOS-Libchrome-Original-Commit: 83aa2e61488debc4572256e81768123605224687
@rupprecht (Collaborator)

I'm trying to reduce a crash that only reproduces in module builds. Does this sound familiar?

assertion failed at clang/lib/AST/RecordLayoutBuilder.cpp:3380 in const ASTRecordLayout &clang::ASTContext::getASTRecordLayout(const RecordDecl *) const: D && "Cannot get layout of forward declarations!"
*** Check failure stack trace: ***
    @     0x557b3889784f  clang::ASTContext::getASTRecordLayout()
    @     0x557b3823cd68  clang::ASTContext::getTypeInfoImpl()
    @     0x557b3823e25c  clang::ASTContext::getTypeInfo()
    @     0x557b3823d855  clang::ASTContext::getPreferredTypeAlign()
    @     0x557b3823bf01  clang::ASTContext::getDeclAlign()
    @     0x557b366c6a06  clang::CodeGen::CodeGenFunction::EmitAutoVarAlloca()
    @     0x557b366c165c  clang::CodeGen::CodeGenFunction::EmitVarDecl()
    @     0x557b366c1120  clang::CodeGen::CodeGenFunction::EmitDecl()
    @     0x557b36760580  clang::CodeGen::CodeGenFunction::EmitDeclStmt()
    @     0x557b36754d50  clang::CodeGen::CodeGenFunction::EmitSimpleStmt()
    @     0x557b3675403e  clang::CodeGen::CodeGenFunction::EmitStmt()
    @     0x557b36761891  clang::CodeGen::CodeGenFunction::EmitCompoundStmtWithoutScope()
    @     0x557b368c47d5  clang::CodeGen::CodeGenFunction::GenerateCode()
    @     0x557b368ef9f6  clang::CodeGen::CodeGenModule::EmitGlobalFunctionDefinition()
    @     0x557b368e6bc8  clang::CodeGen::CodeGenModule::EmitGlobalDefinition()
    @     0x557b368d7786  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x557b368d77a2  clang::CodeGen::CodeGenModule::EmitDeferred()
...
    @     0x557b368d77a2  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x557b368d77a2  clang::CodeGen::CodeGenModule::EmitDeferred()
    @     0x557b368d41ae  clang::CodeGen::CodeGenModule::Release()
    @     0x557b36a1e14e  (anonymous namespace)::CodeGeneratorImpl::HandleTranslationUnit()

@mizvekov (Contributor, Author) commented Aug 21, 2025

I'm trying to reduce a crash that only reproduces in module builds. Does this sound familiar?

Haven't seen this one yet. That may have something to do with demoted definitions, a repro would be helpful.

@rupprecht (Collaborator)

I'm trying to reduce a crash that only reproduces in module builds. Does this sound familiar?

Haven't seen this one yet. That may have something to do with demoted definitions, a repro would be helpful.

I'm trying to reduce it now, but because it's a modules issue/involves multiple files, it's much harder. I created #154840 to track this.

If you have any theories you want me to try out, e.g. places to add an assert or calls to dump(), I can patch it in and see what happens. But that's obviously a very inefficient way to debug this, so hopefully I have luck w/ reducing this.

@mizvekov (Contributor, Author)

I'm trying to reduce it now, but because it's a modules issue/involves multiple files, it's much harder. I created #154840 to track this.

FWIW if that helps, cvise can reduce multiple files in one go. You can use it to go as far as reducing a whole cmake project at once, but that can be slow unless you do some amount of manual reduction first.

If you have any theories you want me to try out, e.g. places to add an assert or calls to dump(), I can patch it in and see what happens. But that's obviously a very inefficient way to debug this, so hopefully I have luck w/ reducing this.

Looking a bit more, this one is a bit less obvious because we do search for a definition in ASTContext::getTypeInfoImpl, so it does look like we lost track of the definition somehow.

@michaelrj-google (Contributor)

Hi, I'm helping fix some clang-as-a-library users and I'm running into some issues with the TypePrinter after this commit. Specifically the clang generator in sandboxed-api is broken (https://github.com/google/sandboxed-api/tree/main). After doing the trivial fixes (getDecl -> getOriginalDecl) some of the tests are failing (https://github.com/google/sandboxed-api/blob/main/sandboxed_api/tools/clang_generator/emitter_test.cc).

There are two specific tests failing in that test, NamedEnumWithoutTypedef and NestedAnonymousStruct.
NamedEnumWithoutTypedef expects

"enum Color { kRed, kGreen, kBlue }",
"typedef struct { enum Color member; } B"

and is getting

"typedef struct { enum Color member; } B"

While NestedAnonymousStruct expects

"struct A { struct { int number; } b; int data; }"

and is getting

"struct A { struct (unnamed struct at input.cc:2:16) b; int data; }"

Related, I think the change to void TypePrinter::printUsingBefore(const UsingType *T, raw_ostream &OS) might be causing similar issues. The getKeywordName is causing enum to get printed when the type is listed for template instantiation, which is causing errors.

If you need more information feel free to reach out, I'm happy to connect you with the relevant people.

@mizvekov (Contributor, Author)

Hi, I'm helping fix some clang-as-a-library users and I'm running into some issues with the TypePrinter after this commit. Specifically the clang generator in sandboxed-api is broken (https://github.com/google/sandboxed-api/tree/main). After doing the trivial fixes (getDecl -> getOriginalDecl) some of the tests are failing (https://github.com/google/sandboxed-api/blob/main/sandboxed_api/tools/clang_generator/emitter_test.cc).

The NestedAnonymousStruct case seems to be missing the IncludeTagDefinition PrintingPolicy option set to true.

@mizvekov (Contributor, Author)

Related, I think the change to void TypePrinter::printUsingBefore(const UsingType *T, raw_ostream &OS) might be causing similar issues. The getKeywordName is causing enum to get printed when the type is listed for template instantiation, which is causing errors.

Can you give me an example of that? We shouldn't be printing a keyword unless the type was written / created with a keyword.
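A toy sketch of the rule stated above, assuming a hypothetical node that records the keyword it was written with (None when it had none). This is not the actual TypePrinter logic; it only shows the intended behavior that a keyword is printed solely when the node carries one.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for an elaborated-type keyword; None means the type
// was written (or created) without one.
enum class KeywordModel { None, Struct, Enum, Typename };

struct TagTypeSpelling {
  KeywordModel keyword;
  std::string qualifier; // e.g. "ns::", empty when unqualified
  std::string name;
};

static const char *keywordText(KeywordModel k) {
  switch (k) {
  case KeywordModel::Struct: return "struct ";
  case KeywordModel::Enum: return "enum ";
  case KeywordModel::Typename: return "typename ";
  case KeywordModel::None: break;
  }
  return "";
}

// Print the keyword only when the node carries one, then any qualifier, then the name.
std::string printType(const TagTypeSpelling &t) {
  return keywordText(t.keyword) + t.qualifier + t.name;
}
```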

@bolshakov-a (Contributor) commented Aug 23, 2025

Seems like getOriginalDecl() always returns the full definition for enumerations (and not an opaque declaration for enums with fixed underlying type), at least in the C language mode. Is it guaranteed?

@mizvekov (Contributor, Author)

Seems like getOriginalDecl() always returns the full definition for enumerations (and not an opaque declaration for enums with fixed underlying type), at least in the C language mode. Is it guaranteed?

I don't see that happening. Example: https://compiler-explorer.com/z/z9cMxPWva

@bolshakov-a (Contributor)

Ah, sorry. I tested with IWYU and got confused because it transforms a declaration into the definition later.

Labels
backend:AArch64 backend:AMDGPU backend:ARC backend:ARM backend:CSKY backend:Hexagon backend:Lanai backend:loongarch backend:MIPS backend:PowerPC backend:RISC-V backend:Sparc backend:SystemZ backend:WebAssembly backend:X86 clang:analysis clang:as-a-library libclang and C++ API clang:bytecode Issues for the clang bytecode constexpr interpreter clang:codegen IR generation bugs: mangling, exceptions, etc. clang:dataflow Clang Dataflow Analysis framework - https://clang.llvm.org/docs/DataFlowAnalysisIntro.html clang:frontend Language frontend issues, e.g. anything involving "Sema" clang:modules C++20 modules and Clang Header Modules clang:openmp OpenMP related changes to Clang clang:static analyzer clang Clang issues not falling into any other category clang-tidy clang-tools-extra clangd coroutines C++20 coroutines debuginfo HLSL HLSL Language Support libc++ libc++ C++ Standard Library. Not GNU libstdc++. Not libc++abi. lldb