@jkorous-apple

Add core abstractions for identifying program entities across compilation and link unit boundaries in the Scalable Static Analysis Framework (SSAF).

Introduces three key components:

  • BuildNamespace: Represents build artifacts (compilation units, link units)
  • EntityName: Globally unique entity identifiers across compilation boundaries
  • AST mapping: Functions to map Clang AST declarations to EntityNames

Entity identification uses Unified Symbol Resolution (USR) as the underlying mechanism, with extensions for sub-entities (parameters, return values) via suffixes. The abstraction allows whole-program analysis by providing stable identifiers that persist across separately compiled translation units.
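The naming scheme described above can be sketched without any Clang dependency. This is a simplified model, not the code in the patch: `SketchEntityName`, `nameForDecl`, `nameForParam`, and `nameForReturn` are hypothetical names, the USR string is only an illustrative placeholder, and the build namespace component is left out. It assumes the convention visible in `ASTEntityMapping.cpp`: an empty suffix for the entity itself, a 1-based index for parameters, and `"0"` for the return value.

```cpp
#include <cassert>
#include <string>
#include <tuple>

// Simplified stand-in for ssaf::EntityName: a USR plus a sub-entity suffix.
// The NestedBuildNamespace member of the real class is omitted here.
struct SketchEntityName {
  std::string USR;
  std::string Suffix;

  bool operator==(const SketchEntityName &Other) const {
    return std::tie(USR, Suffix) == std::tie(Other.USR, Other.Suffix);
  }
  bool operator<(const SketchEntityName &Other) const {
    return std::tie(USR, Suffix) < std::tie(Other.USR, Other.Suffix);
  }
};

// The entity itself (function, variable, record, field): empty suffix.
inline SketchEntityName nameForDecl(const std::string &USR) {
  return {USR, ""};
}

// A parameter: the parent function's USR plus a 1-based parameter index.
inline SketchEntityName nameForParam(const std::string &FunctionUSR,
                                     unsigned ZeroBasedIndex) {
  return {FunctionUSR, std::to_string(ZeroBasedIndex + 1)};
}

// The return value: the function's USR plus the reserved suffix "0".
inline SketchEntityName nameForReturn(const std::string &FunctionUSR) {
  return {FunctionUSR, "0"};
}

// Small usage example; the USR below is a placeholder, not generated output.
inline void exampleUsage() {
  const std::string FooUSR = "c:@F@foo#I#";
  assert(nameForDecl(FooUSR).Suffix.empty());
  assert(nameForParam(FooUSR, 0).Suffix == "1");
  assert(nameForReturn(FooUSR).Suffix == "0");
  // Same USR, different suffixes: the three names are distinct entities.
  assert(!(nameForParam(FooUSR, 0) == nameForReturn(FooUSR)));
}
```

Because parameter suffixes start at 1, the suffix `"0"` never collides with a parameter, so a function, its parameters, and its return value can share one USR and still compare as distinct names.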

@llvmbot added the clang and clang:analysis labels on Nov 22, 2025
@llvmbot

llvmbot commented Nov 22, 2025

@llvm/pr-subscribers-clang

Author: Jan Korous (jkorous-apple)

Patch is 30.31 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/169131.diff

13 Files Affected:

  • (added) clang/include/clang/Analysis/Scalable/ASTEntityMapping.h (+46)
  • (added) clang/include/clang/Analysis/Scalable/Model/BuildNamespace.h (+84)
  • (added) clang/include/clang/Analysis/Scalable/Model/EntityName.h (+47)
  • (modified) clang/lib/Analysis/CMakeLists.txt (+1)
  • (added) clang/lib/Analysis/Scalable/ASTEntityMapping.cpp (+85)
  • (added) clang/lib/Analysis/Scalable/CMakeLists.txt (+19)
  • (added) clang/lib/Analysis/Scalable/Model/BuildNamespace.cpp (+72)
  • (added) clang/lib/Analysis/Scalable/Model/EntityName.cpp (+44)
  • (modified) clang/unittests/Analysis/CMakeLists.txt (+1)
  • (added) clang/unittests/Analysis/Scalable/ASTEntityMappingTest.cpp (+343)
  • (added) clang/unittests/Analysis/Scalable/BuildNamespaceTest.cpp (+99)
  • (added) clang/unittests/Analysis/Scalable/CMakeLists.txt (+18)
  • (added) clang/unittests/Analysis/Scalable/EntityNameTest.cpp (+62)
diff --git a/clang/include/clang/Analysis/Scalable/ASTEntityMapping.h b/clang/include/clang/Analysis/Scalable/ASTEntityMapping.h
new file mode 100644
index 0000000000000..a137e8b741821
--- /dev/null
+++ b/clang/include/clang/Analysis/Scalable/ASTEntityMapping.h
@@ -0,0 +1,46 @@
+//===- ASTMapping.h - AST to SSAF Entity mapping ----------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_ANALYSIS_SCALABLE_ASTMAPPING_H
+#define LLVM_CLANG_ANALYSIS_SCALABLE_ASTMAPPING_H
+
+#include "clang/Analysis/Scalable/Model/EntityName.h"
+#include "clang/AST/Decl.h"
+#include "llvm/ADT/StringRef.h"
+#include <optional>
+
+namespace clang {
+namespace ssaf {
+
+/// Maps a declaration to an EntityName.
+///
+/// Supported declaration types for entity mapping:
+/// - Functions and methods
+/// - Global Variables
+/// - Function parameters
+/// - Struct/class/union type definitions
+/// - Struct/class/union fields
+///
+/// Implicit declarations and compiler builtins are not mapped.
+///
+/// \param D The declaration to map. A null pointer yields std::nullopt.
+///
+/// \return An EntityName if the declaration can be mapped, std::nullopt otherwise.
+std::optional<EntityName> getLocalEntityNameForDecl(const Decl* D);
+
+/// Maps a function's return value to an EntityName.
+///
+/// \param FD The function declaration. A null pointer yields std::nullopt.
+///
+/// \return An EntityName for the return value, std::nullopt if it cannot be mapped.
+std::optional<EntityName> getLocalEntityNameForFunctionReturn(const FunctionDecl* FD);
+
+} // namespace ssaf
+} // namespace clang
+
+#endif // LLVM_CLANG_ANALYSIS_SCALABLE_ASTMAPPING_H
diff --git a/clang/include/clang/Analysis/Scalable/Model/BuildNamespace.h b/clang/include/clang/Analysis/Scalable/Model/BuildNamespace.h
new file mode 100644
index 0000000000000..c4bf7146e461f
--- /dev/null
+++ b/clang/include/clang/Analysis/Scalable/Model/BuildNamespace.h
@@ -0,0 +1,84 @@
+//===- BuildNamespace.h -----------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_ANALYSIS_SCALABLE_BUILD_NAMESPACE_H
+#define LLVM_CLANG_ANALYSIS_SCALABLE_BUILD_NAMESPACE_H
+
+#include "llvm/ADT/StringRef.h"
+#include <optional>
+#include <string>
+#include <vector>
+
+namespace clang {
+namespace ssaf {
+
+enum class BuildNamespaceKind : unsigned short {
+  CompilationUnit,
+  LinkUnit
+};
+
+std::string toString(BuildNamespaceKind BNK);
+
+std::optional<BuildNamespaceKind> parseBuildNamespaceKind(llvm::StringRef Str);
+
+/// Represents a single step in the build process.
+class BuildNamespace {
+  BuildNamespaceKind Kind;
+  std::string Name;
+public:
+  BuildNamespace(BuildNamespaceKind Kind, llvm::StringRef Name)
+    : Kind(Kind), Name(Name.str()) {}
+
+  static BuildNamespace makeTU(llvm::StringRef CompilationId);
+
+  bool operator==(const BuildNamespace& Other) const;
+  bool operator!=(const BuildNamespace& Other) const;
+  bool operator<(const BuildNamespace& Other) const;
+
+  friend class SerializationFormat;
+};
+
+/// Represents a sequence of steps in the build process.
+class NestedBuildNamespace {
+  friend class SerializationFormat;
+
+  std::vector<BuildNamespace> Namespaces;
+
+public:
+  NestedBuildNamespace() = default;
+
+  explicit NestedBuildNamespace(const std::vector<BuildNamespace>& Namespaces)
+    : Namespaces(Namespaces) {}
+
+  explicit NestedBuildNamespace(const BuildNamespace& N) {
+    Namespaces.push_back(N);
+  }
+
+  static NestedBuildNamespace makeTU(llvm::StringRef CompilationId);
+
+  NestedBuildNamespace makeQualified(NestedBuildNamespace Namespace) {
+    auto Copy = *this;
+    for (const auto& N : Namespace.Namespaces)
+      Copy.Namespaces.push_back(N);
+    return Copy;
+  }
+
+  bool empty() const;
+
+  bool operator==(const NestedBuildNamespace& Other) const;
+  bool operator!=(const NestedBuildNamespace& Other) const;
+  bool operator<(const NestedBuildNamespace& Other) const;
+
+  friend class JSONWriter;
+  friend class LinkUnitResolution;
+};
+
+} // namespace ssaf
+} // namespace clang
+
+#endif // LLVM_CLANG_ANALYSIS_SCALABLE_BUILD_NAMESPACE_H
diff --git a/clang/include/clang/Analysis/Scalable/Model/EntityName.h b/clang/include/clang/Analysis/Scalable/Model/EntityName.h
new file mode 100644
index 0000000000000..7f11ef0589bf5
--- /dev/null
+++ b/clang/include/clang/Analysis/Scalable/Model/EntityName.h
@@ -0,0 +1,47 @@
+//===- EntityName.h ---------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_ANALYSIS_SCALABLE_ENTITY_NAME_H
+#define LLVM_CLANG_ANALYSIS_SCALABLE_ENTITY_NAME_H
+
+#include "clang/Analysis/Scalable/Model/BuildNamespace.h"
+#include "llvm/ADT/SmallString.h"
+#include "llvm/ADT/StringRef.h"
+#include <string>
+
+namespace clang {
+namespace ssaf {
+
+/// Uniquely identifies an entity in a program.
+///
+/// EntityName provides a globally unique identifier for program entities that remains
+/// stable across compilation boundaries. This enables whole-program analysis to track
+/// and relate entities across separately compiled translation units.
+class EntityName {
+  std::string USR;
+  llvm::SmallString<16> Suffix;
+  NestedBuildNamespace Namespace;
+
+public:
+  EntityName(llvm::StringRef USR, llvm::StringRef Suffix,
+             NestedBuildNamespace Namespace);
+
+  bool operator==(const EntityName& Other) const;
+  bool operator!=(const EntityName& Other) const;
+  bool operator<(const EntityName& Other) const;
+
+  EntityName makeQualified(NestedBuildNamespace Namespace);
+
+  friend class LinkUnitResolution;
+  friend class SerializationFormat;
+};
+
+} // namespace ssaf
+} // namespace clang
+
+#endif // LLVM_CLANG_ANALYSIS_SCALABLE_ENTITY_NAME_H
diff --git a/clang/lib/Analysis/CMakeLists.txt b/clang/lib/Analysis/CMakeLists.txt
index 1dbd4153d856f..99a2ec684e149 100644
--- a/clang/lib/Analysis/CMakeLists.txt
+++ b/clang/lib/Analysis/CMakeLists.txt
@@ -50,3 +50,4 @@ add_clang_library(clangAnalysis
 add_subdirectory(plugins)
 add_subdirectory(FlowSensitive)
 add_subdirectory(LifetimeSafety)
+add_subdirectory(Scalable)
diff --git a/clang/lib/Analysis/Scalable/ASTEntityMapping.cpp b/clang/lib/Analysis/Scalable/ASTEntityMapping.cpp
new file mode 100644
index 0000000000000..87d05e8aa5dc3
--- /dev/null
+++ b/clang/lib/Analysis/Scalable/ASTEntityMapping.cpp
@@ -0,0 +1,85 @@
+//===- ASTMapping.cpp - AST to SSAF Entity mapping --------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements utilities for mapping AST declarations to SSAF entities.
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Analysis/Scalable/ASTEntityMapping.h"
+#include "clang/AST/Decl.h"
+#include "clang/Analysis/Scalable/Model/BuildNamespace.h"
+#include "clang/Index/USRGeneration.h"
+#include "llvm/ADT/SmallString.h"
+
+namespace clang {
+namespace ssaf {
+
+std::optional<EntityName> getLocalEntityNameForDecl(const Decl* D) {
+  if (!D)
+    return std::nullopt;
+
+  if (D->isImplicit())
+    return std::nullopt;
+
+  if (isa<FunctionDecl>(D) && cast<FunctionDecl>(D)->getBuiltinID())
+    return std::nullopt;
+
+  if (!isa<FunctionDecl>(D) && !isa<ParmVarDecl>(D) && !isa<VarDecl>(D) &&
+      !isa<FieldDecl>(D) && !isa<RecordDecl>(D))
+    return std::nullopt;
+
+  llvm::SmallString<16> Suffix;
+  const Decl *USRDecl = D;
+
+  // For parameters, use the parent function's USR with parameter index as suffix
+  if (const auto * PVD = dyn_cast<ParmVarDecl>(D)) {
+    const auto *FD = dyn_cast_or_null<FunctionDecl>(PVD->getParentFunctionOrMethod());
+    if (!FD)
+      return std::nullopt;
+    USRDecl = FD;
+
+    const auto ParamIdx = PVD->getFunctionScopeIndex();
+    llvm::raw_svector_ostream OS(Suffix);
+    // Parameter uses function's USR with 1-based index as suffix
+    OS << (ParamIdx + 1);
+  }
+
+  llvm::SmallString<128> USRBuf;
+  if (clang::index::generateUSRForDecl(USRDecl, USRBuf)) {
+    return std::nullopt;
+  }
+
+  if (USRBuf.empty())
+    return std::nullopt;
+
+  return EntityName(USRBuf.str(), Suffix, {});
+}
+
+std::optional<EntityName> getLocalEntityNameForFunctionReturn(const FunctionDecl* FD) {
+  if (!FD)
+    return std::nullopt;
+
+  if (FD->isImplicit())
+    return std::nullopt;
+
+  if (FD->getBuiltinID())
+    return std::nullopt;
+
+  llvm::SmallString<128> USRBuf;
+  if (clang::index::generateUSRForDecl(FD, USRBuf)) {
+    return std::nullopt;
+  }
+
+  if (USRBuf.empty())
+    return std::nullopt;
+
+  return EntityName(USRBuf.str(), "0", {});
+}
+
+} // namespace ssaf
+} // namespace clang
diff --git a/clang/lib/Analysis/Scalable/CMakeLists.txt b/clang/lib/Analysis/Scalable/CMakeLists.txt
new file mode 100644
index 0000000000000..ea4693f102cb2
--- /dev/null
+++ b/clang/lib/Analysis/Scalable/CMakeLists.txt
@@ -0,0 +1,19 @@
+set(LLVM_LINK_COMPONENTS
+  Support
+  )
+
+add_clang_library(clangAnalysisScalable
+  ASTEntityMapping.cpp
+  Model/BuildNamespace.cpp
+  Model/EntityName.cpp
+
+  LINK_LIBS
+  clangAST
+  clangASTMatchers
+  clangBasic
+  clangIndex
+  clangLex
+  clangFrontend
+
+  DEPENDS
+  )
diff --git a/clang/lib/Analysis/Scalable/Model/BuildNamespace.cpp b/clang/lib/Analysis/Scalable/Model/BuildNamespace.cpp
new file mode 100644
index 0000000000000..5284a9a87a33a
--- /dev/null
+++ b/clang/lib/Analysis/Scalable/Model/BuildNamespace.cpp
@@ -0,0 +1,72 @@
+//===- BuildNamespace.cpp ---------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Analysis/Scalable/Model/BuildNamespace.h"
+#include "llvm/Support/ErrorHandling.h"
+
+namespace clang {
+namespace ssaf {
+
+std::string toString(BuildNamespaceKind BNK) {
+  switch(BNK) {
+    case BuildNamespaceKind::CompilationUnit: return "compilation_unit";
+    case BuildNamespaceKind::LinkUnit: return "link_unit";
+  }
+  llvm_unreachable("Unknown BuildNamespaceKind");
+}
+
+std::optional<BuildNamespaceKind> parseBuildNamespaceKind(llvm::StringRef Str) {
+  if (Str == "compilation_unit")
+    return BuildNamespaceKind::CompilationUnit;
+  if (Str == "link_unit")
+    return BuildNamespaceKind::LinkUnit;
+  return std::nullopt;
+}
+
+BuildNamespace BuildNamespace::makeTU(llvm::StringRef CompilationId) {
+  return BuildNamespace{BuildNamespaceKind::CompilationUnit, CompilationId.str()};
+}
+
+bool BuildNamespace::operator==(const BuildNamespace& Other) const {
+  return Kind == Other.Kind && Name == Other.Name;
+}
+
+bool BuildNamespace::operator!=(const BuildNamespace& Other) const {
+  return !(*this == Other);
+}
+
+bool BuildNamespace::operator<(const BuildNamespace& Other) const {
+  if (Kind != Other.Kind)
+    return Kind < Other.Kind;
+  return Name < Other.Name;
+}
+
+NestedBuildNamespace NestedBuildNamespace::makeTU(llvm::StringRef CompilationId) {
+  NestedBuildNamespace Result;
+  Result.Namespaces.push_back(BuildNamespace::makeTU(CompilationId));
+  return Result;
+}
+
+bool NestedBuildNamespace::empty() const {
+  return Namespaces.empty();
+}
+
+bool NestedBuildNamespace::operator==(const NestedBuildNamespace& Other) const {
+  return Namespaces == Other.Namespaces;
+}
+
+bool NestedBuildNamespace::operator!=(const NestedBuildNamespace& Other) const {
+  return !(*this == Other);
+}
+
+bool NestedBuildNamespace::operator<(const NestedBuildNamespace& Other) const {
+  return Namespaces < Other.Namespaces;
+}
+
+} // namespace ssaf
+} // namespace clang
diff --git a/clang/lib/Analysis/Scalable/Model/EntityName.cpp b/clang/lib/Analysis/Scalable/Model/EntityName.cpp
new file mode 100644
index 0000000000000..3404ecc58fac2
--- /dev/null
+++ b/clang/lib/Analysis/Scalable/Model/EntityName.cpp
@@ -0,0 +1,44 @@
+//===- EntityName.cpp -------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Analysis/Scalable/Model/EntityName.h"
+
+namespace clang {
+namespace ssaf {
+
+EntityName::EntityName(llvm::StringRef USR, llvm::StringRef Suffix,
+                       NestedBuildNamespace Namespace)
+  : USR(USR.str()), Suffix(Suffix), Namespace(std::move(Namespace)) {}
+
+bool EntityName::operator==(const EntityName& Other) const {
+  return USR == Other.USR &&
+         Suffix == Other.Suffix &&
+         Namespace == Other.Namespace;
+}
+
+bool EntityName::operator!=(const EntityName& Other) const {
+  return !(*this == Other);
+}
+
+bool EntityName::operator<(const EntityName& Other) const {
+  if (USR != Other.USR)
+    return USR < Other.USR;
+  if (Suffix != Other.Suffix)
+    return Suffix.str() < Other.Suffix.str();
+  return Namespace < Other.Namespace;
+}
+
+EntityName EntityName::makeQualified(NestedBuildNamespace Namespace) {
+  auto Copy = *this;
+  Copy.Namespace = Copy.Namespace.makeQualified(Namespace);
+
+  return Copy;
+}
+
+} // namespace ssaf
+} // namespace clang
diff --git a/clang/unittests/Analysis/CMakeLists.txt b/clang/unittests/Analysis/CMakeLists.txt
index e0acf436b37c7..97e768b11db69 100644
--- a/clang/unittests/Analysis/CMakeLists.txt
+++ b/clang/unittests/Analysis/CMakeLists.txt
@@ -26,3 +26,4 @@ add_clang_unittest(ClangAnalysisTests
   )
 
 add_subdirectory(FlowSensitive)
+add_subdirectory(Scalable)
diff --git a/clang/unittests/Analysis/Scalable/ASTEntityMappingTest.cpp b/clang/unittests/Analysis/Scalable/ASTEntityMappingTest.cpp
new file mode 100644
index 0000000000000..8de0df246cb65
--- /dev/null
+++ b/clang/unittests/Analysis/Scalable/ASTEntityMappingTest.cpp
@@ -0,0 +1,343 @@
+//===- unittests/Analysis/Scalable/ASTEntityMappingTest.cpp --------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Analysis/Scalable/ASTEntityMapping.h"
+#include "clang/AST/ASTContext.h"
+#include "clang/AST/Decl.h"
+#include "clang/ASTMatchers/ASTMatchFinder.h"
+#include "clang/ASTMatchers/ASTMatchers.h"
+#include "clang/Tooling/Tooling.h"
+#include "gtest/gtest.h"
+
+using namespace clang::ast_matchers;
+
+namespace clang {
+namespace ssaf {
+namespace {
+
+// Helper function to find a declaration by name
+template <typename DeclType>
+const DeclType *findDecl(ASTContext &Ctx, StringRef Name) {
+  auto Matcher = namedDecl(hasName(Name)).bind("decl");
+  auto Matches = match(Matcher, Ctx);
+  if (Matches.empty())
+    return nullptr;
+  return Matches[0].getNodeAs<DeclType>("decl");
+}
+
+TEST(ASTEntityMappingTest, FunctionDecl) {
+  auto AST = tooling::buildASTFromCode("void foo() {}");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *FD = findDecl<FunctionDecl>(Ctx, "foo");
+  ASSERT_NE(FD, nullptr);
+
+  auto EntityName = getLocalEntityNameForDecl(FD);
+  EXPECT_TRUE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, VarDecl) {
+  auto AST = tooling::buildASTFromCode("int x = 42;");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *VD = findDecl<VarDecl>(Ctx, "x");
+  ASSERT_NE(VD, nullptr);
+
+  auto EntityName = getLocalEntityNameForDecl(VD);
+  EXPECT_TRUE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, ParmVarDecl) {
+  auto AST = tooling::buildASTFromCode("void foo(int x) {}");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *FD = findDecl<FunctionDecl>(Ctx, "foo");
+  ASSERT_NE(FD, nullptr);
+  ASSERT_GT(FD->param_size(), 0u);
+
+  const auto *PVD = FD->getParamDecl(0);
+  ASSERT_NE(PVD, nullptr);
+
+  auto EntityName = getLocalEntityNameForDecl(PVD);
+  EXPECT_TRUE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, RecordDecl) {
+  auto AST = tooling::buildASTFromCode("struct S {};");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *RD = findDecl<RecordDecl>(Ctx, "S");
+  ASSERT_NE(RD, nullptr);
+
+  auto EntityName = getLocalEntityNameForDecl(RD);
+  EXPECT_TRUE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, FieldDecl) {
+  auto AST = tooling::buildASTFromCode("struct S { int field; };");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *FD = findDecl<FieldDecl>(Ctx, "field");
+  ASSERT_NE(FD, nullptr);
+
+  auto EntityName = getLocalEntityNameForDecl(FD);
+  EXPECT_TRUE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, NullDecl) {
+  auto EntityName = getLocalEntityNameForDecl(nullptr);
+  EXPECT_FALSE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, ImplicitDecl) {
+  auto AST = tooling::buildASTFromCode(R"(
+    struct S {
+      S() = default;
+    };
+  )", "test.cpp", std::make_shared<PCHContainerOperations>());
+  auto &Ctx = AST->getASTContext();
+
+  const auto *RD = findDecl<CXXRecordDecl>(Ctx, "S");
+  ASSERT_NE(RD, nullptr);
+
+  // Find the implicitly-declared copy constructor
+  for (const auto *Ctor : RD->ctors()) {
+    if (Ctor->isCopyConstructor() && Ctor->isImplicit()) {
+      auto EntityName = getLocalEntityNameForDecl(Ctor);
+      EXPECT_FALSE(EntityName.has_value());
+      return;
+    }
+  }
+}
+
+TEST(ASTEntityMappingTest, BuiltinFunction) {
+  auto AST = tooling::buildASTFromCode(R"(
+    void test() {
+      __builtin_memcpy(0, 0, 0);
+    }
+  )");
+  auto &Ctx = AST->getASTContext();
+
+  // Find the builtin call
+  auto Matcher = callExpr().bind("call");
+  auto Matches = match(Matcher, Ctx);
+  ASSERT_FALSE(Matches.empty());
+
+  const auto *CE = Matches[0].getNodeAs<CallExpr>("call");
+  ASSERT_NE(CE, nullptr);
+
+  const auto *Callee = CE->getDirectCallee();
+  if (Callee && Callee->getBuiltinID()) {
+    auto EntityName = getLocalEntityNameForDecl(Callee);
+    EXPECT_FALSE(EntityName.has_value());
+  }
+}
+
+TEST(ASTEntityMappingTest, UnsupportedDecl) {
+  auto AST = tooling::buildASTFromCode("namespace N {}");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *ND = findDecl<NamespaceDecl>(Ctx, "N");
+  ASSERT_NE(ND, nullptr);
+
+  auto EntityName = getLocalEntityNameForDecl(ND);
+  EXPECT_FALSE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, FunctionReturn) {
+  auto AST = tooling::buildASTFromCode("int foo() { return 42; }");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *FD = findDecl<FunctionDecl>(Ctx, "foo");
+  ASSERT_NE(FD, nullptr);
+
+  auto EntityName = getLocalEntityNameForFunctionReturn(FD);
+  EXPECT_TRUE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, FunctionReturnNull) {
+  auto EntityName = getLocalEntityNameForFunctionReturn(nullptr);
+  EXPECT_FALSE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, FunctionReturnBuiltin) {
+  auto AST = tooling::buildASTFromCode(R"(
+    void test() {
+      __builtin_memcpy(0, 0, 0);
+    }
+  )");
+  auto &Ctx = AST->getASTContext();
+
+  // Find the builtin call
+  auto Matcher = callExpr().bind("call");
+  auto Matches = match(Matcher, Ctx);
+  ASSERT_FALSE(Matches.empty());
+
+  const auto *CE = Matches[0].getNodeAs<CallExpr>("call");
+  ASSERT_NE(CE, nullptr);
+
+  const auto *Callee = CE->getDi...
[truncated]
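
For readers of the truncated patch, the qualification and ordering behavior of `NestedBuildNamespace` in `BuildNamespace.h` can be modeled roughly as follows. This is a sketch under assumptions, not the patch itself: the `Sketch*` types are hypothetical, members are public for brevity, and `makeQualified` is marked `const` here although the patch declares it non-const. The names `a.cpp` and `libfoo` are made-up examples.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

enum class SketchKind : unsigned short { CompilationUnit, LinkUnit };

// One build step, compared by kind first and then by name.
struct SketchNamespace {
  SketchKind Kind;
  std::string Name;

  bool operator==(const SketchNamespace &Other) const {
    return Kind == Other.Kind && Name == Other.Name;
  }
  bool operator<(const SketchNamespace &Other) const {
    return std::tie(Kind, Name) < std::tie(Other.Kind, Other.Name);
  }
};

// A sequence of build steps.
struct SketchNestedNamespace {
  std::vector<SketchNamespace> Steps;

  // Returns a copy with the steps of Outer appended after the existing ones;
  // the receiver is left unchanged.
  SketchNestedNamespace makeQualified(const SketchNestedNamespace &Outer) const {
    SketchNestedNamespace Copy = *this;
    Copy.Steps.insert(Copy.Steps.end(), Outer.Steps.begin(), Outer.Steps.end());
    return Copy;
  }

  bool operator==(const SketchNestedNamespace &Other) const {
    return Steps == Other.Steps;
  }
  // Lexicographic comparison over the step sequence.
  bool operator<(const SketchNestedNamespace &Other) const {
    return Steps < Other.Steps;
  }
};

// Small usage example with made-up names.
inline void exampleUsage() {
  SketchNestedNamespace TU;
  TU.Steps.push_back({SketchKind::CompilationUnit, "a.cpp"});
  SketchNestedNamespace LU;
  LU.Steps.push_back({SketchKind::LinkUnit, "libfoo"});

  const SketchNestedNamespace Qualified = TU.makeQualified(LU);
  assert(Qualified.Steps.size() == 2);
  assert(Qualified.Steps.front().Name == "a.cpp");
  assert(Qualified.Steps.back().Name == "libfoo");
  assert(TU.Steps.size() == 1); // the original is not modified
}
```

In this model, qualifying a translation-unit namespace with a link-unit namespace yields a two-step sequence with the compilation unit first. Since comparison is lexicographic, an unqualified namespace orders before any qualified namespace that extends it.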

@llvmbot

llvmbot commented Nov 22, 2025

@llvm/pr-subscribers-clang-analysis

Author: Jan Korous (jkorous-apple)

Changes

Add core abstractions for identifying program entities across compilation and link unit boundaries in the Scalable Static Analysis Framework (SSAF).

Introduces three key components:

  • BuildNamespace: Represents build artifacts (compilation units, link units)
  • EntityName: Globally unique entity identifiers across compilation boundaries
  • AST mapping: Functions to map Clang AST declarations to EntityNames

Entity identification uses Unified Symbol Resolution (USR) as the underlying mechanism, with extensions for sub-entities (parameters, return values) via suffixes. The abstraction allows whole-program analysis by providing stable identifiers that persist across separately compiled translation units.


Patch is 30.31 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/169131.diff

13 Files Affected:

  • (added) clang/include/clang/Analysis/Scalable/ASTEntityMapping.h (+46)
  • (added) clang/include/clang/Analysis/Scalable/Model/BuildNamespace.h (+84)
  • (added) clang/include/clang/Analysis/Scalable/Model/EntityName.h (+47)
  • (modified) clang/lib/Analysis/CMakeLists.txt (+1)
  • (added) clang/lib/Analysis/Scalable/ASTEntityMapping.cpp (+85)
  • (added) clang/lib/Analysis/Scalable/CMakeLists.txt (+19)
  • (added) clang/lib/Analysis/Scalable/Model/BuildNamespace.cpp (+72)
  • (added) clang/lib/Analysis/Scalable/Model/EntityName.cpp (+44)
  • (modified) clang/unittests/Analysis/CMakeLists.txt (+1)
  • (added) clang/unittests/Analysis/Scalable/ASTEntityMappingTest.cpp (+343)
  • (added) clang/unittests/Analysis/Scalable/BuildNamespaceTest.cpp (+99)
  • (added) clang/unittests/Analysis/Scalable/CMakeLists.txt (+18)
  • (added) clang/unittests/Analysis/Scalable/EntityNameTest.cpp (+62)
diff --git a/clang/include/clang/Analysis/Scalable/ASTEntityMapping.h b/clang/include/clang/Analysis/Scalable/ASTEntityMapping.h
new file mode 100644
index 0000000000000..a137e8b741821
--- /dev/null
+++ b/clang/include/clang/Analysis/Scalable/ASTEntityMapping.h
@@ -0,0 +1,46 @@
+//===- ASTMapping.h - AST to SSAF Entity mapping ----------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_ANALYSIS_SCALABLE_ASTMAPPING_H
+#define LLVM_CLANG_ANALYSIS_SCALABLE_ASTMAPPING_H
+
+#include "clang/Analysis/Scalable/Model/EntityName.h"
+#include "clang/AST/Decl.h"
+#include "llvm/ADT/StringRef.h"
+#include <optional>
+
+namespace clang {
+namespace ssaf {
+
+/// Maps a declaration to an EntityName.
+///
+/// Supported declaration types for entity mapping:
+/// - Functions and methods
+/// - Global Variables
+/// - Function parameters
+/// - Struct/class/union type definitions
+/// - Struct/class/union fields
+///
+/// Implicit declarations and compiler builtins are not mapped.
+///
+/// \param D The declaration to map. Must not be null.
+///
+/// \return An EntityName if the declaration can be mapped, std::nullopt otherwise.
+std::optional<EntityName> getLocalEntityNameForDecl(const Decl* D);
+
+/// Maps a function return type to an EntityName.
+///
+/// \param FD The function declaration. Must not be null.
+///
+/// \return An EntityName for the function's return type.
+std::optional<EntityName> getLocalEntityNameForFunctionReturn(const FunctionDecl* FD);
+
+} // namespace ssaf
+} // namespace clang
+
+#endif // LLVM_CLANG_ANALYSIS_SCALABLE_ASTMAPPING_H
diff --git a/clang/include/clang/Analysis/Scalable/Model/BuildNamespace.h b/clang/include/clang/Analysis/Scalable/Model/BuildNamespace.h
new file mode 100644
index 0000000000000..c4bf7146e461f
--- /dev/null
+++ b/clang/include/clang/Analysis/Scalable/Model/BuildNamespace.h
@@ -0,0 +1,84 @@
+//===- BuildNamespace.h -----------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_ANALYSIS_SCALABLE_BUILD_NAMESPACE_H
+#define LLVM_CLANG_ANALYSIS_SCALABLE_BUILD_NAMESPACE_H
+
+#include "llvm/ADT/StringRef.h"
+#include <optional>
+#include <string>
+#include <vector>
+
+namespace clang {
+namespace ssaf {
+
+enum class BuildNamespaceKind : unsigned short {
+  CompilationUnit,
+  LinkUnit
+};
+
+std::string toString(BuildNamespaceKind BNK);
+
+std::optional<BuildNamespaceKind> parseBuildNamespaceKind(llvm::StringRef Str);
+
+/// Represents a single step in the build process.
+class BuildNamespace {
+  BuildNamespaceKind Kind;
+  std::string Name;
+public:
+  BuildNamespace(BuildNamespaceKind Kind, llvm::StringRef Name)
+    : Kind(Kind), Name(Name.str()) {}
+
+  static BuildNamespace makeTU(llvm::StringRef CompilationId);
+
+  bool operator==(const BuildNamespace& Other) const;
+  bool operator!=(const BuildNamespace& Other) const;
+  bool operator<(const BuildNamespace& Other) const;
+
+  friend class SerializationFormat;
+};
+
+/// Represents a sequence of steps in the build process.
+class NestedBuildNamespace {
+  friend class SerializationFormat;
+
+  std::vector<BuildNamespace> Namespaces;
+
+public:
+  NestedBuildNamespace() = default;
+
+  explicit NestedBuildNamespace(const std::vector<BuildNamespace>& Namespaces)
+    : Namespaces(Namespaces) {}
+
+  explicit NestedBuildNamespace(const BuildNamespace& N) {
+    Namespaces.push_back(N);
+  }
+
+  static NestedBuildNamespace makeTU(llvm::StringRef CompilationId);
+
+  NestedBuildNamespace makeQualified(NestedBuildNamespace Namespace) {
+    auto Copy = *this;
+    for (const auto& N : Namespace.Namespaces)
+      Copy.Namespaces.push_back(N);
+    return Copy;
+  }
+
+  bool empty() const;
+
+  bool operator==(const NestedBuildNamespace& Other) const;
+  bool operator!=(const NestedBuildNamespace& Other) const;
+  bool operator<(const NestedBuildNamespace& Other) const;
+
+  friend class JSONWriter;
+  friend class LinkUnitResolution;
+};
+
+} // namespace ssaf
+} // namespace clang
+
+#endif // LLVM_CLANG_ANALYSIS_SCALABLE_BUILD_NAMESPACE_H
diff --git a/clang/include/clang/Analysis/Scalable/Model/EntityName.h b/clang/include/clang/Analysis/Scalable/Model/EntityName.h
new file mode 100644
index 0000000000000..7f11ef0589bf5
--- /dev/null
+++ b/clang/include/clang/Analysis/Scalable/Model/EntityName.h
@@ -0,0 +1,47 @@
+//===- EntityName.h ---------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_CLANG_ANALYSIS_SCALABLE_ENTITY_NAME_H
+#define LLVM_CLANG_ANALYSIS_SCALABLE_ENTITY_NAME_H
+
+#include "clang/Analysis/Scalable/Model/BuildNamespace.h"
+#include "llvm/ADT/SmallString.h"
+#include "llvm/ADT/StringRef.h"
+#include <string>
+
+namespace clang {
+namespace ssaf {
+
+/// Uniquely identifies an entity in a program.
+///
+/// EntityName provides a globally unique identifier for program entities that remains
+/// stable across compilation boundaries. This enables whole-program analysis to track
+/// and relate entities across separately compiled translation units.
+class EntityName {
+  std::string USR;
+  llvm::SmallString<16> Suffix;
+  NestedBuildNamespace Namespace;
+
+public:
+  EntityName(llvm::StringRef USR, llvm::StringRef Suffix,
+             NestedBuildNamespace Namespace);
+
+  bool operator==(const EntityName& Other) const;
+  bool operator!=(const EntityName& Other) const;
+  bool operator<(const EntityName& Other) const;
+
+  EntityName makeQualified(NestedBuildNamespace Namespace);
+
+  friend class LinkUnitResolution;
+  friend class SerializationFormat;
+};
+
+} // namespace ssaf
+} // namespace clang
+
+#endif // LLVM_CLANG_ANALYSIS_SCALABLE_ENTITY_NAME_H
diff --git a/clang/lib/Analysis/CMakeLists.txt b/clang/lib/Analysis/CMakeLists.txt
index 1dbd4153d856f..99a2ec684e149 100644
--- a/clang/lib/Analysis/CMakeLists.txt
+++ b/clang/lib/Analysis/CMakeLists.txt
@@ -50,3 +50,4 @@ add_clang_library(clangAnalysis
 add_subdirectory(plugins)
 add_subdirectory(FlowSensitive)
 add_subdirectory(LifetimeSafety)
+add_subdirectory(Scalable)
diff --git a/clang/lib/Analysis/Scalable/ASTEntityMapping.cpp b/clang/lib/Analysis/Scalable/ASTEntityMapping.cpp
new file mode 100644
index 0000000000000..87d05e8aa5dc3
--- /dev/null
+++ b/clang/lib/Analysis/Scalable/ASTEntityMapping.cpp
@@ -0,0 +1,85 @@
+//===- ASTEntityMapping.cpp - AST to SSAF Entity mapping --------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+//
+// This file implements utilities for mapping AST declarations to SSAF entities.
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Analysis/Scalable/ASTEntityMapping.h"
+#include "clang/AST/Decl.h"
+#include "clang/Analysis/Scalable/Model/BuildNamespace.h"
+#include "clang/Index/USRGeneration.h"
+#include "llvm/ADT/SmallString.h"
+
+namespace clang {
+namespace ssaf {
+
+std::optional<EntityName> getLocalEntityNameForDecl(const Decl* D) {
+  if (!D)
+    return std::nullopt;
+
+  if (D->isImplicit())
+    return std::nullopt;
+
+  if (isa<FunctionDecl>(D) && cast<FunctionDecl>(D)->getBuiltinID())
+    return std::nullopt;
+
+  if (!isa<FunctionDecl>(D) && !isa<ParmVarDecl>(D) && !isa<VarDecl>(D) &&
+      !isa<FieldDecl>(D) && !isa<RecordDecl>(D))
+    return std::nullopt;
+
+  llvm::SmallString<16> Suffix;
+  const Decl *USRDecl = D;
+
+  // For parameters, use the parent function's USR with parameter index as suffix
+  if (const auto * PVD = dyn_cast<ParmVarDecl>(D)) {
+    const auto *FD = dyn_cast_or_null<FunctionDecl>(PVD->getParentFunctionOrMethod());
+    if (!FD)
+      return std::nullopt;
+    USRDecl = FD;
+
+    const auto ParamIdx = PVD->getFunctionScopeIndex();
+    llvm::raw_svector_ostream OS(Suffix);
+    // Parameter uses function's USR with 1-based index as suffix
+    OS << (ParamIdx + 1);
+  }
+
+  llvm::SmallString<128> USRBuf;
+  if (clang::index::generateUSRForDecl(USRDecl, USRBuf)) {
+    return std::nullopt;
+  }
+
+  if (USRBuf.empty())
+    return std::nullopt;
+
+  return EntityName(USRBuf.str(), Suffix, {});
+}
+
+std::optional<EntityName> getLocalEntityNameForFunctionReturn(const FunctionDecl* FD) {
+  if (!FD)
+    return std::nullopt;
+
+  if (FD->isImplicit())
+    return std::nullopt;
+
+  if (FD->getBuiltinID())
+    return std::nullopt;
+
+  llvm::SmallString<128> USRBuf;
+  if (clang::index::generateUSRForDecl(FD, USRBuf)) {
+    return std::nullopt;
+  }
+
+  if (USRBuf.empty())
+    return std::nullopt;
+
+  return EntityName(USRBuf.str(), "0", {});
+}
+
+} // namespace ssaf
+} // namespace clang
diff --git a/clang/lib/Analysis/Scalable/CMakeLists.txt b/clang/lib/Analysis/Scalable/CMakeLists.txt
new file mode 100644
index 0000000000000..ea4693f102cb2
--- /dev/null
+++ b/clang/lib/Analysis/Scalable/CMakeLists.txt
@@ -0,0 +1,19 @@
+set(LLVM_LINK_COMPONENTS
+  Support
+  )
+
+add_clang_library(clangAnalysisScalable
+  ASTEntityMapping.cpp
+  Model/BuildNamespace.cpp
+  Model/EntityName.cpp
+
+  LINK_LIBS
+  clangAST
+  clangASTMatchers
+  clangBasic
+  clangIndex
+  clangLex
+  clangFrontend
+
+  DEPENDS
+  )
diff --git a/clang/lib/Analysis/Scalable/Model/BuildNamespace.cpp b/clang/lib/Analysis/Scalable/Model/BuildNamespace.cpp
new file mode 100644
index 0000000000000..5284a9a87a33a
--- /dev/null
+++ b/clang/lib/Analysis/Scalable/Model/BuildNamespace.cpp
@@ -0,0 +1,72 @@
+//===- BuildNamespace.cpp ---------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Analysis/Scalable/Model/BuildNamespace.h"
+#include "llvm/Support/ErrorHandling.h"
+
+namespace clang {
+namespace ssaf {
+
+std::string toString(BuildNamespaceKind BNK) {
+  switch(BNK) {
+    case BuildNamespaceKind::CompilationUnit: return "compilation_unit";
+    case BuildNamespaceKind::LinkUnit: return "link_unit";
+  }
+  llvm_unreachable("Unknown BuildNamespaceKind");
+}
+
+std::optional<BuildNamespaceKind> parseBuildNamespaceKind(llvm::StringRef Str) {
+  if (Str == "compilation_unit")
+    return BuildNamespaceKind::CompilationUnit;
+  if (Str == "link_unit")
+    return BuildNamespaceKind::LinkUnit;
+  return std::nullopt;
+}
+
+BuildNamespace BuildNamespace::makeTU(llvm::StringRef CompilationId) {
+  return BuildNamespace{BuildNamespaceKind::CompilationUnit, CompilationId.str()};
+}
+
+bool BuildNamespace::operator==(const BuildNamespace& Other) const {
+  return Kind == Other.Kind && Name == Other.Name;
+}
+
+bool BuildNamespace::operator!=(const BuildNamespace& Other) const {
+  return !(*this == Other);
+}
+
+bool BuildNamespace::operator<(const BuildNamespace& Other) const {
+  if (Kind != Other.Kind)
+    return Kind < Other.Kind;
+  return Name < Other.Name;
+}
+
+NestedBuildNamespace NestedBuildNamespace::makeTU(llvm::StringRef CompilationId) {
+  NestedBuildNamespace Result;
+  Result.Namespaces.push_back(BuildNamespace::makeTU(CompilationId));
+  return Result;
+}
+
+bool NestedBuildNamespace::empty() const {
+  return Namespaces.empty();
+}
+
+bool NestedBuildNamespace::operator==(const NestedBuildNamespace& Other) const {
+  return Namespaces == Other.Namespaces;
+}
+
+bool NestedBuildNamespace::operator!=(const NestedBuildNamespace& Other) const {
+  return !(*this == Other);
+}
+
+bool NestedBuildNamespace::operator<(const NestedBuildNamespace& Other) const {
+  return Namespaces < Other.Namespaces;
+}
+
+} // namespace ssaf
+} // namespace clang
diff --git a/clang/lib/Analysis/Scalable/Model/EntityName.cpp b/clang/lib/Analysis/Scalable/Model/EntityName.cpp
new file mode 100644
index 0000000000000..3404ecc58fac2
--- /dev/null
+++ b/clang/lib/Analysis/Scalable/Model/EntityName.cpp
@@ -0,0 +1,44 @@
+//===- EntityName.cpp -------------------------------------------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Analysis/Scalable/Model/EntityName.h"
+
+namespace clang {
+namespace ssaf {
+
+EntityName::EntityName(llvm::StringRef USR, llvm::StringRef Suffix,
+                       NestedBuildNamespace Namespace)
+  : USR(USR.str()), Suffix(Suffix), Namespace(std::move(Namespace)) {}
+
+bool EntityName::operator==(const EntityName& Other) const {
+  return USR == Other.USR &&
+         Suffix == Other.Suffix &&
+         Namespace == Other.Namespace;
+}
+
+bool EntityName::operator!=(const EntityName& Other) const {
+  return !(*this == Other);
+}
+
+bool EntityName::operator<(const EntityName& Other) const {
+  if (USR != Other.USR)
+    return USR < Other.USR;
+  if (Suffix != Other.Suffix)
+    return Suffix.str() < Other.Suffix.str();
+  return Namespace < Other.Namespace;
+}
+
+EntityName EntityName::makeQualified(NestedBuildNamespace Namespace) {
+  auto Copy = *this;
+  Copy.Namespace = Copy.Namespace.makeQualified(Namespace);
+
+  return Copy;
+}
+
+} // namespace ssaf
+} // namespace clang
diff --git a/clang/unittests/Analysis/CMakeLists.txt b/clang/unittests/Analysis/CMakeLists.txt
index e0acf436b37c7..97e768b11db69 100644
--- a/clang/unittests/Analysis/CMakeLists.txt
+++ b/clang/unittests/Analysis/CMakeLists.txt
@@ -26,3 +26,4 @@ add_clang_unittest(ClangAnalysisTests
   )
 
 add_subdirectory(FlowSensitive)
+add_subdirectory(Scalable)
diff --git a/clang/unittests/Analysis/Scalable/ASTEntityMappingTest.cpp b/clang/unittests/Analysis/Scalable/ASTEntityMappingTest.cpp
new file mode 100644
index 0000000000000..8de0df246cb65
--- /dev/null
+++ b/clang/unittests/Analysis/Scalable/ASTEntityMappingTest.cpp
@@ -0,0 +1,343 @@
+//===- unittests/Analysis/Scalable/ASTEntityMappingTest.cpp --------------===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#include "clang/Analysis/Scalable/ASTEntityMapping.h"
+#include "clang/AST/ASTContext.h"
+#include "clang/AST/Decl.h"
+#include "clang/ASTMatchers/ASTMatchFinder.h"
+#include "clang/ASTMatchers/ASTMatchers.h"
+#include "clang/Tooling/Tooling.h"
+#include "gtest/gtest.h"
+
+using namespace clang::ast_matchers;
+
+namespace clang {
+namespace ssaf {
+namespace {
+
+// Helper function to find a declaration by name
+template <typename DeclType>
+const DeclType *findDecl(ASTContext &Ctx, StringRef Name) {
+  auto Matcher = namedDecl(hasName(Name)).bind("decl");
+  auto Matches = match(Matcher, Ctx);
+  if (Matches.empty())
+    return nullptr;
+  return Matches[0].getNodeAs<DeclType>("decl");
+}
+
+TEST(ASTEntityMappingTest, FunctionDecl) {
+  auto AST = tooling::buildASTFromCode("void foo() {}");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *FD = findDecl<FunctionDecl>(Ctx, "foo");
+  ASSERT_NE(FD, nullptr);
+
+  auto EntityName = getLocalEntityNameForDecl(FD);
+  EXPECT_TRUE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, VarDecl) {
+  auto AST = tooling::buildASTFromCode("int x = 42;");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *VD = findDecl<VarDecl>(Ctx, "x");
+  ASSERT_NE(VD, nullptr);
+
+  auto EntityName = getLocalEntityNameForDecl(VD);
+  EXPECT_TRUE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, ParmVarDecl) {
+  auto AST = tooling::buildASTFromCode("void foo(int x) {}");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *FD = findDecl<FunctionDecl>(Ctx, "foo");
+  ASSERT_NE(FD, nullptr);
+  ASSERT_GT(FD->param_size(), 0u);
+
+  const auto *PVD = FD->getParamDecl(0);
+  ASSERT_NE(PVD, nullptr);
+
+  auto EntityName = getLocalEntityNameForDecl(PVD);
+  EXPECT_TRUE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, RecordDecl) {
+  auto AST = tooling::buildASTFromCode("struct S {};");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *RD = findDecl<RecordDecl>(Ctx, "S");
+  ASSERT_NE(RD, nullptr);
+
+  auto EntityName = getLocalEntityNameForDecl(RD);
+  EXPECT_TRUE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, FieldDecl) {
+  auto AST = tooling::buildASTFromCode("struct S { int field; };");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *FD = findDecl<FieldDecl>(Ctx, "field");
+  ASSERT_NE(FD, nullptr);
+
+  auto EntityName = getLocalEntityNameForDecl(FD);
+  EXPECT_TRUE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, NullDecl) {
+  auto EntityName = getLocalEntityNameForDecl(nullptr);
+  EXPECT_FALSE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, ImplicitDecl) {
+  auto AST = tooling::buildASTFromCode(R"(
+    struct S {
+      S() = default;
+    };
+  )", "test.cpp", std::make_shared<PCHContainerOperations>());
+  auto &Ctx = AST->getASTContext();
+
+  const auto *RD = findDecl<CXXRecordDecl>(Ctx, "S");
+  ASSERT_NE(RD, nullptr);
+
+  // Find the implicitly-declared copy constructor
+  for (const auto *Ctor : RD->ctors()) {
+    if (Ctor->isCopyConstructor() && Ctor->isImplicit()) {
+      auto EntityName = getLocalEntityNameForDecl(Ctor);
+      EXPECT_FALSE(EntityName.has_value());
+      return;
+    }
+  }
+}
+
+TEST(ASTEntityMappingTest, BuiltinFunction) {
+  auto AST = tooling::buildASTFromCode(R"(
+    void test() {
+      __builtin_memcpy(0, 0, 0);
+    }
+  )");
+  auto &Ctx = AST->getASTContext();
+
+  // Find the builtin call
+  auto Matcher = callExpr().bind("call");
+  auto Matches = match(Matcher, Ctx);
+  ASSERT_FALSE(Matches.empty());
+
+  const auto *CE = Matches[0].getNodeAs<CallExpr>("call");
+  ASSERT_NE(CE, nullptr);
+
+  const auto *Callee = CE->getDirectCallee();
+  if (Callee && Callee->getBuiltinID()) {
+    auto EntityName = getLocalEntityNameForDecl(Callee);
+    EXPECT_FALSE(EntityName.has_value());
+  }
+}
+
+TEST(ASTEntityMappingTest, UnsupportedDecl) {
+  auto AST = tooling::buildASTFromCode("namespace N {}");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *ND = findDecl<NamespaceDecl>(Ctx, "N");
+  ASSERT_NE(ND, nullptr);
+
+  auto EntityName = getLocalEntityNameForDecl(ND);
+  EXPECT_FALSE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, FunctionReturn) {
+  auto AST = tooling::buildASTFromCode("int foo() { return 42; }");
+  auto &Ctx = AST->getASTContext();
+
+  const auto *FD = findDecl<FunctionDecl>(Ctx, "foo");
+  ASSERT_NE(FD, nullptr);
+
+  auto EntityName = getLocalEntityNameForFunctionReturn(FD);
+  EXPECT_TRUE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, FunctionReturnNull) {
+  auto EntityName = getLocalEntityNameForFunctionReturn(nullptr);
+  EXPECT_FALSE(EntityName.has_value());
+}
+
+TEST(ASTEntityMappingTest, FunctionReturnBuiltin) {
+  auto AST = tooling::buildASTFromCode(R"(
+    void test() {
+      __builtin_memcpy(0, 0, 0);
+    }
+  )");
+  auto &Ctx = AST->getASTContext();
+
+  // Find the builtin call
+  auto Matcher = callExpr().bind("call");
+  auto Matches = match(Matcher, Ctx);
+  ASSERT_FALSE(Matches.empty());
+
+  const auto *CE = Matches[0].getNodeAs<CallExpr>("call");
+  ASSERT_NE(CE, nullptr);
+
+  const auto *Callee = CE->getDi...
[truncated]

@github-actions

github-actions bot commented Nov 22, 2025

⚠️ C/C++ code formatter, clang-format found issues in your code. ⚠️

You can test this locally with the following command:
git-clang-format --diff origin/main HEAD --extensions cpp,h -- clang/include/clang/Analysis/Scalable/ASTEntityMapping.h clang/include/clang/Analysis/Scalable/Model/BuildNamespace.h clang/include/clang/Analysis/Scalable/Model/EntityName.h clang/lib/Analysis/Scalable/ASTEntityMapping.cpp clang/lib/Analysis/Scalable/Model/BuildNamespace.cpp clang/lib/Analysis/Scalable/Model/EntityName.cpp clang/unittests/Analysis/Scalable/ASTEntityMappingTest.cpp clang/unittests/Analysis/Scalable/BuildNamespaceTest.cpp clang/unittests/Analysis/Scalable/EntityNameTest.cpp --diff_from_common_commit

⚠️
The reproduction instructions above might return results for more than one PR
in a stack if you are using a stacked PR workflow. You can limit the results by
changing origin/main to the base branch/commit you want to compare against.
⚠️

View the diff from clang-format here.
diff --git a/clang/include/clang/Analysis/Scalable/ASTEntityMapping.h b/clang/include/clang/Analysis/Scalable/ASTEntityMapping.h
index 9a2c01573..a6ccbd1d1 100644
--- a/clang/include/clang/Analysis/Scalable/ASTEntityMapping.h
+++ b/clang/include/clang/Analysis/Scalable/ASTEntityMapping.h
@@ -9,8 +9,8 @@
 #ifndef LLVM_CLANG_ANALYSIS_SCALABLE_ASTENTITYMAPPING_H
 #define LLVM_CLANG_ANALYSIS_SCALABLE_ASTENTITYMAPPING_H
 
-#include "clang/Analysis/Scalable/Model/EntityName.h"
 #include "clang/AST/Decl.h"
+#include "clang/Analysis/Scalable/Model/EntityName.h"
 #include "llvm/ADT/StringRef.h"
 #include <optional>
 
@@ -29,15 +29,17 @@ namespace clang::ssaf {
 ///
 /// \param D The declaration to map. Must not be null.
 ///
-/// \return An EntityName if the declaration can be mapped, std::nullopt otherwise.
-std::optional<EntityName> getLocalEntityNameForDecl(const Decl* D);
+/// \return An EntityName if the declaration can be mapped, std::nullopt
+/// otherwise.
+std::optional<EntityName> getLocalEntityNameForDecl(const Decl *D);
 
 /// Maps a function return type to an EntityName.
 ///
 /// \param FD The function declaration. Must not be null.
 ///
 /// \return An EntityName for the function's return type.
-std::optional<EntityName> getLocalEntityNameForFunctionReturn(const FunctionDecl* FD);
+std::optional<EntityName>
+getLocalEntityNameForFunctionReturn(const FunctionDecl *FD);
 
 } // namespace clang::ssaf
 
diff --git a/clang/include/clang/Analysis/Scalable/Model/BuildNamespace.h b/clang/include/clang/Analysis/Scalable/Model/BuildNamespace.h
index 6311d4656..5e259cf5e 100644
--- a/clang/include/clang/Analysis/Scalable/Model/BuildNamespace.h
+++ b/clang/include/clang/Analysis/Scalable/Model/BuildNamespace.h
@@ -17,10 +17,7 @@
 
 namespace clang::ssaf {
 
-enum class BuildNamespaceKind : unsigned short {
-  CompilationUnit,
-  LinkUnit
-};
+enum class BuildNamespaceKind : unsigned short { CompilationUnit, LinkUnit };
 
 llvm::StringRef toString(BuildNamespaceKind BNK);
 
@@ -35,13 +32,13 @@ class BuildNamespace {
 
 public:
   BuildNamespace(BuildNamespaceKind Kind, llvm::StringRef Name)
-    : Kind(Kind), Name(Name.str()) {}
+      : Kind(Kind), Name(Name.str()) {}
 
   static BuildNamespace makeTU(llvm::StringRef CompilationId);
 
-  bool operator==(const BuildNamespace& Other) const;
-  bool operator!=(const BuildNamespace& Other) const;
-  bool operator<(const BuildNamespace& Other) const;
+  bool operator==(const BuildNamespace &Other) const;
+  bool operator!=(const BuildNamespace &Other) const;
+  bool operator<(const BuildNamespace &Other) const;
 
   friend class SerializationFormat;
 };
@@ -55,10 +52,10 @@ class NestedBuildNamespace {
 public:
   NestedBuildNamespace() = default;
 
-  explicit NestedBuildNamespace(const std::vector<BuildNamespace>& Namespaces)
-    : Namespaces(Namespaces) {}
+  explicit NestedBuildNamespace(const std::vector<BuildNamespace> &Namespaces)
+      : Namespaces(Namespaces) {}
 
-  explicit NestedBuildNamespace(const BuildNamespace& N) {
+  explicit NestedBuildNamespace(const BuildNamespace &N) {
     Namespaces.push_back(N);
   }
 
@@ -69,16 +66,17 @@ public:
   /// \param Namespace The namespace to append.
   NestedBuildNamespace makeQualified(NestedBuildNamespace Namespace) const {
     auto Copy = *this;
-    Copy.Namespaces.reserve(Copy.Namespaces.size() + Namespace.Namespaces.size());
+    Copy.Namespaces.reserve(Copy.Namespaces.size() +
+                            Namespace.Namespaces.size());
     llvm::append_range(Copy.Namespaces, Namespace.Namespaces);
     return Copy;
   }
 
   bool empty() const;
 
-  bool operator==(const NestedBuildNamespace& Other) const;
-  bool operator!=(const NestedBuildNamespace& Other) const;
-  bool operator<(const NestedBuildNamespace& Other) const;
+  bool operator==(const NestedBuildNamespace &Other) const;
+  bool operator!=(const NestedBuildNamespace &Other) const;
+  bool operator<(const NestedBuildNamespace &Other) const;
 
   friend class JSONWriter;
   friend class LinkUnitResolution;
diff --git a/clang/include/clang/Analysis/Scalable/Model/EntityName.h b/clang/include/clang/Analysis/Scalable/Model/EntityName.h
index ea26d09f3..7d3ee8aa9 100644
--- a/clang/include/clang/Analysis/Scalable/Model/EntityName.h
+++ b/clang/include/clang/Analysis/Scalable/Model/EntityName.h
@@ -18,9 +18,10 @@ namespace clang::ssaf {
 
 /// Uniquely identifies an entity in a program.
 ///
-/// EntityName provides a globally unique identifier for program entities that remains
-/// stable across compilation boundaries. This enables whole-program analysis to track
-/// and relate entities across separately compiled translation units.
+/// EntityName provides a globally unique identifier for program entities that
+/// remains stable across compilation boundaries. This enables whole-program
+/// analysis to track and relate entities across separately compiled translation
+/// units.
 class EntityName {
   std::string USR;
   llvm::SmallString<16> Suffix;
@@ -32,9 +33,9 @@ public:
   EntityName(llvm::StringRef USR, llvm::StringRef Suffix,
              NestedBuildNamespace Namespace);
 
-  bool operator==(const EntityName& Other) const;
-  bool operator!=(const EntityName& Other) const;
-  bool operator<(const EntityName& Other) const;
+  bool operator==(const EntityName &Other) const;
+  bool operator!=(const EntityName &Other) const;
+  bool operator<(const EntityName &Other) const;
 
   /// Creates a new EntityName with additional build namespace qualification.
   ///
diff --git a/clang/lib/Analysis/Scalable/ASTEntityMapping.cpp b/clang/lib/Analysis/Scalable/ASTEntityMapping.cpp
index 35ff8fa16..3ed76cbfb 100644
--- a/clang/lib/Analysis/Scalable/ASTEntityMapping.cpp
+++ b/clang/lib/Analysis/Scalable/ASTEntityMapping.cpp
@@ -18,7 +18,7 @@
 
 namespace clang::ssaf {
 
-std::optional<EntityName> getLocalEntityNameForDecl(const Decl* D) {
+std::optional<EntityName> getLocalEntityNameForDecl(const Decl *D) {
   if (!D)
     return std::nullopt;
 
@@ -34,9 +34,11 @@ std::optional<EntityName> getLocalEntityNameForDecl(const Decl* D) {
   llvm::SmallString<16> Suffix;
   const Decl *USRDecl = D;
 
-  // For parameters, use the parent function's USR with parameter index as suffix
-  if (const auto * PVD = dyn_cast<ParmVarDecl>(D)) {
-    const auto *FD = dyn_cast_or_null<FunctionDecl>(PVD->getParentFunctionOrMethod());
+  // For parameters, use the parent function's USR with parameter index as
+  // suffix
+  if (const auto *PVD = dyn_cast<ParmVarDecl>(D)) {
+    const auto *FD =
+        dyn_cast_or_null<FunctionDecl>(PVD->getParentFunctionOrMethod());
     if (!FD)
       return std::nullopt;
     USRDecl = FD;
@@ -58,7 +60,8 @@ std::optional<EntityName> getLocalEntityNameForDecl(const Decl* D) {
   return EntityName(USRBuf.str(), Suffix, {});
 }
 
-std::optional<EntityName> getLocalEntityNameForFunctionReturn(const FunctionDecl* FD) {
+std::optional<EntityName>
+getLocalEntityNameForFunctionReturn(const FunctionDecl *FD) {
   if (!FD)
     return std::nullopt;
 
diff --git a/clang/lib/Analysis/Scalable/Model/BuildNamespace.cpp b/clang/lib/Analysis/Scalable/Model/BuildNamespace.cpp
index 7676d56f8..5b8ad30b1 100644
--- a/clang/lib/Analysis/Scalable/Model/BuildNamespace.cpp
+++ b/clang/lib/Analysis/Scalable/Model/BuildNamespace.cpp
@@ -13,9 +13,11 @@
 namespace clang::ssaf {
 
 llvm::StringRef toString(BuildNamespaceKind BNK) {
-  switch(BNK) {
-    case BuildNamespaceKind::CompilationUnit: return "compilation_unit";
-    case BuildNamespaceKind::LinkUnit: return "link_unit";
+  switch (BNK) {
+  case BuildNamespaceKind::CompilationUnit:
+    return "compilation_unit";
+  case BuildNamespaceKind::LinkUnit:
+    return "link_unit";
   }
   llvm_unreachable("Unknown BuildNamespaceKind");
 }
@@ -29,40 +31,40 @@ std::optional<BuildNamespaceKind> parseBuildNamespaceKind(llvm::StringRef Str) {
 }
 
 BuildNamespace BuildNamespace::makeTU(llvm::StringRef CompilationId) {
-  return BuildNamespace{BuildNamespaceKind::CompilationUnit, CompilationId.str()};
+  return BuildNamespace{BuildNamespaceKind::CompilationUnit,
+                        CompilationId.str()};
 }
 
-bool BuildNamespace::operator==(const BuildNamespace& Other) const {
+bool BuildNamespace::operator==(const BuildNamespace &Other) const {
   return asTuple() == Other.asTuple();
 }
 
-bool BuildNamespace::operator!=(const BuildNamespace& Other) const {
+bool BuildNamespace::operator!=(const BuildNamespace &Other) const {
   return !(*this == Other);
 }
 
-bool BuildNamespace::operator<(const BuildNamespace& Other) const {
+bool BuildNamespace::operator<(const BuildNamespace &Other) const {
   return asTuple() < Other.asTuple();
 }
 
-NestedBuildNamespace NestedBuildNamespace::makeTU(llvm::StringRef CompilationId) {
+NestedBuildNamespace
+NestedBuildNamespace::makeTU(llvm::StringRef CompilationId) {
   NestedBuildNamespace Result;
   Result.Namespaces.push_back(BuildNamespace::makeTU(CompilationId));
   return Result;
 }
 
-bool NestedBuildNamespace::empty() const {
-  return Namespaces.empty();
-}
+bool NestedBuildNamespace::empty() const { return Namespaces.empty(); }
 
-bool NestedBuildNamespace::operator==(const NestedBuildNamespace& Other) const {
+bool NestedBuildNamespace::operator==(const NestedBuildNamespace &Other) const {
   return Namespaces == Other.Namespaces;
 }
 
-bool NestedBuildNamespace::operator!=(const NestedBuildNamespace& Other) const {
+bool NestedBuildNamespace::operator!=(const NestedBuildNamespace &Other) const {
   return !(*this == Other);
 }
 
-bool NestedBuildNamespace::operator<(const NestedBuildNamespace& Other) const {
+bool NestedBuildNamespace::operator<(const NestedBuildNamespace &Other) const {
   return Namespaces < Other.Namespaces;
 }
 
diff --git a/clang/lib/Analysis/Scalable/Model/EntityName.cpp b/clang/lib/Analysis/Scalable/Model/EntityName.cpp
index b8c1ba8a0..7e66476d6 100644
--- a/clang/lib/Analysis/Scalable/Model/EntityName.cpp
+++ b/clang/lib/Analysis/Scalable/Model/EntityName.cpp
@@ -12,13 +12,13 @@ namespace clang::ssaf {
 
 EntityName::EntityName(llvm::StringRef USR, llvm::StringRef Suffix,
                        NestedBuildNamespace Namespace)
-  : USR(USR.str()), Suffix(Suffix), Namespace(std::move(Namespace)) {}
+    : USR(USR.str()), Suffix(Suffix), Namespace(std::move(Namespace)) {}
 
-bool EntityName::operator==(const EntityName& Other) const {
+bool EntityName::operator==(const EntityName &Other) const {
   return asTuple() == Other.asTuple();
 }
 
-bool EntityName::operator!=(const EntityName& Other) const {
+bool EntityName::operator!=(const EntityName &Other) const {
   return !(*this == Other);
 }
 
diff --git a/clang/unittests/Analysis/Scalable/ASTEntityMappingTest.cpp b/clang/unittests/Analysis/Scalable/ASTEntityMappingTest.cpp
index 555b782e0..77930655a 100644
--- a/clang/unittests/Analysis/Scalable/ASTEntityMappingTest.cpp
+++ b/clang/unittests/Analysis/Scalable/ASTEntityMappingTest.cpp
@@ -94,11 +94,13 @@ TEST(ASTEntityMappingTest, NullDecl) {
 }
 
 TEST(ASTEntityMappingTest, ImplicitDecl) {
-  auto AST = tooling::buildASTFromCode(R"(
+  auto AST = tooling::buildASTFromCode(
+      R"(
     struct S {
       S() = default;
     };
-  )", "test.cpp", std::make_shared<PCHContainerOperations>());
+  )",
+      "test.cpp", std::make_shared<PCHContainerOperations>());
   auto &Ctx = AST->getASTContext();
 
   const auto *RD = findDecl<CXXRecordDecl>(Ctx, "S");

@github-actions

github-actions bot commented Nov 22, 2025

🐧 Linux x64 Test Results

  • 111795 tests passed
  • 4481 tests skipped

✅ The build succeeded and all tests passed.

#include "llvm/ADT/StringRef.h"
#include <optional>

namespace clang {
Collaborator

Nit: I think we can use nested namespaces now, like namespace clang::ssaf.
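For reference, a minimal sketch of what the nit asks for: the C++17 nested namespace definition opens the same namespace as the two-level form used in the patch. The functions `answer` and `next` are hypothetical and exist only for illustration.

```cpp
// Pre-C++17 spelling, as written in the patch under review.
namespace clang {
namespace ssaf {
inline int answer() { return 42; } // hypothetical function, illustration only
} // namespace ssaf
} // namespace clang

// C++17 nested namespace definition: reopens the very same namespace,
// so names declared above are visible here without qualification.
namespace clang::ssaf {
inline int next() { return answer() + 1; } // hypothetical function
} // namespace clang::ssaf
```

Both spellings can coexist in one code base, so a change like this can be made file by file.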

Contributor

+1 In many other cases as well.

/// \param D The declaration to map. Must not be null.
///
/// \return An EntityName if the declaration can be mapped, std::nullopt otherwise.
std::optional<EntityName> getLocalEntityNameForDecl(const Decl* D);
Collaborator

What does the "Local" refer to in the name?

LinkUnit
};

std::string toString(BuildNamespaceKind BNK);
Collaborator

Nit: could this return a StringRef?
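A rough sketch of why returning a view is safe here: every branch returns a string literal, which has static storage duration, so no owning `std::string` needs to be allocated. To keep the example free of LLVM headers, `std::string_view` stands in for `llvm::StringRef` and `assert(false)` stands in for `llvm_unreachable`; both substitutions are assumptions of this sketch, not what the patch uses.

```cpp
#include <cassert>
#include <string_view>

enum class BuildNamespaceKind : unsigned short { CompilationUnit, LinkUnit };

// Returns a non-owning view of a string literal. The literal outlives the
// call, so the returned view never dangles.
// std::string_view is a stand-in for llvm::StringRef in this sketch.
inline std::string_view toString(BuildNamespaceKind BNK) {
  switch (BNK) {
  case BuildNamespaceKind::CompilationUnit:
    return "compilation_unit";
  case BuildNamespaceKind::LinkUnit:
    return "link_unit";
  }
  assert(false && "Unknown BuildNamespaceKind"); // stand-in for llvm_unreachable
  return {};
}
```

A caller that needs an owning string can still construct one explicitly from the view.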

friend class SerializationFormat;
};

/// Represents a sequence of steps in the build process.
Collaborator

Is it important to preserve the information about which entities belong to the same step, or could a NestedBuildNamespace simply be the result of merging some BuildNamespaces rather than a separate type?

Contributor Author

Let me preface this by saying that the namespace design will possibly evolve when we actually implement entity linking. To some degree this is just an educated guess.

We could just use std::vector<BuildNamespace> instead of introducing another class but I expect that we will have operations on that type and having this class will allow the interfaces to enforce type correctness and to be self-descriptive. Alternatively, we could have BuildNamespace implemented as std::vector<std::pair<BuildNamespaceKind, std::string>> and sink the per-element logic to its implementation.

So, yes, I can imagine there being only a single type, but can you please elaborate on why you would prefer that?

Collaborator

I do not have a strong preference. My only reason is that a single type would make the implementation a bit more concise, and I usually prefer the more concise form until there is a need to split functionality out.

Comment on lines 9 to 10
#ifndef LLVM_CLANG_ANALYSIS_SCALABLE_BUILD_NAMESPACE_H
#define LLVM_CLANG_ANALYSIS_SCALABLE_BUILD_NAMESPACE_H
Contributor

I think in LLVM, the header guards should reflect the relative path.
https://llvm.org/docs/CodingStandards.html#header-guard
This applies to the other header guards as well.

Suggested change
-#ifndef LLVM_CLANG_ANALYSIS_SCALABLE_BUILD_NAMESPACE_H
-#define LLVM_CLANG_ANALYSIS_SCALABLE_BUILD_NAMESPACE_H
+#ifndef LLVM_CLANG_ANALYSIS_SCALABLE_MODEL_BUILDNAMESPACE_H
+#define LLVM_CLANG_ANALYSIS_SCALABLE_MODEL_BUILDNAMESPACE_H
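One reading of the convention behind this suggestion, sketched as a small helper: take the header path relative to the include root, prefix it with `LLVM_`, uppercase it, and replace every non-alphanumeric character with an underscore. The function name `headerGuardFor` is hypothetical, and the exact rule is an assumption inferred from the suggested guard above rather than a quote from the coding standard.

```cpp
#include <cctype>
#include <string>

// Derives a header guard from a path relative to the include root.
// Letters are uppercased, digits are kept, and every other character
// ('/', '.', '-', ...) becomes '_'.
// Hypothetical helper: the rule is inferred from the reviewer's suggestion.
inline std::string headerGuardFor(const std::string &RelativePath) {
  std::string Guard = "LLVM_";
  for (unsigned char C : RelativePath)
    Guard += std::isalnum(C) ? static_cast<char>(std::toupper(C)) : '_';
  return Guard;
}
```

Applied to `clang/Analysis/Scalable/Model/BuildNamespace.h`, this produces the guard proposed in the suggested change.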

Comment on lines 66 to 67
for (const auto& N : Namespace.Namespaces)
Copy.Namespaces.push_back(N);
Contributor

Could we use llvm::append_range here?
We know ahead of time how many elements we add. Could we reserve?
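A minimal sketch of the reserve-then-append pattern being asked about. To stay self-contained it uses the range overload of `std::vector::insert` as a standard-library stand-in for `llvm::append_range`; the function name `concatenated` and the use of plain strings instead of `BuildNamespace` are assumptions made for illustration.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Returns Prefix followed by Suffix. The final size is known up front, so a
// single reserve() avoids repeated reallocation while appending.
// std::vector::insert stands in for llvm::append_range in this sketch.
inline std::vector<std::string>
concatenated(const std::vector<std::string> &Prefix,
             const std::vector<std::string> &Suffix) {
  std::vector<std::string> Result;
  Result.reserve(Prefix.size() + Suffix.size());
  Result.insert(Result.end(), Prefix.begin(), Prefix.end());
  Result.insert(Result.end(), Suffix.begin(), Suffix.end());
  assert(Result.size() == Prefix.size() + Suffix.size());
  return Result;
}
```

Compared with a `push_back` loop, the intent ("append this whole range") is stated in one call and the allocation happens once.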

Comment on lines +26 to +27
std::string USR;
llvm::SmallString<16> Suffix;
Contributor

What's the benefit of having two different string types close together?
Unless there is a good reason, for simplicity, I'd just stick with one.

Contributor Author

In my view, having more structure in the data makes the content simpler to understand, for example during debugging.

It is pretty likely that in the future we will have to either optimize this scheme or enhance USRs so we don't need suffixes, which makes me not want to spend a ton of time polishing this during bring-up.

What do you see as the strongest argument for keeping the strings merged?

bool operator!=(const EntityName& Other) const;
bool operator<(const EntityName& Other) const;

EntityName makeQualified(NestedBuildNamespace Namespace);
Contributor

Oh, so this isn't a factory function: it is not static. That surprised me.
It might be useful to add one-line docs to APIs like this one, and to the static factory functions in the other classes, explaining when to use each.

Contributor Author

You are right! It's missing the const. I'm adding the doc comments too.

#include "llvm/ADT/StringRef.h"
#include <optional>

namespace clang {
Contributor

+1 In many other cases as well.

Comment on lines +224 to +227
const auto *Decl1 = Matches[0].getNodeAs<FunctionDecl>("decl");
const auto *Decl2 = Matches[1].getNodeAs<FunctionDecl>("decl");
ASSERT_NE(Decl1, nullptr);
ASSERT_NE(Decl2, nullptr);
Contributor

I wonder if it would make sense to explicitly check that Decl1 has no body, but Decl2 does.

Comment on lines +268 to +269
// Use recordDecl(isStruct()) to avoid matching implicit typedefs
auto Matcher = recordDecl(hasName("S"), isStruct()).bind("decl");
Contributor

I didn't get this comment. I see no typedefs or usings in the example.

Comment on lines +301 to +302
ASSERT_GT(Func1->param_size(), 0u);
ASSERT_GT(Func2->param_size(), 0u);
Contributor

Why don't we check exact parameter counts here with EQ?

Comment on lines +286 to +290
TEST(ASTEntityMappingTest, ParmVarDeclRedeclaration) {
auto AST = tooling::buildASTFromCode(R"(
void foo(int x);
void foo(int x) {}
)");
Contributor

I think it would be interesting to see how redeclaration with different parameter names behaves.
Parameters with default arguments and C-style variadic parameters could also be interesting.
Although they probably all behave the same.


static NestedBuildNamespace makeTU(llvm::StringRef CompilationId);

NestedBuildNamespace makeQualified(NestedBuildNamespace Namespace) {
Contributor

I think we should document what this does.
That said, this could be a const member function, right?
If it was, that would also give a hint about how to use this API.
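
To illustrate the point, here is a rough sketch of a documented, `const`-qualified `makeQualified`. The class body is a simplified stand-in built on `std::vector<std::string>`, and both the append order and the `parts()` accessor are assumptions made for the example, not the patch's actual implementation.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified model of the class under review.
class NestedBuildNamespace {
  std::vector<std::string> Namespaces;

public:
  explicit NestedBuildNamespace(std::vector<std::string> Names)
      : Namespaces(std::move(Names)) {}

  /// Returns a copy of this namespace with the elements of \p Outer
  /// appended. The object itself is not modified, which the `const`
  /// qualifier makes visible at the call site.
  NestedBuildNamespace makeQualified(const NestedBuildNamespace &Outer) const {
    NestedBuildNamespace Copy(*this);
    Copy.Namespaces.reserve(Namespaces.size() + Outer.Namespaces.size());
    Copy.Namespaces.insert(Copy.Namespaces.end(), Outer.Namespaces.begin(),
                           Outer.Namespaces.end());
    return Copy;
  }

  // Accessor added only so the example can be inspected.
  const std::vector<std::string> &parts() const { return Namespaces; }
};
```

With the qualifier in place, calling `makeQualified` on a `const` object compiles, and a reader can tell from the signature alone that the result must be used rather than the receiver.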

Comment on lines +2 to +12
BuildNamespaceTest.cpp
EntityNameTest.cpp
ASTEntityMappingTest.cpp

CLANG_LIBS
clangAnalysisScalable
clangAST
clangASTMatchers
clangBasic
clangSerialization
clangTooling
Contributor

Could you please sort these lists?

@usx95 self-requested a review November 26, 2025 13:59
Collaborator

@ymand left a comment

I'm really excited to see these patches start to land! I reviewed the header files, focusing on the API. I'll try to get back to implementation and tests soon, but wanted to get these out in the meantime.


std::optional<BuildNamespaceKind> parseBuildNamespaceKind(llvm::StringRef Str);

/// Represents a single step in the build process.
Collaborator

Could you expand this comment? It's not immediately obvious how a step in the build process relates to the notion of "namespace". Alternatively (or maybe additionally), consider adding some background explanation to the beginning of the file.

Namespaces.push_back(N);
}

static NestedBuildNamespace makeTU(llvm::StringRef CompilationId);
Collaborator

Please comment.

/// stable across compilation boundaries. This enables whole-program analysis to track
/// and relate entities across separately compiled translation units.
class EntityName {
std::string USR;
Collaborator

I think it would be helpful to explain the roles of these fields. Perhaps copy some of the information from the PR description. To the casual reader "USR" won't have much meaning and since the type is string that won't inform them either. Additionally, the need for the suffix won't be obvious. Similarly, the role of Namespace in distinguishing between otherwise identical entities.

Alternatively, provide a detailed explanation on the class comments or the constructor.
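
As a rough illustration of what such field documentation could look like, the sketch below restates the roles given in the PR description as comments. `EntityNameSketch` is a hypothetical stand-in that uses standard-library types throughout; the comparison operators are included only to show that all three fields take part in identity.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical stand-in for the EntityName class under review.
struct EntityNameSketch {
  /// Unified Symbol Resolution (USR) string identifying the declaration.
  std::string USR;

  /// Distinguishes sub-entities of the declaration, such as a parameter
  /// or the return value. Assumed empty for the declaration itself.
  std::string Suffix;

  /// The build artifacts (compilation unit, link unit) the entity belongs
  /// to, so entities with identical USRs in different artifacts stay apart.
  std::vector<std::string> Namespace;

  friend bool operator==(const EntityNameSketch &A,
                         const EntityNameSketch &B) {
    return std::tie(A.USR, A.Suffix, A.Namespace) ==
           std::tie(B.USR, B.Suffix, B.Namespace);
  }

  friend bool operator<(const EntityNameSketch &A,
                        const EntityNameSketch &B) {
    return std::tie(A.USR, A.Suffix, A.Namespace) <
           std::tie(B.USR, B.Suffix, B.Namespace);
  }
};
```

The exact spelling of suffixes is not specified in this thread, so the sketch leaves it open.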

class EntityName {
std::string USR;
llvm::SmallString<16> Suffix;
NestedBuildNamespace Namespace;
Collaborator

I'm afraid of the size implications of including this. USRs are already large, but including an arbitrarily large vector as well, in each ID, threatens to make this unusable for large scale application.

Does the entity name need the full vector, or could something like a unique 64-bit ID suffice?
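
One way to read the 64-bit ID suggestion is an interning table: each distinct namespace vector is stored once and entity names carry only its ID. The sketch below is an assumption about how that could look, not something proposed in the patch; `NamespaceInterner`, `NamespacePath`, and `CompactEntityName` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

using NamespacePath = std::vector<std::string>;

// Hypothetical interner: maps each distinct namespace path to a dense ID.
class NamespaceInterner {
  std::map<NamespacePath, std::uint64_t> Ids;
  // Pointers to the map's keys; std::map nodes do not move on insertion.
  std::vector<const NamespacePath *> Paths;

public:
  NamespaceInterner() = default;
  // Copying would leave Paths pointing into the source object's map.
  NamespaceInterner(const NamespaceInterner &) = delete;
  NamespaceInterner &operator=(const NamespaceInterner &) = delete;

  /// Returns the ID for Path, assigning the next free ID on first use.
  std::uint64_t intern(const NamespacePath &Path) {
    auto [It, Inserted] = Ids.try_emplace(Path, Paths.size());
    if (Inserted)
      Paths.push_back(&It->first);
    return It->second;
  }

  /// Recovers the full path for an ID previously returned by intern().
  const NamespacePath &lookup(std::uint64_t Id) const {
    assert(Id < Paths.size() && "unknown namespace ID");
    return *Paths[Id];
  }
};

// Entity name that stores a fixed-size ID instead of the whole vector.
struct CompactEntityName {
  std::string USR;
  std::string Suffix;
  std::uint64_t NamespaceId;
};
```

The trade-off is that such IDs are only meaningful together with the table that produced them, so any serialized form would need to carry or rebuild the table.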

/// \return An EntityName if the declaration can be mapped, std::nullopt otherwise.
std::optional<EntityName> getLocalEntityNameForDecl(const Decl* D);

/// Maps a function return type to an EntityName.
Collaborator

I think it would help to spell out in more detail what you mean/why this specialization is necessary. It's not really the type that's being identified, or you could just separately identify the type. It's specifically the return entity of this function.


namespace clang {
namespace ssaf {

Collaborator

For both of these lookup functions -- I expect these to be used heavily in analysis, so it would be beneficial if they were shorter names. Currently, the names encode a lot of the (existing) type information. Is that necessary/helpful? Could you instead go with a simpler scheme like getEntity and getReturnEntity?

Separately: constructing these is typically expensive, so we use a cache. Consider including a cache object in this library as well. I think that will be the correct choice for most use cases.
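
A minimal sketch of what such a cache might look like, assuming the expensive computation is injected as a callback. `Decl`, `EntityName`, and `EntityNameCache` here are simplified stand-ins for illustration and do not reflect the Clang types or any API in the patch; `getEntity` follows the shorter naming suggested above.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

struct Decl {};                 // hypothetical stand-in for clang::Decl
using EntityName = std::string; // hypothetical stand-in for the real class

// Memoizes the declaration-to-entity mapping, including failed lookups.
class EntityNameCache {
  using ComputeFn = std::function<std::optional<EntityName>(const Decl *)>;

  ComputeFn Compute;
  std::unordered_map<const Decl *, std::optional<EntityName>> Cache;

public:
  explicit EntityNameCache(ComputeFn Fn) : Compute(std::move(Fn)) {}

  /// Returns the cached result for D, computing it on first request.
  /// The reference stays valid while the cache is alive, because
  /// unordered_map never relocates its elements.
  const std::optional<EntityName> &getEntity(const Decl *D) {
    auto It = Cache.find(D);
    if (It == Cache.end())
      It = Cache.emplace(D, Compute(D)).first;
    return It->second;
  }
};
```

Caching `std::nullopt` as well means declarations that cannot be mapped are not re-examined on every query.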

/// EntityName provides a globally unique identifier for program entities that remains
/// stable across compilation boundaries. This enables whole-program analysis to track
/// and relate entities across separately compiled translation units.
class EntityName {
Collaborator

nit: Consider EntityID; it's shorter and (IMO) more intuitive. I think of "name" more for circumstances where the identifier is readable. But it's a matter of taste.

@@ -0,0 +1,44 @@
//===- ASTMapping.h - AST to SSAF Entity mapping ----------------*- C++ -*-===//
Collaborator

Nit: the header name and the comment do not match.

LinkUnit
};

llvm::StringRef toString(BuildNamespaceKind BNK);
Collaborator

Nit: do we need to qualify StringRef?

}

llvm::SmallString<128> USRBuf;
if (clang::index::generateUSRForDecl(USRDecl, USRBuf)) {
Collaborator

Nit: braces could be dropped here.

Labels

clang:analysis clang Clang issues not falling into any other category

5 participants