Skip to content

Creates hashes for different compliers/setups #981

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 7 commits into
base: master
Choose a base branch
from

Conversation

sbryngelson
Copy link
Member

@sbryngelson sbryngelson commented Aug 10, 2025

User description

Avoid hashes for staging that are obscure character strings.


PR Type

Enhancement


Description

  • Replace SHA256 hash-based slugs with human-readable identifiers

  • Add compiler detection for build staging directories

  • Include configuration options and chemistry info in slugs

  • Improve build artifact organization and debugging


Diagram Walkthrough

flowchart LR
  A["Old Hash System"] --> B["SHA256 Hash Generation"]
  C["New Slug System"] --> D["Compiler Detection"]
  C --> E["Configuration Parts"]
  C --> F["Chemistry Info"]
  D --> G["Human-readable Slug"]
  E --> G
  F --> G
  B --> H["Obscure Hash String"]
  G --> I["Descriptive Build Path"]
Loading

File Walkthrough

Relevant files
Enhancement
build.py
Replace hash slugs with readable identifiers                         

toolchain/mfc/build.py

  • Added compiler detection method using CMake cache parsing
  • Replaced SHA256 hash-based slug generation with human-readable format
  • Incorporated configuration options, chemistry, and compiler info into
    slugs
  • Added new imports for regex, pathlib, subprocess, and tempfile
+71/-7   

@Copilot Copilot AI review requested due to automatic review settings August 10, 2025 16:07
@sbryngelson sbryngelson requested a review from a team as a code owner August 10, 2025 16:07
Copy link

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

⏱️ Estimated effort to review: 3 🔵🔵🔵⚪⚪
🧪 No relevant tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Possible Issue

get_compiler_info swallows CMake failures; with check=True and stdout/stderr redirected to DEVNULL, a failure will raise and abort slug generation instead of degrading gracefully.

with tempfile.TemporaryDirectory() as build_dir:
    subprocess.run(
        ["cmake", "-B", build_dir, "-S", ".", "-DCMAKE_VERBOSE_MAKEFILE=ON"],
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
Logic Error

Compiler detection uses multiple ifs instead of elif; later checks can overwrite earlier matches (e.g., 'ifx' then 'clang'), yielding incorrect compiler IDs.

if "gfortran" in name:
    compiler = "gnu"
if "ifort" in name or "ifx" in name:
    compiler = "intel"
if "nvfortran" in name:
    compiler = "nvhpc"
if name == "ftn" or "cray" in name:
    compiler = "cray"
if "pgfortran" in name:
    compiler = "pgi"
if "clang" in name:
    compiler = "clang"
if "flang" in name:
    compiler = "flang"
return compiler
Performance/Side Effects

get_compiler_info runs CMake on every slug generation; this is expensive and creates temp dirs repeatedly. Consider caching or deriving from existing build metadata.

with tempfile.TemporaryDirectory() as build_dir:
    subprocess.run(
        ["cmake", "-B", build_dir, "-S", ".", "-DCMAKE_VERBOSE_MAKEFILE=ON"],
        check=True,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )

    cache = pathlib.Path(build_dir) / "CMakeCache.txt"
    if not cache.exists():
        compiler = "unknown"

    pat = re.compile(r"CMAKE_.*_COMPILER:FILEPATH=(.+)", re.IGNORECASE)
    compiler_name = ""

    for line in cache.read_text(errors="ignore").splitlines():
        if "fortran" in line.lower():
            m = pat.search(line)
            if m:
                compiler_name = pathlib.Path(m.group(1).strip()).name
                break

name = compiler_name.lower()
if "gfortran" in name:
    compiler = "gnu"
if "ifort" in name or "ifx" in name:
    compiler = "intel"
if "nvfortran" in name:
    compiler = "nvhpc"
if name == "ftn" or "cray" in name:
    compiler = "cray"
if "pgfortran" in name:
    compiler = "pgi"
if "clang" in name:
    compiler = "clang"
if "flang" in name:
    compiler = "flang"
return compiler

Copy link

qodo-merge-pro bot commented Aug 10, 2025

PR Code Suggestions ✨

Latest suggestions up to 4a57558

CategorySuggestion                                                                                                                                    Impact
Possible issue
Isolate compiler detection build

Running CMake in the repository root may modify or conflict with the actual
build, and ignores the active case’s CMake configuration. Restrict the probe to
a minimal, isolated CMake project that only detects the Fortran compiler. This
avoids side-effects and makes detection reliable even if the main project
requires specific options.

toolchain/mfc/build.py [39-81]

 def get_compiler_info(self) -> str:
     compiler = "unknown"
-    with tempfile.TemporaryDirectory() as build_dir:
+    # Create an isolated minimal CMake project to detect Fortran compiler without touching repo build dirs
+    cmake_lists = "cmake_minimum_required(VERSION 3.10)\nproject(DetectFC LANGUAGES Fortran)\n"
+    with tempfile.TemporaryDirectory() as tmpdir:
+        tmp_path = pathlib.Path(tmpdir)
+        (tmp_path / "CMakeLists.txt").write_text(cmake_lists)
+        build_dir = tmp_path / "build"
         try:
             subprocess.run(
-                ["cmake", "-B", build_dir, "-S", ".", "-DCMAKE_VERBOSE_MAKEFILE=ON"],
+                ["cmake", "-S", str(tmp_path), "-B", str(build_dir)],
                 check=True,
                 stdout=subprocess.DEVNULL,
                 stderr=subprocess.DEVNULL,
             )
         except (subprocess.CalledProcessError, FileNotFoundError):
             return "unknown"
-
-        cache = pathlib.Path(build_dir) / "CMakeCache.txt"
+        cache = build_dir / "CMakeCache.txt"
         if not cache.exists():
             return "unknown"
-
-        pat = re.compile(r"CMAKE_.*_COMPILER:FILEPATH=(.+)", re.IGNORECASE)
+        pat = re.compile(r"CMAKE_Fortran_COMPILER:FILEPATH=(.+)", re.IGNORECASE)
         compiler_name = ""
-
         for line in cache.read_text(errors="ignore").splitlines():
-            if "fortran" in line.lower():
-                m = pat.search(line)
-                if m:
-                    compiler_name = pathlib.Path(m.group(1).strip()).name
-                    break
-
+            m = pat.search(line)
+            if m:
+                compiler_name = pathlib.Path(m.group(1).strip()).name
+                break
     name = compiler_name.lower()
     if "gfortran" in name:
         compiler = "gnu"
     elif "ifort" in name or "ifx" in name:
         compiler = "intel"
     elif "nvfortran" in name:
         compiler = "nvhpc"
     elif name == "ftn" or "cray" in name:
         compiler = "cray"
     elif "pgfortran" in name:
         compiler = "pgi"
     elif "clang" in name:
         compiler = "clang"
     elif "flang" in name:
         compiler = "flang"
     return compiler
Suggestion importance[1-10]: 9

__

Why: This suggestion correctly identifies a major robustness issue where running cmake in the project root can cause side effects or fail, and it proposes a much safer, isolated approach that is standard practice for this kind of toolchain probing.

High
Sanitize and limit slug length

Replacing the previous hash with a readable slug risks exceeding filesystem path
limits and producing illegal characters from config keys or solution names.
Normalize parts to safe characters and truncate the final slug to a reasonable
length to avoid build directory creation failures.

toolchain/mfc/build.py [83-114]

 def get_slug(self, case: Case ) -> str:
     if self.isDependency:
         return self.name
-
-    # Start with target name
-    parts = [self.name]
-
-    # Add active configuration options
+    def sanitize(s: str) -> str:
+        # allow alnum, dash and underscore
+        return re.sub(r"[^A-Za-z0-9_\-]+", "-", s).strip("-_").lower()
+    parts = [sanitize(self.name)]
     cfg = CFG()
     cfg_parts = []
     for key, value in sorted(cfg.items()):
-        if value:  # Only include enabled options
-            cfg_parts.append(key)
-
+        if value:
+            cfg_parts.append(sanitize(str(key)))
     if cfg_parts:
         parts.append('-'.join(cfg_parts))
-
-    # Add chemistry info if enabled
     if case.params.get('chemistry', 'F') == 'T':
-        parts.append(f"chem-{case.get_cantera_solution().name}")
-
-    # Add case optimization if enabled
+        parts.append(f"chem-{sanitize(case.get_cantera_solution().name)}")
     if case.params.get('case_optimization', False):
         parts.append('opt')
-
-    # Add compiler identifier
     compiler_id = self.get_compiler_info()
     if compiler_id:
-        parts.append(f"fc-{compiler_id}")
+        parts.append(f"fc-{sanitize(compiler_id)}")
+    slug = '_'.join(parts)
+    # truncate to avoid overly long paths
+    return slug[:80]
 
-    # Join all parts with underscores
-    return '_'.join(parts)
-
Suggestion importance[1-10]: 9

__

Why: This suggestion addresses a critical oversight in the new get_slug function, which could lead to build failures due to illegal characters in file paths or exceeding filesystem path length limits, making the build process more robust.

High
Use exact Fortran compiler key

The current pattern and filtering can mistakenly capture C/C++ compiler entries
and miss the exact Fortran key due to greedy matching. Match specifically on
CMAKE_Fortran_COMPILER to avoid false positives and ensure correct detection.

toolchain/mfc/build.py [56-64]

-pat = re.compile(r"CMAKE_.*_COMPILER:FILEPATH=(.+)", re.IGNORECASE)
+pat = re.compile(r"^CMAKE_Fortran_COMPILER:FILEPATH=(.+)$", re.IGNORECASE)
 compiler_name = ""
+for line in cache.read_text(errors="ignore").splitlines():
+    m = pat.search(line)
+    if m:
+        compiler_name = pathlib.Path(m.group(1).strip()).name
+        break
 
-for line in cache.read_text(errors="ignore").splitlines():
-    if "fortran" in line.lower():
-        m = pat.search(line)
-        if m:
-            compiler_name = pathlib.Path(m.group(1).strip()).name
-            break
-
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly points out that the regex for compiler detection is too broad and could lead to incorrect matches; using a more specific pattern for CMAKE_Fortran_COMPILER significantly improves the reliability of the detection logic.

Medium
  • More

Previous suggestions

Suggestions up to commit 4a57558
CategorySuggestion                                                                                                                                    Impact
Possible issue
Sanitize and bound slug length

The new slug may exceed filesystem/path limits and introduce characters from
configuration keys/values not safe for paths. Sanitize components and bound the
slug length, preserving uniqueness with a short hash suffix.

toolchain/mfc/build.py [83-114]

 def get_slug(self, case: Case ) -> str:
     if self.isDependency:
         return self.name
 
-    # Start with target name
-    parts = [self.name]
+    import hashlib
+    def sanitize(s: str) -> str:
+        # allow alnum, dash, underscore only
+        return re.sub(r"[^A-Za-z0-9._-]+", "-", s).strip("-._")
 
-    # Add active configuration options
+    parts: typing.List[str] = [sanitize(self.name)]
+
     cfg = CFG()
     cfg_parts = []
     for key, value in sorted(cfg.items()):
-        if value:  # Only include enabled options
-            cfg_parts.append(key)
-
+        if value:
+            cfg_parts.append(sanitize(str(key)))
     if cfg_parts:
         parts.append('-'.join(cfg_parts))
 
-    # Add chemistry info if enabled
     if case.params.get('chemistry', 'F') == 'T':
-        parts.append(f"chem-{case.get_cantera_solution().name}")
+        parts.append(sanitize(f"chem-{case.get_cantera_solution().name}"))
 
-    # Add case optimization if enabled
     if case.params.get('case_optimization', False):
         parts.append('opt')
 
-    # Add compiler identifier
     compiler_id = self.get_compiler_info()
     if compiler_id:
-        parts.append(f"fc-{compiler_id}")
+        parts.append(sanitize(f"fc-{compiler_id}"))
 
-    # Join all parts with underscores
-    return '_'.join(parts)
+    slug = '_'.join(filter(None, parts)) or "target"
+    # Enforce max length with a stable hash suffix
+    max_len = 80
+    if len(slug) > max_len:
+        h = hashlib.sha256(slug.encode()).hexdigest()[:10]
+        slug = f"{slug[:max_len-11]}_{h}"
+    return slug
Suggestion importance[1-10]: 8

__

Why: This suggestion addresses a critical oversight in the PR, where the new human-readable slug could exceed filesystem path limits or contain unsafe characters, leading to runtime errors. The proposed sanitization and length enforcement are essential for robustness.

Medium
General
Cache and narrow compiler detection

Avoid invoking CMake on every call; it is expensive and may change state. Cache
the detected compiler once per process or derive it from existing
configuration/environment instead of reconfiguring. Also constrain the regex to
Fortran compiler entries to prevent false matches.

toolchain/mfc/build.py [39-81]

 def get_compiler_info(self) -> str:
+    # Simple in-process cache to avoid repeated cmake runs
+    if hasattr(self, "_cached_compiler_id"):
+        return getattr(self, "_cached_compiler_id")  # type: ignore[attr-defined]
+
     compiler = "unknown"
     with tempfile.TemporaryDirectory() as build_dir:
         try:
             subprocess.run(
                 ["cmake", "-B", build_dir, "-S", ".", "-DCMAKE_VERBOSE_MAKEFILE=ON"],
                 check=True,
                 stdout=subprocess.DEVNULL,
                 stderr=subprocess.DEVNULL,
             )
         except (subprocess.CalledProcessError, FileNotFoundError):
-            return "unknown"
+            setattr(self, "_cached_compiler_id", compiler)
+            return compiler
 
         cache = pathlib.Path(build_dir) / "CMakeCache.txt"
         if not cache.exists():
-            return "unknown"
+            setattr(self, "_cached_compiler_id", compiler)
+            return compiler
 
-        pat = re.compile(r"CMAKE_.*_COMPILER:FILEPATH=(.+)", re.IGNORECASE)
+        # Only match the Fortran compiler entry
+        pat = re.compile(r"^CMAKE_Fortran_COMPILER:FILEPATH=(.+)$", re.IGNORECASE)
         compiler_name = ""
-
         for line in cache.read_text(errors="ignore").splitlines():
-            if "fortran" in line.lower():
-                m = pat.search(line)
-                if m:
-                    compiler_name = pathlib.Path(m.group(1).strip()).name
-                    break
+            m = pat.search(line)
+            if m:
+                compiler_name = pathlib.Path(m.group(1).strip()).name
+                break
 
     name = compiler_name.lower()
     if "gfortran" in name:
         compiler = "gnu"
     elif "ifort" in name or "ifx" in name:
         compiler = "intel"
     elif "nvfortran" in name:
         compiler = "nvhpc"
     elif name == "ftn" or "cray" in name:
         compiler = "cray"
     elif "pgfortran" in name:
         compiler = "pgi"
     elif "clang" in name:
         compiler = "clang"
     elif "flang" in name:
         compiler = "flang"
+
+    setattr(self, "_cached_compiler_id", compiler)
     return compiler
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that running cmake is expensive and proposes an effective caching mechanism, which significantly improves performance. It also improves the robustness of compiler detection with a more specific regex.

Medium
Normalize configuration in slug

Including all truthy values may make unrelated numeric/string settings collapse
(e.g., "0" vs 0 vs False) and destabilize the slug across environments.
Normalize booleans only, or include key=value for non-boolean settings to avoid
collisions.

toolchain/mfc/build.py [93-95]

 for key, value in sorted(cfg.items()):
-    if value:  # Only include enabled options
-        cfg_parts.append(key)
+    # Normalize: include boolean flags by key; include key=value for others
+    if isinstance(value, bool):
+        if value:
+            cfg_parts.append(str(key))
+    else:
+        cfg_parts.append(f"{key}={value}")
Suggestion importance[1-10]: 6

__

Why: The suggestion correctly points out that treating all truthy values as simple boolean flags can lead to ambiguous or unstable slugs. The proposed change to include key=value for non-boolean settings makes the generated slug more precise and robust.

Low
Incremental [*]
Handle missing compiler detection

Guard against empty compiler_name before .lower() to avoid an
UnboundLocalError/misclassification when no compiler match is found; return
"unknown" early if no match is detected.

toolchain/mfc/build.py [39-81]

 def get_compiler_info(self) -> str:
     compiler = "unknown"
     with tempfile.TemporaryDirectory() as build_dir:
         try:
             subprocess.run(
                 ["cmake", "-B", build_dir, "-S", ".", "-DCMAKE_VERBOSE_MAKEFILE=ON"],
                 check=True,
                 stdout=subprocess.DEVNULL,
                 stderr=subprocess.DEVNULL,
             )
         except (subprocess.CalledProcessError, FileNotFoundError):
             return "unknown"
 
         cache = pathlib.Path(build_dir) / "CMakeCache.txt"
         if not cache.exists():
             return "unknown"
 
         pat = re.compile(r"CMAKE_.*_COMPILER:FILEPATH=(.+)", re.IGNORECASE)
         compiler_name = ""
 
         for line in cache.read_text(errors="ignore").splitlines():
             if "fortran" in line.lower():
                 m = pat.search(line)
                 if m:
                     compiler_name = pathlib.Path(m.group(1).strip()).name
                     break
 
+    if not compiler_name:
+        return "unknown"
+
     name = compiler_name.lower()
     if "gfortran" in name:
         compiler = "gnu"
     elif "ifort" in name or "ifx" in name:
         compiler = "intel"
     elif "nvfortran" in name:
         compiler = "nvhpc"
     elif name == "ftn" or "cray" in name:
         compiler = "cray"
     elif "pgfortran" in name:
         compiler = "pgi"
     elif "clang" in name:
         compiler = "clang"
     elif "flang" in name:
         compiler = "flang"
     return compiler
Suggestion importance[1-10]: 2

__

Why: The suggestion's reasoning about an UnboundLocalError is incorrect as compiler_name is initialized to "", but the proposed change for an early exit is a minor code quality improvement.

Low
Suggestions up to commit d42a0f7
CategorySuggestion                                                                                                                                    Impact
Possible issue
Isolate and correctly detect Fortran compiler

Running CMake in the repository root can clobber or depend on the project's main
build and fails if the source lacks a CMakeLists. Point CMake to an empty temp
source dir with a minimal CMakeLists that only detects the Fortran compiler.
Also restrict the cache regex to Fortran to avoid picking up C/C++ compilers on
multi-language projects.

toolchain/mfc/build.py [39-82]

 def get_compiler_info(self) -> str:
     compiler = "unknown"
-    with tempfile.TemporaryDirectory() as build_dir:
+    with tempfile.TemporaryDirectory() as build_dir, tempfile.TemporaryDirectory() as src_dir:
+        # Create a minimal CMake project to probe the Fortran compiler without touching the repo
+        cmakelists = pathlib.Path(src_dir) / "CMakeLists.txt"
+        cmakelists.write_text("cmake_minimum_required(VERSION 3.10)\nproject(FCProbe Fortran)\n", encoding="utf-8")
         try:
             subprocess.run(
-                ["cmake", "-B", build_dir, "-S", ".", "-DCMAKE_VERBOSE_MAKEFILE=ON"],
+                ["cmake", "-B", build_dir, "-S", src_dir, "-DCMAKE_VERBOSE_MAKEFILE=ON"],
                 check=True,
                 stdout=subprocess.DEVNULL,
                 stderr=subprocess.DEVNULL,
             )
         except (subprocess.CalledProcessError, FileNotFoundError):
             return "unknown"
 
         cache = pathlib.Path(build_dir) / "CMakeCache.txt"
         if not cache.exists():
-            compiler = "unknown"
-            return compiler
+            return "unknown"
 
-        pat = re.compile(r"CMAKE_.*_COMPILER:FILEPATH=(.+)", re.IGNORECASE)
+        # Only match the Fortran compiler entry
+        pat = re.compile(r"^CMAKE_Fortran_COMPILER:FILEPATH=(.+)$", re.IGNORECASE)
         compiler_name = ""
-
         for line in cache.read_text(errors="ignore").splitlines():
-            if "fortran" in line.lower():
-                m = pat.search(line)
-                if m:
-                    compiler_name = pathlib.Path(m.group(1).strip()).name
-                    break
+            m = pat.search(line)
+            if m:
+                compiler_name = pathlib.Path(m.group(1).strip()).name
+                break
 
     name = compiler_name.lower()
     if "gfortran" in name:
-        compiler = "gnu"
-    elif "ifort" in name or "ifx" in name:
-        compiler = "intel"
-    elif "nvfortran" in name:
-        compiler = "nvhpc"
-    elif name == "ftn" or "cray" in name:
-        compiler = "cray"
-    elif "pgfortran" in name:
-        compiler = "pgi"
-    elif "clang" in name:
-        compiler = "clang"
-    elif "flang" in name:
-        compiler = "flang"
-    return compiler
+        return "gnu"
+    if "ifort" in name or "ifx" in name:
+        return "intel"
+    if "nvfortran" in name:
+        return "nvhpc"
+    if name == "ftn" or "cray" in name:
+        return "cray"
+    if "pgfortran" in name:
+        return "pgi"
+    if "flang" in name:
+        return "flang"
+    # Avoid misclassifying LLVM toolchains by 'clang' path components for Fortran
+    return "unknown"
Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies that running cmake in the project root is fragile and can cause side effects, proposing a much more robust solution by creating an isolated, minimal CMake project for compiler detection.

High
General
Sanitize and bound slug length

Using joined raw config keys may produce very long or filesystem-unfriendly path
segments. Enforce safe characters and length, and hash the tail to keep the slug
deterministic yet bounded. This prevents path length errors and collisions on
different platforms.

toolchain/mfc/build.py [84-115]

 def get_slug(self, case: Case ) -> str:
     if self.isDependency:
         return self.name
 
-    # Start with target name
     parts = [self.name]
 
-    # Add active configuration options
+    # Add active configuration options (enabled keys only)
     cfg = CFG()
-    cfg_parts = []
-    for key, value in sorted(cfg.items()):
-        if value:  # Only include enabled options
-            cfg_parts.append(key)
+    enabled_keys = [k for k, v in sorted(cfg.items()) if v]
+    if enabled_keys:
+        cfg_segment = '-'.join(enabled_keys)
+        # Sanitize: allow alnum, dash, underscore
+        cfg_segment = re.sub(r"[^A-Za-z0-9_-]+", "_", cfg_segment)
+        parts.append(cfg_segment)
 
-    if cfg_parts:
-        parts.append('-'.join(cfg_parts))
+    if case.params.get('chemistry', 'F') == 'T':
+        chem_name = case.get_cantera_solution().name
+        chem_name = re.sub(r"[^A-Za-z0-9._-]+", "_", str(chem_name))
+        parts.append(f"chem-{chem_name}")
 
-    # Add chemistry info if enabled
-    if case.params.get('chemistry', 'F') == 'T':
-        parts.append(f"chem-{case.get_cantera_solution().name}")
-
-    # Add case optimization if enabled
     if case.params.get('case_optimization', False):
         parts.append('opt')
 
-    # Add compiler identifier
     compiler_id = self.get_compiler_info()
     if compiler_id:
         parts.append(f"fc-{compiler_id}")
 
-    # Join all parts with underscores
-    return '_'.join(parts)
+    slug = '_'.join(parts)
+    # Bound length; keep prefix readable, hash the remainder
+    max_len = 100
+    if len(slug) > max_len:
+        import hashlib
+        h = hashlib.sha256(slug.encode()).hexdigest()[:10]
+        slug = f"{slug[:max_len-12]}--{h}"
+    return slug
Suggestion importance[1-10]: 9

__

Why: The suggestion addresses a critical oversight in the PR, where the generated human-readable slug could become too long or contain characters invalid for file paths, preventing potential filesystem errors.

High
Omit unknown compiler tag

When compiler detection fails, compiler_id becomes "unknown", which pollutes
cache keys and prevents distinction across real compilers. Only append the
compiler segment when detection succeeds (non-"unknown"), else omit the tag.

toolchain/mfc/build.py [110-112]

 compiler_id = self.get_compiler_info()
-if compiler_id:
+if compiler_id and compiler_id != "unknown":
     parts.append(f"fc-{compiler_id}")
Suggestion importance[1-10]: 5

__

Why: This is a good suggestion that improves the logic by not including the fc-unknown tag in the build slug when compiler detection fails, making the resulting slug cleaner and more sensible.

Low
Incremental [*]
Use stable source directory

Avoid invoking CMake from arbitrary working directories; point -S to a known
source root (e.g., repository root) or detect it relative to this file to
prevent failures when get_compiler_info is called from other locations.

toolchain/mfc/build.py [39-82]

 def get_compiler_info(self) -> str:
     compiler = "unknown"
+    repo_root = pathlib.Path(__file__).resolve().parents[2]
     with tempfile.TemporaryDirectory() as build_dir:
         try:
             subprocess.run(
-                ["cmake", "-B", build_dir, "-S", ".", "-DCMAKE_VERBOSE_MAKEFILE=ON"],
+                ["cmake", "-B", build_dir, "-S", str(repo_root), "-DCMAKE_VERBOSE_MAKEFILE=ON"],
                 check=True,
                 stdout=subprocess.DEVNULL,
                 stderr=subprocess.DEVNULL,
             )
         except (subprocess.CalledProcessError, FileNotFoundError):
             return "unknown"
 
         cache = pathlib.Path(build_dir) / "CMakeCache.txt"
         if not cache.exists():
-            compiler = "unknown"
-            return compiler
+            return "unknown"
 
         pat = re.compile(r"CMAKE_.*_COMPILER:FILEPATH=(.+)", re.IGNORECASE)
         compiler_name = ""
 
         for line in cache.read_text(errors="ignore").splitlines():
             if "fortran" in line.lower():
                 m = pat.search(line)
                 if m:
                     compiler_name = pathlib.Path(m.group(1).strip()).name
                     break
 
     name = compiler_name.lower()
     if "gfortran" in name:
         compiler = "gnu"
     elif "ifort" in name or "ifx" in name:
         compiler = "intel"
     elif "nvfortran" in name:
         compiler = "nvhpc"
     elif name == "ftn" or "cray" in name:
         compiler = "cray"
     elif "pgfortran" in name:
         compiler = "pgi"
     elif "clang" in name:
         compiler = "clang"
     elif "flang" in name:
         compiler = "flang"
     return compiler
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly identifies that running cmake with -S . is fragile, as it depends on the current working directory. Using an absolute path to the source root based on __file__ makes the function more robust and reliable.

Medium
✅ Suggestions up to commit 4952826
CategorySuggestion                                                                                                                                    Impact
High-level
Avoid side-effectful compiler detection

The new get_compiler_info spins up a temporary CMake configure on every slug
computation, which introduces heavy, fragile side effects (requires cmake,
internet-less env assumptions, correct source root, and write permissions) and
makes slugs non-deterministic across hosts/containers. Consider deriving the
compiler identifier from existing, already-known build context (e.g.,
environment variables, toolchain/config files, or the actual build’s CMake cache
once created) or make it explicitly user-configured to keep slug generation
fast, deterministic, and reproducible.

Examples:

toolchain/mfc/build.py [39-78]
def get_compiler_info(self) -> str:
        compiler = "unknown"
        with tempfile.TemporaryDirectory() as build_dir:
            subprocess.run(
                ["cmake", "-B", build_dir, "-S", ".", "-DCMAKE_VERBOSE_MAKEFILE=ON"],
                check=True,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
            )


 ... (clipped 30 lines)

Solution Walkthrough:

Before:

# In Target class
def get_compiler_info(self) -> str:
    # Create a temporary directory
    # Run a full 'cmake' configure process inside it
    # Parse the generated CMakeCache.txt to find the compiler
    # Return a short name for the compiler
    ...

def get_slug(self, case: Case) -> str:
    ...
    compiler_id = self.get_compiler_info() # Heavy, non-deterministic call
    parts.append(f"fc-{compiler_id}")
    return '_'.join(parts)

After:

# In Target class
def get_compiler_info(self, build_config) -> str:
    # Deterministically get compiler from a reliable source,
    # e.g., from an environment variable or a config object.
    compiler_path = build_config.get_fortran_compiler() # Or os.environ.get("FC")
    compiler_name = self.get_compiler_short_name(compiler_path)
    return compiler_name

def get_slug(self, case: Case, build_config) -> str:
    ...
    compiler_id = self.get_compiler_info(build_config) # Fast & deterministic
    parts.append(f"fc-{compiler_id}")
    return '_'.join(parts)
Suggestion importance[1-10]: 9

__

Why: The suggestion correctly identifies a critical design flaw in get_compiler_info, which introduces significant performance overhead and non-determinism by running a temporary cmake process, undermining the goal of reliable build artifact organization.

High
Possible issue
Early-return if cache missing
Suggestion Impact:The commit changed the code to return "unknown" immediately when the cache file does not exist, matching the suggested early return; it also added a try/except around cmake invocation.

code diff:

             cache = pathlib.Path(build_dir) / "CMakeCache.txt"
             if not cache.exists():
-                compiler = "unknown"
+                return "unknown"
 

After determining the cache is missing, the function continues and later reads
from it, risking a FileNotFoundError. Early-return when the cache is absent to
avoid exceptions and undefined state.

toolchain/mfc/build.py [49-51]

 cache = pathlib.Path(build_dir) / "CMakeCache.txt"
 if not cache.exists():
-    compiler = "unknown"
+    return "unknown"
Suggestion importance[1-10]: 9

__

Why: This suggestion correctly identifies a bug that would cause a FileNotFoundError if the CMakeCache.txt file is not created, and the proposed fix prevents the program from crashing.

High
Expose CMake errors during detection
Suggestion Impact:The commit wrapped the CMake invocation in a try/except and returned "unknown" on failure, addressing error handling. However, it continued to suppress stderr instead of capturing and displaying it, so only part of the suggestion was implemented.

code diff:

+            try:
+                subprocess.run(
+                    ["cmake", "-B", build_dir, "-S", ".", "-DCMAKE_VERBOSE_MAKEFILE=ON"],
+                    check=True,
+                    stdout=subprocess.DEVNULL,
+                    stderr=subprocess.DEVNULL,
+                )
+            except (subprocess.CalledProcessError, FileNotFoundError):
+                return "unknown"

Swallowing both stdout and stderr hides CMake errors and makes debugging
impossible. Capture stderr and surface a concise error if configuration fails,
rather than crashing with a generic exception.

toolchain/mfc/build.py [42-47]

-subprocess.run(
-    ["cmake", "-B", build_dir, "-S", ".", "-DCMAKE_VERBOSE_MAKEFILE=ON"],
-    check=True,
-    stdout=subprocess.DEVNULL,
-    stderr=subprocess.DEVNULL,
-)
+try:
+    subprocess.run(
+        ["cmake", "-B", build_dir, "-S", ".", "-DCMAKE_VERBOSE_MAKEFILE=ON"],
+        check=True,
+        stdout=subprocess.DEVNULL,
+        stderr=subprocess.PIPE,
+        text=True,
+    )
+except subprocess.CalledProcessError as e:
+    cons.print(f"[red]CMake configuration failed while detecting compiler:[/red] {e.stderr.strip() if e.stderr else 'no stderr'}")
+    return "unknown"
Suggestion importance[1-10]: 8

__

Why: This suggestion correctly identifies that suppressing stderr hides crucial debugging information and proposes a robust error handling mechanism that improves debuggability.

Medium
Make compiler match mutually exclusive
Suggestion Impact:The commit replaced the sequential if statements with an elif chain for compiler matching, preventing later matches from overwriting earlier ones. It kept/used "unknown" as the default.

code diff:

         name = compiler_name.lower()
         if "gfortran" in name:
             compiler = "gnu"
-        if "ifort" in name or "ifx" in name:
+        elif "ifort" in name or "ifx" in name:
             compiler = "intel"
-        if "nvfortran" in name:
+        elif "nvfortran" in name:
             compiler = "nvhpc"
-        if name == "ftn" or "cray" in name:
+        elif name == "ftn" or "cray" in name:
             compiler = "cray"
-        if "pgfortran" in name:
+        elif "pgfortran" in name:
             compiler = "pgi"
-        if "clang" in name:
+        elif "clang" in name:
             compiler = "clang"
-        if "flang" in name:
+        elif "flang" in name:
             compiler = "flang"
         return compiler

The sequential 'if' chain allows later matches to overwrite earlier ones (e.g.,
names containing multiple substrings), leading to incorrect identifiers. Use
mutually exclusive branching and normalize unknowns to ensure a single,
deterministic result.

toolchain/mfc/build.py [63-78]

 name = compiler_name.lower()
 if "gfortran" in name:
     compiler = "gnu"
-if "ifort" in name or "ifx" in name:
+elif "ifort" in name or "ifx" in name:
     compiler = "intel"
-if "nvfortran" in name:
+elif "nvfortran" in name:
     compiler = "nvhpc"
-if name == "ftn" or "cray" in name:
+elif name == "ftn" or "cray" in name:
     compiler = "cray"
-if "pgfortran" in name:
+elif "pgfortran" in name:
     compiler = "pgi"
-if "clang" in name:
+elif "clang" in name:
     compiler = "clang"
-if "flang" in name:
+elif "flang" in name:
     compiler = "flang"
+else:
+    compiler = "unknown"
 return compiler
Suggestion importance[1-10]: 7

__

Why: The suggestion correctly points out a potential logic flaw in the series of if statements and improves the code's robustness by using elif for mutually exclusive checks.

Medium

Copilot

This comment was marked as outdated.

@sbryngelson sbryngelson requested a review from Copilot August 10, 2025 16:20
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR replaces the existing SHA256 hash-based build staging directory naming system with human-readable identifiers to improve debugging and build artifact organization. The change makes it easier to identify different compiler configurations and build setups at a glance.

Key Changes:

  • Replace SHA256 hash generation with descriptive slug components
  • Add compiler detection through CMake cache parsing
  • Include configuration options, chemistry info, and compiler type in directory names

compiler = "unknown"
return compiler

pat = re.compile(r"CMAKE_.*_COMPILER:FILEPATH=(.+)", re.IGNORECASE)
Copy link
Preview

Copilot AI Aug 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex pattern is compiled inside the method on every call. Consider moving this to module level or class level as a constant to avoid recompilation.

Suggested change
pat = re.compile(r"CMAKE_.*_COMPILER:FILEPATH=(.+)", re.IGNORECASE)
pat = CMAKE_COMPILER_REGEX

Copilot uses AI. Check for mistakes.

@sbryngelson
Copy link
Member Author

/improve

Copy link

Persistent suggestions updated to latest commit 4a57558

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging this pull request may close these issues.

1 participant