Skip to content

Conversation

chloerh
Copy link
Contributor

@chloerh chloerh commented Sep 16, 2025

  • RHEL-187329 - [Topology] Configure 'die_id' in CPU topology part for VM

Test result:
(1/1) type_specific.local.cpu_topology_die_id.on_vm.dies_5: STARTED
(1/1) type_specific.local.cpu_topology_die_id.on_vm.dies_5: PASS (29.71 s)
(1/1) type_specific.local.cpu_topology_die_id.on_vm.dies_2: STARTED
(1/1) type_specific.local.cpu_topology_die_id.on_vm.dies_2: PASS (27.66 s)
(1/1) type_specific.local.cpu_topology_die_id.on_vm.dies_11: STARTED
(1/1) type_specific.local.cpu_topology_die_id.on_vm.dies_11: PASS (50.99 s)

Summary by CodeRabbit

  • Tests
    • Added comprehensive CPU die topology validation for host and guest.
    • Introduces variants for 2, 5, and 11 dies with aligned vCPU and NUMA configurations.
    • Collects per-CPU topology from within guests and verifies distribution across sockets, dies, cores, and threads.
    • Ensures die IDs and CPU groupings match expectations; reports structured errors.
    • Includes VM lifecycle handling and cleanup during test execution.

- RHEL-187329 - [Topology] Configure 'die_id' in CPU topology part
for VM

Signed-off-by: Haijiao Zhao <[email protected]>
Copy link

coderabbitai bot commented Sep 16, 2025

Walkthrough

Adds a new CPU topology die-id test: a config defining NUMA cells and die variants, and a Python test that parses guest CPU topology via sysfs, validates die groupings/counts, and optionally checks host capabilities. Test runs in host_capabilities or on_vm modes, with VM XML setup/teardown.

Changes

Cohort / File(s) Summary of Changes
CPU topology die-id test configuration
libvirt/tests/cfg/cpu/cpu_topology_die_id.cfg
New config defining cpu_topology_die_id. Adds on_vm variants for dies (2, 5, 11). Specifies NUMA cells and builds cpu/topology attrs using parameter expansion. Sets VM vcpu count as dies\*8 and strict auto NUMA placement.
CPU topology die-id test logic
libvirt/tests/src/cpu/cpu_topology_die_id.py
New test module. Adds parsing of per-CPU topology output, validation against die count, host capability checks, VM startup, in-guest sysfs probing, result aggregation, and cleanup. Exposes parse_cpu_topology, check_cpu_topology, run; adds VIRSH_ARGS and LOG.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Tester
  participant Runner as Test Runner
  participant Libvirt as libvirt/virsh
  participant VM as Guest VM
  participant Guest as Guest OS
  participant SysFS as /sys/devices/system/cpu/...

  rect rgb(245,245,255)
    note right of Runner: Mode: host_capabilities
    Tester->>Runner: run(cpu_topology_die_id, host_capabilities)
    Runner->>Libvirt: capability_xml (host NUMA/die info)
    Libvirt-->>Runner: XML capabilities
    Runner->>Runner: Validate die_id distribution per cell
    Runner-->>Tester: Pass/Fail
  end

  rect rgb(245,255,245)
    note right of Runner: Mode: on_vm
    Tester->>Runner: run(cpu_topology_die_id, on_vm, dies=N)
    Runner->>Libvirt: Define VM XML (vcpu=N*8, NUMA, CPU topology)
    Libvirt-->>Runner: VM defined
    Runner->>Libvirt: Start VM
    Libvirt-->>VM: Boot
    Runner->>Guest: Execute topology probe script
    Guest->>SysFS: Read cpu*/topology
    SysFS-->>Guest: per-CPU fields (core, socket, die)
    Guest-->>Runner: Topology text
    Runner->>Runner: parse_cpu_topology()
    Runner->>Runner: check_cpu_topology(die_count=N)
    alt Validation OK
      Runner-->>Tester: Pass with summary
    else Validation errors
      Runner-->>Tester: Fail with errors
    end
    Runner->>Libvirt: Restore/undefine VM XML
  end
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45–90 minutes

Poem

I thump my paw in tidy rows,
Counting dies where vCPU flows—
Two, five, eleven: neat arrays,
NUMA moons in socket haze.
I sniff the sysfs breeze with glee,
“Topology’s true?” Yes—let’s see!
Hop, hop—tests pass; more clover for me. 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "Add new cpu case of check die id" directly reflects the primary change in the PR—adding a new CPU test that validates die_id topology (cfg and test module), so it is related and focused on the main intent. It is concise and does not include file lists, emojis, or unrelated information. The phrasing is slightly non‑idiomatic and could be clarified for readability.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (7)
libvirt/tests/cfg/cpu/cpu_topology_die_id.cfg (1)

14-16: NUMA cpus mapping is fixed to 0-15; dies=5/11 will leave most vCPUs unmapped.

For dies=5 and 11, vcpu = dies*8 yields 40 and 88 vCPUs, but NUMA cells only cover 0-15. This risks undefined placement or libvirt rejection depending on version. Make the cpulists depend on dies.

Apply this diff to generate dynamic ranges per socket:

-            cpu_numa_attrs = {'numa_cell': [{'id': '0', 'discard': 'yes', 'memory': '512000', 'cpus': '0-7', 'unit': 'KiB'}, {'memAccess': 'shared', 'id': '1', 'memory': '512000', 'cpus': '8-15', 'unit': 'KiB'}]}
+            cpu_numa_attrs = {
+                'numa_cell': [
+                    {'id': '0', 'discard': 'yes', 'memory': '512000',
+                     'cpus': '0-' + str(${dies}*4 - 1), 'unit': 'KiB'},
+                    {'memAccess': 'shared', 'id': '1', 'memory': '512000',
+                     'cpus': str(${dies}*4) + '-' + str(${dies}*8 - 1), 'unit': 'KiB'}
+                ]
+            }

If your environment requires integers in topology, you can also drop the quotes around '2' values in sockets/cores/threads.

libvirt/tests/src/cpu/cpu_topology_die_id.py (6)

3-7: Remove unused imports/const.

virsh, libvirt, and VIRSH_ARGS are unused. Trim to keep the module clean.

-from virttest import virsh
 from virttest.libvirt_xml import vm_xml
-from virttest.utils_test import libvirt
 from virttest.libvirt_xml import capability_xml
-VIRSH_ARGS = {'ignore_status': False, 'debug': True}

138-141: Docstring mismatch.

Update the docstring to reflect die-id topology validation.

-    Test start VM with hyper-v related features
+    Validate die_id topology via host capabilities or inside a VM.

167-187: Heredoc is safer than echo for multi-line shell script.

Prevents quoting/escape surprises and preserves tabs/newlines reliably.

-            session.cmd(f"echo '{shell_str}' > {shell_file}")
-            LOG.debug(session.cmd(f'cat {shell_file}'))
+            session.cmd(f"cat > {shell_file} <<'EOF'\n{shell_str}\nEOF")
+            LOG.debug(session.cmd_output(f'cat {shell_file}'))

194-196: test.fail expects a string; join the errors.

Otherwise you pass a list and get an unhelpful representation.

-            if result['valid'] is False:
-                test.fail(result['errors'])
+            if result['valid'] is False:
+                test.fail('; '.join(result['errors']))

142-172: Ensure session/VM cleanup and restore XML after shutdown.

Avoid leaking sessions and restoring XML while the VM is running.

@@
-    vmxml = vm_xml.VMXML.new_from_dumpxml(vm_name)
-    bkxml = vmxml.copy()
+    vmxml = vm_xml.VMXML.new_from_dumpxml(vm_name)
+    bkxml = vmxml.copy()
+    session = None
@@
-            session = vm.wait_for_login()
+            session = vm.wait_for_login()
@@
     finally:
-        bkxml.sync()
+        try:
+            if session:
+                session.close()
+        finally:
+            if vm.is_alive():
+                vm.destroy()
+            bkxml.sync()

Also applies to: 197-199


82-111: Topology checks are fragile for non-contiguous lists.

phylist/dielist equality to single ranges assumes contiguous numbering. Real kernels may return comma-separated lists (e.g., “0-3,32-35”). Consider parsing lists into sets and validating membership instead of string equality.

I can add a tiny cpulist parser and update the checks—want me to push a patch?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7f5d14c and 59ae779.

📒 Files selected for processing (2)
  • libvirt/tests/cfg/cpu/cpu_topology_die_id.cfg (1 hunks)
  • libvirt/tests/src/cpu/cpu_topology_die_id.py (1 hunks)
🧰 Additional context used
🪛 Ruff (0.12.2)
libvirt/tests/src/cpu/cpu_topology_die_id.py

145-145: Use of possibly insecure function; consider using ast.literal_eval

(S307)


146-146: Local variable cpu_attrs is assigned to but never used

Remove assignment to unused variable cpu_attrs

(F841)


146-146: Use of possibly insecure function; consider using ast.literal_eval

(S307)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Python 3.9
  • GitHub Check: Python 3.12
  • GitHub Check: Python 3.8
  • GitHub Check: Python 3.11

Comment on lines +145 to +146
vm_attrs = eval(params.get('vm_attrs', '{}'))
cpu_attrs = eval(params.get('cpu_attrs', '{}'))
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Replace eval() with ast.literal_eval() and drop unused variable.

Avoid eval on untrusted config. Also cpu_attrs local is unused.

+import ast
@@
-    vm_attrs = eval(params.get('vm_attrs', '{}'))
-    cpu_attrs = eval(params.get('cpu_attrs', '{}'))
+    vm_attrs = ast.literal_eval(params.get('vm_attrs', '{}'))
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
vm_attrs = eval(params.get('vm_attrs', '{}'))
cpu_attrs = eval(params.get('cpu_attrs', '{}'))
import ast
vm_attrs = ast.literal_eval(params.get('vm_attrs', '{}'))
🧰 Tools
🪛 Ruff (0.12.2)

145-145: Use of possibly insecure function; consider using ast.literal_eval

(S307)


146-146: Local variable cpu_attrs is assigned to but never used

Remove assignment to unused variable cpu_attrs

(F841)


146-146: Use of possibly insecure function; consider using ast.literal_eval

(S307)

🤖 Prompt for AI Agents
libvirt/tests/src/cpu/cpu_topology_die_id.py around lines 145-146: replace the
unsafe eval() calls with ast.literal_eval() for parsing params.get('vm_attrs',
'{}') and remove the unused cpu_attrs variable; add "import ast" at the top of
the file if not already present and update the assignment to only create
vm_attrs = ast.literal_eval(params.get('vm_attrs', '{}')) (delete the cpu_attrs
line).

Comment on lines +152 to +166
try:
if case == 'host_capabilities':
capa = capability_xml.CapabilityXML()
cells = capa.cells_topology.get_cell()
die_id_dict = {cell.cell_id: [cpu['die_id']
for cpu in cell.cpu] for cell in cells}
die_ids = [set(v) for v in die_id_dict]
if not all([len(x) == 1 for x in die_ids]):
test.error(
'The values of die_id of the same cell should be the same')
die_id_set = set([int(x) for y in die_ids for x in y])
if die_id_set != {0, 1}:
test.error(
f'The values of die_id should be 0 or 1, not {die_id_set}')
elif case == 'on_vm':
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

🧩 Analysis chain

Fix host_capabilities logic: wrong dict iteration and over‑strict die_id check.

for v in die_id_dict iterates keys, not values. Also requiring die_id_set == {0,1} will fail on hosts with other die counts. Validate per‑cell die_id contiguity (0..max) instead.

         if case == 'host_capabilities':
             capa = capability_xml.CapabilityXML()
             cells = capa.cells_topology.get_cell()
-            die_id_dict = {cell.cell_id: [cpu['die_id']
-                                          for cpu in cell.cpu] for cell in cells}
-            die_ids = [set(v) for v in die_id_dict]
-            if not all([len(x) == 1 for x in die_ids]):
-                test.error(
-                    'The values of die_id of the same cell should be the same')
-            die_id_set = set([int(x) for y in die_ids for x in y])
-            if die_id_set != {0, 1}:
-                test.error(
-                    f'The values of die_id should be 0 or 1, not {die_id_set}')
+            die_id_map = {
+                cell.cell_id: [int(cpu['die_id']) for cpu in cell.cpu]
+                for cell in cells
+            }
+            die_sets = [set(v) for v in die_id_map.values()]
+            if any(len(s) == 0 for s in die_sets):
+                test.error('No die_id values found in one or more cells')
+            # Expect zero-based, contiguous die ids within each cell, e.g. {0..N}
+            for cid, s in zip(die_id_map.keys(), die_sets):
+                if sorted(s) != list(range(max(s) + 1)):
+                    test.error(f'Cell {cid} die_id set must be contiguous starting at 0, got {sorted(s)}')

If cells_topology.get_cell() or cell.cpu shapes differ in your tree, adjust attribute names accordingly.


🏁 Script executed:

#!/bin/bash
# Inspect capability_xml API to confirm attribute names used above.
fd -t f capability_*.py | xargs -I{} sh -c 'echo ">>> {}"; rg -n "class CapabilityXML|cells_topology|get_cell|die_id|cpu" -n {} -C2'

Length of output: 0


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "PWD: $(pwd)"
echo "Top-level files:"
ls -la | sed -n '1,120p'

echo
echo "Searching repository for relevant symbols..."
rg -n --hidden --no-ignore -S "CapabilityXML|capability_xml|cells_topology|get_cell|die_id|cell\.cpu|cell_id" -g '!**/node_modules/**' -g '!**/.git/**' || true

echo
TARGET="libvirt/tests/src/cpu/cpu_topology_die_id.py"
if [ -f "$TARGET" ]; then
  echo "---- Showing $TARGET (first 400 lines) ----"
  nl -ba "$TARGET" | sed -n '1,400p'
else
  echo "Target file not found: $TARGET"
fi

Length of output: 1922


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "PWD: $(pwd)"
echo
echo "Listing target directory (if present):"
ls -la libvirt/tests/src/cpu 2>/dev/null || true

FILE="libvirt/tests/src/cpu/cpu_topology_die_id.py"
if [ -f "$FILE" ]; then
  echo "---- Showing $FILE (up to 400 lines) ----"
  sed -n '1,400p' "$FILE"
else
  echo "$FILE not found. Searching for file..."
  if command -v fd >/dev/null 2>&1; then
    fd --hidden --type f 'cpu_topology_die_id.py' || true
  else
    find . -type f -name 'cpu_topology_die_id.py' -print || true
  fi
fi

echo
echo "Searching repository for capability_xml / CapabilityXML / die_id occurrences..."
if command -v rg >/dev/null 2>&1; then
  rg -n -uu "CapabilityXML|capability_xml|cells_topology|get_cell|die_id|cell\.cpu|cell_id" || true
else
  echo "rg not available; falling back to grep"
  grep -RIn --exclude-dir=.git -E "CapabilityXML|capability_xml|cells_topology|get_cell|die_id|cell\.cpu|cell_id" . || true
fi

Length of output: 33720


Fix host_capabilities logic: wrong dict iteration and over‑strict die_id check.

for v in die_id_dict iterates keys, not values. Also requiring die_id_set == {0,1} will fail on hosts with other die counts — validate per‑cell die_id contiguity (0..max) instead.

         if case == 'host_capabilities':
             capa = capability_xml.CapabilityXML()
             cells = capa.cells_topology.get_cell()
-            die_id_dict = {cell.cell_id: [cpu['die_id']
-                                          for cpu in cell.cpu] for cell in cells}
-            die_ids = [set(v) for v in die_id_dict]
-            if not all([len(x) == 1 for x in die_ids]):
-                test.error(
-                    'The values of die_id of the same cell should be the same')
-            die_id_set = set([int(x) for y in die_ids for x in y])
-            if die_id_set != {0, 1}:
-                test.error(
-                    f'The values of die_id should be 0 or 1, not {die_id_set}')
+            die_id_map = {
+                cell.cell_id: [int(cpu['die_id']) for cpu in cell.cpu]
+                for cell in cells
+            }
+            die_sets = [set(v) for v in die_id_map.values()]
+            if any(len(s) == 0 for s in die_sets):
+                test.error('No die_id values found in one or more cells')
+            # Expect zero-based, contiguous die ids within each cell, e.g. {0..N}
+            for cid, s in zip(die_id_map.keys(), die_sets):
+                if sorted(s) != list(range(max(s) + 1)):
+                    test.error(f'Cell {cid} die_id set must be contiguous starting at 0, got {sorted(s)}')

If cells_topology.get_cell() or cell.cpu shapes differ in your tree, adjust attribute names accordingly.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
try:
if case == 'host_capabilities':
capa = capability_xml.CapabilityXML()
cells = capa.cells_topology.get_cell()
die_id_dict = {cell.cell_id: [cpu['die_id']
for cpu in cell.cpu] for cell in cells}
die_ids = [set(v) for v in die_id_dict]
if not all([len(x) == 1 for x in die_ids]):
test.error(
'The values of die_id of the same cell should be the same')
die_id_set = set([int(x) for y in die_ids for x in y])
if die_id_set != {0, 1}:
test.error(
f'The values of die_id should be 0 or 1, not {die_id_set}')
elif case == 'on_vm':
try:
if case == 'host_capabilities':
capa = capability_xml.CapabilityXML()
cells = capa.cells_topology.get_cell()
die_id_map = {
cell.cell_id: [int(cpu['die_id']) for cpu in cell.cpu]
for cell in cells
}
die_sets = [set(v) for v in die_id_map.values()]
if any(len(s) == 0 for s in die_sets):
test.error('No die_id values found in one or more cells')
# Expect zero-based, contiguous die ids within each cell, e.g. {0..N}
for cid, s in zip(die_id_map.keys(), die_sets):
if sorted(s) != list(range(max(s) + 1)):
test.error(f'Cell {cid} die_id set must be contiguous starting at 0, got {sorted(s)}')
elif case == 'on_vm':
🤖 Prompt for AI Agents
In libvirt/tests/src/cpu/cpu_topology_die_id.py around lines 152-166, the
host_capabilities branch iterates the die_id_dict keys instead of values and
then enforces a global set equality to {0,1}; change the comprehension to
iterate die_id_dict.values(), build per-cell sets of die_id integers, and
replace the global equality check with a per-cell contiguity check that verifies
each cell's die_id set equals set(range(0, max_die+1)) (i.e. starts at 0 and
contains contiguous IDs up to the max for that cell); ensure die_id values are
cast to int when forming sets and keep the existing fallback note about
adjusting attribute names if cells_topology.get_cell() or cell.cpu differ.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant