Add new cpu case of check die id #6567

chloerh · 2025-09-16T09:12:07Z

RHEL-187329 - [Topology] Configure 'die_id' in CPU topology part for VM

Test result:
(1/1) type_specific.local.cpu_topology_die_id.on_vm.dies_5: STARTED
(1/1) type_specific.local.cpu_topology_die_id.on_vm.dies_5: PASS (29.71 s)
(1/1) type_specific.local.cpu_topology_die_id.on_vm.dies_2: STARTED
(1/1) type_specific.local.cpu_topology_die_id.on_vm.dies_2: PASS (27.66 s)
(1/1) type_specific.local.cpu_topology_die_id.on_vm.dies_11: STARTED
(1/1) type_specific.local.cpu_topology_die_id.on_vm.dies_11: PASS (50.99 s)

Summary by CodeRabbit

Tests
- Added comprehensive CPU die topology validation for host and guest.
- Introduces variants for 2, 5, and 11 dies with aligned vCPU and NUMA configurations.
- Collects per-CPU topology from within guests and verifies distribution across sockets, dies, cores, and threads.
- Ensures die IDs and CPU groupings match expectations; reports structured errors.
- Includes VM lifecycle handling and cleanup during test execution.

- RHEL-187329 - [Topology] Configure 'die_id' in CPU topology part for VM Signed-off-by: Haijiao Zhao <[email protected]>

coderabbitai · 2025-09-16T09:12:15Z

Walkthrough

Adds a new CPU topology die-id test: a config that defines NUMA cells and die variants, and a Python test that parses guest CPU topology from sysfs, validates die groupings and counts, and optionally checks host capabilities. The test runs in host_capabilities or on_vm mode, with VM XML setup and teardown.
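For readers unfamiliar with the sysfs layout the walkthrough refers to, here is a minimal sketch of reading per-CPU topology and grouping CPUs by die. It is not the PR's parse_cpu_topology; the names read_cpu_topology and cpus_per_die are hypothetical, and it assumes a kernel that exposes physical_package_id, die_id, and core_id under each cpuN/topology directory.

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical helper names; the sysfs file names are the standard ones.
FIELDS = {
    "socket": "physical_package_id",
    "die": "die_id",
    "core": "core_id",
}


def read_cpu_topology(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_index: {"socket": int, "die": int, "core": int}}."""
    topology = {}
    for cpu_dir in Path(sysfs_root).glob("cpu[0-9]*"):
        topo_dir = cpu_dir / "topology"
        if not topo_dir.is_dir():
            continue  # skip entries that expose no topology directory
        topology[int(cpu_dir.name[3:])] = {
            key: int((topo_dir / name).read_text().strip())
            for key, name in FIELDS.items()
        }
    return topology


def cpus_per_die(topology):
    """Group CPU indexes by their (socket, die) pair."""
    groups = defaultdict(set)
    for cpu, info in topology.items():
        groups[(info["socket"], info["die"])].add(cpu)
    return dict(groups)
```

Taking the root directory as a parameter keeps the reader testable against a fake directory tree, which is also how an in-guest probe could be checked without booting a VM.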

Changes

Cohort / File(s)	Summary of Changes
CPU topology die-id test configuration `libvirt/tests/cfg/cpu/cpu_topology_die_id.cfg`	New config defining cpu_topology_die_id. Adds on_vm variants for dies (2, 5, 11). Specifies NUMA cells and builds cpu/topology attrs using parameter expansion. Sets VM vcpu count as dies*8 and strict auto NUMA placement.
CPU topology die-id test logic `libvirt/tests/src/cpu/cpu_topology_die_id.py`	New test module. Adds parsing of per-CPU topology output, validation against die count, host capability checks, VM startup, in-guest sysfs probing, result aggregation, and cleanup. Exposes parse_cpu_topology, check_cpu_topology, run; adds VIRSH_ARGS and LOG.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Tester
  participant Runner as Test Runner
  participant Libvirt as libvirt/virsh
  participant VM as Guest VM
  participant Guest as Guest OS
  participant SysFS as /sys/devices/system/cpu/...

  rect rgb(245,245,255)
    note right of Runner: Mode: host_capabilities
    Tester->>Runner: run(cpu_topology_die_id, host_capabilities)
    Runner->>Libvirt: capability_xml (host NUMA/die info)
    Libvirt-->>Runner: XML capabilities
    Runner->>Runner: Validate die_id distribution per cell
    Runner-->>Tester: Pass/Fail
  end

  rect rgb(245,255,245)
    note right of Runner: Mode: on_vm
    Tester->>Runner: run(cpu_topology_die_id, on_vm, dies=N)
    Runner->>Libvirt: Define VM XML (vcpu=N*8, NUMA, CPU topology)
    Libvirt-->>Runner: VM defined
    Runner->>Libvirt: Start VM
    Libvirt-->>VM: Boot
    Runner->>Guest: Execute topology probe script
    Guest->>SysFS: Read cpu*/topology
    SysFS-->>Guest: per-CPU fields (core, socket, die)
    Guest-->>Runner: Topology text
    Runner->>Runner: parse_cpu_topology()
    Runner->>Runner: check_cpu_topology(die_count=N)
    alt Validation OK
      Runner-->>Tester: Pass with summary
    else Validation errors
      Runner-->>Tester: Fail with errors
    end
    Runner->>Libvirt: Restore/undefine VM XML
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45–90 minutes

Poem

I thump my paw in tidy rows,
Counting dies where vCPU flows—
Two, five, eleven: neat arrays,
NUMA moons in socket haze.
I sniff the sysfs breeze with glee,
“Topology’s true?” Yes—let’s see!
Hop, hop—tests pass; more clover for me. 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check	✅ Passed	The title "Add new cpu case of check die id" directly reflects the primary change in the PR—adding a new CPU test that validates die_id topology (cfg and test module), so it is related and focused on the main intent. It is concise and does not include file lists, emojis, or unrelated information. The phrasing is slightly non‑idiomatic and could be clarified for readability.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.


coderabbitai

Actionable comments posted: 2

🧹 Nitpick comments (7)

libvirt/tests/cfg/cpu/cpu_topology_die_id.cfg (1)
14-16: NUMA cpus mapping is fixed to 0-15; dies=5/11 will leave most vCPUs unmapped.

For dies=5 and 11, vcpu = dies*8 yields 40 and 88 vCPUs, but NUMA cells only cover 0-15. This risks undefined placement or libvirt rejection depending on version. Make the cpulists depend on dies.

Apply this diff to generate dynamic ranges per socket:
-            cpu_numa_attrs = {'numa_cell': [{'id': '0', 'discard': 'yes', 'memory': '512000', 'cpus': '0-7', 'unit': 'KiB'}, {'memAccess': 'shared', 'id': '1', 'memory': '512000', 'cpus': '8-15', 'unit': 'KiB'}]}
+            cpu_numa_attrs = {
+                'numa_cell': [
+                    {'id': '0', 'discard': 'yes', 'memory': '512000',
+                     'cpus': '0-' + str(${dies}*4 - 1), 'unit': 'KiB'},
+                    {'memAccess': 'shared', 'id': '1', 'memory': '512000',
+                     'cpus': str(${dies}*4) + '-' + str(${dies}*8 - 1), 'unit': 'KiB'}
+                ]
+            }
If your environment requires integers in topology, you can also drop the quotes around '2' values in sockets/cores/threads.
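To make the arithmetic behind this nitpick concrete, the sketch below computes the per-cell cpus ranges for each dies variant. It assumes two NUMA cells and 8 vCPUs per die, as described in the comment above; numa_cpu_ranges is a hypothetical helper, not part of the PR or of the config language.

```python
def numa_cpu_ranges(dies, cells=2, vcpus_per_die=8):
    """Split dies * vcpus_per_die vCPUs evenly across NUMA cells.

    Returns one 'first-last' cpulist string per cell.
    """
    total = dies * vcpus_per_die
    if total % cells:
        raise ValueError(f"{total} vCPUs cannot be split evenly across {cells} cells")
    per_cell = total // cells
    return [f"{i * per_cell}-{(i + 1) * per_cell - 1}" for i in range(cells)]


for dies in (2, 5, 11):
    print(dies, numa_cpu_ranges(dies))
# 2 ['0-7', '8-15']
# 5 ['0-19', '20-39']
# 11 ['0-43', '44-87']
```

Only dies=2 matches the fixed 0-7 and 8-15 ranges; the other two variants need the wider ranges for every vCPU to belong to a cell.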
libvirt/tests/src/cpu/cpu_topology_die_id.py (6)
3-7: Remove unused imports/const.

virsh, libvirt, and VIRSH_ARGS are unused. Trim to keep the module clean.
-from virttest import virsh
 from virttest.libvirt_xml import vm_xml
-from virttest.utils_test import libvirt
 from virttest.libvirt_xml import capability_xml
-VIRSH_ARGS = {'ignore_status': False, 'debug': True}
138-141: Docstring mismatch.

Update the docstring to reflect die-id topology validation.
-    Test start VM with hyper-v related features
+    Validate die_id topology via host capabilities or inside a VM.
167-187: Heredoc is safer than echo for multi-line shell script.

Prevents quoting/escape surprises and preserves tabs/newlines reliably.
-            session.cmd(f"echo '{shell_str}' > {shell_file}")
-            LOG.debug(session.cmd(f'cat {shell_file}'))
+            session.cmd(f"cat > {shell_file} <<'EOF'\n{shell_str}\nEOF")
+            LOG.debug(session.cmd_output(f'cat {shell_file}'))
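As an illustration of why the heredoc form is more robust, here is a small sketch that builds the command string. heredoc_write_cmd is a hypothetical helper, not something the PR or the test framework provides; the quoted delimiter ('EOF') is what stops the guest shell from expanding $ variables or tripping over single quotes in the script body.

```python
def heredoc_write_cmd(path, script, marker="EOF"):
    """Build a shell command that writes `script` to `path` verbatim.

    Quoting the marker disables parameter expansion inside the body,
    so quotes and $ signs in the script reach the file unchanged.
    """
    if marker in script.splitlines():
        # A line equal to the marker would end the heredoc early.
        raise ValueError(f"script contains the heredoc marker {marker!r}")
    return f"cat > {path} <<'{marker}'\n{script}\n{marker}"


# A script body with single quotes would break `echo '...' > file`,
# but passes through the heredoc untouched.
print(heredoc_write_cmd("/tmp/topo.sh", "awk '{print $1}' /proc/cpuinfo"))
```

The closing marker must start at column zero, so the command string should not be indented after it is built.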
194-196: test.fail expects a string; join the errors.

Otherwise you pass a list and get an unhelpful representation.
-            if result['valid'] is False:
-                test.fail(result['errors'])
+            if result['valid'] is False:
+                test.fail('; '.join(result['errors']))
142-172: Ensure session/VM cleanup and restore XML after shutdown.

Avoid leaking sessions and restoring XML while the VM is running.
@@
-    vmxml = vm_xml.VMXML.new_from_dumpxml(vm_name)
-    bkxml = vmxml.copy()
+    vmxml = vm_xml.VMXML.new_from_dumpxml(vm_name)
+    bkxml = vmxml.copy()
+    session = None
@@
-            session = vm.wait_for_login()
+            session = vm.wait_for_login()
@@
     finally:
-        bkxml.sync()
+        try:
+            if session:
+                session.close()
+        finally:
+            if vm.is_alive():
+                vm.destroy()
+            bkxml.sync()
Also applies to: 197-199

82-111: Topology checks are fragile for non-contiguous lists.

phylist/dielist equality to single ranges assumes contiguous numbering. Real kernels may return comma-separated lists (e.g., “0-3,32-35”). Consider parsing lists into sets and validating membership instead of string equality.

I can add a tiny cpulist parser and update the checks—want me to push a patch?
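A minimal sketch of such a cpulist parser follows, assuming the usual kernel list format of comma-separated single CPUs and inclusive ranges. parse_cpulist is a hypothetical name and is not taken from the PR.

```python
def parse_cpulist(text):
    """Expand a kernel-style cpulist such as '0-3,32-35' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty input or stray commas
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            if start > end:
                raise ValueError(f"descending range in cpulist: {part!r}")
            cpus.update(range(start, end + 1))
        else:
            cpus.add(int(part))
    return cpus


print(sorted(parse_cpulist("0-3,32-35")))
# [0, 1, 2, 3, 32, 33, 34, 35]
```

Comparing parse_cpulist(actual) with an expected set, or checking membership, makes the validation independent of how the kernel happens to format the list.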

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7f5d14c and 59ae779.

📒 Files selected for processing (2)

libvirt/tests/cfg/cpu/cpu_topology_die_id.cfg (1 hunks)
libvirt/tests/src/cpu/cpu_topology_die_id.py (1 hunks)

🧰 Additional context used

🪛 Ruff (0.12.2)

libvirt/tests/src/cpu/cpu_topology_die_id.py

145-145: Use of possibly insecure function; consider using ast.literal_eval

(S307)

146-146: Local variable cpu_attrs is assigned to but never used

Remove assignment to unused variable cpu_attrs

(F841)

146-146: Use of possibly insecure function; consider using ast.literal_eval

(S307)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)

GitHub Check: Python 3.9
GitHub Check: Python 3.12
GitHub Check: Python 3.8
GitHub Check: Python 3.11

coderabbitai · 2025-09-16T09:20:43Z

libvirt/tests/src/cpu/cpu_topology_die_id.py

+    vm_attrs = eval(params.get('vm_attrs', '{}'))
+    cpu_attrs = eval(params.get('cpu_attrs', '{}'))


⚠️ Potential issue

Replace eval() with ast.literal_eval() and drop unused variable.

Avoid eval on untrusted config. Also cpu_attrs local is unused.

+import ast
@@
-    vm_attrs = eval(params.get('vm_attrs', '{}'))
-    cpu_attrs = eval(params.get('cpu_attrs', '{}'))
+    vm_attrs = ast.literal_eval(params.get('vm_attrs', '{}'))
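The behavioral difference between the two calls can be seen in a short sketch. parse_attrs is a hypothetical wrapper and the sample strings are made up; only ast.literal_eval itself is the real standard-library call.

```python
import ast


def parse_attrs(raw, default="{}"):
    """Parse a dict literal from a config string without executing code."""
    value = ast.literal_eval(raw if raw else default)
    if not isinstance(value, dict):
        raise ValueError(f"expected a dict literal, got {type(value).__name__}")
    return value


# Plain literals parse as expected.
print(parse_attrs("{'vcpu': 16, 'placement': 'static'}"))

# Anything that is not a literal is rejected instead of being executed.
for bad in ("__import__('os').getcwd()", "{'vcpu': 5*8}"):
    try:
        parse_attrs(bad)
    except ValueError as exc:
        print("rejected:", bad, "->", exc)
```

Note the second rejected string: literal_eval does not evaluate arithmetic such as 5*8. If the expanded config value still contains an expression of that kind, switching from eval would need the arithmetic to be resolved before parsing.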

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-    vm_attrs = eval(params.get('vm_attrs', '{}'))
-    cpu_attrs = eval(params.get('cpu_attrs', '{}'))
+    import ast
+    vm_attrs = ast.literal_eval(params.get('vm_attrs', '{}'))


🤖 Prompt for AI Agents

libvirt/tests/src/cpu/cpu_topology_die_id.py around lines 145-146: replace the unsafe eval() calls with ast.literal_eval() for parsing params.get('vm_attrs', '{}') and remove the unused cpu_attrs variable; add "import ast" at the top of the file if not already present and update the assignment to only create vm_attrs = ast.literal_eval(params.get('vm_attrs', '{}')) (delete the cpu_attrs line).

coderabbitai · 2025-09-16T09:20:43Z

libvirt/tests/src/cpu/cpu_topology_die_id.py

+    try:
+        if case == 'host_capabilities':
+            capa = capability_xml.CapabilityXML()
+            cells = capa.cells_topology.get_cell()
+            die_id_dict = {cell.cell_id: [cpu['die_id']
+                                          for cpu in cell.cpu] for cell in cells}
+            die_ids = [set(v) for v in die_id_dict]
+            if not all([len(x) == 1 for x in die_ids]):
+                test.error(
+                    'The values of die_id of the same cell should be the same')
+            die_id_set = set([int(x) for y in die_ids for x in y])
+            if die_id_set != {0, 1}:
+                test.error(
+                    f'The values of die_id should be 0 or 1, not {die_id_set}')
+        elif case == 'on_vm':


⚠️ Potential issue

🧩 Analysis chain

Fix host_capabilities logic: wrong dict iteration and over‑strict die_id check.

for v in die_id_dict iterates keys, not values. Also requiring die_id_set == {0,1} will fail on hosts with other die counts. Validate per‑cell die_id contiguity (0..max) instead.

         if case == 'host_capabilities':
             capa = capability_xml.CapabilityXML()
             cells = capa.cells_topology.get_cell()
-            die_id_dict = {cell.cell_id: [cpu['die_id']
-                                          for cpu in cell.cpu] for cell in cells}
-            die_ids = [set(v) for v in die_id_dict]
-            if not all([len(x) == 1 for x in die_ids]):
-                test.error(
-                    'The values of die_id of the same cell should be the same')
-            die_id_set = set([int(x) for y in die_ids for x in y])
-            if die_id_set != {0, 1}:
-                test.error(
-                    f'The values of die_id should be 0 or 1, not {die_id_set}')
+            die_id_map = {
+                cell.cell_id: [int(cpu['die_id']) for cpu in cell.cpu]
+                for cell in cells
+            }
+            die_sets = [set(v) for v in die_id_map.values()]
+            if any(len(s) == 0 for s in die_sets):
+                test.error('No die_id values found in one or more cells')
+            # Expect zero-based, contiguous die ids within each cell, e.g. {0..N}
+            for cid, s in zip(die_id_map.keys(), die_sets):
+                if sorted(s) != list(range(max(s) + 1)):
+                    test.error(f'Cell {cid} die_id set must be contiguous starting at 0, got {sorted(s)}')

If cells_topology.get_cell() or cell.cpu shapes differ in your tree, adjust attribute names accordingly.
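The key-versus-value iteration problem and the proposed per-cell check can be reproduced with plain dictionaries, without libvirt. This is a sketch under assumptions: check_cell_die_ids is a hypothetical helper, the sample data is invented, and cell ids are assumed to be strings as they would be when read from XML attributes.

```python
def check_cell_die_ids(die_id_map):
    """Return error strings for cells whose die ids are not 0..N contiguous."""
    errors = []
    for cell_id, die_ids in die_id_map.items():
        ids = sorted({int(d) for d in die_ids})
        if not ids:
            errors.append(f"cell {cell_id}: no die_id values found")
        elif ids != list(range(ids[-1] + 1)):
            errors.append(f"cell {cell_id}: die ids {ids} are not contiguous from 0")
    return errors


# Iterating the dict directly walks its keys, so the original
# comprehension builds sets from cell ids, not from die ids.
sample = {"0": ["0", "1"]}
print([set(v) for v in sample])           # [{'0'}]  (characters of the key)
print([set(v) for v in sample.values()])  # die ids, as intended

print(check_cell_die_ids({"0": ["0", "0", "1", "1"], "1": ["0", "2"]}))
```

With string keys the buggy form does not raise; it silently checks the wrong data, which is why the original single-value test could pass regardless of the die ids reported.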

🏁 Script executed:

#!/bin/bash
# Inspect capability_xml API to confirm attribute names used above.
fd -t f capability_*.py | xargs -I{} sh -c 'echo ">>> {}"; rg -n "class CapabilityXML|cells_topology|get_cell|die_id|cpu" -n {} -C2'

Length of output: 0

🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "PWD: $(pwd)"
echo "Top-level files:"
ls -la | sed -n '1,120p'
echo
echo "Searching repository for relevant symbols..."
rg -n --hidden --no-ignore -S "CapabilityXML|capability_xml|cells_topology|get_cell|die_id|cell\.cpu|cell_id" -g '!**/node_modules/**' -g '!**/.git/**' || true
echo
TARGET="libvirt/tests/src/cpu/cpu_topology_die_id.py"
if [ -f "$TARGET" ]; then
  echo "---- Showing $TARGET (first 400 lines) ----"
  nl -ba "$TARGET" | sed -n '1,400p'
else
  echo "Target file not found: $TARGET"
fi

Length of output: 1922

🏁 Script executed:

#!/bin/bash
set -euo pipefail
echo "PWD: $(pwd)"
echo
echo "Listing target directory (if present):"
ls -la libvirt/tests/src/cpu 2>/dev/null || true
FILE="libvirt/tests/src/cpu/cpu_topology_die_id.py"
if [ -f "$FILE" ]; then
  echo "---- Showing $FILE (up to 400 lines) ----"
  sed -n '1,400p' "$FILE"
else
  echo "$FILE not found. Searching for file..."
  if command -v fd >/dev/null 2>&1; then
    fd --hidden --type f 'cpu_topology_die_id.py' || true
  else
    find . -type f -name 'cpu_topology_die_id.py' -print || true
  fi
fi
echo
echo "Searching repository for capability_xml / CapabilityXML / die_id occurrences..."
if command -v rg >/dev/null 2>&1; then
  rg -n -uu "CapabilityXML|capability_xml|cells_topology|get_cell|die_id|cell\.cpu|cell_id" || true
else
  echo "rg not available; falling back to grep"
  grep -RIn --exclude-dir=.git -E "CapabilityXML|capability_xml|cells_topology|get_cell|die_id|cell\.cpu|cell_id" . || true
fi

Length of output: 33720


📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

-    try:
-        if case == 'host_capabilities':
-            capa = capability_xml.CapabilityXML()
-            cells = capa.cells_topology.get_cell()
-            die_id_dict = {cell.cell_id: [cpu['die_id']
-                                          for cpu in cell.cpu] for cell in cells}
-            die_ids = [set(v) for v in die_id_dict]
-            if not all([len(x) == 1 for x in die_ids]):
-                test.error(
-                    'The values of die_id of the same cell should be the same')
-            die_id_set = set([int(x) for y in die_ids for x in y])
-            if die_id_set != {0, 1}:
-                test.error(
-                    f'The values of die_id should be 0 or 1, not {die_id_set}')
-        elif case == 'on_vm':
+    try:
+        if case == 'host_capabilities':
+            capa = capability_xml.CapabilityXML()
+            cells = capa.cells_topology.get_cell()
+            die_id_map = {
+                cell.cell_id: [int(cpu['die_id']) for cpu in cell.cpu]
+                for cell in cells
+            }
+            die_sets = [set(v) for v in die_id_map.values()]
+            if any(len(s) == 0 for s in die_sets):
+                test.error('No die_id values found in one or more cells')
+            # Expect zero-based, contiguous die ids within each cell, e.g. {0..N}
+            for cid, s in zip(die_id_map.keys(), die_sets):
+                if sorted(s) != list(range(max(s) + 1)):
+                    test.error(f'Cell {cid} die_id set must be contiguous starting at 0, got {sorted(s)}')
+        elif case == 'on_vm':

🤖 Prompt for AI Agents

In libvirt/tests/src/cpu/cpu_topology_die_id.py around lines 152-166, the host_capabilities branch iterates the die_id_dict keys instead of values and then enforces a global set equality to {0,1}; change the comprehension to iterate die_id_dict.values(), build per-cell sets of die_id integers, and replace the global equality check with a per-cell contiguity check that verifies each cell's die_id set equals set(range(0, max_die+1)) (i.e. starts at 0 and contains contiguous IDs up to the max for that cell); ensure die_id values are cast to int when forming sets and keep the existing fallback note about adjusting attribute names if cells_topology.get_cell() or cell.cpu differ.

Add new cpu case of check die id

59ae779

- RHEL-187329 - [Topology] Configure 'die_id' in CPU topology part for VM Signed-off-by: Haijiao Zhao <[email protected]>
