
@haircommander
Contributor

@haircommander haircommander commented Oct 20, 2025

What type of PR is this?

/kind feature

What this PR does / why we need it:

Which issue(s) this PR fixes:

Special notes for your reviewer:

Does this PR introduce a user-facing change?

add tests for metric descriptors in critest

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. kind/feature Categorizes issue or PR as related to a new feature. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Oct 20, 2025
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: haircommander

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels Oct 20, 2025
@haircommander haircommander force-pushed the metrics-tests branch 2 times, most recently from 37a4117 to 5188a10 Compare October 20, 2025 19:18
@haircommander
Contributor Author

cc @akhilerm

Comment on lines 44 to 47
"container_fs_inodes_free",
"container_fs_inodes_total",
"container_fs_io_current",
"container_fs_io_time_seconds_total",
Member


The fs metrics are currently not implemented in CRI-O. ref: cri-o/cri-o#9344

Contributor Author


Yeah, there are a number of these that CRI-O doesn't support yet. My plan is to get consensus on the list, then fill in the gaps.


By("create container in pod")
ic := f.CRIClient.CRIImageClient
containerID := framework.CreatePauseContainer(rc, ic, podID, podConfig, "container-for-metrics-")
Member


Should we use CreateDefaultContainer rather than the pause container?

"container_fs_write_seconds_total",
"container_fs_writes_merged_total",
"container_fs_writes_total",
"container_last_seen",
Member


A few of these metrics are also not yet implemented in containerd.

Contributor Author


Do you have a list handy of the ones you aren't supporting yet? Is it because you don't plan to, or because you haven't gotten to them yet?
I think for this KEP to go beta, each runtime should really support the full set (I say this knowing CRI-O is missing some).

Member


From containerd, the following metrics are not reported by the metric descriptors:

          "container_cpu_load_average_10s",
          "container_cpu_load_d_average_10s",
          "container_file_descriptors",
          "container_last_seen",
          "container_oom_events_total",
          "container_pressure_cpu_stalled_seconds_total",
          "container_pressure_cpu_waiting_seconds_total",
          "container_pressure_io_stalled_seconds_total",
          "container_pressure_io_waiting_seconds_total",
          "container_pressure_memory_stalled_seconds_total",
          "container_pressure_memory_waiting_seconds_total",
          "container_sockets",
          "container_spec_cpu_period",
          "container_spec_cpu_shares",
          "container_spec_memory_limit_bytes",
          "container_spec_memory_reservation_limit_bytes",
          "container_spec_memory_swap_limit_bytes",
          "container_start_time_seconds",
          "container_tasks_state",
          "container_threads",
          "container_ulimits_soft",

Member


These are now added as part of #12426.

Checking why the CI fails on containerd main.

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Nov 5, 2025
@haircommander haircommander changed the title WIP critest: test metric and metric descriptors ritest: test metric and metric descriptors Nov 5, 2025
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 5, 2025
@haircommander haircommander changed the title ritest: test metric and metric descriptors critest: test metric and metric descriptors Nov 5, 2025
@haircommander
Contributor Author

Once cri-o/cri-o#9571 merges, this should pass for CRI-O. I could still use @akhilerm's help figuring out why the containerd tests are still failing.

@haircommander
Contributor Author

  Nov  6 17:55:09.926: INFO: Unexpected error occurred: rpc error: code = Unknown desc = container create failed: time="2025-11-06T17:55:09Z" level=error msg="runc create failed: unable to start container process: error during container init: error closing exec fds: get handle to /proc/thread-self/fd: unsafe procfs detected: openat2 fsmount:fscontext:proc/thread-self/fd/: operation not permitted"

CRI-O failures are expected and unrelated: opencontainers/runc#4968

@haircommander haircommander changed the title critest: test metric and metric descriptors critest: test metric descriptors Nov 6, 2025
@haircommander
Contributor Author

I've pared this down to just testing that the expected metric descriptors are present. CRI-O was hitting issues with diskIO metrics being tricky to trigger reliably (the io.stat file doesn't get created when the container hasn't done any block IO), and containerd is returning empty metrics for some reason. I think adding the pod metrics tests can be a follow-up.

}
}

Expect(missingMetrics).To(BeEmpty(), "Expected %s metrics to be present and they were not", strings.Join(missingMetrics, " "))
Member


Suggested change
Expect(missingMetrics).To(BeEmpty(), "Expected %s metrics to be present and they were not", strings.Join(missingMetrics, " "))
Expect(missingMetrics).To(BeEmpty(), "Expected metrics missing: %s", strings.Join(missingMetrics, ", "))
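For context on what `missingMetrics` holds, here is a minimal, self-contained Go sketch of computing it as a set difference between expected names and reported descriptors, using the suggested message format. `metricDescriptor` is a local stand-in assumed to mirror only the `Name` field of the CRI `MetricDescriptor` message, and `missingMetrics` as a function is hypothetical; the PR's actual test may build the list differently.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// metricDescriptor is a local stand-in for the CRI MetricDescriptor message
// (assumed: only the Name field matters here) so the sketch compiles alone.
type metricDescriptor struct {
	Name string
}

// missingMetrics (hypothetical) returns the expected names that no descriptor
// reports, sorted so the failure message is stable across runs.
func missingMetrics(expected []string, descriptors []metricDescriptor) []string {
	reported := make(map[string]struct{}, len(descriptors))
	for _, d := range descriptors {
		reported[d.Name] = struct{}{}
	}
	var missing []string
	for _, name := range expected {
		if _, ok := reported[name]; !ok {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

func main() {
	expected := []string{"container_last_seen", "container_fs_io_current", "container_cpu_usage_seconds_total"}
	got := []metricDescriptor{{Name: "container_cpu_usage_seconds_total"}}
	if missing := missingMetrics(expected, got); len(missing) > 0 {
		// Same shape as the suggested Gomega failure message.
		fmt.Printf("Expected metrics missing: %s\n", strings.Join(missing, ", "))
	}
}
```

Sorting before joining keeps the comma-separated failure message deterministic, which makes CI diffs between runtimes easier to compare.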

}

// testPodSandboxMetrics verifies that metrics are present for the specified pod.
func testPodSandboxMetrics(allMetrics []*runtimeapi.PodSandboxMetrics, podID string) {
Member


Are we still keeping this function? Only the metric descriptors test is needed, right?


Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/feature Categorizes issue or PR as related to a new feature. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.


4 participants