Add HLD for Memory Statistics Enhancement with New Metrics, Leak Detection, and gNMI Access #1962

Open
wants to merge 5 commits into
base: master

Arham-Nasir
Contributor

@Arham-Nasir Arham-Nasir commented Apr 10, 2025

This PR adds a High-Level Design (HLD) to doc/memory_statistics/memory_statistics_enhancement.md, enhancing the Memory Statistics feature in SONiC. Key additions:

  • Extends metrics to Docker, process, and CPU memory.
  • Adds memory leak detection via trend analysis.
  • Enables remote log access via gNMI.
    This HLD builds on v1 (memory_statistics_hld).

@mssonicbld
Collaborator

/azp run

No pipelines are associated with this pull request.

linux-foundation-easycla bot commented Apr 24, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.


## Architecture Design

The enhancement fits within the existing framework without altering its core structure. `memorystatsd` is extended to collect additional metrics and detect leaks, and it interfaces with `hostcfgd` for ConfigDB updates. The existing gNMI server is enhanced to convert logs into JSON and make them remotely accessible. The design follows SONiC’s modular architecture: it reuses existing daemons and adds gNMI capabilities.
Contributor

@qiluo-msft qiluo-msft May 6, 2025

Comment raised in the community review meeting: why develop a new daemon, MemoryStatsd, to monitor memory data when it overlaps with the existing procdockerd? Could this feature reuse the existing daemon, or could the two be combined into one daemon?

I see some code PRs are already merged, but this comment is still applicable.

Contributor Author

Thanks for your feedback. We’ll carefully review procdockerd for overlap with MemoryStatsd and explore options for reusing or consolidating them into a single daemon. We appreciate your suggestion and will provide an update after our evaluation.

@rminnikanti
Contributor

This PR adds a High-Level Design (HLD) to doc/memory_statistics/memory_statistics_enhancement.md, enhancing the Memory Statistics feature in SONiC. Key additions:

  • Extends metrics to Docker, process, and CPU memory.
  • Adds memory leak detection via trend analysis.
  • Enables remote log access via gNMI.
    This HLD builds on v1 (memory_statistics_hld.md).

Could you please replace the link to memory_statistics_hld.md with the original sonic-net/SONiC repo link instead of the personal fork link?

@Arham-Nasir
Contributor Author

Thanks for pointing it out. Updated.

This section outlines the functional requirements necessary for implementing this HLD in SONiC:

- **Monitoring Capabilities:** The system must monitor memory metrics at the system level (Total, Used, Free, Available, Cached, Buffers, Shared), for Docker containers, and for individual processes.
- **Memory Leak Detection:** The feature must analyze memory usage trends over time to detect potential leaks and report them via CLI.
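
For readers following this thread, "analyzing memory usage trends over time" could be as simple as fitting a line to recent samples and flagging sustained growth. The sketch below is only an illustration of that idea, not the method defined in the HLD: the function names, the `(timestamp, used_bytes)` sample shape, and the 1 KiB/s threshold are all hypothetical.

```python
from statistics import mean


def memory_slope(samples):
    """Least-squares slope of (timestamp_s, used_bytes) samples, in bytes/second."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    times = [t for t, _ in samples]
    values = [v for _, v in samples]
    t_mean = mean(times)
    v_mean = mean(values)
    # Sum of squared deviations of the timestamps (zero if all are equal).
    denom = sum((t - t_mean) ** 2 for t in times)
    if denom == 0:
        raise ValueError("timestamps must not all be equal")
    numer = sum((t - t_mean) * (v - v_mean) for t, v in zip(times, values))
    return numer / denom


def looks_like_leak(samples, min_slope_bytes_per_s=1024.0):
    """Flag a series whose fitted growth rate meets a (hypothetical) threshold."""
    return memory_slope(samples) >= min_slope_bytes_per_s


# Usage: one sample per minute, growing by 2 KiB per second.
growing = [(0, 0), (60, 122880), (120, 245760)]
flat = [(0, 5000), (60, 5000), (120, 5000)]
print(looks_like_leak(growing), looks_like_leak(flat))
```

A slope test like this is cheap enough to run on-box, but it cannot distinguish a leak from legitimate cache growth, so a real detector would likely need a longer window or additional signals.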
Contributor

What is the practical benefit of storing memory histograms on the switch itself? Would it be more efficient to pull this data externally and analyze it off-box?

## Functional Requirements
This section outlines the functional requirements necessary for implementing this HLD in SONiC:

- **Monitoring Capabilities:** The system must monitor memory metrics at the system level (Total, Used, Free, Available, Cached, Buffers, Shared), for Docker containers, and for individual processes.
Contributor

Memory leaks are not specific to SONiC—they’re a broader Linux-level concern. It might be better to explore native Linux tooling for leak detection instead of custom mechanisms.

### Core Functionalities

#### Data Collection and Storage
The `memorystatsd` collects system, Docker and process memory metrics using `psutil` and Docker APIs, storing them as compressed log files for optimized memory usage.
Contributor

What is the intent behind the memory stats daemon? Is it based on existing sources like /proc/meminfo or /proc/dockerstats, or is it introducing a completely new daemon?

The `memorystatsd` collects system, Docker and process memory metrics using `psutil` and Docker APIs, storing them as compressed log files for optimized memory usage.
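
To make the collect-and-compress step concrete, here is a rough sketch of one possible shape for it. It is not the `memorystatsd` implementation: the HLD names `psutil`, whereas this example parses `/proc/meminfo`-style text with the standard library so that it has no third-party dependency. The function names, the output key names, and the gzip JSON-lines file layout are assumptions made for illustration.

```python
import gzip
import json
import time

# Maps /proc/meminfo keys to (hypothetical) snapshot field names.
FIELDS = {
    "MemTotal": "total",
    "MemFree": "free",
    "MemAvailable": "available",
    "Buffers": "buffers",
    "Cached": "cached",
    "Shmem": "shared",
}


def parse_meminfo(text):
    """Extract selected fields from /proc/meminfo-style text, in bytes."""
    snapshot = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if sep and key in FIELDS:
            parts = rest.split()
            if parts:
                # /proc/meminfo reports these values in kB (units of 1024 bytes).
                snapshot[FIELDS[key]] = int(parts[0]) * 1024
    return snapshot


def append_snapshot(path, snapshot, timestamp=None):
    """Append one JSON record to a gzip-compressed log file."""
    record = {
        "timestamp": time.time() if timestamp is None else timestamp,
        "memory": snapshot,
    }
    # Mode "at" appends a new gzip member; Python reads multi-member files back.
    with gzip.open(path, "at", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def read_snapshots(path):
    """Read every record back from the compressed log."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

On a Linux host, `append_snapshot(path, parse_meminfo(open("/proc/meminfo").read()))` would record one sample. Rotation and retention of the resulting files are not addressed by this sketch.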

#### Log Processing and Storage
Logs are processed into JSON for gNMI retrieval with low overhead.
Contributor

Log rotation and retention need to be defined. Is logrotate being used to manage stored memory metrics?

6 participants