-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Add HLD for Memory Statistics Enhancement with New Metrics, Leak Detection, and gNMI Access #1962
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
…ction, and gNMI access Signed-off-by: Arham-Nasir <[email protected]>
/azp run |
No pipelines are associated with this pull request. |
/azp run |
No pipelines are associated with this pull request. |
/azp run |
No pipelines are associated with this pull request. |
Signed-off-by: Arham-Nasir <[email protected]>
/azp run |
No pipelines are associated with this pull request. |
Signed-off-by: Arham-Nasir <[email protected]>
/azp run |
No pipelines are associated with this pull request. |
|
||
## Architecture Design | ||
|
||
The enhancement fits within the existing framework without altering its core structure. The memorystatsd is extended to collect additional metrics and detect leaks, interfacing with hostcfgd for ConfigDB updates. Enhancements were made to the existing gNMI server which processes logs into JSON and makes them remotely accessible. This integrates seamlessly with SONiC’s modular design, leveraging existing daemons and adding gNMI capabilities. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comment raised in community review meeting: why develop a new daemon MemoryStatsd for monitor memory data, which is overlapping with existing procdockerd? Could this new feature design reusing existing daemon or combined them into one daemon?
I see some code PRs are already merged, but this comment is still applicable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your feedback. We’ll carefully review procdockerd for overlap with MemoryStatsd and explore options for reusing or consolidating them into a single daemon. We appreciate your suggestion and will provide an update after our evaluation.
Can you please replace link of memory_statistics_hld.md with original sonic-net/SONiC repo link instead of personal fork link? |
Thanks for pointing it out. Updated |
This section outlines the functional requirements necessary for implementing this HLD in SONiC: | ||
|
||
- **Monitoring Capabilities:** The system must monitor memory metrics for system (Total, Used, Free, Available, Cached, Buffers, Shared), Docker containers and individual processes. | ||
- **Memory Leak Detection:** The feature must analyze memory usage trends over time to detect potential leaks and report them via CLI. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the practical benefit of storing memory histograms on the switch itself? Would it be more efficient to pull this data externally and analyze it off-box?
## Functional Requirements | ||
This section outlines the functional requirements necessary for implementing this HLD in SONiC: | ||
|
||
- **Monitoring Capabilities:** The system must monitor memory metrics for system (Total, Used, Free, Available, Cached, Buffers, Shared), Docker containers and individual processes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Memory leaks are not specific to SONiC—they’re a broader Linux-level concern. It might be better to explore native Linux tooling for leak detection instead of custom mechanisms.
### Core Functionalities | ||
|
||
#### Data Collection and Storage | ||
The `memorystatsd` collects system, Docker and process memory metrics using `psutil` and Docker APIs, storing them as compressed log files for optimized memory usage. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the intent behind the memory stats daemon? Is it based on existing sources like /proc/meminfo or /proc/dockerstats, or is it introducing a completely new daemon?
The `memorystatsd` collects system, Docker and process memory metrics using `psutil` and Docker APIs, storing them as compressed log files for optimized memory usage. | ||
|
||
#### Log Processing and Storage | ||
Logs are processed into JSON for gNMI retrieval with low overhead. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Log rotation and retention need to be defined. Is logrotate being used to manage stored memory metrics?
This PR adds a High-Level Design (HLD) to doc/memory_statistics/memory_statistics_enhancement.md, enhancing the Memory Statistics feature in SONiC. Key additions:
This HLD builds on v1 (memory_statistics_hld).