hw-mgmt: thermal: ThermalControl memory usage optimization. #2127
base: V.7.0040.4000_BR
1. Optimized memory usage.
2. Fixed potential memory leaks.
3. Add memory usage dump.
4. Add memory usage detailed trace.

Example memory usage on MSN3800
Before:
VIRT = 252336
RES = 23464
After:
VIRT = 177364
RES = 24492

Bug: 4665041
Signed-off-by: Oleksandr Shamray <[email protected]>
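The PR does not say how the VIRT and RES figures were collected. As a minimal sketch, assuming a Linux target where /proc/self/status is available, a process can dump its own virtual and resident sizes as shown below. The helper names parse_vm_status and read_self_memory are hypothetical and are not taken from the PR.

```python
def parse_vm_status(text):
    """Extract VmSize (virtual) and VmRSS (resident) in kB from /proc status text."""
    fields = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        if key in ("VmSize", "VmRSS"):
            # The value looks like "  177364 kB"; keep only the number.
            fields[key] = int(rest.split()[0])
    return fields


def read_self_memory(path="/proc/self/status"):
    """Read the current process memory counters (Linux only)."""
    with open(path) as status_file:
        return parse_vm_status(status_file.read())
```

Keeping the parser separate from the file read lets the parsing be checked against a captured status dump without running on the target switch.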
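Greptile Overview
Greptile Summary
This PR optimizes memory usage in the ThermalControl system. On MSN3800, VIRT drops from 252336 to 177364 (a reduction of 74972, about 30%), while RES rises slightly from 23464 to 24492. The PR does not state units; if the figures are in KiB, as top reports by default, the VIRT saving is about 73 MiB rather than ~75KB. The changes also address memory leaks and add memory profiling capabilities. Key optimizations: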
Issues found:
Code quality improvements:
Confidence Score: 4/5
Important Files Changed
File Analysis
Sequence Diagram
sequenceDiagram
participant TM as ThermalManagement
participant SD as system_device
participant PSU as psu_fan_sensor
participant PROC as Subprocess
participant MEM as Memory Profiler
Note over TM: Memory Optimization Flow
TM->>MEM: Initial snapshot (gmemory_snapshot_profiler)
activate MEM
MEM-->>TM: Baseline memory snapshot
deactivate MEM
loop Thermal Control Cycle
TM->>SD: Poll sensors & handle errors
SD->>SD: clear_fault_list() - Use .clear() not reassignment
Note right of SD: Optimized: Reuses list objects
alt Error counter growth
SD->>SD: Check dict size >= warn_err_limit_entries
SD->>SD: Remove zero-count entries
Note right of SD: Memory leak prevention
end
PSU->>PSU: set_pwm() via i2cset
PSU->>PROC: Popen(['i2cset', ...])
activate PROC
PROC-->>PSU: Result
PSU->>PROC: _terminate_proc() in finally block
deactivate PROC
Note right of PSU: Fixed: Prevents zombie processes
TM->>TM: write_pwm_mlxreg()
TM->>PROC: Popen(['yes']) + Popen(['mlxreg'])
activate PROC
PROC-->>TM: PWM set
TM->>PROC: _terminate_proc() for both processes
deactivate PROC
Note right of TM: Fixed: Ensures cleanup in finally
alt Memory usage > threshold
TM->>MEM: collect_snapshot(self)
activate MEM
MEM-->>TM: Current snapshot
deactivate MEM
TM->>MEM: compare_snapshots(old, new)
MEM-->>TM: Memory growth report (JSON)
TM->>TM: Log detailed memory comparison
TM->>TM: Increase threshold by ALERT_STEP
end
end
Note over TM,MEM: Result: VIRT reduced by 74972 (252336 to 177364), zombie leak fixed
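The diagram names a _terminate_proc() helper that runs in a finally block, but its body is not shown on this page. A minimal sketch of that cleanup pattern, under the assumption that the goal is to stop the child and reap it so no zombie is left behind, might look like this. The functions terminate_proc and run_and_cleanup are illustrative stand-ins, not the PR's code.

```python
import subprocess
import sys


def terminate_proc(proc, wait_s=1.0):
    """Stop a child process if it is still running, reap it, and close its pipes."""
    if proc is None:
        return
    if proc.poll() is None:  # poll() returns None while the child is alive
        proc.terminate()
        try:
            proc.wait(timeout=wait_s)
        except subprocess.TimeoutExpired:
            proc.kill()  # escalate if SIGTERM was ignored
            proc.wait()
    for stream in (proc.stdin, proc.stdout, proc.stderr):
        if stream is not None:
            stream.close()


def run_and_cleanup(cmd, timeout=2.0):
    """Run cmd and guarantee cleanup even if communicate() raises."""
    proc = None
    try:
        proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE)
        out, _ = proc.communicate(timeout=timeout)
        return proc.returncode, out
    finally:
        terminate_proc(proc)
```

Because the cleanup sits in finally, it also runs when communicate() raises subprocess.TimeoutExpired, which is the path where an unreaped child would otherwise be left running.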
Additional Comments (1)
- usr/usr/bin/hw_management_thermal_control_2_5.py, lines 1764-1780. logic: Same timeout issue as in thermal_control.py: the outer loop uses the timeout parameter (2.0s), but subprocess.run has a hardcoded timeout=2.0. With retries, the total wait time could significantly exceed the intended timeout.
3 files reviewed, 2 comments
timestump = current_milli_time()
while current_milli_time() - timestump < timeout:
    try:
        # 1. check if i2c bus is available
        if os.path.isfile("/sys/bus/i2c/devices/i2c-{}/name".format(int(bus))):
            cmd = ['i2cset', '-f', '-y', str(bus), str(addr)] + arg_list
            # 2. execute i2c command
            result = subprocess.run(cmd, shell=False, timeout=2.0,
                                    capture_output=True, check=True)
            return result
        else:
            self.log.warn("I2C bus {} is not available. Try again in 100ms".format(bus))
            time.sleep(0.1)
    except (ValueError, TypeError, OSError, IOError):
        self.log.warn("I2C bus {} is not available".format(bus))
        time.sleep(0.1)
return None
logic: The outer while loop uses parameter timeout (default 2.0s) but subprocess.run has hardcoded timeout=2.0. This could cause the total wait time to be much longer than intended if retries occur. If the outer timeout is 2.0s and subprocess timeout is also 2.0s, a single retry could take 4+ seconds total.
Consider using a shorter subprocess timeout or making the relationship between the two timeouts explicit, e.g.:
timestump = current_milli_time()
subprocess_timeout = min(1.0, timeout / 2)  # Use half of total timeout or 1s max
while current_milli_time() - timestump < timeout * 1000:  # convert to ms
    try:
        # 1. check if i2c bus is available
        if os.path.isfile("/sys/bus/i2c/devices/i2c-{}/name".format(int(bus))):
            cmd = ['i2cset', '-f', '-y', str(bus), str(addr)] + arg_list
            # 2. execute i2c command
            result = subprocess.run(cmd, shell=False, timeout=subprocess_timeout,
                                    capture_output=True, check=True)
            return result
        else:
            self.log.warn("I2C bus {} is not available. Try again in 100ms".format(bus))
            time.sleep(0.1)
    except (ValueError, TypeError, OSError, IOError):
        self.log.warn("I2C bus {} is not available".format(bus))
        time.sleep(0.1)
return None
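Another way to make the relationship between the two timeouts explicit is to derive each attempt's subprocess timeout from the time remaining in the overall budget. The following is only a sketch of that idea, not the PR's code: run_with_deadline and its runner parameter are hypothetical names, and the I2C bus check is omitted for brevity. Note also that subprocess.TimeoutExpired and subprocess.CalledProcessError are not subclasses of OSError, so this sketch catches subprocess.SubprocessError explicitly.

```python
import subprocess
import time


def run_with_deadline(cmd, timeout=2.0, retry_delay=0.1, runner=subprocess.run):
    """Retry cmd until it succeeds or `timeout` seconds have elapsed in total."""
    deadline = time.monotonic() + timeout
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return None  # overall budget exhausted
        try:
            # Each attempt may use at most the time that is still left,
            # so the attempts together cannot exceed the overall budget.
            return runner(cmd, shell=False, timeout=remaining,
                          capture_output=True, check=True)
        except (subprocess.SubprocessError, OSError):
            # Covers a timed-out attempt, a non-zero exit status,
            # and a missing executable.
            pause = min(retry_delay, max(0.0, deadline - time.monotonic()))
            time.sleep(pause)
```

Using time.monotonic() in seconds for both the loop and the subprocess timeout avoids mixing millisecond and second units, and keeps the measurement immune to wall-clock adjustments.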