Add shm size check #978
Summary of Changes
Hello @flyinglandlord, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces a proactive shared memory (`/dev/shm`) monitoring feature to LightLLM. It alerts users to potentially insufficient SHM space, which is critical for performance, through continuous periodic checks plus an option to run only a one-time check at startup.
Highlights
- New Feature: SHM Size Monitoring: LightLLM now periodically checks the available `/dev/shm` size and logs a warning if it falls below the recommended 128 GB threshold. This helps users identify memory issues that could impact performance.
- New Command-Line Argument: A new `--disable-shm-warning` argument has been added to `api_cli.py`. When it is set, LightLLM performs a single SHM size check during startup and then disables further periodic checks, which suits users who prefer less verbose output.
Code Review
This PR introduces a shared memory size check, enhancing robustness. The review suggests clarifying the memory check (total vs. available) and improving maintainability via constants and import conventions.
    def get_shm_size_gb():
        """
        Get the total size of /dev/shm (in GB).
        """
        try:
            shm_path = "/dev/shm"
            if not os.path.exists(shm_path):
                logger.error(f"{shm_path} not exist, this may indicate a system or Docker configuration anomaly.")
                return 0

            # shutil.disk_usage returns (total, used, free)
            total_bytes = shutil.disk_usage(shm_path).total
            total_gb = total_bytes / (1024 ** 3)
            return total_gb
        except Exception as e:
            logger.error(f"Error getting /dev/shm size: {e}")
            return 0
The function calculates the total SHM size, but the PR description refers to "available" space, which can be misleading. To check available space, use `shutil.disk_usage(shm_path).free` and rename the function to `get_shm_free_size_gb` for clarity.
Suggested change:
    def get_shm_free_size_gb():
        """
        Get the available (free) size of /dev/shm (in GB).
        """
        try:
            shm_path = "/dev/shm"
            if not os.path.exists(shm_path):
                logger.error(f"{shm_path} not exist, this may indicate a system or Docker configuration anomaly.")
                return 0
            # shutil.disk_usage returns (total, used, free)
            free_bytes = shutil.disk_usage(shm_path).free
            free_gb = free_bytes / (1024 ** 3)
            return free_gb
        except Exception as e:
            logger.error(f"Error getting /dev/shm size: {e}")
            return 0
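The total-versus-free distinction raised in this comment can be seen by reading both fields from the same `shutil.disk_usage` call. The following is a minimal sketch, not code from the PR: the helper name `shm_usage_gb` and the constant `BYTES_PER_GB` are hypothetical, introduced only for illustration.

```python
import shutil

BYTES_PER_GB = 1024 ** 3  # hypothetical constant; the PR inlines (1024 ** 3)


def shm_usage_gb(path="/dev/shm"):
    """Return (total_gb, free_gb) for the filesystem that contains `path`.

    Hypothetical helper, not part of the PR. It raises FileNotFoundError
    if `path` does not exist, instead of logging and returning 0.
    """
    usage = shutil.disk_usage(path)  # named tuple with total, used, free (bytes)
    return usage.total / BYTES_PER_GB, usage.free / BYTES_PER_GB
```

On a host where `/dev/shm` is partly filled, the two values differ, so a threshold check against the total can pass while the free space is already below the threshold.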
    required_size = 128  # 128G
    if shm_size < required_size:
        logger.warning(f"{RED}Available shm size is less than 128G: {shm_size:.2f}G{ENDC}")
    else:
        logger.info(f"{GREEN}/dev/shm available space is sufficient ({shm_size:.2f} GB >= {required_size} GB).{ENDC}")
For better maintainability, define `required_size` as a constant `REQUIRED_SIZE_GB` and use it in the warning message to avoid hardcoding the value `128G`.
Suggested change:
    REQUIRED_SIZE_GB = 128  # 128G
    if shm_size < REQUIRED_SIZE_GB:
        logger.warning(f"{RED}Available shm size is less than {REQUIRED_SIZE_GB}G: {shm_size:.2f}G{ENDC}")
    else:
        logger.info(f"{GREEN}/dev/shm available space is sufficient ({shm_size:.2f} GB >= {REQUIRED_SIZE_GB} GB).{ENDC}")
    import threading

    def periodic_shm_warning():
        while True:
            check_shm_size()
            time.sleep(120)  # print the warning log every 120 seconds

    shm_warning_thread = threading.Thread(target=periodic_shm_warning, daemon=True)
    shm_warning_thread.start()
For style and maintainability:
- Move `import threading` to the top of the file (PEP 8).
- Define the sleep interval `120` as a named module-level constant (e.g., `SHM_CHECK_INTERVAL_S = 120`).
Since I can't suggest changes outside the diff, I'll define the constant locally, but consider moving both the import and the constant to the module level.
Suggested change:
    import threading
    SHM_CHECK_INTERVAL_S = 120  # Consider moving this to a module-level constant
    def periodic_shm_warning():
        while True:
            check_shm_size()
            time.sleep(SHM_CHECK_INTERVAL_S)  # print the warning log every 120 seconds
    shm_warning_thread = threading.Thread(target=periodic_shm_warning, daemon=True)
    shm_warning_thread.start()
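For reference, the module-level layout this comment recommends might look like the sketch below. It is an illustration under assumptions, not the PR's code: `start_periodic_check` is a hypothetical helper, the check function is passed in as a parameter, and `time.sleep` is swapped for `threading.Event.wait` so the loop can be stopped early (the PR's loop runs until the process exits).

```python
import threading  # import at the top of the module, per PEP 8

SHM_CHECK_INTERVAL_S = 120  # seconds between checks; value taken from the PR


def start_periodic_check(check_fn, interval_s=SHM_CHECK_INTERVAL_S, stop_event=None):
    """Call check_fn every interval_s seconds in a daemon thread.

    Hypothetical helper: returns (thread, stop_event). Setting stop_event
    ends the loop after the current check.
    """
    if stop_event is None:
        stop_event = threading.Event()

    def _loop():
        while not stop_event.is_set():
            check_fn()
            # Unlike time.sleep, Event.wait returns early once the event is set.
            stop_event.wait(interval_s)

    thread = threading.Thread(target=_loop, daemon=True)
    thread.start()
    return thread, stop_event
```

Passing the interval as a parameter with the constant as its default keeps the 120-second value in one place while letting tests use a much shorter interval.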
New startup argument: `--disable-shm-warning`.
By default, LightLLM now checks the available SHM size every 120 seconds and logs a warning if it is below the recommended threshold (128 GB).
If `--disable-shm-warning` is set, LightLLM performs a one-time SHM size check during startup and then disables further periodic checks.
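The behaviour described above can be sketched with `argparse`. This is a hedged illustration only: the PR excerpt does not show how `api_cli.py` declares the flag, so the `action="store_true"` declaration, the help text, and the helper names `build_parser` and `run_shm_checks` are all assumptions.

```python
import argparse


def build_parser():
    """Build a parser with the new flag (declaration style is assumed)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--disable-shm-warning",
        action="store_true",  # assumption: a plain boolean switch
        help="check /dev/shm once at startup and skip the periodic checks",
    )
    return parser


def run_shm_checks(args, check_once, start_periodic):
    """Dispatch to a one-time check or to the periodic checker.

    check_once and start_periodic are hypothetical callables standing in
    for the PR's check function and its background-thread startup.
    """
    if args.disable_shm_warning:
        check_once()        # single check at startup, then nothing further
    else:
        start_periodic()    # the periodic loop performs its first check immediately
```

For example, `run_shm_checks(build_parser().parse_args(["--disable-shm-warning"]), check_once, start_periodic)` calls only `check_once`, while parsing an empty argument list calls only `start_periodic`.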