
Add shm size check #978


Open: wants to merge 2 commits into base: main

flyinglandlord
Collaborator

New startup argument: --disable-shm-warning.

By default, LightLLM will now periodically check the available SHM size every 120 seconds and log warnings if it's below the recommended threshold (128GB).

If --disable-shm-warning is set, LightLLM will perform a one-time SHM size check only during startup, and then disable further periodic checks.
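
The flag semantics described above can be sketched with `argparse`. This is a minimal illustration, not the PR's actual code: `build_parser` and `plan_shm_checks` are hypothetical names, and the real parser in api_cli.py has many more options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical stand-in for the real CLI parser.
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--disable-shm-warning",
        action="store_true",  # absent -> False, present -> True
        help="Check the SHM size once at startup and skip the periodic check.",
    )
    return parser


def plan_shm_checks(argv: list[str]) -> str:
    """Return which checking mode the given command line selects."""
    args = build_parser().parse_args(argv)
    # argparse maps the dashes in the flag name to underscores.
    return "startup-only" if args.disable_shm_warning else "periodic"
```

With no arguments the sketch selects the periodic mode; passing `--disable-shm-warning` selects the one-time startup check.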


@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @flyinglandlord, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a proactive shared memory (/dev/shm) monitoring feature to LightLLM. It aims to alert users about potentially insufficient SHM space, which is critical for optimal performance, by providing both continuous periodic checks and an option for a one-time startup verification.

Highlights

  • New Feature: SHM Size Monitoring: LightLLM now includes a mechanism to periodically check the available /dev/shm size, logging warnings if it falls below the recommended 128GB threshold. This helps users identify potential memory issues that could impact performance.
  • New Command-Line Argument: A new --disable-shm-warning argument has been added to api_cli.py. When this argument is set, LightLLM will perform a single SHM size check only during startup and then disable further periodic checks, providing flexibility for users who prefer a less verbose output.


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This PR introduces a shared memory size check, enhancing robustness. The review suggests clarifying the memory check (total vs. available) and improving maintainability via constants and import conventions.

Comment on lines +23 to +39
def get_shm_size_gb():
    """
    Get the total size of /dev/shm in GB.
    """
    try:
        shm_path = "/dev/shm"
        if not os.path.exists(shm_path):
            logger.error(f"{shm_path} not exist, this may indicate a system or Docker configuration anomaly.")
            return 0

        # shutil.disk_usage returns (total, used, free)
        total_bytes = shutil.disk_usage(shm_path).total
        total_gb = total_bytes / (1024 ** 3)
        return total_gb
    except Exception as e:
        logger.error(f"Error getting /dev/shm size: {e}")
        return 0


high

The function calculates the total SHM size but the PR description refers to "available" space. This can be misleading. To check for available space, use shutil.disk_usage(shm_path).free and rename the function to get_shm_free_size_gb for clarity.

Suggested change
-def get_shm_size_gb():
-    """
-    Get the total size of /dev/shm in GB.
-    """
-    try:
-        shm_path = "/dev/shm"
-        if not os.path.exists(shm_path):
-            logger.error(f"{shm_path} not exist, this may indicate a system or Docker configuration anomaly.")
-            return 0
-        # shutil.disk_usage returns (total, used, free)
-        total_bytes = shutil.disk_usage(shm_path).total
-        total_gb = total_bytes / (1024 ** 3)
-        return total_gb
-    except Exception as e:
-        logger.error(f"Error getting /dev/shm size: {e}")
-        return 0
+def get_shm_free_size_gb():
+    """
+    Get the available size of /dev/shm in GB.
+    """
+    try:
+        shm_path = "/dev/shm"
+        if not os.path.exists(shm_path):
+            logger.error(f"{shm_path} not exist, this may indicate a system or Docker configuration anomaly.")
+            return 0
+        # shutil.disk_usage returns (total, used, free)
+        free_bytes = shutil.disk_usage(shm_path).free
+        free_gb = free_bytes / (1024 ** 3)
+        return free_gb
+    except Exception as e:
+        logger.error(f"Error getting /dev/shm size: {e}")
+        return 0
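
To see the total-versus-free distinction concretely, the fields returned by `shutil.disk_usage` can be printed side by side. The helper below is a hypothetical sketch (`disk_usage_gb` is not part of the PR); it takes the path as a parameter so it can be tried on any mount, not only /dev/shm.

```python
import shutil

BYTES_PER_GIB = 1024 ** 3


def disk_usage_gb(path: str) -> dict[str, float]:
    """Return total, used, and free space for the filesystem holding `path`, in GiB."""
    usage = shutil.disk_usage(path)  # named tuple: (total, used, free), in bytes
    return {
        "total": usage.total / BYTES_PER_GIB,
        "used": usage.used / BYTES_PER_GIB,
        "free": usage.free / BYTES_PER_GIB,
    }


if __name__ == "__main__":
    # "." is used here so the sketch runs anywhere; pass "/dev/shm" on Linux.
    for name, value in disk_usage_gb(".").items():
        print(f"{name}: {value:.2f} GiB")
```

On a tmpfs mount such as /dev/shm, `total` reflects the configured size limit and `free` shrinks as processes allocate shared memory, so the two checks can disagree on a busy host.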

Comment on lines +47 to +51
required_size = 128  # 128G
if shm_size < required_size:
    logger.warning(f"{RED}Available shm size is less than 128G: {shm_size:.2f}G{ENDC}")
else:
    logger.info(f"{GREEN}/dev/shm available space is sufficient ({shm_size:.2f} GB >= {required_size} GB).{ENDC}")


medium

For better maintainability, define required_size as a constant REQUIRED_SIZE_GB and use it in the warning message to avoid hardcoding the value 128G.

Suggested change
-required_size = 128  # 128G
-if shm_size < required_size:
-    logger.warning(f"{RED}Available shm size is less than 128G: {shm_size:.2f}G{ENDC}")
-else:
-    logger.info(f"{GREEN}/dev/shm available space is sufficient ({shm_size:.2f} GB >= {required_size} GB).{ENDC}")
+REQUIRED_SIZE_GB = 128  # 128G
+if shm_size < REQUIRED_SIZE_GB:
+    logger.warning(f"{RED}Available shm size is less than {REQUIRED_SIZE_GB}G: {shm_size:.2f}G{ENDC}")
+else:
+    logger.info(f"{GREEN}/dev/shm available space is sufficient ({shm_size:.2f} GB >= {REQUIRED_SIZE_GB} GB).{ENDC}")
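
One way to make the threshold logic easy to test is to separate the decision from the logging call. The sketch below assumes a hypothetical helper, `classify_shm_size`, that returns the log level and message instead of logging directly; the color codes from the PR are left out for brevity.

```python
REQUIRED_SIZE_GB = 128  # recommended threshold from the PR description


def classify_shm_size(shm_size_gb: float, required_gb: float = REQUIRED_SIZE_GB) -> tuple[str, str]:
    """Return (log level, message) for a measured shm size in GB."""
    if shm_size_gb < required_gb:
        return "warning", f"shm size is less than {required_gb}G: {shm_size_gb:.2f}G"
    return "info", f"/dev/shm space is sufficient ({shm_size_gb:.2f} GB >= {required_gb} GB)."
```

A caller could then dispatch with something like `getattr(logger, level)(message)`, and the threshold appears in exactly one place.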

Comment on lines +100 to +108
import threading

def periodic_shm_warning():
    while True:
        check_shm_size()
        time.sleep(120)  # Log the warning every 120 seconds

shm_warning_thread = threading.Thread(target=periodic_shm_warning, daemon=True)
shm_warning_thread.start()


medium

For style and maintainability:

  • Move import threading to the top of the file (PEP 8).
  • Define the sleep interval 120 as a named constant (e.g., SHM_CHECK_INTERVAL_S = 120) at the module level.

Since I can't suggest changes outside the diff, I'll define the constant locally, but consider moving both the import and the constant to the module level.

Suggested change
-import threading
-def periodic_shm_warning():
-    while True:
-        check_shm_size()
-        time.sleep(120)  # Log the warning every 120 seconds
-shm_warning_thread = threading.Thread(target=periodic_shm_warning, daemon=True)
-shm_warning_thread.start()
+import threading
+SHM_CHECK_INTERVAL_S = 120  # Consider moving this to a module-level constant
+def periodic_shm_warning():
+    while True:
+        check_shm_size()
+        time.sleep(SHM_CHECK_INTERVAL_S)  # Log the warning every 120 seconds
+shm_warning_thread = threading.Thread(target=periodic_shm_warning, daemon=True)
+shm_warning_thread.start()
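
As a possible variation on the loop above (not something the PR or the review proposes), the sleep can be replaced with `threading.Event.wait`, which lets the caller stop the checker early. `start_periodic_check` is a hypothetical name, and the check function is passed in so the sketch stays self-contained.

```python
import threading
from typing import Callable

SHM_CHECK_INTERVAL_S = 120  # module-level constant, as the review suggests


def start_periodic_check(
    check: Callable[[], None],
    interval_s: float = SHM_CHECK_INTERVAL_S,
) -> threading.Event:
    """Run `check` immediately and then every `interval_s` seconds on a daemon thread.

    Returns an Event; setting it ends the loop after the current wait.
    """
    stop_event = threading.Event()

    def loop() -> None:
        while True:
            check()
            # wait() returns True once stop_event is set, False on timeout.
            if stop_event.wait(interval_s):
                break

    threading.Thread(target=loop, daemon=True).start()
    return stop_event
```

Compared with `time.sleep`, the event-based wait wakes up as soon as `stop_event.set()` is called, which can be convenient in tests or during a graceful shutdown.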
