Add alert monitoring #2

Open
AsharMoin wants to merge 1 commit into Retailogists:main from AsharMoin:feature/proactive-alert-monitoring

@AsharMoin

Overview

Adds proactive alert monitoring to the GCP Monitoring Bot, allowing users to set up background alerts that automatically trigger when GCP resource thresholds are exceeded, eliminating the need for manual monitoring queries.

Key Features

Background thread-based monitoring system with configurable check intervals
Alert rule management: create, list, and delete custom monitoring rules
Real-time metrics from GCP using preexisting functions (VM CPU utilization)
Alert display integrated into conversation flow
Alerts fire once then auto-delete
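
A minimal sketch of how the background check loop and the fire-once behaviour could fit together. This is not the code in this PR: `AlertMonitor`, `add_rule`, `check_once`, and the `get_metric` / `notify` callables are hypothetical names chosen for illustration, and the 60-second default is taken from the usage example.

```python
import threading
from typing import Callable, Dict, List, Optional


class AlertMonitor:
    """Hypothetical monitor: polls rules on a daemon thread and fires each rule once."""

    def __init__(
        self,
        get_metric: Callable[[str, str], float],   # assumed: (resource_type, metric) -> value
        notify: Callable[[dict, float], None],     # assumed: (rule, current_value) -> None
        interval_seconds: float = 60.0,
    ) -> None:
        self._get_metric = get_metric
        self._notify = notify
        self._interval = interval_seconds
        self._rules: Dict[str, dict] = {}
        self._lock = threading.Lock()
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def add_rule(self, name: str, resource_type: str, metric: str, threshold: float) -> None:
        with self._lock:
            self._rules[name] = {
                "name": name,
                "resource_type": resource_type,
                "metric": metric,
                "threshold": threshold,
            }

    def check_once(self) -> List[str]:
        """Evaluate every rule once; notify and delete the ones above threshold."""
        with self._lock:
            rules = list(self._rules.values())
        fired = []
        for rule in rules:
            value = self._get_metric(rule["resource_type"], rule["metric"])
            if value > rule["threshold"]:
                self._notify(rule, value)
                with self._lock:
                    self._rules.pop(rule["name"], None)  # fire once, then auto-delete
                fired.append(rule["name"])
        return fired

    def _run(self) -> None:
        # Event.wait returns False on timeout, so this sleeps one interval per check
        # and exits promptly once stop() sets the event.
        while not self._stop.wait(self._interval):
            self.check_once()

    def start(self) -> None:
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

Waiting on a `threading.Event` rather than calling `time.sleep` lets `stop()` interrupt the loop immediately, which keeps shutdown at the end of a conversation quick.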

Technical Implementation

  • New core/alerts/ module with a rule engine and notification system
  • JSON-based alert storage
  • Enhanced bot with 3 new alert-related tools
  • Background monitoring thread starts automatically with main application
  • Cleanup is handled at the end of every conversation
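
For reviewers unfamiliar with the storage approach, JSON-based rule persistence could look roughly like the sketch below. The class name `AlertStorage`, its methods, and the file layout (a JSON list of rule dicts) are assumptions for illustration; the actual `alert_storage.py` may differ.

```python
import json
from pathlib import Path
from typing import List


class AlertStorage:
    """Hypothetical JSON-backed store holding a list of alert rule dicts."""

    def __init__(self, path: str) -> None:
        self._path = Path(path)

    def load(self) -> List[dict]:
        # A missing file simply means no rules have been saved yet.
        if not self._path.exists():
            return []
        with self._path.open(encoding="utf-8") as f:
            return json.load(f)

    def save(self, rules: List[dict]) -> None:
        # Write to a temporary file first, then swap it in, so a crash
        # mid-write does not leave a truncated rules file behind.
        tmp = self._path.with_suffix(".tmp")
        with tmp.open("w", encoding="utf-8") as f:
            json.dump(rules, f, indent=2)
        tmp.replace(self._path)

    def add(self, rule: dict) -> None:
        rules = [r for r in self.load() if r["name"] != rule["name"]]
        rules.append(rule)
        self.save(rules)

    def delete(self, name: str) -> bool:
        """Remove a rule by name; return True if something was deleted."""
        rules = self.load()
        kept = [r for r in rules if r["name"] != name]
        if len(kept) == len(rules):
            return False
        self.save(kept)
        return True

    def clear(self) -> None:
        """Drop all rules, e.g. as end-of-conversation cleanup."""
        self.save([])
```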

Usage Examples

User :> Create an alert rule with resource_type "vm", metric "cpu_utilization", threshold 80, and name it "cpu_alert"
Bot :> Alert 'cpu_alert' created!

[60 seconds later, if CPU exceeds threshold]

🚨 ALERT TRIGGERED 🚨
Rule: cpu_alert
Current: 90.0 | Threshold: 80
Time: 14:40:29

User :>
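
The alert banner shown above could be produced by a small formatter along these lines. `format_alert` is a hypothetical helper name, not necessarily what `notification_handler.py` defines; the layout is copied from the example output.

```python
from datetime import datetime


def format_alert(rule_name: str, current: float, threshold: float, when: datetime) -> str:
    """Render the console banner for a triggered alert (hypothetical helper)."""
    return "\n".join(
        [
            "🚨 ALERT TRIGGERED 🚨",
            f"Rule: {rule_name}",
            f"Current: {current} | Threshold: {threshold}",
            f"Time: {when:%H:%M:%S}",
        ]
    )
```

Returning a string instead of printing directly keeps the formatter easy to test and leaves the conversation loop free to decide when to display it.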

Supported Alert Types

VM CPU Monitoring: create_alert_rule("cpu_alert", "vm", "cpu_utilization", 80)

Testing

Manual testing succeeded using mocked cpu_utilization values:

Alert creation via GenAI bot commands: Working
Background monitoring thread: Working
Alert triggering and notification: Working (some logic was replaced with hardcoded values for the demo)
Alert display in conversation flow: Working
Alert rule persistence and cleanup: Working

Current Limitations

Demo-focused implementation: Simplified for presentation purposes, not production-ready
Limited metric types: Only VM CPU usage supported
Single VM monitoring: Checks only first VM in zone, not all instances
Prompt sensitivity: The bot needs very specific, clearly worded prompts; otherwise it fails to create the alert
Console-only notifications: No email, Slack, or webhook integrations

Technical Debt

Hardcoded test CPU values in _get_metric_value() for reliable demo
No configuration file for alert settings or check intervals
Alert rule validation is minimal (trusts user input)
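
As a possible follow-up for the validation item, a stricter check could look like the sketch below. `validate_rule` and `SUPPORTED_METRICS` are hypothetical names, and the 0 to 100 range assumes CPU utilization is expressed as a percentage, as in the usage example.

```python
from typing import List

# Assumption: the only supported (resource_type, metric) pair, per "Supported Alert Types".
SUPPORTED_METRICS = {("vm", "cpu_utilization")}


def validate_rule(name, resource_type, metric, threshold) -> List[str]:
    """Return a list of validation errors; an empty list means the rule is acceptable."""
    errors = []
    if not isinstance(name, str) or not name.strip():
        errors.append("name must be a non-empty string")
    if (resource_type, metric) not in SUPPORTED_METRICS:
        errors.append(f"unsupported resource_type/metric: {resource_type}/{metric}")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(threshold, bool) or not isinstance(threshold, (int, float)):
        errors.append("threshold must be a number")
    elif not 0 <= threshold <= 100:
        errors.append("threshold must be between 0 and 100")  # assumes percent units
    return errors
```

Collecting all errors rather than raising on the first one would let the bot report every problem with a prompt in a single reply.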

Breaking Change Risk

NONE - This is purely additive functionality:

All existing bot commands and functionality preserved
New alert tools added to existing tool set
Background monitoring runs independently of main conversation loop
No changes to existing GCP monitoring functions

Files Added:

core/alerts/rule_engine.py
core/alerts/alert_storage.py
core/alerts/alert_scheduler.py
core/alerts/notification_handler.py
core/alerts/__init__.py

Files Modified:

core/bot.py - Added alert tools to GenAI bot capabilities
main.py - Auto-start monitoring and alert cleanup on exit

- Created alert rule engine for creating/managing alert rules
- Added background monitoring thread that checks GCP metrics using our tool functions
- Added alert creation capabilities to GenAI bot
- Added alert display in main conversation loop
- Supports VM CPU utilization only (for now)