-
Notifications
You must be signed in to change notification settings - Fork 7
Notify VM Creation Failure #75
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -4,11 +4,24 @@ import ( | |
| "context" | ||
| "time" | ||
|
|
||
| "fmt" | ||
| "sync" | ||
|
|
||
| "github.com/IBM/power-access-cloud/api/apis/app/v1alpha1" | ||
| "github.com/IBM/power-access-cloud/api/internal/pkg/pac-go-server/db" | ||
| "github.com/IBM/power-access-cloud/api/internal/pkg/pac-go-server/db/mongodb" | ||
| "github.com/IBM/power-access-cloud/api/internal/pkg/pac-go-server/models" | ||
| "github.com/go-logr/logr" | ||
| "github.com/pkg/errors" | ||
| "sigs.k8s.io/cluster-api/util/patch" | ||
| ) | ||
|
|
||
| var ( | ||
| notificationCache = make(map[string]time.Time) | ||
| cacheMutex sync.RWMutex | ||
| minIntervalMinutes = 30 | ||
| ) | ||
|
|
||
| type ServiceScopeParams struct { | ||
| ControllerScopeParams | ||
| Service *v1alpha1.Service | ||
|
|
@@ -30,6 +43,48 @@ func (m *ServiceScope) PatchServiceObject() error { | |
| return m.servicePatchHelper.Patch(context.TODO(), m.Service) | ||
| } | ||
|
|
||
| // NotifyServiceCreationFailure creates an event to notify about service creation failure | ||
| func (s *ServiceScope) NotifyServiceCreationFailure(errorMessage string) error { | ||
| if !shouldNotify(s.Service.Status.VM.InstanceID) { | ||
| return nil | ||
| } | ||
| event, err := models.NewEvent(s.Service.Spec.UserID, s.Service.Spec.UserID, models.EventServiceCreateFailed) | ||
| if err != nil { | ||
| return err | ||
| } | ||
|
|
||
| // Notify both user and admin | ||
| event.SetNotifiyBoth() | ||
|
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do we need to notify user as well? I think sending a email to user might not be needed. @mayuka-c what do you think
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yeah I think not required for user, only admin should be fine. |
||
|
|
||
| // Set the error message | ||
| logMessage := fmt.Sprintf("Service '%s' creation failed. Reason: %s", s.Service.Name, errorMessage) | ||
| event.SetLog(models.EventLogLevelERROR, logMessage) | ||
|
|
||
| dbCon, disconnect, err := connectDB(s.Logger) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It will be better if we can maintain a single long living connection instead of recreating the connection every time this method gets called. Hypothetically if a bunch of VMs fail at the same time then this will create a bunch of connections to the DB potentially causing slowdown/crash on the DB as well.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The original changes was actually like that, Mayuka added this comment that contradicts with yours. I think, maintaining a connection would be a better idea
Collaborator
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. True, better to have a single persisted connection rather
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @GunaKKIBM We can go with single DB connection but please do make it a singleton (if the connection doesn't exist or if fails -> create new one or resuse the same connection) |
||
| if err != nil { | ||
| s.Logger.Error(err, "Error connecting to DB") | ||
| return err | ||
| } | ||
| defer disconnect() | ||
|
|
||
| err = dbCon.NewEvent(event) | ||
| if err != nil { | ||
| return err | ||
| } | ||
| recordNotification(s.Service.Status.VM.InstanceID) | ||
| s.Logger.Info("Created failure notification event", "service", s.Service.Name) | ||
| return nil | ||
| } | ||
|
|
||
| func (s *ServiceScope) ClearNotificationCache() { | ||
| if s.Service.Status.VM.InstanceID != "" { | ||
| clearNotification(s.Service.Status.VM.InstanceID) | ||
| s.Logger.Info("Cleared notification cache for VM instance", | ||
| "instanceID", s.Service.Status.VM.InstanceID, | ||
| "service", s.Service.Name) | ||
| } | ||
| } | ||
|
|
||
| func NewServiceScope(ctx context.Context, params ServiceScopeParams) (*ServiceScope, error) { | ||
| scope := &ServiceScope{} | ||
|
|
||
|
|
@@ -55,3 +110,50 @@ func NewServiceScope(ctx context.Context, params ServiceScopeParams) (*ServiceSc | |
|
|
||
| return scope, nil | ||
| } | ||
|
|
||
| // shouldNotify returns true if a notification can be sent for the service. | ||
| // Implements rate limiting to prevent duplicate notifications within minIntervalMinutes. | ||
| func shouldNotify(instanceID string) bool { | ||
| cacheMutex.RLock() | ||
| lastNotified, exists := notificationCache[instanceID] | ||
| cacheMutex.RUnlock() | ||
|
|
||
| if !exists { | ||
| return true | ||
| } | ||
|
|
||
| return time.Since(lastNotified) > time.Duration(minIntervalMinutes)*time.Minute | ||
| } | ||
|
|
||
| // recordNotification records the current time as the last notification timestamp for the service. | ||
| // Used by rate limiting to track when notifications were sent. | ||
| func recordNotification(instanceID string) { | ||
| cacheMutex.Lock() | ||
| notificationCache[instanceID] = time.Now() | ||
| cacheMutex.Unlock() | ||
| } | ||
|
|
||
| // clearNotification removes the service from the notification cache. | ||
| func clearNotification(instanceID string) { | ||
| cacheMutex.Lock() | ||
| delete(notificationCache, instanceID) | ||
| cacheMutex.Unlock() | ||
| } | ||
|
|
||
| // connectDB establishes a database connection and returns the connection along with a disconnect function | ||
| // The caller is responsible for calling the disconnect function (typically with defer) | ||
| func connectDB(logger logr.Logger) (db.DB, func(), error) { | ||
| dbCon := mongodb.New() | ||
| if err := dbCon.Connect(); err != nil { | ||
| return nil, nil, fmt.Errorf("failed to connect to MongoDB: %w", err) | ||
| } | ||
|
|
||
| // Return disconnect function that the caller can defer | ||
| disconnect := func() { | ||
| if err := dbCon.Disconnect(); err != nil { | ||
| logger.Error(err, "Failed disconnecting from MongoDB") | ||
| } | ||
| } | ||
|
|
||
| return dbCon, disconnect, nil | ||
| } | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -78,4 +78,4 @@ func (db *MongoDB) CollectionExists(name string) (bool, error) { | |
|
|
||
| // collection exists | ||
| return true, nil | ||
| } | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. looks like, this got updated because of gofmt |
||
| } | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Was this a mistake?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. .I ran gofmt, on a few files, and this got added, will remove it. |
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the cache is in-memory this can cause multiple emails in a short time if the controller goes into a restart loop, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this can happen. In the long run, we should persist this using DB