Skip to content

Feature/windows event id filtering #1737

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

Paamicky
Copy link
Collaborator

@Paamicky Paamicky commented Jun 18, 2025

Description of the issue

The CloudWatch Agent currently filters Windows Event Logs based on security levels (i.e Error, Critical, Information...). This does not allow customers to only collect specific and relevant logs to CloudWatch

Description of changes

  • Add Event ID filtering to the CWAgent Windows logs collection

sample configuration of event ID filtering

  "logs": {
    "logs_collected": {
      "windows_events": {
        "collect_list": [
          {
            "event_name": "System",
            "event_ids": [4624, 4625, 4634],
            "log_group_name": "WindowsEventIds-Test",
            "log_stream_name": "{instance_id}"
          }
        ]
      }
    }
  }
}

Caution

This PR has a testing PR linked to here: aws/amazon-cloudwatch-agent-test#541, merge this first.

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

  • Added unit tests for event ID validation and processing
  • Manually tested on Windows to verify events are properly filtered by ID
  • Verified backward compatibility with existing configurations

Requirements

Before commiting your code, please do the following steps.

  1. Run make fmt and make fmt-sh
  2. Run make lint

Integration Tests

To run integration tests against this PR, add the ready for testing label.

@Paamicky Paamicky force-pushed the feature/windows-eventId-filtering branch 3 times, most recently from b758b9e to 516df4e Compare June 20, 2025 16:09
@Paamicky Paamicky force-pushed the feature/windows-eventId-filtering branch from 516df4e to 2276ab2 Compare June 20, 2025 17:42
@Paamicky Paamicky marked this pull request as ready for review June 23, 2025 01:09
@Paamicky Paamicky requested a review from a team as a code owner June 23, 2025 01:09
@Paamicky Paamicky requested review from jefchien and okankoAMZ June 23, 2025 01:11
@@ -24,6 +24,7 @@
batch_read_size = 170
event_levels = ["4", "0", "2"]
event_name = "System"
log_group_class = ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we know why this is being added? I don't think your changes should have affected this field.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think I raised this earlier, I don't know why it is been added. I'm thinking it will be from a previous update to the agent from another commit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to find that commit and understand why this change is necessary more concretely before we merge this in. I am especially interested in the region_type field under [[outputs.cloudwatchlogs]]

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will check that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The log_group_class and region_type were implemented about 20 months ago by previous commits. It got added since I'm generating the TOML files for the first time.

You can find more about it here:
log_group_class

Region_type:
rurleRegion
regionType

@@ -118,4 +119,41 @@ func addFixedJsonConfig(result map[string]interface{}) {
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is comment is for line 100. I can't comment on it directly though since it wasn't modified.

The function should not return early anymore if the EventLevelsKey is not found since the event_levels key is no longer strictly necessary. As this function is right now, I believe it won't validate the event IDs if the event_levels key is not present. You'll need to change the event levels to act like what you have in the event ID section.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everything related to event levels needs to go inside the if statement or there will be some behavior changes:

	if eventLevels, ok := result[EventLevelsKey]; ok {
		inputEventLevels = eventLevels.([]interface{})
		resultEventLevels := []interface{}{}
		for _, eventLevel := range inputEventLevels {
			switch eventLevel.(string) {
			case "CRITICAL":
				resultEventLevels = append(resultEventLevels, "1")
			case "ERROR":
				resultEventLevels = append(resultEventLevels, "2")
			case "WARNING":
				resultEventLevels = append(resultEventLevels, "3")
			case "INFORMATION":
				resultEventLevels = append(resultEventLevels, "4", "0")
			case "VERBOSE":
				resultEventLevels = append(resultEventLevels, "5")
			default:
				translator.AddErrorMessages(GetCurPath(), fmt.Sprintf("Cannot find the mapping for Windows event level %v.", eventLevel))
			}
		}
		result[EventLevelsKey] = resultEventLevels
	}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

Comment on lines +137 to +140
const (
minEventID = 0
maxEventID = 65535
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be good to link to documentation or explain why these values are selected

we should also have test cases for these values and their neighbors (e.g. -1, 0, 1, 65534, 65535, 65536) to ensure they are all handled properly

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added link to documentation!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on those docs, the minimum event ID is actually 1

Category | Range | Currently in use | Comments
Microsoft Active Accessibility events (System Reserved) | 0x0001-0x00FF | 0x0001-0x0020 | EVENT_SYSTEM_* event IDs

Copy link
Collaborator Author

@Paamicky Paamicky Jun 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes that's right, but technically the event ids ranges from 0-65535. So I thought for validation or programming purposes, it would be correct as these are the absolute min and max values that can be stored in the event ids field. They are mostly reserved for if application crashes or fails to install,,
Screenshot 2025-06-23 at 22 48 11

@Paamicky Paamicky force-pushed the feature/windows-eventId-filtering branch from 9d99660 to 2667175 Compare June 23, 2025 19:27
@@ -61,6 +61,7 @@ func (s *Plugin) SampleConfig() string {
[[inputs.windows_event_log.event_config]]
event_name = "System"
event_levels = ["2", "3"]
event_ids = [1001, 1002]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are these event ids what are they representing, can we comment that here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They are generic event ids to show how the final sampleConfig of the Toml will look like. it serves as documentation and example config for users of the Windows event Log plugin. So I just add the event id since it is a feature now.

@@ -36,34 +25,7 @@ const (
UNKNOWN = "UNKNOWN"
)

var NumberOfBytesPerCharacter = UnknownBytesPerCharacter

func RenderEventXML(eventHandle EvtHandle, renderBuf []byte) ([]byte, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we deleting this function can you explain in your PR description?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just moved them to a different file due to the windows build tags. I wanted run some test on the linux environment so it doesn't skip those tests.

return utf16ToUTF8Bytes(renderBuf, bufferUsed)
}

func CreateBookmark(channel string, recordID uint64) (h EvtHandle, err error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we deleting this function can you explain in your PR description?

Copy link
Collaborator Author

@Paamicky Paamicky Jun 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to a different file due to the windows build tags.


func WindowsEventLogLevelName(levelId int32) string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why are we deleting this function can you explain in your PR description?

Copy link
Collaborator Author

@Paamicky Paamicky Jun 24, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved to a different file due to the windows build tags.


var NumberOfBytesPerCharacter = UnknownBytesPerCharacter

func RenderEventXML(eventHandle EvtHandle, renderBuf []byte) ([]byte, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so you are moving into a different file. Please document that in your description

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright sure

region = "us-west-2"
region_type = "ACJ"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is region type and what is "ACJ"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I will investigate, it got added when I generated the files, probably from a previous commit.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The region_type was implemented about 20 months ago by previous commits. It got added since I'm generating the TOML files for the first time.

You can find more about it here:

Region_type:
rurleRegion
regionType

if err != nil {
log.Panicf("Encode to a valid TOML config fails because of %v", err)
}
return buf.String()
}

// ensure integers in arrays are preserved
func processValue(val interface{}) interface{} {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we have some kind of validation and returning error here. With the current implementation it seems like this function could never fail, is this true?

Comment on lines -98 to -100
if eventLevels, ok := result[EventLevelsKey]; !ok {
return
} else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With this change now, we keep going when the eventLevel is missing. Is this intentional?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if the eventLevel is missing it will proceed to check the for eventids, actually this still sets eventlevels to an empty slice of string instead of nil, I will fix this.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done!

Comment on lines +137 to +140
const (
minEventID = 0
maxEventID = 65535
)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on those docs, the minimum event ID is actually 1

Category | Range | Currently in use | Comments
Microsoft Active Accessibility events (System Reserved) | 0x0001-0x00FF | 0x0001-0x0020 | EVENT_SYSTEM_* event IDs

@@ -25,6 +25,10 @@ func TestApplyRule(t *testing.T) {
"INFORMATION",
"SUCCESS"
],
"event_ids": [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: formatting

Comment on lines +14 to +15
// Process value to ensure integers in arrays are preserved
processedVal := processValue(val)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. There are some other integers that aren't mangled like log retention. I wonder why those work. Perhaps because they aren't in an array? We should understand why those integers don't require additional processing.

@@ -24,6 +24,7 @@
batch_read_size = 170
event_levels = ["4", "0", "2"]
event_name = "System"
log_group_class = ""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to find that commit and understand why this change is necessary more concretely before we merge this in. I am especially interested in the region_type field under [[outputs.cloudwatchlogs]]

@@ -118,4 +119,41 @@ func addFixedJsonConfig(result map[string]interface{}) {
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

everything related to event levels needs to go inside the if statement or there will be some behavior changes:

	if eventLevels, ok := result[EventLevelsKey]; ok {
		inputEventLevels = eventLevels.([]interface{})
		resultEventLevels := []interface{}{}
		for _, eventLevel := range inputEventLevels {
			switch eventLevel.(string) {
			case "CRITICAL":
				resultEventLevels = append(resultEventLevels, "1")
			case "ERROR":
				resultEventLevels = append(resultEventLevels, "2")
			case "WARNING":
				resultEventLevels = append(resultEventLevels, "3")
			case "INFORMATION":
				resultEventLevels = append(resultEventLevels, "4", "0")
			case "VERBOSE":
				resultEventLevels = append(resultEventLevels, "5")
			default:
				translator.AddErrorMessages(GetCurPath(), fmt.Sprintf("Cannot find the mapping for Windows event level %v.", eventLevel))
			}
		}
		result[EventLevelsKey] = resultEventLevels
	}

@Paamicky Paamicky force-pushed the feature/windows-eventId-filtering branch from 2667175 to ea8ce75 Compare June 24, 2025 03:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants