Add TTL for retention policy #1699

Open · wants to merge 4 commits into base: main


duhminick
Contributor

@duhminick duhminick commented May 21, 2025

Description of the issue

If a customer frequently restarts the agent while a high number of deployments or log groups are configured, the DescribeLogGroups (DLG) and PutRetentionPolicy (PRP) calls can be throttled. The agent makes these calls at startup to add or update retention policies for log groups.

Description of changes

A new state file named Amazon_CloudWatch_RetentionPolicyTTL is added. For each log group, it records the timestamp at which the retention policy was last checked (with no update needed) or last updated. Each line of the file has the format loggroupname:timestamp. An example with two log groups:

log1:1234567890
log2:1234567890

RetentionPolicyTTL

  1. The state file is read on startup and stored in oldTimestamps. The IsExpired(group) call reads from oldTimestamps to determine whether the retention policy should be checked/updated.
  2. Timestamps are cached in the new struct RetentionPolicyTTL, which has the field newTimestamps.
    a. There is one scenario in which timestamps from oldTimestamps are carried over into newTimestamps: when the timestamp has not expired. This ensures timestamps from previous agent runs are not lost. As a side effect, timestamps for log groups that are no longer configured by the user are never carried over, so they get cleaned up.
  3. The state file is saved periodically at a 1-minute interval. It can also be saved by calling Stop().

Target

  1. Before a Target is checked/updated, the IsExpired(group) call is made. If the timestamp has not expired, the timestamp read from the file is persisted into the new timestamp cache using UpdateFromFile(group). If it has expired, the logic continues with checking/updating the retention policy.
  2. The cache is updated using Update(group) when the retention policy is valid (checked using DLG) or when the retention policy was updated (updated using PRP).

Logfile Input

  1. An additional separate change was made to make sure that the new state file does not get cleaned up since it's re-using the state folder.

Translation

  1. The path to the state folder is now configured in the output CWL configuration section for the agent TOML config.

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

  1. Unit test

Scenarios

1. Two log groups configured, first run

$ cat /opt/aws/amazon-cloudwatch-agent/logs/state/Amazon_CloudWatch_RetentionPolicyTTL

  • The command fails because the file does not exist yet

  • Log groups are updated, and file is written with content:
log.txt:1747933820935
log2.txt:1747933820943

2. Two log groups configured, restarted within the 5-minute TTL

  • Content remains the same
log.txt:1747933820935
log2.txt:1747933820943

3. Two log groups configured, restarted after the 5-minute TTL has elapsed

  • Content is updated with the new timestamp
log.txt:1747937120236
log2.txt:1747937120243

4. Two logs to one log group configured, restarted

  • Content is updated to only have the one log group configured
log.txt:1747937120236

Requirements

Before committing the code, please complete the following steps.

  1. Run make fmt and make fmt-sh
  2. Run make lint

@duhminick duhminick marked this pull request as ready for review May 22, 2025 15:47
@duhminick duhminick requested a review from a team as a code owner May 22, 2025 15:47

This PR was marked stale due to lack of activity.

@github-actions github-actions bot added the Stale label May 30, 2025