Skip to content

Conversation

@rudransh-shrivastava
Copy link
Collaborator

@rudransh-shrivastava rudransh-shrivastava commented Dec 2, 2025

Resolves #2771

Proposed change

  • Add lifecycle for untagged images.
  • Use boto3 for ECS: load_data_task.
  • Remove use of AWS-managed IAM policies (AmazonEC*).

Checklist

  • I've read and followed the contributing guidelines.
  • I've run make check-test locally; all checks and tests passed.

@github-actions github-actions bot added docs Improvements or additions to documentation backend backend-tests makefile ci labels Dec 2, 2025
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 2, 2025

Summary by CodeRabbit

  • New Features

    • Added support for container image tagging to enable more reliable deployments with specific versions instead of always using the latest.
  • Infrastructure

    • Automatic cleanup of untagged container images after 7 days to reduce storage costs.
    • Increased compute resources (CPU and memory) for data loading operations.
    • Enhanced security with granular identity and access policies for container tasks.

✏️ Tip: You can customize this high-level summary in your review settings.

Walkthrough

Adds ECR lifecycle policy for untagged images, replaces AWS-managed ECS/EventBridge IAM attachments with custom policies, introduces image_tag variable and updates task image URLs to use it, increases load_data_task resources, switches load_data_task to boto3, exposes fixtures S3 bucket name and wires staging to it.

Changes

Cohort / File(s) Summary
ECS IAM & ECR changes
infrastructure/modules/ecs/main.tf
Adds aws_ecr_lifecycle_policy to expire untagged images after 7 days; adds custom aws_iam_policy resources for ECS task execution and EventBridge-to-ECS RunTask/PassRole and their aws_iam_role_policy_attachments, replacing prior AWS-managed policy attachments.
ECS task image/tagging & scripts
infrastructure/modules/ecs/main.tf, infrastructure/modules/ecs/*
Updates multiple task module invocations to use "${aws_ecr_repository.main.repository_url}:${var.image_tag}" for image_url (sync_data_task, owasp_update_*_task, migrate_task, load_data_task, index_data_task); replaces CLI-based fixture download with a Python boto3 inline script.
ECS variables
infrastructure/modules/ecs/variables.tf
Adds variable "image_tag" { type = string, default = "latest" }; increases load_data_task_cpu default 256→512 and load_data_task_memory default 2048→4096.
Storage outputs
infrastructure/modules/storage/outputs.tf
Adds output fixtures_s3_bucket_name = module.fixtures_bucket.bucket.id.
Staging wiring
infrastructure/staging/main.tf
Updates module "ecs" input fixtures_bucket_name to module.storage.fixtures_s3_bucket_name.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

  • Check custom IAM policy ARNs and least-privilege scoping (ECS execution and EventBridge RunTask/PassRole).
  • Verify inline boto3 script correctness, boto3 import, error handling, and ECS task environment expectations.
  • Confirm all task image references now correctly include :${var.image_tag} and default behavior remains acceptable.
  • Review increased CPU/memory defaults for load_data_task for suitability.

Possibly related issues

  • Improve Zappa/Terraform Migration #2715 — Changes match requests to remove hardcoded latest, add ECR lifecycle for untagged images, switch load_data to boto3, and replace AWS-managed IAM policies.

Possibly related PRs

Suggested labels

backend

Suggested reviewers

  • kasya
  • arkid15r

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Improve ecs Module' accurately describes the main changes across the ECS module infrastructure code.
Description check ✅ Passed The description directly maps to the linked issue #2771, listing the three key implementation objectives: ECR lifecycle, boto3 usage, and removal of AWS-managed IAM policies.
Linked Issues check ✅ Passed The PR successfully implements all four objectives from linked issue #2771: removes hardcoded 'latest' tag via image_tag variable, adds ECR lifecycle policy for untagged images, replaces AWS CLI with boto3 in load_data_task, and replaces AWS-managed IAM policies with custom policies.
Out of Scope Changes check ✅ Passed All changes are scoped to the stated objectives. Updates to load_data_task CPU/memory defaults and fixtures_s3_bucket_name output are supporting changes for the load_data_task boto3 migration and are within scope.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da6cb9a and 0409690.

📒 Files selected for processing (4)
  • infrastructure/modules/ecs/main.tf (10 hunks)
  • infrastructure/modules/ecs/variables.tf (2 hunks)
  • infrastructure/modules/storage/outputs.tf (1 hunks)
  • infrastructure/staging/main.tf (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • infrastructure/modules/storage/outputs.tf
  • infrastructure/modules/ecs/variables.tf
  • infrastructure/staging/main.tf
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run backend tests
  • GitHub Check: Run frontend unit tests
  • GitHub Check: CodeQL (javascript-typescript)
🔇 Additional comments (4)
infrastructure/modules/ecs/main.tf (4)

19-39: Lifecycle policy for untagged images correctly implemented.

The ECR lifecycle policy properly addresses the PR objective by expiring untagged images after 7 days, preventing accumulation of orphaned image layers.


85-118: Custom IAM policies successfully replace AWS-managed equivalents.

The ecs_tasks_execution_policy and event_bridge_ecs_policy are appropriately scoped with least-privilege access:

  • Execution policy grants ECR token retrieval globally, repository-specific layer access, and CloudWatch Logs operations scoped to project/environment log groups.
  • EventBridge policy restricts RunTask to the specific cluster via condition, and PassRole is limited to the two required roles.

Also applies to: 171-198


217-217: Image tag variable usage eliminates hardcoded latest tag.

All six task definitions now use ${var.image_tag} instead of hardcoded image tags, enabling flexible image versioning per deployment.


304-308: boto3 implementation for fixture download is appropriate for the load_data_task.

The code uses boto3's s3.download_file() method, which automatically leverages IAM role credentials attached to the ECS task. Ensure that:

  • The fixtures_bucket_name variable is properly defined in variables.tf
  • The IAM role attached to the ECS task has read permissions on the fixtures S3 bucket
  • Error handling is configured at the task level (e.g., the shell script uses set -e or equivalent)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@rudransh-shrivastava rudransh-shrivastava changed the base branch from main to feature/nest-zappa-migration December 2, 2025 15:23
@rudransh-shrivastava rudransh-shrivastava force-pushed the feature/nest-zappa-migration-ecs-improvement branch from 96cfa89 to 8159861 Compare December 2, 2025 15:24
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (5)
infrastructure/README.md (3)

58-75: Clarify placeholder for state bucket in backend configuration file.

The new backend configuration file workflow is helpful, but users need explicit guidance on retrieving the state bucket name. Line 68 instructs updating the state bucket name from "the previous step," but users should know this comes from the backend output (e.g., terraform output from infrastructure/backend/). Consider adding a step to explicitly run cd infrastructure/backend && terraform output or similar to retrieve the bucket name.

Also, verify that -backend-config=terraform.tfbackend correctly resolves relative paths from the staging/ directory during terraform init.


68-68: Fix markdown formatting for consistency.

Lines 68 and 70 use *Note:* (emphasis), but other notes in the file use **Note:** (bold). Please standardize to bold.

Also applies to: 70-70


183-183: Clarify the <id> placeholder in fixture bucket path.

The placeholder owasp-nest-fixtures-<id> should be documented. Add a note that <id> is the random suffix output from the backend setup step, or provide a command to retrieve it dynamically (e.g., terraform output fixtures_s3_bucket_name).

infrastructure/staging/terraform.tfbackend.example (1)

1-3: Placeholder token format should match documentation convention.

The ${STATE_BUCKET_NAME} placeholder is clear, but verify it aligns with documentation. If users reference the README instructions, ensure the placeholder name matches what's documented (e.g., if README says "Copy the state bucket name from output as STATE_BUCKET_NAME", this should match).

Also, consider whether region = "us-east-2" should be a placeholder (e.g., ${AWS_REGION}) if users might deploy to different regions, or if it's intentionally hardcoded to the staging region default.

infrastructure/modules/storage/modules/s3-bucket/main.tf (1)

38-45: Prevent_destroy is a breaking change—consider making it optional per module instance.

Removing force_destroy and adding lifecycle.prevent_destroy = true is a significant safeguard, but it's also a breaking change for module consumers. All S3 buckets created via this module (fixtures, zappa) now require manual intervention (terraform state rm or prevent_destroy = false) before destruction.

This may conflict with automation workflows that expect to cleanly destroy test environments. Consider adding a prevent_destroy_bucket variable (defaulting to true for safety) so consumers can opt-out per instance if needed:

variable "prevent_destroy_bucket" {
  type    = bool
  default = true
}

lifecycle {
  prevent_destroy = var.prevent_destroy_bucket
}

Alternatively, document this as a breaking change in release notes and require explicit code changes from consumers to upgrade.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e219f12 and 8159861.

📒 Files selected for processing (15)
  • infrastructure/README.md (6 hunks)
  • infrastructure/backend/.terraform.lock.hcl (1 hunks)
  • infrastructure/backend/main.tf (4 hunks)
  • infrastructure/modules/ecs/main.tf (10 hunks)
  • infrastructure/modules/ecs/variables.tf (2 hunks)
  • infrastructure/modules/storage/main.tf (3 hunks)
  • infrastructure/modules/storage/modules/s3-bucket/main.tf (1 hunks)
  • infrastructure/modules/storage/modules/s3-bucket/variables.tf (2 hunks)
  • infrastructure/modules/storage/outputs.tf (1 hunks)
  • infrastructure/modules/storage/variables.tf (1 hunks)
  • infrastructure/staging/backend.tf (1 hunks)
  • infrastructure/staging/main.tf (2 hunks)
  • infrastructure/staging/terraform.tfbackend.example (1 hunks)
  • infrastructure/staging/terraform.tfvars.example (0 hunks)
  • infrastructure/staging/variables.tf (2 hunks)
💤 Files with no reviewable changes (1)
  • infrastructure/staging/terraform.tfvars.example
🧰 Additional context used
🧠 Learnings (7)
📚 Learning: 2025-11-08T11:16:25.725Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2551
File: infrastructure/modules/parameters/main.tf:1-191
Timestamp: 2025-11-08T11:16:25.725Z
Learning: The parameters module in infrastructure/modules/parameters/ is currently configured for staging environment only. The `configuration` and `settings_module` variables default to "Staging" and "settings.staging" respectively, and users can update parameter values via the AWS Parameter Store console. The lifecycle.ignore_changes blocks on these parameters support manual console updates without Terraform reverting them.

Applied to files:

  • infrastructure/staging/backend.tf
  • infrastructure/staging/terraform.tfbackend.example
  • infrastructure/staging/main.tf
📚 Learning: 2025-11-23T11:52:15.463Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2699
File: backend/wsgi.py:13-13
Timestamp: 2025-11-23T11:52:15.463Z
Learning: In the OWASP Nest project, the SSM parameter store setup in backend/wsgi.py (using boto3 to fetch parameters from AWS Systems Manager) is designed for staging and production environments, not just for testing purposes.

Applied to files:

  • infrastructure/staging/backend.tf
  • infrastructure/README.md
  • infrastructure/staging/terraform.tfbackend.example
📚 Learning: 2025-11-27T19:38:23.956Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2743
File: infrastructure/modules/storage/main.tf:1-9
Timestamp: 2025-11-27T19:38:23.956Z
Learning: In the OWASP/Nest infrastructure code (infrastructure/ directory), the team prefers exact version pinning for Terraform (e.g., "1.14.0") and AWS provider (e.g., "6.22.0") over semantic versioning constraints (e.g., "~> 1.14" or "~> 6.22"). This is a deliberate choice for maximum reproducibility and explicit control over version updates in their testing environment.

Applied to files:

  • infrastructure/staging/backend.tf
  • infrastructure/staging/terraform.tfbackend.example
  • infrastructure/backend/.terraform.lock.hcl
  • infrastructure/backend/main.tf
  • infrastructure/modules/storage/main.tf
📚 Learning: 2025-10-17T15:25:53.713Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2431
File: infrastructure/modules/database/main.tf:22-60
Timestamp: 2025-10-17T15:25:53.713Z
Learning: The infrastructure code in the `infrastructure/` directory is intended for quick testing purposes only, not production-grade deployment. Production-grade security hardening controls (such as IAM database authentication, deletion protection, Performance Insights KMS encryption) are not required for this testing infrastructure.

Applied to files:

  • infrastructure/README.md
📚 Learning: 2025-10-17T15:25:34.963Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2431
File: infrastructure/modules/cache/main.tf:30-30
Timestamp: 2025-10-17T15:25:34.963Z
Learning: The infrastructure/Terraform code in the OWASP Nest repository under the `infrastructure/` directory is intended for quick testing purposes only, not for production deployment.

Applied to files:

  • infrastructure/README.md
📚 Learning: 2025-10-23T19:22:23.811Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2431
File: infrastructure/main.tf:0-0
Timestamp: 2025-10-23T19:22:23.811Z
Learning: In Zappa-based serverless deployments, Lambda functions and IAM execution roles are managed by Zappa at application deployment time (via `zappa deploy`/`zappa update`), not via Terraform. Terraform provisions the supporting infrastructure (VPC, RDS, S3, security groups, RDS Proxy, Secrets Manager), while Zappa handles the Lambda orchestration layer.

Applied to files:

  • infrastructure/README.md
  • infrastructure/staging/variables.tf
  • infrastructure/modules/storage/main.tf
📚 Learning: 2025-10-17T15:25:55.624Z
Learnt from: rudransh-shrivastava
Repo: OWASP/Nest PR: 2431
File: infrastructure/providers.tf:1-3
Timestamp: 2025-10-17T15:25:55.624Z
Learning: The infrastructure code in the OWASP/Nest repository (infrastructure/ directory) is intended for quick testing purposes only, not for production deployment.

Applied to files:

  • infrastructure/README.md
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Run frontend e2e tests
  • GitHub Check: Run frontend unit tests
🔇 Additional comments (19)
infrastructure/backend/.terraform.lock.hcl (1)

27-45: Random provider lock file entry looks good.

The exact version pinning (3.7.2) aligns with your infrastructure standards. Since this is auto-maintained by Terraform, no manual changes needed here.

infrastructure/backend/main.tf (3)

53-55: Random suffix for bucket naming is a solid approach.

The 4-byte random suffix (8 hex characters) effectively prevents S3 bucket name collisions in shared AWS accounts. Since Terraform stores the random_id resource in state, the suffix will persist across apply cycles.


69-94: Verify prevent_destroy lifecycle is operationally acceptable.

Adding prevent_destroy = true to state, logs, and state_lock resources is excellent for protecting critical infrastructure. However, this creates a strong safeguard that requires manual intervention (terraform state rm or setting prevent_destroy = false) before any destruction is possible.

Ensure your team is aware of this requirement and that the README guidance (line 210) is highlighted during deployment. This is especially important if team members are unfamiliar with Terraform lifecycle rules.


138-147: Verify S3 Object Lock retention policy meets operational requirements.

The 30-day GOVERNANCE retention is appropriate for testing infrastructure, and GOVERNANCE mode (rather than COMPLIANCE) allows flexibility if needed. However, confirm:

  1. Does 30 days align with your state backup retention policy?
  2. Is GOVERNANCE mode (deletable with s3:BypassGovernanceRetention permission) the right choice, or should this be COMPLIANCE (immutable)?
  3. Object Lock applies only to objects created after this configuration is applied—existing state versions may not be protected.
infrastructure/staging/backend.tf (1)

2-4: Backend configuration split is clean and follows Terraform best practices.

Separating code-level defaults (encrypt, key) from environment-specific config (bucket, region) in terraform.tfbackend makes it easy to parameterize across environments without code changes. Users must remember the -backend-config flag during terraform init, which is documented in the README.

infrastructure/modules/storage/variables.tf (1)

12-12: Variable renames are breaking changes—verify all consumers are updated.

The renames fixtures_s3_bucket → fixtures_bucket_name and zappa_s3_bucket → zappa_bucket_name are semantic improvements but constitute breaking changes for any code calling the storage module.

Verify that all consumers (e.g., infrastructure/staging/main.tf) have been updated to use the new variable names. If this PR includes multiple modules across the infrastructure, ensure consistency across the entire tree.

Also applies to: 22-22

infrastructure/modules/storage/outputs.tf (1)

11-19: Verify consistency in bucket output references.

Line 13 uses module.fixtures_bucket.bucket.id while line 18 uses module.zappa_bucket.bucket (no .id suffix). These should be consistent—either both should use .id or both should omit it, depending on what the s3-bucket module outputs. Check the s3-bucket module's outputs definition to confirm whether this inconsistency is intentional or requires correction.

infrastructure/staging/variables.tf (1)

84-88: Variable renames improve consistency and clarity.

Renaming from fixtures_s3_bucketfixtures_bucket_name and zappa_s3_bucketzappa_bucket_name removes redundant qualifiers and improves naming consistency across the infrastructure module. The force_destroy_bucket removal is a good safety improvement.

Also applies to: 138-142

infrastructure/modules/ecs/variables.tf (2)

56-66: Verify the resource requirement increase for load_data_task.

The CPU allocation doubles from 256 to 512, and memory doubles from 2048 to 4096. While this change may be necessary to support the boto3-based data loading approach or handle larger fixture datasets, the PR objectives don't explain the rationale. Please clarify whether this increase is:

  • Required for boto3 operations or expected workload characteristics
  • Temporary tuning or a permanent requirement
  • Impact on deployment costs

38-42: Image tag configuration supports PR objective.

The new image_tag variable with a "latest" default enables configurable Docker image versions for ECS tasks, addressing the hardcoded tag removal objective.

infrastructure/staging/main.tf (1)

54-66: Module wiring requires verification of storage module outputs.

The staging module correctly wires the renamed variables (fixtures_bucket_name, zappa_bucket_name) to the storage module, and passes module.storage.fixtures_s3_bucket_name to the ECS module. However, the storage module's outputs.tf file is not provided for review. Please verify:

  1. The output fixtures_s3_bucket_name exists in infrastructure/modules/storage/outputs.tf
  2. The output includes the random suffix (e.g., "${aws_s3_bucket.fixtures_bucket.id}" or equivalent)
  3. This ensures the ECS module receives the actual bucket name with the suffix for boto3 S3 operations

Also applies to: 107-115

infrastructure/modules/storage/modules/s3-bucket/variables.tf (1)

1-5: Documentation updates improve clarity.

The updated descriptions for abort_incomplete_multipart_upload_days and noncurrent_version_expiration_days are more explicit and aligned with AWS S3 lifecycle terminology.

Also applies to: 12-16

infrastructure/modules/storage/main.tf (1)

8-29: Random suffix implementation ensures bucket name uniqueness.

The random provider and random_id resource with 4-byte length provide sufficient entropy to prevent bucket naming collisions across deployments. The suffix is consistently applied to both fixtures and zappa bucket names, and the IAM policy ARN correctly references the suffixed bucket name.

infrastructure/modules/ecs/main.tf (6)

19-39: ECR lifecycle policy correctly implements untagged image cleanup.

The lifecycle policy automates removal of untagged images after 7 days, addressing the PR objective to remove hardcoded image tag reliance. The configuration is valid and non-destructive to tagged images.


85-118: Custom ECS task execution policy properly replaces AWS-managed policy.

The custom policy grants least-privilege permissions for ECR registry access and CloudWatch Logs, replacing the AWS-managed AmazonEC2ContainerRegistryPowerUser policy. The policy correctly scopes:

  • ecr:GetAuthorizationToken to "*" (required for registry authentication)
  • Image operations (BatchCheckLayerAvailability, BatchGetImage, GetDownloadUrlForLayer) to the ECR repository ARN
  • Log operations to the project-specific log group pattern

171-198: Custom EventBridge policy implements least-privilege design.

The policy grants EventBridge only the necessary permissions to run ECS tasks and pass roles. The ecs:RunTask action includes a cluster ARN condition, and iam:PassRole is scoped to only the required execution and task roles. This replaces reliance on overly-permissive AWS-managed policies.


295-325: boto3 S3 download correctly implements objective and enables S3 access.

The load_data_task replaces AWS CLI commands with a Python boto3 script, achieving the PR objective. The task correctly:

  • Sets task_role_arn to ecs_task_role (Line 324), enabling IAM-based S3 access
  • Uses var.fixtures_bucket_name (which includes the random suffix from the storage module output)
  • Relies on the attached fixtures_read_only_policy_arn for S3:GetObject permissions

Verification needed: Confirm that fixtures_read_only_policy (from the storage module) includes s3:GetObject permission on the suffixed bucket. This should be verified in infrastructure/modules/storage/main.tf lines 15-25 to ensure the boto3 download succeeds.


217-217: All ECS tasks consistently use configurable image_tag variable.

Every task module (sync_data_task, owasp_update_project_health_metrics_task, owasp_update_project_health_scores_task, migrate_task, load_data_task, index_data_task) correctly references var.image_tag, enabling configurable Docker image versions across all ECS tasks.

Also applies to: 246-246, 267-267, 287-287, 318-318, 338-338


323-325: Task role assignment is correctly scoped.

Only load_data_task includes task_role_arn to enable S3 bucket access for boto3 file downloads. Other tasks do not require IAM permissions beyond execution role capabilities, so omitting task_role_arn is appropriate.

@rudransh-shrivastava rudransh-shrivastava marked this pull request as ready for review December 2, 2025 15:35
@rudransh-shrivastava rudransh-shrivastava force-pushed the feature/nest-zappa-migration-ecs-improvement branch from 8159861 to da6cb9a Compare December 3, 2025 13:53
@github-actions github-actions bot removed the docs Improvements or additions to documentation label Dec 3, 2025
Copy link
Collaborator

@arkid15r arkid15r left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@arkid15r arkid15r enabled auto-merge (squash) December 5, 2025 06:09
@rudransh-shrivastava rudransh-shrivastava force-pushed the feature/nest-zappa-migration-ecs-improvement branch from da6cb9a to 0409690 Compare December 5, 2025 06:17
@sonarqubecloud
Copy link

sonarqubecloud bot commented Dec 5, 2025

@arkid15r arkid15r merged commit 358a811 into OWASP:feature/nest-zappa-migration Dec 5, 2025
26 checks passed
@rudransh-shrivastava rudransh-shrivastava mentioned this pull request Dec 5, 2025
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Improve ecs Module

2 participants