feat: Add AWS cloud infrastructure logs integration #858

Open
bhaskarvilles wants to merge 45 commits into cisagov:main from bhaskarvilles:Cloud-Infrastructure-Logs-Integration

@bhaskarvilles

Description

Adds comprehensive cloud infrastructure log ingestion and analysis capabilities to Malcolm, enabling monitoring of hybrid cloud + on-prem environments.

Related Issue

Closes #232

What's New

  • AWS VPC Flow Logs parser with security event detection
  • AWS CloudTrail parser with threat detection
  • Automated S3 log collector script
  • Comprehensive documentation with setup guides
  • Configuration templates for easy deployment

Features

✅ VPC Flow Logs parsing (v2 format)
✅ CloudTrail API activity parsing
✅ Automated S3 log collection
✅ ECS field mapping
✅ GeoIP/ASN enrichment integration
✅ Security tagging (unauthorized access, high-risk actions, etc.)

Security Event Detection

VPC Flow Logs

  • rejected_traffic - Blocked connections
  • high_volume_transfer - Large data transfers (>10MB)
  • potential_port_scan - Multiple connection rejections
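The PR implements these tags in Logstash (15_cloud_logs_vpc_flow.conf), whose contents are not reproduced here. As an illustration only, a minimal Python sketch of the same idea follows. It assumes the default 14-field version 2 record layout, reads ">10MB" as 10 MiB, and uses a hypothetical port-scan threshold; the function names are also hypothetical.

```python
from collections import defaultdict

# Field order of the default (version 2) VPC Flow Logs record format.
V2_FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]

# Assumption: ">10MB" interpreted as 10 MiB.
HIGH_VOLUME_BYTES = 10 * 1024 * 1024
# Hypothetical threshold: distinct rejected destination ports per source address.
PORT_SCAN_MIN_PORTS = 10


def parse_v2(line):
    """Split one space-delimited v2 record into a dict of strings."""
    values = line.split()
    if len(values) != len(V2_FIELDS):
        raise ValueError(f"expected {len(V2_FIELDS)} fields, got {len(values)}")
    return dict(zip(V2_FIELDS, values))


def tag_record(rec):
    """Per-record tags; NODATA/SKIPDATA records use '-' placeholders and get none."""
    tags = []
    if rec["action"] == "REJECT":
        tags.append("rejected_traffic")
    if rec["bytes"] != "-" and int(rec["bytes"]) > HIGH_VOLUME_BYTES:
        tags.append("high_volume_transfer")
    return tags


def find_port_scans(records):
    """Return sources whose rejected flows hit at least PORT_SCAN_MIN_PORTS distinct ports."""
    ports = defaultdict(set)
    for rec in records:
        if rec["action"] == "REJECT" and rec["dstport"] != "-":
            ports[rec["srcaddr"]].add(rec["dstport"])
    return {src for src, seen in ports.items() if len(seen) >= PORT_SCAN_MIN_PORTS}
```

Port-scan detection is an aggregate over many records, so it is kept separate from the per-record tagging; in Logstash that kind of cross-event state would need something beyond a plain per-event filter.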

CloudTrail

  • unauthorized_access_attempt - AccessDenied errors
  • high_risk_action - Destructive operations (DeleteBucket, TerminateInstances, etc.)
  • root_account_usage - Root user activity
  • failed_authentication - Failed console logins
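The CloudTrail tagging lives in 16_cloud_logs_cloudtrail.conf in the PR; the following is an illustrative Python equivalent, not that file. It relies on standard CloudTrail record fields (errorCode, eventName, userIdentity.type, responseElements.ConsoleLogin). The high-risk set beyond DeleteBucket and TerminateInstances is a hypothetical example list.

```python
# DeleteBucket and TerminateInstances come from the PR; the rest are illustrative.
HIGH_RISK_EVENTS = {
    "DeleteBucket", "TerminateInstances", "DeleteTrail", "StopLogging",
    "DeleteDBInstance",
}


def tag_cloudtrail_event(event):
    """Return security tags for one CloudTrail record (a dict from the Records array)."""
    tags = []
    # Matches AccessDenied as well as variants such as AccessDeniedException.
    if "AccessDenied" in (event.get("errorCode") or ""):
        tags.append("unauthorized_access_attempt")
    if event.get("eventName") in HIGH_RISK_EVENTS:
        tags.append("high_risk_action")
    if (event.get("userIdentity") or {}).get("type") == "Root":
        tags.append("root_account_usage")
    # responseElements can be null in CloudTrail, hence the `or {}`.
    if (event.get("eventName") == "ConsoleLogin"
            and (event.get("responseElements") or {}).get("ConsoleLogin") == "Failure"):
        tags.append("failed_authentication")
    return tags
```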

Testing

  • Tested VPC Flow Logs parser with sample data
  • Tested CloudTrail parser with sample events
  • Verified ECS field mappings
  • Confirmed integration with Malcolm's enrichment pipeline
  • Tested with live AWS VPC Flow Logs (pending AWS account access)
  • Tested with live CloudTrail logs (pending)
  • Created dashboards (in progress)

Documentation

Files Changed

File path                                                    Insertions
CONTRIBUTION.md                                              96
config/cloud-logs.env.example                                25
docs/cloud-logs-integration.md                               380
logstash/pipelines/enrichment/15_cloud_logs_vpc_flow.conf    130
logstash/pipelines/enrichment/16_cloud_logs_cloudtrail.conf  160
shared/bin/aws_log_collector.py                              236
shared/bin/cloud-logs-requirements.txt                       1

Summary

  • Files changed: 7
  • Total insertions: 961

Impact

This contribution enables Malcolm to:

  • 🌐 Monitor hybrid cloud + on-prem environments
  • 🔒 Detect cloud security threats
  • 🔗 Correlate cloud API activity with network traffic
  • 📊 Provide unified analysis platform for security teams

Checklist

  • Code follows Malcolm's style guide
  • Documentation updated
  • Commit messages are descriptive
  • ECS field mappings used consistently
  • Unit tests added (next step)
  • Dashboards created (next step)

Next Steps

  1. Create OpenSearch dashboards for VPC Flow and CloudTrail
  2. Add unit tests for parsers
  3. Test with live AWS infrastructure logs
  4. Address maintainer feedback

Author: @bhaskarvilles
Branch: Cloud-Infrastructure-Logs-Integration

mmguero and others added 30 commits June 20, 2025 20:51
- Apply multiple enhancements to `clean-processed-folder.py` so that it runs fast
enough to keep up with the generation of log files in pipeline capture mode. These
changes increased the file processing rate by a factor of 100.
  - Preprocess the filebeat registry into a format for checking file presence using the
    `in` operator.
  - Replace regular expression pattern matching for mime file types with list searching.
  - Refactor running `fuser` with the subprocess module for increased speed.
- Treat zero-length files, which have no mime type, as eligible log files.
- Update logging to improve the ability to monitor script performance.
- Fix the search for Suricata log files, which was excluding log files created in
pipeline mode.
- Run `clean-processed-folder.py` every minute to minimize the risk of overflowing
the partition where the log files are stored, especially in pipeline capture mode.
Fix log files not removed quickly enough
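The first two speedups in that commit (set-based registry lookups and list membership instead of regex matching for mime types) can be sketched as follows. This is not the code from `clean-processed-folder.py`: the registry entry shape (dicts with a "source" path), the mime-type list, and both function names are hypothetical.

```python
# Hypothetical list of mime types treated as log files.
LOG_MIME_TYPES = ["text/plain", "application/json", "application/gzip"]


def build_registry_set(registry_entries):
    """Collapse registry entries into a set so `path in registry` is an O(1) lookup."""
    return {entry["source"] for entry in registry_entries if "source" in entry}


def is_eligible_log(path, size, mime_type, registry_paths):
    """Decide whether a file filebeat has already seen can be cleaned up."""
    if path not in registry_paths:
        return False
    # Zero-length files have no mime type, so treat them as eligible log files.
    if size == 0:
        return True
    # Plain list membership instead of regular-expression matching.
    return mime_type in LOG_MIME_TYPES
```

Building the set once per run turns each per-file check from a scan of the whole registry into a hash lookup, which matters when the folder holds many files.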
mmguero and others added 9 commits December 3, 2025 16:05
- Implement Logstash parser for VPC Flow Logs (v2 format)
- Implement Logstash parser for CloudTrail API activity logs
- Add Python script for automated S3 log collection
- Follow Malcolm's ECS field mapping patterns
- Integrate with existing GeoIP/ASN enrichment pipeline
- Add security event tagging (unauthorized access, high-risk actions)

Addresses cisagov#232
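The collector script (shared/bin/aws_log_collector.py) is not shown in this PR page. A minimal sketch of two pieces such a collector plausibly needs follows: the default S3 key prefix AWS uses when delivering VPC Flow Logs ("vpcflowlogs") and CloudTrail ("CloudTrail") with no custom prefix, and unpacking a gzipped CloudTrail object into newline-delimited JSON. The function names and the NDJSON output choice are assumptions.

```python
import datetime
import gzip
import json


def daily_prefix(service, account_id, region, day):
    """Default delivery layout: AWSLogs/<account>/<service>/<region>/YYYY/MM/DD/."""
    return f"AWSLogs/{account_id}/{service}/{region}/{day:%Y/%m/%d}/"


def cloudtrail_records(gz_bytes):
    """Decompress one CloudTrail object and yield events from its "Records" array."""
    payload = json.loads(gzip.decompress(gz_bytes))
    yield from payload.get("Records", [])


def to_ndjson(records):
    """Serialize events one per line, a convenient shape for a Logstash file input."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)
```

In a real collector the prefix would be passed as `Prefix=` to boto3's `list_objects_v2` paginator and each object body fed to `cloudtrail_records`; that wiring, along with credentials and checkpointing, is omitted here. Custom bucket prefixes and organization trails change the key layout.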
- Add comprehensive user documentation for AWS cloud logs
- Include setup instructions for VPC Flow Logs and CloudTrail
- Add troubleshooting guide and best practices
- Create environment configuration template
- Update documentation table of contents
- Add Python requirements for log collector

Related: cisagov#232
@mmguero self-assigned this Jan 19, 2026
@mmguero added the logstash and cloud labels Jan 19, 2026
@mmguero added this to Malcolm Jan 19, 2026
@mmguero moved this to Review in Malcolm Jan 19, 2026
@mmguero added this to the v26.02.0 milestone Jan 19, 2026
@mmguero
Collaborator

mmguero commented Jan 19, 2026

Thanks, I will review this in the next week or so and get it merged for probably a February release.

- Add comprehensive unit tests for VPC Flow Logs parser (6 tests)
- Add comprehensive unit tests for CloudTrail parser (6 tests)
- Create AWS VPC Flow Logs Overview dashboard with 6 visualizations
- Create AWS CloudTrail Activity dashboard with 6 visualizations
- Add dashboard import guide with instructions

Tests cover:
- Successful parsing of log formats
- Security event detection (unauthorized access, high-risk actions)
- Protocol mapping and field extraction
- Cloud metadata validation

Dashboards include:
- Traffic/API activity timelines
- Success/failure ratios
- Top talkers/users/actions
- Protocol/security event breakdowns
- Geographic distribution maps

Related: cisagov#232
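The PR's parser tests themselves are not shown here. As a rough illustration of the "protocol mapping" category, a unittest sketch against a hypothetical `protocol_name` helper follows; the mapping uses IANA protocol numbers and lowercase names in the style of ECS `network.transport`.

```python
import unittest

# IANA protocol numbers most often seen in flow logs.
PROTOCOL_NAMES = {1: "icmp", 6: "tcp", 17: "udp", 58: "ipv6-icmp"}


def protocol_name(number):
    """Hypothetical helper: map an IANA protocol number to a lowercase name."""
    return PROTOCOL_NAMES.get(int(number), str(number))


class ProtocolMappingTest(unittest.TestCase):
    def test_known_protocols(self):
        self.assertEqual(protocol_name("6"), "tcp")
        self.assertEqual(protocol_name(17), "udp")
        self.assertEqual(protocol_name("1"), "icmp")

    def test_unknown_protocol_falls_back_to_number(self):
        self.assertEqual(protocol_name("47"), "47")
```

Saved as a module, this runs with `python -m unittest`.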
- Add AWS ELB/ALB access logs parser (Classic ELB and ALB support)
- Add AWS S3 access logs parser with sensitive file detection
- Add AWS Route 53 query logs parser with DNS tunneling/DGA detection
- Add Azure NSG Flow Logs parser with port scan detection
- Add Azure Activity Logs parser with high-risk operation monitoring

All parsers include:
- ECS field mapping
- Security event detection and tagging
- Integration with Malcolm's GeoIP/ASN enrichment
- Comprehensive threat detection (SQL injection, DGA, tunneling, etc.)

Updated log collector to support all new log types.

Related: cisagov#232
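The Route 53 parser's exact DNS tunneling/DGA heuristics are not described beyond their names. A common approach is label length plus Shannon entropy; the sketch below uses that approach with hypothetical thresholds and tag names. It naively treats the label left of the TLD as the registered name, which is wrong for suffixes such as co.uk.

```python
import math
from collections import Counter

# Hypothetical thresholds for illustration only.
MAX_LABEL_LENGTH = 40
MIN_DGA_LABEL_LENGTH = 10
MIN_ENTROPY_BITS = 3.5


def shannon_entropy(text):
    """Shannon entropy in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in Counter(text).values())


def dns_threat_tags(query_name):
    # Strip a trailing root dot if present, then split into labels.
    labels = [label for label in query_name.rstrip(".").lower().split(".") if label]
    if len(labels) < 2:
        return []
    domain_label, subdomain_labels = labels[-2], labels[:-2]
    tags = []
    # Long subdomain labels often carry encoded payloads in tunneling tools.
    if any(len(label) > MAX_LABEL_LENGTH for label in subdomain_labels):
        tags.append("possible_dns_tunneling")
    # Random-looking registered names are a weak DGA signal.
    if (len(domain_label) >= MIN_DGA_LABEL_LENGTH
            and shannon_entropy(domain_label) >= MIN_ENTROPY_BITS):
        tags.append("possible_dga")
    return tags
```

Entropy alone produces false positives (CDN and hashed hostnames), so production detectors usually combine it with query volume per client or allowlists.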
@mmguero
Collaborator

mmguero commented Jan 21, 2026

Just wanted to let you know I saw your updates. It will probably be next week sometime before I'm able to carve out the time to review this, but know it's appreciated and we'll get them reviewed. I may end up moving the logstash filters to a separate (new) parse pipeline rather than putting them in "enrichment" but don't worry about it for now, that's easy to adjust. Cheers.

@bhaskarvilles
Author

@mmguero Thanks for your reply. I'm just adding more enhancements and will wait for your review and the next release. I'm also planning to integrate Azure, Akamai, and other cloud provider logs.

Parsers:
- AWS RDS: error, slow query (DDL/DML/SELECT classification), audit, PostgreSQL
- Azure App Gateway: access logs + WAF (OWASP rule group detection)

Dashboards (7 new):
- aws-elb-overview: requests, status codes, response time, top clients, SQLi events
- aws-s3-access-overview: ops over time, bytes, sensitive file/security events
- aws-route53-overview: queries, response codes, DNS threat events (tunneling/DGA)
- azure-nsg-overview: traffic timeline, allow/deny, protocols, port scans
- azure-activity-overview: operations, outcomes, top users, high-risk events
- azure-appgw-overview: (via cloud-unified) WAF blocks, OWASP rules
- cloud-unified-security: unified cross-provider overview with geo attack map

Alerting Rules (10 monitors):
- Mass unauthorized access attempts
- DNS tunneling / DGA
- S3 sensitive file access
- SQL injection (ELB + AppGW WAF)
- Azure high-risk resource deletion
- CloudTrail root account usage
- Azure WAF RCE/LFI attack
- RDS mass authentication failures
- Large data exfiltration (>1GB)
- Azure NSG mass port scanning

Related: cisagov#232
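How the RDS slow-query parser classifies statements is not shown. A keyword-based sketch of DDL/DML/SELECT classification follows; the keyword sets, the "other" bucket, and the function name are assumptions. It skips leading `/* ... */` comments and does not handle CTEs (`WITH ...`), which would need real SQL parsing.

```python
import re

DDL_KEYWORDS = {"CREATE", "ALTER", "DROP", "TRUNCATE", "RENAME"}
DML_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "REPLACE", "MERGE"}
# First word of the statement, after any leading block comments.
FIRST_WORD = re.compile(r"^\s*(?:/\*.*?\*/\s*)*([A-Za-z]+)", re.DOTALL)


def classify_statement(sql):
    """Return "ddl", "dml", "select", or "other" based on the leading keyword."""
    match = FIRST_WORD.match(sql)
    if not match:
        return "other"
    keyword = match.group(1).upper()
    if keyword in DDL_KEYWORDS:
        return "ddl"
    if keyword in DML_KEYWORDS:
        return "dml"
    if keyword == "SELECT":
        return "select"
    # Includes session statements such as the MySQL slow log's "SET timestamp=...".
    return "other"
```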
@mmguero
Collaborator

mmguero commented Feb 23, 2026

I'm on site at one of our funding sources this week, but I plan on looking at this and getting it merged next week. Thanks for your patience!

@mmguero removed their assignment Mar 11, 2026
@mmguero modified the milestones: v26.03.0, v26.04.0 Mar 16, 2026

Labels

cloud: Relating to deployment of Malcolm in the cloud and/or with Kubernetes
logstash: Relating to Malcolm's use of Logstash

Projects

Status: Review


3 participants