@@ -171,7 +171,7 @@ See [Runbooks](./runbooks/overview) for step-by-step guides for specific inciden
**Goal:** Confirm the fix actually worked.

- Verify immediately after deployment
- Monitor for at least a week
- Monitor based on residual risk, blast radius, and incident type

This change improves flexibility but sacrifices some simplicity. It feels like a baseline period should be specified here, which teams can then adjust.

- Consider adding new alerts or test cases
- Document what monitoring is now in place

@@ -85,6 +85,17 @@

Review and adapt these pages for your own internal incident response documentation.

This section is different from the broader [Incident Management](/incident-management/overview) guidance:

This section should describe "Using This Template" rather than point out differences, which dilutes its purpose. Any other links should live under a "References" section or something similar.


- **[Incident Management](/incident-management/overview)** explains concepts and practices
- **[Incident Response Templates](/incident-management/incident-response-template/overview)** are meant to be copied, customized, and used internally

Check failure (GitHub Actions / lint): docs/pages/incident-management/incident-response-template/overview.mdx:91:121 MD013/line-length Line length [Expected: 120; Actual: 149] https://github.com/DavidAnson/markdownlint/blob/v0.38.0/doc/md013.md

Within this template section:

This doesn't really add anything to the "Using This Template" topic.


It also duplicates the section below.


- **[Policy](/incident-management/incident-response-template/incident-response-policy) / [roles and staffing](/incident-management/incident-response-template/roles-and-staffing) / [communications](/incident-management/incident-response-template/communications) / [contacts](/incident-management/incident-response-template/contacts)** define your operating model

Check failure (GitHub Actions / lint): docs/pages/incident-management/incident-response-template/overview.mdx:95:121 MD013/line-length Line length [Expected: 120; Actual: 361] https://github.com/DavidAnson/markdownlint/blob/v0.38.0/doc/md013.md
- **[Templates](/incident-management/incident-response-template/templates/overview)** are blank working documents to fill out during or after incidents

Check failure (GitHub Actions / lint): docs/pages/incident-management/incident-response-template/overview.mdx:96:121 MD013/line-length Line length [Expected: 120; Actual: 151] https://github.com/DavidAnson/markdownlint/blob/v0.38.0/doc/md013.md
- **[Runbooks](/incident-management/incident-response-template/runbooks/overview)** are scenario-specific response procedures

### What's Included

| Document | Purpose |
@@ -180,9 +180,8 @@
| | | |
| | | |

These people should be reachable 24/7 for critical incidents. Consider:

There should be a 24/7 escalation path to these people for critical incidents. Consider:

Why was this changed to an escalation path? Not sure it makes much sense.

- Founders / C-level

Check failure (GitHub Actions / lint): docs/pages/incident-management/incident-response-template/roles-and-staffing.mdx:184 MD032/blanks-around-lists Lists should be surrounded by blank lines [Context: "- Founders / C-level"] https://github.com/DavidAnson/markdownlint/blob/v0.38.0/doc/md032.md
- Security Lead
- Engineering Lead
- Legal (for incidents with legal implications)
@@ -1,6 +1,6 @@
---
title: "Runbook: Build Pipeline Compromise | Security Alliance"
description: "Stub runbook. Customize with your CI/CD platform and procedures."
description: "Example runbook for CI/CD compromise. Review and customize for your platform, release process, and trust boundaries before use."
tags:
- Security Specialist
- Operations & Strategy
@@ -21,7 +21,7 @@
<TagList tags={frontmatter.tags} />
<AttributionList contributors={frontmatter.contributors} />

> **Stub runbook.** Customize with your CI/CD platform and procedures.
> **This is an example runbook.** Review and customize for your CI/CD platform, artifact flow, deployment model, and approval process before use.

Check failure (GitHub Actions / lint): docs/pages/incident-management/incident-response-template/runbooks/build-pipeline-compromise.mdx:24:121 MD013/line-length Line length [Expected: 120; Actual: 145] https://github.com/DavidAnson/markdownlint/blob/v0.38.0/doc/md013.md

## Quick Reference

@@ -36,39 +36,149 @@

### Symptoms

- [ ] Unexpected code in deployed artifacts
- [ ] CI/CD configuration changed without approval
- [ ] Secrets accessed or exfiltrated
- [ ] Unauthorized workflow runs
- [ ] Unexpected workflow runs or releases

These should probably be plain list items, since the purpose of checklists here is unclear. Will the team need to click through them? Why? I see they were checklists before, so this probably slipped through a previous review iteration.

- [ ] CI/CD configuration changed without expected approval
- [ ] Secrets accessed, exported, or rotated unexpectedly
- [ ] Build artifacts differ from expected source or prior reproducible output
- [ ] Deployments reference an unexpected commit, artifact, or builder identity

### Confirm Compromise
### Likely Scope Questions

Even though these questions are meaningful, it's unclear how they align with the purpose of this document. Is this a step to follow? Who should follow it? Why do they come before Immediate Actions?


- Is this limited to CI configuration, or were artifacts actually produced from a compromised pipeline?
- Did the pipeline have deploy permissions, signing authority, or production credentials?
- Were any releases, containers, frontend bundles, or packages published during the exposure window?

### Differentiation

This section feels excessive for a runbook; it belongs in educational material or policy.


- Unauthorized code merged without pipeline abuse may be a repository compromise first
- Malicious package updates without CI tampering may be a dependency incident first
- A bad deployment from a legitimate commit may be an operational failure rather than compromise

- Review CI/CD audit logs
- Compare build artifacts to source
- Check for config changes in CI/CD platform

## Immediate Actions

1. [ ] Disable compromised pipelines
2. [ ] Rotate all secrets and tokens
3. [ ] Take down potentially compromised deployments
4. [ ] Audit recent builds and deployments
### Step 1: Freeze the pipeline

This doesn't say anything about key revocation/rotation, even though some keys may be used to push and approve.


**Why:** Stop additional malicious builds, releases, or secret access.

- [ ] Disable affected workflows/pipelines
- [ ] Revoke or pause auto-deploy jobs
- [ ] Block manual approvals until scope is understood

### Step 2: Preserve evidence

This is excessive for "Immediate Actions". Most of this evidence can be collected later; gathering it during the incident itself is inefficient when we need to limit damage as fast as we can.


**Why:** CI audit logs, workflow definitions, artifact metadata, and deployment history are easy to overwrite.

- [ ] Export CI audit logs
- [ ] Save workflow/job history for the exposure window
- [ ] Record affected commits, workflow files, artifact digests, release IDs, and deployment targets
- [ ] Preserve runner details if self-hosted runners were involved
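
One way to make the recorded digests in this step reproducible is to hash everything that was exported into a single evidence folder. The sketch below is illustrative only: the folder layout and the names `sha256_of` and `build_evidence_manifest` are hypothetical, not part of this runbook or of any CI platform's API.

```python
import hashlib
from datetime import datetime, timezone
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_evidence_manifest(evidence_dir: Path) -> dict:
    """Hash every file under evidence_dir (hypothetical export folder).

    The manifest records a UTC capture time plus path, size, and SHA-256
    for each file, so later copies can be checked against it.
    """
    files = sorted(p for p in evidence_dir.rglob("*") if p.is_file())
    return {
        "captured_at": datetime.now(timezone.utc).isoformat(),
        "files": [
            {
                "path": p.relative_to(evidence_dir).as_posix(),
                "size": p.stat().st_size,
                "sha256": sha256_of(p),
            }
            for p in files
        ],
    }
```

The manifest can be serialized with `json.dumps` and stored separately from the evidence itself, assuming your team has a location the affected pipeline cannot write to.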

### Step 3: Rotate credentials by blast radius

Rotation usually takes much longer than revocation. Credentials must be revoked ASAP and never rotated right away: without a clear root cause, rotating will just lead to compromise of the rotated keys.


**Why:** Pipeline compromise often becomes credential compromise.

Prioritize rotation of:
- [ ] CI platform tokens
- [ ] cloud deploy credentials
- [ ] package registry tokens
- [ ] artifact signing keys or release credentials
- [ ] secrets available to self-hosted runners

### Step 4: Stop trust in recent outputs

Making this a separate section feels excessive; it's a more or less obvious step and can be described more simply.


**Why:** Do not assume recent artifacts or deployments are clean.

- [ ] Identify all artifacts built during the exposure window
- [ ] Identify all deployments and releases from those artifacts
- [ ] Quarantine or withdraw suspicious outputs where possible
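
Identifying outputs from the exposure window comes down to filtering build history by timestamp. The following is a minimal sketch, assuming build history has already been exported into records with timezone-aware UTC timestamps; `BuildRecord` and `builds_in_window` are hypothetical names used for illustration, not a CI platform API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class BuildRecord:
    """One exported build history entry (hypothetical shape)."""
    build_id: str
    artifact_digest: str
    finished_at: datetime  # must be timezone-aware


def builds_in_window(builds, window_start, window_end):
    """Return builds that finished inside the inclusive exposure window."""
    return [b for b in builds if window_start <= b.finished_at <= window_end]


# Example with made-up data: only the second build falls inside the window.
history = [
    BuildRecord("b1", "sha256:aaa", datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)),
    BuildRecord("b2", "sha256:bbb", datetime(2024, 1, 2, 9, 0, tzinfo=timezone.utc)),
    BuildRecord("b3", "sha256:ccc", datetime(2024, 1, 5, 9, 0, tzinfo=timezone.utc)),
]
suspect = builds_in_window(
    history,
    datetime(2024, 1, 2, 0, 0, tzinfo=timezone.utc),
    datetime(2024, 1, 3, 0, 0, tzinfo=timezone.utc),
)
```

When the start of the exposure window is uncertain, choosing an earlier start errs on the side of treating more outputs as untrusted.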


A section on internal and external comms is missing, yet it is of the utmost importance.


We need to stop/block/disable all user-facing or sensitive apps until the investigation and cleanup are complete.


## Investigation

### Key Questions

- [ ] What was the initial access path: CI platform, repository permissions, runner compromise, or stolen token?

Who needs to answer these questions? Do the answers need to be written here? The previous section was specific steps, while this section is open-ended questions. Why is that? These are still actions taken as part of the incident, so shouldn't they be steps as well? The existing format mixes policy with runbook steps.

- [ ] What permissions did the compromised pipeline actually have?
- [ ] Were secrets exposed only to logs/runtime, or used to publish or deploy?
- [ ] Which environments were reachable: build only, staging, production?
- [ ] Which outputs must now be treated as untrusted?

### Information to Gather

| Data | Source |

It could have been a checklist or just a comma-separated list

|------|--------|
| CI audit logs | CI/CD platform |
| workflow/config diffs | repository history |
| release/deployment history | CI/CD platform, cloud provider, registry |
| artifact digests / provenance | registry, signing system, artifact store |
| runner access and execution logs | runner host / CI platform |


## Containment and Recovery

Following these recovery steps without fixing the root cause first will just lead to re-contamination, and a Fix/Mitigate step is missing in this guide.


### Option A: Rebuild from a known-good commit using a clean pipeline

**When:** You can identify a trusted commit and re-establish a trusted build path.

This assumption is too broad, as the attacker could have gained access to other sensitive parts of the system.

**Impact:** Release cadence slows, but trust is restored more safely.

1. Stand up a clean pipeline or isolated builder
2. Re-verify repository state and workflow definitions
3. Rebuild from a known-good commit
4. Compare output metadata against expected source and release intent
5. Redeploy only from the rebuilt trusted output

### Option B: Roll back to last known-good release

**When:** A trusted prior release exists and rollback is operationally safe.

Same as above

**Impact:** Feature loss or temporary service degradation may occur.

1. Identify the last trusted artifact and deployment
2. Roll back affected services
3. Verify rollback success in production
4. Continue investigation before resuming normal release flow

### Option C: Keep service paused until trust is re-established

This is basically the only valid option. All other options are too dangerous/risky until the issue is resolved/fixed.


**When:** You cannot distinguish clean from compromised outputs.
**Impact:** Operational disruption, but lower risk of serving malicious artifacts.

1. Pause releases/deployments
2. Communicate impact internally and externally as needed
3. Rebuild trust in source, pipeline, credentials, and artifacts before resuming


## Verification Before Resuming

These should be steps, as this is part of a runbook.


Do not resume normal delivery until you can answer these clearly:

- [ ] The initial access path is understood well enough to prevent immediate recurrence
- [ ] Compromised credentials have been rotated or invalidated
- [ ] Untrusted artifacts and releases have been identified and handled
- [ ] Build and deploy permissions are re-scoped appropriately
- [ ] A known-good artifact has been rebuilt or a known-good release has been restored


## Hardening After the Incident

This doesn't belong in a runbook.


- [ ] Separate build permissions from deploy permissions
- [ ] Require stronger approval controls for workflow and release changes
- [ ] Use short-lived credentials where possible
- [ ] Reduce secret exposure to only the jobs that need them
- [ ] Restrict or harden self-hosted runners if used
- [ ] Improve artifact provenance, signing, and release verification


## Mitigation
## Escalation

1. [ ] Audit CI/CD configuration for unauthorized changes
2. [ ] Rebuild from trusted commit using clean pipeline
3. [ ] Implement additional approval requirements
4. [ ] Review and restrict pipeline permissions
Escalate immediately if:
- [ ] production deployments may have been modified
- [ ] signing keys or release credentials may be exposed
- [ ] user-facing artifacts may have been maliciously published
- [ ] the pipeline had access to broader cloud or infrastructure credentials

## Prevention
See [Contacts](../contacts) and [Incident Response Policy](../incident-response-policy).

- [ ] Require approval for CI/CD config changes
- [ ] Use short-lived credentials
- [ ] Implement branch protection
- [ ] Audit pipeline access regularly
- [ ] Use signed commits
- [ ] Separate build and deploy permissions

## Related

@@ -1,6 +1,6 @@
---

More or less the same concerns/problems with the updated runbooks as in the one above.

title: "Runbook: Dependency Attack | Security Alliance"
description: "Stub runbook. Customize with your package management and build procedures."
description: "Example runbook for dependency compromise. Review and customize for your package manager, build flow, and release process before use."
tags:
- Security Specialist
- Operations & Strategy
@@ -21,7 +21,7 @@ import { TagList, AttributionList, TagProvider, TagFilter, ContributeFooter } fr
<TagList tags={frontmatter.tags} />
<AttributionList contributors={frontmatter.contributors} />

> **Stub runbook.** Customize with your package management and build procedures.
> **This is an example runbook.** Review and customize for your package manager, dependency policy, and build procedures before use.

## Quick Reference

@@ -38,41 +38,128 @@ import { TagList, AttributionList, TagProvider, TagFilter, ContributeFooter } fr

- [ ] Unexpected behavior after dependency update
- [ ] Security advisory for a package you use
- [ ] Malicious code found in node_modules or similar
- [ ] Lockfile changes you didn't make
- [ ] Malicious code found in installed dependencies or build output
- [ ] Lockfile changes you did not expect
- [ ] Frontend bundle or released artifact changed more than the source diff would explain

Dependencies can live in different parts of a system, not only the frontend.


### Confirm Dependency Attack
### Scope Questions

```
npm audit
# or
yarn audit
```
- Did the malicious dependency reach production, or is this still limited to source/build environments?
- Was the dependency executed only during build time, or also in user-facing runtime code?
- Did the dependency have access to CI secrets, registry tokens, or signing material?
- Were any artifacts published or deployed during the exposure window?

Check for recent lockfile changes in git history.
### Differentiation

- If the compromised package was only introduced through your CI/build path, also review [Build Pipeline Compromise](./build-pipeline-compromise)
- If the issue is malicious code being served to users, also review [Frontend Compromise](./frontend-compromise)
- If this is only an advisory on an unused code path, severity may differ from an actively weaponized package

## Immediate Actions

1. [ ] Take down site to stop serving malicious code
2. [ ] Identify the malicious package
3. [ ] Pin dependencies to last known good version
4. [ ] Rebuild from clean environment
### Step 1: Freeze releases that rely on the affected dependency

**Why:** Prevent additional malicious artifacts from being built or published.

- [ ] Pause affected builds and deployments
- [ ] Stop package publishing if your pipeline republishes downstream artifacts
- [ ] Block dependency auto-update jobs until scope is understood

### Step 2: Identify the exact bad package/version path

**Why:** You need a precise package, version, and introduction path before you can cleanly contain it.

- [ ] Record package name and version
- [ ] Identify whether it is direct or transitive
- [ ] Identify which commits or lockfile changes introduced it
- [ ] Determine whether multiple repos or services consume it
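
For npm projects, the direct-versus-transitive question can often be answered from the lockfile alone. The sketch below assumes a `package-lock.json` with `lockfileVersion` 2 or 3, where a top-level `packages` map is keyed by install path and the `""` entry describes the root project. The function name `find_package` is hypothetical, and workspaces or other package managers would need different handling.

```python
import json


def find_package(lockfile: dict, name: str) -> list:
    """List every install path of `name` in an npm v2/v3 lockfile dict.

    A hit is marked direct only when it sits at the top level of
    node_modules and the root project declares it as a dependency.
    """
    packages = lockfile.get("packages", {})
    root = packages.get("", {})
    declared = set()
    for field in ("dependencies", "devDependencies", "optionalDependencies"):
        declared.update(root.get(field, {}))

    top_level = "node_modules/" + name
    hits = []
    for path, meta in packages.items():
        # Match the top-level copy or any nested copy of the same package.
        if path == top_level or path.endswith("/" + top_level):
            hits.append({
                "path": path,
                "version": meta.get("version"),
                "direct": path == top_level and name in declared,
            })
    return hits


def load_lockfile(path: str) -> dict:
    """Read a lockfile from disk (path is supplied by the caller)."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)
```

Running this against the lockfile at each commit in the exposure window gives a rough view of when the package and version first appeared.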

### Step 3: Stop trusting recent outputs

**Why:** If malicious code ran during build or packaging, recent artifacts may be untrusted even if source looks clean.

- [ ] Identify builds created during the exposure window
- [ ] Identify deployed versions and published packages that include the dependency
- [ ] Quarantine or withdraw suspicious outputs where possible

### Step 4: Preserve evidence

**Why:** Registry state, lockfiles, and CI evidence can change quickly.

- [ ] Save lockfile and manifest versions from affected builds
- [ ] Save CI logs and artifact metadata
- [ ] Record package tarball hashes, integrity values, and registry metadata where available
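
Recording integrity values usually means computing a Subresource Integrity style string (`sha512-` followed by the base64 digest) for each saved tarball and comparing it with the `integrity` field captured from the lockfile. A minimal sketch follows; the helper names are hypothetical, and it assumes the lockfile entries use SHA-512, which is common for npm but worth confirming for your registry.

```python
import base64
import hashlib


def sri_sha512(data: bytes) -> str:
    """Return an SRI-style integrity string for raw tarball bytes."""
    digest = hashlib.sha512(data).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


def matches_lockfile_integrity(data: bytes, expected: str) -> bool:
    """Check tarball bytes against a lockfile integrity value.

    An integrity value may hold several space-separated hashes; this
    sketch only compares the SHA-512 form and ignores other algorithms.
    """
    return sri_sha512(data) in expected.split()
```

A mismatch between a saved tarball and the recorded integrity value is itself evidence worth preserving, since it may indicate that registry contents changed after the build.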

## Investigation

### Key Questions

- [ ] Was the package actually malicious, or just vulnerable?
- [ ] Was the malicious code executed in your environment?
- [ ] Did it affect build-time systems, runtime users, or both?
- [ ] Were secrets, signing credentials, or deploy tokens exposed?
- [ ] Which artifacts or releases must now be treated as untrusted?

### Information to Gather

| Data | Source |
|------|--------|
| package/version metadata | registry, lockfile, manifest |
| introduction point | git history, dependency bot PRs |
| build logs | CI/CD platform |
| deployed artifacts using bad dependency | release history, artifact store |
| advisories / upstream incident details | package registry, maintainer advisories |


## Containment and Recovery

### Option A: Revert to the last known-good dependency state

**When:** A trusted prior lockfile/package set exists.
**Impact:** Fastest path in many cases.

1. Revert the dependency or lockfile to a known-good state
2. Rebuild from a clean environment
3. Verify the bad package is no longer present in the output
4. Redeploy only after validating the resulting artifact

### Option B: Replace or remove the affected package

**When:** A clean replacement exists, or the package is non-essential.
**Impact:** May require code changes or degraded functionality.

1. Remove or replace the affected package
2. Regenerate lockfile carefully
3. Rebuild from a clean environment
4. Validate both functionality and resulting dependency tree

### Option C: Keep service paused until trust is re-established

**When:** You cannot clearly determine which artifacts are clean.
**Impact:** Operational disruption, but lower chance of serving malicious code.

1. Pause affected deployments
2. Reconstruct a trusted dependency set
3. Rebuild from a clean environment
4. Resume only after verifying outputs and rotation of any exposed credentials


## Mitigation
## Verification Before Resuming

1. [ ] Remove or replace malicious package
2. [ ] Update lockfile with known good versions
3. [ ] Rebuild using `npm ci` or `yarn --frozen-lockfile`
4. [ ] Redeploy verified build
- [ ] The malicious or suspect dependency is removed from the build
- [ ] A clean rebuild has been produced from a trusted environment
- [ ] Any exposed credentials have been rotated
- [ ] Suspicious builds, releases, or packages have been identified and handled
- [ ] The resulting artifact has been checked against expected changes

## Prevention

- [ ] Use lockfiles and commit them
- [ ] Use `npm ci` / `yarn --frozen-lockfile` in CI
- [ ] Regularly audit dependencies
- [ ] Consider using a private registry
- [ ] Pin exact versions for critical packages
- [ ] Review dependency changes in PRs
- [ ] Use deterministic install commands in CI (`npm ci`, `pnpm install`, `yarn install --immutable`, etc.)
- [ ] Review dependency and lockfile changes in PRs
- [ ] Restrict who can approve dependency update automation
- [ ] Audit critical dependencies and minimize unnecessary package surface area
- [ ] Treat build-time dependencies as part of the production attack surface

## Related
