@livermush livermush commented Nov 25, 2025

This PR fixes three critical authentication issues that prevented successful CLI authentication when using Cognito User Pools with Direct STS federation.

Problems Fixed

  1. Cognito User Pool Client Secret Issue

Symptom: Token exchange failed with error:
Error: Token exchange failed: {"error":"invalid_client","error_description":"invalid_client_secret"}

Root Cause: The CloudFormation template created the Cognito client with a client secret by default. CLI applications are public clients and must use PKCE (Proof Key for Code Exchange) instead of client secrets.

Fix: Added GenerateSecret: false to cognito-user-pool-setup.yaml UserPoolClient resource.

  2. Path Expansion in Claude Code Settings

Symptom: Claude Code failed to start with error:
error: otelHeadersHelper did not return a valid value

Root Cause: The install scripts used ~ for helper paths, but Claude Code doesn't expand ~ for the otelHeadersHelper setting (though it does for awsAuthRefresh).

Fix:

  • Changed install scripts to use $HOME instead of ~ so the shell expands it to an absolute path
  • Added __CREDENTIAL_PROCESS_PATH__ placeholder for consistency with __OTEL_HELPER_PATH__
  • Both helper paths now resolve to absolute paths during installation
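
The placeholder substitution described above could look roughly like the sketch below. It is an illustration only: the template layout, the `otel-helper` file name, and the mapping of the credential-process path to `awsAuthRefresh` are assumptions, and the placeholders use the double-underscore form that appears in the commit message.

```python
import json
import os

# Hypothetical settings template. The key names appear in the PR text; which
# placeholder feeds which key, and the helper file names, are assumptions.
TEMPLATE = """{
  "otelHeadersHelper": "__OTEL_HELPER_PATH__",
  "awsAuthRefresh": "__CREDENTIAL_PROCESS_PATH__ --profile ClaudeCode"
}"""


def render_settings(template: str, home: str) -> dict:
    """Replace both placeholders with absolute paths rooted at `home`."""
    install_dir = os.path.join(home, "claude-code-with-bedrock")
    replacements = {
        "__OTEL_HELPER_PATH__": os.path.join(install_dir, "otel-helper"),
        "__CREDENTIAL_PROCESS_PATH__": os.path.join(install_dir, "credential-process"),
    }
    for placeholder, path in replacements.items():
        template = template.replace(placeholder, path)
    return json.loads(template)


# expanduser resolves "~" once, at install time, so no tilde reaches the settings file.
settings = render_settings(TEMPLATE, home=os.path.expanduser("~"))
```

Resolving the home directory at install time means the settings file never depends on whether the consumer of a given key performs tilde expansion.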

  3. Credential Process Infinite Recursion

Symptom: Authentication hung with recursive error:
Error: Failed to get AWS credentials via Direct STS: Error when retrieving credentials from custom-process:

Root Cause: When using Direct STS federation, the credential-process created an STS client without clearing the AWS_PROFILE environment variable. Boto3 saw the profile, which pointed back to the credential-process, causing infinite recursion.

Fix: Added environment variable clearing logic to get_aws_credentials_direct() method, matching the pattern already used in get_aws_credentials_cognito().
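
A minimal sketch of the clear-then-restore pattern is shown below. It is not the code from main.py: the helper names are hypothetical, and only AWS_PROFILE is cleared here, whereas the actual method may clear additional AWS_* variables.

```python
import os
from contextlib import contextmanager

# Assumption: only AWS_PROFILE is listed; the real method may clear more variables.
PROFILE_VARS = ("AWS_PROFILE",)


@contextmanager
def without_env(names=PROFILE_VARS):
    """Temporarily remove the given variables from os.environ."""
    saved = {name: os.environ.pop(name) for name in names if name in os.environ}
    try:
        yield
    finally:
        # Restore the saved values even if the wrapped call raises.
        os.environ.update(saved)


def call_sts_without_profile(sts_call):
    """Run `sts_call` (a hypothetical zero-argument callable) with the profile hidden."""
    with without_env():
        # In the real helper, this is where the STS client would be created
        # and AssumeRoleWithWebIdentity called.
        return sts_call()
```

Keeping the restore in a `finally` clause means a failed STS call leaves the process environment as it found it.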

Changes Made

Files Modified

  1. deployment/infrastructure/cognito-user-pool-setup.yaml
    - Added GenerateSecret: false to UserPoolClient
  2. source/claude_code_with_bedrock/cli/commands/package.py
    - Changed ~ to $HOME in install.sh sed command
    - Changed ~ to $HOME in install.bat PowerShell command
    - Added __CREDENTIAL_PROCESS_PATH__ placeholder
    - Updated placeholder replacement logic for both helpers
  3. source/credential_provider/main.py
    - Added environment variable clearing before STS client creation
    - Added finally block to restore environment variables
    - Matches existing pattern in Cognito Identity Pool path

Testing

Before these fixes:

  • ❌ Browser authentication opened but token exchange failed
  • ❌ Claude Code failed to start with OTEL helper error
  • ❌ Credential process hung in infinite loop

After these fixes:

  • ✅ Browser authentication completes successfully
  • ✅ Token exchange with PKCE works correctly
  • ✅ Claude Code starts and can authenticate
  • ✅ AWS CLI with AWS_PROFILE=ClaudeCode works correctly
  • ✅ 12-hour sessions with Direct STS federation

Test Steps:

1. Deploy fresh Cognito User Pool (creates public client)

cd source && poetry run ccwb init
poetry run ccwb deploy

2. Package and install

poetry run ccwb package
cd dist && ./install.sh

3. Test authentication

AWS_PROFILE=ClaudeCode aws sts get-caller-identity

Should show assumed-role/BedrockCognitoFederatedRole

4. Test Claude Code integration

claude

Should start successfully and authenticate
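
The check in step 3 can also be scripted. The sketch below pulls the role name out of the Arn field returned by `aws sts get-caller-identity`; the function name is hypothetical, and it assumes the standard `arn:<partition>:sts::<account>:assumed-role/<role>/<session>` layout.

```python
from typing import Optional


def role_name_from_arn(arn: str) -> Optional[str]:
    """Return the role name for an STS assumed-role ARN, or None for anything else."""
    parts = arn.split(":", 5)
    # Expected fields: arn, partition, service, region, account, resource.
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "sts":
        return None
    resource = parts[5].split("/")
    if len(resource) >= 2 and resource[0] == "assumed-role":
        return resource[1]
    return None
```

Because the partition field is not inspected, the same check applies to ARNs in partitions other than `aws`.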

Impact

  • Critical fix for all users deploying with Cognito User Pools
  • Required for CLI authentication to work at all
  • No breaking changes - fixes broken functionality

Related Issues

Discovered during GovCloud partition support testing (#PR_NUMBER).

This commit fixes three critical authentication issues:

1. Cognito User Pool client secret requirement
   - Added GenerateSecret: false to cognito-user-pool-setup.yaml
   - CLI apps are public clients and use PKCE instead of client secrets
   - Previous behavior: Token exchange failed with "invalid_client_secret"

2. Path expansion in Claude Code settings
   - Changed ~ to $HOME in install.sh and install.bat scripts
   - Claude Code doesn't expand ~ for otelHeadersHelper setting
   - Added __CREDENTIAL_PROCESS_PATH__ placeholder for consistency
   - Both helpers now use absolute paths after installation

3. Credential process recursive loop with Direct STS
   - Added environment variable clearing in get_aws_credentials_direct()
   - Prevents infinite recursion when AWS_PROFILE is set
   - Matches existing pattern in get_aws_credentials_cognito()
   - Previous behavior: Credential process called itself infinitely

All three issues prevented successful authentication with Cognito User Pool
and Direct STS federation. Authentication now works correctly for CLI and
Claude Code integration.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <[email protected]>
@livermush
Author

Testing Summary: Environment Variable Clearing Fix

Understanding Your Use Case

We interpreted your concern as this workflow:

  1. Export AWS credentials in your terminal (your admin/resource credentials):

    export AWS_ACCESS_KEY_ID="AKIA..."
    export AWS_SECRET_ACCESS_KEY="..."
  2. Set AWS_PROFILE=ClaudeCode (for Cognito authentication to Bedrock):

    export AWS_PROFILE=ClaudeCode
  3. Start Claude Code - authenticates to Bedrock via Cognito

  4. Inside Claude Code, run AWS CLI commands that should use your exported credentials (not the ClaudeCode profile)

Your concern: Does the env clearing in credential-process wipe out your exported credentials, breaking Claude's ability to run AWS commands against your resources?


How We Tested

Test 1: Verify the recursion bug exists (without fix)

# Branch: fix/cognito-authentication-issues-no-env-clear
rm -f ~/.aws/credentials
AWS_PROFILE=ClaudeCode aws sts get-caller-identity --debug

Result: Hung for 43 seconds, then failed with nested error:

Error when retrieving credentials from custom-process:
  Error: Failed to get AWS credentials via Direct STS:
    Error when retrieving credentials from custom-process:

The error shows "custom-process" twice - proving the recursion.


Test 2: Verify the fix works

# Branch: fix/cognito-authentication-issues (with env clearing)
rm -f ~/.aws/credentials
AWS_PROFILE=ClaudeCode aws sts get-caller-identity

Result: Browser opened, authenticated, returned:

{
    "UserId": "AROAXTX6EUZAUCUEJCHZ6:claude-code-...",
    "Account": "523445249601",
    "Arn": "arn:aws:sts::523445249601:assumed-role/BedrockCognitoFederatedRole/..."
}

Test 3: Verify exported credentials survive credential-process call

export AWS_ACCESS_KEY_ID="AKIAXTX6EUZA4NG4GXME"
export AWS_SECRET_ACCESS_KEY="kIJ1io..."

echo "Before: AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID"
~/claude-code-with-bedrock/credential-process --profile ClaudeCode > /dev/null
echo "After: AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID"

aws sts get-caller-identity

Result:

Before: AWS_ACCESS_KEY_ID=AKIAXTX6EUZA4NG4GXME
After: AWS_ACCESS_KEY_ID=AKIAXTX6EUZA4NG4GXME
{
    "UserId": "AIDAXTX6EUZA3QRTKRUFN",
    "Account": "523445249601",
    "Arn": "arn:aws:iam::523445249601:user/cli-user"
}

Exported credentials unchanged. Still returns cli-user.


Test 4: Full Claude Code integration (with OTEL enabled)

export AWS_ACCESS_KEY_ID="AKIAXTX6EUZA4NG4GXME"
export AWS_SECRET_ACCESS_KEY="kIJ1io..."
export AWS_PROFILE=ClaudeCode
claude

Inside Claude Code:

> use the aws cli and list out the first three s3 buckets in my account

⏺ Bash(aws sts get-caller-identity)
  ⎿  "UserId": "AIDAXTX6EUZA3QRTKRUFN"    <-- cli-user (exported creds)
     "Arn": "arn:aws:iam::523445249601:user/cli-user"

⏺ Bash(aws s3 ls | head -3)
  ⎿  2024-10-21 a-doughai-bucket
     2024-09-18 a-epa-test-bucket
     2025-02-03 amazon-sagemaker-...

Result: Claude Code authenticated to Bedrock via Cognito, but CLI commands used the exported cli-user credentials.


Why The Change Is Safe

Process Isolation:

┌─────────────────────────────────────────────────────────┐
│  Parent Shell (your terminal)                           │
│  AWS_ACCESS_KEY_ID=AKIA...  ← NEVER CHANGES             │
│  AWS_SECRET_ACCESS_KEY=...  ← NEVER CHANGES             │
│  AWS_PROFILE=ClaudeCode                                 │
│                                                         │
│  ┌───────────────────────────────────────────────────┐  │
│  │  credential-process (subprocess)                   │  │
│  │  - Receives COPY of env vars                       │  │
│  │  - Clears AWS_* in its own memory                  │  │
│  │  - Calls STS AssumeRoleWithWebIdentity             │  │
│  │  - Returns credentials                             │  │
│  │  - Exits (changes discarded)                       │  │
│  └───────────────────────────────────────────────────┘  │
│                                                         │
│  Parent shell env vars UNCHANGED                        │
└─────────────────────────────────────────────────────────┘

When a subprocess modifies environment variables, those changes only exist in the subprocess's memory. The parent shell is never affected. This is fundamental Unix process isolation.
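
The isolation can be reproduced in a few lines. This sketch uses a made-up variable name (`CCWB_DEMO_ACCESS_KEY`) rather than a real AWS_* variable, and a Python child process stands in for the credential-process.

```python
import os
import subprocess
import sys

DEMO_VAR = "CCWB_DEMO_ACCESS_KEY"  # hypothetical name, used instead of a real AWS_* variable


def child_clears_variable() -> tuple[str, str]:
    """Return (value the child sees after clearing, value the parent still holds)."""
    os.environ[DEMO_VAR] = "parent-value"
    # The child inherits a copy of the parent's environment, then removes the variable.
    child_code = (
        "import os\n"
        f"os.environ.pop({DEMO_VAR!r}, None)\n"
        f"print(os.environ.get({DEMO_VAR!r}))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", child_code],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip(), os.environ[DEMO_VAR]


child_view, parent_view = child_clears_variable()
```

The child reports that the variable is gone, while the parent's copy keeps its original value.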

Additionally, the fix restores the env vars in a finally block before returning - though this is technically unnecessary since the subprocess exits anyway.


Risk of NOT Having This Change

Without the env clearing, the recursion bug occurs:

  1. User sets AWS_PROFILE=ClaudeCode
  2. AWS CLI calls credential-process --profile ClaudeCode
  3. Inside credential-process, boto3 creates STS client
  4. boto3 sees AWS_PROFILE=ClaudeCode still set
  5. boto3 tries to get credentials by calling credential-process again
  6. Infinite loop → timeout/failure

This breaks fresh authentication entirely when AWS_PROFILE is set - which is exactly how Claude Code is configured to run.
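
The loop in steps 1 to 6 can be modelled with two plain functions. This is a toy model for illustration, not boto3's actual credential resolver: the function names are hypothetical and `MAX_DEPTH` stands in for the timeout that ended the real hang.

```python
MAX_DEPTH = 5  # stand-in for the timeout that eventually ends the real hang


def lookup_credentials(env: dict, depth: int, clear_env: bool):
    """Toy credential lookup: a ClaudeCode profile delegates to the helper."""
    if depth > MAX_DEPTH:
        raise RecursionError("credential lookup never terminated")
    if env.get("AWS_PROFILE") == "ClaudeCode":
        return credential_process(env, depth + 1, clear_env)
    return None  # no profile set, so nothing to delegate to


def credential_process(env: dict, depth: int, clear_env: bool) -> dict:
    """Toy helper: building its STS client triggers another credential lookup."""
    child_env = dict(env)  # the helper starts with a copy of the caller's environment
    if clear_env:
        child_env.pop("AWS_PROFILE", None)
    lookup_credentials(child_env, depth, clear_env)
    return {"source": "sts"}


def authenticate(clear_env: bool) -> dict:
    return lookup_credentials({"AWS_PROFILE": "ClaudeCode"}, 0, clear_env)
```

In this model `authenticate(clear_env=False)` raises `RecursionError`, while `authenticate(clear_env=True)` returns after a single helper call.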


Summary

| Scenario | Without Fix | With Fix |
|---|---|---|
| Fresh auth with AWS_PROFILE | ❌ Hangs (recursion) | ✅ Works |
| Exported creds + ClaudeCode profile | ✅ Works | ✅ Works |
| Your workflow (export creds → Claude Code → CLI commands) | ❌ Fresh auth broken | ✅ Everything works |

The fix is required for basic functionality and does not affect your workflow.

@schuettc
Contributor

schuettc commented Dec 1, 2025

32e1825

Cherry picked in the parts that hadn't already been fixed. Everything should be live now. Thanks!

@schuettc schuettc closed this Dec 1, 2025