
AWS Secrets Manager for DevOps Engineers: Secure Secrets Management Explained

Hi there! I’m a DevOps enthusiast, certified in AWS and Terraform, passionate about crafting innovative cloud solutions. From designing scalable CI/CD pipelines to deploying microservices on cloud platforms, I’ve immersed myself in transforming ideas into impactful technologies.

Introduction

It's Monday morning. You or your security team runs a routine scan of your GitHub repositories.

Alert: SECRET DETECTED IN COMMIT

Your heart sinks. Someone committed a .env file containing the production database password three months ago. Bots that scrape GitHub for credentials typically find leaks like this within minutes, so you have to assume the password has been in the wild for 90 days.

Your incident response:

  1. Immediately rotate the database password (15 minutes)

  2. Update password in 12 places:

    • 3 Lambda function environment variables

    • 2 ECS task definitions

    • 4 EC2 instances via SSH

    • CI/CD pipeline variables

    • 2 developer local .env files

  3. Redeploy everything (2 hours)

  4. Brief downtime during rotation (users affected)

  5. Post-mortem: "Never hardcode secrets again"

Total incident time: 4-6 hours.

Business impact: Moderate.

Stress level: Maximum.

This scenario plays out in organizations every day. AWS Secrets Manager exists to prevent it: it provides secure storage, automatic rotation, programmatic access, and a CloudTrail audit trail of who accessed each credential and when.

The Problem: Hardcoded Secrets and Manual Rotation

Challenge 1: Secrets Sprawl

Where secrets typically live:

The Secret Sprawl Problem (illustrative example):

RDS database password exists in:
├── .env file (committed to git 3 months ago)
├── Lambda function environment variables
│   ├── UserServiceFunction
│   ├── OrderServiceFunction
│   └── PaymentServiceFunction
├── ECS task definition (multiple revisions)
├── EC2 user data scripts
├── CI/CD pipeline variables
│   ├── GitHub Actions secrets
│   └── Jenkins credentials
├── Developer local .env files (5 developers)
├── Documentation (wiki, Confluence)
└── Slack messages (someone pasted it for troubleshooting)

Count: Password exists in 15+ places
Rotation complexity: Must update all 15 places
Leak risk: Any one compromised = full database access

The math of manual rotation:

Rotate database password:

Time per location: 5 minutes (find, update, test)
Number of locations: 15
Total rotation time: 75 minutes

Deployment overhead:
• Lambda: Redeploy 3 functions
• ECS: New task definition, rolling update
• EC2: SSH into 4 servers, update config, restart app
• CI/CD: Update pipeline variables

Total downtime risk: High
Human error risk: High (every manual update is a chance to miss a location or mistype)

Challenge 2: No Rotation Discipline

Typical password lifetime:

Without enforced rotation:
├── Database password set: Jan 2022
├── Last changed: Jan 2022
├── Current date: Feb 2026
└── Password age: 4 years

Risk factors:
• Multiple employees had access (some left company)
• Password possibly shared verbally/Slack
• May have been committed to git
• No audit trail of who accessed when

Industry recommendation (best practices):

  • Critical credentials: 30-90 day rotation

  • Compliance (PCI DSS): 90 days maximum

  • Best practice: 30 days

Challenge 3: No Audit Trail

Security audit questions:

Auditor: "Who accessed the production database password in Q4?"
You: "I don't know. It's in a .env file on the servers."

Auditor: "How do you know an unauthorized person didn't access it?"
You: "We don't."

Auditor: "When was it last rotated?"
You: "Not sure. Maybe 2 years ago?"

Result: Compliance violation and mandatory audit findings

Challenge 4: Cross-Environment Chaos

How credentials typically differ:

Development:
DB_HOST=dev-rds.amazonaws.com
DB_PASSWORD=dev123 (weak, acceptable for dev)

Staging:
DB_HOST=staging-rds.amazonaws.com  
DB_PASSWORD=staging_secure_2024 (stronger)

Production:
DB_HOST=prod-rds.amazonaws.com
DB_PASSWORD=Pr0d!Secur3#2024 (strong, but hardcoded)

Problems:
• Same .env template, easy to mix up values
• Accidentally deploying prod creds to dev
• Accidentally deploying dev creds to prod
• No programmatic enforcement of secret strength

What is AWS Secrets Manager?

AWS Secrets Manager is a secrets management service that helps protect access to applications, services, and IT resources. It enables rotation, management, and retrieval of database credentials, API keys, and other secrets throughout their lifecycle.

The Value Proposition

Secrets Manager vs Parameter Store

Since we covered Parameter Store in the Systems Manager article, here is how the two compare:

┌─────────────────────┬──────────────────┬────────────────┐
│ Feature             │ Parameter Store  │ Secrets Manager│
├─────────────────────┼──────────────────┼────────────────┤
│ Use Case            │ Config + Secrets │ Secrets only   │
│ Price               │ Free (standard)  │ $0.40/secret/mo│
│                     │                  │ + $0.05/10K API│
│ Rotation            │ Manual           │ Automatic      │
│ RDS integration     │ No               │ Yes (native)   │
│ Cross-account       │ Complex          │ Built-in       │
│ Multi-region        │ Manual           │ Replication    │
│ Versioning          │ 100 versions     │ ~100 + labeled │
│ Max secret size     │ 4 KB (8 KB adv.) │ 65,536 bytes   │
│ Fine-grained access │ IAM only         │ IAM + Resource │
└─────────────────────┴──────────────────┴────────────────┘

Decision Guide:

  • Application config → Parameter Store (free)

  • Database passwords → Secrets Manager (rotation)

  • API keys (no rotation) → Parameter Store (SecureString)

  • API keys (rotation needed) → Secrets Manager

  • RDS/Redshift credentials → Secrets Manager

  • Third-party SaaS tokens → Secrets Manager

Understanding Secrets Manager Core Concepts

1. Secret Types

JSON secrets (structured):

{
  "username": "admin",
  "password": "Secur3P@ssw0rd!",
  "engine": "postgres",
  "host": "prod-rds.amazonaws.com",
  "port": 5432,
  "dbname": "production"
}

Plaintext secrets (unstructured):

API_KEY=sk-proj-abc123def456ghi789

Best practice: Use JSON for database credentials
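To show how an application tells the two formats apart, here is a minimal sketch built on boto3's `get_secret_value`. It assumes you pass in a Secrets Manager client (for example `boto3.client("secretsmanager", region_name="us-east-1")`); the helper name `read_secret` is hypothetical.

```python
import json
from typing import Any, Union


def read_secret(client: Any, secret_id: str) -> Union[dict, str, bytes]:
    """Return a JSON secret as a dict, a plaintext secret as str, a binary secret as bytes.

    `client` is assumed to be a boto3 Secrets Manager client.
    """
    response = client.get_secret_value(SecretId=secret_id)
    if "SecretString" in response:
        raw = response["SecretString"]
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            return raw  # plaintext secret, e.g. a bare API key
        # json.loads also accepts bare numbers/strings; only objects count as structured
        return parsed if isinstance(parsed, dict) else raw
    # SecretBinary is returned by boto3 as bytes (already base64-decoded)
    return response["SecretBinary"]
```

Treating only JSON objects as structured avoids surprises when a plaintext secret happens to be valid JSON, such as the string `12345`.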

2. Secret Versions and Staging Labels

How versioning works:

Secret: prod/database/credentials

Versions:
├── Version 1 (uuid-abc123)
│   └── Label: AWSPREVIOUS
├── Version 2 (uuid-def456)
│   └── Label: AWSCURRENT (active)
└── Version 3 (uuid-ghi789) [pending rotation]
    └── Label: AWSPENDING

When app calls GetSecretValue():
• Default: Returns AWSCURRENT
• During rotation: AWSCURRENT = old, AWSPENDING = new
• After rotation: AWSPENDING becomes AWSCURRENT

Staging labels:

AWSCURRENT:
└── The version currently in use (default)

AWSPREVIOUS:
└── The previous version (for rollback)

AWSPENDING:
└── New version being created during rotation
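Applications normally ask for AWSCURRENT, but the other labels are addressable too, which helps when rolling back or debugging a rotation. A minimal sketch, assuming an injected boto3 Secrets Manager client; the helper names are hypothetical.

```python
from typing import Any, Dict, List


def get_secret_at_stage(client: Any, secret_id: str, stage: str = "AWSCURRENT") -> Dict[str, Any]:
    """Fetch the version that carries `stage` (AWSCURRENT, AWSPREVIOUS or AWSPENDING)."""
    response = client.get_secret_value(SecretId=secret_id, VersionStage=stage)
    return {
        "version_id": response["VersionId"],
        "stages": response["VersionStages"],
        "value": response.get("SecretString"),
    }


def list_labeled_versions(client: Any, secret_id: str) -> Dict[str, List[str]]:
    """Map each version ID to its staging labels, using DescribeSecret."""
    described = client.describe_secret(SecretId=secret_id)
    return dict(described.get("VersionIdsToStages", {}))
```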

3. Automatic Rotation

Rotation process:

Secrets Manager Rotation Flow:

1. Create New Secret
   ├── Lambda creates new password
   ├── Creates AWSPENDING version
   └── Does not affect current traffic

2. Set New Secret
   ├── Lambda updates database with new password
   ├── Alternating-users strategy: old and new credentials both stay valid
   └── Single-user strategy: old password stops working once the change is applied

3. Test New Secret
   ├── Lambda tests connection with new password
   ├── If fails, rotation aborted
   └── Old password remains AWSCURRENT

4. Finish Rotation
   ├── AWSPENDING becomes AWSCURRENT
   ├── Old AWSCURRENT becomes AWSPREVIOUS
   └── Applications use the new password on their next fetch (or cache refresh)
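The label bookkeeping in steps 1 and 4 can be modelled in a few lines. This is a toy in-memory model for illustration only: the `ToySecret` class is hypothetical and does not call AWS. In real rotation, the Lambda calls the Secrets Manager API for each step and the service handles part of this label movement itself.

```python
import uuid
from typing import Callable, Dict, Set


class ToySecret:
    """Toy in-memory model of staging-label movement during rotation.

    Not the Secrets Manager API: real versions live in the service and a
    rotation Lambda drives each step.
    """

    def __init__(self, initial_value: str) -> None:
        first = str(uuid.uuid4())
        self.values: Dict[str, str] = {first: initial_value}
        self.labels: Dict[str, Set[str]] = {first: {"AWSCURRENT"}}

    def version_with(self, label: str) -> str:
        """Return the version ID that currently carries `label`."""
        return next(v for v, labels in self.labels.items() if label in labels)

    def create_secret(self, new_value: str) -> str:
        """Step 1: store the new value as AWSPENDING; AWSCURRENT is untouched."""
        for labels in self.labels.values():
            labels.discard("AWSPENDING")  # drop a stale label left by a failed attempt
        token = str(uuid.uuid4())
        self.values[token] = new_value
        self.labels[token] = {"AWSPENDING"}
        return token

    def finish_secret(self, token: str) -> None:
        """Step 4: pending version becomes AWSCURRENT, old current becomes AWSPREVIOUS."""
        old_current = self.version_with("AWSCURRENT")
        for labels in self.labels.values():
            labels.discard("AWSPREVIOUS")
        self.labels[old_current] = {"AWSPREVIOUS"}
        self.labels[token] = {"AWSCURRENT"}

    def rotate(self, new_value: str, set_and_test: Callable[[str], bool]) -> bool:
        token = self.create_secret(new_value)
        # Steps 2-3: apply the value to the target system and test it.
        if not set_and_test(new_value):
            return False  # aborted: AWSCURRENT still points at the old version
        self.finish_secret(token)
        return True

    def current(self) -> str:
        return self.values[self.version_with("AWSCURRENT")]
```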

Supported rotation:

Native (AWS-provided Lambda):
• RDS MySQL
• RDS PostgreSQL
• RDS MariaDB
• RDS Oracle
• RDS SQL Server
• Amazon Redshift
• Amazon DocumentDB

Custom (your Lambda):
• Third-party databases
• API keys
• OAuth tokens
• Any credential with API to rotate

4. Resource Policies

Resource-based policy (on secret):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::123456789012:role/ProductionAppRole"
      },
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "*",
      "Condition": {
        "StringEquals": {
          "secretsmanager:VersionStage": "AWSCURRENT"
        }
      }
    },
    {
      "Effect": "Deny",
      "Principal": "*",
      "Action": "secretsmanager:GetSecretValue",
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:PrincipalAccount": "123456789012"
        }
      }
    }
  ]
}
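To attach a policy like this from code, a minimal sketch with boto3's `put_resource_policy` could look like the following. It assumes an injected Secrets Manager client; `build_read_policy` and `attach_policy` are hypothetical helper names, and the account ID and role are the placeholders from the policy above.

```python
import json
from typing import Any


def build_read_policy(app_role_arn: str, account_id: str) -> dict:
    """Build the two-statement policy shown above for one consuming role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": app_role_arn},
                "Action": "secretsmanager:GetSecretValue",
                "Resource": "*",
                "Condition": {"StringEquals": {"secretsmanager:VersionStage": "AWSCURRENT"}},
            },
            {
                "Effect": "Deny",
                "Principal": "*",
                "Action": "secretsmanager:GetSecretValue",
                "Resource": "*",
                "Condition": {"StringNotEquals": {"aws:PrincipalAccount": account_id}},
            },
        ],
    }


def attach_policy(client: Any, secret_id: str, policy: dict) -> None:
    # BlockPublicPolicy=True asks Secrets Manager to reject policies that grant broad access.
    client.put_resource_policy(
        SecretId=secret_id,
        ResourcePolicy=json.dumps(policy),
        BlockPublicPolicy=True,
    )
```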

5. Cross-Account Access

Architecture:

Account A (Security/Shared Services):
└── Secrets Manager secret: prod/database/credentials
    └── Resource policy: Allow Account B

Account B (Application):
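└── IAM role: ProductionAppRole
    └── Policy: Allow secretsmanager:GetSecretValue (and kms:Decrypt on the secret's KMS key)

Application in Account B:
└── Assumes ProductionAppRole
    └── Calls GetSecretValue in Account A, using the secret's ARN (not its name)
        └── Success only if the IAM policy, the secret's resource policy, and the
            KMS key policy all allow it (the secret must use a customer managed
            key; the AWS managed aws/secretsmanager key can't be shared)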
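In code, the two practical details of a cross-account read are that `SecretId` must be the secret's ARN rather than its name, and the client must target the Region that owns the secret. A minimal sketch, assuming ARNs follow the standard `arn:aws:secretsmanager:<region>:<account>:secret:<name>-<suffix>` layout; the helper names and the example suffix are hypothetical.

```python
from typing import NamedTuple


class SecretArn(NamedTuple):
    region: str
    account_id: str
    name_with_suffix: str


def parse_secret_arn(arn: str) -> SecretArn:
    """Split arn:aws:secretsmanager:<region>:<account>:secret:<name>-<suffix>."""
    parts = arn.split(":", 6)
    if len(parts) != 7 or parts[2] != "secretsmanager" or parts[5] != "secret":
        raise ValueError(f"not a Secrets Manager secret ARN: {arn}")
    return SecretArn(region=parts[3], account_id=parts[4], name_with_suffix=parts[6])


def get_cross_account_secret(secret_arn: str) -> str:
    """Read a secret owned by another account; the caller's role must be allowed."""
    import boto3  # imported lazily so parse_secret_arn has no AWS dependency

    arn = parse_secret_arn(secret_arn)
    # The client must target the Region that owns the secret.
    client = boto3.client("secretsmanager", region_name=arn.region)
    return client.get_secret_value(SecretId=secret_arn)["SecretString"]
```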

6. Multi-Region Replication

Replica secrets:

Primary Secret: us-east-1
├── Name: prod/database/credentials
├── Rotation: Enabled (30 days)
└── Replicas:
    ├── us-west-2 (auto-synced)
    └── eu-west-1 (auto-synced)

Benefits:
• Disaster recovery (region failure)
• Lower latency (fetch from nearest region)
• Same secret name in every Region (only the client's Region changes)

Sync:
• Asynchronous (usually fast, but not instantaneous; check ReplicationStatus)
• Rotation in primary → replicas updated
• Replicas read-only
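Replication can fail (for example, when a replica Region's KMS key is unavailable), so it is worth checking replica health rather than assuming it. `DescribeSecret` returns a `ReplicationStatus` list with a `Status` of `InSync`, `InProgress`, or `Failed` per Region. A minimal sketch, assuming an injected boto3 client; the helper name is hypothetical.

```python
from typing import Any, List


def unhealthy_replicas(client: Any, secret_id: str) -> List[str]:
    """Return replica Regions whose ReplicationStatus is anything other than InSync."""
    described = client.describe_secret(SecretId=secret_id)
    return [
        replica["Region"]
        for replica in described.get("ReplicationStatus", [])
        if replica.get("Status") != "InSync"
    ]
```

Running a check like this from a scheduled job and alerting on a non-empty result catches a broken replica before a failover needs it.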

Top 3 Best Practices for DevOps

Best Practice 1: Enable Automatic Rotation for Database Credentials

Why automatic rotation matters:

Manual rotation risks:
├── Forgotten (password never rotated)
├── Breaks application (typo in new password)
├── Downtime (coordination required)
└── Compliance violations (stale credentials)

Automatic rotation benefits:
├── Enforced rotation (30, 60, 90 days)
├── Minimal disruption (with alternating users, old and new credentials overlap)
├── Tested (AWS-provided Lambda functions)
└── Compliant (audit trail in CloudTrail)

Rotation schedule strategy:

Development:
• Rotation: 7 days (aggressive for testing)
• Impact: Low (can tolerate issues)

Staging:
• Rotation: 30 days
• Test rotation process before production

Production:
• Rotation: 30-90 days (compliance requirement)
• PCI DSS: 90 days maximum
• Best practice: 30 days

Shared services (CI/CD, monitoring):
• Rotation: 60-90 days
• Less aggressive (infrastructure dependencies)
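One way to keep these schedules consistent is to encode them once and build the `rotate_secret` request from the environment name. A minimal sketch, assuming an injected boto3 client; `ROTATION_DAYS` mirrors the strategy above (taking the lower end of each range), and the helper names are hypothetical.

```python
from typing import Any, Dict

# Days between rotations per environment, taken from the strategy above
# (lower end of each range); adjust to your own compliance requirements.
ROTATION_DAYS: Dict[str, int] = {
    "dev": 7,
    "staging": 30,
    "prod": 30,
    "shared": 60,
}


def rotation_request(secret_id: str, lambda_arn: str, environment: str,
                     rotate_now: bool = False) -> Dict[str, Any]:
    """Build keyword arguments for the boto3 rotate_secret() call."""
    return {
        "SecretId": secret_id,
        "RotationLambdaARN": lambda_arn,
        "RotationRules": {"AutomaticallyAfterDays": ROTATION_DAYS[environment]},
        # False waits for the first scheduled window instead of rotating now
        "RotateImmediately": rotate_now,
    }


def enable_rotation(client: Any, secret_id: str, lambda_arn: str, environment: str,
                    rotate_now: bool = False) -> None:
    client.rotate_secret(**rotation_request(secret_id, lambda_arn, environment, rotate_now))
```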

Best Practice 2: Use Cross-Account Access for Centralized Secrets

Why centralize secrets:

Problem: Secrets duplicated per account
├── Dev account: dev/database/credentials
├── Staging account: staging/database/credentials  
├── Prod account: prod/database/credentials
└── Security audit: 3 accounts to review

Solution: Centralized secrets in security account
├── Security account: All secrets
├── Dev/Staging/Prod accounts: Reference via cross-account
└── Security audit: Single source of truth

Architecture:

┌─────────────────────────────────────────────────────────┐
│         Cross-Account Secrets Architecture              │
└─────────────────────────────────────────────────────────┘

Security Account (111111111111):
├── prod/database/credentials
│   └── Resource policy: Allow Prod Account
├── staging/database/credentials
│   └── Resource policy: Allow Staging Account
└── dev/database/credentials
    └── Resource policy: Allow Dev Account

Production Account (222222222222):
├── IAM role: ProductionAppRole
│   └── Policy: secretsmanager:GetSecretValue
└── Application
    └── Calls GetSecretValue(prod/database/credentials)
        → Success (cross-account)
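Cross-account reads in this architecture also depend on the KMS key: the shared secrets must be encrypted with a customer managed key whose key policy lets each consuming account decrypt (the AWS managed `aws/secretsmanager` key can't be shared). A minimal sketch of such a key-policy statement, using the `kms:ViaService` condition to limit decryption to calls made through Secrets Manager; the function name is hypothetical, and the consuming account's IAM policy must also grant `kms:Decrypt` on the key.

```python
from typing import Any, Dict


def kms_decrypt_statement(consumer_account_id: str, region: str) -> Dict[str, Any]:
    """Key-policy statement letting another account decrypt secrets via Secrets Manager."""
    return {
        "Sid": f"AllowSecretsManagerDecryptFor{consumer_account_id}",
        "Effect": "Allow",
        # Delegates to the consuming account; its IAM policies decide which roles may use it.
        "Principal": {"AWS": f"arn:aws:iam::{consumer_account_id}:root"},
        "Action": "kms:Decrypt",
        "Resource": "*",
        "Condition": {
            # Only allow decryption when the request comes through Secrets Manager.
            "StringEquals": {"kms:ViaService": f"secretsmanager.{region}.amazonaws.com"}
        },
    }
```

Each consuming account (Dev, Staging, Prod) gets its own statement, which keeps the per-environment split visible in the key policy.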

Benefits:

Centralization:
• Single source of truth
• Easier audit (one account)
• Centralized rotation management
• Consistent secret naming

Security:
• Security account locked down (limited access)
• Principle of least privilege per application
• Easier compliance review

Operations:
• Simpler secret management
• Fewer places to look for secrets
• Reduced duplication

Best Practice 3: Implement Secret Caching to Reduce Costs and Latency

Why caching matters:

Without caching:
• Every request = API call to Secrets Manager
• High-traffic API: 10M requests/month = $50 in API costs ($0.05 per 10,000 calls)
• Latency: a network round trip per request (roughly 50-100ms)

With caching:
• Fetch once per instance/container
• Cache for 1 hour (configurable)
• API calls: 720/month per instance ≈ $0.004
• Latency: near zero for cache hits (in-memory)
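The arithmetic behind these figures, as a small sketch: at $0.05 per 10,000 API calls (the rate implied by the $50 figure), 10M calls cost $50, while one fetch per hour per instance is 720 calls a month. The per-secret monthly fee is separate and not included; the function names are hypothetical.

```python
PRICE_PER_10K_CALLS = 0.05  # USD per 10,000 API calls (check current AWS pricing)


def api_cost(calls_per_month: int) -> float:
    """Monthly API-call cost in USD (excludes the per-secret monthly fee)."""
    return calls_per_month / 10_000 * PRICE_PER_10K_CALLS


def cached_calls_per_month(refresh_interval_s: int, instances: int = 1, days: int = 30) -> int:
    """One fetch per refresh interval per instance; ignores restarts and cold starts."""
    return (days * 24 * 3600 // refresh_interval_s) * instances


print(api_cost(10_000_000))                    # uncached, high-traffic API
print(api_cost(cached_calls_per_month(3600)))  # 1-hour cache, one instance
```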

AWS Secrets Manager Caching Library:

# Python caching implementation (pip install aws-secretsmanager-caching)
import json

import boto3
from aws_secretsmanager_caching import SecretCache, SecretCacheConfig

# Create the cache once per process and reuse it
client = boto3.client('secretsmanager', region_name='us-east-1')
cache_config = SecretCacheConfig(
    max_cache_size=10,                    # Max secrets held in the cache
    secret_refresh_interval=3600,         # Refresh each secret after 1 hour
    default_version_stage='AWSCURRENT'    # Stage returned by get_secret_string()
)
cache = SecretCache(config=cache_config, client=client)

# Usage (cached automatically)
def get_database_password():
    secret = cache.get_secret_string('prod/database/credentials')
    return json.loads(secret)['password']

# First call: fetches from Secrets Manager (API call)
# Subsequent calls within 1 hour: served from the cache (no API call)
password = get_database_password()

Cache TTL strategy:

High-frequency secrets (DB credentials):
• TTL: 1 hour (3600 seconds)
• Balance: Freshness vs cost

Low-frequency secrets (API keys):
• TTL: 6-12 hours
• Infrequent rotation, longer cache acceptable

During rotation window:
• Reduce TTL temporarily (5 minutes)
• Faster pickup of new credentials
• Resume normal TTL after rotation
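If you'd rather not add a dependency, a hand-rolled cache that implements this TTL strategy is short. A minimal sketch (the `TtlSecretCache` class is hypothetical, not part of any AWS library): `ttl_seconds` can be lowered around a rotation window, and `invalidate` forces a refetch. The `fetch` callable is where you'd call `get_secret_value`; it is injected so the cache can be tested without AWS.

```python
import threading
import time
from typing import Callable, Dict, Tuple


class TtlSecretCache:
    """In-memory cache that refetches a secret once its TTL has expired.

    `fetch` is any callable returning the secret string for an ID, e.g.
    lambda sid: client.get_secret_value(SecretId=sid)["SecretString"].
    """

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 3600,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self.ttl_seconds = ttl_seconds  # lower temporarily around rotation windows
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}
        self._lock = threading.Lock()

    def get(self, secret_id: str) -> str:
        with self._lock:
            entry = self._entries.get(secret_id)
            now = self._clock()
            if entry is None or now - entry[1] >= self.ttl_seconds:
                value = self._fetch(secret_id)
                self._entries[secret_id] = (value, now)
                return value
            return entry[0]

    def invalidate(self, secret_id: str) -> None:
        """Force the next get() to refetch, e.g. after an authentication failure."""
        with self._lock:
            self._entries.pop(secret_id, None)
```

For simplicity the fetch happens while holding the lock, so concurrent callers wait on a single refresh instead of all hitting the API at once.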

Top 3 DevOps Use Cases

Use Case 1: RDS Password Rotation Without Downtime

The scenario:

Automatically rotate RDS database password every 30 days with zero application downtime.

Implementation:

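# Step 1: Create secret for existing RDS instance
# (in practice, prefer --secret-string file://creds.json so the password
#  doesn't end up in shell history)
aws secretsmanager create-secret \
  --name prod/rds/mysql \
  --description "Production MySQL credentials" \
  --secret-string '{
    "username": "admin",
    "password": "currentPassword123!",
    "engine": "mysql",
    "host": "prod-mysql.abc123.us-east-1.rds.amazonaws.com",
    "port": 3306,
    "dbname": "production"
  }'

# Step 2: Enable rotation. The ARN is a placeholder for the rotation Lambda
# you deployed from the AWS-provided MySQL single-user template. By default this
# call also starts a first rotation right away (add --no-rotate-immediately to wait).
aws secretsmanager rotate-secret \
  --secret-id prod/rds/mysql \
  --rotation-lambda-arn "arn:aws:lambda:us-east-1:123456789012:function:SecretsManagerRDSMySQLRotationSingleUser" \
  --rotation-rules "AutomaticallyAfterDays=30"

# Step 3: Trigger an on-demand rotation at any time (e.g. to re-test the setup)
aws secretsmanager rotate-secret \
  --secret-id prod/rds/mysql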

Results:

Rotation timeline (illustrative):

Day 0: Secret created, rotation scheduled
Day 30: Automatic rotation triggered
  ├── 14:00:00 - Rotation starts
  ├── 14:00:05 - New password generated (AWSPENDING)
  ├── 14:00:10 - RDS password updated (both old+new valid)
  ├── 14:00:15 - Connection test successful
  ├── 14:00:20 - AWSPENDING → AWSCURRENT
  └── 14:00:20 - Rotation complete

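Application impact (single-user rotation, 1-hour cache):
• Downtime: none at the database level
• Existing connections: stay open (already authenticated)
• New connections using the new password: succeed
• New connections using a cached old password: fail until the cache refreshes (up to 1 hour), unless the app refetches the secret on authentication errors
• Alternating-users rotation avoids this gap, because the previous user's password stays valid until the next rotation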

Day 60: Automatic rotation #2 (repeat)
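With single-user rotation and a 1-hour cache, the practical fix for stale credentials is to refetch once on an authentication error and retry. A minimal sketch; `AuthError`, `connect`, and the cache callables are hypothetical stand-ins for your database driver's access-denied error and your own connection and cache code.

```python
import json
from typing import Any, Callable, Dict


class AuthError(Exception):
    """Hypothetical stand-in for your driver's access-denied error (MySQL error 1045)."""


def connect_with_fresh_credentials(
    get_secret: Callable[[], str],
    invalidate: Callable[[], None],
    connect: Callable[[Dict[str, Any]], Any],
) -> Any:
    """Connect with cached credentials; on an auth failure, refetch once and retry."""
    try:
        return connect(json.loads(get_secret()))
    except AuthError:
        # The cached password is probably stale because a rotation just finished.
        invalidate()
        return connect(json.loads(get_secret()))
```

Retrying only once keeps a genuinely wrong password from turning into a tight loop of failed logins.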

Use Case 2: Third-Party API Key Rotation

The scenario:

Rotate API keys for third-party services (Stripe, SendGrid, Twilio) using a custom rotation Lambda.
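A custom rotation Lambda for an API key follows the same four steps as the database flow, with the provider's API in place of the database. The sketch below assumes a provider that allows two active keys at once; `KeyProvider` and its methods are hypothetical, while `describe_secret`, `put_secret_value`, `get_secret_value`, and `update_secret_version_stage` are real boto3 Secrets Manager calls. It omits checks the AWS-provided templates perform, such as confirming rotation is enabled and the token is valid.

```python
import json
from typing import Any, Dict, List, Optional, Protocol


class KeyProvider(Protocol):
    """Hypothetical third-party API wrapper; real providers' APIs differ."""

    def create_key(self) -> str: ...
    def is_valid(self, key: str) -> bool: ...
    def revoke_key(self, key: str) -> None: ...  # assumed idempotent


def _version_with(stages: Dict[str, List[str]], label: str) -> Optional[str]:
    return next((v for v, labels in stages.items() if label in labels), None)


def handle_rotation(event: Dict[str, str], client: Any, provider: KeyProvider) -> None:
    """Run one rotation step; Secrets Manager invokes the Lambda once per step."""
    secret_id = event["SecretId"]
    token = event["ClientRequestToken"]  # becomes the new version's ID
    step = event["Step"]
    stages = client.describe_secret(SecretId=secret_id)["VersionIdsToStages"]

    if step == "createSecret":
        if "AWSPENDING" in stages.get(token, []):
            return  # retried invocation: the pending key is already stored
        client.put_secret_value(
            SecretId=secret_id,
            ClientRequestToken=token,
            SecretString=json.dumps({"api_key": provider.create_key()}),
            VersionStages=["AWSPENDING"],
        )
    elif step == "setSecret":
        pass  # nothing to do: the provider activated the key in createSecret
    elif step == "testSecret":
        pending = client.get_secret_value(
            SecretId=secret_id, VersionId=token, VersionStage="AWSPENDING")
        if not provider.is_valid(json.loads(pending["SecretString"])["api_key"]):
            raise RuntimeError("pending API key failed validation")
    elif step == "finishSecret":
        current = _version_with(stages, "AWSCURRENT")
        if current == token:
            return  # already finished on an earlier attempt
        # Revoke the key from two rotations ago (still labeled AWSPREVIOUS);
        # the key being demoted now stays valid for callers with a cached copy.
        previous = _version_with(stages, "AWSPREVIOUS")
        if previous is not None:
            old = client.get_secret_value(SecretId=secret_id, VersionId=previous)
            provider.revoke_key(json.loads(old["SecretString"])["api_key"])
        client.update_secret_version_stage(
            SecretId=secret_id,
            VersionStage="AWSCURRENT",
            MoveToVersionId=token,
            RemoveFromVersionId=current,
        )
    else:
        raise ValueError(f"unknown rotation step: {step}")
```

Revoking the key from two rotations ago, instead of the one just demoted, gives cached clients a full rotation period to pick up the new key. A Lambda entry point would call `handle_rotation(event, boto3.client("secretsmanager"), provider)` with your own provider implementation.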

Use Case 3: Multi-Region Disaster Recovery with Secret Replication

The scenario:

Application deployed in multiple regions. Each region needs access to the same secrets. Use replication for low-latency access and disaster recovery.

Implementation:

# Create primary secret in us-east-1
aws secretsmanager create-secret \
  --name prod/database/credentials \
  --secret-string '{"username":"admin","password":"..."}' \
  --region us-east-1

# Enable replication to us-west-2 and eu-west-1
# (KMS key ARNs are placeholders; each replica needs a key in its own Region,
#  or omit KmsKeyId to use that Region's aws/secretsmanager key)
aws secretsmanager replicate-secret-to-regions \
  --secret-id prod/database/credentials \
  --add-replica-regions \
    Region=us-west-2,KmsKeyId=arn:aws:kms:us-west-2:123456789012:key/west-key \
    Region=eu-west-1,KmsKeyId=arn:aws:kms:eu-west-1:123456789012:key/eu-key \
  --region us-east-1

Application code (region-aware):

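import json
import os

import boto3

# AWS_REGION is set automatically in Lambda; elsewhere (EC2, containers), set it
# in the environment or the us-east-1 fallback below is used
region = os.environ.get('AWS_REGION', 'us-east-1')
client = boto3.client('secretsmanager', region_name=region)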

def get_secret(secret_name):
    """Fetch secret from local region (replica)."""
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response['SecretString'])

# us-east-1 app → fetches from us-east-1
# us-west-2 app → fetches from us-west-2 replica
# eu-west-1 app → fetches from eu-west-1 replica

# Benefits:
# • Lower latency (local region)
# • Disaster recovery (region failure)
# • Same secret name everywhere
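Replicas also make a simple read-side failover possible: try the local Region first and fall back to another replica. A minimal sketch, assuming a `make_client(region)` factory such as `lambda r: boto3.client("secretsmanager", region_name=r)`; the function name is hypothetical. Replicas are read-only, so rotation still depends on the primary Region being available.

```python
import json
from typing import Any, Callable, Dict, Iterable


def get_secret_with_failover(
    secret_name: str,
    regions: Iterable[str],
    make_client: Callable[[str], Any],
) -> Dict[str, Any]:
    """Try each Region in order (local first) and return the first successful read."""
    errors: Dict[str, Exception] = {}
    for region in regions:
        try:
            response = make_client(region).get_secret_value(SecretId=secret_name)
            return json.loads(response["SecretString"])
        except Exception as exc:  # narrow to botocore ClientError/EndpointConnectionError in real code
            errors[region] = exc
    raise RuntimeError(f"secret {secret_name!r} unavailable in all regions: {errors}")
```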

Common Pitfalls to Avoid

Pitfall 1: Not Caching Secrets

Problem: Every request = API call = high cost + latency
Solution: Use the AWS caching library or implement a cache with a 1-hour TTL

Pitfall 2: Hardcoding Secret ARNs

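Problem: ARNs embed the Region, account ID, and a random suffix, so hardcoded ARNs break across environments and Regions
Solution: Reference secrets by name within an account (replicas keep the same name in every Region); cross-account access still needs the ARN, so load it from configuration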

Pitfall 3: Fetching Secrets on Every Request

Problem: Calling GetSecretValue inside the request path adds cost and latency to every request
Solution: Fetch once at app startup, hold the value in a singleton, and refresh it periodically

Pitfall 4: Deleting Secrets Immediately

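Problem: Force-deleting (--force-delete-without-recovery) removes the secret permanently, with no way to recover it
Solution: Schedule deletion with a recovery window (7 to 30 days; the default is 30) so a mistake can be undone with restore-secret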
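With boto3, scheduling deletion with a recovery window looks like this. A minimal sketch, assuming an injected Secrets Manager client (helper names are hypothetical); `restore_secret` cancels a scheduled deletion at any point during the window.

```python
from datetime import datetime
from typing import Any


def schedule_secret_deletion(client: Any, secret_id: str, recovery_days: int = 30) -> datetime:
    """Schedule deletion with a recovery window instead of force-deleting."""
    if not 7 <= recovery_days <= 30:
        raise ValueError("RecoveryWindowInDays must be between 7 and 30")
    response = client.delete_secret(SecretId=secret_id, RecoveryWindowInDays=recovery_days)
    return response["DeletionDate"]  # boto3 returns this as a datetime


def undo_deletion(client: Any, secret_id: str) -> None:
    """Cancel a scheduled deletion while the recovery window is still open."""
    client.restore_secret(SecretId=secret_id)
```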

Pitfall 5: Using Secrets Manager for Everything

Problem: Expensive for non-sensitive config
Solution: Use Parameter Store for config, Secrets Manager for credentials

Pitfall 6: No Rotation Testing

Problem: Rotation fails in production
Solution: Test rotation in dev/staging first

Conclusion

AWS Secrets Manager eliminates hardcoded credentials, automates rotation, and records secret access in CloudTrail, turning secrets management from a manual, error-prone process into an automated, auditable system.

Key takeaways:

  • Enable automatic rotation: 30-day rotation for DB credentials

  • Cross-account centralization: The security account owns all secrets

  • Caching: a 1-hour TTL cuts API costs by 99%+ for high-traffic services

  • RDS integration: Native rotation for RDS/Redshift

  • Multi-region replication: DR + low latency

Secrets Manager vs Parameter Store:

Use Parameter Store when:

  • Application config (non-sensitive)

  • No rotation needed

  • Cost-sensitive (free tier)

Use Secrets Manager when:

  • Database credentials (RDS, Redshift)

  • Third-party API keys needing rotation

  • Cross-account sharing required

  • Compliance requires rotation

Questions or secrets management tips? Drop a comment!

Follow for more AWS deep dives from a DevOps perspective.

#AWS #SecretsManager #Security #DevOps #Automation #Compliance #CredentialManagement

Essential AWS Services For DevOps Engineer

Part 15 of 16

In this series, I will share the top 15 essential AWS services that every DevOps engineer should know. I will not only cover what these services are but also how and why they are used in production from a DevOps perspective.
