AWS Systems Manager: Simplify DevOps Without SSH

Introduction

You're a DevOps engineer responsible for 50 EC2 instances. It's Friday at 4 PM. A critical CVE (Common Vulnerabilities and Exposures) entry just dropped: a vulnerability in OpenSSL affecting every Linux server. Your security team wants all instances patched before Monday morning.

The manual approach:

SSH into each of 50 instances (one at a time)
Run sudo apt-get update && sudo apt-get install --only-upgrade -y openssl
Verify the patch applied
Log everything for compliance
Time: 4-6 hours minimum. Risk: You'll miss something.

The Systems Manager approach:

Create a Run Command targeting all 50 instances with Environment=production tag
Execute the patch command fleet-wide
View results in a single dashboard
Command output saved to CloudWatch Logs or S3 (once output logging is configured)
Time: 15 minutes. Zero SSH. Full audit trail.

This is what AWS Systems Manager enables: operational tasks at scale, without the toil of manual SSH, without bastion hosts, and with complete visibility into your entire fleet.

The Problem: Manual Operations at Scale

Challenge 1: SSH at Scale is Unsustainable

The math of manual SSH:

Fleet size: 50 EC2 instances

Per-instance task: 5 minutes
SSH overhead: 2 minutes per instance

Total time: 50 × 7 minutes = 350 minutes = ~6 hours

For weekly maintenance:
• 50 instances × 30 minutes/week = 25 hours/week
• For one engineer: unsustainable
• For a team: expensive and error-prone
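The arithmetic above is easy to rerun for other fleet sizes. This is a minimal sketch; the function name manual_minutes is made up for illustration, and the inputs are the figures quoted in this section.

```python
def manual_minutes(fleet_size: int, task_min: float, overhead_min: float) -> float:
    """Total minutes when each instance is handled one at a time over SSH."""
    return fleet_size * (task_min + overhead_min)


one_off = manual_minutes(50, task_min=5, overhead_min=2)
weekly_hours = 50 * 30 / 60  # 50 instances x 30 minutes of upkeep each

print(f"One-off task: {one_off:.0f} min (~{one_off / 60:.1f} h)")
print(f"Weekly maintenance: {weekly_hours:.0f} h")
```

Doubling the fleet doubles both numbers, which is the core of the scaling problem.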

SSH security risks:

SSH drawbacks:
├── Port 22 open to internet (or VPN)
│   └── Attack surface for brute force
├── SSH key management
│   ├── Multiple key pairs per team member
│   ├── Keys shared (security anti-pattern)
│   ├── Leaked keys = full server access
│   └── No central key revocation
├── No session recording
│   ├── Who ran what command?
│   └── When? What output?
└── Bastion host complexity
    ├── Extra server to maintain
    ├── Single point of failure
    └── Costs money

Challenge 2: Secrets Sprawl

Where configuration and secrets typically live:

Secrets scattered across:
├── .env files (committed to git accidentally)
├── application.properties (plain text)
├── Hardcoded in source code
├── EC2 user data scripts
├── Environment variables (visible in process list)
└── Jenkins/CI pipeline variables

Problems:
• Secrets visible to anyone with server access
• No rotation workflow
• No audit trail (who accessed what?)
• No versioning (can't roll back config)
• Different values per environment (drift)

Real-world consequences:

Developer commits .env file to GitHub
Scrapers find it within minutes
Database compromised
Incident response: 48 hours

Challenge 3: Configuration Drift

What drift looks like:

Monday: Deploy app with config version 1.2
Tuesday: Engineer SSHes in, manually tweaks nginx config
Wednesday: Another engineer SSHes in, changes memory settings
Thursday: Auto Scaling launches new instance (original config)
Friday: 3 instances with config 1.2, 1 with nginx tweak, 1 with memory tweak

Result:
• Inconsistent behavior across fleet
• "Works on some servers but not others"
• Impossible to diagnose issues
• Can't recreate state for debugging
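One way to make drift visible is to fingerprint the same config file on every instance and compare it with the intended baseline. The sketch below assumes the file contents have already been collected somehow (for example with a fleet-wide cat); the instance IDs, config text and function names are hypothetical.

```python
import hashlib
from collections import defaultdict


def fingerprint(config_text: str) -> str:
    """Short, stable hash of a config file's contents."""
    return hashlib.sha256(config_text.encode("utf-8")).hexdigest()[:12]


def find_drift(configs: dict[str, str], baseline: str) -> dict[str, list[str]]:
    """Group instances whose config differs from the baseline, keyed by fingerprint."""
    expected = fingerprint(baseline)
    drifted = defaultdict(list)
    for instance_id, text in sorted(configs.items()):
        digest = fingerprint(text)
        if digest != expected:
            drifted[digest].append(instance_id)
    return dict(drifted)


baseline = "worker_processes 2;\n"
fleet = {
    "i-aaa": baseline,
    "i-bbb": "worker_processes 8;\n",  # manual nginx tweak
    "i-ccc": baseline,
}
print(find_drift(fleet, baseline))
```

Instances that share a fingerprint drifted in the same way, which usually points to a single manual change.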

Challenge 4: Patching Compliance

Manual patch management:

Security audit asks:
"Which servers are running OpenSSL < 3.0.8?"

Manual answer:
• SSH into each server
• Run: openssl version
• Record result in spreadsheet
• Report takes 3 days

Compliance status:
• Unknown until audit
• Patches applied inconsistently
• No enforcement mechanism
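Once version strings are collected centrally, the auditor's question becomes a simple comparison. The sketch below assumes a mapping of instance ID to reported OpenSSL version (the data is invented). It compares only the numeric parts of the string, so letter suffixes such as the "k" in 1.1.1k are ignored and distribution backports are not accounted for.

```python
import re


def version_tuple(version: str) -> tuple[int, ...]:
    """Turn '3.0.7' or '1.1.1k' into a comparable tuple of integers."""
    return tuple(int(part) for part in re.findall(r"\d+", version))


def below_minimum(versions: dict[str, str], minimum: str) -> list[str]:
    """Return the instance IDs whose reported version is older than `minimum`."""
    floor = version_tuple(minimum)
    return sorted(i for i, v in versions.items() if version_tuple(v) < floor)


reported = {"i-aaa": "3.0.7", "i-bbb": "3.0.8", "i-ccc": "1.1.1k", "i-ddd": "3.0.13"}
print(below_minimum(reported, "3.0.8"))
```

Comparing tuples rather than raw strings matters here: as text, "3.0.13" would sort before "3.0.8".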

What is AWS Systems Manager?

AWS Systems Manager (SSM) is an operations hub for AWS infrastructure. It provides a unified interface to view and control your infrastructure, automating operational tasks across AWS resources.

The Value Proposition

SSM Capabilities Overview

AWS Systems Manager
├── Operations Management
│   ├── OpsCenter (incident tracking)
│   ├── Explorer (operations dashboard)
│   └── Incident Manager
├── Application Management
│   ├── Parameter Store (config/secrets)
│   ├── AppConfig (feature flags)
│   └── Application Manager
├── Change Management
│   ├── Automation (runbooks)
│   ├── Change Manager
│   └── Maintenance Windows
├── Node Management
│   ├── Fleet Manager (EC2 management)
│   ├── Session Manager (shell access)
│   ├── Run Command (fleet commands)
│   ├── Patch Manager (OS patching)
│   └── Inventory (system data)
└── Shared Resources
    ├── Documents (runbook definitions)
    └── Parameter Store

SSM Agent

SSM Agent = lightweight software running on managed instances

Supported platforms:
├── Amazon Linux 2 (pre-installed)
├── Amazon Linux 2023 (pre-installed)
├── Ubuntu 16.04+ (pre-installed on AMIs)
├── Windows Server 2008+
├── macOS (managed nodes)
└── On-premises servers (hybrid activation)

Requirements:
├── SSM Agent installed
├── SSM IAM role attached (AmazonSSMManagedInstanceCore)
└── Outbound HTTPS (port 443) to SSM endpoints
    (no inbound ports required!)
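A quick way to confirm the requirements are met is to ask SSM which instances it considers online. The sketch below takes any object with the boto3 SSM client interface (in real use, boto3.client("ssm")) and calls describe_instance_information. FakeSSM and the instance IDs are stand-ins so the example runs without AWS credentials.

```python
def unmanaged_or_offline(ssm, expected_ids):
    """Return the expected instance IDs that SSM does not report as Online."""
    online = set()
    kwargs = {}
    while True:
        page = ssm.describe_instance_information(**kwargs)
        for info in page["InstanceInformationList"]:
            if info.get("PingStatus") == "Online":
                online.add(info["InstanceId"])
        token = page.get("NextToken")
        if not token:
            break
        kwargs = {"NextToken": token}
    return sorted(set(expected_ids) - online)


class FakeSSM:
    """Stand-in client that returns two pages of canned results."""

    def describe_instance_information(self, **kwargs):
        if "NextToken" not in kwargs:
            return {
                "InstanceInformationList": [{"InstanceId": "i-aaa", "PingStatus": "Online"}],
                "NextToken": "page-2",
            }
        return {
            "InstanceInformationList": [{"InstanceId": "i-bbb", "PingStatus": "ConnectionLost"}]
        }


missing = unmanaged_or_offline(FakeSSM(), ["i-aaa", "i-bbb", "i-ccc"])
print(missing)
```

An instance that never shows up in this list is usually missing the agent, the IAM role, or outbound HTTPS.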

Understanding SSM Core Capabilities

1. Session Manager

Session Manager = Secure, browser-based or CLI-based shell access to instances without SSH.

How it works:

Traditional SSH flow:
User → Internet → Bastion Host (port 22) → EC2 Instance (port 22)

Session Manager flow:
User → AWS Console/CLI → SSM Service → SSM Agent on EC2
(No ports open, no keys needed)

Architecture:

┌─────────────────────────────────────────────────────────┐
│              Session Manager Flow                       │
└─────────────────────────────────────────────────────────┘

Engineer                SSM Service              EC2 Instance
   │                        │                        │
   │   StartSession API     │                        │
   │──────────────────────► │                        │
   │                        │   WebSocket tunnel     │
   │   ◄────────────────────│───────────────────────►│
   │                        │  (SSM Agent polls SSM) │
   │   Interactive session  │                        │
   │◄──────────────────────►│◄──────────────────────►│
   │                        │                        │
   │   All commands logged  │                        │
   │                   ┌────▼──────┐                 │
   │                   │CloudWatch │                 │
   │                   │  Logs     │                 │
   │                   └───────────┘                 │
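From the engineer's side, the whole flow starts with a single CLI call. The sketch below only builds and prints that command. Running it for real assumes the AWS CLI and the Session Manager plugin are installed locally, and the instance ID shown is a placeholder.

```python
import shlex
from typing import List, Optional


def start_session_argv(instance_id: str, region: Optional[str] = None) -> List[str]:
    """Build the argv for `aws ssm start-session` against one instance."""
    argv = ["aws", "ssm", "start-session", "--target", instance_id]
    if region:
        argv += ["--region", region]
    return argv


argv = start_session_argv("i-0123456789abcdef0", region="us-east-1")
print(shlex.join(argv))
# To open the session: subprocess.run(argv, check=True)
```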

2. Parameter Store

Parameter Store = Hierarchical secrets and configuration storage.

Parameter types:

String:
• Plain text values
• Use: Non-sensitive config
• Example: /prod/app/log-level → "INFO"

StringList:
• Comma-separated values
• Use: Lists of values
• Example: /prod/app/allowed-ips → "10.0.0.1,10.0.0.2"

SecureString:
• Encrypted with KMS
• Use: Secrets, credentials
• Example: /prod/db/password → "s3cr3t!" (stored encrypted)

Hierarchy:

/
├── /prod/
│   ├── /prod/database/
│   │   ├── /prod/database/host
│   │   ├── /prod/database/port
│   │   ├── /prod/database/username
│   │   └── /prod/database/password (SecureString)
│   ├── /prod/redis/
│   │   └── /prod/redis/url
│   └── /prod/api/
│       └── /prod/api/stripe-secret-key (SecureString)
├── /staging/
│   └── ... (same structure, different values)
└── /dev/
    └── ... (same structure, different values)
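The hierarchy pays off when an application loads its whole environment in one pass. The sketch below calls get_parameters_by_path on a boto3-style SSM client and strips the environment prefix from each name. FakeSSM and the parameter values are invented so the example runs offline.

```python
def load_config(ssm, path):
    """Fetch every parameter under `path` and return {relative/name: value}."""
    prefix = path.rstrip("/") + "/"
    config = {}
    kwargs = {"Path": path, "Recursive": True, "WithDecryption": True}
    while True:
        page = ssm.get_parameters_by_path(**kwargs)
        for param in page["Parameters"]:
            config[param["Name"][len(prefix):]] = param["Value"]
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    return config


class FakeSSM:
    """Stand-in client returning a single page of parameters."""

    def get_parameters_by_path(self, **kwargs):
        return {
            "Parameters": [
                {"Name": "/prod/database/host", "Value": "db.internal", "Type": "String"},
                {"Name": "/prod/database/port", "Value": "5432", "Type": "String"},
            ]
        }


config = load_config(FakeSSM(), "/prod")
print(config)
```

Switching environments then means changing only the path argument, for example to "/staging".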

3. Run Command

Run Command = Execute commands across multiple instances simultaneously.

Document types:

AWS-Managed Documents:
├── AWS-RunShellScript (Linux commands)
├── AWS-RunPowerShellScript (Windows)
├── AWS-InstallApplication (software install)
├── AWS-UpdateSSMAgent (update SSM agent)
├── AWS-GatherSoftwareInventory (collect inventory)
└── AWS-RunPatchBaseline (scan for and install patches)

Custom Documents:
└── Your own YAML/JSON runbooks
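In code, running a document against a tagged fleet is one send_command call. The sketch below wraps it in a helper; run_shell, its default limits and the FakeSSM stand-in are illustrative assumptions, while the keyword arguments are the ones the boto3 SSM client accepts.

```python
def run_shell(ssm, tag_key, tag_value, commands, max_concurrency="25%", max_errors="1"):
    """Send shell commands to every managed instance carrying the given tag."""
    response = ssm.send_command(
        DocumentName="AWS-RunShellScript",
        Targets=[{"Key": f"tag:{tag_key}", "Values": [tag_value]}],
        Parameters={"commands": list(commands)},
        MaxConcurrency=max_concurrency,
        MaxErrors=max_errors,
    )
    return response["Command"]["CommandId"]


class FakeSSM:
    """Stand-in client that records the request instead of calling AWS."""

    def __init__(self):
        self.calls = []

    def send_command(self, **kwargs):
        self.calls.append(kwargs)
        return {"Command": {"CommandId": "cmd-123"}}


fake = FakeSSM()
command_id = run_shell(fake, "Environment", "production", ["uname -a"])
print(command_id, fake.calls[0]["Targets"])
```

The returned command ID is what you later use to look up per-instance results.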

4. Patch Manager

Patch Manager = Automate OS and software patching.

Patch Process:

1. Patch Baseline:
   • Define approved/rejected patches
   • Auto-approve patches after N days
   • Exceptions per CVE

2. Patch Group:
   • Group instances by tag (PatchGroup=prod)
   • Associate baseline with group

3. Maintenance Window:
   • Schedule: cron(0 2 ? * TUE *)  (Tuesdays 2 AM)
   • Max concurrency: 25%
   • Max error: 1%
   • Tasks: Scan + Install

4. Results:
   • Compliance report per instance
   • Missing patches list
   • CloudWatch metrics
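The maintenance window step can be expressed as two request payloads: one that creates the window and one that registers a patching task in it. The sketch below only builds the dictionaries, which would be passed to create_maintenance_window and register_task_with_maintenance_window on a boto3 SSM client. The window name, IDs, duration and cutoff are assumptions, and a real task may also need a service role.

```python
def patch_window_request(name, schedule, duration_hours=3, cutoff_hours=1):
    """Keyword arguments for creating the maintenance window."""
    return {
        "Name": name,
        "Schedule": schedule,
        "Duration": duration_hours,   # window length in hours
        "Cutoff": cutoff_hours,       # stop starting new tasks this many hours before the end
        "AllowUnassociatedTargets": False,
    }


def patch_task_request(window_id, window_target_id, operation="Install"):
    """Keyword arguments for registering a patching task in that window."""
    return {
        "WindowId": window_id,
        "Targets": [{"Key": "WindowTargetIds", "Values": [window_target_id]}],
        "TaskArn": "AWS-RunPatchBaseline",
        "TaskType": "RUN_COMMAND",
        "MaxConcurrency": "25%",
        "MaxErrors": "1%",
        "TaskInvocationParameters": {
            "RunCommand": {"Parameters": {"Operation": [operation]}}
        },
    }


window = patch_window_request("prod-patching", "cron(0 2 ? * TUE *)")
task = patch_task_request("mw-EXAMPLE", "target-EXAMPLE")
print(window["Schedule"], task["MaxConcurrency"], task["MaxErrors"])
```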

5. Inventory

Inventory = Collect metadata from managed instances.

Collected Data:
├── Applications installed
├── AWS components (SSM Agent, CloudWatch Agent versions)
├── Network configuration (IPs, MACs)
├── Windows updates
├── Instance details (OS, CPU, memory)
├── Services running
├── Files (custom queries)
└── Registry (Windows)

Queried via:
• SSM Console
• Resource Data Sync → S3 → Athena
• AWS Config integration

Top 3 Best Practices for DevOps

Best Practice 1: Replace All SSH with Session Manager

Implementation steps:

1. Attach SSM role to EC2 instances:

# CloudFormation: EC2 with SSM role
Resources:
  EC2InstanceRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal: {Service: ec2.amazonaws.com}
            Action: sts:AssumeRole
      ManagedPolicyArns:
        - arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore

  EC2InstanceProfile:
    Type: AWS::IAM::InstanceProfile
    Properties:
      Roles: [!Ref EC2InstanceRole]

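  EC2Instance:
    Type: AWS::EC2::Instance
    Properties:
      # Assumption: latest Amazon Linux 2023 AMI, resolved from a public SSM
      # parameter. Swap in your own AMI ID and instance type as needed.
      ImageId: '{{resolve:ssm:/aws/service/ami-amazon-linux-latest/al2023-ami-kernel-default-x86_64}}'
      InstanceType: t3.micro
      IamInstanceProfile: !Ref EC2InstanceProfile
      SecurityGroupIds:
        - !GetAtt EC2SecurityGroup.GroupId
      # No key pair needed!
      # KeyName: my-key-pair  ← Remove this

  # Security Group: No port 22
  EC2SecurityGroup:
    Type: AWS::EC2::SecurityGroup
    Properties:
      GroupDescription: No SSH needed with SSM
      SecurityGroupIngress:
        # Only allow traffic from the ALB on port 80
        # (ALBSecurityGroup is assumed to be defined elsewhere in this template)
        - IpProtocol: tcp
          FromPort: 80
          ToPort: 80
          SourceSecurityGroupId: !Ref ALBSecurityGroup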

2. Remove bastion host:

Before (Costly, Complex):
Internet (SSH) → Bastion Host → Private EC2

After (Free, Simple):
Engineer → AWS Console / CLI → Session Manager → Private EC2

3. Session logging for compliance:

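# Create S3 bucket for session logs
# (the bucket alone does not enable logging: also select it in the
#  Session Manager preferences and allow the instance role to write to it)
aws s3 mb s3://my-ssm-session-logs

# Bucket policy (deny unencrypted)
aws s3api put-bucket-policy \
  --bucket my-ssm-session-logs \
  --policy '{
    "Version": "2012-10-17",
    "Statement": [{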
      "Effect": "Deny",
      "Principal": "*",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::my-ssm-session-logs/*",
      "Condition": {
        "StringNotEquals": {
          "s3:x-amz-server-side-encryption": "aws:kms"
        }
      }
    }]
  }'

4. Restrict who can start sessions:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "ssm:StartSession",
      "Resource": [
        "arn:aws:ec2:us-east-1:123456789012:instance/*"
      ],
      "Condition": {
        "StringEquals": {
          "ssm:resourceTag/Environment": ["production"],
          "ssm:resourceTag/AllowSSM": ["true"]
        }
      }
    }
  ]
}
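The policy above lists two tag keys under one StringEquals operator. IAM requires every key to match (AND), while several values listed for one key are alternatives (OR). The sketch below is a simplified model of that rule for reasoning about which instances a user could reach. It is not the real IAM policy engine, and the function name is made up.

```python
def tags_satisfy(condition: dict[str, list[str]], instance_tags: dict[str, str]) -> bool:
    """Mimic StringEquals on ssm:resourceTag/*: every key must match one listed value."""
    for key, allowed in condition.items():
        tag_name = key.split("/", 1)[1]  # "ssm:resourceTag/Environment" -> "Environment"
        if instance_tags.get(tag_name) not in allowed:
            return False
    return True


condition = {
    "ssm:resourceTag/Environment": ["production"],
    "ssm:resourceTag/AllowSSM": ["true"],
}

print(tags_satisfy(condition, {"Environment": "production", "AllowSSM": "true"}))
print(tags_satisfy(condition, {"Environment": "production"}))
```

An instance that lacks the AllowSSM tag is therefore unreachable under this policy even if it is tagged as production.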

Audit trail:

Every session records:
• Who connected (IAM user/role)
• When session started/ended
• All commands executed
• All output
• Session stored in S3/CloudWatch

Compliance benefits:
• SOC 2: Access logging
• PCI DSS: System access records
• HIPAA: Audit controls
• ISO 27001: Access management

Best Practice 2: Hierarchical Parameter Store for All Config

Problem with flat secrets management:

Bad: Flat parameters
/db-host-prod
/db-password-prod
/db-host-staging
/db-password-staging

Issues:
• No clear ownership
• Hard to grant environment-specific access
• Hard to retrieve all params for an environment

Solution: Hierarchical namespacing with IAM paths:

Good: Hierarchical structure
/prod/database/host
/prod/database/password
/staging/database/host
/staging/database/password

Benefits:
• IAM policy by path prefix
• GetParametersByPath = all app config in one call
• Clear ownership

IAM policy using paths:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ssm:GetParameter",
        "ssm:GetParameters",
        "ssm:GetParametersByPath"
      ],
      "Resource": "arn:aws:ssm:us-east-1:123456789012:parameter/prod/*"
    },
    {
      "Effect": "Allow",
      "Action": "kms:Decrypt",
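      "Resource": "arn:aws:kms:us-east-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab"
    }
  ]
}

Note: the KMS Resource must be the key ARN built from the key ID (the value above is a placeholder). An alias such as alias/prod-ssm-key cannot stand in for the key in a policy's Resource element.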

Rotate secrets:

# Update a parameter (rotation)
aws ssm put-parameter \
  --name "/prod/database/password" \
  --value "new-rotated-password-here" \
  --type SecureString \
  --key-id "alias/prod-ssm-key" \
  --overwrite

# Parameter versioning is automatic
# Application fetches latest on restart
# Old version preserved (rollback capability)
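Rolling back means reading the previous version from the parameter's history and publishing it again as the newest version. The sketch below does that with get_parameter_history and put_parameter on a boto3-style client. rollback_parameter, FakeSSM and the values are assumptions made for illustration.

```python
def rollback_parameter(ssm, name):
    """Re-publish the previous version's value as a new version; return its number."""
    history = []
    kwargs = {"Name": name, "WithDecryption": True}
    while True:
        page = ssm.get_parameter_history(**kwargs)
        history.extend(page["Parameters"])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    history.sort(key=lambda p: p["Version"])
    if len(history) < 2:
        raise ValueError(f"{name} has no earlier version to roll back to")
    previous = history[-2]
    request = {
        "Name": name,
        "Value": previous["Value"],
        "Type": previous["Type"],
        "Overwrite": True,
    }
    if previous.get("KeyId"):  # keep the same KMS key for SecureString values
        request["KeyId"] = previous["KeyId"]
    return ssm.put_parameter(**request)["Version"]


class FakeSSM:
    """Stand-in client with a two-version history."""

    def __init__(self):
        self.put_calls = []

    def get_parameter_history(self, **kwargs):
        return {
            "Parameters": [
                {"Name": kwargs["Name"], "Version": 1, "Value": "old-password", "Type": "SecureString"},
                {"Name": kwargs["Name"], "Version": 2, "Value": "bad-password", "Type": "SecureString"},
            ]
        }

    def put_parameter(self, **kwargs):
        self.put_calls.append(kwargs)
        return {"Version": 3, "Tier": "Standard"}


fake = FakeSSM()
new_version = rollback_parameter(fake, "/prod/database/password")
print(new_version, fake.put_calls[0]["Value"])
```

Note that a rollback creates a new version rather than deleting the bad one, so the history stays intact for auditing.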

Parameter Store vs Secrets Manager:

Parameter Store (standard tier is free):
✓ Config values + secrets
✓ Free for standard parameters
✓ KMS encryption
✓ IAM-controlled
✗ No built-in rotation
✗ Cross-account sharing only for advanced-tier parameters (via AWS RAM)
Use: App config, API keys, DB passwords

Secrets Manager ($0.40/secret/month):
✓ Auto-rotation built-in
✓ Cross-account access
✓ Native RDS/Redshift rotation
✓ Better for frequently rotated creds
Use: RDS master passwords, API keys needing rotation

Best Practice 3: Automate Patching with Maintenance Windows

Patching strategy:

Dev Environment:
• Patch baseline: All available patches
• Schedule: Daily
• Concurrency: 100% (patch all at once)
• No maintenance window needed

Staging:
• Patch baseline: Security patches (7-day approval delay)
• Schedule: Weekly (Mondays 2 AM)
• Concurrency: 50%
• Test before production

Production:
• Patch baseline: Security (30-day delay) + Critical (7-day)
• Schedule: Monthly (last Tuesday 2 AM)
• Concurrency: 25% (roll through fleet)
• Max error threshold: 1%
• Notification before + after

Top 3 DevOps Use Cases

Use Case 1: Zero-Bastion Secure Access Architecture

The scenario:

Provide developers and ops engineers secure, audited access to EC2 instances without bastion hosts or open SSH ports.

Architecture:

┌─────────────────────────────────────────────────────────┐
│          Zero-Bastion Architecture with SSM             │
└─────────────────────────────────────────────────────────┘

Before:
Engineer → VPN → Bastion (port 22) → Private EC2 (port 22)
                                     ↑ Port 22 open in SG

After:
Engineer → AWS CLI / Console
              ↓
         IAM Auth check
              ↓
         SSM Service
              ↓ (WSS tunnel, no inbound ports)
         SSM Agent on EC2
              ↓
         Shell session
              ↓
         Session logged to CloudWatch

Access tiers:

Tier 1: Read-only access (developers)
├── Can start Session Manager sessions
├── Cannot sudo to root (requires Run As with a restricted OS user; the default ssm-user has sudo rights)
└── Session logged and monitored

Tier 2: Application access (senior devs)
├── Can start sessions
├── Can restart application services
└── Cannot modify system config

Tier 3: Admin access (ops engineers)
├── Full shell access
├── Sudo allowed
└── All sessions logged, reviewed weekly

All tiers:
└── No SSH keys, no bastion hosts

Use Case 2: Fleet-Wide Configuration Management

The scenario:

Roll out configuration changes (install agents, update config files, restart services) across the entire EC2 fleet at once.

Common fleet operations with Run Command:

# 1. Install CloudWatch agent on all production servers
#    (yum assumes Amazon Linux or another RHEL-family OS)
aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --targets "Key=tag:Environment,Values=production" \
  --parameters '{"commands":[
    "sudo yum install -y amazon-cloudwatch-agent",
    "sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c ssm:/prod/cloudwatch-config -s",
    "echo CloudWatch agent installed and configured"
  ]}'

# 2. Rotate application config (after Parameter Store update)
#    Restarts 25% of targets at a time; a non-zero exit marks the invocation Failed
aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --targets "Key=tag:App,Values=api-server" \
  --max-concurrency "25%" \
  --max-errors "1" \
  --parameters '{"commands":[
    "sudo systemctl restart api-server",
    "sleep 10",
    "systemctl is-active --quiet api-server || { echo FAILED; exit 1; }",
    "echo SUCCESS"
  ]}'

# 3. Collect diagnostic info from all servers
aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --targets "Key=tag:Environment,Values=production" \
  --parameters '{"commands":[
    "echo === System Info ===",
    "uname -a",
    "df -h",
    "free -h",
    "echo === App Status ===",
    "systemctl status api-server --no-pager",
    "echo === Recent Errors ===",
    "journalctl -u api-server -n 20 --no-pager"
  ]}' \
  --output-s3-bucket-name ssm-diagnostics-output
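After a fleet-wide command, the next step is usually to find out which instances did not succeed. The sketch below tallies statuses with list_command_invocations on a boto3-style client. The summarize helper, FakeSSM and the IDs are hypothetical.

```python
from collections import Counter

# Statuses that mean "finished fine" or "not finished yet"
NOT_FAILED = ("Success", "Pending", "InProgress", "Delayed")


def summarize(ssm, command_id):
    """Count invocation statuses for one command and list instances that did not succeed."""
    counts, failed = Counter(), []
    kwargs = {"CommandId": command_id}
    while True:
        page = ssm.list_command_invocations(**kwargs)
        for inv in page["CommandInvocations"]:
            counts[inv["Status"]] += 1
            if inv["Status"] not in NOT_FAILED:
                failed.append(inv["InstanceId"])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    return dict(counts), sorted(failed)


class FakeSSM:
    """Stand-in client returning three canned invocations."""

    def list_command_invocations(self, **kwargs):
        return {
            "CommandInvocations": [
                {"InstanceId": "i-aaa", "Status": "Success"},
                {"InstanceId": "i-bbb", "Status": "Failed"},
                {"InstanceId": "i-ccc", "Status": "Success"},
            ]
        }


counts, failed = summarize(FakeSSM(), "cmd-123")
print(counts, failed)
```

The failed list is a natural input for a retry that targets only those instance IDs.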

Use Case 3: Centralized Application Configuration Without Restart

The scenario:

Use Parameter Store (optionally with AWS AppConfig) for dynamic feature flags and configuration that update without an application restart.

Architecture:

┌─────────────────────────────────────────────────────────┐
│         Dynamic Config with Parameter Store             │
└─────────────────────────────────────────────────────────┘

Deploy sequence:
1. Update parameter in SSM:
   /prod/feature-flags/new-checkout → "enabled"

2. Application polls Parameter Store every 60s
   (or triggered by EventBridge)

3. Application reads new value
   → Feature flag enabled without restart

Benefits:
• No deployment for config changes
• Instant rollback (update parameter back)
• Audit trail in CloudTrail
• Per-environment values
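The polling step can be as small as a cached reader that refreshes at most once per interval. The sketch below is one possible shape. CachedFlag is a made-up name, and the fetch callable is where a real application would call get_parameter for /prod/feature-flags/new-checkout. The demo swaps in a dictionary and a fake clock so it runs instantly.

```python
import time


class CachedFlag:
    """Re-read a value at most once per `ttl` seconds; serve the cached copy otherwise."""

    def __init__(self, fetch, ttl=60.0, clock=time.monotonic):
        self._fetch, self._ttl, self._clock = fetch, ttl, clock
        self._value, self._expires = None, float("-inf")

    def get(self):
        now = self._clock()
        if now >= self._expires:
            self._value = self._fetch()
            self._expires = now + self._ttl
        return self._value


store = {"value": "disabled"}   # stands in for Parameter Store
now = [0.0]                     # fake clock so the demo needs no sleeping
flag = CachedFlag(fetch=lambda: store["value"], ttl=60, clock=lambda: now[0])

first = flag.get()              # first read always fetches
store["value"] = "enabled"      # operator flips the flag
cached = flag.get()             # still inside the 60 s window
now[0] = 61.0
refreshed = flag.get()          # window elapsed, value is re-read
print(first, cached, refreshed)
```

The cache also keeps the request rate predictable: each process makes one read per interval no matter how often the flag is checked.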

Common Pitfalls to Avoid

Pitfall 1: Forgetting SSM Agent + IAM Role

Problem: SSM doesn't work without both
Solution: Always attach AmazonSSMManagedInstanceCore policy and verify the agent is running

Pitfall 2: Flat Parameter Naming

Problem: Can't grant environment-specific access
Solution: Use hierarchical paths (/env/service/param)

Pitfall 3: Not Setting Session Idle Timeout

Problem: Long-running abandoned sessions
Solution: Set an idle timeout in the Session Manager preferences (20 minutes is the default; consider a shorter value for sensitive environments)

Pitfall 4: Using String for Secrets

Problem: Passwords visible in plain text
Solution: Always use SecureString with KMS for secrets

Pitfall 5: 100% Concurrency for Patching

Problem: All instances patching simultaneously = downtime
Solution: Use 25% concurrency with 1% error threshold

Pitfall 6: No Parameter Versioning Strategy

Problem: Can't rollback to previous config
Solution: Rely on Parameter Store's automatic versioning (up to 100 versions kept per parameter) and document how to restore an earlier version

Conclusion

AWS Systems Manager transforms how DevOps teams operate EC2 infrastructure—eliminating bastion hosts, centralizing secrets management, and automating patching across fleets of any size.

Key takeaways:

Session Manager: Replace SSH and bastion hosts entirely
Parameter Store: Centralize all config and secrets
Patch Manager: Automate OS patching with compliance reporting
Run Command: Fleet-wide operations without SSH
Inventory: Full visibility into your fleet state

SSM vs alternatives:

Task	Traditional	SSM	Notes
Shell access	SSH + Bastion	Session Manager	SSM free, more secure
Config/secrets	.env files	Parameter Store	Free tier generous
OS patching	Manual SSH	Patch Manager	Free for EC2
Fleet commands	Ansible	Run Command	Free, no infra needed

Questions or SSM tips? Drop a comment!

#AWS #SystemsManager #SSM #DevOps #Security #Automation #SecretManagement
