Amazon EBS (Elastic Block Store)

Amazon EBS is AWS’s block storage for EC2: fast, durable, and flexible. It is also where surprise costs creep in, from over-provisioned IOPS/throughput and forgotten snapshots. This page blends Grok’s highlights with a pragmatic, FinOps-oriented playbook: what you’re using, what you’re paying for, what to change, and which native AWS tools help you do it quickly.


🚀 What is EBS?

Amazon Elastic Block Store (EBS) provides persistent block volumes for EC2 with single-digit-ms latency, online resize, and snapshots for backup/DR. Volumes are zonal (attach within the same AZ) and behave like raw block devices for filesystems, databases, and applications.

Key features

  • Provisioned performance: choose volume types, IOPS, and throughput to match workloads.

  • Snapshots: incremental, point-in-time backups; copy/share across accounts/Regions.

  • Encryption: integrate with AWS KMS; enable encryption by default per account/Region.

  • Elastic Volumes: modify size, type, and (where supported) IOPS/throughput online.

  • Broad regional availability; designed for high durability and availability.


⚙️ Volume Types — pick the right drive

| Type | Primary use case | Notes |
|---|---|---|
| gp3 (General Purpose SSD) | Boot volumes, app servers, most databases/dev/test | Cost-effective default; baseline 3,000 IOPS / 125 MiB/s; independently scale IOPS/throughput as needed. Great gp2→gp3 upgrade path. |
| io2 Block Express (PIOPS SSD) | Mission-critical, latency-sensitive DBs | Highest single-volume ceilings (e.g., up to 256k IOPS / 4,000 MiB/s); supports Multi-Attach; Nitro-based instances recommended. |
| io2 (PIOPS SSD) | High IOPS OLTP, consistent latency | Up to 64k IOPS / 1,000 MiB/s; durable PIOPS with Multi-Attach. |
| st1 (Throughput-optimized HDD) | Large, sequential I/O (ETL, big logs) | Up to ~500 MiB/s throughput; not for random I/O or boot volumes. |
| sc1 (Cold HDD) | Cold, infrequently accessed sequential data | Lowest $/GiB HDD; not for boot; suitable for cold scans and large archives. |

Quick rule: Random/low-latency → SSD (gp3/io2); Large sequential → HDD (st1/sc1). Start with gp3 unless you have measured needs for PIOPS.
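The quick rule can be written down as a small decision function. This is a minimal sketch, not an official selection algorithm: the function name, its parameters, and the `gp3_iops_ceiling` default are assumptions made for illustration, and the ceiling in particular should be checked against current limits for your Region.

```python
def suggest_volume_type(random_io, peak_iops=0, boot=False, cold=False,
                        needs_multi_attach=False, gp3_iops_ceiling=16_000):
    """Suggest a starting EBS volume type from a coarse workload profile.

    gp3_iops_ceiling is an assumed per-volume limit; confirm the current
    value for your Region before relying on it.
    """
    if needs_multi_attach:
        return "io2"  # Multi-Attach is limited to io1/io2
    if boot or random_io:
        # SSD path: stay on gp3 unless measured IOPS exceed what it offers
        return "io2" if peak_iops > gp3_iops_ceiling else "gp3"
    # Large sequential I/O: HDD types, with colder data on sc1
    return "sc1" if cold else "st1"


print(suggest_volume_type(random_io=True, peak_iops=4_000))  # gp3
print(suggest_volume_type(random_io=False, cold=True))       # sc1
```

Treat the result as a starting point; measured IOPS, throughput, and latency should drive the final choice.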


🧬 Performance & advanced features

| Enhancement | What it does | Where to use it | Notes |
|---|---|---|---|
| Elastic Volumes | Online resize/type/perf changes | Routine rightsizing | No detach required for supported changes; common for gp2→gp3 or tuning gp3 IOPS/throughput. |
| Multi-Attach | One volume attached to multiple instances | Clustered apps (e.g., RAC, quorum disks) | io1/io2 only, same AZ, up to 16 Nitro instances; use a cluster-aware filesystem. |
| Fast Snapshot Restore (FSR) | Instant full-speed restores from snapshots | Fast cutovers, DR drills, fleet rollouts | Billed while enabled per snapshot/AZ; enable only where RTO demands. |
| Snapshot Archive tier | Lower-cost storage for old snapshots | Long-term retention | Cheapest snapshot tier; retrieval takes longer and has a minimum duration. |
| Recycle Bin & Snapshot Lock | Guardrails against deletion & tamper | Compliance, ransomware resilience | Recycle Bin = time-based recovery; Snapshot Lock = governance/WORM. |


🏛️ Attachment & deployment patterns

| Pattern | When to use | Notes |
|---|---|---|
| Single-Attach | Most EC2 workloads | Default mode; pair with EBS-optimized instances for consistent bandwidth/latency. |
| Multi-Attach | Shared-disk cluster designs | io1/io2 only; same-AZ constraint; application must coordinate writes. |
| RAID0 (striping) | Very high throughput with SSDs | Prefer io2 Block Express when one big volume suffices; stripe only when needed and documented. |

EBS is zonal—use snapshots (and/or AWS Backup) for cross-AZ/Region copy and recovery patterns.


🧠 EBS optimization strategy (FinOps + reliability)

| Strategy | Actions | Tools/Notes |
|---|---|---|
| Migrate gp2 → gp3 | Convert in place with Elastic Volumes; keep size, raise perf only as needed | Typical 15–20% $/GiB savings vs gp2; independent IOPS/throughput knobs. |
| Right-size IOPS/throughput | Match to observed p95/p99 + headroom (not worst day ever) | CloudWatch (IOPS, queue depth, throughput %) + AWS Compute Optimizer volume recs. |
| Tune by access pattern | Sequential → st1/sc1; random/latency → gp3/io2 | Don’t boot from HDD; beware credit/burst models on HDD. |
| Use EBS-optimized instances | Ensure EC2’s EBS bandwidth isn’t the bottleneck | Many Nitro types include it; check docs before chasing volume limits. |
| Snapshot hygiene | Set lifecycle policies; archive or delete stale snaps | DLM or AWS Backup; tag and expire snapshots on schedule. |
| FSR only where needed | Enable for migrations/drills; disable after | Minimizes ongoing FSR metering. |
| Find & delete orphans | Remove unattached volumes and stale snapshots | Tag rigorously; use Resource Explorer/Config or scripts to list detached volumes. |
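To make "match to observed p95 + headroom" concrete, here is a minimal sketch that turns a series of observed IOPS samples into a provisioning target. The helper names, the nearest-rank percentile method, and the 20% default headroom are illustrative assumptions; the 3,000 IOPS floor is the gp3 baseline mentioned above.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def recommend_iops(iops_samples, headroom_pct=20, floor=3_000):
    """Provisioning target: p95 of observed IOPS plus headroom.

    floor defaults to the gp3 baseline, since provisioning below it
    saves nothing on gp3.
    """
    p95 = percentile(iops_samples, 95)
    target = math.ceil(p95 * (100 + headroom_pct) / 100)
    return max(floor, target)


# Hypothetical samples: 100 readings from 100 to 10,000 IOPS
samples = list(range(100, 10_001, 100))
print(recommend_iops(samples))  # 11400
```

Sizing to p95 rather than the maximum deliberately ignores rare spikes; use p99 or more headroom where latency during spikes matters.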

Common bill-busters

  • Paying for PIOPS you don’t use (io1/io2) or extra gp3 IOPS/throughput with low utilization

  • Forgotten snapshots (and copied snapshots across Regions)

  • FSR left enabled after cutovers

  • Detached volumes with no owner tags
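Detached, untagged volumes are easy to surface with a script. The sketch below filters records shaped like the `Volumes` entries returned by the EC2 `describe_volumes` API (as exposed by boto3); the `owner` tag key and the function name are assumptions for illustration. Fetching the records is left as a comment so the filtering logic can be run without AWS credentials.

```python
def find_orphan_volumes(volumes, owner_tag="owner"):
    """Return unattached volumes, flagging those with no owner tag.

    `volumes` is a list of dicts in the shape of describe_volumes output.
    A volume in the "available" state is not attached to any instance.
    """
    orphans = []
    for vol in volumes:
        if vol.get("State") != "available":
            continue
        tag_keys = {t["Key"] for t in vol.get("Tags", [])}
        orphans.append({
            "VolumeId": vol["VolumeId"],
            "SizeGiB": vol.get("Size", 0),
            "HasOwner": owner_tag in tag_keys,
        })
    return orphans


# With boto3 installed, records could be fetched roughly like this:
#   ec2 = boto3.client("ec2")
#   pages = ec2.get_paginator("describe_volumes").paginate()
#   volumes = [v for page in pages for v in page["Volumes"]]
sample = [
    {"VolumeId": "vol-aaa", "State": "in-use", "Size": 100},
    {"VolumeId": "vol-bbb", "State": "available", "Size": 500},
    {"VolumeId": "vol-ccc", "State": "available", "Size": 50,
     "Tags": [{"Key": "owner", "Value": "data-team"}]},
]
for orphan in find_orphan_volumes(sample):
    print(orphan)
```

Volumes with `HasOwner` set to false are the ones most likely to linger, since nobody is accountable for deleting them.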


💸 Pricing model & gotchas

  • Volumes: billed per provisioned GiB-month.

    • gp3 includes baseline performance; extra provisioned IOPS and throughput are billed separately.

    • io1/io2 charge for provisioned IOPS (in addition to GiB).

  • Snapshots: incremental storage billed per GiB-month; Archive tier is cheaper with longer retrieval and a minimum duration.

  • FSR: metered per snapshot/AZ while enabled.

  • No Savings Plans/RIs for EBS: storage isn’t covered by compute Savings Plans; cost control = rightsizing + lifecycle.

Avoid embedding hard regional prices in docs; maintain a link to your Region’s pricing page and model with your real metrics.
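In keeping with that advice, a cost model can take prices as inputs rather than hard-coding them. This is a minimal sketch of the gp3 billing structure described above (per-GiB charge plus charges for IOPS and throughput above the included baseline). The function name is hypothetical, and the prices in the usage lines are placeholders, not quotes for any Region.

```python
GP3_BASELINE_IOPS = 3_000  # included with every gp3 volume
GP3_BASELINE_MIBS = 125    # included throughput in MiB/s


def gp3_monthly_cost(size_gib, iops, throughput_mibs,
                     price_per_gib, price_per_extra_iops,
                     price_per_extra_mibs):
    """Estimate monthly gp3 cost; only performance above baseline is billed."""
    extra_iops = max(0, iops - GP3_BASELINE_IOPS)
    extra_mibs = max(0, throughput_mibs - GP3_BASELINE_MIBS)
    return (size_gib * price_per_gib
            + extra_iops * price_per_extra_iops
            + extra_mibs * price_per_extra_mibs)


# Placeholder prices for illustration only; look up your Region's rates.
cost = gp3_monthly_cost(size_gib=100, iops=4_000, throughput_mibs=225,
                        price_per_gib=0.08, price_per_extra_iops=0.005,
                        price_per_extra_mibs=0.04)
print(f"{cost:.2f}")  # 17.00
```

Running the same function with baseline IOPS and throughput shows how much of a volume's bill comes from performance add-ons rather than capacity.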


⏱️ Automation patterns

  • Amazon Data Lifecycle Manager (DLM): automate snapshot/AMI create-retain-delete (incl. cross-Region/account copies).

  • AWS Backup: centralized policies, vault lock, cross-account protections, and compliance reporting.

  • EventBridge + Lambda: alert/auto-remediate when snapshots are public, volumes are unencrypted, or FSR is left on.
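DLM and AWS Backup handle retention natively, so the following is only a sketch of the retain-or-delete decision a custom Lambda might make. It works on records shaped like `describe_snapshots` output (`SnapshotId`, `StartTime`, `Tags`); the `retain` tag key and the function name are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def snapshots_to_expire(snapshots, retention_days, now=None, keep_tag="retain"):
    """Return IDs of snapshots older than the retention window.

    Snapshots carrying the keep_tag key are always skipped.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=retention_days)
    expired = []
    for snap in snapshots:
        tag_keys = {t["Key"] for t in snap.get("Tags", [])}
        if keep_tag in tag_keys:
            continue
        if snap["StartTime"] < cutoff:
            expired.append(snap["SnapshotId"])
    return expired


fixed_now = datetime(2024, 6, 30, tzinfo=timezone.utc)
sample_snaps = [
    {"SnapshotId": "snap-old", "StartTime": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-new", "StartTime": datetime(2024, 6, 20, tzinfo=timezone.utc)},
    {"SnapshotId": "snap-keep", "StartTime": datetime(2023, 1, 1, tzinfo=timezone.utc),
     "Tags": [{"Key": "retain", "Value": "legal-hold"}]},
]
print(snapshots_to_expire(sample_snaps, retention_days=30, now=fixed_now))
# ['snap-old']
```

Returning IDs instead of deleting directly keeps the decision testable and lets a separate step log, archive, or delete.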


🔒 Security & compliance

  • Turn on EBS encryption by default; scope KMS keys by env/app; rotate and audit key usage.

  • Use Snapshot Lock (governance/compliance) and Backup Vault Lock for immutability.

  • Block public access for snapshots at the account/Region level; share snapshots explicitly and temporarily.

  • Enforce least privilege IAM on ec2:CreateVolume, CreateSnapshot, ModifyVolume, and KMS actions.
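As one example of an IAM guardrail, the sketch below builds a policy document that denies `ec2:CreateVolume` when the request is for an unencrypted volume, using the `ec2:Encrypted` condition key. The statement ID and function name are illustrative, and the policy is a starting point to adapt and test, not a complete least-privilege design.

```python
import json


def deny_unencrypted_volumes_policy():
    """IAM policy document that blocks creation of unencrypted EBS volumes."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyUnencryptedVolumes",  # illustrative statement ID
            "Effect": "Deny",
            "Action": "ec2:CreateVolume",
            "Resource": "*",
            # Deny applies only when the requested volume is not encrypted
            "Condition": {"Bool": {"ec2:Encrypted": "false"}},
        }],
    }


print(json.dumps(deny_unencrypted_volumes_policy(), indent=2))
```

An explicit deny like this complements encryption by default: it still applies if someone switches the account-level setting off.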


📊 Monitoring & tools

  • CloudWatch (EBS): VolumeReadOps/WriteOps, VolumeReadBytes/WriteBytes, VolumeThroughputPercentage, VolumeQueueLength, BurstBalance (gp2, st1, sc1), latency metrics where available.

  • EC2 instance metrics: confirm you’re not saturating the instance’s EBS bandwidth/IOPS caps.

  • AWS Compute Optimizer: per-volume recommendations (type, size, IOPS/throughput).

  • Cost Explorer / CUR: tag volumes/snapshots; watch gp3 add-ons, snapshot growth, archive vs standard tiers.
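The ops and bytes metrics are reported as totals per period, so they need converting before they can be compared with provisioned IOPS and throughput. A minimal sketch, assuming the inputs are `Sum` statistics over a period of `period_seconds`; the function names are illustrative.

```python
def average_iops(read_ops_sum, write_ops_sum, period_seconds):
    """Average IOPS over a period from VolumeReadOps/VolumeWriteOps sums."""
    return (read_ops_sum + write_ops_sum) / period_seconds


def average_throughput_mibs(read_bytes_sum, write_bytes_sum, period_seconds):
    """Average MiB/s over a period from VolumeReadBytes/VolumeWriteBytes sums."""
    return (read_bytes_sum + write_bytes_sum) / period_seconds / (1024 * 1024)


# Hypothetical 5-minute datapoint: 600k reads and 300k writes
print(average_iops(600_000, 300_000, 300))  # 3000.0
```

These are averages across the whole period, so short bursts inside it are smoothed out; use shorter periods when hunting for peaks.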


🧪 Quick selection cheat-sheet

  • Default: gp3 for most workloads; raise IOPS/throughput only with evidence.

  • High IOPS / tight latency: io2 or io2 Block Express (prefer a single large Block Express volume over many striped gp3 volumes when feasible).

  • Big sequential reads/writes: st1; cold sequential: sc1.

  • Shared-disk clusters: io1/io2 with Multi-Attach and a cluster-aware filesystem.


✅ Checklist


  • EBS pricing (volumes, PIOPS, snapshots, archive, FSR)

  • EBS volume types & limits; Multi-Attach docs

  • Elastic Volumes, DLM, AWS Backup, Snapshot Lock, Recycle Bin

  • CloudWatch metrics & EBS-optimized instances

  • Compute Optimizer for EBS; CUR field guide for EBS spend

Features and limits evolve. Validate in your Region before production changes.
