Amazon EBS (Elastic Block Store)
Amazon EBS is AWS’s block storage for EC2—fast, durable, and flexible—but it’s also where surprise costs creep in from over-provisioned IOPS/throughput and forgotten snapshots. This page blends Grok’s highlights with a pragmatic, FinOps-oriented playbook: what you’re using, what you’re paying for, what to change, and which native AWS tools help you do it quickly.
🚀 What is EBS?
Amazon Elastic Block Store (EBS) provides persistent block volumes for EC2 with single-digit-ms latency, online resize, and snapshots for backup/DR. Volumes are zonal (attach within the same AZ) and behave like raw block devices for filesystems, databases, and applications.
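Below is a minimal boto3 sketch of the create-and-attach flow, using placeholder Region, AZ, and instance ID; the constraint it illustrates is that the volume and the instance must live in the same AZ.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder Region

# Volumes are zonal: create the volume in the same AZ as the target instance.
vol = ec2.create_volume(
    AvailabilityZone="us-east-1a",   # must match the instance's AZ
    Size=100,                        # GiB
    VolumeType="gp3",                # 3,000 IOPS / 125 MiB/s baseline included
    Encrypted=True,
    TagSpecifications=[{
        "ResourceType": "volume",
        "Tags": [{"Key": "owner", "Value": "platform-team"}],  # assumed tag convention
    }],
)
ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])

# Attach to an instance in the same AZ; the device name is how the instance sees it.
ec2.attach_volume(
    VolumeId=vol["VolumeId"],
    InstanceId="i-0123456789abcdef0",  # placeholder instance ID
    Device="/dev/sdf",
)
```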
Key features
Provisioned performance: choose volume types, IOPS, and throughput to match workloads.
Snapshots: incremental, point-in-time backups; copy/share across accounts/Regions (see the sketch after this list).
Encryption: integrate with AWS KMS; enable encryption by default per account/Region.
Elastic Volumes: modify size, type, and (where supported) IOPS/throughput online.
Broad regional availability; designed for high durability and availability.
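As an illustration of the snapshot feature above, a hedged boto3 sketch that snapshots a volume and copies it to a second Region; the volume ID, Regions, and tag values are placeholders.

```python
import boto3

source_region, dest_region = "us-east-1", "us-west-2"  # placeholder Regions
ec2_src = boto3.client("ec2", region_name=source_region)
ec2_dst = boto3.client("ec2", region_name=dest_region)

# Incremental, point-in-time snapshot of a volume (placeholder volume ID).
snap = ec2_src.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="pre-change backup",
    TagSpecifications=[{"ResourceType": "snapshot",
                        "Tags": [{"Key": "owner", "Value": "platform-team"}]}],
)
ec2_src.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# A cross-Region copy is issued from the destination Region and names the source.
copy = ec2_dst.copy_snapshot(
    SourceRegion=source_region,
    SourceSnapshotId=snap["SnapshotId"],
    Description="DR copy of pre-change backup",
)
print("Copy started:", copy["SnapshotId"])
```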
⚙️ Volume Types — pick the right drive
| Type | Best for | Notes |
| --- | --- | --- |
| gp3 (General Purpose SSD) | Boot volumes, app servers, most databases, dev/test | Cost-effective default; baseline 3,000 IOPS / 125 MiB/s; independently scale IOPS/throughput as needed. Great gp2→gp3 upgrade path. |
| io2 Block Express (PIOPS SSD) | Mission-critical, latency-sensitive DBs | Highest single-volume ceilings (e.g., up to 256,000 IOPS / 4,000 MiB/s); supports Multi-Attach; Nitro-based instances recommended. |
| io2 (PIOPS SSD) | High-IOPS OLTP, consistent latency | Up to 64,000 IOPS / 1,000 MiB/s; durable PIOPS with Multi-Attach. |
| st1 (Throughput-optimized HDD) | Large, sequential I/O (ETL, big logs) | Up to ~500 MiB/s throughput; not for random I/O or boot volumes. |
| sc1 (Cold HDD) | Cold, infrequently accessed sequential data | Lowest $/GiB HDD; not for boot; suitable for cold scans and large archives. |
Quick rule: Random/low-latency → SSD (gp3/io2); Large sequential → HDD (st1/sc1). Start with gp3 unless you have measured needs for PIOPS.
🧬 Performance & advanced features
| Feature | What it does | When to use | Notes |
| --- | --- | --- | --- |
| Elastic Volumes | Online resize/type/perf changes | Routine rightsizing | No detach required for supported changes; common for gp2→gp3 or tuning gp3 IOPS/throughput (see the sketch after this table). |
| Multi-Attach | One volume attached to multiple instances | Clustered apps (e.g., RAC, quorum disks) | io1/io2 only, same AZ, up to 16 Nitro instances; use a cluster-aware filesystem. |
| Fast Snapshot Restore (FSR) | Instant full-speed restores from snapshots | Fast cutovers, DR drills, fleet rollouts | Billed while enabled, per snapshot per AZ; enable only where RTO demands it. |
| Snapshot Archive tier | Lower-cost storage for old snapshots | Long-term retention | Cheapest snapshot tier; retrieval takes longer and a minimum retention period applies. |
| Recycle Bin & Snapshot Lock | Guardrails against deletion & tampering | Compliance, ransomware resilience | Recycle Bin = time-based recovery; Snapshot Lock = governance/WORM. |
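A minimal boto3 sketch of three of the features above, assuming an existing gp3 volume and a completed snapshot (IDs and AZ are placeholders): an online Elastic Volumes change, a temporary FSR window, and an archive-tier move.

```python
import boto3

ec2 = boto3.client("ec2")
vol_id = "vol-0123456789abcdef0"    # placeholder gp3 volume
snap_id = "snap-0123456789abcdef0"  # placeholder completed snapshot
az = "us-east-1a"                   # placeholder AZ

# Elastic Volumes: grow and retune online; no detach needed for supported changes
# (Throughput is a gp3-only knob, so this assumes the volume is already gp3).
ec2.modify_volume(VolumeId=vol_id, Size=500, Iops=6000, Throughput=500)

# FSR: enable for a cutover or DR drill, then disable to stop the per-snapshot/AZ metering.
ec2.enable_fast_snapshot_restores(AvailabilityZones=[az], SourceSnapshotIds=[snap_id])
# ...perform restores at full performance...
ec2.disable_fast_snapshot_restores(AvailabilityZones=[az], SourceSnapshotIds=[snap_id])

# Archive tier: move an old snapshot to the cheaper tier
# (retrieval is slower and a minimum retention period applies).
ec2.modify_snapshot_tier(SnapshotId=snap_id, StorageTier="archive")
```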
🏛️ Attachment & deployment patterns
| Pattern | Use case | Notes |
| --- | --- | --- |
| Single-Attach | Most EC2 workloads | Default mode; pair with EBS-optimized instances for consistent bandwidth/latency. |
| Multi-Attach | Shared-disk cluster designs | io1/io2 only; same-AZ constraint; the application must coordinate writes (see the sketch below). |
| RAID 0 (striping) | Very high throughput with SSDs | Prefer io2 Block Express when one big volume suffices; stripe only when needed and documented. |
EBS is zonal—use snapshots (and/or AWS Backup) for cross-AZ/Region copy and recovery patterns.
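For the Multi-Attach pattern above, a sketch of creating a shared io2 volume and attaching it to each node; the instance IDs are placeholders, and this assumes Nitro instances in a single AZ with a cluster-aware filesystem on top.

```python
import boto3

ec2 = boto3.client("ec2")

# Multi-Attach: io1/io2 only, Nitro instances, and every attachment in the same AZ.
shared = ec2.create_volume(
    AvailabilityZone="us-east-1a",   # every consumer instance must live here too
    Size=200,
    VolumeType="io2",
    Iops=16000,
    MultiAttachEnabled=True,
)
ec2.get_waiter("volume_available").wait(VolumeIds=[shared["VolumeId"]])

# Attach the same volume to each cluster node (placeholder instance IDs);
# the application or a cluster-aware filesystem must coordinate writes, EBS will not.
for instance_id in ["i-0123456789abcdef0", "i-0fedcba9876543210"]:
    ec2.attach_volume(VolumeId=shared["VolumeId"], InstanceId=instance_id, Device="/dev/sdf")
```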
🧠 EBS optimization strategy (FinOps + reliability)
| Action | How | Tools / notes |
| --- | --- | --- |
| Migrate gp2 → gp3 | Convert in place with Elastic Volumes; keep the size, raise performance only as needed | Typical 15–20% $/GiB savings vs gp2; independent IOPS/throughput knobs (see the sweep sketch after this table). |
| Right-size IOPS/throughput | Match to observed p95/p99 plus headroom (not the worst day ever) | CloudWatch (IOPS, queue depth, throughput %) + AWS Compute Optimizer volume recommendations. |
| Tune by access pattern | Sequential → st1/sc1; random/latency-sensitive → gp3/io2 | Don't boot from HDD; beware credit/burst models on HDD. |
| Use EBS-optimized instances | Ensure EC2's EBS bandwidth isn't the bottleneck | Many Nitro types include it; check the instance docs before chasing volume limits. |
| Snapshot hygiene | Set lifecycle policies; archive or delete stale snapshots | DLM or AWS Backup; tag and expire snapshots on schedule. |
| FSR only where needed | Enable for migrations/drills; disable afterwards | Minimizes ongoing FSR metering. |
| Find & delete orphans | Remove unattached volumes and stale snapshots | Tag rigorously; use Resource Explorer/Config or scripts to list detached volumes. |
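A hedged sketch of the gp2 → gp3 sweep from the table above; it converts every gp2 volume in the current Region in place. Note that Elastic Volumes allows roughly one modification per volume per six hours, so recently modified volumes may need a later retry.

```python
import boto3

ec2 = boto3.client("ec2")

# Find every gp2 volume and convert it in place to gp3 via Elastic Volumes.
# gp3's 3,000 IOPS / 125 MiB/s baseline covers most gp2 volumes; add IOPS or
# throughput afterwards only where CloudWatch shows a real need.
paginator = ec2.get_paginator("describe_volumes")
for page in paginator.paginate(Filters=[{"Name": "volume-type", "Values": ["gp2"]}]):
    for vol in page["Volumes"]:
        print(f"Converting {vol['VolumeId']} ({vol['Size']} GiB) gp2 -> gp3")
        ec2.modify_volume(VolumeId=vol["VolumeId"], VolumeType="gp3")
```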
Common bill-busters
Paying for PIOPS you don’t use (io1/io2) or extra gp3 IOPS/throughput with low utilization
Forgotten snapshots (and copied snapshots across Regions)
FSR left enabled after cutovers
Detached volumes with no owner tags
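To catch the last two bill-busters, a boto3 sketch that lists detached volumes (flagging missing owner tags) and snapshots older than a cutoff; the 90-day cutoff and the owner tag key are assumptions, and anything found should be cross-checked against AMIs, DLM, and AWS Backup before deletion.

```python
import datetime
import boto3

ec2 = boto3.client("ec2")
cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=90)

# Detached ("available") volumes: still billed per GiB-month even though nothing uses them.
for page in ec2.get_paginator("describe_volumes").paginate(
        Filters=[{"Name": "status", "Values": ["available"]}]):
    for vol in page["Volumes"]:
        tags = {t["Key"]: t["Value"] for t in vol.get("Tags", [])}
        print(f"UNATTACHED {vol['VolumeId']} {vol['Size']} GiB owner={tags.get('owner', 'MISSING')}")

# Snapshots you own that are older than the cutoff: candidates for archive or deletion.
for page in ec2.get_paginator("describe_snapshots").paginate(OwnerIds=["self"]):
    for snap in page["Snapshots"]:
        if snap["StartTime"] < cutoff:
            print(f"STALE {snap['SnapshotId']} from {snap['StartTime']:%Y-%m-%d}")
```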
💸 Pricing model & gotchas
Volumes: billed per provisioned GiB-month.
gp3 includes baseline performance; extra provisioned IOPS and throughput are billed separately.
io1/io2 charge for provisioned IOPS (in addition to GiB).
Snapshots: incremental storage billed per GiB-month; Archive tier is cheaper with longer retrieval and a minimum duration.
FSR: metered per snapshot/AZ while enabled.
No Savings Plans/RIs for EBS: storage isn’t covered by compute Savings Plans; cost control = rightsizing + lifecycle.
Avoid embedding hard regional prices in docs; maintain a link to your Region’s pricing page and model with your real metrics.
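To model spend without hard-coding prices, a small worked example; the unit prices below are placeholders only, and the 3,000 IOPS / 125 MiB/s baselines are the gp3 defaults noted earlier.

```python
# Illustrative monthly cost model for a gp3 volume. The unit prices are deliberate
# placeholders, NOT real rates; pull current numbers from your Region's pricing
# page and feed in observed p95/p99 demand.
PRICE_PER_GIB = 0.08          # placeholder $/GiB-month
PRICE_PER_EXTRA_IOPS = 0.005  # placeholder $/provisioned IOPS-month above baseline
PRICE_PER_EXTRA_MIBPS = 0.04  # placeholder $/MiB/s-month above baseline

def gp3_monthly_cost(size_gib: int, provisioned_iops: int, provisioned_mibps: int) -> float:
    extra_iops = max(0, provisioned_iops - 3000)    # 3,000 IOPS baseline included in $/GiB
    extra_mibps = max(0, provisioned_mibps - 125)   # 125 MiB/s baseline included
    return (size_gib * PRICE_PER_GIB
            + extra_iops * PRICE_PER_EXTRA_IOPS
            + extra_mibps * PRICE_PER_EXTRA_MIBPS)

# Example: a 500 GiB volume provisioned at 6,000 IOPS / 500 MiB/s.
print(f"${gp3_monthly_cost(500, 6000, 500):.2f} per month (with placeholder prices)")
```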
⏱️ Automation patterns
Amazon Data Lifecycle Manager (DLM): automate snapshot/AMI create-retain-delete (incl. cross-Region/account copies); see the sketch after this list.
AWS Backup: centralized policies, vault lock, cross-account protections, and compliance reporting.
EventBridge + Lambda: alert/auto-remediate when snapshots are public, volumes are unencrypted, or FSR is left on.
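A sketch of a basic DLM policy via boto3, assuming a placeholder account ID, the default DLM role, and a backup=daily tag convention.

```python
import boto3

dlm = boto3.client("dlm")

# Daily snapshots of every volume tagged backup=daily, keeping the last 7.
# The role ARN is a placeholder; it needs the DLM service permissions.
policy = dlm.create_lifecycle_policy(
    ExecutionRoleArn="arn:aws:iam::123456789012:role/AWSDataLifecycleManagerDefaultRole",
    Description="Daily EBS snapshots, 7-day retention",
    State="ENABLED",
    PolicyDetails={
        "PolicyType": "EBS_SNAPSHOT_MANAGEMENT",
        "ResourceTypes": ["VOLUME"],
        "TargetTags": [{"Key": "backup", "Value": "daily"}],
        "Schedules": [{
            "Name": "daily-0300-utc",
            "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"]},
            "RetainRule": {"Count": 7},
            "CopyTags": True,
        }],
    },
)
print("Policy created:", policy["PolicyId"])
```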
🔒 Security & compliance
Turn on EBS encryption by default; scope KMS keys by env/app; rotate and audit key usage (see the sketch after this list).
Use Snapshot Lock (governance/compliance) and Backup Vault Lock for immutability.
Block public access for snapshots at the account/Region level; share snapshots explicitly and temporarily.
Enforce least-privilege IAM on ec2:CreateVolume, ec2:CreateSnapshot, ec2:ModifyVolume, and KMS actions.
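A sketch of the account/Region-level guardrails from this list (encryption by default and snapshot block public access), using a placeholder Region and KMS alias; run it once per Region, and note the snapshot block public access call requires a reasonably recent boto3.

```python
import boto3

# Account/Region-level guardrails; repeat per Region you operate in.
ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder Region

# New volumes (and volumes restored from snapshots) are encrypted without per-volume flags.
ec2.enable_ebs_encryption_by_default()
# Optionally point default encryption at a specific KMS key instead of aws/ebs.
# ec2.modify_ebs_default_kms_key_id(KmsKeyId="alias/ebs-default")  # placeholder alias

# Block public sharing of snapshots in this Region.
ec2.enable_snapshot_block_public_access(State="block-all-sharing")
```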
📊 Monitoring & tools
CloudWatch (EBS): VolumeReadOps/VolumeWriteOps, VolumeReadBytes/VolumeWriteBytes, VolumeThroughputPercentage, VolumeQueueLength, BurstBalance (gp2), and latency metrics where available (see the sketch after this list).
EC2 instance metrics: confirm you're not saturating the instance's EBS bandwidth/IOPS caps.
AWS Compute Optimizer: per-volume recommendations (type, size, IOPS/throughput).
Cost Explorer / CUR: tag volumes/snapshots; watch gp3 add-ons, snapshot growth, archive vs standard tiers.
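A hedged sketch of pulling CloudWatch data to support the rightsizing guidance above: it estimates p95 IOPS for one volume over 14 days using hourly datapoints (the volume ID is a placeholder); spiky workloads deserve finer periods via GetMetricData.

```python
import datetime
import boto3

cw = boto3.client("cloudwatch")
vol_id = "vol-0123456789abcdef0"   # placeholder volume ID
end = datetime.datetime.now(datetime.timezone.utc)
start = end - datetime.timedelta(days=14)
period = 3600  # hourly averages smooth bursts; use GetMetricData for finer resolution

def sums_by_time(metric):
    resp = cw.get_metric_statistics(
        Namespace="AWS/EBS", MetricName=metric,
        Dimensions=[{"Name": "VolumeId", "Value": vol_id}],
        StartTime=start, EndTime=end, Period=period, Statistics=["Sum"],
    )
    return {dp["Timestamp"]: dp["Sum"] for dp in resp["Datapoints"]}

reads, writes = sums_by_time("VolumeReadOps"), sums_by_time("VolumeWriteOps")
# Ops are counts per period; divide by the period length to get average IOPS.
iops = sorted((reads.get(t, 0) + writes.get(t, 0)) / period for t in set(reads) | set(writes))
if iops:
    p95 = iops[int(0.95 * (len(iops) - 1))]
    print(f"p95 IOPS over 14 days: ~{p95:.0f}; provision for this plus headroom, not the worst spike")
```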
🧪 Quick selection cheat-sheet
Default: gp3 for most workloads; raise IOPS/throughput only with evidence.
High IOPS / tight latency: io2 or io2 Block Express (prefer a single large Block Express volume over many striped gp3 when feasible).
Big sequential reads/writes: st1; cold sequential: sc1.
Shared-disk clusters: io1/io2 with Multi-Attach and a cluster-aware filesystem.
✅ Checklist
References (fill with your org’s canonical links)
EBS pricing (volumes, PIOPS, snapshots, archive, FSR)
EBS volume types & limits; Multi-Attach docs
Elastic Volumes, DLM, AWS Backup, Snapshot Lock, Recycle Bin
CloudWatch metrics & EBS-optimized instances
Compute Optimizer for EBS; CUR field guide for EBS spend
Features and limits evolve. Validate in your Region before production changes.
Last updated