Amazon EBS (Elastic Block Store)
Amazon EBS is AWS’s block storage for EC2—fast, durable, and flexible—but it’s also where surprise costs creep in from over-provisioned IOPS/throughput and forgotten snapshots. This page blends Grok’s highlights with a pragmatic, FinOps-oriented playbook: what you’re using, what you’re paying for, what to change, and which native AWS tools help you do it quickly.
🚀 What is EBS?
Amazon Elastic Block Store (EBS) provides persistent block volumes for EC2 with single-digit-ms latency, online resize, and snapshots for backup/DR. Volumes are zonal (attach within the same AZ) and behave like raw block devices for filesystems, databases, and applications.
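Below is a minimal boto3 sketch of the create-and-attach flow, using placeholder Region, AZ, and instance ID; the constraint it illustrates is that the volume and the instance must live in the same AZ.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder Region

# Volumes are zonal: create the volume in the same AZ as the target instance.
vol = ec2.create_volume(
    AvailabilityZone="us-east-1a",   # must match the instance's AZ
    Size=100,                        # GiB
    VolumeType="gp3",                # 3,000 IOPS / 125 MiB/s baseline included
    Encrypted=True,
    TagSpecifications=[{
        "ResourceType": "volume",
        "Tags": [{"Key": "owner", "Value": "platform-team"}],  # assumed tag convention
    }],
)
ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])

# Attach to an instance in the same AZ; the device name is how the instance sees it.
ec2.attach_volume(
    VolumeId=vol["VolumeId"],
    InstanceId="i-0123456789abcdef0",  # placeholder instance ID
    Device="/dev/sdf",
)
```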
Key features
Provisioned performance: choose volume types, IOPS, and throughput to match workloads.
Snapshots: incremental, point-in-time backups; copy/share across accounts/Regions (see the sketch after this list).
Encryption: integrate with AWS KMS; enable encryption by default per account/Region.
Elastic Volumes: modify size, type, and (where supported) IOPS/throughput online.
Broad regional availability; designed for high durability and availability.
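As an illustration of the snapshot feature above, a hedged boto3 sketch that snapshots a volume and copies it to a second Region; the volume ID, Regions, and tag values are placeholders.

```python
import boto3

source_region, dest_region = "us-east-1", "us-west-2"  # placeholder Regions
ec2_src = boto3.client("ec2", region_name=source_region)
ec2_dst = boto3.client("ec2", region_name=dest_region)

# Incremental, point-in-time snapshot of a volume (placeholder volume ID).
snap = ec2_src.create_snapshot(
    VolumeId="vol-0123456789abcdef0",
    Description="pre-change backup",
    TagSpecifications=[{"ResourceType": "snapshot",
                        "Tags": [{"Key": "owner", "Value": "platform-team"}]}],
)
ec2_src.get_waiter("snapshot_completed").wait(SnapshotIds=[snap["SnapshotId"]])

# A cross-Region copy is issued from the destination Region and names the source.
copy = ec2_dst.copy_snapshot(
    SourceRegion=source_region,
    SourceSnapshotId=snap["SnapshotId"],
    Description="DR copy of pre-change backup",
)
print("Copy started:", copy["SnapshotId"])
```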
⚙️ Volume Types — pick the right drive
| Type | Best for | Notes |
| --- | --- | --- |
| gp3 (General Purpose SSD) | Boot volumes, app servers, most databases, dev/test | Cost-effective default; baseline 3,000 IOPS / 125 MiB/s; independently scale IOPS/throughput as needed. Great gp2→gp3 upgrade path. |
| io2 Block Express (PIOPS SSD) | Mission-critical, latency-sensitive DBs | Highest single-volume ceilings (e.g., up to 256,000 IOPS / 4,000 MiB/s); supports Multi-Attach; Nitro-based instances recommended. |
| io2 (PIOPS SSD) | High-IOPS OLTP, consistent latency | Up to 64,000 IOPS / 1,000 MiB/s; durable PIOPS with Multi-Attach. |
| st1 (Throughput-optimized HDD) | Large, sequential I/O (ETL, big logs) | Up to ~500 MiB/s throughput; not for random I/O or boot volumes. |
| sc1 (Cold HDD) | Cold, infrequently accessed sequential data | Lowest $/GiB HDD; not for boot; suitable for cold scans and large archives. |
Quick rule: Random/low-latency → SSD (gp3/io2); Large sequential → HDD (st1/sc1). Start with gp3 unless you have measured needs for PIOPS.
🧬 Performance & advanced features
| Feature | What it does | When to use | Notes |
| --- | --- | --- | --- |
| Elastic Volumes | Online resize/type/perf changes | Routine rightsizing | No detach required for supported changes; common for gp2→gp3 or tuning gp3 IOPS/throughput (see the sketch after this table). |
| Multi-Attach | One volume attached to multiple instances | Clustered apps (e.g., RAC, quorum disks) | io1/io2 only, same AZ, up to 16 Nitro instances; use a cluster-aware filesystem. |
| Fast Snapshot Restore (FSR) | Instant full-speed restores from snapshots | Fast cutovers, DR drills, fleet rollouts | Billed while enabled, per snapshot per AZ; enable only where RTO demands it. |
| Snapshot Archive tier | Lower-cost storage for old snapshots | Long-term retention | Cheapest snapshot tier; retrieval takes longer and a minimum retention period applies. |
| Recycle Bin & Snapshot Lock | Guardrails against deletion & tampering | Compliance, ransomware resilience | Recycle Bin = time-based recovery; Snapshot Lock = governance/WORM. |
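A minimal boto3 sketch of three of the features above, assuming an existing gp3 volume and a completed snapshot (IDs and AZ are placeholders): an online Elastic Volumes change, a temporary FSR window, and an archive-tier move.

```python
import boto3

ec2 = boto3.client("ec2")
vol_id = "vol-0123456789abcdef0"    # placeholder gp3 volume
snap_id = "snap-0123456789abcdef0"  # placeholder completed snapshot
az = "us-east-1a"                   # placeholder AZ

# Elastic Volumes: grow and retune online; no detach needed for supported changes
# (Throughput is a gp3-only knob, so this assumes the volume is already gp3).
ec2.modify_volume(VolumeId=vol_id, Size=500, Iops=6000, Throughput=500)

# FSR: enable for a cutover or DR drill, then disable to stop the per-snapshot/AZ metering.
ec2.enable_fast_snapshot_restores(AvailabilityZones=[az], SourceSnapshotIds=[snap_id])
# ...perform restores at full performance...
ec2.disable_fast_snapshot_restores(AvailabilityZones=[az], SourceSnapshotIds=[snap_id])

# Archive tier: move an old snapshot to the cheaper tier
# (retrieval is slower and a minimum retention period applies).
ec2.modify_snapshot_tier(SnapshotId=snap_id, StorageTier="archive")
```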
🏛️ Attachment & deployment patterns
| Pattern | Use case | Notes |
| --- | --- | --- |
| Single-Attach | Most EC2 workloads | Default mode; pair with EBS-optimized instances for consistent bandwidth/latency. |
| Multi-Attach | Shared-disk cluster designs | io1/io2 only; same-AZ constraint; the application must coordinate writes (see the sketch below). |
| RAID 0 (striping) | Very high throughput with SSDs | Prefer io2 Block Express when one big volume suffices; stripe only when needed and documented. |
EBS is zonal—use snapshots (and/or AWS Backup) for cross-AZ/Region copy and recovery patterns.
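For the Multi-Attach pattern above, a sketch of creating a shared io2 volume and attaching it to each node; the instance IDs are placeholders, and this assumes Nitro instances in a single AZ with a cluster-aware filesystem on top.

```python
import boto3

ec2 = boto3.client("ec2")

# Multi-Attach: io1/io2 only, Nitro instances, and every attachment in the same AZ.
shared = ec2.create_volume(
    AvailabilityZone="us-east-1a",   # every consumer instance must live here too
    Size=200,
    VolumeType="io2",
    Iops=16000,
    MultiAttachEnabled=True,
)
ec2.get_waiter("volume_available").wait(VolumeIds=[shared["VolumeId"]])

# Attach the same volume to each cluster node (placeholder instance IDs);
# the application or a cluster-aware filesystem must coordinate writes, EBS will not.
for instance_id in ["i-0123456789abcdef0", "i-0fedcba9876543210"]:
    ec2.attach_volume(VolumeId=shared["VolumeId"], InstanceId=instance_id, Device="/dev/sdf")
```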
🧠 EBS optimization strategy (FinOps + reliability)
| Action | How | Tools / notes |
| --- | --- | --- |
| Migrate gp2 → gp3 | Convert in place with Elastic Volumes; keep the size, raise performance only as needed | Typical 15–20% $/GiB savings vs gp2; independent IOPS/throughput knobs (see the sweep sketch after this table). |
| Right-size IOPS/throughput | Match to observed p95/p99 plus headroom (not the worst day ever) | CloudWatch (IOPS, queue depth, throughput %) + AWS Compute Optimizer volume recommendations. |
| Tune by access pattern | Sequential → st1/sc1; random/latency-sensitive → gp3/io2 | Don't boot from HDD; beware credit/burst models on HDD. |
| Use EBS-optimized instances | Ensure EC2's EBS bandwidth isn't the bottleneck | Many Nitro types include it; check the instance docs before chasing volume limits. |
| Snapshot hygiene | Set lifecycle policies; archive or delete stale snapshots | DLM or AWS Backup; tag and expire snapshots on schedule. |
| FSR only where needed | Enable for migrations/drills; disable afterwards | Minimizes ongoing FSR metering. |
| Find & delete orphans | Remove unattached volumes and stale snapshots | Tag rigorously; use Resource Explorer/Config or scripts to list detached volumes. |
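A hedged sketch of the gp2 → gp3 sweep from the table above; it converts every gp2 volume in the current Region in place. Note that Elastic Volumes allows roughly one modification per volume per six hours, so recently modified volumes may need a later retry.

```python
import boto3

ec2 = boto3.client("ec2")

# Find every gp2 volume and convert it in place to gp3 via Elastic Volumes.
# gp3's 3,000 IOPS / 125 MiB/s baseline covers most gp2 volumes; add IOPS or
# throughput afterwards only where CloudWatch shows a real need.
paginator = ec2.get_paginator("describe_volumes")
for page in paginator.paginate(Filters=[{"Name": "volume-type", "Values": ["gp2"]}]):
    for vol in page["Volumes"]:
        print(f"Converting {vol['VolumeId']} ({vol['Size']} GiB) gp2 -> gp3")
        ec2.modify_volume(VolumeId=vol["VolumeId"], VolumeType="gp3")
```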
Common bill-busters
Paying for PIOPS you don’t use (io1/io2) or extra gp3 IOPS/throughput with low utilization
Forgotten snapshots (and copied snapshots across Regions)
FSR left enabled after cutovers
Detached volumes with no owner tags
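To catch the last two bill-busters, a boto3 sketch that lists detached volumes (flagging missing owner tags) and snapshots older than a cutoff; the 90-day cutoff and the owner tag key are assumptions, and anything found should be cross-checked against AMIs, DLM, and AWS Backup before deletion.

```python
import datetime
import boto3

ec2 = boto3.client("ec2")
cutoff = datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(days=90)

# Detached ("available") volumes: still billed per GiB-month even though nothing uses them.
for page in ec2.get_paginator("describe_volumes").paginate(
        Filters=[{"Name": "status", "Values": ["available"]}]):
    for vol in page["Volumes"]:
        tags = {t["Key"]: t["Value"] for t in vol.get("Tags", [])}
        print(f"UNATTACHED {vol['VolumeId']} {vol['Size']} GiB owner={tags.get('owner', 'MISSING')}")

# Snapshots you own that are older than the cutoff: candidates for archive or deletion.
for page in ec2.get_paginator("describe_snapshots").paginate(OwnerIds=["self"]):
    for snap in page["Snapshots"]:
        if snap["StartTime"] < cutoff:
            print(f"STALE {snap['SnapshotId']} from {snap['StartTime']:%Y-%m-%d}")
```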
💸 Pricing model & gotchas
Volumes: billed per provisioned GiB-month.
gp3 includes baseline performance; extra provisioned IOPS and throughput are billed separately.
io1/io2 charge for provisioned IOPS (in addition to GiB).
Snapshots: incremental storage billed per GiB-month; Archive tier is cheaper with longer retrieval and a minimum duration.
FSR: metered per snapshot/AZ while enabled.
No Savings Plans/RIs for EBS: storage isn’t covered by compute Savings Plans; cost control = rightsizing + lifecycle.
Avoid embedding hard regional prices in docs; maintain a link to your Region’s pricing page and model with your real metrics.
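To model spend without hard-coding prices, a small worked example; the unit prices below are placeholders only, and the 3,000 IOPS / 125 MiB/s baselines are the gp3 defaults noted earlier.

```python
# Illustrative monthly cost model for a gp3 volume. The unit prices are deliberate
# placeholders, NOT real rates; pull current numbers from your Region's pricing
# page and feed in observed p95/p99 demand.
PRICE_PER_GIB = 0.08          # placeholder $/GiB-month
PRICE_PER_EXTRA_IOPS = 0.005  # placeholder $/provisioned IOPS-month above baseline
PRICE_PER_EXTRA_MIBPS = 0.04  # placeholder $/MiB/s-month above baseline

def gp3_monthly_cost(size_gib: int, provisioned_iops: int, provisioned_mibps: int) -> float:
    extra_iops = max(0, provisioned_iops - 3000)    # 3,000 IOPS baseline included in $/GiB
    extra_mibps = max(0, provisioned_mibps - 125)   # 125 MiB/s baseline included
    return (size_gib * PRICE_PER_GIB
            + extra_iops * PRICE_PER_EXTRA_IOPS
            + extra_mibps * PRICE_PER_EXTRA_MIBPS)

# Example: a 500 GiB volume provisioned at 6,000 IOPS / 500 MiB/s.
print(f"${gp3_monthly_cost(500, 6000, 500):.2f} per month (with placeholder prices)")
```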
⏱️ Automation patterns
Amazon Data Lifecycle Manager (DLM): automate snapshot/AMI create-retain-delete (incl. cross-Region/account copies); see the sketch after this list.
AWS Backup: centralized policies, vault lock, cross-account protections, and compliance reporting.
EventBridge + Lambda: alert/auto-remediate when snapshots are public, volumes are unencrypted, or FSR is left on.
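A sketch of a basic DLM policy via boto3, assuming a placeholder account ID, the default DLM role, and a backup=daily tag convention.

```python
import boto3

dlm = boto3.client("dlm")

# Daily snapshots of every volume tagged backup=daily, keeping the last 7.
# The role ARN is a placeholder; it needs the DLM service permissions.
policy = dlm.create_lifecycle_policy(
    ExecutionRoleArn="arn:aws:iam::123456789012:role/AWSDataLifecycleManagerDefaultRole",
    Description="Daily EBS snapshots, 7-day retention",
    State="ENABLED",
    PolicyDetails={
        "PolicyType": "EBS_SNAPSHOT_MANAGEMENT",
        "ResourceTypes": ["VOLUME"],
        "TargetTags": [{"Key": "backup", "Value": "daily"}],
        "Schedules": [{
            "Name": "daily-0300-utc",
            "CreateRule": {"Interval": 24, "IntervalUnit": "HOURS", "Times": ["03:00"]},
            "RetainRule": {"Count": 7},
            "CopyTags": True,
        }],
    },
)
print("Policy created:", policy["PolicyId"])
```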
🔒 Security & compliance
Turn on EBS encryption by default; scope KMS keys by env/app; rotate and audit key usage (see the sketch after this list).
Use Snapshot Lock (governance/compliance) and Backup Vault Lock for immutability.
Block public access for snapshots at the account/Region level; share snapshots explicitly and temporarily.
Enforce least-privilege IAM on ec2:CreateVolume, ec2:CreateSnapshot, ec2:ModifyVolume, and KMS actions.
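A sketch of the account/Region-level guardrails from this list (encryption by default and snapshot block public access), using a placeholder Region and KMS alias; run it once per Region, and note the snapshot block public access call requires a reasonably recent boto3.

```python
import boto3

# Account/Region-level guardrails; repeat per Region you operate in.
ec2 = boto3.client("ec2", region_name="us-east-1")  # placeholder Region

# New volumes (and volumes restored from snapshots) are encrypted without per-volume flags.
ec2.enable_ebs_encryption_by_default()
# Optionally point default encryption at a specific KMS key instead of aws/ebs.
# ec2.modify_ebs_default_kms_key_id(KmsKeyId="alias/ebs-default")  # placeholder alias

# Block public sharing of snapshots in this Region.
ec2.enable_snapshot_block_public_access(State="block-all-sharing")
```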
📊 Monitoring & tools
CloudWatch (EBS): VolumeReadOps/VolumeWriteOps, VolumeReadBytes/VolumeWriteBytes, VolumeThroughputPercentage, VolumeQueueLength, BurstBalance (gp2), and latency metrics where available (see the sketch after this list).
EC2 instance metrics: confirm you're not saturating the instance's EBS bandwidth/IOPS caps.
AWS Compute Optimizer: per-volume recommendations (type, size, IOPS/throughput).
Cost Explorer / CUR: tag volumes/snapshots; watch gp3 add-ons, snapshot growth, archive vs standard tiers.
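A hedged sketch of pulling CloudWatch data to support the rightsizing guidance above: it estimates p95 IOPS for one volume over 14 days using hourly datapoints (the volume ID is a placeholder); spiky workloads deserve finer periods via GetMetricData.

```python
import datetime
import boto3

cw = boto3.client("cloudwatch")
vol_id = "vol-0123456789abcdef0"   # placeholder volume ID
end = datetime.datetime.now(datetime.timezone.utc)
start = end - datetime.timedelta(days=14)
period = 3600  # hourly averages smooth bursts; use GetMetricData for finer resolution

def sums_by_time(metric):
    resp = cw.get_metric_statistics(
        Namespace="AWS/EBS", MetricName=metric,
        Dimensions=[{"Name": "VolumeId", "Value": vol_id}],
        StartTime=start, EndTime=end, Period=period, Statistics=["Sum"],
    )
    return {dp["Timestamp"]: dp["Sum"] for dp in resp["Datapoints"]}

reads, writes = sums_by_time("VolumeReadOps"), sums_by_time("VolumeWriteOps")
# Ops are counts per period; divide by the period length to get average IOPS.
iops = sorted((reads.get(t, 0) + writes.get(t, 0)) / period for t in set(reads) | set(writes))
if iops:
    p95 = iops[int(0.95 * (len(iops) - 1))]
    print(f"p95 IOPS over 14 days: ~{p95:.0f}; provision for this plus headroom, not the worst spike")
```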
🧪 Quick selection cheat-sheet
Default: gp3 for most workloads; raise IOPS/throughput only with evidence.
High IOPS / tight latency: io2 or io2 Block Express (prefer a single large Block Express volume over many striped gp3 when feasible).
Big sequential reads/writes: st1; cold sequential: sc1.
Shared-disk clusters: io1/io2 with Multi-Attach and a cluster-aware filesystem.
✅ Checklist
References (fill with your org’s canonical links)
EBS pricing (volumes, PIOPS, snapshots, archive, FSR)
EBS volume types & limits; Multi-Attach docs
Elastic Volumes, DLM, AWS Backup, Snapshot Lock, Recycle Bin
CloudWatch metrics & EBS-optimized instances
Compute Optimizer for EBS; CUR field guide for EBS spend
Features and limits evolve. Validate in your Region before production changes.
Last updated