Amazon ElastiCache
Amazon ElastiCache is the managed in-memory caching service for Valkey, Redis OSS, and Memcached, delivering sub-millisecond latency to accelerate apps and offload databases. Watch for cost hot spots: node hours, cross-AZ data transfer, backup storage, Global Datastore replication egress, and (in Serverless) GB-hours + request/CPU usage.
🚀 What is ElastiCache?
ElastiCache makes it simple to run distributed, in-memory caches on AWS. It’s fully managed (patching, monitoring, backups), offers provisioned clusters and Serverless caches, integrates with VPC/IAM/CloudWatch, and supports Multi-AZ failover for Valkey/Redis. Serverless is available for Valkey, Redis OSS, and Memcached, so you can start without capacity planning.
Key features
Provisioned or Serverless deployment models.
Multi-AZ replication & automatic failover (Valkey/Redis).
Global Datastore for cross-Region read replicas/DR.
Data tiering on r6gd nodes (Valkey/Redis) to stretch memory with local SSD.
Supported engines (examples)
Valkey (current 8.x supported)
Redis OSS (v6+ widely used; plan upgrades off v4/v5)
Memcached (1.6+) — simple, ephemeral key/value
⚙️ Node families — pick the right cache
cache.t (Burstable) | Dev/test, bursty or intermittent workloads | CPU credits; tiny/cheap labs and prototypes.
cache.m (General purpose) | Balanced app caches | Good default for many web workloads; Graviton (m6g/m7g) = strong price/perf.
cache.r (Memory optimized) | Large in-memory sets, high concurrency | Choose r6g/r7g for price/perf; r6gd enables data tiering to SSD.
Rule of thumb: Random, low-latency access → keep the working set fully in memory on the m or r families. For huge working sets with cooler tails, prefer r6gd data tiering or sharding.
🧬 Generations & engine notes
Graviton (g/d) | ARM | Valkey/Redis, Memcached | Generally better price/perf; validate client libs for ARM images.
x86 (m5/r5, etc.) | Intel/AMD | Valkey/Redis, Memcached | Broad compatibility; use when specific dependencies require x86.
Valkey 8.x | n/a | Valkey | Newer Valkey releases add performance & memory-efficiency gains; plan upgrades to benefit.
🏛️ Deployment options
Serverless | Spiky/unknown demand; minimize ops | Pay for data stored (GB-hours) and request/CPU usage; no node sizing; supports Redis/Valkey and Memcached.
Provisioned (cluster mode disabled) | Simple primaries with replicas | Easiest to run; scale vertically; enable Multi-AZ for HA.
Provisioned (cluster mode enabled) | Large datasets / high throughput | Shard across multiple nodes; many shards with replicas per shard.
Global Datastore | Cross-Region reads / DR | Managed async replication for Valkey/Redis; plan for egress and failover between Regions.
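To make "shard across multiple nodes" concrete: in cluster mode, Valkey/Redis OSS assigns each key to one of 16384 hash slots using CRC16 of the key (or of its {hash tag}, if present), and each shard owns a range of slots. The sketch below reproduces that mapping with the Python standard library; the function names are illustrative, not part of any ElastiCache client.

```python
import binascii

TOTAL_SLOTS = 16384  # fixed number of hash slots in cluster mode


def hashed_part(key: bytes) -> bytes:
    """Return the bytes that get hashed, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # An empty tag such as "{}" is ignored and the whole key is hashed.
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


def key_slot(key: str) -> int:
    """CRC16 (XMODEM variant, initial value 0) of the hashed part, mod 16384."""
    return binascii.crc_hqx(hashed_part(key.encode("utf-8")), 0) % TOTAL_SLOTS


if __name__ == "__main__":
    # Keys sharing a hash tag land in the same slot, so multi-key
    # operations on them stay within one shard.
    print(key_slot("{user1000}.following") == key_slot("{user1000}.followers"))
```

Keys that must be read or written together (for example in a transaction) can share a hash tag; overusing one tag concentrates load on a single shard.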
🧠 ElastiCache optimization strategy (FinOps + reliability)
Quick wins
Co-locate by AZ (clients ↔ nodes) to avoid cross-AZ data transfer; align ASGs with node AZs.
Turn on Multi-AZ for production Valkey/Redis and test failover regularly.
Prefer Graviton node families (r6g/r7g/m6g/m7g) for better price/perf.
Shape the architecture
Start small: minimal node size/shard count; scale out only when hit ratios or latency require it.
For large, semi-hot datasets, enable data tiering (r6gd)—keep hot keys in RAM, cold keys spill to SSD. Validate tail-latency SLOs.
Use reader endpoints for read scaling; reserve writes for primaries.
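One rough way to judge whether a dataset is "semi-hot" enough for data tiering is to estimate what share of stored bytes was accessed recently. The sketch below assumes you can sample (size, seconds since last access) pairs for a set of keys, for example from application logs; the input shape, the one-hour window, and the 20% threshold are illustrative assumptions, not ElastiCache settings.

```python
from typing import Iterable, Tuple


def hot_byte_fraction(items: Iterable[Tuple[int, float]],
                      hot_window_seconds: float) -> float:
    """Share of bytes whose key was accessed within the window.

    items: (size_bytes, seconds_since_last_access) pairs from a sample.
    """
    total = 0
    hot = 0
    for size_bytes, age_seconds in items:
        total += size_bytes
        if age_seconds <= hot_window_seconds:
            hot += size_bytes
    return hot / total if total else 0.0


def tiering_candidate(fraction: float, max_hot_fraction: float = 0.20) -> bool:
    """True when the hot share is at or below the assumed threshold."""
    return fraction <= max_hot_fraction


if __name__ == "__main__":
    sample = [(100, 10), (100, 5000), (300, 90000), (500, 100000)]
    fraction = hot_byte_fraction(sample, hot_window_seconds=3600)
    print(fraction, tiering_candidate(fraction))
```

A favorable number here is only a starting point; tail latency under a realistic hit/miss mix still has to be measured on r6gd nodes.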
Control memory & CPU
Set TTLs everywhere; choose an eviction policy (volatile-* for true caches).
Track Evictions, CacheHitRate, CurrConnections, CPUUtilization, EngineCPUUtilization, ReplicationLag; right-size before you miss SLOs.
Trim oversized values and hot keys; compress at the client if payloads are large.
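The metrics above can be turned into simple right-sizing signals. This is a minimal sketch that takes already-collected counters as plain numbers; the `CacheSample` type and both thresholds (80% hit ratio, 70% engine CPU) are hypothetical starting points to tune against your own SLOs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CacheSample:
    hits: int
    misses: int
    evictions: int
    engine_cpu_pct: float


def hit_ratio(hits: int, misses: int) -> float:
    """hits / (hits + misses); 0.0 when there was no traffic."""
    total = hits + misses
    return hits / total if total else 0.0


def right_size_signals(sample: CacheSample,
                       min_hit_ratio: float = 0.80,
                       max_engine_cpu_pct: float = 70.0) -> List[str]:
    """Return human-readable warnings; an empty list means no signal fired."""
    signals = []
    ratio = hit_ratio(sample.hits, sample.misses)
    if ratio < min_hit_ratio:
        signals.append(f"hit ratio {ratio:.0%} is below {min_hit_ratio:.0%}")
    if sample.evictions > 0:
        signals.append(f"{sample.evictions} evictions: memory pressure or missing TTLs")
    if sample.engine_cpu_pct > max_engine_cpu_pct:
        signals.append(f"engine CPU {sample.engine_cpu_pct:.0f}% is above {max_engine_cpu_pct:.0f}%")
    return signals


if __name__ == "__main__":
    for line in right_size_signals(CacheSample(50, 50, 10, 85.0)):
        print(line)
```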
Right-size purchasing
Reserved nodes for steady baselines (1/3-yr; size-flexible); keep spiky or experimental workloads on On-Demand or Serverless.
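Whether a reserved node pays off depends on how many hours the node actually runs. A rough break-even sketch follows; both hourly prices are placeholders, so take real rates for your node type and Region from the AWS pricing page.

```python
def break_even_utilization(on_demand_hourly: float,
                           reserved_effective_hourly: float) -> float:
    """Fraction of hours a node must run for the reservation to cost less.

    reserved_effective_hourly is the total reservation cost (upfront plus
    recurring) divided by the hours in the term.
    """
    if on_demand_hourly <= 0:
        raise ValueError("on_demand_hourly must be positive")
    return reserved_effective_hourly / on_demand_hourly


def cheaper_option(expected_utilization: float,
                   on_demand_hourly: float,
                   reserved_effective_hourly: float) -> str:
    """Compare expected usage with the break-even point."""
    threshold = break_even_utilization(on_demand_hourly, reserved_effective_hourly)
    return "reserved" if expected_utilization > threshold else "on-demand"


if __name__ == "__main__":
    # Placeholder prices, not real ElastiCache rates.
    print(break_even_utilization(0.20, 0.13))
    print(cheaper_option(0.90, 0.20, 0.13))
```

With these placeholder prices, a node that runs more than about 65% of the term favors a reservation; below that, On-Demand or Serverless is the safer default.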
💸 Cost levers (where the money goes)
Node hours (provisioned) | Per-node hourly billing | Right-size; prefer Graviton; shard only when metrics warrant.
Serverless usage | GB-hours stored + request/CPU usage | Keep keys small; batch operations; avoid cross-Region chatter.
Cross-AZ transfer | EC2↔ElastiCache traffic across AZs is billed | Pin clients to the same AZ as their node; minimize rebalances.
Global Datastore egress | Cross-Region replication out of primary | Use only when latency/DR require; measure write volume carefully.
Backups | Snapshot GB-months | Set retention; export to S3 with lifecycle rules.
Data tiering | r6gd nodes, RAM+SSD | Works best when ≤~20% of data is hot; validate P95/P99 before rollout.
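The provisioned cost levers above add up as simple arithmetic. The sketch below models a monthly bill from node hours, cross-AZ transfer, snapshot storage, and Global Datastore replication; every price in it is a placeholder, and the 730-hour month is a common approximation rather than an AWS billing rule.

```python
from dataclasses import dataclass
from typing import Dict

HOURS_PER_MONTH = 730  # approximation: 8760 hours / 12 months


@dataclass
class Prices:
    node_hour: float           # per node, per hour
    cross_az_per_gb: float     # per GB transferred across AZs
    snapshot_gb_month: float   # per GB-month of backup storage
    replication_per_gb: float  # per GB replicated to another Region


def monthly_cost(nodes: int, cross_az_gb: float, snapshot_gb: float,
                 replicated_gb: float, prices: Prices) -> Dict[str, float]:
    """Break a month's provisioned spend into line items plus a total."""
    lines = {
        "node_hours": nodes * HOURS_PER_MONTH * prices.node_hour,
        "cross_az": cross_az_gb * prices.cross_az_per_gb,
        "snapshots": snapshot_gb * prices.snapshot_gb_month,
        "replication": replicated_gb * prices.replication_per_gb,
    }
    lines["total"] = sum(lines.values())
    return {name: round(amount, 2) for name, amount in lines.items()}


if __name__ == "__main__":
    placeholder = Prices(node_hour=0.10, cross_az_per_gb=0.01,
                         snapshot_gb_month=0.08, replication_per_gb=0.02)
    print(monthly_cost(nodes=3, cross_az_gb=1000, snapshot_gb=50,
                       replicated_gb=200, prices=placeholder))
```

Running the model with real rates and measured volumes shows quickly which lever dominates; for most provisioned clusters that is node hours, which is why right-sizing comes first.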
🧩 Engines — when to pick which
Valkey / Redis OSS | Sessions, counters, leaderboards, queues, token stores | Multi-AZ failover, cluster mode, persistence options, Global Datastore | Snapshots ≠ database durability; if you need DB-grade durability, consider MemoryDB.
Memcached | Ephemeral, simple KV, massive fan-out | Multithreaded, client-side sharding, easy scale-out | No Multi-AZ failover or snapshots; treat as cache only.
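With Memcached, the client decides which node holds a key. A common way to do that is consistent hashing, so that adding or removing a node remaps only a fraction of keys. The sketch below is a generic hash ring, not the algorithm of any particular Memcached client; the node names and the virtual-node count of 100 are illustrative assumptions.

```python
import bisect
import hashlib
from typing import Iterable, List, Tuple


class HashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 100) -> None:
        ring: List[Tuple[int, str]] = []
        for node in nodes:
            for i in range(vnodes):
                ring.append((self._hash(f"{node}#{i}"), node))
        ring.sort()
        self._ring = ring
        self._points = [point for point, _ in ring]

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.sha256(value.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, key: str) -> str:
        """Return the first node clockwise from the key's position."""
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]


if __name__ == "__main__":
    before = HashRing(["cache-a:11211", "cache-b:11211", "cache-c:11211"])
    after = HashRing(["cache-a:11211", "cache-b:11211", "cache-c:11211",
                      "cache-d:11211"])
    keys = [f"session:{i}" for i in range(1000)]
    moved = sum(before.node_for(k) != after.node_for(k) for k in keys)
    print(f"{moved} of {len(keys)} keys moved after adding a node")
```

Keys that move are cache misses until they are repopulated, so scale-out events briefly raise load on the backing database.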
🏗️ Practical patterns
Session & token stores / rate limiting: Valkey/Redis, Multi-AZ, short TTLs.
Read-through / write-around cache in front of RDS/ES: Memcached or Redis; start small and measure hit ratio.
Large, semi-hot datasets: Valkey/Redis on r6gd with data tiering; confirm latency SLOs under mixed hits/misses.
Cross-Region read fan-out: Global Datastore; plan egress and failover runbooks.
Don’t want capacity management: Serverless; keep an eye on request/CPU usage.
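The read-through pattern with short TTLs can be sketched without any AWS dependency. Here `TTLCache` is an in-process stand-in for a Valkey/Redis or Memcached client, and `loader` stands in for the database query; both names are hypothetical and only illustrate the control flow.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for a remote cache with per-key expiry."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._store: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: Any, ttl_seconds: float) -> None:
        self._store[key] = (value, self._clock() + ttl_seconds)


def read_through(cache: TTLCache, key: str,
                 loader: Callable[[str], Any], ttl_seconds: float = 60) -> Any:
    """Return the cached value, loading and caching it on a miss.

    Note: a loader result of None is not cached in this sketch.
    """
    value = cache.get(key)
    if value is None:
        value = loader(key)
        cache.set(key, value, ttl_seconds)
    return value


if __name__ == "__main__":
    cache = TTLCache()
    print(read_through(cache, "user:1", lambda k: {"id": k}, ttl_seconds=30))
```

Counting how often `loader` runs relative to total reads gives the hit ratio the pattern list says to measure before scaling up.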
🔒 Security & compliance
In-transit encryption (TLS) and at-rest encryption (always on for Serverless); manage keys with KMS.
AUTH/ACLs for Valkey/Redis; SGs for network boundaries; no public endpoints.
Audit snapshot access and parameter changes; keep backups in dedicated, locked S3 buckets.
📊 Monitoring & tools
CloudWatch: CPUUtilization, EngineCPUUtilization (Valkey/Redis), FreeableMemory, Evictions, GetTypeCmds/SetTypeCmds, CurrConnections, ReplicationLag, NetworkBytesIn/Out.
Backups: snapshot windows/retention; export to S3 and set lifecycle.
Cost Explorer/CUR: track node hours, GB-months of snapshots, cross-AZ bytes, replication egress.
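To pull several of these metrics in one request, CloudWatch's GetMetricData API accepts a list of metric queries. The sketch below only builds that list as plain dictionaries and makes no AWS call; the helper name and the cluster ID are hypothetical, and the period and statistic are assumptions to adjust.

```python
from typing import Dict, List


def metric_queries(cache_cluster_id: str, metric_names: List[str],
                   period_seconds: int = 60,
                   stat: str = "Average") -> List[Dict]:
    """Build GetMetricData-style queries for one cache node."""
    queries = []
    for i, name in enumerate(metric_names):
        queries.append({
            "Id": f"m{i}",  # query IDs must start with a lowercase letter
            "Label": name,
            "MetricStat": {
                "Metric": {
                    "Namespace": "AWS/ElastiCache",
                    "MetricName": name,
                    "Dimensions": [
                        {"Name": "CacheClusterId", "Value": cache_cluster_id},
                    ],
                },
                "Period": period_seconds,
                "Stat": stat,
            },
            "ReturnData": True,
        })
    return queries


if __name__ == "__main__":
    # "my-cache-001" is a placeholder cluster ID.
    for query in metric_queries("my-cache-001",
                                ["EngineCPUUtilization", "Evictions"]):
        print(query["Id"], query["MetricStat"]["Metric"]["MetricName"])
```

The resulting list is intended to be passed as `MetricDataQueries` to a CloudWatch client's `get_metric_data` call together with a start and end time.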
Features and limits evolve. Always confirm specifics for your Region in AWS docs/pricing before rollout.