Amazon ElastiCache

Amazon ElastiCache is the managed in-memory caching service for Valkey, Redis OSS, and Memcached, delivering sub-millisecond latency to accelerate apps and offload databases. Watch for cost hot spots: node hours, cross-AZ data transfer, backup storage, Global Datastore replication egress, and (in Serverless) GB-hours + request/CPU usage.


🚀 What is ElastiCache?

ElastiCache makes it simple to run distributed, in-memory caches on AWS. It’s fully managed (patching, monitoring, backups), offers provisioned clusters and Serverless caches, integrates with VPC/IAM/CloudWatch, and supports Multi-AZ failover for Valkey/Redis. Serverless supports Valkey, Redis OSS, and Memcached, so you can start without capacity planning.

Key features

  • Provisioned or Serverless deployment models.

  • Multi-AZ replication & automatic failover (Valkey/Redis).

  • Global Datastore for cross-Region read replicas/DR.

  • Data tiering on r6gd nodes (Valkey/Redis) to stretch memory with local SSD.

Supported engines (examples)

  • Valkey (current 8.x supported)

  • Redis OSS (v6+ widely used; plan upgrades off v4/v5)

  • Memcached (1.6+) — simple, ephemeral key/value


⚙️ Node families — pick the right cache

| Family | Best for | Notes |
|---|---|---|
| cache.t (Burstable) | Dev/test, bursty or intermittent workloads | CPU credits; tiny/cheap labs and prototypes. |
| cache.m (General purpose) | Balanced app caches | Good default for many web workloads; Graviton (m6g/m7g) = strong price/perf. |
| cache.r (Memory optimized) | Large in-memory sets, high concurrency | Choose r6g/r7g for price/perf; r6gd enables data tiering to SSD. |

Rule of thumb: if the working set is fully hot and needs consistently low latency, keep it all in RAM on the m or r families. For huge working sets with cooler tails, prefer r6gd data tiering or sharding.


🧬 Generations & engine notes

| Generation | Arch | Engines | Guidance |
|---|---|---|---|
| Graviton (g/d) | ARM | Valkey/Redis, Memcached | Generally better price/perf; validate client libs for ARM images. |
| x86 (m5/r5, etc.) | Intel/AMD | Valkey/Redis, Memcached | Broad compatibility; use when specific dependencies require x86. |
| Valkey 8.x | | Valkey | Newer Valkey releases add performance & memory-efficiency gains; plan upgrades to benefit. |


🏛️ Deployment options

| Option | When to use | Notes |
|---|---|---|
| Serverless | Spiky/unknown demand; minimize ops | Pay for data stored (GB-hours) and request/CPU usage; no node sizing; supports Valkey, Redis OSS, and Memcached. |
| Provisioned (cluster mode disabled) | Simple setups: one primary with replicas | Easiest to run; scale vertically; enable Multi-AZ for HA. |
| Provisioned (cluster mode enabled) | Large datasets / high throughput | Shard across multiple nodes; many shards with replicas per shard. |
| Global Datastore | Cross-Region reads / DR | Managed async replication for Valkey/Redis; plan for egress and failover between Regions. |
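
To make the Serverless versus provisioned trade-off concrete, here is a minimal back-of-the-envelope sketch. It only does the arithmetic implied by the billing models above (node hours versus GB-hours plus request/CPU usage). Every price in it is a placeholder invented for illustration, not an AWS rate, and the names `Prices`, `provisioned_monthly`, and `serverless_monthly` are hypothetical.

```python
from dataclasses import dataclass

HOURS_PER_MONTH = 730  # common approximation for a billing month


@dataclass
class Prices:
    """Placeholder unit prices (assumptions). Look up real rates for your Region."""
    node_hour: float          # USD per node-hour (provisioned)
    gb_hour: float            # USD per GB-hour of data stored (Serverless)
    per_million_units: float  # USD per million request/CPU units (Serverless)


def provisioned_monthly(nodes: int, p: Prices) -> float:
    """Provisioned cost: every node is billed for every hour it runs."""
    return nodes * p.node_hour * HOURS_PER_MONTH


def serverless_monthly(avg_gb: float, million_units: float, p: Prices) -> float:
    """Serverless cost: average data stored plus metered request/CPU usage."""
    storage = avg_gb * p.gb_hour * HOURS_PER_MONTH
    usage = million_units * p.per_million_units
    return storage + usage


# Illustrative numbers only.
prices = Prices(node_hour=0.20, gb_hour=0.10, per_million_units=0.003)
print(f"provisioned: ${provisioned_monthly(3, prices):.2f}/month")
print(f"serverless:  ${serverless_monthly(5, 2000, prices):.2f}/month")
```

Rerunning the comparison with measured data size and request volume is usually more informative than the defaults here, since the crossover point depends heavily on how spiky the workload is.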


🧠 ElastiCache optimization strategy (FinOps + reliability)

Quick wins

  • Co-locate by AZ (clients ↔ nodes) to avoid cross-AZ data transfer; align ASGs with node AZs.

  • Turn on Multi-AZ for production Valkey/Redis and test failover regularly.

  • Prefer Graviton node families (r6g/r7g/m6g/m7g) for better price/perf.

Shape the architecture

  • Start small: minimal node size/shard count; scale out only when hit ratios or latency require it.

  • For large, semi-hot datasets, enable data tiering (r6gd)—keep hot keys in RAM, cold keys spill to SSD. Validate tail-latency SLOs.

  • Use reader endpoints for read scaling; reserve writes for primaries.

Control memory & CPU

  • Set TTLs everywhere; choose an eviction policy (volatile-* for true caches).

  • Track Evictions, CacheHitRate, CurrConnections, CPUUtilization, EngineCPUUtilization, ReplicationLag; right-size before you miss SLOs.

  • Trim oversized values and hot keys; compress at the client if payloads are large.
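
The advice to compress large payloads at the client can be sketched as below. This is one possible approach using Python's standard `zlib` module, not a feature of ElastiCache itself. The one-byte marker convention, the 1 KiB threshold, and the `encode`/`decode` names are assumptions made for this example.

```python
import zlib

# One-byte prefix so readers can tell raw values from compressed ones.
# This marker scheme is a convention of this sketch, not an engine feature.
RAW, ZLIB = b"\x00", b"\x01"
THRESHOLD = 1024  # bytes; hypothetical cut-off, tune against real payloads


def encode(value: bytes, threshold: int = THRESHOLD) -> bytes:
    """Compress values at or above the threshold, but only if that saves space."""
    if len(value) >= threshold:
        packed = zlib.compress(value)
        if len(packed) < len(value):
            return ZLIB + packed
    return RAW + value


def decode(stored: bytes) -> bytes:
    """Reverse encode(): strip the marker and decompress when needed."""
    marker, body = stored[:1], stored[1:]
    if marker == ZLIB:
        return zlib.decompress(body)
    if marker == RAW:
        return body
    raise ValueError("unknown encoding marker")


payload = b'{"user": "u1", "roles": ["reader"]}' * 200
stored = encode(payload)
print(len(payload), "bytes raw,", len(stored), "bytes stored")
assert decode(stored) == payload
```

Compression trades client CPU for cache memory and network bytes, so it tends to pay off for large, repetitive values (JSON, HTML fragments) and not for small or already-compressed ones.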

Right-size purchasing

  • Reserved nodes for steady baselines (1- or 3-year terms; size-flexible); keep spiky or experimental workloads on On-Demand or Serverless.
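
One way to decide how much of a fleet counts as a "steady baseline" is a break-even calculation: a reservation pays off only for nodes that run for more than a certain fraction of the term. The sketch below assumes you already have an hourly node-count history (for example, exported from billing data). The rates are placeholders, and the function names are hypothetical.

```python
def reserved_break_even(on_demand_hourly: float, reserved_effective_hourly: float) -> float:
    """Fraction of the term a node must run for a reservation to beat On-Demand."""
    if on_demand_hourly <= 0:
        raise ValueError("on_demand_hourly must be positive")
    return reserved_effective_hourly / on_demand_hourly


def baseline_nodes(hourly_node_counts: list[int], break_even: float) -> int:
    """Largest node count that was in use for at least `break_even` of the hours."""
    total = len(hourly_node_counts)
    if total == 0:
        return 0
    for n in range(max(hourly_node_counts), 0, -1):
        hours_in_use = sum(1 for count in hourly_node_counts if count >= n)
        if hours_in_use / total >= break_even:
            return n
    return 0


# Illustrative history: 2 nodes most of the time, short bursts to 4 and 6.
history = [2] * 70 + [4] * 20 + [6] * 10
threshold = reserved_break_even(on_demand_hourly=0.20, reserved_effective_hourly=0.13)
print("break-even utilization:", round(threshold, 2))
print("nodes worth reserving:", baseline_nodes(history, threshold))
```

Nodes above the returned count run too rarely to justify a commitment under these assumptions and are better left on On-Demand or moved to Serverless.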


💸 Cost levers (where the money goes)

| Area | Why it costs | How to keep it sane |
|---|---|---|
| Node hours (provisioned) | Per-node hourly billing | Right-size; prefer Graviton; shard only when metrics warrant. |
| Serverless usage | GB-hours stored + request/CPU usage | Keep keys small; batch operations; avoid cross-Region chatter. |
| Cross-AZ transfer | EC2↔ElastiCache traffic across AZs is billed | Pin clients to the same AZ as their node; minimize rebalances. |
| Global Datastore egress | Cross-Region replication out of primary | Use only when latency/DR require; measure write volume carefully. |
| Backups | Snapshot GB-months | Set retention; export to S3 with lifecycle rules. |
| Data tiering | r6gd nodes, RAM+SSD | Works best when ≤~20% of data is hot; validate P95/P99 before rollout. |
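
The cross-AZ row is easy to underestimate, so a rough estimate helps. The sketch below multiplies request rate, payload size, and the share of traffic that crosses an AZ boundary. The per-GB rate is a parameter you must supply from current AWS pricing; the value used in the example is a placeholder, and `cross_az_monthly_cost` is a hypothetical helper.

```python
SECONDS_PER_MONTH = 730 * 3600  # 730-hour billing month


def cross_az_monthly_cost(
    requests_per_sec: float,
    avg_bytes_per_request: float,  # request plus response bytes on the wire
    cross_az_fraction: float,      # 0.0 (all same-AZ) to 1.0 (all cross-AZ)
    usd_per_gb: float,             # assumption: take this from current pricing
) -> float:
    """Estimate the monthly charge for client-to-cache traffic crossing AZs."""
    if not 0.0 <= cross_az_fraction <= 1.0:
        raise ValueError("cross_az_fraction must be between 0 and 1")
    bytes_per_month = requests_per_sec * avg_bytes_per_request * SECONDS_PER_MONTH
    gb_cross_az = bytes_per_month * cross_az_fraction / 1e9
    return gb_cross_az * usd_per_gb


# Illustrative: 1,000 req/s, 2 KB per round trip, half the clients in another AZ.
estimate = cross_az_monthly_cost(1000, 2000, 0.5, usd_per_gb=0.01)
print(f"estimated cross-AZ transfer: ${estimate:.2f}/month")
```

Setting `cross_az_fraction` to the value you expect after pinning clients to their node's AZ shows how much of the bill co-location would remove.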


🧩 Engines — when to pick which

Engine
Best for
Highlights
Watch-outs

Valkey / Redis OSS

Sessions, counters, leaderboards, queues, token stores

Multi-AZ failover, cluster mode, persistence options, Global Datastore

Snapshots ≠ database durability; if you need DB-grade durability, consider MemoryDB.

Memcached

Ephemeral, simple KV, massive fan-out

Multithreaded, client-side sharding, easy scale-out

No Multi-AZ failover or snapshots; treat as cache only.


🏗️ Practical patterns

  • Session & token stores / rate limiting: Valkey/Redis, Multi-AZ, short TTLs.

  • Read-through / write-around cache in front of RDS/ES: Memcached or Redis; start small and measure hit ratio.

  • Large, semi-hot datasets: Valkey/Redis on r6gd with data tiering; confirm latency SLOs under mixed hits/misses.

  • Cross-Region read fan-out: Global Datastore; plan egress and failover runbooks.

  • Don’t want capacity management: Serverless; keep an eye on request/CPU usage.
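
The read-through pattern with TTLs and hit-ratio measurement can be sketched as follows. To keep the example runnable without a cache server, `InMemoryCache` is a local stand-in exposing only `get` and `set(key, value, ex=seconds)`; in a real deployment you would pass a Valkey/Redis or Memcached client with equivalent calls. `ReadThrough` and the loader function are hypothetical names for this illustration.

```python
import time
from typing import Callable, Optional


class InMemoryCache:
    """Local stand-in for a cache client so the sketch runs anywhere."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._data: dict = {}
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:  # expired: behave like a miss
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: str, ex: float) -> None:
        self._data[key] = (value, self._clock() + ex)


class ReadThrough:
    """Serve from cache; on a miss, load from the backing store and cache with a TTL."""

    def __init__(self, cache, loader: Callable[[str], Optional[str]], ttl_seconds: float):
        self.cache, self.loader, self.ttl = cache, loader, ttl_seconds
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> Optional[str]:
        value = self.cache.get(key)
        if value is not None:
            self.hits += 1
            return value
        self.misses += 1
        value = self.loader(key)  # e.g. a database query
        if value is not None:
            self.cache.set(key, value, ex=self.ttl)
        return value

    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def load_from_db(key: str) -> str:
    return f"row-for-{key}"  # placeholder for a real query


users = ReadThrough(InMemoryCache(), load_from_db, ttl_seconds=60)
for key in ["u1", "u1", "u2", "u1"]:
    users.get(key)
print("hit ratio:", users.hit_ratio())
```

Tracking the hit ratio alongside the loader's latency is what tells you whether a larger node or more shards would actually help, which is the "start small and measure" step above.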


🔒 Security & compliance

  • In-transit encryption (TLS) and at-rest encryption (always on for Serverless); manage keys with KMS.

  • AUTH/ACLs for Valkey/Redis; SGs for network boundaries; no public endpoints.

  • Audit snapshot access and parameter changes; keep backups in dedicated, locked S3 buckets.


📊 Monitoring & tools

  • CloudWatch: CPUUtilization, EngineCPUUtilization (Valkey/Redis), FreeableMemory, Evictions, GetTypeCmds/SetTypeCmds, CurrConnections, ReplicationLag, NetworkBytesIn/Out.

  • Backups: snapshot windows/retention; export to S3 and set lifecycle.

  • Cost Explorer/CUR: track node hours, GB-months of snapshots, cross-AZ bytes, replication egress.
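
As an illustration of turning one of these metrics into an alert, the sketch below builds the parameters for a CloudWatch alarm on `Evictions`. It only constructs a dictionary, so it runs without AWS credentials; the intent is that the result could be passed to a CloudWatch `put_metric_alarm` call (for example via boto3). The alarm name, period, evaluation count, and threshold are assumptions to adjust, and `evictions_alarm_params` is a hypothetical helper.

```python
def evictions_alarm_params(cache_cluster_id: str, threshold: float = 0.0) -> dict:
    """Build CloudWatch alarm parameters for sustained evictions on one node."""
    return {
        "AlarmName": f"elasticache-{cache_cluster_id}-evictions",  # naming is arbitrary
        "Namespace": "AWS/ElastiCache",
        "MetricName": "Evictions",
        "Dimensions": [{"Name": "CacheClusterId", "Value": cache_cluster_id}],
        "Statistic": "Sum",
        "Period": 300,            # seconds per datapoint (assumed)
        "EvaluationPeriods": 3,   # alarm after 3 consecutive breaches (assumed)
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }


params = evictions_alarm_params("my-cache-001")  # hypothetical cluster ID
print(params["AlarmName"], params["MetricName"], params["Threshold"])

# With boto3 installed and credentials configured, this would create the alarm:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```

The same shape works for the other metrics in the list by swapping `MetricName`, `Statistic`, and the threshold, for example `Average` over `EngineCPUUtilization`.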


Features and limits evolve. Always confirm specifics for your Region in AWS docs/pricing before rollout.
