# Amazon ElastiCache


Amazon ElastiCache is the managed in-memory caching service for **Valkey**, **Redis OSS**, and **Memcached**, delivering sub-millisecond latency to accelerate apps and offload databases. Watch for cost hot spots: **node hours**, **cross-AZ data transfer**, **backup storage**, **Global Datastore replication egress**, and (in Serverless) **GB-hours + request/CPU usage**.

***

### 🚀 What is ElastiCache?

ElastiCache makes it simple to run distributed, in-memory caches on AWS. It’s fully managed (patching, monitoring, backups), offers **provisioned** clusters and **Serverless** caches, integrates with VPC/IAM/CloudWatch, and supports **Multi-AZ** failover for Valkey/Redis. **Serverless** supports **Valkey**, **Redis OSS**, and **Memcached**, so you can start without capacity planning.

**Key features**

* **Provisioned or Serverless** deployment models.
* **Multi-AZ** replication & automatic failover (Valkey/Redis).
* **Global Datastore** for cross-Region read replicas/DR.
* **Data tiering** on r6gd nodes (Valkey/Redis) to stretch memory with local SSD.

**Supported engines (examples)**

* **Valkey** (current 8.x supported)
* **Redis OSS** (v6+ widely used; plan upgrades off v4/v5)
* **Memcached** (1.6+) — simple, ephemeral key/value

***

### ⚙️ Node families — pick the right cache

| Family                           | Best for                                   | Notes                                                                        |
| -------------------------------- | ------------------------------------------ | ---------------------------------------------------------------------------- |
| **`cache.t` (Burstable)**        | Dev/test, bursty or intermittent workloads | CPU credits; tiny/cheap labs and prototypes.                                 |
| **`cache.m` (General purpose)**  | Balanced app caches                        | Good default for many web workloads; Graviton (m6g/m7g) = strong price/perf. |
| **`cache.r` (Memory optimized)** | Large in-memory sets, high concurrency     | Choose **r6g/r7g** for price/perf; **r6gd** enables **data tiering** to SSD. |

> Rule of thumb: start with **`m`** for balanced workloads and move to **`r`** when memory is the constraint. For huge working sets with cooler tails, prefer **`r6gd` data tiering** (RAM + local SSD) or sharding.

***

### 🧬 Generations & engine notes

| Generation            | Arch      | Engines                 | Guidance                                                                                  |
| --------------------- | --------- | ----------------------- | ----------------------------------------------------------------------------------------- |
| **Graviton (g/gd)**   | ARM       | Valkey/Redis, Memcached | Generally better price/perf; `gd` variants add local NVMe SSD (data tiering); validate client libs for ARM images. |
| **x86 (m5/r5, etc.)** | Intel/AMD | Valkey/Redis, Memcached | Broad compatibility; use when specific dependencies require x86.                          |
| **Valkey 8.x**        | —         | Valkey                  | Newer Valkey releases add performance & memory-efficiency gains—plan upgrades to benefit. |

***

### 🏛️ Deployment options

| Option                                  | When to use                        | Notes                                                                                                                      |
| --------------------------------------- | ---------------------------------- | -------------------------------------------------------------------------------------------------------------------------- |
| **Serverless**                          | Spiky/unknown demand; minimize ops | Pay for **data stored (GB-hours)** and **request/CPU usage**; no node sizing; supports **Redis/Valkey** and **Memcached**. |
| **Provisioned (cluster mode disabled)** | Simple primaries with replicas     | Easiest to run; scale vertically; enable Multi-AZ for HA.                                                                  |
| **Provisioned (cluster mode enabled)**  | Large datasets / high throughput   | Shard across multiple nodes; many shards with replicas per shard.                                                          |
| **Global Datastore**                    | Cross-Region reads / DR            | Managed async replication for Valkey/Redis; plan for egress and failover between Regions.                                  |

***

### 🧠 ElastiCache optimization strategy (FinOps + reliability)

**Quick wins**

* **Co-locate by AZ** (clients ↔ nodes) to avoid cross-AZ data transfer; align ASGs with node AZs.
* Turn on **Multi-AZ** for production Valkey/Redis and **test failover** regularly.
* Prefer **Graviton** node families (r6g/r7g/m6g/m7g) for better price/perf.
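The co-locate-by-AZ advice can be sketched client-side: given the AZ a client runs in and a map of replica endpoints to AZs, prefer a same-AZ reader and fall back to any replica. The endpoint names and AZ map below are hypothetical examples, not real ElastiCache output.

```python
# Sketch: prefer a same-AZ replica to avoid billed cross-AZ transfer.
# Endpoints and AZ names are hypothetical placeholders.

def pick_reader(client_az: str, replicas_by_az: dict) -> str:
    """Return a replica endpoint in client_az, else fall back to any replica."""
    for endpoint, az in replicas_by_az.items():
        if az == client_az:
            return endpoint
    # No same-AZ replica: fall back to the first one (cross-AZ traffic applies).
    return next(iter(replicas_by_az))

replicas = {
    "replica-1.example.cache.amazonaws.com": "us-east-1a",
    "replica-2.example.cache.amazonaws.com": "us-east-1b",
}
print(pick_reader("us-east-1b", replicas))  # picks the us-east-1b replica
```

In practice you would build `replicas_by_az` once from your cluster's topology (e.g., via a describe call or configuration) and refresh it on failover events.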

**Shape the architecture**

* Start small: minimal node size/shard count; **scale out** only when hit ratios or latency require it.
* For large, semi-hot datasets, enable **data tiering (r6gd)**—keep hot keys in RAM, cold keys spill to SSD. Validate tail-latency SLOs.
* Use **reader endpoints** for read scaling; reserve writes for primaries.

**Control memory & CPU**

* Set **TTLs** everywhere; choose an eviction policy deliberately (`allkeys-lru` for pure caches; `volatile-*` when only TTL'd keys should be evictable).
* Track **Evictions, CacheHitRate, CurrConnections, CPUUtilization, EngineCPUUtilization, ReplicationLag**; right-size before you miss SLOs.
* Trim oversized values and hot keys; compress at the client if payloads are large.
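Two of the bullets above — TTLs everywhere and client-side compression of large payloads — can be sketched without a live cache. The jitter spread and the 1 KB compression threshold are illustrative choices, not recommendations from AWS docs:

```python
import random
import zlib

COMPRESS_THRESHOLD = 1024  # bytes; illustrative cutoff, tune per workload

def ttl_with_jitter(base_seconds: int, spread: float = 0.1) -> int:
    """Spread expirations +/- spread*base so hot keys don't all miss at once."""
    delta = int(base_seconds * spread)
    return base_seconds + random.randint(-delta, delta)

def encode_value(payload: bytes) -> bytes:
    """Compress large payloads client-side; tag with a 1-byte format prefix."""
    if len(payload) >= COMPRESS_THRESHOLD:
        return b"z" + zlib.compress(payload)
    return b"r" + payload

def decode_value(stored: bytes) -> bytes:
    """Reverse encode_value using the format prefix."""
    return zlib.decompress(stored[1:]) if stored[:1] == b"z" else stored[1:]

big = b"x" * 4096
assert decode_value(encode_value(big)) == big  # round-trip check
```

On a real client you would pass `ttl_with_jitter(...)` as the expiry argument of your SET call and run values through `encode_value`/`decode_value` at the cache boundary.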

**Right-size purchasing**

* **Reserved nodes** for steady baselines (1/3-yr; size-flexible); keep spiky or experimental workloads on **On-Demand** or **Serverless**.

***

### 💸 Cost levers (where the money goes)

| Area                         | Why it costs                                 | How to keep it sane                                                     |
| ---------------------------- | -------------------------------------------- | ----------------------------------------------------------------------- |
| **Node hours (provisioned)** | Per-node hourly billing                      | Right-size; prefer Graviton; shard only when metrics warrant.           |
| **Serverless usage**         | **GB-hours** stored + **request/CPU usage**  | Keep keys small; batch operations; avoid cross-Region chatter.          |
| **Cross-AZ transfer**        | EC2↔ElastiCache traffic across AZs is billed | Pin clients to the **same AZ** as their node; minimize rebalances.      |
| **Global Datastore egress**  | Cross-Region replication out of primary      | Use only when latency/DR require; measure write volume carefully.       |
| **Backups**                  | Snapshot GB-months                           | Set retention; export to S3 with lifecycle rules.                       |
| **Data tiering**             | r6gd nodes, RAM+SSD                          | Works best when ≤ ~20% of data is hot; validate P95/P99 before rollout. |
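A back-of-envelope model for the Serverless row: monthly cost is roughly GB-hours stored plus request/CPU usage. The per-unit prices below are placeholders, not real AWS rates — always pull current pricing for your Region:

```python
def serverless_monthly_estimate(
    avg_gb_stored: float,
    ecpus_millions: float,
    price_per_gb_hour: float,        # PLACEHOLDER rate, not AWS pricing
    price_per_million_ecpus: float,  # PLACEHOLDER rate, not AWS pricing
    hours: int = 730,                # ~hours in a month
) -> float:
    """Rough monthly estimate: storage (GB-hours) + compute (ECPU usage)."""
    storage = avg_gb_stored * hours * price_per_gb_hour
    compute = ecpus_millions * price_per_million_ecpus
    return round(storage + compute, 2)

# Made-up example: 5 GB average stored, 2,000M ECPUs, invented rates.
print(serverless_monthly_estimate(5, 2000, 0.125, 0.0034))
```

The useful takeaway is the shape of the bill: storage cost scales with data kept resident over time, so aggressive TTLs reduce it, while compute cost scales with request volume and payload size.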

***

### 🧩 Engines — when to pick which

| Engine                 | Best for                                               | Highlights                                                                 | Watch-outs                                                                           |
| ---------------------- | ------------------------------------------------------ | -------------------------------------------------------------------------- | ------------------------------------------------------------------------------------ |
| **Valkey / Redis OSS** | Sessions, counters, leaderboards, queues, token stores | Multi-AZ failover, cluster mode, persistence options, **Global Datastore** | Snapshots ≠ database durability; if you need DB-grade durability, consider MemoryDB. |
| **Memcached**          | Ephemeral, simple KV, massive fan-out                  | Multithreaded, client-side sharding, easy scale-out                        | No Multi-AZ failover or snapshots; treat as **cache only**.                          |
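Memcached's "client-side sharding" in the table above is usually done with a consistent-hash ring, so adding or removing a node remaps only that node's slice of the keyspace. A minimal sketch (node names are hypothetical; real clients like libmemcached implement this for you):

```python
import bisect
import hashlib

class HashRing:
    """Minimal consistent-hash ring for client-side key sharding."""

    def __init__(self, nodes, vnodes: int = 64):
        # Each node gets `vnodes` points on the ring for smoother balance.
        self.ring = sorted(
            (self._hash(f"{n}#{i}"), n) for n in nodes for i in range(vnodes)
        )
        self.points = [h for h, _ in self.ring]

    @staticmethod
    def _hash(key: str) -> int:
        return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

    def node_for(self, key: str) -> str:
        """Walk clockwise to the first ring point at or after the key's hash."""
        idx = bisect.bisect(self.points, self._hash(key)) % len(self.points)
        return self.ring[idx][1]

ring = HashRing(["mc-a", "mc-b", "mc-c"])
owner = ring.node_for("session:42")  # stable: same key -> same node
```

The design property to notice: dropping `mc-c` only remaps keys that hashed to `mc-c`'s ring points; keys owned by `mc-a`/`mc-b` keep their placement, which is what keeps rebalances cheap.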

***

### 🏗️ Practical patterns

* **Session & token stores / rate limiting:** Valkey/Redis, Multi-AZ, short TTLs.
* **Read-through / write-around cache in front of RDS/ES:** Memcached or Redis; start small and measure hit ratio.
* **Large, semi-hot datasets:** Valkey/Redis on **r6gd** with data tiering; confirm latency SLOs under mixed hits/misses.
* **Cross-Region read fan-out:** **Global Datastore**; plan egress and failover runbooks.
* **Don’t want capacity management:** **Serverless**; keep an eye on request/CPU usage.
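The rate-limiting pattern above is typically an atomic INCR plus a TTL on a per-window key in Valkey/Redis. The logic can be shown with a local stand-in (a dict replaces the live cache client; limits and window are arbitrary):

```python
import time

class FixedWindowLimiter:
    """Allow at most `limit` requests per `window` seconds per key.
    On a real cache this is INCR + EXPIRE on "rl:{key}:{bucket}";
    a plain dict stands in for the cache here."""

    def __init__(self, limit: int, window: int):
        self.limit, self.window = limit, window
        self.counters = {}  # (key, bucket) -> count

    def allow(self, key: str, now: float = None) -> bool:
        bucket = int((now if now is not None else time.time()) // self.window)
        k = (key, bucket)
        self.counters[k] = self.counters.get(k, 0) + 1
        return self.counters[k] <= self.limit

limiter = FixedWindowLimiter(limit=3, window=60)
results = [limiter.allow("user:1", now=0) for _ in range(4)]
print(results)  # [True, True, True, False]
```

With a real client, the TTL on the per-window key makes old counters expire on their own — one reason a cache with atomic counters fits this pattern so well.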

***

### 🔒 Security & compliance

* **In-transit encryption (TLS)** and **at-rest encryption** (always on for Serverless); manage keys with KMS.
* **AUTH/ACLs** for Valkey/Redis; SGs for network boundaries; no public endpoints.
* Audit snapshot access and parameter changes; keep backups in dedicated, locked S3 buckets.

***

### 📊 Monitoring & tools

* **CloudWatch**: `CPUUtilization`, `EngineCPUUtilization` (Valkey/Redis), `FreeableMemory`, `Evictions`, `GetTypeCmds/SetTypeCmds`, `CurrConnections`, `ReplicationLag`, `NetworkBytesIn/Out`.
* **Backups**: snapshot windows/retention; export to S3 and set lifecycle.
* **Cost Explorer/CUR**: track node hours, GB-months of snapshots, cross-AZ bytes, replication egress.
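Hit ratio can also be derived from raw hit/miss counters (ElastiCache's Redis-engine metrics expose these alongside the command counts listed above); the sample numbers below are made up:

```python
def cache_hit_rate(hits: int, misses: int) -> float:
    """Hit ratio = hits / (hits + misses); 0.0 when there is no traffic."""
    total = hits + misses
    return hits / total if total else 0.0

# Made-up sample: 96,000 hits vs 4,000 misses over a period.
print(round(cache_hit_rate(96_000, 4_000), 2))  # 0.96
```

Trend this per cache (and per shard) rather than alarming on a single sample; a falling ratio alongside rising Evictions is the usual signal to add memory or revisit TTLs.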

***

### ✅ Checklist

* [ ] Choose engine: **Valkey/Redis** (features) vs **Memcached** (simple, ephemeral).
* [ ] For prod Valkey/Redis: **Multi-AZ** enabled & failover tested.
* [ ] Keep clients and nodes **in the same AZ** to curb transfer costs.
* [ ] Start with minimal size/shards; scale by **hit ratio/latency** evidence.
* [ ] Set **TTLs**; monitor **Evictions** & **CacheHitRate**; prune oversized keys.
* [ ] Consider **r6gd data tiering** for cooler tails; validate P95/P99.
* [ ] Use **Reserved nodes** for steady baselines; **Serverless** for spiky/unknown.
* [ ] Encrypt in transit/at rest; lock down KMS policies & SGs.
* [ ] Review monthly: node hours, cross-AZ traffic, snapshot growth, replication egress.

***

### References

* Serverless (Redis/Valkey & Memcached), getting started
* Data tiering (Valkey/Redis on r6gd) — how & when
* Global Datastore (cross-Region replication) & failover caveats
* Valkey 8.x support and improvements
* Redis v4/v5 upgrade guidance

> *Features and limits evolve. Always confirm specifics for your Region in AWS docs/pricing before rollout.*
