
# Amazon CloudWatch

Amazon CloudWatch is AWS’s observability backbone—metrics, logs, traces, alarms, dashboards, Synthetics, and RUM in one place. It’s powerful, but costs can spike fast from **custom-metric cardinality**, **log ingestion & Logs Insights scans**, **per-alarm charges**, and “nice-to-have” features left on forever. This page blends Grok’s highlights with a pragmatic, FinOps-oriented playbook.

***

### 🚀 What is CloudWatch?

CloudWatch collects and visualizes **metrics**, **logs**, and **traces**; alerts via **alarms**; powers **dashboards**; runs **Synthetics canaries** and **Real User Monitoring (RUM)**; and integrates with **EventBridge** for automation. Use it to detect issues, trigger actions, and build SLO dashboards across AWS, hybrid, and on-prem.

**Core building blocks**

* **Metrics** — native service metrics + **custom** metrics you publish.
* **Logs** — centralized log ingestion, storage, Live Tail, and **Logs Insights** (interactive queries in a purpose-built, pipe-delimited query language).
* **Alarms** — threshold & anomaly detection; composite alarms to reduce noise.
* **Dashboards** — service/team KPIs and executive views.
* **Traces** — **X-Ray/ServiceLens** for distributed tracing.
* **Synthetics** — headless/browser/API checks; synthetic journeys.
* **RUM** — client-side performance & UX telemetry for web apps.
* **Application Signals / Application Insights** — faster app observability setup (auto metrics/SLOs; legacy app monitors).

***

### 🔗 Quicklinks (bookmark these)

* **Pricing** for Metrics, Logs, Alarms, Synthetics, RUM, Traces
* **Logs cost levers**: log classes, retention, Insights scanning
* **Real-time metrics/logs** & **Metric Streams**
* **Cross-account observability** (centralized viewing)
* **Data protection for logs** (PII detection/masking)

*(Keep org-specific links here to your runbooks, dashboards, and cost guardrails.)*

***

### ⚙️ Components — pick the right one

| Component                | Use cases                                    | Notes                                                                              |
| ------------------------ | -------------------------------------------- | ---------------------------------------------------------------------------------- |
| **Metrics**              | Infra KPIs (CPU, mem, net), app KPIs         | Billing is per metric/time series; watch **dimensions** (cardinality).             |
| **Logs**                 | App/server/platform logs, access/error/debug | Pay to **ingest** (uncompressed), **store** (compressed), and **scan** (Insights). |
| **Alarms**               | Paging, autoscaling, remediation triggers    | Anomaly detection consumes multiple internal series; use where it pays off.        |
| **Dashboards**           | Team & exec views                            | Priced per dashboard beyond a small free allowance.                                |
| **Traces (X-Ray)**       | Microservices & dependency analysis          | Sampling controls cost; first cross-account trace copy may be included.            |
| **Synthetics**           | API/browser canaries                         | Bill per run + supporting Lambda/Logs/Metrics.                                     |
| **RUM**                  | Web UX telemetry                             | Bill per event; sample to control volume.                                          |
| **Contributor Insights** | Top-N patterns from logs                     | Bill per matched event; great for hot keys/actors.                                 |
| **Metric Streams**       | Push metrics to Firehose/partners            | Bill per metric update; good when `GetMetricData` polling is heavy.                |

***

### 🗂️ Logs classes & retention

| Choice                     | Best for                      | Key behaviors                                                               |
| -------------------------- | ----------------------------- | --------------------------------------------------------------------------- |
| **Standard**               | Operational logs you alert on | Full features (metric filters, alarming, Live Tail, data protection).       |
| **Infrequent Access (IA)** | Keep-but-rarely-use archives  | Lower ingest price with feature trade-offs (no Live Tail/filters/alarming). |

**Retention tips**\
Set per-group retention (e.g., 7–30 days for noisy app logs, longer for audit). Export very long-term history to **S3** and query with **Athena** to avoid high Insights scan costs.
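
A minimal sketch of how a retention target could be normalized before it is applied. CloudWatch Logs accepts only a fixed set of retention values; the list below is taken from the `PutRetentionPolicy` API reference as of writing and should be re-checked against the current docs. The helper names `snap_retention` and `retention_request` are hypothetical.

```python
import bisect

# Retention values (days) accepted by PutRetentionPolicy, per the API reference
# at the time of writing. Assumption: verify this list before relying on it.
ALLOWED_RETENTION_DAYS = [
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545,
    731, 1096, 1827, 2192, 2557, 2922, 3288, 3653,
]


def snap_retention(desired_days: int) -> int:
    """Return the smallest allowed retention that is >= desired_days."""
    if desired_days < 1:
        raise ValueError("retention must be at least 1 day")
    idx = bisect.bisect_left(ALLOWED_RETENTION_DAYS, desired_days)
    if idx == len(ALLOWED_RETENTION_DAYS):
        raise ValueError("exceeds the longest finite retention; archive to S3 instead")
    return ALLOWED_RETENTION_DAYS[idx]


def retention_request(log_group_name: str, desired_days: int) -> dict:
    """Build keyword arguments shaped for boto3's logs.put_retention_policy."""
    return {
        "logGroupName": log_group_name,
        "retentionInDays": snap_retention(desired_days),
    }
```

The resulting dict can be passed as `boto3.client("logs").put_retention_policy(**retention_request("/prod/api", 21))`, assuming boto3 and credentials are available.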

***

### 🧬 Resolution & retention knobs

| Knob                           | What it does                                   | Practical guidance                                                         |
| ------------------------------ | ---------------------------------------------- | -------------------------------------------------------------------------- |
| **Metric resolution**          | Basic (5-min), Detailed (1-min), High-res (1s) | Use 1-min for key resources; reserve 1-s for spiky SLOs and short windows. |
| **Logs retention**             | 1 day → infinite                               | Shorten non-prod; keep prod tight; archive to S3 if needed.                |
| **Trace sampling & retention** | Control % sampled & days kept                  | Start low (e.g., 5–10%), raise on incident or critical paths.              |
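
For custom metrics, the resolution knob is the `StorageResolution` field on each datum sent to `PutMetricData` (`1` for high resolution, `60` for standard). The sketch below only builds the datum; the helper name `metric_datum` and the example metric are illustrative assumptions.

```python
from datetime import datetime, timezone


def metric_datum(name, value, unit="None", high_resolution=False, dimensions=None):
    """Build one MetricData entry for PutMetricData."""
    dims = dimensions or {}
    return {
        "MetricName": name,
        "Dimensions": [{"Name": k, "Value": v} for k, v in sorted(dims.items())],
        "Timestamp": datetime.now(timezone.utc),
        "Value": float(value),
        "Unit": unit,
        # 1 = high-resolution (1-second) storage, 60 = standard (1-minute) storage.
        "StorageResolution": 1 if high_resolution else 60,
    }
```

A datum built this way would typically be sent with `boto3.client("cloudwatch").put_metric_data(Namespace="MyApp", MetricData=[datum])`, where `MyApp` is a placeholder namespace.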

***

### 🏛️ Integrations & ingestion

| Option                               | When to use                            | Notes                                                                                        |
| ------------------------------------ | -------------------------------------- | -------------------------------------------------------------------------------------------- |
| **CloudWatch Agent**                 | EC2/On-prem metrics & logs             | Unified agent; supports **EMF** (Embedded Metric Format) for low-cardinality custom metrics. |
| **ADOT/OpenTelemetry**               | Standardized metrics/traces            | Use for polyglot microservices; export to CW + partner backends.                             |
| **PutMetricData / EMF**              | App-emitted KPIs                       | Batch & aggregate to cap cardinality; avoid request-ID dimensions.                           |
| **Logs subscription filters**        | Stream logs to Kinesis/Firehose/Lambda | Offload analytics or real-time processing; mind downstream costs.                            |
| **EventBridge (formerly CW Events)** | Event-driven automation                | Schedules, rules, cross-service triggers for auto-remediation.                               |

**Cross-account observability** lets you view many accounts/Regions from a single “monitoring” account without duplicating data.
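
To illustrate the **PutMetricData / EMF** row above, here is a rough sketch of an Embedded Metric Format log line built by hand. High-cardinality context such as a request ID is attached as a plain property rather than a dimension, so it stays searchable in Logs without creating new time series. The function name `emf_record` and all example values are hypothetical; in practice an EMF client library may be preferable.

```python
import json
import time


def emf_record(namespace, dimensions, metrics, properties=None, timestamp_ms=None):
    """Return one EMF log line as a JSON string.

    dimensions: dict of dimension name -> value (keep these low-cardinality)
    metrics:    dict of metric name -> (value, unit)
    properties: extra searchable fields that must NOT become dimensions
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    record = {
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                # One dimension set, listing the keys that define the series.
                "Dimensions": [sorted(dimensions)],
                "Metrics": [{"Name": n, "Unit": u} for n, (_, u) in metrics.items()],
            }],
        },
    }
    record.update(properties or {})
    record.update(dimensions)
    record.update({n: v for n, (v, _) in metrics.items()})
    return json.dumps(record)
```

Writing the returned line to stdout in Lambda, or to a log file collected by the CloudWatch Agent, is the usual delivery path; the exact wiring depends on the runtime.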

***

### 🧠 CloudWatch FinOps playbook

#### Metrics (cardinality killers)

* Design **dimensions** intentionally (service, endpoint, status class) — never user/request IDs.
* Prefer **metric math** & **percentiles** over emitting many near-duplicate series.
* Downsample where 1-min is enough; avoid 1-s except where it’s proven necessary.
* Consider **Metric Streams** if polling (`GetMetricData`) is heavy/expensive.
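
The cardinality point can be made concrete with a little arithmetic: the worst-case number of billed time series is the number of metric names multiplied by the product of distinct values per dimension. The sketch below is an upper bound (not every combination necessarily occurs), and `series_count` is a hypothetical helper name.

```python
from math import prod


def series_count(distinct_values_per_dimension: dict, metric_names: int = 1) -> int:
    """Upper bound on time series: metrics x product of distinct dimension values."""
    return metric_names * prod(distinct_values_per_dimension.values())


# Intentional dimensions: 3 metrics x 10 services x 20 endpoints x 4 status classes.
bounded = series_count({"Service": 10, "Endpoint": 20, "StatusClass": 4}, metric_names=3)

# Same design plus a request-ID dimension with a million distinct values.
exploded = series_count(
    {"Service": 10, "Endpoint": 20, "StatusClass": 4, "RequestId": 1_000_000},
    metric_names=3,
)
```

Here `bounded` is 2,400 series, while `exploded` is a million times larger, which is why user and request IDs belong in logs rather than dimensions.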

#### Logs (volume + scan)

* Set **retention per log group**; don’t keep debug forever.
* Use **IA class** for keep-but-rarely-use streams; keep alert-worthy streams in **Standard**.
* Partition groups by **app/env/Region** so Insights scans stay small; always scope time windows & fields.
* Drop noise at the **agent** (filters/sampling) before ingestion where safe.
* Export archives to **S3** and query with Athena for long-term analytics.
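
One way to enforce scoped Insights queries is to build every query through a helper that requires an explicit log group list and time window. This is a sketch under assumptions: `scoped_query` is a hypothetical name, and the output is shaped for the `StartQuery` API (`logGroupNames`, `startTime`, `endTime`, `queryString`).

```python
from datetime import datetime, timedelta, timezone


def scoped_query(log_groups, minutes, fields, filter_expr=None, limit=100, now=None):
    """Build StartQuery keyword arguments with a bounded time window."""
    if not log_groups:
        raise ValueError("name the log groups explicitly; never query everything")
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(minutes=minutes)
    parts = ["fields " + ", ".join(fields)]
    if filter_expr:
        parts.append("filter " + filter_expr)
    parts.append("sort @timestamp desc")
    parts.append(f"limit {limit}")
    return {
        "logGroupNames": list(log_groups),
        "startTime": int(start.timestamp()),  # epoch seconds
        "endTime": int(end.timestamp()),
        "queryString": " | ".join(parts),
    }
```

The dict can be handed to `boto3.client("logs").start_query(**kwargs)`; results are then fetched with `get_query_results` using the returned query ID.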

#### Alarms & analytics

* Use **composite alarms** to reduce pages; gate noisy series behind a single “service health” alarm.
* Reserve **anomaly detection** for seasonally noisy metrics where static thresholds fail.
* Scope **real-time logs** and **Live Tail** to bursts; don’t leave on by default.
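
The "service health" gate can be sketched as a composite alarm whose rule ORs the child alarms and is optionally muted by another alarm (for example, one that fires during deployments). The alarm names and the helper `composite_alarm_request` are hypothetical; the rule syntax (`ALARM("...")`, `OR`, `AND NOT`) follows the composite alarm rule expression format.

```python
def composite_alarm_request(name, child_alarms, mute_alarm=None, action_arns=()):
    """Build keyword arguments shaped for cloudwatch.put_composite_alarm."""
    if not child_alarms:
        raise ValueError("at least one child alarm is required")
    # Page when any child alarm is in ALARM state.
    rule = " OR ".join(f'ALARM("{child}")' for child in child_alarms)
    if mute_alarm:
        # Stay quiet while the mute alarm (e.g. a deploy window) is firing.
        rule = f'({rule}) AND NOT ALARM("{mute_alarm}")'
    return {
        "AlarmName": name,
        "AlarmRule": rule,
        "AlarmActions": list(action_arns),
    }
```

With this layout only the composite alarm carries paging actions; the child alarms stay action-free and serve as inputs.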

#### Org layout

* Centralize views with **cross-account observability**; standardize dashboards & alarms via IaC.
* Track spend in **Cost Explorer/CUR** by namespace/log group; add **Budgets** alerts for ingestion and Insights scans.

***

### 💸 Pricing model & common gotchas

* **Metrics**: pay per **custom metric/time series** (dimensions explode cost); API requests are billed beyond the free allowance.
* **Logs**: pay to **ingest** (uncompressed), **store** (compressed), and **scan** (Insights). IA class lowers ingest price but removes some features.
* **Alarms**: per alarm; **anomaly detection** meters multiple internal series.
* **Synthetics/RUM**: per run/per event; sample deliberately.
* **Vended logs credits**: some services credit part of log delivery (reduces Logs charges).
* **Regional & tiered pricing** varies — always model with your Region’s pricing page or AWS Pricing Calculator.

> **Rule of thumb**: Don’t hard-code prices in docs. Keep links to pricing and your internal calculator/runbooks.
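
In the spirit of that rule, a cost model can take rates as inputs instead of embedding them. The sketch below mirrors the three Logs charges described above (ingest on uncompressed GB, storage on compressed GB, Insights on GB scanned). `LogRates` and `monthly_logs_cost` are hypothetical names, and the numbers in the usage example are placeholders, not AWS prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LogRates:
    """Per-GB rates supplied by the caller from the Region's pricing page."""
    ingest_per_gb: float
    storage_per_gb_month: float
    scan_per_gb: float


def monthly_logs_cost(rates: LogRates, ingested_gb: float, stored_gb: float,
                      scanned_gb: float) -> dict:
    """Break a month's Logs spend into ingest, storage, and Insights scan."""
    cost = {
        "ingest": ingested_gb * rates.ingest_per_gb,          # uncompressed volume
        "storage": stored_gb * rates.storage_per_gb_month,    # compressed volume
        "scan": scanned_gb * rates.scan_per_gb,               # Logs Insights
    }
    cost["total"] = sum(cost.values())
    return cost


# Placeholder rates for illustration only; look up real values per Region.
example = monthly_logs_cost(LogRates(1.0, 0.5, 0.25),
                            ingested_gb=100, stored_gb=40, scanned_gb=200)
```

Keeping the rates in one caller-supplied object makes it easy to rerun the same model per Region or per log class.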

***

### ⏱️ Automation patterns

* **Retention-as-code**: set default retention per account/OU; shorten non-prod.
* **Lifecycle to IA**: a log group's class is fixed at creation, so create new **Infrequent Access** groups for cold streams and repoint writers to them; expire fast-churn logs quickly.
* **EventBridge + Lambda**: auto-remediate when ingestion spikes, when new high-cardinality dimensions appear, or when logs go unencrypted.
* **Pipelines**: auto-create dashboards/alarms from **tags**; version them with Terraform/CloudFormation/CDK.
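
Retention-as-code can be approximated with a planner that reads log groups (in the shape returned by `DescribeLogGroups`) and emits only the changes that tighten retention. This is a sketch under assumptions: the naming markers for non-prod, the skip prefixes, and the helper `plan_retention` are hypothetical conventions, not AWS features.

```python
def plan_retention(log_groups, default_days=30, nonprod_days=7,
                   nonprod_markers=("/dev/", "/staging/"), skip_prefixes=()):
    """Return put_retention_policy arguments for groups that need tightening."""
    plan = []
    for group in log_groups:
        name = group["logGroupName"]
        if name.startswith(tuple(skip_prefixes)):
            continue  # e.g. audit logs governed by a separate policy
        is_nonprod = any(marker in name for marker in nonprod_markers)
        target = nonprod_days if is_nonprod else default_days
        current = group.get("retentionInDays")  # missing key means "never expire"
        if current is None or current > target:
            plan.append({"logGroupName": name, "retentionInDays": target})
    return plan
```

A scheduled Lambda could page through `describe_log_groups`, feed the results to this planner, and apply each entry with `put_retention_policy`; groups that already have shorter retention are left alone.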

***

### 🔒 Security & compliance

* **Encryption**: Logs are encrypted at rest by default; use **customer managed KMS keys** per env/app where policy requires.
* **Data protection for logs**: targeted PII detection/masking (priced per GB scanned) — enable only on the groups that need it.
* **Least privilege**: scope IAM on `PutMetricData`, `PutLogEvents`, `GetMetricData`, `StartQuery`, and KMS actions.
* **Private access**: use VPC endpoints for private ingestion and queries.
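
As an illustration of least privilege for a telemetry writer, the sketch below builds an IAM policy document that limits log writes to one log group and metric publishing to one namespace. `PutMetricData` does not take a resource ARN, so the namespace is constrained with the `cloudwatch:namespace` condition key. The function name `writer_policy`, the `aws` partition, and the example identifiers are assumptions.

```python
import json


def writer_policy(region, account_id, log_group, namespace):
    """Build an IAM policy document for an app that writes logs and metrics."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "WriteLogs",
                "Effect": "Allow",
                "Action": ["logs:CreateLogStream", "logs:PutLogEvents"],
                # Assumes the standard "aws" partition; streams sit under the group ARN.
                "Resource": f"arn:aws:logs:{region}:{account_id}:log-group:{log_group}:*",
            },
            {
                "Sid": "PublishMetricsInNamespace",
                "Effect": "Allow",
                "Action": "cloudwatch:PutMetricData",
                "Resource": "*",
                "Condition": {"StringEquals": {"cloudwatch:namespace": namespace}},
            },
        ],
    }


policy_json = json.dumps(
    writer_policy("us-east-1", "123456789012", "/prod/api", "MyApp"), indent=2
)
```

Read-side actions such as `GetMetricData` and `StartQuery` would go in a separate policy for dashboards and on-call tooling, so writers never gain query access.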

***

### 📊 Monitoring & tools

* **CloudWatch Metrics & Alarms** — golden signals (latency, errors, traffic, saturation).
* **Dashboards** — SLOs & cost owner views; keep to essentials to control dashboard charges.
* **Logs Insights** — ad-hoc queries; always narrow time & fields to cut GB scanned.
* **ServiceLens/X-Ray** — dependency maps & traces for incident drill-downs.
* **Cost Explorer/CUR + Budgets** — monthly review of metric counts, log ingest vs retention, Insights scans, Synthetics/RUM volume.

***

### 🧪 Practical selection cheat-sheet

* **Infra basics**: native service metrics + a handful of custom metrics → standard alarms + a team dashboard.
* **Heavy logs**: keep prod/alerted streams **Standard**; move bulk debug to **IA**; 7–30d retention; archive to **S3**.
* **API SLOs**: **Application Signals** + anomaly alarms where needed; 1-min metrics, 1-s only for critical hot spots.
* **User experience**: **RUM** (sampled) + a few **Synthetics** on checkout/login/search.
* **Multi-account**: turn on **cross-account observability**; central dashboards/alarms; enforce guardrails via SCP/Config.

***

### ✅ Checklist

* [ ] Define **metric naming & dimensions** (avoid high cardinality).
* [ ] Set **log retention** defaults; route cold groups to **IA**; export archives to **S3**.
* [ ] Budget **Logs Insights** scans (time-bounded queries).
* [ ] Use **composite** / **anomaly** alarms selectively.
* [ ] Encrypt logs with **KMS** where required; use VPC endpoints.
* [ ] Centralize via **cross-account observability**.
* [ ] Review monthly: metric counts, log ingest/storage, Insights scans, canary/RUM volume.

***

### References (fill with your org’s canonical links)

* CloudWatch pricing & AWS Pricing Calculator
* Logs classes, retention, and Logs Insights best practices
* Cross-account observability & ServiceLens/X-Ray
* Data protection for logs
* Metric Streams, Application Signals/Insights
* Internal runbooks: dimension standards, retention defaults, cost guardrails

> *Features & prices evolve. Validate in your Region before production changes.*

