Amazon Redshift

Amazon Redshift is AWS’s cloud data warehousing workhorse—distributed SQL analytics at (multi-)petabyte scale with tight AWS integrations. It’s also easy to overspend without discipline: compute left running, managed storage growth, Spectrum scans over raw files, and inefficient queries all snowball.

What you’re using → What you’re paying → What you should be doing → AWS-native tools to make it happen.


🚀 What is Redshift?

Amazon Redshift is a fully managed, columnar, MPP (massively parallel processing) data warehouse. It runs ANSI SQL, plugs into mainstream BI/ETL tools, and supports both provisioned clusters and Serverless (auto-scaling) workgroups.

Highlights

  • Workload isolation: multiple warehouses/workgroups & data sharing to separate ELT, BI, and data science.

  • Zero-ETL & native connectors: ingest from Aurora/RDS, DynamoDB, Kinesis/MSK streams, S3, and more (see the streaming sketch after this list).

  • Open table/query: query open formats in S3 via Spectrum; integrate with Lake Formation.

  • Performance features: AQUA (hardware-accelerated caching), materialized views, result cache, SQA/Concurrency Scaling.

  • ML & GenAI hooks: Redshift ML (SageMaker under the hood) and integrations for LLM workloads.

  • Serverless: no cluster sizing; pay for actual compute seconds and managed storage.
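
As a taste of the connectors, streaming ingestion is plain SQL. A minimal sketch, assuming a Kinesis stream named clickstream and a pre-created IAM role (both placeholders):

```sql
-- Map a Kinesis stream into Redshift (stream, role, and view names are
-- illustrative placeholders, not real resources).
CREATE EXTERNAL SCHEMA kds
FROM KINESIS
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-streaming-role';

-- Materialize the stream; AUTO REFRESH keeps it near-real-time.
CREATE MATERIALIZED VIEW clickstream_mv AUTO REFRESH YES AS
SELECT approximate_arrival_timestamp,
       partition_key,
       JSON_PARSE(kinesis_data) AS payload
FROM kds."clickstream";
```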


⚙️ Node types — pick the right hammer

Choose based on how independently you need to scale compute vs storage, and whether your dataset is small & hot or large & growing.

| Family | Best for | Notes |
| --- | --- | --- |
| RA3 (e.g., ra3.xlplus, ra3.4xlarge, ra3.16xlarge) | Most new deployments; variable or growing storage | Redshift Managed Storage (RMS) separates storage from compute (SSD + S3 hierarchy). Elastic resize; AQUA eligible. Storage billed separately from nodes. |
| DC2 (e.g., dc2.large, dc2.8xlarge) | Small, compute-bound warehouses with roughly <1 TB of compressed data | Local NVMe SSD; fixed storage per node. Good for tight, hot datasets; otherwise prefer RA3. |

New build? Start with RA3 unless you have a strong reason not to.


🧬 Deployment & topology

| Mode | Architecture | Use when | Notes |
| --- | --- | --- | --- |
| Provisioned (cluster) | Leader node + compute nodes | Steady/known workloads; fine-grained control | Resize (elastic or classic) to scale; pause/resume to save cost off-hours. |
| Serverless | No nodes to manage; per-second compute | Spiky/unknown demand; many small teams | Auto-scales; isolate by workgroup; set base capacity & cost controls. |
| Single-node | Leader + compute combined | Dev/test, POCs | Not for prod. |
| Multi-node | 1 leader + N compute | Production | Distribution/sort keys still matter (even with ATO). |

Use data sharing to expose curated schemas to other accounts/workgroups without copying.
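
A minimal data-sharing sketch; the share, schema, table, and namespace GUIDs are all placeholders:

```sql
-- Producer: publish a curated schema without copying data.
CREATE DATASHARE marts_share;
ALTER DATASHARE marts_share ADD SCHEMA marts;
ALTER DATASHARE marts_share ADD ALL TABLES IN SCHEMA marts;
GRANT USAGE ON DATASHARE marts_share
TO NAMESPACE '11111111-2222-3333-4444-555555555555';  -- consumer namespace GUID

-- Consumer: mount the share as a local database and query in place.
CREATE DATABASE marts_db
FROM DATASHARE marts_share OF NAMESPACE '99999999-8888-7777-6666-555555555555';
SELECT COUNT(*) FROM marts_db.marts.daily_revenue;
```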


🧠 Redshift optimization strategy (FinOps + performance)

Quick wins

  • Pause non-prod clusters nightly/weekends; for Serverless, set usage and cost limits per workgroup.

  • Right-size compute: on provisioned, prefer elastic resize over permanent upsizing; on Serverless, tune base capacity.

  • Separate workloads: isolate ELT from BI with data sharing or dedicated workgroups to avoid noisy neighbors.

Tables & storage

  • Let Automatic Table Optimization (ATO) pick distribution keys, sort keys, and compression; override only when profiling proves a better layout (see the SQL sketch after this list).

  • Regularly VACUUM/ANALYZE (or verify auto-maintenance is doing its job), especially after large deletes/copies.

  • Use materialized views for heavy joins/aggregations; enable auto-refresh on incremental loads.

  • Prune managed storage: drop or partition off cold data; UNLOAD it to S3 as Parquet and query it via Spectrum.
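
A minimal sketch of these habits, assuming a hypothetical sales table (the bucket and IAM role ARN are placeholders too):

```sql
-- Let ATO manage layout: AUTO dist/sort keys that Redshift can adjust over time.
CREATE TABLE sales (
    sale_id  BIGINT,
    sold_at  TIMESTAMP,
    store_id INT,
    amount   DECIMAL(12,2)
)
DISTSTYLE AUTO
SORTKEY AUTO;

-- Materialize a heavy aggregation; AUTO REFRESH keeps it current as loads land.
CREATE MATERIALIZED VIEW daily_sales_mv AUTO REFRESH YES AS
SELECT store_id, DATE_TRUNC('day', sold_at) AS sale_day, SUM(amount) AS revenue
FROM sales
GROUP BY 1, 2;

-- After large deletes/loads: reclaim space and refresh planner statistics.
VACUUM DELETE ONLY sales;
ANALYZE sales;

-- Archive cold rows to S3 as partitioned Parquet, then query them via Spectrum.
UNLOAD ('SELECT * FROM sales WHERE sold_at < DATE_TRUNC(''year'', CURRENT_DATE)')
TO 's3://my-bucket/archive/sales/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-unload-role'
FORMAT AS PARQUET
PARTITION BY (store_id);
```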

Spectrum & lake

  • Store S3 data as columnar (Parquet/ORC) with partitioning and compression; avoid tiny files; push down predicates.

  • Control scans with selective WHERE clauses and partition filters; use Lake Formation for centralized permissions (see the sketch below).
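
A sketch, assuming a Glue Data Catalog database named analytics and placeholder bucket/role values:

```sql
-- External schema backed by the Glue Data Catalog.
CREATE EXTERNAL SCHEMA lake
FROM DATA CATALOG
DATABASE 'analytics'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-spectrum-role'
CREATE EXTERNAL DATABASE IF NOT EXISTS;

-- Partitioned, columnar layout keeps bytes scanned (and Spectrum cost) low.
CREATE EXTERNAL TABLE lake.events (
    event_id   BIGINT,
    user_id    BIGINT,
    event_type VARCHAR(64)
)
PARTITIONED BY (event_date DATE)
STORED AS PARQUET
LOCATION 's3://my-bucket/events/';

-- Register partitions as they land.
ALTER TABLE lake.events ADD IF NOT EXISTS
PARTITION (event_date = '2024-06-01')
LOCATION 's3://my-bucket/events/event_date=2024-06-01/';

-- The partition filter prunes S3 objects before any data is read.
SELECT event_type, COUNT(*)
FROM lake.events
WHERE event_date = '2024-06-01'
GROUP BY event_type;
```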

Concurrency & query control

  • Enable Short Query Acceleration (SQA) for sub-second BI queries.

  • Use Workload Management (WLM) & Query Monitoring Rules (QMR) to throttle outliers and protect SLAs.

  • Watch the result cache hit rate and design dashboards to reuse results (a sample query follows this list).
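
SVL_QLOG records a non-NULL source_query when a statement was answered from the result cache, which gives a rough hit rate; the 7-day window here is arbitrary:

```sql
-- Approximate result-cache hit rate over the past week.
SELECT COUNT(*) AS total_queries,
       SUM(CASE WHEN source_query IS NOT NULL THEN 1 ELSE 0 END) AS cache_hits,
       ROUND(100.0 * SUM(CASE WHEN source_query IS NOT NULL THEN 1 ELSE 0 END)
             / COUNT(*), 1) AS hit_pct
FROM svl_qlog
WHERE starttime > DATEADD(day, -7, GETDATE());
```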

Governance

  • Track warehouse usage by tags (team/app/env); alert on cost/throughput spikes via Budgets + CloudWatch.


💸 Purchase model optimization

| Model | Use when | How to save |
| --- | --- | --- |
| On-Demand (provisioned) | Variable but mostly steady | Combine pause/resume with elastic resize. |
| Reserved | Predictable baselines (months+ of steady usage) | 1- or 3-year commitments can significantly reduce compute cost; keep a fraction on On-Demand for bursts. |
| Serverless | Spiky, unpredictable, many small tenants | Pay per compute second plus RMS; set workgroup limits and budgets. |
| Concurrency Scaling | Short peaks | Free credits accrue daily; beyond them it bills per second, so monitor usage. |
| Spectrum | Query S3 data directly | Charged by data scanned; use Parquet/ORC + partitions to cut scan costs dramatically. |

Avoid hard-coding prices in docs; model with AWS Pricing Calculator and keep Region-specific sheets.


🧩 Cluster configurations & migration tips

  • DC2 → RA3: migrate when storage growth or mixed hot/cold data makes local SSD a constraint. Re-test distribution keys; enable AQUA where available.

  • Single → multi-node: for HA and parallelism; mind DISTKEY/SORTKEY and skew.

  • Provisioned → Serverless: great for labs or multi-team self-service; set guardrails first.

  • Multiple warehouses: use data sharing to fan out read-only access across teams without duplicating data.


📊 Additional optimization tips

  • ETL/ELT: prefer COPY from staged Parquet in S3; batch writes; avoid single-row INSERT loops (see the sketch after this list).

  • Compression & encoding: keep column encodings fresh (ATO/ANALYZE COMPRESSION).

  • Skew busting: check SVV_TABLE_INFO, SVL_QUERY_SUMMARY for distribution & query skew.

  • Result set limits: cap BI exports/SELECT * patterns; paginate.

  • Cost hygiene: auto-purge orphaned snapshots, unload rarely used tables to S3, and retire stale materialized views.
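
A sketch of the COPY pattern plus a skew check; the table, bucket, role ARN, and thresholds are illustrative:

```sql
-- Bulk-load staged Parquet instead of looping single-row INSERTs.
COPY sales
FROM 's3://my-bucket/staging/sales/'
IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy-role'
FORMAT AS PARQUET;

-- Surface skewed or stats-stale tables that deserve attention.
SELECT "table", diststyle, skew_rows, unsorted, stats_off
FROM svv_table_info
WHERE skew_rows > 4 OR stats_off > 10
ORDER BY skew_rows DESC;
```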


🔒 Security & compliance

  • Encryption: at rest with KMS; in transit with SSL.

  • Network: VPC-only; Enhanced VPC Routing for deterministic egress paths.

  • Access control: IAM federation + Redshift RBAC; row-level and column-level security; data masking for sensitive columns (sketched below).

  • Audit: system tables & STL/SVL logs; export to S3/CloudWatch; integrate with Lake Formation for lake-wide governance.
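
A sketch of row-level security and dynamic data masking, with hypothetical orders table, analyst role, and column names:

```sql
-- Row-level security: each sales rep sees only their own rows.
CREATE RLS POLICY policy_own_rows
WITH (sales_rep VARCHAR(64))
USING (sales_rep = current_user);

ATTACH RLS POLICY policy_own_rows ON orders TO ROLE analyst;
ALTER TABLE orders ROW LEVEL SECURITY ON;

-- Dynamic data masking: expose only the last 4 digits of a 16-digit card number.
CREATE MASKING POLICY mask_card
WITH (card_number VARCHAR(32))
USING ('XXXXXXXXXXXX' || SUBSTRING(card_number, 13, 4));

ATTACH MASKING POLICY mask_card
ON orders(card_number)
TO PUBLIC;
```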


📈 Monitoring & tools

  • Amazon Redshift Console: performance insights, Advisor, auto-tuning status.

  • CloudWatch metrics/alarms: CPU %, WLM queue length, query runtime, read/write IOPS, Spectrum bytes scanned.

  • System views: STL_QUERY, SVL_QLOG, SVV_TABLE_INFO, SVL_S3QUERY_SUMMARY, SVL_STATEMENTTEXT (example query after this list).

  • Cost Explorer / CUR: tag by team/workload; watch RMS GB-months, Spectrum TB scanned, Concurrency Scaling seconds.

  • AWS Budgets: caps/alerts per cluster or workgroup.

  • dbt/Glue: model governance; auto-docs, tests, and lineage.
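
For instance, a quick pass over the last day's slowest statements and their Spectrum scan volume (the window and LIMIT are arbitrary):

```sql
-- Slowest queries in the last 24h, joined to Spectrum scan stats where present.
SELECT q.query,
       TRIM(q.querytxt) AS query_text,
       DATEDIFF(ms, q.starttime, q.endtime) AS runtime_ms,
       COALESCE(SUM(s.s3_scanned_bytes), 0) AS spectrum_bytes
FROM stl_query q
LEFT JOIN svl_s3query_summary s ON s.query = q.query
WHERE q.starttime > DATEADD(day, -1, GETDATE())
GROUP BY q.query, q.querytxt, q.starttime, q.endtime
ORDER BY runtime_ms DESC
LIMIT 10;
```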


🧪 Practical selection cheat-sheet

  • Small, hot dataset; steady BI → DC2 (if it truly fits) or small RA3; materialized views + SQA.

  • Growing lake + warehouse → RA3 + Spectrum; store raw in Parquet; data share curated marts.

  • Spiky analytics / many teams → Serverless with workgroup guardrails and budgets.

  • Heavy joins & wide scans → Consider AQUA, sort keys, and MVs; verify distribution style and skew.


✅ Checklist


  • Redshift pricing & calculator (provisioned, Serverless, RMS, Spectrum)

  • Best practices: distribution/sort keys, ATO, VACUUM/ANALYZE, materialized views

  • Spectrum & Lake Formation setup (partitions, Parquet/ORC, governance)

  • Concurrency Scaling, SQA, WLM & QMR runbooks

  • Redshift Advisor & performance dashboards; dbt project template

Features and limits evolve—always validate in the AWS console/docs for your Region before rollout.
