# Amazon Redshift

Amazon Redshift is AWS’s cloud data warehousing workhorse—distributed SQL analytics at (multi-)petabyte scale with tight AWS integrations. It’s also easy to overspend without discipline—**compute left running**, **managed storage growth**, **Spectrum scans on raw files**, and **inefficient queries** can snowball.

This page walks through: what you’re using → what you’re paying → what you should be doing → the AWS-native tools to make it happen.

***

### 🚀 What is Redshift?

**Amazon Redshift** is a fully managed, columnar, MPP (massively parallel processing) data warehouse. It runs ANSI SQL, plugs into mainstream BI/ETL tools, and supports both **provisioned** clusters and **Serverless** (auto-scaling) workgroups.

**Highlights**

* **Workload isolation**: multiple warehouses/workgroups & data sharing to separate ELT, BI, and data science.
* **Zero-ETL & native connectors**: ingest from Aurora/RDS, DynamoDB, Kinesis/MSK streams, S3, and more.
* **Open table/query**: query open formats in **S3 via Spectrum**; integrate with **Lake Formation**.
* **Performance features**: **AQUA** (hardware-accelerated caching), **materialized views**, **result cache**, **SQA/Concurrency Scaling**.
* **ML & GenAI hooks**: Redshift ML (SageMaker under the hood) and integrations for LLM workloads.
* **Serverless**: no cluster sizing; pay for actual compute seconds and managed storage.

***

### ⚙️ Node types — pick the right hammer

Choose based on how independently you need to scale **compute** vs **storage**, and whether your dataset is small & hot or large & growing.

| Family                                                      | Best for                                                         | Notes                                                                                                                                                        |
| ----------------------------------------------------------- | ---------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| **RA3** (e.g., `ra3.xlplus`, `ra3.4xlarge`, `ra3.16xlarge`) | Most new deployments; variable or growing storage                | **Redshift Managed Storage (RMS)** separates storage from compute (SSD + S3 hierarchy). Elastic resize; AQUA eligible. Storage billed separately from nodes. |
| **DC2** (e.g., `dc2.large`, `dc2.8xlarge`)                  | Small, compute-bound warehouses with roughly sub-TB compressed datasets | Local NVMe SSD; fixed storage per node. Good for tight, hot datasets; otherwise prefer RA3.                                                                  |

> New build? Start with **RA3** unless you have a strong reason not to.

***

### 🧬 Deployment & topology

| Mode                      | Architecture                           | Use when                                     | Notes                                                                                   |
| ------------------------- | -------------------------------------- | -------------------------------------------- | --------------------------------------------------------------------------------------- |
| **Provisioned (cluster)** | Leader node + compute nodes            | Steady/known workloads, fine-grained control | **Resize** (elastic for speed, classic for node-type changes) to scale; **pause/resume** to save cost off-hours. |
| **Serverless**            | No nodes to manage; per-second compute | Spiky/unknown demand, many small teams       | Auto-scales; isolate by **workgroup**; set **base capacity** & **cost controls**.       |
| **Single-node**           | Leader+compute combined                | Dev/test, POCs                               | Not for prod.                                                                           |
| **Multi-node**            | 1 leader + N compute                   | Production                                   | Distribution/Sort Keys still matter (even with ATO).                                    |

> Use **data sharing** to expose curated schemas to other accounts/workgroups without copying.

***

### 🧠 Redshift optimization strategy (FinOps + performance)

**Quick wins**

* **Pause** non-prod clusters nightly/weekends; for Serverless, set **concurrency/cost limits**.
* **Right-size compute**: on provisioned, prefer **elastic resize** over permanent upsizing; on Serverless, tune **base capacity**.
* **Separate workloads**: isolate ELT from BI with **data sharing** or dedicated workgroups to avoid noisy neighbors.
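The pause/resume quick win is easy to automate. A minimal sketch of the scheduling logic (the quiet-hours window is an assumption; in practice a scheduled Lambda would pair this with boto3's real `pause_cluster`/`resume_cluster` Redshift APIs):

```python
from datetime import time

def in_quiet_hours(now: time, start: time, end: time) -> bool:
    """True when `now` falls inside the quiet window; handles windows
    that wrap past midnight (e.g., 20:00 -> 06:00)."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end

# A scheduled Lambda could call this, then invoke
# boto3.client("redshift").pause_cluster(ClusterIdentifier=...) or
# resume_cluster(...) accordingly (cluster identifier is yours to fill in).
```

Pausing stops compute billing but not managed-storage billing, so this helps most on compute-heavy non-prod clusters.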

**Tables & storage**

* Let **Automatic Table Optimization (ATO)** pick **distribution & sort keys** and compression; override only when profiling shows a measurably better choice.
* Regularly **VACUUM/ANALYZE** (or verify auto-maintenance is doing its job), especially after large deletes/copies.
* Use **materialized views** for heavy joins/aggregations; enable **auto-refresh** on incremental loads.
* Prune managed storage: drop/partition cold data; unload snapshots to S3 (Parquet) and query via Spectrum.
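To decide which tables are worth a manual VACUUM, you can scan `SVV_TABLE_INFO` output. A hedged sketch of the triage logic (the column names mirror the real view; the thresholds are illustrative, not AWS guidance):

```python
def needs_vacuum(stats: dict, max_unsorted_pct: float = 10.0,
                 min_rows: int = 1_000_000) -> bool:
    """Flag a table for manual VACUUM. `stats` mimics one row of
    SVV_TABLE_INFO ('tbl_rows', 'unsorted' = percent unsorted);
    thresholds here are illustrative defaults, tune for your workload."""
    unsorted = stats["unsorted"] or 0  # the view reports NULL as None
    return stats["tbl_rows"] >= min_rows and unsorted > max_unsorted_pct
```

Small tables rarely justify a manual VACUUM; auto-vacuum usually keeps them sorted on its own.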

**Spectrum & lake**

* Store S3 data as **columnar** (Parquet/ORC) with **partitioning** and compression; avoid tiny files; push down predicates.
* Control scans with proper **WHERE** clauses and partition filters; use **Lake Formation** for centralized permissions.
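Partition pruning is the whole cost story for Spectrum: you pay for bytes scanned, and a partition filter removes entire prefixes from the scan. A toy model of that effect (partition names and sizes are sample data):

```python
def spectrum_scan_bytes(partition_sizes: dict[str, int],
                        wanted: set[str]) -> int:
    """Bytes Spectrum would scan after partition pruning. Keys are
    partition values (e.g., 'dt=2025-01-01'); values are the compressed
    Parquet bytes under each S3 prefix (all example numbers)."""
    return sum(size for part, size in partition_sizes.items()
               if part in wanted)
```

A `WHERE dt = '2025-01-02'` that matches the partition column scans one prefix instead of all of them; an unpartitioned filter scans everything.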

**Concurrency & query control**

* Enable **Short Query Acceleration (SQA)** for sub-second BI queries.
* Use **Workload Management (WLM)** & **Query Monitoring Rules (QMR)** to throttle outliers and protect SLAs.
* Watch the **result cache** hit rate; design dashboards to re-use results.
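Conceptually, QMR is a list of metric/threshold/action predicates evaluated against each running query. Real QMR predicates (e.g., `query_execution_time`, `scan_row_count`) are configured in WLM, not in application code; this sketch only illustrates the model with made-up thresholds:

```python
# Hypothetical QMR-style rules: (metric, threshold, action).
RULES = [
    ("query_execution_time", 300, "abort"),        # seconds
    ("scan_row_count", 1_000_000_000, "log"),      # rows scanned
]

def triggered_actions(metrics: dict) -> list[str]:
    """Return the actions whose metric exceeded its threshold."""
    return [action for metric, limit, action in RULES
            if metrics.get(metric, 0) > limit]
```

In production you would express the same predicates as WLM JSON and let Redshift enforce them; the value of the mental model is knowing which runaway behavior each rule catches.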

**Governance**

* Track warehouse usage by **tags** (team/app/env); alert on cost/throughput spikes via Budgets + CloudWatch.

***

### 💸 Purchase model optimization

| Model                       | Use when                                        | How to save                                                                                              |
| --------------------------- | ----------------------------------------------- | -------------------------------------------------------------------------------------------------------- |
| **On-Demand (provisioned)** | Variable but mostly steady                      | Combine **pause/resume** with **elastic resize**.                                                        |
| **Reserved**                | Predictable baselines (months+ of steady usage) | 1- or 3-year commitments can significantly reduce compute cost; keep a fraction on On-Demand for bursts. |
| **Serverless**              | Spiky, unpredictable, many small tenants        | Pay per compute second & RMS; set **workgroup limits** and budgets.                                      |
| **Concurrency Scaling**     | Short peaks                                     | Clusters earn free Concurrency Scaling credits (roughly one hour per day of cluster uptime); once credits are exhausted it bills per-second, so monitor usage.  |
| **Spectrum**                | Query S3 data directly                          | Charged by **data scanned**—use Parquet/ORC + partitions to cut scan costs dramatically.                 |

> Avoid hard-coding prices in docs; model with **AWS Pricing Calculator** and keep Region-specific sheets.
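The Reserved-vs-On-Demand decision mostly reduces to utilization: a reservation bills every hour whether or not the cluster runs, while On-Demand plus pause/resume bills only while it runs. A sketch of that break-even comparison (rates are placeholders; pull real numbers from the AWS Pricing Calculator):

```python
HOURS_IN_MONTH = 730.0  # common AWS billing convention

def cheaper_model(od_rate: float, ri_rate: float, hours_on: float) -> str:
    """Compare On-Demand (billed only while running, assuming
    pause/resume discipline) against a reservation (billed every hour).
    Both rates are per-hour placeholders, not real prices."""
    on_demand = od_rate * hours_on
    reserved = ri_rate * HOURS_IN_MONTH
    return "reserved" if reserved < on_demand else "on-demand"
```

The intuition: the higher your uptime fraction, the more a reservation wins; below the break-even utilization, stay On-Demand and pause aggressively.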

***

### 🧩 Cluster configurations & migration tips

* **DC2 → RA3**: migrate when storage growth or mixed hot/cold data makes local SSD a constraint. Re-test distribution keys; **AQUA** activates automatically on supported RA3 node types.
* **Single → multi-node**: for HA and parallelism; mind **DISTKEY/SORTKEY** and skew.
* **Provisioned → Serverless**: great for labs or multi-team self-service; set guardrails first.
* **Multiple warehouses**: use **data sharing** to fan out read-only access across teams without duplicating data.

***

### 📊 Additional optimization tips

* **ETL/ELT**: prefer **COPY** from staged Parquet in S3; batch writes; avoid single-row INSERT loops.
* **Compression & encoding**: keep column encodings fresh (ATO/`ANALYZE COMPRESSION`).
* **Skew busting**: check `SVV_TABLE_INFO`, `SVL_QUERY_SUMMARY` for distribution & query skew.
* **Result set limits**: cap BI exports/SELECT \* patterns; paginate.
* **Cost hygiene**: auto-purge orphaned snapshots, unload rarely used tables to S3, and retire stale materialized views.
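The skew check above boils down to one ratio: how full the fullest slice is relative to the emptiest. A rough analogue of `SVV_TABLE_INFO`'s `skew_rows` column, computed from per-slice row counts (sample numbers only):

```python
def skew_rows(rows_per_slice: list[int]) -> float:
    """Ratio of the fullest slice to the emptiest, like the skew_rows
    column in SVV_TABLE_INFO. Values far above 1 suggest a poor DISTKEY
    (e.g., a key with few distinct or heavily repeated values)."""
    return max(rows_per_slice) / max(min(rows_per_slice), 1)
```

A skewed table forces the hot slice to do most of the scan work, so query runtime degrades toward single-slice speed even on a large cluster.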

***

### 🔒 Security & compliance

* **Encryption**: at rest with KMS; in transit with TLS.
* **Network**: VPC-only; **Enhanced VPC Routing** for deterministic egress paths.
* **Access control**: IAM federation + Redshift RBAC; **row-level** and **column-level** security; data masking for sensitive columns.
* **Audit**: system tables & STL/SVL logs; export to S3/CloudWatch; integrate with Lake Formation for lake-wide governance.

***

### 📈 Monitoring & tools

* **Amazon Redshift Console**: performance insights, Advisor, auto-tuning status.
* **CloudWatch metrics/alarms**: CPU %, WLM queue length, query runtime, read/write IOPS, Spectrum bytes scanned.
* **System views**: `STL_QUERY`, `SVL_QLOG`, `SVV_TABLE_INFO`, `SVL_S3QUERY_SUMMARY`, `SVL_STATEMENTTEXT`.
* **Cost Explorer / CUR**: tag by team/workload; watch **RMS GB-months**, **Spectrum TB scanned**, **Concurrency Scaling seconds**.
* **AWS Budgets**: caps/alerts per cluster or workgroup.
* **dbt/Glue**: model governance; auto-docs, tests, and lineage.
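CloudWatch alarms on these metrics are straightforward to codify. A sketch of the parameter dict for the real `put_metric_alarm` API (the namespace, metric, and dimension names are the ones Redshift actually publishes; the alarm name and threshold are placeholders):

```python
def cpu_alarm_params(cluster_id: str, threshold_pct: float = 85.0) -> dict:
    """Parameters shaped for CloudWatch's put_metric_alarm. The
    AWS/Redshift namespace, CPUUtilization metric, and
    ClusterIdentifier dimension are real; everything else is a
    placeholder to adapt."""
    return {
        "AlarmName": f"redshift-{cluster_id}-cpu-high",
        "Namespace": "AWS/Redshift",
        "MetricName": "CPUUtilization",
        "Dimensions": [{"Name": "ClusterIdentifier", "Value": cluster_id}],
        "Statistic": "Average",
        "Period": 300,              # seconds per datapoint
        "EvaluationPeriods": 3,     # sustained breach, not a blip
        "Threshold": threshold_pct,
        "ComparisonOperator": "GreaterThanThreshold",
    }

# boto3.client("cloudwatch").put_metric_alarm(**cpu_alarm_params("prod-dw"))
```

Three consecutive five-minute breaches before alarming filters out transient ELT spikes while still catching sustained saturation.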

***

### 🧪 Practical selection cheat-sheet

* **Small, hot dataset; steady BI** → **DC2** (if it truly fits) or small **RA3**; materialized views + SQA.
* **Growing lake + warehouse** → **RA3 + Spectrum**; store raw in Parquet; data share curated marts.
* **Spiky analytics / many teams** → **Serverless** with workgroup guardrails and budgets.
* **Heavy joins & wide scans** → Consider **AQUA**, sort keys, and MVs; verify distribution style and skew.

***

### ✅ Checklist

* [ ] Start on **RA3** (or Serverless); only pick DC2 for tiny, fixed, compute-bound datasets.
* [ ] Turn on **pause/resume** (provisioned) or **workgroup limits** (Serverless).
* [ ] Use **Parquet + partitions** in S3; keep files moderately sized (128–512 MB).
* [ ] Let **ATO** manage dist/sort/encoding; override only with evidence.
* [ ] Enable **SQA**, set **WLM** queues, and add **QMR** for runaway queries.
* [ ] Track **RMS growth**, **Spectrum scan bytes**, **Concurrency Scaling** seconds, and snapshot counts.
* [ ] Share data via **data sharing**, not copies.
* [ ] Tag everything (team/app/env) and wire **Budgets** alerts.

***

### References (add your org’s canonical links)

* Redshift pricing & calculator (provisioned, Serverless, RMS, Spectrum)
* Best practices: distribution/sort keys, ATO, VACUUM/ANALYZE, materialized views
* Spectrum & Lake Formation setup (partitions, Parquet/ORC, governance)
* Concurrency Scaling, SQA, WLM & QMR runbooks
* Redshift Advisor & performance dashboards; dbt project template

> *Features and limits evolve—always validate in the AWS console/docs for your Region before rollout.*
