Heatmaps are one of the most effective ways to show patterns across many variables simultaneously — gene expression across samples, protein abundance across conditions, or correlation matrices. A poorly made heatmap is uninterpretable; a well-made one communicates complex biology immediately.

What a heatmap shows

A heatmap displays a matrix of values as a grid of colored cells, where color encodes magnitude. In biology, common uses include:

RNA-seq: gene expression across samples or conditions
Proteomics: protein abundance across groups
Correlation matrices: pairwise correlations between variables
Drug response: cell viability across drug × concentration grids
ChIP-seq signal: histone-mark or transcription-factor enrichment across genomic regions (ATAC-seq for chromatin accessibility)

Step 1 — Prepare and normalize your data

Raw counts are rarely appropriate for heatmaps. Recommended pre-processing by data type:

RNA-seq: Use variance-stabilized or rlog-transformed counts (DESeq2 vst() or rlog()); if only TPM or RPKM values are available, log-transform them first, e.g. log₂(TPM + 1)
Proteomics: Use log₂-transformed, median-normalized intensities
Correlation matrices: Pearson or Spearman correlation coefficients (−1 to 1)
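
As a minimal sketch of the proteomics case, the snippet below log₂-transforms a small intensity matrix and subtracts each sample's median. The matrix values are made up for illustration, and subtracting the column median on the log scale is only one common reading of "median-normalized"; your pipeline may define it differently.

```python
import numpy as np

# Hypothetical raw intensities: rows = proteins, columns = samples.
raw = np.array([
    [1000.0, 2000.0, 4000.0],
    [ 500.0, 1000.0, 2000.0],
    [ 250.0,  500.0, 1000.0],
])

# Log2-transform so that fold changes become additive differences.
log_int = np.log2(raw)

# Median-normalize: shift each sample (column) so its median is zero,
# which removes global loading differences between samples.
sample_medians = np.median(log_int, axis=0)
normalized = log_int - sample_medians
```

In this toy example the three samples differ only by a global factor of two, so after normalization their columns are identical.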

Z-score normalization by row (gene) is standard for expression heatmaps — it shows relative expression changes across samples, removing absolute expression level differences. Each gene's values are scaled to mean = 0, SD = 1. Reviewers expect row Z-scores unless you have a specific reason to use raw values.
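
Row Z-scoring can be sketched in a few lines of NumPy. The helper name `row_zscore` and the example matrix are assumptions made for illustration; the function uses the sample standard deviation (ddof=1) and sets constant rows to zero rather than NaN, which is a design choice, not a fixed convention.

```python
import numpy as np

def row_zscore(mat, ddof=1):
    """Scale each row to mean 0 and SD 1; constant rows become all zeros."""
    mat = np.asarray(mat, dtype=float)
    mean = mat.mean(axis=1, keepdims=True)
    sd = mat.std(axis=1, ddof=ddof, keepdims=True)
    # Rows with zero variance would divide by zero; keep them at 0 instead of NaN.
    safe_sd = np.where(sd == 0, 1.0, sd)
    return (mat - mean) / safe_sd

# Hypothetical normalized expression: rows = genes, columns = samples.
expr = np.array([
    [10.0, 12.0, 14.0],   # highly expressed gene
    [ 1.0,  3.0,  5.0],   # lowly expressed gene with the same pattern
    [ 7.0,  7.0,  7.0],   # constant gene
])
z = row_zscore(expr)
```

The first two genes differ ninefold in absolute level but end up with identical Z-scores, which is exactly the "relative change" view described above.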

Step 2 — Select genes or features to display

Showing all 20,000 genes in a heatmap is meaningless. Select a meaningful subset:

Differentially expressed genes: Top 50–200 ranked by adjusted p-value (FDR), for example the significant hits from your volcano plot analysis
Gene sets: Specific pathway members or curated signatures
Variable features: Top features by variance across samples
Hand-curated lists: Specific genes of biological interest

Aim for 20–200 rows on a single heatmap. Beyond that, individual rows become too thin to read, and the figure stays interpretable only if the clusters are clearly separated.
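
Selecting the most variable features, one of the options listed above, might look like the following sketch. The function name `top_variable_rows` and the toy data are hypothetical; in practice the variance should be computed on transformed (log or variance-stabilized) values, not raw counts.

```python
import numpy as np

def top_variable_rows(mat, row_names, n_top):
    """Return the n_top rows with the highest variance across samples."""
    mat = np.asarray(mat, dtype=float)
    variances = mat.var(axis=1, ddof=1)
    # argsort is ascending, so reverse it and keep the first n_top indices.
    order = np.argsort(variances)[::-1][:n_top]
    return mat[order], [row_names[i] for i in order]

expr = np.array([
    [5.0, 5.1, 4.9, 5.0],   # geneA: nearly flat
    [1.0, 9.0, 1.0, 9.0],   # geneB: highly variable
    [3.0, 4.0, 5.0, 6.0],   # geneC: moderately variable
])
names = ["geneA", "geneB", "geneC"]
subset, kept = top_variable_rows(expr, names, n_top=2)
```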

Step 3 — Clustering

Hierarchical clustering organizes rows and columns by similarity, revealing patterns invisible in unordered matrices.

Standard parameters:

Distance metric: Euclidean distance (for Z-scored expression data) or 1 − Pearson correlation
Linkage: Complete or Ward's ("ward.D2" in R). Ward's tends to produce more balanced clusters, but it assumes Euclidean distances
Cluster rows: Yes (genes cluster by expression pattern)
Cluster columns: Yes by default, but sometimes column order (e.g., time points) should be preserved

Cluster the columns? If columns represent ordered conditions (time series, dose response), keep them in order and do not cluster columns. If columns are independent samples within groups, clustering can reveal batch effects and outliers.
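
The sketch below shows the "cluster rows, keep columns in order" case with SciPy. The matrix is a made-up set of Z-scored time courses, and the choice of two clusters is arbitrary; treat it as an illustration of the steps rather than a recommended setting.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, leaves_list, fcluster

# Hypothetical row-Z-scored matrix: rows = genes, columns = ordered time points.
z = np.array([
    [-1.2, -0.3,  0.4,  1.1],   # rising
    [ 1.1,  0.4, -0.3, -1.2],   # falling
    [-1.1, -0.4,  0.3,  1.2],   # rising
    [ 1.2,  0.3, -0.4, -1.1],   # falling
])

# Ward linkage on Euclidean distances between rows.
row_linkage = linkage(z, method="ward", metric="euclidean")

# Row order as it appears along the dendrogram leaves.
row_order = leaves_list(row_linkage)
z_ordered = z[row_order, :]   # columns keep their original (time) order

# Cut the tree into two clusters to label the rows.
clusters = fcluster(row_linkage, t=2, criterion="maxclust")
```

Only the rows are reordered here, so the temporal order of the columns survives. For independent samples, the same call on `z.T` would give a column ordering.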

Step 4 — Choose the right color palette

Color palette choice is critical and frequently done wrong:

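For diverging data (Z-scores, correlations, log₂ fold changes): Use a diverging palette centered at zero; blue → white → red is the most common and readable. Options: RdBu, coolwarm, PuOr. In matplotlib and seaborn, RdBu runs from red to blue, so use the reversed RdBu_r to map high values to red.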

Avoid: Rainbow (jet) palettes — they are perceptually non-linear, create false visual boundaries, and are colorblind-unfriendly.

For sequential data (expression intensity, enrichment scores): Use a sequential palette: white/yellow → orange → red (Reds, YlOrRd) or viridis for colorblind safety.

Colorblind considerations: Red-green heatmaps (a legacy standard in genomics) are hard or impossible to read for people with red-green color vision deficiency, roughly 8% of men and 0.5% of women. Use blue-white-red or viridis instead.

Saturation limits: Set explicit min and max color limits (e.g., Z-score −2 to +2) and cap outliers. Without limits, a single extreme outlier can wash out all color variation.
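
To see why limits matter, the following sketch caps a small Z-score matrix at ±2. The helper `clip_symmetric` and the example values are assumptions for illustration. It is usually safer to clip only the values used for coloring and to cluster on the unclipped data.

```python
import numpy as np

def clip_symmetric(z, limit=2.0):
    """Cap values at +/- limit so the color scale stays centered at zero."""
    return np.clip(np.asarray(z, dtype=float), -limit, limit)

z = np.array([
    [-0.5,  0.2, 0.3],
    [ 1.0, -1.0, 0.0],
    [ 0.1, -0.1, 9.0],   # one extreme outlier
])

# Without limits, the color range is stretched to cover the outlier.
unclipped_range = (z.min(), z.max())
capped = clip_symmetric(z, limit=2.0)
capped_range = (capped.min(), capped.max())
```

Plotting functions that accept explicit limits, such as the vmin and vmax arguments used in the seaborn example in this guide, achieve the same effect without modifying the data.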

Step 5 — Add annotations

Column and row annotations convey metadata visually:

Column annotations: Sample group, treatment, batch, sex, time point — displayed as colored bars above the heatmap
Row annotations: Gene cluster membership, pathway, chromosomal location — displayed as colored bars to the right
Dendrogram: Show on rows and/or columns to indicate clustering

Annotation color palettes must have a legend. Avoid annotating with more than 4–5 categories per annotation bar — it becomes unreadable.
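
A column annotation bar is essentially a mapping from sample labels to colors. The sketch below builds one in plain Python; the function `annotation_colors`, the sample labels, and the five-category cap are illustrative assumptions, not part of any plotting library.

```python
# Hypothetical sample metadata, in the same order as the heatmap columns.
sample_groups = ["Control", "Control", "Treatment", "Treatment"]

# One fixed color per category; the same mapping is reused for the legend.
group_palette = {"Control": "#888888", "Treatment": "#E05252"}

def annotation_colors(labels, palette, max_categories=5):
    """Map category labels to colors, refusing overly crowded annotation bars."""
    categories = sorted(set(labels))
    if len(categories) > max_categories:
        raise ValueError(f"{len(categories)} categories is too many for one bar")
    missing = [c for c in categories if c not in palette]
    if missing:
        raise KeyError(f"no color defined for: {missing}")
    return [palette[label] for label in labels]

col_colors = annotation_colors(sample_groups, group_palette)
```

The resulting list can be passed as the col_colors argument of seaborn's clustermap, and the palette dictionary doubles as the source for the legend entries.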

Step 6 — Final formatting

Cell borders: Use thin borders (0.1–0.3 pt) only for small matrices (<50 rows); omit for large matrices
Font size: Minimum 6 pt for row and column labels at print size; for >100 genes, omit row labels entirely
Color scale bar: Include a labeled color scale (e.g., "Row Z-score")
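Figure size: Set the final figure to the journal's column width (for example, 85 mm for a single column in Cell Press journals) and check that cells and labels remain legible at that size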
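
Whether row labels can be shown at all follows from simple arithmetic on the figure size. This sketch assumes an 85 mm wide, 120 mm tall heatmap body (the dimensions used in the R example in this guide) and the 6 pt minimum font size mentioned above; the function names are hypothetical.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72.0

def mm_to_inches(mm):
    """Convert millimeters to inches for figure-size arguments."""
    return mm / MM_PER_INCH

def row_height_pt(body_height_mm, n_rows):
    """Height of one heatmap row in points, given the heatmap body height."""
    return mm_to_inches(body_height_mm) * POINTS_PER_INCH / n_rows

def can_label_rows(body_height_mm, n_rows, min_font_pt=6.0):
    """Row labels fit only if each row is at least as tall as the font."""
    return row_height_pt(body_height_mm, n_rows) >= min_font_pt

width_in = mm_to_inches(85)            # about 3.35 in
fits_40 = can_label_rows(120, 40)      # 120 mm body, 40 rows
fits_150 = can_label_rows(120, 150)    # 120 mm body, 150 rows
```

With 40 rows each row is roughly 8.5 pt tall, so 6 pt labels fit; with 150 rows each row is just over 2 pt, which is why labels are better omitted for large gene sets.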

Code examples

R (ComplexHeatmap — the standard for publication):

library(ComplexHeatmap)
library(circlize)

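# mat: numeric matrix (genes x samples) of normalized, log-scale values
# Drop genes with zero variance; they would become NaN after scaling
mat <- mat[apply(mat, 1, sd) > 0, , drop = FALSE]

# Row-scale the data (Z-score each gene across samples)
mat_scaled <- t(scale(t(mat)))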

# Color function centered at 0
col_fun <- colorRamp2(c(-2, 0, 2), c("#2166AC", "white", "#B2182B"))

# Column annotation
col_annot <- HeatmapAnnotation(
  Group = metadata$group,
  col = list(Group = c(Control = "#888888", Treatment = "#E05252"))
)

Heatmap(mat_scaled,
  col = col_fun,
  top_annotation = col_annot,
  clustering_method_rows = "ward.D2",
  clustering_method_columns = "ward.D2",
  show_row_names = FALSE,
  name = "Row Z-score",
  width = unit(85, "mm"),   # heatmap body only; dendrograms and legends add to the total width
  height = unit(120, "mm")  # heatmap body height
)

Python (seaborn clustermap):

import seaborn as sns

# mat: pandas DataFrame (genes x samples) of normalized, log-scale values
# Z-score each row (gene); pandas std() uses ddof=1, matching R's scale()
mat_z = mat.sub(mat.mean(axis=1), axis=0).div(mat.std(axis=1), axis=0)

g = sns.clustermap(mat_z,
    cmap="RdBu_r", center=0, vmin=-2, vmax=2,
    method="ward", metric="euclidean",
    figsize=(3.35, 4.7),  # total figure size in inches (85 mm single-column width)
    yticklabels=False,
    cbar_kws={"label": "Row Z-score"})

g.savefig("heatmap.tiff", dpi=1200, bbox_inches="tight")

Common mistakes

Not Z-scoring: without row scaling, a few highly expressed genes dominate the color range and the rest of the matrix looks flat
Using a rainbow (jet) color palette: creates misleading visual boundaries
No color scale bar: readers cannot interpret the colors
Clustering time series columns: breaks the temporal order
Showing row labels for >100 genes: labels overlap or shrink below a readable size
Not setting color limits: outlier samples or genes wash out all variation

FAQ

Should I always Z-score my heatmap? Row Z-scoring is standard for expression heatmaps because it shows relative changes across samples. If you want to show absolute expression levels (e.g., to compare lowly and highly expressed genes), skip the Z-score and plot log-transformed values with a sequential color scale. State clearly in the legend which normalization you used.

How many genes is too many for a heatmap? There is no hard limit, but >500 rows becomes uninterpretable unless clustering is very clear. For very large matrices, show a representative subset or use a summary statistic plot instead.

ComplexHeatmap or pheatmap — which should I use? ComplexHeatmap is more powerful and flexible, especially for complex annotations. pheatmap is simpler to learn. For publication figures with multiple annotation tracks or combined heatmaps, ComplexHeatmap is the more common choice.

What is the difference between Pearson distance and Euclidean distance for clustering? Euclidean distance groups genes by similar absolute values; Pearson distance (1 − correlation) groups genes by similar patterns regardless of magnitude. For row-Z-scored data the two are closely related: the squared Euclidean distance between two Z-scored rows is proportional to 1 − r, so both rank gene pairs identically. For non-normalized data, Pearson is often preferred.
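
The relationship between the two distances on Z-scored data can be checked numerically. The sketch below uses random numbers as stand-in expression values and Z-scores with the sample standard deviation (ddof=1); under that assumption the squared Euclidean distance between two rows equals 2(n − 1)(1 − r), where n is the number of samples and r the Pearson correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
expr = rng.normal(size=(5, 8))     # hypothetical data: 5 genes x 8 samples
n = expr.shape[1]

# Row Z-score with the sample standard deviation (ddof=1).
mean = expr.mean(axis=1, keepdims=True)
sd = expr.std(axis=1, ddof=1, keepdims=True)
z = (expr - mean) / sd

# Pairwise squared Euclidean distances between Z-scored rows.
diff = z[:, None, :] - z[None, :, :]
sq_euclid = (diff ** 2).sum(axis=2)

# Pairwise Pearson correlations between the original rows.
r = np.corrcoef(expr)

# Predicted relationship for Z-scored rows: d^2 = 2 * (n - 1) * (1 - r)
predicted = 2 * (n - 1) * (1 - r)
```

Because the mapping from r to distance is monotonic, both metrics order gene pairs the same way, although linkage methods that average distances can still produce slightly different trees.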

Can I use FigureGuild to make heatmaps? Yes — FigureGuild's Graph Builder supports heatmaps from pasted data, with controls for clustering, color palette, Z-score normalization, and annotation. Export at journal-required DPI.
