Heatmaps are one of the most effective ways to show patterns across many variables simultaneously — gene expression across samples, protein abundance across conditions, or correlation matrices. A poorly made heatmap is uninterpretable; a well-made one communicates complex biology immediately.
What a heatmap shows
A heatmap displays a matrix of values as a grid of colored cells, where color encodes magnitude. In biology, common uses include:
- RNA-seq — gene expression across samples or conditions
- Proteomics — protein abundance across groups
- Correlation matrices — pairwise correlations between variables
- Drug response — cell viability across drug × concentration grids
- ChIP-seq signal — histone mark or transcription factor occupancy across genomic regions (ATAC-seq for chromatin accessibility)
Step 1 — Prepare and normalize your data
Raw counts are rarely appropriate for heatmaps. Pre-processing:
- RNA-seq: Use variance-stabilized or rlog-transformed counts (DESeq2 vst() or rlog()), or TPM/RPKM for cross-sample comparison
- Proteomics: Use log₂-transformed, median-normalized intensities
- Correlation matrices: Pearson or Spearman correlation coefficients (−1 to 1)
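As an illustration of the proteomics item above, here is a minimal sketch of log₂ transformation followed by per-sample median normalization. The function name `log2_median_normalize` and the toy intensities are hypothetical, and re-centering on the mean of the sample medians is one common convention rather than the only option.

```python
import numpy as np
import pandas as pd

def log2_median_normalize(intensities: pd.DataFrame) -> pd.DataFrame:
    """Log2-transform intensities, then align the median of every sample (column)."""
    # Non-positive intensities have no logarithm; treat them as missing values.
    log2 = np.log2(intensities.where(intensities > 0))
    sample_medians = log2.median(axis=0)
    # Subtract each sample's median, then add back a common offset so the
    # values stay on a familiar log2-intensity scale.
    return log2 - sample_medians + sample_medians.mean()

# Hypothetical toy data: sample s2 was loaded at twice the amount of s1
raw = pd.DataFrame(
    {"s1": [4.0, 8.0, 16.0], "s2": [8.0, 16.0, 32.0]},
    index=["protA", "protB", "protC"],
)
norm = log2_median_normalize(raw)
```

After normalization the two toy samples are identical, because the only difference between them was a constant loading factor.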
Z-score normalization by row (gene) is standard for expression heatmaps — it shows relative expression changes across samples, removing absolute expression level differences. Each gene's values are scaled to mean = 0, SD = 1. Reviewers expect row Z-scores unless you have a specific reason to use raw values.
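A minimal sketch of row Z-scoring with pandas, assuming a genes × samples DataFrame of already normalized values. The helper name `row_zscore` and the toy matrix are hypothetical. It uses the sample standard deviation (ddof=1), which matches R's scale(); other tools default to ddof=0, so the exact values can differ slightly.

```python
import numpy as np
import pandas as pd

def row_zscore(mat: pd.DataFrame, ddof: int = 1) -> pd.DataFrame:
    """Scale each row (gene) to mean 0 and SD 1 across columns (samples)."""
    mean = mat.mean(axis=1)
    sd = mat.std(axis=1, ddof=ddof)
    # A constant row has SD 0; mark it as NaN instead of dividing by zero
    # so it can be dropped explicitly before plotting.
    sd = sd.replace(0, np.nan)
    return mat.sub(mean, axis=0).div(sd, axis=0)

# Hypothetical toy matrix: 3 genes x 4 samples
expr = pd.DataFrame(
    [[10.0, 12.0, 14.0, 16.0],
     [2.0, 2.0, 2.0, 2.0],      # constant gene, becomes NaN after scaling
     [5.0, 1.0, 5.0, 1.0]],
    index=["geneA", "geneB", "geneC"],
    columns=["s1", "s2", "s3", "s4"],
)
z = row_zscore(expr).dropna()  # geneB is removed here
```

Dropping constant rows before clustering matters in practice, since missing values in the matrix typically make the distance calculation fail.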
Step 2 — Select genes or features to display
Showing all 20,000 genes in a heatmap is meaningless. Select a meaningful subset:
- Differentially expressed genes: Top 50–200 by FDR from your differential expression analysis (the same results that feed a volcano plot)
- Gene sets: Specific pathway members or curated signatures
- Variable features: Top features by variance across samples
- Hand-curated lists: Specific genes of biological interest
Aim for 20–200 rows on a single heatmap. More than 200 rows without clustering resolution becomes uninterpretable.
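The "top features by variance" option can be sketched as follows, assuming a genes × samples DataFrame. The function name `top_variable_rows` and the toy data are hypothetical; in practice the variance is usually computed on transformed values (for example VST or log scale), not on raw counts.

```python
import numpy as np
import pandas as pd

def top_variable_rows(mat: pd.DataFrame, n: int = 100) -> pd.DataFrame:
    """Return the n rows with the highest variance across columns."""
    variances = mat.var(axis=1)
    keep = variances.nlargest(min(n, len(variances))).index
    return mat.loc[keep]

# Hypothetical toy data: row i is i times a fixed pattern, so variance grows with i
pattern = np.array([-1.0, 0.0, 1.0, 0.0])
expr = pd.DataFrame(
    np.outer(np.arange(1, 11), pattern),
    index=[f"gene{i}" for i in range(1, 11)],
    columns=["s1", "s2", "s3", "s4"],
)
subset = top_variable_rows(expr, n=3)  # rows ordered from most to least variable
```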
Step 3 — Clustering
Hierarchical clustering organizes rows and columns by similarity, revealing patterns invisible in unordered matrices.
Standard parameters:
- Distance metric: Euclidean distance (for Z-scored expression data) or 1 − Pearson correlation
- Linkage: Complete or Ward's D2 — Ward's tends to produce more balanced clusters
- Cluster rows: Yes (genes cluster by expression pattern)
- Cluster columns: Yes by default, but sometimes column order (e.g., time points) should be preserved
Cluster the columns? If columns represent ordered conditions (time series, dose response), keep them in order and do not cluster columns. If columns are independent samples within groups, clustering can reveal batch effects and outliers.
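The clustering choices above can be sketched with SciPy, assuming a small numeric matrix with features in rows. The helper name `cluster_order` is hypothetical. It returns only a row ordering, so the column order (for example time points) stays exactly as supplied. Ward linkage assumes Euclidean distances, so the correlation-based variant below uses average linkage instead.

```python
import numpy as np
from scipy.cluster.hierarchy import leaves_list, linkage
from scipy.spatial.distance import pdist

def cluster_order(mat, metric="euclidean", method="ward"):
    """Return row indices in the leaf order of a hierarchical clustering."""
    dist = pdist(np.asarray(mat, dtype=float), metric=metric)  # condensed distances
    tree = linkage(dist, method=method)
    return leaves_list(tree)

# Hypothetical toy matrix: rows 0 and 2 rise over time, rows 1 and 3 fall
mat = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [4.0, 3.0, 2.0, 1.0],
    [1.1, 2.1, 3.2, 4.1],
    [4.2, 3.1, 2.1, 0.9],
])
order_euclid = cluster_order(mat)  # Euclidean + Ward
order_corr = cluster_order(mat, metric="correlation", method="average")  # 1 - Pearson
reordered = mat[order_euclid, :]   # rows reordered, columns left in original order
```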
Step 4 — Choose the right color palette
Color palette choice is critical and frequently done wrong:
For diverging data (Z-scores, correlations, log₂ fold changes): Use a diverging palette centered at zero. Blue → white → red is the most common and readable. Options: RdBu, coolwarm, PuOr.
Avoid: Rainbow (jet) palettes — they are perceptually non-linear, create false visual boundaries, and are colorblind-unfriendly.
For sequential data (expression intensity, enrichment scores): Use a sequential palette: white/yellow → orange → red (Reds, YlOrRd) or viridis for colorblind safety.
Colorblind considerations: Red-green heatmaps (a legacy standard in genomics) are hard or impossible to read for people with red-green color vision deficiency, roughly 8% of men and under 1% of women. Use blue-white-red or viridis instead.
Saturation limits: Set explicit min and max color limits (e.g., Z-score −2 to +2) and cap outliers. Without limits, a single extreme outlier can wash out all color variation.
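To see why explicit limits matter, the sketch below maps values to a position between 0 and 1 on a zero-centered diverging scale, once with the limit taken from the data and once capped at ±2. The function name `colormap_position` and the toy values are hypothetical; plotting libraries perform an equivalent mapping internally when you set color limits.

```python
import numpy as np

def colormap_position(values, limit):
    """Map values to [0, 1] on a diverging scale centered at zero, capped at +/-limit."""
    v = np.clip(np.asarray(values, dtype=float), -limit, limit)
    return (v + limit) / (2 * limit)

# Hypothetical Z-scores: three typical values and one extreme outlier
z = np.array([-1.5, 0.0, 1.5, 12.0])

auto_limit = np.abs(z).max()                 # 12.0, driven entirely by the outlier
pos_auto = colormap_position(z, auto_limit)  # typical values crowd around the midpoint
pos_capped = colormap_position(z, 2.0)       # typical values spread over the scale
```

With the data-driven limit, the values ±1.5 land at 0.4375 and 0.5625, which is almost the same near-white color. With the ±2 cap they land at 0.125 and 0.875, and the outlier simply saturates at the end of the scale.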
Step 5 — Add annotations
Column and row annotations convey metadata visually:
- Column annotations: Sample group, treatment, batch, sex, time point — displayed as colored bars above the heatmap
- Row annotations: Gene cluster membership, pathway, chromosomal location — displayed as colored bars to the right
- Dendrogram: Show on rows and/or columns to indicate clustering
Annotation color palettes must have a legend. Avoid annotating with more than 4–5 categories per annotation bar — it becomes unreadable.
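A minimal sketch of building a column annotation bar in Python, assuming sample metadata in a pandas DataFrame. The helper name `annotation_colors`, the metadata, and the palette are hypothetical. The resulting Series can be passed to seaborn's clustermap through its col_colors parameter; seaborn does not draw a legend for these bars automatically, so the legend has to be added separately.

```python
import pandas as pd

def annotation_colors(labels: pd.Series, palette: dict) -> pd.Series:
    """Map each sample's category to a color, failing loudly on unmapped categories."""
    missing = set(labels.unique()) - set(palette)
    if missing:
        raise ValueError(f"No color defined for: {sorted(missing)}")
    return labels.map(palette)

# Hypothetical sample metadata, indexed by sample name
metadata = pd.DataFrame(
    {"group": ["Control", "Control", "Treatment", "Treatment"]},
    index=["s1", "s2", "s3", "s4"],
)
palette = {"Control": "#888888", "Treatment": "#E05252"}
col_colors = annotation_colors(metadata["group"], palette)
# Usage, assuming a Z-scored matrix whose columns match metadata.index:
# sns.clustermap(mat_z, col_colors=col_colors, ...)
```

Raising an error for unmapped categories avoids silently blank annotation cells when a new group appears in the metadata.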
Step 6 — Final formatting
- Cell borders: Use thin borders (0.1–0.3 pt) only for small matrices (<50 rows); omit for large matrices
- Font size: Minimum 6 pt for row and column labels at print size; for >100 genes, omit row labels entirely
- Color scale bar: Include a labeled color scale (e.g., "Row Z-score")
- Figure size: At journal column width with cells large enough to read
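The font-size and figure-size guidance reduces to simple arithmetic, sketched below. The function names are hypothetical, and the row count is an upper bound that assumes each row is exactly as tall as the label font, with no extra line spacing.

```python
MM_PER_INCH = 25.4
PT_PER_INCH = 72.0

def mm_to_inches(mm: float) -> float:
    """Convert millimetres to inches (plotting libraries often expect inches)."""
    return mm / MM_PER_INCH

def max_labelled_rows(height_mm: float, font_pt: float = 6.0) -> int:
    """Upper bound on rows that can carry a label of font_pt in a body of height_mm."""
    height_pt = height_mm / MM_PER_INCH * PT_PER_INCH
    return int(height_pt // font_pt)

width_in = mm_to_inches(85)            # about 3.35 in for an 85 mm column
rows_at_6pt = max_labelled_rows(120)   # 56 rows in a 120 mm tall heatmap body
```

A 120 mm tall heatmap body therefore holds at most about 56 legible row labels at 6 pt, which is why row labels are usually dropped well before 100 genes.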
Code examples
R (ComplexHeatmap — the standard for publication):
library(ComplexHeatmap)
library(circlize)  # provides colorRamp2()
# Assumes `mat` is a numeric matrix (genes in rows, samples in columns) and
# `metadata` has one row per sample, in the same order as the columns of `mat`
# Row-scale the data
mat_scaled <- t(scale(t(mat)))  # Z-score by row
# Color function centered at 0; values beyond -2 / +2 take the end colors
col_fun <- colorRamp2(c(-2, 0, 2), c("#2166AC", "white", "#B2182B"))
# Column annotation
col_annot <- HeatmapAnnotation(
  Group = metadata$group,
  col = list(Group = c(Control = "#888888", Treatment = "#E05252"))
)
Heatmap(mat_scaled,
  col = col_fun,
  top_annotation = col_annot,
  clustering_method_rows = "ward.D2",
  clustering_method_columns = "ward.D2",
  show_row_names = FALSE,
  name = "Row Z-score",          # title of the color scale legend
  width = unit(85, "mm"),        # heatmap body width; 85 mm is single-column Cell width
  height = unit(120, "mm")
)
Python (seaborn clustermap):
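import seaborn as sns
# Assumes `mat` is a pandas DataFrame (genes in rows, samples in columns)
# Z-score normalize each row (gene) to mean 0, SD 1
mat_z = mat.sub(mat.mean(axis=1), axis=0).div(mat.std(axis=1), axis=0)
# Constant rows become NaN after scaling and would break clustering
mat_z = mat_z.dropna()
g = sns.clustermap(mat_z,
                   cmap="RdBu_r", center=0, vmin=-2, vmax=2,
                   method="ward", metric="euclidean",
                   figsize=(3.35, 4.7),  # single-column Cell width in inches
                   yticklabels=False,
                   cbar_kws={"label": "Row Z-score"})
g.savefig("heatmap.tiff", dpi=1200, bbox_inches="tight")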
Common mistakes
- Not Z-scoring — showing raw counts lets highly expressed genes and deeply sequenced samples dominate the color scale, hiding relative changes
- Using rainbow (jet) color palette — creates misleading visual boundaries
- No color scale bar — readers cannot interpret the colors
- Clustering time series columns — breaks the temporal order
- Showing row labels for >100 genes — unreadable at any font size
- Not setting color limits — outlier samples or genes wash out all variation
FAQ
Should I always Z-score my heatmap? Row Z-scoring is standard for expression heatmaps because it shows relative changes across samples. If you want to show absolute expression levels (e.g., to compare low vs highly expressed genes), do not Z-score but use a log-transformed sequential color scale. State clearly in the legend which normalization you used.
How many genes is too many for a heatmap? There is no hard limit, but >500 rows becomes uninterpretable unless clustering is very clear. For very large matrices, show a representative subset or use a summary statistic plot instead.
ComplexHeatmap or pheatmap — which should I use? ComplexHeatmap is more powerful and flexible, especially for complex annotations. pheatmap is simpler to learn. For publication figures, ComplexHeatmap is the standard in top journals.
What is the difference between Pearson distance and Euclidean distance for clustering? Euclidean distance groups genes by similar absolute values; Pearson (1 − correlation) groups genes by similar patterns regardless of magnitude. For Z-scored data, either works similarly. For non-normalized data, Pearson is often preferred.
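The claim that both distances behave similarly on Z-scored data follows from an identity: for two rows Z-scored with the sample standard deviation over n samples, the squared Euclidean distance equals 2(n − 1)(1 − r), where r is their Pearson correlation. The sketch below checks this numerically on two hypothetical vectors; the local `zscore` helper is defined here for illustration.

```python
import numpy as np

def zscore(v, ddof=1):
    """Scale a vector to mean 0 and SD 1."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / v.std(ddof=ddof)

# Hypothetical expression profiles of two genes across five samples
a = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
b = np.array([2.0, 1.0, 4.0, 3.0, 6.0])

n = a.size
r = np.corrcoef(a, b)[0, 1]                       # Pearson correlation
sq_euclid = np.sum((zscore(a) - zscore(b)) ** 2)  # squared Euclidean distance
identity_rhs = 2 * (n - 1) * (1 - r)              # matches sq_euclid
```

Because one distance is a monotonic function of the other, pairs of genes are ranked identically. Linkage methods that average or sum distances may still produce somewhat different trees.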
Can I use FigureGuild to make heatmaps? Yes — FigureGuild's Graph Builder supports heatmaps from pasted data, with controls for clustering, color palette, Z-score normalization, and annotation. Export at journal-required DPI.