How to Make a PCA Plot for Scientific Papers
A complete guide to creating publication-quality PCA plots, from principal component analysis to journal-ready export.
What is a scientific PCA plot?
A PCA plot (Principal Component Analysis plot) is a scatter plot that visualizes high-dimensional data in two dimensions. It shows the relationships between samples based on their features, with samples that are similar clustering together. PCA plots are standard for quality control, exploratory analysis, and group comparison in genomics, proteomics, and metabolomics.
Key requirements:
- X-axis: PC1 with variance explained (%)
- Y-axis: PC2 with variance explained (%)
- Points colored by group or condition
- Legend clearly identifying groups
- Confidence ellipses (optional, 95%)
- Journal width (single: 84–90 mm, double: 170–183 mm)
- 300 DPI minimum for raster export
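The width and resolution requirements translate into concrete figure dimensions. Below is a minimal sketch of that arithmetic; the helper names `mm_to_inches` and `pixels_at_dpi` are illustrative and do not come from any library, and the 89 mm and 183 mm widths are simply example values from the ranges listed above.

```python
MM_PER_INCH = 25.4

def mm_to_inches(width_mm: float) -> float:
    """Convert millimetres to inches (many plotting libraries size figures in inches)."""
    return width_mm / MM_PER_INCH

def pixels_at_dpi(width_mm: float, dpi: int = 300) -> int:
    """Pixels needed across a raster image of the given physical width at the given DPI."""
    return round(mm_to_inches(width_mm) * dpi)

single = pixels_at_dpi(89)    # a single-column width inside the 84-90 mm range
double = pixels_at_dpi(183)   # the upper end of the double-column range
print(single, double)         # prints: 1051 2161
```

Check the target journal's author guidelines for its exact column widths before fixing these numbers.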
Step-by-Step Guide
- Prepare your data
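Organize data with samples as rows and features (genes, proteins, metabolites) as columns. Normalize the data, then center each feature and, when features are measured on different scales, scale it to unit variance before running PCA.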
- Run PCA
Compute the principal components by singular value decomposition (SVD) of the centered data matrix or by eigendecomposition of its covariance matrix. Extract the PC1 and PC2 scores for each sample.
- Plot samples
X-axis: PC1 score. Y-axis: PC2 score. Each point is a sample. Color by group (treatment, condition, time).
- Label axes
Label axes with PC name and variance explained (e.g., "PC1 (45%)"). This tells readers how much variation is captured.
- Add ellipses
If comparing groups, add 95% confidence ellipses to show separation. State the confidence level in the legend.
- Export for publication
Set the figure width to the journal's single-column or double-column width. Export at 300 DPI or higher for raster formats. Make sure labels are readable at the final printed size, and use a colorblind-friendly palette.
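The data preparation, PCA, and axis-labeling steps above can be sketched with NumPy as follows. This is one possible approach under stated assumptions, not a definitive implementation: the function name `pca_scores` is hypothetical, the input is synthetic random data, and z-score scaling is assumed to be appropriate for the features.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Return PC scores and the fraction of variance each component explains.

    X: array of shape (n_samples, n_features), samples as rows.
    """
    X = np.asarray(X, dtype=float)
    # Center each feature and scale it to unit variance (z-score).
    # Assumes no feature is constant; a zero standard deviation would divide by zero.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # SVD of the standardized matrix: Z = U @ diag(S) @ Vt
    U, S, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U * S                     # sample coordinates on every PC
    explained = S**2 / np.sum(S**2)    # fraction of total variance per PC
    return scores[:, :n_components], explained[:n_components]

rng = np.random.default_rng(0)
X = rng.normal(size=(12, 50))   # synthetic data: 12 samples, 50 features
X[:6] += 1.5                    # shift half the samples to mimic a group effect

scores, explained = pca_scores(X)
xlabel = f"PC1 ({explained[0]:.0%})"   # axis label with variance explained
ylabel = f"PC2 ({explained[1]:.0%})"
print(xlabel, ylabel)
```

The squared singular values are proportional to the variance along each component, which is why their normalized values give the percentages for the axis labels.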
Interpreting a PCA Plot
- Clustering: Samples that cluster together are similar in their feature profiles.
- Separation: Clear separation between groups suggests that their feature profiles differ along the plotted components.
- Outliers: Samples far from their group may be outliers or low quality. Investigate before analysis.
- Variance explained: As a rough guide, PC1 + PC2 should capture more than 60% of total variance for the 2D view to be representative. When they capture less, important structure may lie in higher components.
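One way to make the outlier check above less subjective is to measure each sample's distance from its own group centroid in PC space. The sketch below is illustrative only: the function name `flag_far_samples`, the example scores, and the cutoff (median distance plus three median absolute deviations) are all assumptions, not an established standard.

```python
import numpy as np

def flag_far_samples(scores, groups, n_mads=3.0):
    """Flag samples unusually far from their own group's centroid in PC space.

    The cutoff (median distance + n_mads * MAD over all samples) is an
    illustrative choice; tune it to your data.
    """
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    dist = np.empty(len(scores))
    for name in np.unique(groups):
        mask = groups == name
        centroid = scores[mask].mean(axis=0)
        dist[mask] = np.linalg.norm(scores[mask] - centroid, axis=1)
    med = np.median(dist)
    mad = np.median(np.abs(dist - med))
    return dist > med + n_mads * mad

# Hand-made PC1/PC2 scores: the last sample of group B sits far from the rest.
scores = [[-2.0, 0.1], [-2.1, -0.1], [-1.9, 0.0], [-2.0, -0.2],
          [2.0, 0.0], [2.1, 0.2], [1.9, -0.1], [2.0, 6.0]]
groups = ["A"] * 4 + ["B"] * 4
flags = flag_far_samples(scores, groups)
print(flags)   # only the last sample is flagged
```

A flag is a prompt to inspect the sample (for example, its quality metrics), not a reason to discard it automatically. Note that an extreme sample also pulls its group's centroid toward itself, so this simple check is least reliable in very small groups.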
Frequently Asked Questions
How do you make a PCA plot for a scientific paper?
To make a PCA plot: (1) run principal component analysis on your data matrix, (2) plot PC1 (x-axis) vs PC2 (y-axis), (3) label axes with variance explained (%), (4) color points by group (treatment, condition, time), (5) add sample labels if needed, (6) include a legend, (7) add confidence ellipses if showing group separation, (8) export at 300 DPI at journal width.
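As a rough illustration of the plotting and export steps, the sketch below uses matplotlib, assuming it is installed. The function name `plot_pca`, the placeholder scores, the variance fractions (45% and 20%), and the 89 mm width are assumptions made for the example; sample labels and ellipses are left out to keep it short.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script runs without a display
import matplotlib.pyplot as plt
import numpy as np

def plot_pca(scores, groups, explained, width_mm=89.0):
    """Scatter PC1 vs PC2, one color per group, with variance in the axis labels."""
    scores = np.asarray(scores, dtype=float)
    groups = np.asarray(groups)
    width_in = width_mm / 25.4   # matplotlib sizes figures in inches
    fig, ax = plt.subplots(figsize=(width_in, width_in * 0.8))
    # Colors from the Okabe-Ito palette, commonly recommended as colorblind-friendly.
    palette = ["#0072B2", "#D55E00", "#009E73", "#CC79A7", "#E69F00", "#56B4E9"]
    for i, name in enumerate(np.unique(groups)):
        mask = groups == name
        ax.scatter(scores[mask, 0], scores[mask, 1], s=20,
                   color=palette[i % len(palette)], label=str(name))
    ax.set_xlabel(f"PC1 ({explained[0]:.0%})")
    ax.set_ylabel(f"PC2 ({explained[1]:.0%})")
    ax.legend(title="Group", frameon=False)
    fig.tight_layout()
    return fig

# Placeholder scores and variance fractions, for illustration only.
rng = np.random.default_rng(1)
scores = np.vstack([rng.normal(-2, 1, size=(6, 2)), rng.normal(2, 1, size=(6, 2))])
groups = ["control"] * 6 + ["treated"] * 6

fig = plot_pca(scores, groups, explained=(0.45, 0.20))
# Written to the system temp directory here; substitute your own output path.
out_path = os.path.join(tempfile.gettempdir(), "pca_plot.png")
fig.savefig(out_path, dpi=300)   # raster export at 300 DPI
```

Saving with a `.pdf` or `.svg` extension instead produces vector output, for which the DPI setting matters much less.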
What is a PCA plot used for?
A PCA plot is used to visualize high-dimensional data in two dimensions. It shows the relationships between samples based on their features (genes, proteins, metabolites). Samples that cluster together are similar; samples that are far apart are different. PCA plots are standard for quality control and exploratory analysis in omics studies.
What do PC1 and PC2 mean?
PC1 (Principal Component 1) is the direction of maximum variance in the data. PC2 is the direction of second-highest variance, orthogonal to PC1. The percentage on each axis label shows how much total variance that component explains. If PC1 explains 60% and PC2 explains 20%, the plot captures 80% of the data’s variation.
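These definitions can be checked numerically. The sketch below, using synthetic data with deliberately unequal spread across three features, takes the eigendecomposition of the covariance matrix: each eigenvalue is the variance along one component, and the eigenvectors are the component directions. The data and variable names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
# Synthetic data: 200 samples, 3 features with deliberately unequal spread.
X = rng.normal(size=(200, 3)) * np.array([3.0, 1.5, 0.5])
Xc = X - X.mean(axis=0)

cov = np.cov(Xc, rowvar=False)            # 3 x 3 covariance matrix
eigvals, eigvecs = np.linalg.eigh(cov)    # eigh returns eigenvalues in ascending order
order = np.argsort(eigvals)[::-1]         # reorder so PC1 comes first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

ratio = eigvals / eigvals.sum()           # fraction of total variance per PC
pc1, pc2 = eigvecs[:, 0], eigvecs[:, 1]

print("variance explained:", np.round(ratio, 3))
print("PC1 + PC2:", round(float(ratio[0] + ratio[1]), 3))
print("PC1 . PC2:", float(pc1 @ pc2))     # close to 0: the directions are orthogonal
```

The fraction captured by a 2D plot is simply the sum of the first two entries of `ratio`, which is the same addition as in the 60% + 20% = 80% example.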
What is the best tool for making PCA plots?
FigureGuild is ideal for publication-ready PCA plots. It auto-computes PCA, labels variance explained, and applies journal formatting. R (prcomp + ggplot2) and Python (sklearn + matplotlib) are also used but require coding. Most omics pipelines generate PCA plots but with basic formatting.
Should PCA plots show confidence ellipses?
Confidence ellipses help visualize group separation. They show the region where 95% of samples in a group are expected to fall, typically under the assumption that the group's scores are roughly bivariate normal. Ellipses are useful when comparing treatments or conditions. However, they can be misleading with small sample sizes, because the group's covariance is then poorly estimated. Always state the confidence level in the legend.
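A minimal sketch of how such an ellipse can be computed for one group, assuming the PC1 and PC2 scores are approximately bivariate normal. The function name `coverage_ellipse` and the synthetic group are illustrative; the sample covariance is plugged in directly with no small-sample correction, so plotting tools that apply one will draw slightly different ellipses.

```python
import numpy as np

def coverage_ellipse(points, level=0.95, n_vertices=100):
    """Outline of the ellipse expected to contain `level` of a bivariate normal group.

    points: array of shape (n_samples, 2) with PC1 and PC2 scores for one group.
    Returns an (n_vertices, 2) array of x, y coordinates along the ellipse.
    """
    points = np.asarray(points, dtype=float)
    center = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)
    # With 2 degrees of freedom the chi-square quantile has a closed form.
    q = -2.0 * np.log(1.0 - level)        # about 5.99 for level=0.95
    eigvals, eigvecs = np.linalg.eigh(cov)
    t = np.linspace(0.0, 2.0 * np.pi, n_vertices)
    unit_circle = np.column_stack([np.cos(t), np.sin(t)])
    # Stretch the unit circle by the semi-axis lengths, rotate, then shift to the mean.
    return center + (unit_circle * np.sqrt(q * eigvals)) @ eigvecs.T

rng = np.random.default_rng(3)
group = rng.multivariate_normal([0.0, 0.0], [[4.0, 1.5], [1.5, 1.0]], size=500)
outline = coverage_ellipse(group)

# Fraction of samples inside the ellipse (squared Mahalanobis distance <= q).
d = group - group.mean(axis=0)
m2 = np.einsum("ij,jk,ik->i", d, np.linalg.inv(np.cov(group, rowvar=False)), d)
inside = np.mean(m2 <= -2.0 * np.log(0.05))
print(f"fraction inside: {inside:.2f}")
```

With a matplotlib axis `ax`, the outline can be drawn with `ax.plot(outline[:, 0], outline[:, 1])`. With only a handful of samples per group the estimated covariance is unstable, which is the reason for the small-sample caution above.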
How do you interpret a PCA plot?
Samples that cluster together are similar in their feature profiles. Samples that are far apart are different. The axes show the principal components that capture the most variance. If groups separate clearly along PC1, the main difference between groups is captured by the largest source of variation in the data.
Create PCA Plots Automatically
FigureGuild auto-computes PCA, labels variance explained, and exports at journal dimensions — all from your raw data.