Data QC

Data Quality Control (Missingness, MAF, HWE)

Catch bad markers and bad samples before they corrupt downstream analysis.

How it works

Every genomics pipeline lives or dies by QC. We compute per-marker missingness, minor allele frequency (MAF), and Hardy–Weinberg equilibrium (HWE) p-values, plus per-sample missingness and heterozygosity. Markers and samples with outlying values are flagged, and user-configurable filters are applied before any GWAS or genomic-selection run.
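
As a rough illustration of the per-marker and per-sample statistics, here is a minimal Python sketch. It assumes genotypes are coded as 0, 1, or 2 copies of the alternate allele with None for a missing call. That coding, the function names, and the toy matrix are assumptions made for this example and are not necessarily how the module represents data internally.

```python
from typing import Optional, Sequence

# Assumed coding: 0, 1, 2 = copies of the alternate allele; None = missing call.
Genotype = Optional[int]


def marker_stats(calls: Sequence[Genotype]) -> dict:
    """Missingness and MAF for one marker across all samples."""
    observed = [g for g in calls if g is not None]
    missingness = 1 - len(observed) / len(calls)
    if not observed:
        return {"missingness": missingness, "maf": None}
    p = sum(observed) / (2 * len(observed))  # alternate-allele frequency
    return {"missingness": missingness, "maf": min(p, 1 - p)}


def sample_stats(calls: Sequence[Genotype]) -> dict:
    """Missingness and heterozygosity for one sample across all markers."""
    observed = [g for g in calls if g is not None]
    missingness = 1 - len(observed) / len(calls)
    # Heterozygosity = share of non-missing calls that are heterozygous (coded 1).
    het = sum(1 for g in observed if g == 1) / len(observed) if observed else None
    return {"missingness": missingness, "heterozygosity": het}


# Toy matrix (made up): rows are samples, columns are markers.
genotypes = [
    [0, 1, 2, None],
    [0, 1, 1, 0],
    [1, None, 2, 0],
    [0, 2, 2, 0],
]

per_marker = [marker_stats(col) for col in zip(*genotypes)]
per_sample = [sample_stats(row) for row in genotypes]
```

In this toy matrix the second marker has one missing call out of four, so its missingness is 0.25, and the second sample is heterozygous at two of its four markers.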

Formula

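MAF = min(p, 1−p), where p is the frequency of one allele at a biallelic marker. HWE χ² = Σ (observed − expected)² / expected, summed over the three genotype classes, with expected counts n·p², 2n·p(1−p), and n·(1−p)² from Hardy–Weinberg proportions (n = number of genotyped samples). The p-value is read from a χ² distribution with 1 degree of freedom.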
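
To make the HWE formula concrete, the sketch below runs the χ² test on genotype counts for a single biallelic marker, using only the Python standard library. The function name and the handling of monomorphic markers are assumptions made for this example. The p-value uses the identity that the upper tail of a χ² distribution with 1 degree of freedom equals erfc(√(χ²/2)).

```python
import math
from typing import Tuple


def hwe_chi2(n_aa: int, n_ab: int, n_bb: int) -> Tuple[float, float]:
    """Return (chi-square statistic, p-value) for one biallelic marker."""
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotyped samples for this marker")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)  # Hardy-Weinberg proportions
    if min(expected) == 0:
        # Monomorphic marker: the test is undefined; treated here as no deviation.
        return 0.0, 1.0
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail, 1 degree of freedom
    return chi2, p_value


chi2_ok, p_ok = hwe_chi2(25, 50, 25)     # counts match expectation exactly
chi2_bad, p_bad = hwe_chi2(40, 20, 40)   # strong heterozygote deficit
```

With counts 40, 20, 40 the expected counts are 25, 50, 25, giving χ² = 9 + 18 + 9 = 36 and a very small p-value. The χ² approximation is less reliable when expected counts are low, where an exact test is generally preferred.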

What you get

  • Per-marker MAF, missingness, and HWE p-value distributions
  • Per-sample missingness and heterozygosity outliers
  • Filtered marker and sample lists
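
As one way to picture how a filtered marker list could be derived from these statistics, the sketch below applies three thresholds to a small table. The threshold values, the MarkerFilter class, and the marker table are illustrative assumptions made for this example, not the module's defaults or output.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MarkerFilter:
    # Illustrative thresholds only; choose values suited to your dataset.
    max_missingness: float = 0.10
    min_maf: float = 0.05
    min_hwe_p: float = 1e-6


def keep_markers(stats: Dict[str, dict], f: MarkerFilter) -> List[str]:
    """Return names of markers that pass all three filters."""
    return [
        name
        for name, s in stats.items()
        if s["missingness"] <= f.max_missingness
        and s["maf"] >= f.min_maf
        and s["hwe_p"] >= f.min_hwe_p
    ]


# Made-up per-marker statistics for demonstration.
marker_table = {
    "m1": {"missingness": 0.02, "maf": 0.31, "hwe_p": 0.48},
    "m2": {"missingness": 0.25, "maf": 0.22, "hwe_p": 0.70},   # too many missing calls
    "m3": {"missingness": 0.01, "maf": 0.01, "hwe_p": 0.90},   # rare allele
    "m4": {"missingness": 0.00, "maf": 0.40, "hwe_p": 1e-9},   # fails HWE
}

kept = keep_markers(marker_table, MarkerFilter())
relaxed = keep_markers(marker_table, MarkerFilter(max_missingness=0.30))
```

With the illustrative defaults only m1 passes. Raising the missingness threshold to 0.30 also keeps m2.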

When to use it

  • On every new genotype dataset, immediately after upload
  • Before running GWAS, genomic selection (GS), or population-structure analyses
  • When troubleshooting unexpected results from downstream modules

Run Data QC on your data

Open the module and upload a CSV.