findhap.f90 Find haplotypes and impute genotypes using multiple chip sets and sequence data

Downloads
Version 4 program, example files, and executable
(beta version: not quite ready for routine use on U.S. chip data, but performs better than version 3 for sequence data)
  • Example data files for imputation study presented by VanRaden and Sun at the 2014 World Congress on Genetics Applied to Livestock Production
  • Files include actual pedigree, simulated true genotypes, simulated sequence reads, and imputed genotypes
  • This example used 500 reference bulls sequenced at 4× with 1% error and containing high-density SNPs; the 250 young bulls used to test imputation had only high-density SNPs
  • Other examples in the study can be generated by setting other options for programs findhap4, geno2seq, and genosim
Version 3 program, example files, and executable

Version 2 program, example files, and executable
(not maintained)

Inputs

genotypes.txt
  • Format: animal# chip# #SNPs genotypes
  • Sort by animal#; genotype codes are 0, 1, 2, and 5 = missing
  • For fixed-length input, set chip# to 1 and missing genotypes to 5
  • For variable-length input, #SNPs and order must match chromosome.data

chromosome.data
  • List of all SNPs used and which SNPs are on each chip
  • Sort by chromosome number and position within chromosome
  • X-specific chromosome last, after pseudo-autosomal "chromosome"
  • Y-specific SNPs not supported yet

pedigree.file
  • Format: sex  animal#  sire#  dam#  birthdate  animal ID  animal name
  • Sort in ascending birth date order

findhap.options
  • Program control file with user-defined options

sequences.readdepth (version 4 only)
  • Format: animal#  chip#  #SNPs
  • Read counts for A and B alleles stored in 1-byte hexadecimal format
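
As a rough illustration of the genotypes.txt layout described above, the following Python sketch reads one record per line and checks the stated constraints. It assumes whitespace-separated fields, genotypes written as one contiguous run of single-digit codes, and ascending animal# order; the exact column widths expected by findhap are not given here, so check the example files before relying on it. The names GenotypeRecord, parse_genotype_line, and check_sorted are hypothetical and are not part of findhap.

```python
from typing import List, NamedTuple

VALID_CODES = {"0", "1", "2", "5"}  # 5 = missing, per the format notes


class GenotypeRecord(NamedTuple):
    animal: int
    chip: int
    n_snps: int
    genotypes: List[int]


def parse_genotype_line(line: str) -> GenotypeRecord:
    """Parse one record: animal# chip# #SNPs genotypes.

    Assumption: fields are whitespace-separated and the genotypes form a
    single run of one-digit codes (not confirmed by the source).
    """
    animal, chip, n_snps, codes = line.split()
    if len(codes) != int(n_snps):
        raise ValueError(f"animal {animal}: expected {n_snps} SNPs, got {len(codes)}")
    bad = set(codes) - VALID_CODES
    if bad:
        raise ValueError(f"animal {animal}: unexpected genotype codes {sorted(bad)}")
    return GenotypeRecord(int(animal), int(chip), int(n_snps), [int(c) for c in codes])


def check_sorted(records: List[GenotypeRecord]) -> bool:
    """Return True if records are in ascending animal# order (assumed order)."""
    animals = [r.animal for r in records]
    return animals == sorted(animals)


# Made-up records for illustration only
lines = ["101 1 6 012520", "102 1 6 210015"]
records = [parse_genotype_line(text) for text in lines]
```

A check like this is cheap to run before a long imputation job, since a mismatch between #SNPs and the genotype string is easy to introduce when merging chips.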

Outputs

hap.list
  • List of all haplotypes found in each segment

hap.found
  • Each animal's paternal and maternal haplotypes (2 lines/animal)

hap.inherit
  • Tracks inheritance and crossovers for each parental chromosome

hap.filled
  • Summarizes imputation quality for each animal

cross.overs
  • Lists exact location of all detected crossovers

allele.frequency
  • Estimated allele frequencies and missing rates for each SNP

genotypes.filled
  • Imputed genotypes with codes: 0 = BB, 1 = AB, 2 = AA, 3 = B_, 4 = A_, 5 = __
  • Number of animals output may exceed input because of imputed dams
  • Remaining missing alleles in codes 3, 4, and 5 can be set using allele frequencies

haplotypes.txt
  • Imputed haplotypes: SNP1 paternal maternal, SNP2 pat mat, etc., for each animal
  • No missing alleles, allowing genotypes to be formed simply as (pat + mat - 2)
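
The two post-processing notes above (forming genotypes from haplotypes.txt, and setting the remaining missing alleles in genotypes.filled from allele frequencies) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: haplotype alleles are taken to be coded 1 and 2, which is what the (pat + mat - 2) formula implies; the frequency is taken to be that of allele A; and an unknown allele is replaced by its expected value, which is only one of several possible rules. The function names are hypothetical and are not part of findhap.

```python
def genotype_from_haplotypes(pat: int, mat: int) -> int:
    """Form a genotype code (0, 1, or 2) as pat + mat - 2.

    Assumption: alleles in haplotypes.txt are coded 1 and 2, so that the
    result falls in the range 0..2.
    """
    if pat not in (1, 2) or mat not in (1, 2):
        raise ValueError("alleles are assumed to be coded 1 or 2")
    return pat + mat - 2


def expected_genotype(code: int, freq_a: float) -> float:
    """Replace codes 3, 4, and 5 by an expected count of A alleles.

    Assumption: freq_a is the frequency of allele A, and each unknown
    allele contributes its expectation (freq_a) to the count.
    """
    if code in (0, 1, 2):      # fully called: BB, AB, AA
        return float(code)
    if code == 3:              # B_ : one B known, one allele unknown
        return freq_a
    if code == 4:              # A_ : one A known, one allele unknown
        return 1.0 + freq_a
    if code == 5:              # __ : both alleles unknown
        return 2.0 * freq_a
    raise ValueError(f"unexpected genotype code {code}")


# Made-up values for illustration only
filled = [expected_genotype(c, 0.25) for c in (0, 1, 2, 3, 4, 5)]
```

If integer codes are required downstream, the expected values could instead be rounded or sampled, at the cost of hiding the uncertainty.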

Version differences

4 vs. 3
  • Can input numbers of A and B allele reads from sequence data
  • Increased memory and CPU because of likelihood ratio calculations

3 vs. 2
  • Computing time reduced by using priors or imputing only new animals
  • Files hap.list and hap.found output multiple lengths to use as priors
  • Options file includes damout, listout, and errate for outputting imputed parents, outputting all steps or only the final step, and allowing error within haplotypes
  • Option genout can output only best call (0,1,2) or just missing (0,1,2,5) in genotypes.filled

2 vs. 1
  • Options file uses maxlen, minlen, and steps to divide long segment into shorter segments
  • Computing time increases by number of steps used to get from maxlen to minlen
  • Population and pedigree haplotyping in one loop vs. 2 separate loops
  • Searches for great-grandparent haplotypes, not just genotyped parents and grandparents
  • Higher accuracy and/or fewer high-density genotypes required
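
This page does not describe how version 4 computes its likelihoods. Purely to illustrate why counts of A and B allele reads carry genotype information, the Python sketch below uses a textbook binomial model; it is not findhap's algorithm. The 1% default error rate is borrowed from the example data set description, and the function names are hypothetical.

```python
from math import comb
from typing import List


def genotype_likelihoods(n_a: int, n_b: int, error: float = 0.01) -> List[float]:
    """Likelihood of observing n_a A reads and n_b B reads for each genotype.

    Index 0 = BB, 1 = AB, 2 = AA. Assumption: reads are independent and
    each read reports the wrong allele with probability `error`.
    """
    likelihoods = []
    for g in (0, 1, 2):
        # Probability that a single read shows allele A given genotype g
        p_a = (g / 2) * (1 - error) + (1 - g / 2) * error
        likelihoods.append(comb(n_a + n_b, n_a) * p_a ** n_a * (1 - p_a) ** n_b)
    return likelihoods


def best_call(n_a: int, n_b: int, error: float = 0.01) -> int:
    """Return the genotype code with the highest likelihood."""
    likes = genotype_likelihoods(n_a, n_b, error)
    return max(range(3), key=lambda g: likes[g])


# Made-up read counts for illustration only
likes_het = genotype_likelihoods(2, 2)
```

With no reads at all, the three likelihoods are equal, which is why low-coverage data still needs haplotype or pedigree information to resolve genotypes.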

References
2015 VanRaden, P.M., C. Sun, and J.R. O'Connell. Fast imputation using medium- or low-coverage sequence data. BMC Genet. 16:82.
2014 VanRaden, P.M., and C. Sun. Fast imputation using medium- or low-coverage sequence data. Proc. 10th World Congr. Genet. Appl. Livest. Prod., 179.
2013 VanRaden, P.M., D.J. Null, M. Sargolzaei, G.R. Wiggans, M.E. Tooker, J.B. Cole, T.S. Sonstegard, E.E. Connor, M. Winters, J.B.C.H.M. van Kaam, A. Valenti, B.J. Van Doormaal, M.A. Faust, and G.A. Doak. Genomic imputation and evaluation using high-density Holstein genotypes. J. Dairy Sci. 96:668–678.
2011 VanRaden, P.M., J.R. O'Connell, G.R. Wiggans, and K.A. Weigel. Genomic evaluations with many more genotypes. Genet. Sel. Evol. 43:10.
2010 VanRaden, P.M. Genomic evaluations with many more genotypes and phenotypes. Proc. 9th World Congr. Genet. Appl. Livest. Prod., Leipzig, Germany, Aug. 1–6, Comm. 27.

2010 VanRaden, P.M., J.R. O'Connell, G.R. Wiggans, and K.A. Weigel. Combining different marker densities in genomic evaluation. Interbull Bull. 42:113–118.

License
Fortran package findhap.f90 is public domain and was developed with U.S. taxpayer funding. Accurate results are not guaranteed. Please report any bugs to paul.vanraden@usda.gov. You may modify, improve, use, and redistribute the code to anyone for any purpose. Or, you can ask Paul to make changes that could benefit U.S. evaluations and other users.

 Paul VanRaden
 Animal Genomics and Improvement Laboratory
 Agricultural Research Service, USDA