Pedigree Simulator

Pedigree Simulator is an improved version of a tool originally developed by Staples et al. (2014) to generate simulated pedigrees for benchmarking PRIMUS in its original publication. This tool also uses code from the IBDsims program developed by Morrison (2013). We developed this version to add compute optimizations so that it can be used in a variety of contexts, and specifically for benchmarking COMPADRE, a tool that unifies PRIMUS, ERSA, and PADRE.

Installation

First, click the green Code button at the top of this page and select a cloning option.

Dependencies and reference data are installed with Docker, which must first be installed and running on your machine.

Navigate into the project directory cloned from GitHub:

cd pedigree-simulator

Build the Docker image:

docker build -t pedigreesim .

Note: The build takes 10-20 minutes because of the size of the reference data being downloaded (approximately 40 GB after decompression).

Execution

After building the Docker image, start a container and open a shell in it with docker run. In the same command, mount a local folder for writing the output using the -v flag.

Note: Provide the absolute path to your local output folder in this flag. Docker treats a bare name such as output as a named volume rather than a folder on your machine. Even if you run docker run from the top level of the repository folder, output alone will not work; you must write at least ./output.

docker run -v \
    /your/path/to/pedigree-simulator/output:/usr/src/output \
    -it --entrypoint /bin/bash pedigreesim:latest
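If you script your runs, one way to meet the absolute-path requirement is to resolve the output folder before calling Docker. The Python sketch below uses a hypothetical helper, docker_run_args, which is not part of this repository. It reproduces the command above with the host path resolved to an absolute path. It only builds the argument list and does not run Docker.

```python
from __future__ import annotations

from pathlib import Path


def docker_run_args(host_output: str, image: str = "pedigreesim:latest") -> list[str]:
    """Build the docker run argument list shown in this README.

    The host output folder is resolved to an absolute path so Docker
    bind-mounts the folder instead of creating a named volume.
    """
    host = Path(host_output).expanduser().resolve()  # absolute, normalized path
    return [
        "docker", "run",
        "-v", f"{host}:/usr/src/output",   # host folder -> container output folder
        "-it", "--entrypoint", "/bin/bash",
        image,
    ]


if __name__ == "__main__":
    # Print the command for ./output relative to the current directory.
    print(" ".join(docker_run_args("output")))
```

You could pass the returned list to subprocess.run. Because the command uses -it, it must be launched from an interactive terminal.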

Once inside the container, you can run the tool from the command line:

perl main.pl 100 uniform3 20 EUR parallel

Arguments

The main.pl script takes several positional arguments. The following descriptions use the above command as an example.

Required:

100: The simulation "number," a unique identifier used to name the output folder and files.
uniform3: The simulation "type." Currently, the script supports uniform3, uniform2, and halfsib3. The key distinction is that halfsib3 introduces half-sibling relationships into the pedigree. The trailing number is the average number of offspring per node in the pedigree.
20: The number of individuals in the pedigree.
EUR: The 1000 Genomes superpopulation from which founder genotypes are drawn. Currently, the script supports EUR (European) and AMR (Admixed American) superpopulation seeding.

Optional:

parallel: Runs the genotype adding step in parallel with 22 threads (one per chromosome). This is much faster but very memory-intensive, so it is not recommended outside of HPC or server environments. If this argument is omitted, the genotype adding step runs on a single thread.
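For batch runs, such as looping over many simulation numbers, it can help to validate arguments before launching anything. The Python sketch below is hypothetical: main_pl_args is not part of the repository. It checks values against the lists given in this README and assembles the positional arguments in the documented order. main.pl itself may accept or reject other values differently.

```python
from __future__ import annotations

# Supported values as listed in this README (assumption: nothing else is valid).
SIM_TYPES = {"uniform3", "uniform2", "halfsib3"}
SUPERPOPS = {"EUR", "AMR"}


def main_pl_args(sim_number: int, sim_type: str, size: int,
                 superpop: str, parallel: bool = False) -> list[str]:
    """Assemble the positional arguments for `perl main.pl` in documented order."""
    if sim_type not in SIM_TYPES:
        raise ValueError(f"unsupported simulation type: {sim_type!r}")
    if superpop not in SUPERPOPS:
        raise ValueError(f"unsupported superpopulation: {superpop!r}")
    if size <= 0:
        raise ValueError("pedigree size must be a positive integer")
    args = ["perl", "main.pl", str(sim_number), sim_type, str(size), superpop]
    if parallel:
        # Optional final argument: 22 threads, very RAM intensive.
        args.append("parallel")
    return args


if __name__ == "__main__":
    # Reproduces the example command from the Execution section.
    print(" ".join(main_pl_args(100, "uniform3", 20, "EUR", parallel=True)))
```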

Notes

This tool generates a full pedigree as well as versions with incrementally more nodes removed (up to 20% of all pedigree nodes by default). For example, the output for a 20-person pedigree will contain versions with up to 4 nodes missing. This behavior is left over from our development for COMPADRE benchmarking, where we evaluated pedigree reconstruction success as pedigrees became sparser.

To change the maximum percentage of nodes removed in this incremental process, update the global $missing_denominator variable on line 45 of src/main.pl before building the Docker image. The default value of 5 divides the total pedigree size by 5, so the last incrementally missing version of the pedigree is missing 1/5 (20%) of all nodes. For more than 20% missingness, decrease the value (for example, 4 removes up to 25% and 2 up to 50%); for less, increase it.
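The arithmetic behind $missing_denominator can be sketched in Python. This is a hypothetical illustration, not code from src/main.pl. It assumes that the maximum number of removed nodes is the pedigree size divided by the denominator, rounded down. It also assumes that each successive version removes one more node than the previous one.

```python
from __future__ import annotations


def missingness_schedule(pedigree_size: int, missing_denominator: int = 5) -> list[int]:
    """Return the number of removed nodes in each incrementally missing version.

    Assumptions (not taken from src/main.pl): the maximum is
    pedigree_size // missing_denominator (integer division), and versions
    step up one node at a time starting from 1.
    """
    if missing_denominator <= 0:
        raise ValueError("missing_denominator must be positive")
    max_missing = pedigree_size // missing_denominator
    return list(range(1, max_missing + 1))


if __name__ == "__main__":
    # Compare the default denominator with the smaller values mentioned above.
    for denom in (5, 4, 2):
        steps = missingness_schedule(20, denom)
        print(f"denominator {denom}: up to {steps[-1]} of 20 nodes "
              f"({steps[-1] / 20:.0%}) removed")
```

For a 20-node pedigree, this prints a maximum of 4 (20%), 5 (25%), and 10 (50%) removed nodes for denominators 5, 4, and 2. Under the rounding-down assumption, a pedigree smaller than the denominator would get no missing versions at all.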

IBD segment generation

Tools like COMPADRE use shared IBD segments alongside standard genotype data. In our benchmarking of COMPADRE, we generated IBD segments by first phasing the simulated output VCF files with SHAPEIT5 (using unique BioVU haplotypes as a reference panel), then detecting segments with GERMLINE2. A basic script in the tools/ folder shows the command structure we used. Note that this script expects a .env file containing several executable paths (e.g., for SHAPEIT, GERMLINE, PLINK2, and Python 3) as well as the path to a folder of genetic map files.
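Before running the script in tools/, you may want to confirm that every path in .env resolves. The Python sketch below assumes a simple KEY=VALUE format. The key names (SHAPEIT, GERMLINE, PLINK2, PYTHON3, GENETIC_MAP_DIR) are hypothetical placeholders, so match them to the variables the script actually reads.

```python
from __future__ import annotations

import os
import shutil
from pathlib import Path

# Hypothetical key names; replace them with the ones used by the tools/ script.
REQUIRED_EXECUTABLES = ["SHAPEIT", "GERMLINE", "PLINK2", "PYTHON3"]
REQUIRED_DIRS = ["GENETIC_MAP_DIR"]


def read_env(path: str | Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    env: dict[str, str] = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def find_problems(env: dict[str, str]) -> list[str]:
    """Return human-readable problems with the configured paths."""
    problems = []
    for key in REQUIRED_EXECUTABLES:
        value = env.get(key)
        if not value:
            problems.append(f"{key} is not set")
        elif shutil.which(value) is None:  # accepts a full path or a name on PATH
            problems.append(f"{key}={value} is not an executable file")
    for key in REQUIRED_DIRS:
        value = env.get(key)
        if not value:
            problems.append(f"{key} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{key}={value} is not a directory")
    return problems


if __name__ == "__main__":
    env_file = Path(".env")
    if env_file.exists():
        for problem in find_problems(read_env(env_file)):
            print(problem)
    else:
        print(".env not found in the current directory")
```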

Another important note about IBD segment simulation: if you use tools like GERMLINE2 that expect phased input data, you may want to phase against a haplotype reference other than the provided 1000 Genomes data. Because founder genotypes are drawn from 1000 Genomes, using the same data as the reference panel creates overlap between the reference and the samples being phased. For example, in our benchmarking of COMPADRE, we used phased, population-matched BioVU data as the haplotype reference.

Alternatively, you can perform phase-free segment detection with tools like IBIS. This is useful if you are evaluating tools like Bonsai that expect unphased data.

Questions?

Please email contact AT compadre DOT dev with the subject line "Pedigree Simulator Help", or open an issue or pull request on GitHub.

If you use this tool in your research, please cite the following:

Evans, G. F., Baker, J. T., Petty, L. E., ... & Below, J. E. (2025). COMPADRE: 
Combined pedigree-aware distant relatedness estimation for improved pedigree reconstruction. 
The American Journal of Human Genetics. DOI: 10.1016/j.ajhg.2025.09.011

License

Pedigree Simulator was developed by the Below Lab in the Division of Genetic Medicine at Vanderbilt University Medical Center, Nashville, TN, USA.

Pedigree Simulator is distributed under the Apache 2.0 license: https://compadre.dev/licenses/sim_license.txt
