lbcb-sci/I002C

Telomere-to-Telomere diploid Indian Genome

We have sequenced an EBV-immortalized human male cell line from SG10k samples on several platforms. For the child sample (I002C), the data comprise ~106x PacBio HiFi, ~64x Oxford Nanopore (ONT) Duplex, ~222x ONT Ultralong (ULONT), ~100x MGI WGS short reads, ~33x Illumina WGS short reads and ~120x Omni-C. Statistics of BioNano will be added soon.

For parental samples:

| SampleType | SampleID | HiFi (REVIO) | Duplex | MGI |
|---|---|---|---|---|
| Father | I002A | 60x | 21x | 35x |
| Mother | I002B | 61x | 20x | 37x |

The data statistics are provided in an Excel file.

Reads

Reads can be downloaded from SRA under PRJNA1150503. Please cite the following article if you use the dataset:

Sarashetti, P., Lipovac, J., Tomas, F. et al. Evaluating data requirements for high-quality haplotype-resolved genomes for creating robust pangenome references. Genome Biol 25, 312 (2024). https://doi.org/10.1186/s13059-024-03452-y

Assembly releases

v0.7

The latest version of the assembly, with a combined QV of 82.

Annotations

▪️ Gene Annotations

| File | Description | Link |
|---|---|---|
| I002C_Maternal_v0.7_LiftOver.gff3.gz | GRCh38 Gencode (v48) annotation for maternal haplotype | ⬇️download |
| I002C_Maternal_v0.7_bambu.gtf.gz | bambu annotations for maternal haplotype | ⬇️download |
| I002C_Paternal_v0.7_LiftOver.gff3.gz | GRCh38 Gencode (v48) annotation for paternal haplotype | ⬇️download |
| I002C_Paternal_v0.7_bambu.gtf.gz | bambu annotations for paternal haplotype | ⬇️download |

▪️ Repeat Annotations

Method: RepeatMasker (v4.1.5) using Dfam (v3.7)

| File | Description | Link |
|---|---|---|
| I002C_Maternal_v0.7.fasta.out.gz | Maternal RM annotation in default .out format | ⬇️download |
| I002C_Maternal_v0.7.fasta_rm.bed.gz | Maternal RM annotation in .bed format | ⬇️download |
| I002C_Paternal_v0.7.fasta.out.gz | Paternal RM annotation in default .out format | ⬇️download |
| I002C_Paternal_v0.7.fasta_rm.bed.gz | Paternal RM annotation in .bed format | ⬇️download |
| I002C_Maternal_v0.7_Centromere.bed | Maternal centromere annotations | ⬇️download |
| I002C_Paternal_v0.7_Centromere.bed | Paternal centromere annotations | ⬇️download |

▪️ Chain files

| File | Description | Link |
|---|---|---|
| GRCh38.p14 (GENCODE v48) <-> I002C | | |
| HG38ToMat.chain.gz | hg38 to maternal haplotype | ⬇️download |
| HG38ToPat.chain.gz | hg38 to paternal haplotype | ⬇️download |
| HG38ToHaploid.chain.gz | hg38 to haploid haplotype | ⬇️download |
| MatToHG38.chain.gz | Maternal haplotype to hg38 | ⬇️download |
| PatToHG38.chain.gz | Paternal haplotype to hg38 | ⬇️download |
| HaploidToHG38.chain.gz | Haploid haplotype to hg38 | ⬇️download |
| CHM13 (v2.0) <-> I002C | | |
| CHM13ToMat.chain.gz | CHM13 to maternal haplotype | ⬇️download |
| CHM13ToPat.chain.gz | CHM13 to paternal haplotype | ⬇️download |
| CHM13ToHaploid.chain.gz | CHM13 to haploid haplotype | ⬇️download |
| MatToCHM13.chain.gz | Maternal haplotype to CHM13 | ⬇️download |
| PatToCHM13.chain.gz | Paternal haplotype to CHM13 | ⬇️download |
| HaploidToCHM13.chain.gz | Haploid haplotype to CHM13 | ⬇️download |
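
As a quick sanity check after downloading a chain file, one can list the header line of each chain block. The sketch below assumes the files follow the UCSC chain format, in which each block starts with a 13-field `chain` line; the helper names and the sample contig names are hypothetical and are not taken from the I002C files.

```python
import gzip
from typing import Iterator, NamedTuple


class ChainHeader(NamedTuple):
    """Fields of a UCSC-style chain header line, in file order."""
    score: int
    t_name: str
    t_size: int
    t_strand: str
    t_start: int
    t_end: int
    q_name: str
    q_size: int
    q_strand: str
    q_start: int
    q_end: int
    chain_id: str


def parse_chain_header(line: str) -> ChainHeader:
    """Parse one 'chain ...' header line into a ChainHeader."""
    f = line.split()
    if len(f) != 13 or f[0] != "chain":
        raise ValueError(f"not a chain header line: {line!r}")
    return ChainHeader(
        int(f[1]), f[2], int(f[3]), f[4], int(f[5]), int(f[6]),
        f[7], int(f[8]), f[9], int(f[10]), int(f[11]), f[12],
    )


def iter_chain_headers(path: str) -> Iterator[ChainHeader]:
    """Yield the header of every chain block in a gzipped chain file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("chain"):
                yield parse_chain_header(line)


# Hypothetical header line; the query contig name and size are made up.
example = "chain 1000 chr1 248956422 + 0 1000 chr1_example 250000000 + 0 1000 1"
print(parse_chain_header(example))
```

For a real file, `iter_chain_headers("HG38ToMat.chain.gz")` would iterate over the headers once the file has been downloaded to the working directory.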

v0.4

This assembly version is the result of two rounds of polishing, following the procedure described in footnote 1.

The v0.4 assembly has improved QV values: 72.35 for the maternal haplotype and 70.97 for the paternal haplotype. QV values are estimated using hybrid k-mers generated from PacBio HiFi and MGI WGS data, as described in footnote 2. Chromosome-wise QV values are listed in an Excel file.
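
To put these QV figures in perspective, the sketch below converts a Phred-scaled quality value into an approximate per-base error probability. It assumes the usual definition QV = -10 * log10(p); the function names are illustrative and are not part of any tool used for this assembly.

```python
import math


def qv_to_error_rate(qv: float) -> float:
    """Per-base error probability implied by a Phred-scaled QV."""
    return 10 ** (-qv / 10)


def error_rate_to_qv(p: float) -> float:
    """Phred-scaled QV implied by a per-base error probability."""
    return -10 * math.log10(p)


# QV values quoted above for the v0.4 assembly.
for label, qv in [("Maternal v0.4", 72.35), ("Paternal v0.4", 70.97)]:
    p = qv_to_error_rate(qv)
    print(f"{label}: QV {qv} ~ 1 error per {1 / p:,.0f} bases")
```

Each step of 10 in QV corresponds to a tenfold drop in the estimated error rate, so a QV above 70 implies fewer than one error per ten million bases under this definition.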

v0.2

This version of the assembly contains Telomere-to-Telomere chromosomes for both the maternal and paternal haplotypes, as well as a mitochondrial genome. In this version, rDNAs have not been resolved.

| | Maternal | Paternal |
|---|---|---|
| T2T Chromosomes | 23 | 23 |
| Size (bp) | 3,022,465,370 | 2,934,829,127 |
| NG50 (bp) | 154,891,367 | 146,273,588 |
| %GC | 40.82 | 40.79 |

Assembly files (zipped):

Downloading

If you wish to download the files using wget, you may use wget -O <fileName> -U "Mozilla/5.0" --no-check-certificate <link>
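
When several files need to be fetched, the same wget invocation can be scripted. The sketch below only wraps the command shown above; the function names and the example link are hypothetical placeholders, and wget is assumed to be installed and on the PATH.

```python
import shlex
import subprocess


def build_wget_command(link: str, file_name: str) -> list:
    """Return the argument list for the wget call described above."""
    return [
        "wget",
        "-O", file_name,           # output file name
        "-U", "Mozilla/5.0",       # user-agent string from the README
        "--no-check-certificate",  # skip TLS certificate verification
        link,
    ]


def download(link: str, file_name: str) -> None:
    """Run wget and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_wget_command(link, file_name), check=True)


# Placeholder link for illustration only; substitute a real download link.
cmd = build_wget_command("https://example.org/file.gz", "file.gz")
print(shlex.join(cmd))
```

Note that `--no-check-certificate` disables certificate validation, so it is worth comparing file sizes or checksums after download where these are available.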

Assembly QV is calculated with the yak tool using the I002C MGI WGS dataset. Per-chromosome QV values are provided in an Excel file.

Note: The data available on this GitHub page can be accessed from either the LHG or the LBCB GitHub pages.

Footnotes

  1. Mc Cartney, Ann M., et al. "Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies." Nature Methods 19.6 (2022): 687-695.

  2. https://github.com/arangrhie/T2T-Polish/tree/master/merqury
