Information about assembly Zm-EP1-REFERENCE-TUM-1.0
(also known as EP1)
Genome Sequencing Project Information
The enormous diversity of maize is reflected by a large number of SNPs and substantial structural variation. To remedy the scarcity of sequence resources for the Flint pool, a reference sequence was generated de novo from inbred line EP1. The EP1 reference sequence complements the maize pan-genome with European Flint diversity.
This project is part of the European Maize project (http://www.europeanmaize.net/). The work was funded by the Bavarian State Ministry of the Environment and Consumer Protection (Project BayKlimaFit; http://www.bayklimafit.de/).
European Flint reference sequences complement the maize pan-genome.
Unterseer, Sandra*; Seidel, Michael A.*; Bauer, Eva; Haberer, Georg; Hochholdinger, Frank; Opitz, Nina; Marcon, Caroline; Baruch, Kobi; Spannagl, Manuel; Mayer, Klaus F.X.; Schön, Chris-Carolin
* These authors contributed equally.
Assembly methods: NRGene de novo assembly. The assembly was done using DeNovoMAGIC 2.0, after which NRGene’s internal maize ancestral genome was used to build pseudochromosomes from the de novo assembled scaffolds.
Construction of pseudomolecules: Yes
Scaffolds: Ns are inserted between contigs using paired-end and mate-pair information, and the number of Ns is determined by the estimated insert sizes.
Negative gaps: Mate-pair and paired-end information is used to estimate the unfilled gap sizes in the scaffolds. In cases where the linking information indicated a "negative" gap size (a gap of undetermined size), an artificial gap size of 10 Ns is used.
Pseudomolecules: Scaffolds in all the rest of the chromosomes are separated by 100 Ns. The unfilled gaps within scaffolds are represented by a variable number of Ns according to the estimated gap size between their contigs.
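The gap conventions above can be inspected directly in a sequence. The following is a minimal sketch, not part of the assembly pipeline: the function names `find_gaps`, `split_contigs` and `describe_gap` are hypothetical, and the labels are only hints, because an estimated gap can also happen to be exactly 10 or 100 Ns long.

```python
import re
from typing import List, Tuple

GAP_RE = re.compile(r"[Nn]+")  # a gap is any uninterrupted run of N characters


def find_gaps(seq: str) -> List[Tuple[int, int]]:
    """Return (0-based start, length) for every run of Ns in seq."""
    return [(m.start(), m.end() - m.start()) for m in GAP_RE.finditer(seq)]


def split_contigs(seq: str) -> List[str]:
    """Split a scaffold or pseudomolecule into its gap-free pieces."""
    return [piece for piece in GAP_RE.split(seq) if piece]


def describe_gap(length: int) -> str:
    """Label a gap by length, following the conventions described in the text.

    Length alone is ambiguous, so treat the result as a hint only.
    """
    if length == 10:
        return "possible placeholder for a gap of undetermined size"
    if length == 100:
        return "possible join between scaffolds"
    return "gap sized from paired-end/mate-pair estimates"


if __name__ == "__main__":
    # Toy sequence, invented for illustration only.
    toy = "ACGT" + "N" * 10 + "GGCC" + "N" * 100 + "TTAA" + "N" * 37 + "AC"
    for start, length in find_gaps(toy):
        print(start, length, describe_gap(length))
```

Splitting on N runs recovers the contig-level pieces, which is how contig statistics can be derived from a scaffold-level FASTA file.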
Total number of scaffolds in assembly.
Longest scaffold in assembly.
N50 scaff length: The length of the scaffold that takes the cumulative length (summing from longest to shortest scaffold) past 50% of the total assembly size.
N50 scaff count: How many scaffolds are counted in reaching the N50 threshold.
N90 scaff length: The length of the scaffold that takes the cumulative length (summing from longest to shortest scaffold) past 90% of the total assembly size.
N90 scaff count: How many scaffolds are counted in reaching the N90 threshold.
The longest contig.
N50 contig length: The length of the contig that takes the cumulative length (summing from longest to shortest contig) past 50% of the total assembly size.
N50 contig count: How many contigs are counted in reaching the N50 threshold.
N90 contig length: The length of the contig that takes the cumulative length (summing from longest to shortest contig) past 90% of the total assembly size.
N90 contig count: How many contigs are counted in reaching the N90 threshold.
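The N50 and N90 definitions above translate into a short calculation. The sketch below is illustrative only: the function name `nx_stat` and the toy lengths are assumptions, and it treats reaching the threshold exactly as having passed it, which is the common convention.

```python
from typing import List, Tuple


def nx_stat(lengths: List[int], percent: int) -> Tuple[int, int]:
    """Return (Nx length, Nx count) for scaffold or contig lengths.

    percent is 50 for N50, 90 for N90, and so on.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence to the shortest, accumulating length.
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        # Integer comparison avoids floating-point rounding at the threshold.
        if running * 100 >= percent * total:
            return length, count
    raise AssertionError("unreachable: the full sum always reaches the threshold")


if __name__ == "__main__":
    toy_lengths = [80, 70, 50, 40, 30, 20, 10]  # invented values, total = 300
    print(nx_stat(toy_lengths, 50))  # (70, 2)
    print(nx_stat(toy_lengths, 90))  # (30, 5)
```

In the toy example the two longest sequences (80 + 70 = 150) reach half of the total of 300, so the N50 length is 70 and the N50 count is 2.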
A contig is a contiguous consensus sequence that is derived from a collection of overlapping reads.
A scaffold is a set of ordered and oriented contigs that are linked to one another by mate pairs of sequencing reads.
Annotation version 1.0 was produced in a two-step process using an in-house annotation pipeline at PGSB (Plant Genome and Systems Biology, Helmholtz Zentrum München). In step 1, transcriptome assemblies were made from a pooled RNA-seq library of 27 different tissues/conditions using Bridger and Trinity. These assemblies were unified with the Evigene5 pipeline to retrieve a transcript set for each line. The transcriptome assemblies were then used together with public transcriptome assemblies and proteome data from Sb (Sorghum bicolor), Bd (Brachypodium distachyon), Os (Oryza sativa), B73_v4 and PH207 to identify optimal spliced alignments to the reference sequence using GenomeThreader. From these alignments, consensus models were derived and used for AHRD annotation and subsequent filtering for transposons.
In step 2, we made pairwise whole-genome alignments (WGA) between EP1, F7, B73_v4 and PH207 to identify syntenic WGA blocks. Coding sequences of each line were mapped to all other lines. If a coding sequence of another line mapped with high confidence in a syntenic block where initially no gene was annotated, we kept the gene as a novel gene model, provided that a stringent bit-score threshold was surpassed and coverage was >0.97. These novel gene models were added to those of step 1 to provide annotation v1.0.
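The step 2 selection rule can be sketched as a simple filter. This is not the PGSB pipeline: the `MappedCds` record, its field names and the function `select_novel_models` are hypothetical. The text does not state the bit-score threshold, so it is a required argument here and the value in the usage example is arbitrary. Only the coverage cutoff of 0.97 comes from the description above.

```python
from dataclasses import dataclass
from typing import Iterable, List

MIN_COVERAGE = 0.97  # from the text: coverage must be greater than 0.97


@dataclass(frozen=True)
class MappedCds:
    """One coding sequence from another line mapped onto the target genome."""
    cds_id: str
    in_syntenic_block: bool        # mapping lies inside a syntenic WGA block
    overlaps_annotated_gene: bool  # a step 1 gene model already exists there
    bit_score: float
    coverage: float                # fraction of the CDS covered by the mapping


def select_novel_models(hits: Iterable[MappedCds],
                        min_bit_score: float) -> List[MappedCds]:
    """Keep mappings that qualify as novel gene models under the stated rule."""
    return [
        h for h in hits
        if h.in_syntenic_block
        and not h.overlaps_annotated_gene
        and h.bit_score > min_bit_score   # threshold value is an assumption
        and h.coverage > MIN_COVERAGE
    ]


if __name__ == "__main__":
    # Invented records for illustration only.
    hits = [
        MappedCds("cds_a", True, False, 350.0, 0.99),   # kept
        MappedCds("cds_b", True, True, 400.0, 1.00),    # gene already annotated
        MappedCds("cds_c", False, False, 500.0, 0.99),  # not in a syntenic block
        MappedCds("cds_d", True, False, 350.0, 0.97),   # coverage not above 0.97
    ]
    print([h.cds_id for h in select_novel_models(hits, min_bit_score=200.0)])
```

Keeping the bit-score threshold as an explicit parameter makes the unstated value visible rather than hiding a guess inside the function.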