Rice Genome Annotation Project

Current Rice Genome Pseudomolecules Release

We are pleased to announce release 7 of the Rice Pseudomolecules and Genome Annotation. The official release date for this version was October 31, 2011.

Release 7 is a major update from release 6.1. The rice pseudomolecules have been reconstructed using an optimal BAC tiling path, which involved use of a BAC-optical map and error correction of the underlying BAC sequence using next generation sequencing reads from Nipponbare rice. This effort, in cooperation with researchers at the Agrogenomics Research Center at the National Institute of Agrobiological Sciences, Tsukuba, Japan and the Rice Annotation Project Database (RAP-DB), represents a final and unified set of pseudomolecules (Os-Nipponbare-Reference-IRGSP-1.0). The set comprises the 12 chromosomes, one pseudomolecule representing the unanchored BAC clones, one pseudomolecule representing unmapped Syngenta sequences, and the two organellar genomes. Note that while the Rice Genome Annotation Project (RGAP) and the Rice Annotation Project Database (RAP-DB) have different annotation efforts, these parallel annotation efforts utilize the same underlying pseudomolecule sequence.

In release 7, there are 373,245,519 bp of non-overlapping rice genome sequence from the 12 rice chromosomes. The genes identified in release 6.1 were remapped and transferred to release 7. This process identified 55,986 genes (loci), of which 6,457 have 10,352 additional alternative splicing isoforms, resulting in a total of 66,338 transcripts (or gene models) in the rice genome. Note that small gene models (<50 amino acids) have been excluded from our annotated gene set.
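The transcript total follows from counting one primary gene model per locus plus every additional isoform. The short Python sketch below reproduces that arithmetic from the figures quoted above; the constant and function names are illustrative only and are not part of any RGAP tool.

```python
# Counts quoted in the release 7 summary above.
LOCI = 55_986                 # genes (loci)
LOCI_WITH_ISOFORMS = 6_457    # loci with at least one additional isoform
ADDITIONAL_ISOFORMS = 10_352  # isoforms beyond the first model per locus


def total_transcripts(loci: int, additional_isoforms: int) -> int:
    """One primary model per locus plus all additional isoforms."""
    return loci + additional_isoforms


def fraction_with_isoforms(loci: int, loci_with_isoforms: int) -> float:
    """Share of loci that carry more than one gene model."""
    return loci_with_isoforms / loci


if __name__ == "__main__":
    print(total_transcripts(LOCI, ADDITIONAL_ISOFORMS))  # 66338
    print(f"{fraction_with_isoforms(LOCI, LOCI_WITH_ISOFORMS):.1%}")
```

Under these figures, roughly one locus in nine has more than one annotated gene model.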

Transposable element-related (TE-related) gene models were identified using two approaches: TBLASTN searches against the MSU Oryza Repeat Database and identification of gene models containing TE-related Pfam domains. These loci (16,941) and their models (17,272) were annotated based on the Pfam domain or the nomenclature in the MSU Oryza Repeat Database. Pack-MULEs were identified on all 12 chromosomes. They were annotated as described in Hanada et al. 2009. Transduplicate MULEs identified by Juretic et al. 2005 were aligned to the current pseudomolecules. Note that the Jiang Pack-MULEs and the transduplicate MULEs are only identified on the Genome Browser and not in our functional annotation. Also note that loci and gene models on ChrUn and ChrSy are now included in our official gene set but are not assigned LOC_OsXXgXXXXX identifiers. These two pseudomolecules contain 185 loci and gene models.
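For readers who work with the annotation files, the LOC_OsXXgXXXXX template can be read as a two-digit chromosome number, a literal "g", and a five-digit serial. The sketch below parses identifiers on that reading. The optional ".N" gene-model suffix, the `LocusId` type, and the example identifiers are assumptions made for illustration; the text above does not specify them.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the LOC_OsXXgXXXXX template: two digits for the
# chromosome, a literal "g", then five digits. The optional ".N" suffix
# for a gene model (isoform) is an assumption, not stated in the text.
LOCUS_RE = re.compile(r"LOC_Os(\d{2})g(\d{5})(?:\.(\d+))?")


class LocusId(NamedTuple):
    chromosome: int
    serial: int
    model: Optional[int]  # None when no ".N" suffix is present


def parse_locus_id(identifier: str) -> Optional[LocusId]:
    """Return the parsed parts, or None if the identifier does not fit."""
    match = LOCUS_RE.fullmatch(identifier)
    if match is None:
        return None
    chrom, serial, model = match.groups()
    return LocusId(int(chrom), int(serial), int(model) if model else None)


if __name__ == "__main__":
    # Example identifiers are made up to exercise the pattern.
    print(parse_locus_id("LOC_Os01g01010"))
    print(parse_locus_id("LOC_Os12g00100.2"))
    print(parse_locus_id("not_a_locus_id"))
```

Identifiers that do not fit the template, such as those for loci on ChrUn and ChrSy, return `None` under this sketch, so callers would need to handle them separately.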

Please note that these pseudomolecules are constructed from finished and unfinished sequence and a majority of the gene models have not been manually curated.

Table of Rice Pseudomolecule, Loci, and Gene Models in Release 7

Chr	BAC/PAC No.	Sequence Length in Pseudomolecule (bp)	Gaps	Genes/Loci^a: TE^b	Genes/Loci^a: Non-TE^c	Genes/Loci^a: Total^d	Gene Models^a: TE^b	Gene Models^a: Non-TE^c	Gene Models^a: Total^d	Download Sequences
1	392	43,270,923	8	1,464	5,078	6,542	1,518	6,518	8,036	Download
2	359	35,937,250	5	1,244	4,143	5,387	1,274	5,392	6,666	Download
3	331	36,413,819	8	1,185	4,388	5,573	1,224	5,803	7,027	Download
4	296	35,502,694	9	1,903	3,419	5,322	1,919	4,265	6,184	Download
5	286	29,958,434	5	1,461	3,118	4,579	1,483	4,009	5,492	Download
6	281	31,248,787	4	1,488	3,236	4,724	1,517	3,965	5,482	Download
7	289	29,697,621	3	1,397	3,065	4,462	1,430	3,767	5,197	Download
8	278	28,443,022	3	1,432	2,762	4,194	1,446	3,426	4,872	Download
9	223	23,012,720	7	1,148	2,260	3,408	1,161	2,768	3,929	Download
10	208	23,207,287	10	1,219	2,298	3,517	1,244	2,830	4,074	Download
11	261	29,021,106	6	1,459	2,707	4,166	1,493	3,208	4,701	Download
12	269	27,531,856	5	1,579	2,443	4,022	1,605	2,983	4,588	Download
Total^e	3,184	373,245,519	73	16,979	39,102	56,081	17,314	49,119	66,433	Download

^a Excluding small gene models (< 50 amino acids).
^b TE: Transposable element-related genes and gene models. The rice proteome was searched against the MSU Oryza Repeat Database with TBLASTN and against the TE-related Pfam domains with hmmpfam. Genes and gene models with matches above the cut-offs were annotated as TE-related.
^c Non-TE: Non-TE related gene models.
^d There are 89 loci and 89 models on ChrSy. There are 96 loci and 96 models on ChrUn. These 185 loci and models are not part of the 12 chromosome rows; the Non-TE and Total figures in the Total row exceed the sums of the chromosome rows by exactly 185.
^e Note that these pseudomolecules are now identical to the IRGSP/RAP pseudomolecules.
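The Total row can be cross-checked against the per-chromosome rows. The sketch below is a minimal check using values transcribed from the table above; it covers only sequence length, gaps, total loci, and total gene models. The names `ROWS`, `TOTAL_ROW`, `column_sums`, and `differences` are illustrative.

```python
# (chromosome, length_bp, gaps, total_loci, total_models), from the table.
ROWS = [
    (1, 43_270_923, 8, 6_542, 8_036),
    (2, 35_937_250, 5, 5_387, 6_666),
    (3, 36_413_819, 8, 5_573, 7_027),
    (4, 35_502_694, 9, 5_322, 6_184),
    (5, 29_958_434, 5, 4_579, 5_492),
    (6, 31_248_787, 4, 4_724, 5_482),
    (7, 29_697_621, 3, 4_462, 5_197),
    (8, 28_443_022, 3, 4_194, 4_872),
    (9, 23_012_720, 7, 3_408, 3_929),
    (10, 23_207_287, 10, 3_517, 4_074),
    (11, 29_021_106, 6, 4_166, 4_701),
    (12, 27_531_856, 5, 4_022, 4_588),
]

# (length_bp, gaps, total_loci, total_models) as printed in the Total row.
TOTAL_ROW = (373_245_519, 73, 56_081, 66_433)


def column_sums(rows):
    """Sum every column except the chromosome number."""
    columns = list(zip(*rows))[1:]
    return tuple(sum(column) for column in columns)


def differences(rows, total_row):
    """Total row minus the per-chromosome sums, column by column."""
    return tuple(t - s for s, t in zip(column_sums(rows), total_row))


if __name__ == "__main__":
    print(column_sums(ROWS))             # (373245519, 73, 55896, 66248)
    print(differences(ROWS, TOTAL_ROW))  # (0, 0, 185, 185)
```

Sequence length and gap counts agree exactly. The loci and gene model totals each differ by 185, which equals the combined ChrSy and ChrUn counts given in footnote d (89 + 96).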

This work is supported by grants (DBI-0321538/DBI-0834043) from the National Science Foundation and funds from the Georgia Research Alliance, Georgia Seed Development, and University of Georgia.