Rice Genome Annotation Project

Rice Genome Annotation Project Data Download

This page contains download links for the IRGSP Rice Os-Nipponbare-Reference-IRGSP-1.0 genome assembly and the UGA (formerly MSU) Rice Genome Annotation Release 7. We have updated the data files to give them more user-friendly names and to separate the gene models from the TE-related elements.

The original files will still be maintained on the legacy download / FTP site: https://rice.uga.edu/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/

Researchers who wish to cite the MSU Rice Genome Annotation Project website are encouraged to refer to this publication:

Kawahara, Y., de la Bastide, M., Hamilton, J.P. et al. Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data. Rice 6, 4 (2013). https://doi.org/10.1186/1939-8433-6-4

Rice Genome Annotation Project - Release 7

Release Date: October 31, 2011
Updated files released: September 1, 2024

Rice Os-Nipponbare-Reference-IRGSP-1.0 Genome Assembly:

osa1_r7.asm.fa.gz - Genome assembly
osa1_r7.asm.chrs.fa.gz - Genome assembly - Chromosomes 1-12 only
osa1_r7.asm.repeat_masked.fa.gz - Repeat Masked Genome assembly
osa1_r7.asm.repeat_masked.gff3.gz - Putative repetitive elements in the genome assembly (GFF3 format)
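
The FASTA files listed on this page are gzip-compressed and can be read without unpacking them first. The following is a minimal Python sketch, not an official tool of this project: the function names parse_fasta and sequence_lengths are hypothetical, and the sketch assumes standard FASTA layout in which the sequence identifier is the first whitespace-delimited token of each header line.

```python
import gzip
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from an iterable of FASTA lines."""
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Assumption: the identifier is the first token of the header.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def sequence_lengths(path: str) -> Dict[str, int]:
    """Return the length of every sequence in a gzip-compressed FASTA file."""
    with gzip.open(path, "rt") as handle:
        return {name: len(seq) for name, seq in parse_fasta(handle)}
```

For example, sequence_lengths("osa1_r7.asm.chrs.fa.gz") would return one entry per sequence in the chromosomes-only assembly once that file has been downloaded to the working directory.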

Genome Annotation:

Gene Model Set (non TE-related)

osa1_r7.gene_models.cdna.fa.gz - Transcript sequences (cDNA) of the gene models
osa1_r7.gene_models.cds.fa.gz - Coding sequences (CDS) of the gene models
osa1_r7.gene_models.pep.fa.gz - Protein sequences of the gene models
osa1_r7.loci.genomic.fa.gz - Genomic sequences for each locus

Representative Gene Model Set (non TE-related)

The representative gene models are a subset of the gene model set. Each representative gene model is the isoform with the longest CDS at each locus.

osa1_r7.gene_models.repr.cdna.fa.gz - Transcript sequences (cDNA) of the representative gene models
osa1_r7.gene_models.repr.cds.fa.gz - Coding sequences (CDS) of the representative gene models
osa1_r7.gene_models.repr.pep.fa.gz - Protein sequences of the representative gene models
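
The selection rule described above (the isoform with the longest CDS at each locus) can be illustrated with a short Python sketch. This is not the project's own pipeline. It assumes that a gene model identifier is the locus identifier followed by a ".N" isoform suffix (for example LOC_Os01g01010.1), and it keeps the first isoform encountered when two CDS lengths tie, which is an arbitrary choice made for this illustration. The function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def locus_of(model_id: str) -> str:
    """Strip the isoform suffix from a gene model identifier.

    Assumption: identifiers look like "<locus>.<isoform number>".
    """
    return model_id.rsplit(".", 1)[0]


def pick_representatives(cds_records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each locus to the gene model with the longest CDS.

    cds_records is an iterable of (gene model identifier, CDS sequence) pairs.
    Ties keep the first model seen (an assumption of this sketch).
    """
    best: Dict[str, Tuple[str, int]] = {}
    for model_id, seq in cds_records:
        locus = locus_of(model_id)
        if locus not in best or len(seq) > best[locus][1]:
            best[locus] = (model_id, len(seq))
    return {locus: model_id for locus, (model_id, _) in best.items()}
```

The input pairs could come from any FASTA reader applied to osa1_r7.gene_models.cds.fa.gz. Whether the result matches the distributed representative set exactly depends on how ties were resolved in the original release, which this page does not state.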

TE-related Gene Model Set

osa1_r7.te_related_models.cdna.fa.gz - Transcript sequences (cDNA) of the TE-related gene models
osa1_r7.te_related_models.cds.fa.gz - Coding sequences (CDS) of the TE-related gene models
osa1_r7.te_related_models.pep.fa.gz - Protein sequences of the TE-related gene models
osa1_r7.te_related_loci.genomic.fa.gz - Genomic sequences for each TE-related locus

All gene models (non TE-related and TE-related combined)

osa1_r7.all_models.cdna.fa.gz - Transcript sequences (cDNA) of all gene models
osa1_r7.all_models.cds.fa.gz - Coding sequences (CDS) of all gene models
osa1_r7.all_models.pep.fa.gz - Protein sequences of all gene models
osa1_r7.all_loci.genomic.fa.gz - Genomic sequences of all loci
osa1_r7.all_models.gff3.gz - Annotation of all gene models in GFF3 format
osa1_r7.all_models.functional_annotation.txt.gz - Functional annotation for all gene models
osa1_r7.all_models.GOSlim.txt.gz - GOSlim annotation for all gene models
osa1_r7.all_models.interpro.txt.gz - InterPro annotation for all gene models
osa1_r7.all_models.pfam.txt.gz - Pfam annotation for all gene models
osa1_r7.locus_brief_info.txt.gz - Legacy file summarizing the annotation for all the gene models
osa1_r7.all_models.gene_exp_matrix.txt.gz - Gene expression matrix (TPM) for all gene models
osa1_r7.all_models.gene_exp_matrix.xlsx - Gene expression matrix (TPM) for all gene models, Excel format
osa1_r7.all_models.coexpression_modules.txt.gz - Gene coexpression module assignments for all gene models
osa1_r7.all_models.coexpression_modules.xlsx - Gene coexpression module assignments for all gene models, Excel format
osa1_r7.coexpression_module_list.txt.gz - List of all coexpression modules
osa1_r7.coexpression_module_list.xlsx - List of all coexpression modules, Excel format
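
The GFF3 file listed above follows the standard nine-column, tab-separated GFF3 layout, so it can be read with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and nothing is assumed about which feature types or attribute keys this particular file contains beyond what the GFF3 format itself defines.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Iterator
from urllib.parse import unquote

# Names of the first eight GFF3 columns; the ninth holds the attributes.
GFF3_COLUMNS = ("seqid", "source", "type", "start", "end", "score", "strand", "phase")


def parse_gff3(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one dictionary per feature line of a GFF3 file."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # an embedded FASTA section ends the feature table
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives, and comments
        fields = line.split("\t")
        if len(fields) != 9:
            continue  # skip malformed lines rather than guessing
        record: Dict[str, Any] = dict(zip(GFF3_COLUMNS, fields[:8]))
        record["start"] = int(record["start"])
        record["end"] = int(record["end"])
        attributes = {}
        for pair in fields[8].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                # GFF3 attribute values are percent-encoded.
                attributes[key.strip()] = unquote(value.strip())
        record["attributes"] = attributes
        yield record


def count_feature_types(lines: Iterable[str]) -> Counter:
    """Count how many features of each type (column 3) are present."""
    return Counter(record["type"] for record in parse_gff3(lines))
```

To run this on the downloaded file, open it with gzip.open("osa1_r7.all_models.gff3.gz", "rt") from the standard library and pass the resulting handle to count_feature_types.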

Contact:

Rice Genome Annotation Project, UGA - buell.lab.web@gmail.com

Dr. C. Robin Buell, UGA - Robin.Buell@uga.edu

This work is supported by grants (DBI-0321538/DBI-0834043) from the National Science Foundation and funds from the Georgia Research Alliance, Georgia Seed Development, and University of Georgia.