Rice Genome Annotation Project Data Download
This page contains download links for the IRGSP Rice Os-Nipponbare-Reference-IRGSP-1.0 genome assembly and the UGA (formerly MSU) Rice Genome Annotation Release 7. We have updated the data files to give them more user-friendly names and to separate the gene models from the TE-related elements.
The legacy download/FTP site (https://rice.uga.edu/pub/data/Eukaryotic_Projects/o_sativa/annotation_dbs/) with the original files will continue to be maintained.
Researchers who wish to cite the MSU Rice Genome Annotation Project website are encouraged to refer to this publication:
- Kawahara, Y., de la Bastide, M., Hamilton, J.P. et al. Improvement of the Oryza sativa Nipponbare reference genome using next generation sequence and optical map data. Rice 6, 4 (2013). https://doi.org/10.1186/1939-8433-6-4
Rice Genome Annotation Project - Release 7
- Release Date: October 31, 2011
- Updated files released: September 1, 2024
Rice Os-Nipponbare-Reference-IRGSP-1.0 Genome Assembly:
- osa1_r7.asm.fa.gz - Genome assembly
- osa1_r7.asm.chrs.fa.gz - Genome assembly - Chromosomes 1-12 only
- osa1_r7.asm.repeat_masked.fa.gz - Repeat Masked Genome assembly
- osa1_r7.asm.repeat_masked.gff3.gz - Putative repetitive elements in the genome assembly (GFF3 format)
Genome Annotation:
- Gene Model Set (non TE-related)
  - osa1_r7.gene_models.cdna.fa.gz - Transcript sequences (cDNA) of the gene models
  - osa1_r7.gene_models.cds.fa.gz - Coding sequences (CDS) of the gene models
  - osa1_r7.gene_models.pep.fa.gz - Protein sequences of the gene models
  - osa1_r7.loci.genomic.fa.gz - Genomic sequences for each locus
- Representative Gene Model Set (non TE-related)
  - osa1_r7.gene_models.repr.cdna.fa.gz - Transcript sequences (cDNA) of the representative gene models
  - osa1_r7.gene_models.repr.cds.fa.gz - Coding sequences (CDS) of the representative gene models
  - osa1_r7.gene_models.repr.pep.fa.gz - Protein sequences of the representative gene models
- TE-related Gene Model Set
  - osa1_r7.te_related_models.cdna.fa.gz - Transcript sequences (cDNA) of the TE-related gene models
  - osa1_r7.te_related_models.cds.fa.gz - Coding sequences (CDS) of the TE-related gene models
  - osa1_r7.te_related_models.pep.fa.gz - Protein sequences of the TE-related gene models
  - osa1_r7.te_related_loci.genomic.fa.gz - Genomic sequences for each TE-related locus
- All gene models (non TE-related and TE-related combined)
  - osa1_r7.all_models.cdna.fa.gz - Transcript sequences (cDNA) of all gene models
  - osa1_r7.all_models.cds.fa.gz - Coding sequences (CDS) of all gene models
  - osa1_r7.all_models.pep.fa.gz - Protein sequences of all gene models
  - osa1_r7.all_loci.genomic.fa.gz - Genomic sequences of all loci
  - osa1_r7.all_models.gff3.gz - Annotation of all gene models in GFF3 format
  - osa1_r7.all_models.functional_annotation.txt.gz - Functional annotation for all gene models
  - osa1_r7.all_models.GOSlim.txt.gz - GOSlim annotation for all gene models
  - osa1_r7.all_models.interpro.txt.gz - InterPro annotation for all gene models
  - osa1_r7.all_models.pfam.txt.gz - Pfam annotation for all gene models
  - osa1_r7.locus_brief_info.txt.gz - Legacy file summarizing the annotation for all the gene models
  - osa1_r7.all_models.gene_exp_matrix.txt.gz - Gene expression matrix (TPM) for all gene models
  - osa1_r7.all_models.gene_exp_matrix.xlsx - Gene expression matrix (TPM) for all gene models, Excel format
  - osa1_r7.all_models.coexpression_modules.txt.gz - Gene coexpression module assignments for all gene models
  - osa1_r7.all_models.coexpression_modules.xlsx - Gene coexpression module assignments for all gene models, Excel format
  - osa1_r7.coexpression_module_list.txt.gz - List of all coexpression modules
  - osa1_r7.coexpression_module_list.xlsx - List of all coexpression modules, Excel format
The representative gene models are a subset of the gene model set. Each representative gene model is the isoform with the longest CDS at each locus.
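The selection rule described above (longest CDS per locus) can be illustrated with a minimal Python sketch. This is not the project's own pipeline; it is only an illustration under stated assumptions: that a CDS FASTA file such as osa1_r7.gene_models.cds.fa.gz has been downloaded locally, that model identifiers have the form LOC_Os01g01010.1 so that the locus is the part before the final dot, and that ties are broken by keeping the first isoform encountered (the page does not specify a tie-breaking rule). The function names are hypothetical.

```python
import gzip
from typing import Dict, Iterable, Iterator, List, Optional, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # The identifier is the first whitespace-delimited token of the header.
            tokens = line[1:].split()
            name = tokens[0] if tokens else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def locus_of(model_id: str) -> str:
    """Assumption: IDs look like 'LOC_Os01g01010.1'; strip the isoform suffix."""
    return model_id.rsplit(".", 1)[0]


def longest_cds_per_locus(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each locus to the ID of its isoform with the longest CDS.

    Assumption: on ties, the first isoform seen is kept.
    """
    best: Dict[str, Tuple[str, int]] = {}
    for model_id, seq in records:
        locus = locus_of(model_id)
        if locus not in best or len(seq) > best[locus][1]:
            best[locus] = (model_id, len(seq))
    return {locus: model_id for locus, (model_id, _) in best.items()}


def representatives_from_file(path: str) -> Dict[str, str]:
    """Apply the rule to a gzip-compressed CDS FASTA file at a local path."""
    with gzip.open(path, "rt") as handle:
        return longest_cds_per_locus(read_fasta(handle))
```

For example, representatives_from_file("osa1_r7.gene_models.cds.fa.gz") would return a dictionary from locus to the chosen model ID. The result may differ from the distributed representative set wherever isoforms tie in CDS length.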
Contact:
Rice Genome Annotation Project, UGA - buell.lab.web@gmail.com
Dr. C. Robin Buell, UGA - Robin.Buell@uga.edu
This work is supported by grants (DBI-0321538/DBI-0834043) from the National Science Foundation and funds from the Georgia Research Alliance, Georgia Seed Development, and the University of Georgia.