MSU Rice Genome Annotation Project Release 6.1 (June 3, 2009)

The all.chr directory contains the sequence and other information files for each of the 12 pseudomolecules. The following file types are available:

all.1kUpstream: 1000 bp of genomic sequence (lower case) upstream of the translational start codon (upper case) for each of the 12 pseudomolecules

all.BAC_in_pseudomolecule.info: BAC/PAC tiling path used for each of the 12 pseudomolecules

all.TE-related: a list of TE-related gene models in the rice genome

all.TU_model.brief_info: This file lists the chromosome, locus, asmbl_id, MSU TU feature name, TU_end5, TU_end3, model_locus, model_name, whether the gene model protein is related to transposon or retrotransposon elements, whether there are good matches to rice EST or fl-cDNA sequences, whether PASA* has validated the rice EST/fl-cDNA evidence, and the functional annotation.
*PASA is a Program to Assemble Spliced Alignments (Nucleic Acids Research, 2003, Vol. 31:5654-5666).

all.UTR file: the UTR sequences for gene models for each of the 12 pseudomolecules
Please note that the UTR sequences are curated only when there is good rice full-length cDNA (FL-cDNA) or rice EST evidence supporting the UTR annotation. No UTR sequences are included in the .seq and .UTR files for genes without good evidence supporting the UTR annotation.

all.cDNA file: nucleotide sequences of the gene models, containing the untranslated regions but no intron sequences, for each of the 12 pseudomolecules

all.cds file: nucleotide sequences of the gene models for each of the 12 pseudomolecules (coding sequence only, i.e. no introns and no untranslated regions)

all.con file: complete genome sequence for each of the 12 pseudomolecules
Please note that 1000 Ns have been inserted for each physical gap. Also note that the sequence of some BAC/PAC clones is unfinished and may contain other gaps denoted by a string of Ns.
all.exon file: exon nucleotide sequences of the gene models for each of the 12 pseudomolecules

all.gff3 file: Generic Feature Format version 3 for the MSU rice genome annotation. This file also includes short genes, partial genes, pseudogenes and the annotation for the Syngenta and unanchored BAC pseudomolecules.

all.intergenic: the genomic sequences that are not included in gene (TU) sequences for each of the 12 pseudomolecules

all.intron file: intron nucleotide sequences of the gene models for each of the 12 pseudomolecules

all.models_near_insertion_sites: This file lists the gene models with an insertional mutation either within the gene or within the flanking sequence (500 bp upstream or downstream) of the gene. The insertions were positioned on the version 5 pseudomolecules using the Flanking Sequence Tag (FST) sequence from GenBank.

all.pfam: This file lists the Pfam domain matches for each of the 12 pseudomolecules

all.interpro: This file lists the InterPro database matches for each of the 12 pseudomolecules

all.pep file: amino acid sequences corresponding to gene models for each of the 12 pseudomolecules

all.seq file: gene (TU) sequences, including exons, introns and upstream/downstream untranslated regions, for each of the 12 pseudomolecules

all.short_models.cds file: nucleotide sequences of the gene models with fewer than 50 amino acids for each of the 12 pseudomolecules

all.short_models.pep file: amino acid sequences corresponding to gene models with fewer than 50 amino acids for each of the 12 pseudomolecules

orthologous_groups: 143,380 non-transposable-element-related genes from rice (release 6), Arabidopsis (release 8), poplar (release 1.1), and grapevine (release 1) were used to identify putative orthologous groups using OrthoMCL with default parameters.
rice.versions.info.txt: a table linking MSU rice TU (gene) feat_names across MSU/TIGR Rice Annotation Releases

v5.obsolete.loci.with.v6.coords: a list of rice locus names deprecated in the latest annotation release

Similar files are provided for each chromosome in the individual directories.

The format of the definition line for .pep, .cDNA, .cds, .seq files:
>locus|feat_name|seq_type gene_product_functional_assignment

The format of the definition line for .UTR, .intron files:
>locus|feat_name|seq_type

The format of the definition line for the .1kUpstream file:
>locus|feat_name|end5..end3 gene_product_functional_assignment

The format of the definition line for the .intergenic file:
>asmbl_id|seq_type|end5..end3

The format of the definition line for the .con file:
>chr|asmbl_id database

The feat_name in .seq files is the TU (transcription unit) name in the MSU rice genome annotation database. However, the feat_name in the other sequence files is the gene model name. The relationship between the TU feat_name and the model feat_name can be found in the file .TU_model.brief_info. Please note that one TU can have more than one gene model when there are alternative splicing isoforms. For the detailed gene nomenclature at the MSU Rice Genome Annotation Project, please visit:
http://rice.plantbiology.msu.edu/gene_nomenclature.shtml

Please note that small genes (< 50 amino acids), partial genes, pseudogenes, and annotation from the Syngenta and unanchored BAC pseudomolecules are excluded from all of the sequence files except the all.short_models.cds and all.short_models.pep files, which contain the short and partial genes from Chr 1-12.
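The definition-line formats described above can be split into named fields with a few lines of Python. The following is a minimal sketch, not part of the MSU distribution: the names DEFLINE_FIELDS, parse_defline and split_coords are hypothetical, the "coords" and "description" keys are labels chosen here for the end5..end3 and free-text parts, and the identifiers in the usage example are made up. It assumes that the pipe-separated fields contain no whitespace and that everything after the first whitespace is the free-text part.

```python
from typing import Dict, Tuple

# Pipe-separated field names per file type, taken from the definition-line
# formats listed in this README. "coords" stands for the end5..end3 field.
DEFLINE_FIELDS: Dict[str, Tuple[str, ...]] = {
    "pep": ("locus", "feat_name", "seq_type"),
    "cDNA": ("locus", "feat_name", "seq_type"),
    "cds": ("locus", "feat_name", "seq_type"),
    "seq": ("locus", "feat_name", "seq_type"),
    "UTR": ("locus", "feat_name", "seq_type"),
    "intron": ("locus", "feat_name", "seq_type"),
    "1kUpstream": ("locus", "feat_name", "coords"),
    "intergenic": ("asmbl_id", "seq_type", "coords"),
    "con": ("chr", "asmbl_id"),
}


def parse_defline(defline: str, file_type: str) -> Dict[str, str]:
    """Split one FASTA definition line into named fields (hypothetical helper)."""
    fields = DEFLINE_FIELDS[file_type]
    text = defline.strip()
    if not text.startswith(">"):
        raise ValueError("definition line must start with '>'")
    # Assumption: the first whitespace separates the pipe-delimited header
    # from the free text (functional assignment, or database for .con).
    parts = text[1:].split(None, 1)
    if not parts:
        raise ValueError("empty definition line")
    values = parts[0].split("|")
    if len(values) != len(fields):
        raise ValueError(
            f"expected {len(fields)} fields for {file_type}, got {len(values)}"
        )
    record = dict(zip(fields, values))
    record["description"] = parts[1].strip() if len(parts) > 1 else ""
    return record


def split_coords(coords: str) -> Tuple[int, int]:
    """Turn an 'end5..end3' string into a pair of integers."""
    end5, end3 = coords.split("..")
    return int(end5), int(end3)


if __name__ == "__main__":
    # Made-up identifiers, for illustration only.
    example = parse_defline(">LOC_X|model_1|CDS example protein, putative", "cds")
    print(example["locus"], example["feat_name"], example["description"])
```

Because the feat_name is the TU name in .seq files and the gene model name in the other sequence files, records parsed from different file types should be joined through the .TU_model.brief_info table rather than by comparing feat_name values directly.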
For more details about the MSU Rice Genome Annotation Project pseudomolecules, please visit our web site at:
http://rice.plantbiology.msu.edu/pseudomolecules/info.shtml

The main page for the MSU Rice Genome Annotation Project is:
http://rice.plantbiology.msu.edu/

For questions about MSU rice genome related issues, please send e-mail to:
rice@plantbiology.msu.edu

Researchers who wish to cite the MSU Rice Genome Annotation Project website are encouraged to refer to our recent publications:

* Ouyang, S., Zhu, W., Hamilton, J., Lin, H., Campbell, M., Childs, K., Thibaud-Nissen, F., Malek, R.L., Lee, Y., Zheng, L., Orvis, J., Haas, B., Wortman, J. and Buell, C.R. 2007. The TIGR Rice Genome Annotation Resource: improvements and new features. Nucleic Acids Research 35:D883-D887

* Yuan, Q., Ouyang, S., Wang, A., Zhu, W., Maiti, R., Lin, H., Hamilton, J., Haas, B., Sultana, R., Cheung, F., Wortman, J. and Buell, C.R. 2005. The Institute for Genomic Research Osa1 Rice Genome Annotation Database. Plant Physiology 138:18-26

Thank you for your interest in the MSU Rice Genome Annotation Project Resources.