For TIGR Rice Annotation Release 5.0 (January 24, 2007)

The all.chr directory contains the sequence and other information files for each of the 12 pseudomolecules.  The
following file types are available:

all.1kUpstream:   1000 bp upstream genomic sequences (lower case) from the translational start codons (upper case) for each of the 12 pseudomolecules;

all.BAC_in_pseudomolecule.info:	BAC/PAC tiling path used for each of the 12 pseudomolecules;

all.TE-related:	a list of TE-related gene models in the rice genome;

all.TU_model.brief_info: This file lists the chromosome, locus, asmbl_id, TIGR TU name,
                 TU_end5, TU_end3, model_locus, model_name, BAC/PAC clone name, GenBank accession for the BAC/PAC
                 clone, sequence status of the BAC/PAC clone, whether the gene model protein is related to transposon
                 or retrotransposon elements, whether there are good matches to rice EST or FL-cDNA sequences, whether
                 PASA* has validated the rice EST/FL-cDNA evidence, and the functional annotation;

                 *PASA is a Program to Assemble Spliced Alignments (Nucleic Acids Research, 2003, Vol.31:5654-5666).

all.UTR file:      the UTR sequences for gene models for each of the 12 pseudomolecules;
                  Please note that the UTR sequences are curated only when there are good rice full-length cDNA
                  (FL-cDNA) or rice EST evidence supporting the UTR annotation. There are no UTR sequences included
                  in the .seq file and .UTR file for those genes without good evidence supporting the UTR annotation.

all.cDNA file:     nucleotide sequences of the gene models, including the untranslated regions but no intron sequences, for each of the 12 pseudomolecules;

all.cds file:      nucleotide sequences of the gene models for each of the 12 pseudomolecules (coding sequence only, i.e. no introns and no untranslated regions);

all.con file:     complete genome sequence for each of the 12 pseudomolecules;
                 Please note that 1000 Ns have been inserted for each physical gap. Also, note that the sequence of
                 some BAC/PAC clones is unfinished and may contain other gaps denoted by a string of Ns.

all.exon file:   exon nucleotide sequences of the gene models for each of the 12 pseudomolecules;

all.gff3 file:    Generic Feature Format version 3 (GFF3) file for TIGR's rice genome annotation;

all.intergenic:   the genomic sequences which are not included in gene (TU) sequences for each of the 12 pseudomolecules;

all.intron file:   intron nucleotide sequences of the gene models for each of the 12 pseudomolecules;

all.models_near_insertion_sites: This file lists the gene models with an insertional mutation either within the gene or within the flanking sequence (500 bp upstream or downstream) of the gene. The insertions were positioned on the version 5 pseudomolecules using the Flanking Sequence Tag (FST) sequence from GenBank.

all.models_with_Pfam: This file lists those gene models with Pfam domain matches for each of the 12 pseudomolecules;

all.models_with_interpro: This file lists those gene models with InterPro database matches for each of the 12 pseudomolecules;

all.pep file:      amino acid sequences corresponding to gene models for each of the 12 pseudomolecules;

all.seq file:      gene (TU) sequences, including exons, introns and upstream/downstream untranslated regions for each of the 12 pseudomolecules;

all.xml file:     XML-formatted file containing TIGR's annotation, coordinates, and sequence data
                 for each of the 12 pseudomolecules;

all.models_genomic.seq:       gene model sequences, including exons, introns and upstream/downstream untranslated regions for each of the 12 pseudomolecules;

all.short_models.cds file:      nucleotide sequences of the gene models with fewer than 50 amino acids for each of the 12 pseudomolecules;

all.short_models.pep file:      amino acid sequences corresponding to gene models with fewer than 50 amino acids for each of the 12 pseudomolecules;

rice.v5_iprscan.raw:	raw output of the InterPro scan on the TIGR rice proteome;

rice.v5.paralogous.family.list:	rice proteins grouped into paralogous families;

rice.versions.info.txt:	a table linking TIGR rice TU (gene) feat_names across TIGR Rice Annotation Releases;

v4.obsolete.loci.with.v5.coords:	a list of rice locus names deprecated in TIGR's latest release.


Similar files are provided for each chromosome in the individual directories.
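
As noted for the all.con file above, 1000 Ns mark each physical gap, and unfinished BAC/PAC clones may contribute further runs of Ns. The following Python sketch shows one way to locate such runs in a pseudomolecule sequence. The function name find_n_runs and its min_length parameter are illustrative only and are not part of the TIGR distribution. A run of exactly 1000 Ns is consistent with a physical gap but does not prove it is one.

```python
import re


def find_n_runs(sequence, min_length=1):
    """Return (start, end, length) for each run of Ns in a sequence.

    Coordinates are 1-based and inclusive. Both 'N' and 'n' are counted.
    """
    runs = []
    for match in re.finditer(r"[Nn]+", sequence):
        length = match.end() - match.start()
        if length >= min_length:
            runs.append((match.start() + 1, match.end(), length))
    return runs


def candidate_physical_gaps(sequence):
    """Keep only runs of exactly 1000 Ns (assumption: these are the
    inserted physical-gap placeholders described in this README)."""
    return [run for run in find_n_runs(sequence) if run[2] == 1000]
```

The sequence passed in is expected to be a single pseudomolecule with the FASTA definition line and line breaks already removed.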


The format in the definition line for .pep, .cDNA, .cds, and .seq files:
>locus|feat_name|seq_type gene_product_functional_assignment

The format in the definition line for .UTR and .intron files:
>locus|feat_name|seq_type

The format in the definition line for .1kUpstream file:
>locus|feat_name|end5..end3 gene_product_functional_assignment

The format in the definition line for .intergenic file:
>asmbl_id|seq_type|end5..end3

The format in the definition line for .con file:
>chr|asmbl_id database
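
The definition-line formats listed above can be split into named fields with a few lines of Python. This is a minimal sketch, not a TIGR-supplied tool: the names DEFLINE_FIELDS and parse_defline are hypothetical, the file type is passed in by the caller, and the text after the first whitespace is stored under "description" (for .con files this is the database field).

```python
# Pipe-separated fields of each definition line, keyed by file type,
# following the formats given in this README.
_GENE_FIELDS = ("locus", "feat_name", "seq_type")
DEFLINE_FIELDS = {
    "pep": _GENE_FIELDS,
    "cDNA": _GENE_FIELDS,
    "cds": _GENE_FIELDS,
    "seq": _GENE_FIELDS,
    "UTR": _GENE_FIELDS,
    "intron": _GENE_FIELDS,
    "1kUpstream": ("locus", "feat_name", "coords"),
    "intergenic": ("asmbl_id", "seq_type", "coords"),
    "con": ("chr", "asmbl_id"),
}


def parse_defline(line, file_type):
    """Split one FASTA definition line into a dict of named fields."""
    if not line.startswith(">"):
        raise ValueError("not a FASTA definition line: %r" % line)
    parts = line[1:].strip().split(None, 1)
    if not parts:
        raise ValueError("empty definition line")
    names = DEFLINE_FIELDS[file_type]
    values = parts[0].split("|")
    if len(values) != len(names):
        raise ValueError("expected %d fields, found %d" % (len(names), len(values)))
    record = dict(zip(names, values))
    record["description"] = parts[1] if len(parts) > 1 else ""
    if "coords" in record:
        # end5..end3 is converted to two integers.
        end5, end3 = record.pop("coords").split("..")
        record["end5"], record["end3"] = int(end5), int(end3)
    return record
```

For example, with made-up identifiers, parse_defline(">LOC_X|m1|cds example product", "cds") yields the locus "LOC_X", the feat_name "m1", the seq_type "cds", and the description "example product".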

The feat_name in .seq files is the TU (transcription unit) name in the TIGR rice genome database. However, the feat_name in the other
sequence files is the gene model name. The relationship between the TU feat_name and the model feat_name can be found in the file
.TU_model.brief_info. Please note that one TU can have more than one gene model when there are alternative splicing
isoforms. For details of the gene nomenclature used by the TIGR rice genome annotation project, please visit:
http://www.tigr.org/tdb/e2k1/osa1/tigr_gene_nomenclature.shtml
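
The TU-to-model relationship described above can be collected into a lookup table. The sketch below makes assumptions that this README does not confirm: that .TU_model.brief_info is tab-delimited, and that its columns appear in the order listed for all.TU_model.brief_info (so the TIGR TU name is the 4th column and model_name the 8th). The function name models_per_tu is hypothetical; adjust tu_col, model_col and delimiter if the actual file differs. If the file starts with a header row, it will be treated as data unless skipped by the caller.

```python
def models_per_tu(lines, tu_col=3, model_col=7, delimiter="\t"):
    """Group gene model names by TU name.

    Column indexes are 0-based and ASSUMED from the column order given
    in this README; they are not verified against the actual file.
    """
    tu_to_models = {}
    for line in lines:
        fields = line.rstrip("\r\n").split(delimiter)
        if len(fields) <= max(tu_col, model_col):
            continue  # skip blank or short lines
        tu, model = fields[tu_col], fields[model_col]
        models = tu_to_models.setdefault(tu, [])
        if model not in models:
            models.append(model)
    return tu_to_models


def tus_with_isoforms(tu_to_models):
    """Return the TUs that have more than one gene model."""
    return sorted(tu for tu, models in tu_to_models.items() if len(models) > 1)
```

An open file handle can be passed directly as lines, since the function only iterates over it line by line.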

Please note that small genes (< 50 amino acids) are excluded from all of the sequence files except for all.short_models.cds and all.short_models.pep files.
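
The 50 amino acid cutoff mentioned above can be reproduced on any peptide FASTA file with a short script. This is a sketch under stated assumptions, not the procedure TIGR used: the function names read_fasta and split_by_length are illustrative, and a trailing "*" (stop symbol), if present, is assumed not to count as an amino acid.

```python
def read_fasta(lines):
    """Yield (definition_line, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_by_length(records, cutoff=50):
    """Separate records into (short, regular) lists.

    A record is 'short' when it has fewer than `cutoff` residues,
    ignoring any trailing '*' (an assumption about the file contents).
    """
    short, regular = [], []
    for header, sequence in records:
        residues = len(sequence.rstrip("*"))
        if residues < cutoff:
            short.append((header, sequence))
        else:
            regular.append((header, sequence))
    return short, regular
```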


For more details about TIGR rice genome pseudomolecules, please visit our web site at:
    http://rice.tigr.org/tdb/e2k1/osa1/pseudomolecules/info.shtml

The main page for the TIGR Rice Genome Project is:
    http://rice.tigr.org/

For questions about TIGR rice genome related issues, please send e-mail to:
    rice@tigr.org

Thank you for your interest in the TIGR Rice Annotation Resources.