PSAT User's Manual

User's Manual Contents

Examples

Here we describe a few examples of how this tool might be used for genome annotation or genomic comparison studies. Each example links directly to its results so that you can follow along.

1. Looking for known gene clusters

A cluster of genes in the bacterial strain F. novicida is known to be involved in leucine biosynthesis. The cluster includes the genes leuB, leuD, leuC, leuA, and ilvE. To investigate whether this cluster is present in a different Francisella strain, F. tularensis, we perform a search for leu.

List of genes with name leu

Five genes are found. We can already see that potential orthologs in F. tularensis were found for only two of the leu genes in F. novicida (see # of BLAST hits), even though we used the most relaxed BLAST score thresholds for this search. Selecting one of these genes (leuA) displays the synteny results, which we can explore in greater detail.

Synteny Results for 'leucine' cluster of genes

Inspection of the graphic further indicates that leuB, leuD, and leuC, which are found immediately adjacent to the leuA gene in F. novicida, are not present in the F. tularensis genome. This observation suggests that leucine biosynthesis may be impaired in the F. tularensis strain and warrants further investigation.

2. Discovering potential novel functional gene clusters

When analyzing the synteny results for a gene, you may come across results that suggest the existence of functional gene clusters that were not previously known. Say you are analyzing the synteny of the pilin-related gene pilQ in F. novicida. You have chosen to search for synteny among all bacterial genomes and have selected a relatively relaxed set of BLAST score thresholds.

Synteny Results for pilQ gene

As expected, the other Francisella genomes are at the top of the list with the highest predicted synteny. There are, however, several other results found in remotely related bacterial genomes. Inspection of the graphic indicates that the majority of the syntenic regions include the genes aroK and aroB immediately downstream of pilQ (mouse over the colored genes to view orthology details). The mouseovers describe these genes as shikimate kinase I and 3-dehydroquinate synthase, respectively, with no indication of a relationship to pilQ. Because synteny between these genes is found in multiple genomes, however, it may be interesting to explore them as a potential gene cluster.
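The reasoning in this example, that genes recurring next to pilQ in many genomes hint at a functional cluster, can be sketched as a simple tally. This is only an illustration and not PSAT's own method; the function name, the input layout, and the genome labels below are hypothetical.

```python
from collections import Counter

def recurring_neighbors(neighbors_by_genome, min_genomes=2):
    """Return neighbor genes seen in at least `min_genomes` genomes.

    neighbors_by_genome maps a genome label to the gene names found near
    the query gene's ortholog in that genome (hypothetical input layout).
    """
    counts = Counter()
    for genes in neighbors_by_genome.values():
        counts.update(set(genes))  # count each gene once per genome
    return {gene: n for gene, n in counts.items() if n >= min_genomes}

# Made-up input for illustration; genome labels are placeholders.
example = {
    "genome_1": ["aroK", "aroB", "geneX"],
    "genome_2": ["aroK", "aroB"],
    "genome_3": ["aroB", "geneY"],
}
print(recurring_neighbors(example))
```

Neighbors that pass the threshold are candidates for closer inspection, not confirmed cluster members.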

3. Finding orthologs for gene annotation

Researchers attempting to determine the function of a particular gene often perform sequence similarity searches against genes of known function. Sometimes a set of closely related genomes is selected for this search, and sometimes preselected alignment score cutoffs are used to determine whether two genes are similar enough to designate as orthologs. It can often be challenging for annotators to decide whether two genes are similar enough to propagate the function of one to the other. Synteny analysis can provide additional information to help researchers with their annotation tasks.

Say you are trying to annotate the F. novicida gene FTN_0453. You perform a synteny analysis of the gene against all other bacterial genomes.

Synteny Results for FTN_0453

A few results are returned. None are in other Francisella genomes; instead, they are found in the genomes of remotely related organisms such as Shewanella and Vibrio. An analysis among only closely related genomes would therefore not have discovered the potential orthologs listed in the synteny results. The BLAST score values for the potential orthologs may also be debatable, yet the existence of synteny among the neighboring genes helps provide further evidence of valid orthology. Using this logic, FTN_0453 has been annotated as a glycosyl transferase.

4. Retrieving a list of homologs for a complete genome

Use the PSAT database of homologs to quickly retrieve a list of the genes in a complete genome together with their sequence homologs in selected comparison genomes. Only homologs that meet the specified BLAST alignment score threshold and PSAT homolog cluster score value are listed.

List of genes in Brucella abortus 9-941 and its homologs in Brucella canis ATCC 23365, Brucella melitensis and Brucella melitensis biovar Abortus

Two tables are generated, one for each of the 2 chromosomes in the genome of Brucella canis ATCC 23365. To create a tab-delimited file of the results for export to a spreadsheet application such as Excel, select Edit Settings and use the 'Tab delimited text file' option.
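A tab-delimited export can also be read by a script instead of a spreadsheet. The following is a minimal sketch using Python's standard csv module. The column names (gene, homolog_genome, homolog_gene, e_value) and the sample rows are assumptions made for illustration; the headers in an actual PSAT export may differ.

```python
import csv
import io

def read_tab_delimited(handle):
    """Parse a tab-delimited export into a list of dicts keyed by header."""
    return list(csv.DictReader(handle, delimiter="\t"))

# Made-up sample standing in for an exported file; real column names may differ.
sample = io.StringIO(
    "gene\thomolog_genome\thomolog_gene\te_value\n"
    "gene_1\tgenome_A\tgene_9\t1e-50\n"
    "gene_2\tgenome_A\tgene_7\t0.05\n"
)
rows = read_tab_delimited(sample)

# Keep only genes whose hit meets a stricter e-value cutoff.
strong = [r["gene"] for r in rows if float(r["e_value"]) <= 1e-10]
print(strong)
```

For a file on disk, pass open(path, newline="") in place of the io.StringIO object.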

5. Exploring genes which may be present only in a subset of genomes

In a genomic comparison study, researchers are often interested in determining which genes are present only in a certain subset of genomes. For example, a comparative study of Francisella strains may want to explore genes which are present in human pathogenic strains such as tularensis SchuS4 and holarctica LVS, yet absent in non-human pathogenic strains such as novicida U112.

PSAT offers a basic analysis that allows researchers to determine which genes in a reference genome have homologs in some number of comparison genomes. This feature displays the number of genes, grouped by how many genomes in the selected set of comparison genomes contain homologs (BLAST hits).
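The grouping described here can be pictured as counting, for each reference gene, how many comparison genomes contain at least one hit. The sketch below illustrates the idea on made-up data; it is not PSAT's implementation, and the function name and input layout are hypothetical.

```python
from collections import Counter

def group_by_genome_count(reference_genes, hits, comparison_genomes):
    """Tally reference genes by the number of comparison genomes with a hit.

    hits is an iterable of (gene, genome) pairs (hypothetical layout).
    Returns a Counter mapping genome count -> number of genes.
    """
    genomes_hit = {gene: set() for gene in reference_genes}
    wanted = set(comparison_genomes)
    for gene, genome in hits:
        if gene in genomes_hit and genome in wanted:
            genomes_hit[gene].add(genome)  # repeated hits count once
    return Counter(len(genomes) for genomes in genomes_hit.values())

# Made-up identifiers for illustration only.
genes = ["gene_1", "gene_2", "gene_3"]
hits = [
    ("gene_1", "genome_A"),
    ("gene_1", "genome_B"),
    ("gene_1", "genome_A"),
    ("gene_2", "genome_A"),
]
tally = group_by_genome_count(genes, hits, ["genome_A", "genome_B"])
for n in sorted(tally):
    print(n, "genome(s):", tally[n], "gene(s)")
```

Genes with no hits fall into the zero group, which is often the group of most interest in a comparison study.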

Results Comparing genes in Francisella tularensis SchuS4 with novicida U112 and holarctica LVS

From the list of results, you can easily identify the SchuS4 genes that have homologs in LVS but not in U112. Note that we selected a BLAST e-value cutoff of 0.1 and chose to include hits with any homolog clustering score. The results might provide a good starting point for investigating genes found in the human pathogenic Francisella strains but not in U112. Researchers can modify the selected options to explore further.
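The same presence/absence question can be asked of a list of hits in a script. The sketch below keeps genes that have a hit at or below an e-value cutoff in every genome of one set and no such hit in another set. It is an illustration under assumed inputs, not PSAT's implementation; the function name, the (gene, genome, e-value) layout, and the gene identifiers are hypothetical.

```python
def genes_in_subset(hits, present_in, absent_from, max_evalue=0.1):
    """Return genes with hits in all `present_in` genomes and none in `absent_from`.

    hits is an iterable of (gene, genome, evalue) tuples (hypothetical layout).
    Hits with an e-value above `max_evalue` are ignored.
    """
    genomes_hit = {}
    for gene, genome, evalue in hits:
        if evalue <= max_evalue:
            genomes_hit.setdefault(gene, set()).add(genome)
    required, excluded = set(present_in), set(absent_from)
    return sorted(
        gene
        for gene, genomes in genomes_hit.items()
        if required <= genomes and not (excluded & genomes)
    )

# Made-up hits for illustration only.
hits = [
    ("gene_1", "LVS", 1e-40),
    ("gene_1", "U112", 1e-35),
    ("gene_2", "LVS", 1e-20),
    ("gene_2", "U112", 5.0),   # above the cutoff, so ignored
    ("gene_3", "U112", 1e-10),
]
result = genes_in_subset(hits, present_in=["LVS"], absent_from=["U112"])
print(result)
```

Tightening max_evalue makes the "present" test stricter but the "absent" test more lenient, so it is worth trying more than one cutoff.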



© University of Washington 2008