User's Manual

Implementation Details

We describe here some details of the PSAT implementation.

System software requirements

  1. Apache web server (http://www.apache.org)
  2. PostgreSQL database server (http://www.postgresql.org)
  3. Perl (http://www.perl.org)

Database Population

  1. Determine reference and comparison genomes
  2. Download all .ptt and .faa files for both the reference and comparison genomes from the NCBI FTP site
  3. Store all genes (indicated in .ptt file) for each genome into a database, assigning each gene an index indicating its ordering in the genome
  4. Perform protein BLAST of each reference genome against each comparison genome, and store the top three results for each reference gene against each comparison genome into the database
  5. For each BLAST hit stored in the database, calculate the homology cluster score, which can indicate synteny between genomes and adds useful context when comparing them, as defined below:
    1. Determine the index of the query gene and of the subject gene within their respective genomes
    2. Query the database to find the gene immediately upstream of the query gene and the gene immediately upstream of the subject gene
    3. Query the database to determine whether a BLAST hit exists between these two adjacent genes. If there is a hit, add a point to the score.
    4. Repeat the previous two steps, moving one gene further upstream each time, until no hit is found
    5. Repeat the previous three steps for adjacent genes downstream of both the query and subject gene
    6. Record the final score for the BLAST hit pair into the database
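
The scoring procedure in step 5 can be sketched as follows. This is a minimal illustration, not the PSAT code: the in-memory set `hits` stands in for the database lookups, the name `cluster_score` is hypothetical, and the sketch assumes that "upstream" and "downstream" mean decreasing and increasing gene index in both genomes. The source does not state the starting value of the score, so it is left as a parameter that defaults to 0.

```python
def cluster_score(hits, query_idx, subject_idx, start=0):
    """Count consecutive neighbouring gene pairs that also have a BLAST hit.

    hits        -- set of (query_index, subject_index) pairs with a stored hit
                   (a stand-in for the database query)
    query_idx   -- index of the query gene in the reference genome
    subject_idx -- index of the subject gene in the comparison genome
    start       -- initial score (assumption: not specified in the manual)
    """
    score = start
    for step in (-1, +1):  # -1 walks upstream, +1 walks downstream
        q, s = query_idx + step, subject_idx + step
        # Keep walking outward while the adjacent pair is itself a hit.
        while (q, s) in hits:
            score += 1
            q += step
            s += step
    return score


# Toy data: reference genes 10..13 hit comparison genes 40..43 in the same
# order, plus one isolated hit with no matching neighbours.
hits = {(10, 40), (11, 41), (12, 42), (13, 43), (20, 7)}
scores = {pair: cluster_score(hits, *pair) for pair in hits}
```

With this toy data every pair in the run of four scores 3 (three matching neighbours), while the isolated pair `(20, 7)` scores 0. Handling of circular genomes and of homologs on the opposite strand is deliberately left out.
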

Web Interface Visualization and Analysis of Data

We have implemented three options for querying, analyzing, and exploring the homology results stored in the database using a web interface:
  1. Find homologs for specified gene(s)
    1. Use gene search constraints specified by user input (e.g. gene name, locus tag, description, location) to find gene matches for the selected reference genome
    2. For each match, query the database for BLAST hits that meet the user-defined score thresholds for the selected comparison genome(s)
    3. If the user chose to show only hits with a homology cluster score greater than 1 (indicating some synteny), filter the BLAST hit counts by that score
    4. Display details about each of the matching genes found and the number of BLAST hits satisfying the constraints
    5. Each result links to the details page, which lists details about each gene hit, the BLAST hit scores, and a visualization graphic comparing the genomic neighborhoods between the genomes
  2. Compare multiple genomes
    1. For each count between 0 and the number of comparison genomes selected, query the database for the number of genes that have a BLAST hit (homolog)
      • meeting the specified BLAST score thresholds
      • with the selected homology cluster score setting (greater than 1, indicating synteny, or all scores)
      • in this number of genomes
    2. For each of these counts (number of genomes), display the number of genes with homologs (as defined by the constraints)
    3. Each result will link to a page listing the genes and the homologs in the subset of comparison genomes
  3. Determine homology statistics
    1. For each comparison genome,
      • Determine the number (and percentage) of genes in the reference genome that have a BLAST hit meeting the user-specified BLAST score thresholds in the comparison genome
      • Use the above values to calculate the number and percentage of genes in the reference genome that do NOT have a BLAST hit in the comparison genome
      • Display the values determined above as the number of genes with or without homologs
      • For each value, link to a page listing the genes in the reference genome with homologs (along with BLAST scores) or without homologs (no BLAST results)
    2. Each gene listed will link to the details page with details about each gene hit and BLAST hit scores (if any), and the visualization graphic for exploration of the genomic neighborhood surrounding the gene and any homolog
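
The counting behind options 2 and 3 can be illustrated with a short sketch. It assumes the threshold and cluster score filtering has already been applied, so that `hits_by_genome` (a hypothetical structure standing in for the database queries) maps each comparison genome to the set of reference genes with a qualifying hit. The function names and toy gene names are illustrative only.

```python
from collections import Counter


def genes_by_genome_count(ref_genes, hits_by_genome):
    """For each n from 0 to the number of comparison genomes, count the
    reference genes that have a qualifying hit in exactly n genomes."""
    counts = Counter(
        sum(gene in hit_genes for hit_genes in hits_by_genome.values())
        for gene in ref_genes
    )
    return {n: counts.get(n, 0) for n in range(len(hits_by_genome) + 1)}


def homology_stats(ref_genes, hit_genes):
    """Number and percentage of reference genes with and without a hit
    in a single comparison genome."""
    total = len(ref_genes)
    with_hit = sum(gene in hit_genes for gene in ref_genes)
    without_hit = total - with_hit

    def pct(n):
        return 100.0 * n / total if total else 0.0

    return {
        "with": with_hit,
        "with_pct": pct(with_hit),
        "without": without_hit,
        "without_pct": pct(without_hit),
    }


# Toy data: four reference genes and two comparison genomes.
ref_genes = ["g1", "g2", "g3", "g4"]
hits_by_genome = {"A": {"g1", "g2"}, "B": {"g1"}}

by_count = genes_by_genome_count(ref_genes, hits_by_genome)
stats = {name: homology_stats(ref_genes, genes)
         for name, genes in hits_by_genome.items()}
```

Here `by_count` is `{0: 2, 1: 1, 2: 1}`: two genes have no homolog in either genome, one has a homolog in one genome, and one has homologs in both. For genome `B`, `stats` reports 1 gene (25%) with a homolog and 3 genes (75%) without.
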

The visualization graphic for a given gene is drawn by calculating the genome coordinates of the region to display based on user settings. The genes in this region are queried from the database and assigned a color code. For any homolog matching specified constraints, the corresponding genomic neighborhood is drawn in the same manner, aligned appropriately below the reference gene graphic. The genes in the neighborhood of the homolog are colored to match any gene in the reference genome that has a BLAST hit, helping users explore potential synteny between genomes. Genes without a BLAST hit are colored gray. Mouseovers are generated for all displayed genes with gene details and BLAST hit scores.
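
As a rough sketch of the region and coloring logic described above, the following assumes a display window centered on the gene of interest, 1-based genome coordinates, and a small fixed palette. The window centering, the palette, and all function names are assumptions made for illustration; the manual does not specify how PSAT computes the region or chooses colors.

```python
PALETTE = ["red", "blue", "green", "orange", "purple"]  # hypothetical colors
GRAY = "gray"


def display_region(gene_start, gene_end, window, genome_length):
    """Return (start, end) of a window centered on the gene, clamped to
    the genome bounds. `window` stands in for the user's display setting."""
    center = (gene_start + gene_end) // 2
    half = window // 2
    return max(1, center - half), min(genome_length, center + half)


def genes_in_region(genes, start, end):
    """Keep every (name, start, end) gene that overlaps [start, end]."""
    return [g for g in genes if g[1] <= end and g[2] >= start]


def assign_colors(ref_genes, homolog_genes, hits):
    """Give each reference gene a palette color; color each gene in the
    homolog neighborhood like the reference gene it hits, else gray.

    hits -- set of (reference_gene_name, homolog_gene_name) pairs
    """
    ref_colors = {
        name: PALETTE[i % len(PALETTE)]
        for i, (name, _, _) in enumerate(ref_genes)
    }
    homolog_colors = {}
    for name, _, _ in homolog_genes:
        match = next((r for r, _, _ in ref_genes if (r, name) in hits), None)
        homolog_colors[name] = ref_colors[match] if match is not None else GRAY
    return ref_colors, homolog_colors


# Toy data: gene "b" is the gene of interest; gene "d" lies outside the window.
all_ref = [("a", 100, 400), ("b", 500, 900), ("c", 1000, 1500), ("d", 2000, 2500)]
region = display_region(500, 900, window=1000, genome_length=5000)
shown = genes_in_region(all_ref, *region)

neighborhood = [("x", 50, 300), ("y", 400, 800)]
ref_colors, homolog_colors = assign_colors(shown, neighborhood, {("b", "y")})
```

In this example the window is `(200, 1200)`, so genes `a`, `b`, and `c` are shown and `d` is not. Homolog `y` takes the color of reference gene `b`, and `x`, which has no BLAST hit, is drawn gray.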



© University of Washington 2008