Construction

Workflow

Flowchart of the steps used to build the PLAZA platform. Grey boxes represent methods and tools, whereas white shadowed boxes refer to website visualization tools (most of which are found under the Analyze menu). A brief description of some essential construction steps is given below; program settings are listed under Tools and parameter settings. For a more detailed explanation, please refer to the PLAZA publications.


Annotation

  • Processing gene annotation & alternative splicing: PLAZA uses a simplified annotation based on a 'one locus - one transcript' principle. When multiple splice variants of a single gene are known, only the longest transcript is stored. Alternative transcripts can still be explored in the genome browser (IGV.js).
  • Erroneous gene models: When parsing the structural gene annotation, we verify that the original gene coordinates yield the transcript and protein sequences reported by the primary data provider. Unfortunately, some gene models yield an incorrect transcript (transcript=ne) or protein (protein=ne), sometimes with in-frame stop codons. We have chosen not to process these proteins, as their inclusion might cause issues further downstream in the PLAZA pipeline. Although these invalid protein sequences are not present in PLAZA's protein database, the corresponding transcript sequences can still be searched with BLASTN on the BLAST page.
  • GO projection:
    • We applied a stringent set of rules to identify orthologous groups from the phylogenetic trees and used GO projection to transfer functional annotation between orthologs. For the GO projection, all primary gene annotations with evidence code Inferred from Electronic Annotation (IEA) were excluded as a primary information source. All new gene-GO associations inferred through projection are labeled with the evidence code Inferred from Sequence Orthology (ISO). For each newly inferred gene annotation, the source gene(s) are stored as well.
    • GO projection is also performed using much less strict homology definitions. Through the website, the end user can select or deselect the annotations obtained by GO transfer through sequence-based homology identification.
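
The annotation steps above can be illustrated with a minimal Python sketch. It is not the PLAZA implementation: the function names, the dictionary-based input layout, and the choice to skip GO terms a target gene already carries are all assumptions made for illustration. It shows (1) picking the longest transcript per locus, (2) flagging a coding sequence with an in-frame stop codon, and (3) projecting non-IEA GO terms between orthologs, labeled ISO, while recording the source genes.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def longest_transcript(transcripts):
    """Pick the representative transcript of a locus: the longest one.

    transcripts: dict mapping transcript id -> nucleotide sequence.
    Returns the (id, sequence) pair; ties go to the first entry.
    """
    return max(transcripts.items(), key=lambda item: len(item[1]))


def has_inframe_stop(cds):
    """Return True if a stop codon occurs before the final codon."""
    codons = [cds[i:i + 3].upper() for i in range(0, len(cds) - 2, 3)]
    return any(codon in STOP_CODONS for codon in codons[:-1])


def project_go(annotations, orthologs):
    """Project GO terms from annotated genes onto their orthologs.

    annotations: dict gene -> list of (go_term, evidence_code)
    orthologs:   dict gene -> list of orthologous genes (hypothetical
                 input; deriving it from phylogenetic trees is not shown)
    Returns a sorted list of (target, go_term, "ISO", source_genes).
    """
    sources = {}  # (target gene, GO term) -> set of source genes
    for source, terms in annotations.items():
        for go_term, evidence in terms:
            if evidence == "IEA":
                continue  # electronic annotations are not used as a source
            for target in orthologs.get(source, ()):
                # Assumption: only terms the target lacks count as "new".
                known = {term for term, _ in annotations.get(target, ())}
                if go_term not in known:
                    sources.setdefault((target, go_term), set()).add(source)
    return [(gene, term, "ISO", sorted(genes))
            for (gene, term), genes in sorted(sources.items())]


if __name__ == "__main__":
    locus = {"t1": "ATGAAATGA", "t2": "ATGAAACCCTGA"}
    print(longest_transcript(locus))
    print(has_inframe_stop("ATGTAAGGGTGA"))
    annotations = {"A1": [("GO:0001", "IDA"), ("GO:0002", "IEA")], "B1": []}
    print(project_go(annotations, {"A1": ["B1"]}))
```

Keeping the source genes alongside each projected term mirrors the statement above that the source gene(s) are stored for every newly inferred annotation, so a projected term can always be traced back to its origin.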

Gene family evolution

Genome evolution

Workbench


Tools and parameter settings