Introduction
PredGO is a flexible, interactive web server for protein function prediction that combines sequence, AlphaFold-predicted structure and, where available, protein-protein interaction (PPI) information. It uses a pre-trained protein language model, geometric vector perceptrons and attention mechanisms to extract key features of a protein and fuse them for function prediction. Figure 1 shows the framework of PredGO [1].
Figure 1. An overview of PredGO. The input is a protein sequence. The model first searches the PPI database for interacting proteins and predicts the protein structure from the sequence using AlphaFold2. Sequence features are then extracted by ESM-1b, and the features of the interacting proteins are fused by a PPI feature fusion module built from protein fusion layers. In the structure feature extraction module, the predicted structure is represented as a graph with scalar and vector features, which are processed by GVP-GNN. Finally, the feature carrying sequence and PPI information is concatenated with the structure feature, and a multilayer perceptron followed by a sigmoid function produces the scores of the GO terms.
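The final step in Figure 1 (concatenation, multilayer perceptron, sigmoid) can be illustrated with a small sketch. This is not the PredGO implementation: the feature dimensions, the single hidden layer, the ReLU activation and the random weights are all assumptions chosen only to show how two feature vectors could be fused into independent per-term scores.

```python
import math
import random


def sigmoid(x: float) -> float:
    """Numerically stable logistic function mapping a logit to (0, 1)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def linear(x, weights, bias):
    """Dense layer: one output value per row of the weight matrix."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


def random_layer(rng, n_in, n_out):
    """Hypothetical stand-in for trained parameters."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def predict_go_scores(seq_ppi_feat, struct_feat, hidden, out):
    # Concatenate the sequence+PPI feature with the structure feature.
    fused = list(seq_ppi_feat) + list(struct_feat)
    h = relu(linear(fused, *hidden))
    logits = linear(h, *out)
    # One sigmoid per GO term: the scores are independent and need not sum to 1.
    return [sigmoid(z) for z in logits]


# Toy dimensions (assumptions, far smaller than any real model).
SEQ_PPI_DIM, STRUCT_DIM, HIDDEN_DIM, NUM_GO_TERMS = 8, 4, 6, 5
rng = random.Random(0)
seq_ppi_feat = [rng.gauss(0, 1) for _ in range(SEQ_PPI_DIM)]
struct_feat = [rng.gauss(0, 1) for _ in range(STRUCT_DIM)]
hidden = random_layer(rng, SEQ_PPI_DIM + STRUCT_DIM, HIDDEN_DIM)
out = random_layer(rng, HIDDEN_DIM, NUM_GO_TERMS)

scores = predict_go_scores(seq_ppi_feat, struct_feat, hidden, out)
print(scores)
```

A per-term sigmoid, rather than a softmax over all terms, fits multi-label prediction, where a protein can be annotated with many GO terms at once.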
Getting Started
Input data can be sequences in FASTA format or structure files in PDB format. For a PDB file, only the first chain is used. Up to 10 proteins can be predicted in one job. Please follow the input format descriptions. Users may leave an email address and may give each job a title to tell their jobs apart. A Private Key can be set to protect your structure and analysis. See Figure 2 and Figure 3.
Figure 2. Enter the sequence and submit the task.
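Before submitting sequences, it can help to check locally that the input is well-formed FASTA and stays within the 10-protein limit. The sketch below is only a local pre-check and does not reproduce the server's own validation: the function names are hypothetical, and restricting residues to the 20 standard amino acids is an assumption (the server may accept other letters).

```python
MAX_PROTEINS = 10  # limit stated on this help page
# Assumption: only the 20 standard amino acids are treated as valid here.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_submission(text):
    """Parse FASTA text and raise ValueError on obvious problems."""
    records = parse_fasta(text)
    if not records:
        raise ValueError("no FASTA records found")
    if len(records) > MAX_PROTEINS:
        raise ValueError(f"{len(records)} proteins given, at most {MAX_PROTEINS} allowed")
    for header, seq in records:
        if not seq:
            raise ValueError(f"record '{header}' has an empty sequence")
        bad = sorted(set(seq) - VALID_RESIDUES)
        if bad:
            raise ValueError(f"record '{header}' has unexpected residues: {bad}")
    return records


example = """>protein_1
MKTAYIAKQR
QISFVKSHFS
>protein_2
GSHMLEDPVA
"""
records = check_submission(example)
for header, seq in records:
    print(header, len(seq))
```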
Figure 3. Enter the structure and submit the task.
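Because only the first chain of an uploaded PDB file is used, it may be worth checking which chain that is before submitting. The sketch below reads the fixed-column ATOM records of the PDB format and keeps the chain of the first ATOM record. How the server itself defines "first chain" is not documented here, so this reading is an assumption. The helper `atom_line` and the two function names are hypothetical, the demo records are made up and truncated after the residue number, and alternate locations are not handled.

```python
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def first_chain_lines(pdb_text):
    """Return (chain_id, ATOM lines) for the chain of the first ATOM record."""
    chain_id, kept = None, []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record == "ENDMDL":
            break  # look at the first model only
        if record == "TER" and chain_id is not None:
            break  # end of the first chain
        if record != "ATOM" or len(line) < 26:
            continue
        current = line[21]  # chain identifier, column 22
        if chain_id is None:
            chain_id = current
        elif current != chain_id:
            break
        kept.append(line)
    return chain_id, kept


def chain_sequence(lines):
    """One-letter sequence built from the CA atoms; unknown residues become X."""
    return "".join(
        THREE_TO_ONE.get(line[17:20].strip(), "X")
        for line in lines
        if line[12:16].strip() == "CA"
    )


def atom_line(serial, name, res, chain, resseq):
    """Demo helper: build the leading columns of an ATOM record."""
    return f"ATOM  {serial:>5} {name:<4} {res:>3} {chain}{resseq:>4}"


demo_pdb = "\n".join([
    atom_line(1, " N  ", "MET", "A", 1),
    atom_line(2, " CA ", "MET", "A", 1),
    atom_line(3, " CA ", "ALA", "A", 2),
    "TER",
    atom_line(4, " CA ", "GLY", "B", 1),
    "END",
])

chain_id, lines = first_chain_lines(demo_pdb)
print(chain_id, chain_sequence(lines))
```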
The server checks the validity of the input sequences or structures and, once they are confirmed, proceeds to the second step, where the query proteins are selected. When the selection is done, click the "submit" button to run the job (Figure 4 and Figure 5). Users are then directed to the result page, which shows the job status. Bookmark the link if you want to check your results later. The analysis can also be retrieved by user email or job ID on the result page.
Figure 4. Select the query protein.
Figure 5. Task submission interface.
PredGO returns the predicted Gene Ontology (GO) terms for each protein, along with their confidence scores. Users can download the predicted results in Excel or CSV format and can also view the 3D structure and GO tree of the target protein (Figure 6, Figure 7, Figure 8 and Figure 9).
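A downloaded CSV file can be post-processed with a few lines of Python, for example to keep only the terms above a chosen confidence score. The layout of the real download is not described on this page, so the column names `go_id` and `score`, the 0.5 threshold and the sample rows below are all assumptions made for illustration; adjust them to match the actual file.

```python
import csv
import io

# Made-up sample rows; the scores are illustrative, not PredGO output.
SAMPLE_CSV = """go_id,score
GO:0005515,0.91
GO:0003824,0.42
GO:0008150,0.77
"""


def top_terms(csv_text, threshold=0.5):
    """Return (go_id, score) pairs with score >= threshold, highest first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(row["go_id"], float(row["score"])) for row in reader]
    kept = [row for row in rows if row[1] >= threshold]
    return sorted(kept, key=lambda row: row[1], reverse=True)


for go_id, score in top_terms(SAMPLE_CSV):
    print(f"{go_id}\t{score:.2f}")
```

For a file on disk, the same function can be applied to the text read with `open(path, newline="").read()`.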
Figure 6. Checking task progress.
Figure 7. Prediction results.
Figure 8. 3D structure display.
Figure 9. GO tree display.
Figure 10. Search interface display.
Figure 11. Search result display.
Source Code, Predictions and Datasets
Predictions:
CAFA3:
UniGOA16:
Notes:
References