The preprint by Li et al. describes a new deep learning model for prediction of antimicrobial peptides and its applications to identify these peptides on bullfrog genome.
- Deep learning application is a hot topic in machine learning area
- AMPlify outperforms the methods in the benchmarking
- AMP Scanner was retrained with AMPlify data sets
- Careful selection of non-AMP sequences
- Application in a real world scenario (screening the bullfrog genome)
- Antimicrobial activity determined for pathogens from WHO priority list
- Section Hyperparameter tuning and model architecture is not biologist-friendly
- Loose (https://doi.org/10.1038/nature05233) and Nagarajan (https://doi.org/10.3390/data4010027) datasets were not used as external validation data sets
- The benchmarking lacks classical prediction systems (e.g. AntiBP2 and CAMP)
- The problem of shuffled peptides was not addressed
- The preprint lacks a pipeline flowchart figure
- The web server was not implemented
- The peptide screening did not include peptides predicted as non-AMPs
In the field of antimicrobial activity prediction, there are some classical problems that were not overcome in more than ten years of research. The first one is the absence of a non-antimicrobial peptides data set. It seems that we just accepted the use of sequences from Swissprot without the ‘antimicrobial’ annotation to create this data set. Li et al. were more rigorous with this data, which could help to explain AMPlify best performance on the benchmarking.
The second problem is related to the descriptors, which are not necessary when using deeplearning. However, the key problem of shuffled peptides (https://doi.org/10.1016/j.jtbi.2017.05.011) was not addressed by the authors. And this problem could explain some of their results in the bullfrog genome screening.
From the eleven predicted sequences, only four demonstrated antimicrobial activity, resulting in a probability of correct prediction of positive peptides of 0.36. In fact, the eleven peptides have characteristics of AMPs, however, because the shuffled problem was not addressed, we don’t know if these results could be due to the compositional bias. In addition as the authors themselves stated “the size of the training data is still small relative to the data typically employed in most deep learning applications”.
An interesting feature is that they retrained the AMP Scanner with their own data, allowing the comparison between the algorithms, not the systems. This reinforces what other manuscripts have shown, regardless the algorithm, if the system is trained with similar data, the outcome is similar. Because AMPlify has a slightly outperformed AMP Scanner (~5%), but both systems showed statistics higher than 90%.
Besides, AMP Scanner is not the only deep learning predictor available on the web, there is another system which would be interesting to compare, AxPEP (https://doi.org/10.1016/j.omtn.2020.05.006).
Regarding the antimicrobial screening on bullfrog genome, I checked the peptide molecular masses using protparam, and they didn’t match. It is not clear whether some modifications were made on peptides. Also, the peptides presented a rana box motif (https://doi.org/10.3389/fmicb.2018.02846), but it was not clear wheter they were synthesized with or without the disulfide bridge.
There is a very specific point that should be highlighted. In discussion the authors stated “it has the potential to play a role in de novo AMP design or enhancement”, well, considering that designed peptides are quite similar to AMPs, but a number of them are inactive, AMPlify should not be used for such purpose, mainly because the Loose data set was not included in the system assessments.
Significance to the field: ☆☆☆✭✭
Interest to general audience: ☆☆☆✭✭
Quality of writing: ☆☆☆☆✭
Overall quality of the study: ☆☆☆✭✭
Li et al. 2020. AMPlify: attentive deep learning model for discovery of novel antimicrobial peptides effective against WHO priority pathogens. BioRxiv. (Version 1). doi: https://doi.org/10.1101/2020.06.16.155705