Virtual screening of peptides with high affinity for SARS-CoV-2 main protease

When I wrote the draft of the manuscript entitled “Virtual screening of peptides with high affinity for SARS-CoV-2 main protease”, the pandemic of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) had caused about 900,000 deaths worldwide. Today, this number is about to reach 3,000,000, and Brazil is considered the new epicenter of the disease.

Fortunately, vaccine development was accelerated, and some options are now available. Vaccines are indeed the ultimate resource for ending the pandemic. However, they are a means of prevention, not a cure. Therefore, the search for new drugs to treat coronavirus disease 2019 (COVID-19) is still valid, in particular for people who are hospitalized.

In this context, peptides have been poorly explored as potential drugs for COVID-19. The main strategy has been the repositioning of drugs already approved for human use, in which virtual screening plays a pivotal role by exploring hundreds to thousands of molecules. I therefore developed a virtual screening system for peptides against the viral protease.

This system could be considered a frugal innovation, because it reuses previous resources. I took advantage of the genetic algorithm developed for antimicrobial peptides and adapted it to use molecular docking as the fitness function. However, the main innovation was the use of a Raspberry Pi computer as a server. Interestingly, this feature arose from a failure of my notebook: it randomly stopped working, and a restart was then required. So how could I recover all the data? Fortunately, I had the Raspberry Pi, which could act as a server despite its limited computational power. Thus, with this client-server architecture, the system gained in performance and more than 70,000 peptides could be screened.
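
As a rough illustration of the idea above, the sketch below runs a genetic algorithm over pentapeptides. It is not the published system: the `fitness` function is a toy stand-in (an assumption) for a real molecular docking score, and the population size, mutation rate and number of generations are arbitrary choices. In a client-server setup like the one described, each scored peptide would also be sent to the server so that a crash on the client loses nothing; that part is left out here.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def fitness(peptide):
    """Toy stand-in for a docking score (assumption, not the real scoring)."""
    return sum(1.0 for aa in peptide if aa in "FHWY")


def crossover(a, b, rng):
    """Single-point crossover between two peptides of equal length."""
    point = rng.randint(1, len(a) - 1)
    return a[:point] + b[point:]


def mutate(peptide, rate, rng):
    """Replace each residue with a random one with probability `rate`."""
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in peptide
    )


def evolve(length=5, pop_size=40, generations=25, rate=0.1, seed=0):
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        for _ in range(pop_size)
    ]
    scored = {}  # every peptide is scored only once, then looked up
    for _ in range(generations):
        for peptide in population:
            if peptide not in scored:
                scored[peptide] = fitness(peptide)
        # Keep the better half as parents and refill with mutated offspring.
        ranked = sorted(population, key=scored.get, reverse=True)
        parents = ranked[: pop_size // 2]
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rate, rng)
            for _ in range(pop_size - len(parents))
        ]
        population = parents + children
    best = max(scored, key=scored.get)  # best among the peptides scored so far
    return best, scored


best, scored = evolve()
print(best, scored[best], len(scored))
```

The `scored` dictionary is the part worth keeping on a separate machine: it is the full record of what was screened, and docking is far too expensive to repeat.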

But what is the main idea behind this project? Firstly, it should be made clear that this was a very preliminary study, based on computer simulations. As in other virtual screening studies, the main target was the viral protease, which is pivotal in the viral cycle. However, peptides have some advantages over other putative inhibitors: their plasticity makes them very versatile molecules, to which other building blocks can be added to give the molecule new functionality.

Taking the viral protease as the target, the molecule must enter the infected cell to reach it. Therefore, if a molecule can inhibit the protease but fails to enter the cell, it probably will not work. In the case of peptides, this could be easily fixed by adding a cell-penetrating peptide at one of the termini.

Therefore, the two identified peptides (HHYWH and HYWWT) should be one piece of this puzzle, but there is more to be discovered. The main question is whether they really bind to the protease and, upon binding, whether they inhibit it or are just cleaved by it. Depending on what happens, from my point of view, different strategies for engineering a peptide drug could be used: firstly, in case of inhibition, the peptide should be linked to a cell-penetrating peptide; and secondly, in case of cleavage, a four-domain peptide could be designed to kill the infected cells, combining a toxin, the peptide, a toxin-inactivating sequence and a cell-penetrating peptide.

This clearly shows how preliminary the data are. Besides, there are further steps prior to approval for human use, including in vitro and in vivo assays. But we hope that this study can help in this critical scenario. By now, with the development of vaccines, we are close to the end; however, the data from this article, as well as the virtual screening system, could be useful for future pandemics.

Quality assessment:
Originality ☆☆☆☆✭
Rigor ☆☆☆☆✭
Significance to the field ☆☆☆☆☆
Interest to general audience ☆☆☆☆☆
Quality of writing ☆☆☆☆✭
Overall quality of the study ☆☆☆☆✭

Porto (2021) Virtual screening of peptides with high affinity for SARS-CoV-2 main protease. Computers in Biology and Medicine, vol 133, 104363.

In silico characterization of class II plant defensins from Arabidopsis thaliana

Until now, all of our posts were basically about the development of machine learning models for the prediction of antimicrobial peptides. However, there is more to explore than machine learning. In our last paper, published in Phytochemistry, we characterized two defensins from Arabidopsis thaliana; despite being an in silico study, it is closer to biology than to informatics.

As a model plant, A. thaliana has an array of resources available on the web; despite that, our paper shows there is more to be discovered in this plant. It has more than 300 defensin genes described (defensins are small proteins involved in plant defense against biotic and abiotic stresses).

Finding new information in this context would be unexpected. However, we found two defensins belonging to class II, which could help in understanding the evolution and distribution of defensins among the flowering plants.

Here, the web resources for A. thaliana played a critical role in the study. By applying a classical strategy for the identification of cysteine-rich peptides to the A. thaliana predicted proteome, we found those two defensins, but a number of questions arose from that, including their tissue of expression. This information is sometimes inaccessible, depending on the tissue of expression, the need for a specific stimulus or even the amount of protein or RNA produced.
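
To give an idea of what a pattern-based search on a predicted proteome can look like, here is a minimal sketch. The regular expression is hypothetical (eight cysteines separated by spacers of variable length) and is not the pattern used in the paper; the two sequences and the length cut-off are also made up for illustration.

```python
import re

# Hypothetical motif (assumption): eight cysteines with variable spacing.
CYS_PATTERN = re.compile(
    r"C.{2,25}C.{2,12}C.{2,12}C.{2,12}C.{2,20}C.{1,5}C.{1,5}C"
)


def find_cys_rich(proteome, max_length=150):
    """Return {name: matched region} for short sequences matching the motif."""
    hits = {}
    for name, sequence in proteome.items():
        if len(sequence) > max_length:
            continue  # defensins are small proteins, so long sequences are skipped
        match = CYS_PATTERN.search(sequence)
        if match:
            hits[name] = match.group(0)
    return hits


# Toy "proteome" with invented sequences.
toy_proteome = {
    "toy_defensin": "MKLCAAAACGGGCSSSCTTTTCKKKKKCACGGCR",
    "toy_other": "MSTNPKPQRKTKRNTNRRPQDVKFPGG",
}
hits = find_cys_rich(toy_proteome)
print(hits)
```

In a real search, the hits would still need manual curation, for example checking for a signal peptide and comparing them with known defensins.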

Fortunately, there is a high-resolution transcript map for A. thaliana, in which we could identify the expression of both defensins in flowers, ovules and seeds. This is interesting because the other known class II defensins are expressed in a similar context: in flowers for solanaceous species and in seeds for poaceous species.

In addition, given the evolutionary distance among the Brassicaceae, Solanaceae and Poaceae families, these class II defensins could be spread among all flowering plants. We do not know the function of the A. thaliana class II defensins, but the solanaceous and poaceous class II defensins have antimicrobial function. Do the A. thaliana class II defensins have the same function?

Well, we do not know the actual function, but the predicted structures seem to be very similar to those of classical plant defensins. In addition, the genes that code for these defensins in A. thaliana seem to be the result of a duplication process, because they are neighbors and their sequences share ~70% identity.
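
For readers who are not used to the term, sequence identity is simply the fraction of aligned positions that carry the same residue. The sketch below computes it for two sequences that are already aligned; the sequences are invented, and the choice of denominator (all alignment columns) is an assumption, since other definitions exist.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences ('-' marks a gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Ignore columns where both sequences have a gap.
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if not (a == "-" and b == "-")
    ]
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


# Invented example: 8 identical positions out of 10 alignment columns.
print(percent_identity("ACDEFGHIKL", "ACDQFGH-KL"))
```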

In fact, despite this being an in silico study, a number of hypotheses emerged, which reminds me of an article by Markowetz in PLoS Biology, “All biology is computational biology”. The application of computational methods to study the structure, function and evolution of proteins is a very exciting field, and this article is a good example of modern biology in practice.

Quality assessment:
Originality ☆☆☆☆✭
Rigor ☆☆☆☆✭
Significance to the field ☆☆☆☆☆
Interest to general audience ☆☆☆☆✭
Quality of writing ☆☆☆☆✭
Overall quality of the study ☆☆☆☆✭

Costa et al. (2020) In silico characterization of class II plant defensins from Arabidopsis thaliana. Phytochemistry, vol 179, 112511.

#PrePrintFeedback: “AMPlify: attentive deep learning model for discovery of novel antimicrobial peptides effective against WHO priority pathogens”

Main Findings

The preprint by Li et al. describes a new deep learning model for the prediction of antimicrobial peptides and its application to identify these peptides in the bullfrog genome.


Strengths

  • Deep learning is a hot topic in the machine learning area
  • AMPlify outperforms the methods in the benchmarking
  • AMP Scanner was retrained with AMPlify data sets
  • Careful selection of non-AMP sequences
  • Application in a real world scenario (screening the bullfrog genome)
  • Antimicrobial activity determined for pathogens from WHO priority list


Limitations

  • The section “Hyperparameter tuning and model architecture” is not biologist-friendly
  • The Loose and Nagarajan data sets were not used as external validation data sets
  • The benchmarking lacks classical prediction systems (e.g. AntiBP2 and CAMP)
  • The problem of shuffled peptides was not addressed
  • The preprint lacks a pipeline flowchart figure
  • The web server was not implemented
  • The peptide screening did not include peptides predicted as non-AMPs


In the field of antimicrobial activity prediction, there are some classical problems that have not been overcome in more than ten years of research. The first one is the absence of a data set of non-antimicrobial peptides. It seems that we just accepted the use of sequences from Swiss-Prot without the ‘antimicrobial’ annotation to create this data set. Li et al. were more rigorous with these data, which could help to explain AMPlify's better performance in the benchmarking.

The second problem is related to the descriptors, which are not necessary when using deep learning. However, the key problem of shuffled peptides was not addressed by the authors, and this problem could explain some of their results in the bullfrog genome screening.
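
The shuffled-peptide problem can be seen in a few lines. In the sketch below, an arbitrary example peptide and a shuffled version of it have exactly the same amino acid composition, so any predictor that relies on composition alone cannot tell them apart, even though shuffling may destroy the activity. This is only an illustration of the problem, not of how AMPlify works internally.

```python
import random
from collections import Counter


def composition(peptide):
    """Fraction of each amino acid in the peptide."""
    counts = Counter(peptide)
    return {aa: counts[aa] / len(peptide) for aa in sorted(counts)}


def shuffled(peptide, seed=0):
    """Return the same residues in a random order."""
    residues = list(peptide)
    random.Random(seed).shuffle(residues)
    return "".join(residues)


peptide = "GIGKFLHSAKKFGKAFVGEIMNS"  # arbitrary example sequence
decoy = shuffled(peptide)
print(decoy)
print(composition(peptide) == composition(decoy))  # same composition
```

A simple way to probe a predictor for this bias is to feed it shuffled versions of known active peptides and see how many are still predicted as AMPs.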

Of the eleven predicted sequences, only four demonstrated antimicrobial activity, resulting in a probability of correct prediction of positive peptides of 0.36. In fact, the eleven peptides have characteristics of AMPs; however, because the shuffled-peptide problem was not addressed, we don’t know whether these results could be due to compositional bias. In addition, as the authors themselves stated, “the size of the training data is still small relative to the data typically employed in most deep learning applications”.
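
The 0.36 figure is just the fraction of predicted positives that were confirmed experimentally (the positive predictive value). A short check, with the function name being my own choice:

```python
def positive_predictive_value(confirmed_positives, predicted_positives):
    """Fraction of predicted positives that were confirmed as active."""
    return confirmed_positives / predicted_positives


ppv = positive_predictive_value(4, 11)
print(round(ppv, 2))  # 0.36
```

Since peptides predicted as non-AMPs were not tested, the equivalent figure for negative predictions cannot be computed from the preprint.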

An interesting feature is that they retrained AMP Scanner with their own data, allowing a comparison between the algorithms, not the systems. This reinforces what other manuscripts have shown: regardless of the algorithm, if the systems are trained with similar data, the outcome is similar. AMPlify slightly outperformed AMP Scanner (~5%), but both systems showed statistics higher than 90%.

Besides, AMP Scanner is not the only deep learning predictor available on the web; there is another system that would be interesting to compare, AxPEP.

Regarding the antimicrobial screening of the bullfrog genome, I checked the peptide molecular masses using ProtParam, and they didn’t match. It is not clear whether some modifications were made to the peptides. Also, the peptides presented a Rana box motif, but it was not clear whether they were synthesized with or without the disulfide bridge.
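
This kind of check is easy to reproduce. The sketch below computes the average molecular mass of an unmodified linear peptide from standard average residue masses, and optionally subtracts two hydrogens per disulfide bridge. It is my own minimal version, not ProtParam's code, and the example sequence is invented.

```python
# Standard average residue masses in Da (amino acid minus one water).
AVERAGE_RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528
HYDROGEN = 1.00794


def average_mass(peptide, disulfide_bridges=0):
    """Average mass of a linear peptide with free N- and C-termini."""
    mass = sum(AVERAGE_RESIDUE_MASS[aa] for aa in peptide) + WATER
    # Each disulfide bridge removes two hydrogen atoms.
    return mass - disulfide_bridges * 2 * HYDROGEN


toy_peptide = "GLKCAAKGLC"  # invented sequence with two cysteines
print(round(average_mass(toy_peptide), 2))
print(round(average_mass(toy_peptide, disulfide_bridges=1), 2))
```

A single disulfide bridge only shifts the mass by about 2 Da, and a C-terminal amidation by about 1 Da, so a larger mismatch would point to something else.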

There is a very specific point that should be highlighted. In the discussion, the authors stated that “it has the potential to play a role in de novo AMP design or enhancement”. Well, considering that designed peptides are quite similar to AMPs but a number of them are inactive, AMPlify should not be used for such a purpose, mainly because the Loose data set was not included in the system assessments.

Quality assessment:
Originality: ☆☆☆☆✭
Rigor: ☆☆☆✭✭
Significance to the field: ☆☆☆✭✭
Interest to general audience: ☆☆☆✭✭
Quality of writing: ☆☆☆☆✭
Overall quality of the study: ☆☆☆✭✭


Li et al. (2020) AMPlify: attentive deep learning model for discovery of novel antimicrobial peptides effective against WHO priority pathogens. bioRxiv (version 1).

An SVM model based on physicochemical properties to predict antimicrobial activity from protein sequences with cysteine knot motifs

Ten years ago, I published my first manuscript as first author, a manuscript about antimicrobial activity prediction, which would become one of the pillars of Porto Reports. Because of its inaugural character, I chose that article to be the subject of the first post on Porto Reports Legacy.

“An SVM model based on physicochemical properties to predict antimicrobial activity from protein sequences with cysteine knot motifs” is far from being a perfect manuscript, but it has some strengths, including an innovative strategy for antimicrobial activity prediction. In fact, the innovation was the main strength that made it worth publishing. To this day, I use this work to teach what to do and what not to do in a scientific publication.

Briefly, the manuscript describes the construction of an antimicrobial activity prediction system using a support vector machine as the machine learning algorithm and physicochemical properties as the sequence descriptors. The system reached a good accuracy (~80%) using the polynomial kernel.
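
For readers from the biological sciences, the polynomial kernel is just a similarity measure between two descriptor vectors. The sketch below uses the usual form, (gamma * x·y + coef0) raised to a degree; the parameter values and the two descriptor vectors are hypothetical and are not taken from the manuscript.

```python
def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


# Hypothetical descriptor vectors, e.g. two scaled physicochemical properties.
peptide_a = [1.0, 2.0]
peptide_b = [3.0, 4.0]
print(polynomial_kernel(peptide_a, peptide_b, degree=2))  # (11 + 1) ** 2 = 144.0
```

The support vector machine never sees the sequences themselves, only such kernel values computed from the descriptors, which is why the choice of descriptors matters so much.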

The main limitations were (i) the weak English, in particular the incorrect use of verb tenses in several sections of the manuscript; and (ii) the lack of contextualization of the computational problem, which makes the manuscript hard to understand for scientists from the biological sciences.

Nevertheless, I need to talk about the strengths! To create something innovative, we need to be creative, and in 2010 there was a wide field to explore in this topic. In fact, until then there was only one manuscript related to the prediction of antimicrobial activity. Under these conditions, the idea does not need to be bright; it just needs to be different.

The difference was not in the algorithm itself, but in how the machine learning algorithm was trained, because the first manuscript had demonstrated that there are only slight differences in the predictive power of different algorithms under the same training scheme. Thus, we used physicochemical properties to train the system, reaching a good accuracy. However, our technique had some limitations, which were well described and properly addressed, and that is always a strength.

This system was the precursor of CS-AMPPred. Due to some errors in the choice of methods, including the support vector machine engine, the original system did not reach its full potential.

Quality assessment:

Originality: ☆☆☆☆✭
Rigor: ☆☆☆✭✭
Significance to the field: ☆☆☆☆✭
Interest to general audience: ☆☆☆✭✭
Quality of writing: ☆☆✭✭✭
Overall quality of the study: ☆☆☆✭✭

Porto W.F., Fernandes F.C., Franco O.L. (2010) An SVM Model Based on Physicochemical Properties to Predict Antimicrobial Activity from Protein Sequences with Cysteine Knot Motifs. In: Ferreira C.E., Miyano S., Stadler P.F. (eds) Advances in Bioinformatics and Computational Biology. BSB 2010. Lecture Notes in Computer Science, vol 6268. Springer, Berlin, Heidelberg.