Porto Reports – Science, Bioinformatics and Software Development

Structural effects driven by rare point mutations in amylin hormone, the type II diabetes-associated peptide

Written in collaboration with Wendy Mendes.

Diabetes mellitus is a disease that affects a large part of the world population. It is frequently cited as a driving cause of premature death, as well as a risk factor for other pathologies, including pandemic diseases such as COVID-19.

When diabetes is the subject, insulin almost automatically comes to mind. However, there is another important hormone that could be involved in some forms of this disease: the islet amyloid polypeptide, also known as amylin. This is a 37-residue peptide hormone co-secreted with insulin at a ratio of 1:100 by the pancreatic β-cells. Because diabetes is closely related to glucose control mechanisms, the disease involves the behavior of other associated hormones, and amylin plays a critical role in glucose homeostasis, acting synergistically with insulin.

Diabetes corresponds to a group of metabolic disorders, of which type II diabetes is the most frequent. This form of the disease is a result of insufficient insulin production and/or resistance to insulin response, similar to other forms of the disease. The hallmark of type II diabetes is hyperglycemia resulting from disturbances in glucose processing, where amylin plays a critical role. This hormone can aggregate into fibrils that generate amyloid deposits. Because these deposits are cytotoxic and often destroy the pancreatic β-cells, amylin is related to the pathophysiology of type II diabetes.

Being a gene-encoded hormone, amylin can be modified by mutations in its gene, and such modifications could increase or decrease its capacity to form amyloid, altering its biological activity. For instance, studies with amylin orthologues indicated that rat amylin has a lower propensity for aggregation, and the drug pramlintide was developed based on those differences. On the other hand, some mutations such as S20G, driven by a single nucleotide polymorphism (SNP), seem capable of increasing this ability.

In fact, SNPs can lead to changes in the structure and/or function of a peptide. The effects of these changes in amylin are still not fully understood: because amylin is a dynamic and flexible molecule, evaluating the effects of SNPs on its sequence can be a challenge. Currently, computational tools such as SNP effect predictors, molecular modeling and molecular dynamics have been used to characterize and evaluate SNPs, in addition to generating new insights into their relationship with diverse diseases.

In this study by Mendes et al. (2021), two amylin SNPs (including S20G) were subjected to in silico analysis in order to gain insights into their effects on the structure. Our results indicated that both mutations have aggregation potential and may cause changes in the monomeric forms when compared with wild-type amylin. In addition, when compared to pramlintide, we could infer that maintenance of the second α-helix may be related to the aggregation potential.

The S20G mutation has a frequency >1% in the East Asian population, and this could be related to the higher rates of type II diabetes among Asians. However, the most intriguing mutation is G33R, for two reasons. Firstly, the introduction of an arginine at this position made the C-terminal cationic, which in turn caused repulsion to the N-terminal, leading to a completely different structural type, a complete α-helix, which can be observed in the headline figure of this post. Secondly, this mutation was only observed in a single individual. Thus, we don’t know whether this is a de novo mutation or whether its frequency is very low due to the putative deleterious effects driven by this different structural type.

This study could help to better understand the impact of mutations on the wild-type amylin sequence, as a starting point for the evaluation and characterization of other variations. Moreover, these findings could improve the health of patients with type II diabetes with this genetic background.

Quality assessment:
Originality ☆☆☆☆✭
Rigor ☆☆☆☆✭
Significance to the field ☆☆☆☆☆
Interest to general audience ☆☆☆☆☆
Quality of writing ☆☆☆☆☆
Overall quality of the study ☆☆☆☆✭

Reference
Mendes et al. (2021) Structural effects driven by rare point mutations in amylin hormone, the type II diabetes-associated peptide. BBA General Subjects, vol. 1865, no. 8, 129935. https://doi.org/10.1016/j.bbagen.2021.129935

Virtual screening of peptides with high affinity for SARS-CoV-2 main protease

When I wrote the draft of the manuscript entitled “Virtual screening of peptides with high affinity for SARS-CoV-2 main protease”, the pandemic of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) had caused about 900,000 deaths worldwide. Today, this number is about to reach 3,000,000 deaths, and Brazil has been considered the new epicenter of the disease.

Fortunately, vaccine development was accelerated and by now some options are available. In fact, vaccines are the ultimate resource to solve the pandemic. However, they are not the cure, but the prevention. Therefore, the search for new drugs to treat coronavirus disease 2019 (COVID-19) is still valid, in particular for people who are hospitalized.

In this context, peptides have been poorly explored as potential drugs for COVID-19. The main strategy has been the repositioning of drugs already approved for human use, where virtual screening plays a pivotal role, exploring hundreds to thousands of molecules. With that in mind, I developed a virtual screening system for peptides against the viral protease.

This system could be considered a frugal innovation, due to the reuse of previous resources. I took advantage of the genetic algorithm developed for antimicrobial peptides and adapted it to use molecular docking as the fitness function. However, the main innovation was the use of a Raspberry Pi computer as a server. Interestingly, this feature arose from a failure of my notebook: it randomly stops working and then a restart is required. So, how could I recover all the data? Fortunately, I had the Raspberry Pi, which could act as a server despite its limited computational power. Thus, with this client-server architecture, the system increased in performance and more than 70,000 peptides could be screened.
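A minimal sketch of the idea, assuming a plain generational genetic algorithm over fixed-length peptides. The fitness function is passed in as a parameter, so a docking run could be plugged in where the toy score is. Everything named here (`toy_fitness`, `evolve`, the population size, the mutation rate and the peptide length) is hypothetical and is not taken from the published system; the toy score is an arbitrary stand-in with no chemical meaning.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_fitness(peptide):
    # Arbitrary stand-in for a docking score (assumption, no chemical meaning):
    # it simply counts H, W and Y residues.
    return sum(peptide.count(aa) for aa in "HWY")


def mutate(peptide, rng, rate=0.2):
    # Replace each residue with a random one with probability `rate`.
    return "".join(
        rng.choice(AMINO_ACIDS) if rng.random() < rate else aa for aa in peptide
    )


def crossover(a, b, rng):
    # Single-point crossover between two parents of equal length.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def evolve(fitness, length=5, pop_size=30, generations=20, seed=0):
    rng = random.Random(seed)
    population = [
        "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        for _ in range(pop_size)
    ]
    cache = {}  # peptide -> score, so each peptide is scored only once

    def score(peptide):
        if peptide not in cache:
            cache[peptide] = fitness(peptide)
        return cache[peptide]

    for _ in range(generations):
        # Keep the better half as parents and refill with mutated offspring.
        parents = sorted(population, key=score, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = parents + children

    best = max(population, key=score)
    return best, score(best), len(cache)


best, best_score, screened = evolve(toy_fitness)
print(best, best_score, screened)
```

In a client-server arrangement, the `cache` of already scored peptides is the kind of state that would make sense to keep on the server, so that a client crash does not lose it. That placement is my reading of the setup, not a description of the actual code.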

However, what is the main idea behind this project? Firstly, it should be made clear that this was a very preliminary study, based on computer simulations. As in other virtual screening studies, the main target was the viral protease, which is pivotal in the viral cycle. However, peptides have some advantages over other putative inhibitors due to their plasticity, which makes them very versatile molecules to which other building blocks can be added to provide new functionality.

In this context, taking the viral protease as the target, the molecule must enter the infected cell to reach it. Therefore, if a molecule could inhibit the protease but fails to enter the cell, it probably will not work. In the case of peptides, this could be easily fixed by adding a cell penetrating peptide at one of the termini.

Therefore, the two identified peptides (HHYWH and HYWWT) should be a piece of this puzzle, but there is more to be discovered. The main question is whether they really bind to the protease and, upon binding, whether they inhibit it or are just cleaved by it. Depending on what happens, from my point of view, different strategies for engineering a peptide drug could be used: firstly, in case of inhibition, the peptide should be linked to a cell penetrating peptide; and secondly, in case of cleavage, a four-domain peptide could be designed to kill the infected cells, combining a toxin, the peptide, a toxin inactivating sequence and a cell penetrating peptide.

This clearly shows how preliminary the data are. Besides, there are further steps prior to approval for human use, including in vitro and in vivo assays. But we hope that this study could help in solving this critical scenario. By now, with the development of vaccines, we are close to the end; however, the data from this article, as well as the virtual screening system, could be useful for future pandemics.

Quality assessment:
Originality ☆☆☆☆✭
Rigor ☆☆☆☆✭
Significance to the field ☆☆☆☆☆
Interest to general audience ☆☆☆☆☆
Quality of writing ☆☆☆☆✭
Overall quality of the study ☆☆☆☆✭

Reference
Porto (2021) Virtual screening of peptides with high affinity for SARS-CoV-2 main protease. Computers in Biology and Medicine, vol 133, 104363. https://doi.org/10.1016/j.compbiomed.2021.104363

In silico characterization of class II plant defensins from Arabidopsis thaliana

Until now, all of our posts were basically about the development of machine learning models for prediction of antimicrobial peptides. However, there is more to explore than machine learning. In our last paper, published in Phytochemistry, we characterized two defensins from Arabidopsis thaliana. Despite being an in silico study, it is closer to biology than to informatics.

Being a model plant, A. thaliana has an array of resources available on the web; despite that, our paper shows there is more to be discovered in this plant. It has more than 300 defensin genes described (defensins are small proteins involved in plant defense against biotic and abiotic stresses).

Finding new information in this context would be unexpected. However, we found two defensins belonging to class II, which could help in understanding the evolution and distribution of defensins among the flowering plants.

In this context, the web resources for A. thaliana played a critical role in this study. By applying a classical strategy for identification of cysteine-rich peptides to the A. thaliana predicted proteome, we found those two defensins, but a number of questions arose from that, including their tissue of expression. However, this information is sometimes inaccessible, depending on the tissue of expression, the need for a specific stimulus or even the amount of protein or RNA produced.

Fortunately, there is a high resolution transcript map for A. thaliana, where we could identify the expression of both defensins in flowers, ovules and seeds. This is interesting because the other known class II defensins are expressed in a similar context: in flowers for solanaceous species and in seeds for poaceous species.

In addition, given the evolutionary distance among the Brassicaceae, Solanaceae and Poaceae families, these class II defensins could be spread among all flowering plants. We do not know the function of A. thaliana’s class II defensins, but the solanaceous and poaceous class II defensins present antimicrobial function. Do the A. thaliana class II defensins have the same function?

Well, we do not know the actual function, but the predicted structures seem to be very similar to classical plant defensins. In addition, the genes that code for these defensins in A. thaliana seem to be the result of a duplication process, because they are neighbors and their sequences share ~70% identity.

In fact, despite being an in silico study, a number of hypotheses emerged, which reminds me of an article by Markowetz in PLOS Biology, “All biology is computational biology” (https://doi.org/10.1371/journal.pbio.2002050). The application of computational methods to study the structure, function and evolution of proteins is a very exciting field, and this article is a good example of that approach in modern biology.

Quality assessment:
Originality ☆☆☆☆✭
Rigor ☆☆☆☆✭
Significance to the field ☆☆☆☆☆
Interest to general audience ☆☆☆☆✭
Quality of writing ☆☆☆☆✭
Overall quality of the study ☆☆☆☆✭

Reference
Costa et al. (2020) In silico characterization of class II plant defensins from Arabidopsis thaliana. Phytochemistry, vol 179, 112511. https://doi.org/10.1016/j.phytochem.2020.112511

#PrePrintFeedback: “AMPlify: attentive deep learning model for discovery of novel antimicrobial peptides effective against WHO priority pathogens”

Main Findings

The preprint by Li et al. describes a new deep learning model for prediction of antimicrobial peptides and its application to identify these peptides in the bullfrog genome.

Strengths

Deep learning is a hot topic in the machine learning area
AMPlify outperforms the methods in the benchmarking
AMP Scanner was retrained with AMPlify data sets
Careful selection of non-AMP sequences
Application in a real world scenario (screening the bullfrog genome)
Antimicrobial activity determined for pathogens from WHO priority list

Limitations

The section “Hyperparameter tuning and model architecture” is not biologist-friendly
Loose (https://doi.org/10.1038/nature05233) and Nagarajan (https://doi.org/10.3390/data4010027) datasets were not used as external validation data sets
The benchmarking lacks classical prediction systems (e.g. AntiBP2 and CAMP)
The problem of shuffled peptides was not addressed
The preprint lacks a pipeline flowchart figure
The web server was not implemented
The peptide screening did not include peptides predicted as non-AMPs

Comments

In the field of antimicrobial activity prediction, there are some classical problems that have not been overcome in more than ten years of research. The first one is the absence of a data set of non-antimicrobial peptides. It seems that we just accepted the use of sequences from Swissprot without the ‘antimicrobial’ annotation to create this data set. Li et al. were more rigorous with this data, which could help to explain AMPlify’s best performance in the benchmarking.

The second problem is related to the descriptors, which are not necessary when using deep learning. However, the key problem of shuffled peptides (https://doi.org/10.1016/j.jtbi.2017.05.011) was not addressed by the authors, and this problem could explain some of their results in the bullfrog genome screening.
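One way to picture the concern, assuming it is about predictors that respond to amino acid composition rather than residue order: a peptide and a shuffled copy of it have exactly the same composition, so any composition-only descriptor cannot tell them apart. The sequence below is a made-up example, and `composition` and `shuffled` are hypothetical helper names, not part of AMPlify or any other predictor.

```python
import random
from collections import Counter


def composition(sequence):
    # Fraction of each amino acid in the sequence (order is ignored).
    counts = Counter(sequence)
    return {aa: counts[aa] / len(sequence) for aa in sorted(counts)}


def shuffled(sequence, seed=0):
    # Return a copy of the sequence with its residues in random order.
    residues = list(sequence)
    random.Random(seed).shuffle(residues)
    return "".join(residues)


peptide = "GLFKKLLKSAGKALAHGL"  # made-up sequence, for illustration only
scrambled = shuffled(peptide)

print(peptide, scrambled)
print(composition(peptide) == composition(scrambled))  # True
```

A predictor that gives both sequences the same score is, in effect, reading composition only, which is why shuffled sequences are a useful control.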

Of the eleven predicted sequences, only four demonstrated antimicrobial activity, resulting in a probability of correct prediction of positive peptides of 0.36. In fact, the eleven peptides have characteristics of AMPs; however, because the shuffled problem was not addressed, we don’t know if these results could be due to compositional bias. In addition, as the authors themselves stated, “the size of the training data is still small relative to the data typically employed in most deep learning applications”.
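The 0.36 figure is simply the fraction of predicted positives that were confirmed in the assay (often called precision or positive predictive value). A short check, where the function name is my own:

```python
def positive_predictive_value(confirmed_active, predicted_positive):
    # Fraction of peptides predicted as AMPs that were active in the assay.
    return confirmed_active / predicted_positive


ppv = positive_predictive_value(4, 11)
print(round(ppv, 2))  # 0.36
```

Because peptides predicted as non-AMPs were not tested, the complementary numbers (false negatives and true negatives) are unknown, so sensitivity and specificity cannot be estimated from this screening.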

An interesting feature is that they retrained the AMP Scanner with their own data, allowing the comparison between the algorithms, not the systems. This reinforces what other manuscripts have shown: regardless of the algorithm, if the system is trained with similar data, the outcome is similar. AMPlify slightly outperformed AMP Scanner (~5%), but both systems showed statistics higher than 90%.

Besides, AMP Scanner is not the only deep learning predictor available on the web; there is another system that would be interesting to compare, AxPEP (https://doi.org/10.1016/j.omtn.2020.05.006).

Regarding the antimicrobial screening of the bullfrog genome, I checked the peptide molecular masses using ProtParam, and they didn’t match. It is not clear whether some modifications were made to the peptides. Also, the peptides presented a rana box motif (https://doi.org/10.3389/fmicb.2018.02846), but it was not clear whether they were synthesized with or without the disulfide bridge.
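To illustrate why the disulfide bridge matters for this kind of check, here is a rough sketch of an average-mass calculation from the sequence alone. The residue masses are the commonly tabulated average values (approximate), the sequence is made up, and `average_mass` is a hypothetical helper, not the ProtParam implementation. Each disulfide bridge removes two hydrogens, so the oxidized peptide is about 2 Da lighter than the reduced one.

```python
# Approximate average residue masses in Da (amino acid minus one water).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.0153      # approximate average mass of H2O
HYDROGEN = 1.00794   # approximate average mass of H


def average_mass(sequence, disulfide_bridges=0):
    # Sum of residue masses plus one water for the free termini,
    # minus two hydrogens per disulfide bridge.
    mass = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return mass - 2 * HYDROGEN * disulfide_bridges


peptide = "GLWKKCGLKC"  # made-up sequence with two cysteines
reduced = average_mass(peptide)
oxidized = average_mass(peptide, disulfide_bridges=1)
print(f"reduced: {reduced:.2f} Da, oxidized: {oxidized:.2f} Da")
```

Other modifications, if any were made during synthesis, would shift the mass in their own way; none of them is modeled here.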

There is a very specific point that should be highlighted. In the discussion, the authors stated “it has the potential to play a role in de novo AMP design or enhancement”. Well, considering that designed peptides are quite similar to AMPs but a number of them are inactive, AMPlify should not be used for such a purpose, mainly because the Loose data set was not included in the system assessments.

Quality assessment:
Originality: ☆☆☆☆✭
Rigor: ☆☆☆✭✭
Significance to the field: ☆☆☆✭✭
Interest to general audience: ☆☆☆✭✭
Quality of writing: ☆☆☆☆✭
Overall quality of the study: ☆☆☆✭✭

Reference:

Li et al. 2020. AMPlify: attentive deep learning model for discovery of novel antimicrobial peptides effective against WHO priority pathogens. BioRxiv. (Version 1). doi: https://doi.org/10.1101/2020.06.16.155705

An SVM model based on physicochemical properties to predict antimicrobial activity from protein sequences with cysteine knot motifs

Ten years ago, I published my first manuscript as first author, a manuscript about antimicrobial activity prediction, which would become one of the pillars of Porto Reports. Because of its inaugural character, I chose that article to be the subject of the first post on Porto Reports Legacy.

“An SVM model based on physicochemical properties to predict antimicrobial activity from protein sequences with cysteine knot motifs” is far from being a perfect manuscript, but it has some strengths, including an innovative strategy for antimicrobial activity prediction. In fact, the innovation was the main strength that made it worth publishing. To this day, I use this work to teach what to do and what not to do in a scientific publication.

Briefly, the manuscript describes the construction of an antimicrobial activity prediction system using a support vector machine as the machine learning algorithm and physicochemical properties as the sequence descriptors. The system reached a good accuracy (~80%) using the polynomial kernel.
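For readers who have not met the polynomial kernel, this is the usual textbook form, K(x, y) = (gamma * x·y + coef0)^degree, written as a few lines of Python. The degree, gamma and coef0 values and the two descriptor vectors are placeholders I chose for illustration; the post does not state which values or which properties the original system used.

```python
def polynomial_kernel(x, y, degree=3, gamma=1.0, coef0=1.0):
    # Similarity between two descriptor vectors under a polynomial kernel.
    dot = sum(a * b for a, b in zip(x, y))
    return (gamma * dot + coef0) ** degree


# Hypothetical descriptor vectors (made-up numbers standing in for
# physicochemical properties computed from two sequences).
peptide_a = [0.5, 1.0, 0.25]
peptide_b = [1.0, 0.5, 0.75]

print(polynomial_kernel(peptide_a, peptide_b))
```

The support vector machine never sees the sequences themselves, only these kernel values between descriptor vectors, which is why the choice of descriptors matters as much as the choice of algorithm.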

The main limitations were (i) the weak English, and in fact, the incorrect use of verb tenses in several manuscript sections; and (ii) the non-contextualization of the computational problem, which makes the manuscript hard to understand for scientists from biological sciences.

Nevertheless, I need to talk about the strengths! To create something innovative, we need to be creative, and in 2010 there was a wide field to explore in this topic. In fact, there was only one manuscript until then related to prediction of antimicrobial activity. Thus, in this condition, the idea does not need to be bright, it just needs to be different.

The difference was not in the algorithm itself, but in how to train the machine learning algorithm, because the first manuscript demonstrated that there are only slight differences in the predictive power of different algorithms with the same training schemes. Thus, we used physicochemical properties to train the system, reaching a good accuracy. However, there were some limitations in our technique, which were well described and properly addressed, and that is always a strength.

This system was the precursor of CS-AMPPred, and due to some errors in the choice of methods, including the support vector machine engine, the original system did not reach its actual potential.

Quality assessment:

Originality: ☆☆☆☆✭
Rigor: ☆☆☆✭✭
Significance to the field: ☆☆☆☆✭
Interest to general audience: ☆☆☆✭✭
Quality of writing: ☆☆✭✭✭
Overall quality of the study: ☆☆☆✭✭

Reference:
Porto W.F., Fernandes F.C., Franco O.L. (2010) An SVM Model Based on Physicochemical Properties to Predict Antimicrobial Activity from Protein Sequences with Cysteine Knot Motifs. In: Ferreira C.E., Miyano S., Stadler P.F. (eds) Advances in Bioinformatics and Computational Biology. BSB 2010. Lecture Notes in Computer Science, vol 6268. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-15060-9_6