Probabilistic Models for Predicting Mutational Routes to New Adaptive Phenotypes

,

www.bio-protocol.org/e3407 : e3407. DOI:10.21769/BioProtoc.3407 evolutionary routes then more likely mutations that confer small gains in fitness may be observed more often than rare mutations that confer large fitness gains; being first may be more important than being best. In such cases, accurate prediction of the mutational routes to adaptive phenotypes requires knowledge of their relative rates of phenotypic production.
The rates of phenotypic production can be shaped by a variety of factors. For instance, a higher rate of phenotypic production can be caused by mutational hotspots, which are sections of DNA that increase the frequency of mutational events in nearby genes. Another possibility is that some genes have a greater capacity to translate mutation into phenotypic change, i.e., have a larger number of sites that can be mutated with functional effects. This can due to the properties of the individual protein (Lind et al., 2017) or ncRNA, in terms of mutational robustness, but also because proteins and ncRNAs have different functions in the molecular interaction networks underpinning adaptive phenotypes (Lind et al., 2015 and 2019). A simple example of the latter case would be when expression of a gene is controlled by both a negative and a positive regulator, activation of that gene occurs more often through mutations in the negative regulator simply due to loss-of-function mutations being more likely than gain-of-function mutations.
To account for the factors affecting the rate of phenotypic production, mechanistic information is needed. An arena in which mechanistic information has been used to successfully predict evolution is in metabolism (Edwards and Palsson 2000;O'Brien et al., 2015). There is a wealth of data on the biochemical reactions involved in central metabolism in different microbial species. This data can be used to form explicit metabolic models that in conjunction with a technique known as flux balance analysis can predict the growth rate of an organism in simple environments. By manipulating which reactions are present in the metabolic model, the phenotypic effects, i.e., growth rates, of mutational knockouts can be predicted. Similar metabolic models have also been used to predict how organisms will interact in different environments via the exchange of biochemical compounds including spatio-temporal interactions (Liu et al., 2015;Bocci et al., 2018). While the metabolic models have been successful in certain evolutionary predictions, they are limited in their applicability: they focus mainly on growth phenotypes in environments in which organisms are assumed to be actively reproducing.
Moreover, the growth phenotypes are computed using large-scale models with hundreds or thousands of reactions and typically involve optimization of a "biomass'' function of some 50+ variables. The models also do not usually incorporate any type of regulation, e.g., transcriptional/translational control or protein modifications, but see Chandrasekaran and Price, 2010 for one such example. Thus, they are not well suited to generalizations outside of metabolism.
Here we describe a method to predict the rates that mutational changes to different molecular networks produce phenotypic change. One advantage of this method is that only knowledge of the general architecture of the molecular networks is needed to make a prediction. However information about reaction rates, concentrations, mutation rates and gene size, mutational robustness of components can be included if known. The method can incorporate different types of reactions, for example conformational changes and enzymatic reactions, and the phenotypes predicted are not limited to optimization for growth rate. Predictions generated by the method could potentially be Choi and Chan, 2015) can be incorporated into the model described here to adjust the rates of disabling and enabling mutation in different genes. The method described here is also useful for providing null models in order to test the causes of repeated evolution (Lind, 2018;Lind et al., 2019). It could also be one component in understanding the molecular bases of complex genetic diseases and for evolutionary forecasting of antibiotic resistance and cancer, especially when experimental data is incomplete.

Computer
Any desktop computer that fulfills the system requirements for the programming language(s) used. Here a desktop computer with the following configuration was used: iMac with a 3.6 GHz Intel Core i7 processor and 16 GB 2400 MHz DDR4 memory MATLAB was used to solve the differential equations, analyze the resulting data, and generate figures. However, other programming language(s) would have sufficed so long as they can be used to numerically solve differential equations and perform simple mathematical operations, e.g., Python, Julia, Octave, or Mathematica. We note that in the associated paper (Lind et al., 2019) a combination of languages was used but for simplicity here we have put everything into a single language. MATLAB code for the procedures RunModelComparison and PlotModelComparison is available at https://github.com/ericlibby/BioProtocol.

Procedure
This protocol describes a method for computing the relative likelihood that a mutational change will translate into a phenotypic change in two molecular pathways. The pathways do not have to produce the same phenotype but there should be a way of determining what the phenotype is, based on the interactions between pathway components. Thus, the protocol assumes that the mechanistic details of the molecular pathways are understood; however, information may be missing concerning the reaction kinetics, the concentrations of molecular compounds, and/or the effects of mutations. This protocol outlines a procedure that randomly samples many models of the pathways and compares the likelihood that mutations in the pathway-that affect the reaction kinetics-alter the phenotype(s). The steps of the procedure are illustrated in Figure 1 using the same notation as in the text below. Figure 2 and Figure 3 presents the procedure in a pseudocode format and the full MATLAB code is available at https://github.com/ericlibby/BioProtocol. 2. For each set of biochemical reactions, formulate a system of differential equations that describes the reaction kinetics of the indicator (see Figure 1 for example). Each pathway is likely affected by biochemical compounds whose dynamics are governed by external pathways.
It is often better to treat these compounds as constants rather than include more differential equations for their dynamics. For instance, if a pathway affects an indicator through phosphorylation then the intracellular pool of phosphate groups may be held as a constant rather than modeled through an additional set of differential equations. In general, it also helps to keep a similar level of detail/abstraction across the models of the pathways for better  AWS is a system of ordinary differential equations describing all reactions and rates. In this case, the active dimer RR is the indicator of phenotypic change.
B. Choose parameter distributions to solve the differential equations We assume that the systems of differential equations used to model the pathways cannot be solved analytically. To solve each system of differential equations numerically, three key pieces of information are needed: a. Initial concentrations of the molecular compounds.
b. Reaction rates for the interactions between compounds. c. Time interval over which to solve the differential equations.
It is possible that all pieces of information are known for a model in which case there will be only one numerical solution for each pathway model. In most cases, however, information is missing and must be estimated.  change the internal dynamics of a pathway, and we classify mutations as enabling or disabling based on whether they increase/decrease reaction rates. Thus, we need a way of determining new reaction rate(s) in the differential equation model of a pathway, following a mutation (or a set of mutations). Ideally, one would have information on all possible mutations and their effects on reaction rates. However, this is unlikely to be available. In the absence of empirical data, a distribution should be picked such that given a reaction rate and whether a mutation increases or decreases reaction rates, a new reaction rate is determined. For example, in (Lind et al., 2019) enabling mutations were assumed to affect reaction rates multiplicatively and thus new reaction rates were computed as the product of the original reaction rate and a multiplicative factor sampled from 10 U[0,2] (similarly a factor of 10 U[−2,0] ) was used for disabling mutations. (https://github.com/ericlibby/BioProtocol/blob/master/RunModelComparison.m) to generate the data. The pseudocode for "Run Model Comparison" is included below as Algorithm 1 (Figure 3).
"Run Model Comparison" use the procedure "Phenotype Change" for each pathway to compute the probability that mutation in a pathway causes a phenotypic change. The pseudocode for "Phenotype Change" is shown in Figure 3. "Phenotype Change" uses several procedures to set the parameter distributions (Procedure B, C above). The pseudocode for these are shown in Figure 4.   Further analyses can investigate the standard deviation or maximum/minimum of the likelihoods.
We recommend that additional analyses be performed to validate any conclusions. In particular, the analyses can be performed again with changes to the probability distributions to test for parameter sensitivity. Alternatively, the number of iterations can be increased/decreased by some factor to evaluate whether the contour plot remains stable. Examples of these analyses can be found in