Low genetic diversity and population connectivity fuel vulnerability to climate change for the Tertiary relict pine Pinus bungeana

Endemic species are important components of regional biodiversity and hold the key to understanding local adaptation and evolutionary processes that shape species distributions. This study investigated the biogeographic history of a relict conifer Pinus bungeana Zucc. ex Endl. confined to central China. We examined genetic diversity in P. bungeana using genotyping‐by‐sequencing and chloroplast and mitochondrial DNA markers. We performed spatial and temporal inference of recent genetic and demographic changes, and dissected the impacts of geography and environmental gradients on population differentiation. We then projected P. bungeana's risk of decline under future climates. We found extremely low nucleotide diversity (average π 0.0014), and strong population structure (global FST 0.234) even at regional scales, reflecting long‐term isolation in small populations. The species experienced severe bottlenecks in the early Pliocene and continued to decline in the Pleistocene in the western distribution, whereas the east expanded recently. Local adaptation played a small (8%) but significant role in population diversity. Low genetic diversity in fragmented populations makes the species highly vulnerable to climate change, particularly in marginal and relict populations. We suggest that conservation efforts should focus on enhancing gene pool and population growth through assisted migration within each genetic cluster to reduce the risk of further genetic drift and extinction.


Introduction
Endemic species are important components of regional biodiversity and play critical roles in ecosystem functioning (Myers et al., 2000;Hampe & Jump, 2011). By definition, these species exist only in limited geographic ranges and are often uniquely adapted to local habitats (Anderson, 1994), which increase their vulnerability to environmental change. Species become endangered for two main reasons: (i) loss of habitat due to geological and climate events or to land use change, which lead to population contraction; and (ii) inbreeding and subsequent loss of genetic variation. During these processes, many species are reduced to small populations and exist as relicts of past climate. Genetic monitoring in relict species provides the essential information for supporting risk management relevant to their conservation as these ancient species and populations represent valuable assets for uncovering phylogenetic history, the genetic basis of local adaptation, and evolutionary potential under changing climate (Hewitt, 2000;Hampe & Petit, 2005).
Classical population genetic investigations are an indispensable component in conservation biology. Recent advances in landscape genomics offer an additional approach to explore genetic adaptation across landscapes by associating genomewide scans with geographic and environmental gradients. This knowledge can be incorporated into predictive models to evaluate the risk of genetic mismatch of extant populations to future conditions (Fitzpatrick & Keller, 2015). The mismatch, also known as genomic offset, or genomic vulnerability, is measured by a genetic distance between the current and required genomic composition for matching a future environment (Fitzpatrick & Keller, 2015;Rellstab et al., 2021). It thus can be explored to assess the risk of maladaptation and identify geographic regions where the current genotypeclimate relationships will most likely be disrupted if migration and de novo mutations cannot compensate for the required diversity .
Pinus bungeana Zucc. ex Endl. is a haploxylon pine within the genus Pinus, sect. Quinquefoliae, subsect. Gerardianae. Gerardianae consists of three species, all with very restricted distributions in eastern Asia (Gernandt et al., 2005;Jin et al., 2021). According to the most recent phylogeny, subsect. Gerardianae evolved in the Eocene and P. bungeana diverged from its two sister species in the Oligocene 25.7 million years ago (Ma) (Jin et al., 2021). Pinus bungeana is confined to central-northern China and occurs in small and isolated populations scattered along the Qinling, Daba, and Taihang Mountains ( Fig. 1A; Zhao et al., 2014). This pine has a unique morphology but its most striking feature is the smooth, gray bark that sheds in round patches, leaving olive and gray patterns on the tree trunk that resemble camouflage fabric. The species is attractive and desired for temples and gardens and has thus been heavily exploited for centuries. Past climate change, habitat destruction, and human impacts have relegated its forests to mountain tops where natural regeneration is poor (Zhao et al., 2014). Such a distribution makes it especially vulnerable to stochastic environmental events, and it is now red-listed as an endangered (EN) species in the threatened species list of China's higher plants (Qin et al., 2017).
Despite having a Red List EN status, no comprehensive study has been undertaken on the genetic diversity and evolutionary history in P. bungeana. The few available studies, based on a small number of DNA markers (seven nuclear simple sequence repeat (SSR) loci in Zhao et al., 2014; 10 nuclear loci and one chloroplast DNA (cpDNA) marker in Yang et al., 2019), did not provide a genome-wide estimate of diversity, although they revealed a genetic structure along the Qinling and neighboring mountains. The Qinling Mountains are an east-west mountain chain in central China approximately 1600 km long with an altitude of 2000-3700 m a.s.l. (Rost, 1994). This massive physical barrier blocks the northward movement of the monsoons and forms a natural boundary between the dry temperate climates of northern China and the subtropical climates of southern China. The complex topography of high mountains and deep valleys in the Qinling Mountains hosts diverse microclimates that have facilitated the evolution of a large number of endemic species and contributed significantly to the diversity of East Asian flora (Huang et al., 2016). Such landscapes at low latitudes likely created refugia for plant species during the last glacial maximum (LGM) and thus harbor pre-LGM relict populations (Hampe & Jump, 2011). The evolutionary history and forces that have affected the diversity in P. bungeana are not yet clear. It is not known whether P. bungeana lost genetic diversity during the Quaternary climate oscillations, what limits gene flow, or whether a reduced gene pool and inbreeding are currently constraining its adaptation. These questions are highly relevant for understanding how relict populations are formed and maintained and respond to future climate change, and thus the development of an appropriate conservation plan for the species.
This study was designed to gain an insight on the recent evolutionary processes that have impacted the distribution of genetic diversity in P. bungeana, and its risk of decline under climate change. We sampled 21 populations of P. bungeana across its distribution range and analyzed nuclear genome diversities using genotyping-by-sequencing (GBS). We also screened mitochondrial DNA (mtDNA) and cpDNA to estimate diversity in these uniparentally inherited genomes. We performed spatial and temporal inference of recent genetic and demographic changes. Our objectives were to: (i) assess the levels of genetic diversity and spatial genetic structure in P. bungeana, (ii) identify driving forces for population differentiation and barriers to gene flow, (iii) establish the recent history of population size changes and range shift and their impacts on diversity, and (iv) predict its genomic offset under future climate scenarios, to guide management actions aiming to reduce extinction risk.

Population sampling and cpDNA and mtDNA sequencing
The distribution of Pinus bungeana has shrunk over the centuries and extant natural populations are rare and confined to isolated localities in mountainous regions in central to northwestern China along the Qinling, Daba, and Taihang Mountains (Fig. 1A). The Daba Mountains are to the south of Qinling, and the Taihang Mountains lie to the north of the eastern end of Qinling. We collected 21 populations that are recognized as remnant populations of the natural distribution of P. bungeana (Zhao et al., 2014;Yang et al., 2019), covering the different mountain groups (Fig. 1A). The names, locations, and sample sizes of each population are listed in Table 1. We collected needles from eight to 15 randomly selected mature trees separated by at least 30 m in each population. Needles were dried in silica gel until DNA isolation. Genomic DNA was extracted using a Plant Genomic DNA Kit (Tiangen).

Genotyping-by-sequencing
We used GBS to analyze the diversity in the nuclear genome. We prepared a GBS library containing 200 individuals (Table 1) using a PstI high fidelity restriction enzyme (New England Biolabs) following the procedure of Pan et al. (2015) with minor modifications. Briefly, restriction enzyme digestion and adapter (with individual barcode) ligation were carried out simultaneously on 200 ng DNA from each sample. The digested and ligated DNA were then pooled, purified, and polymerase chain reaction-amplified. The amplified products were purified first, then separated on a 2% precast gel (E-gel; Thermo Fisher Scientific), and fragment sizes of 330-450 bp were recovered and purified. Paired-end sequencing (2 × 125 bp) was carried out on an Illumina HiSeq 2500 system. Sequence read quality was assessed using FastQC (http:// www.bioinformatics.babraham.ac.uk/projects/fastqc/). Adapter sequences and low-quality bases (Phred quality <20) were removed from the tail of each read using fastp (Chen et al., 2018). Clean reads were cataloged using the process_radtags module of Stacks version 2.0 (Catchen et al., 2011). Sequences were then mapped to the sugar pine (Pinus lambertiana Douglas) genome (Stevens et al., 2016) using the BWA-MEM algorithm with default parameters (Li, 2013). Variants were called using SAMtools and BCFtools pipelines (Li, 2011).
Several filtering steps were undertaken to minimize genotyping errors: indels and single nucleotide polymorphisms (SNPs) within 3 bp from any indels, or mapping quality <30 were removed; SNPs with genotype quality <10 or read depth <5 were masked as missing; and SNPs with missing rate >50%, minor allele frequency (MAF) <5%, heterozygosity >70%, or allele number >2 were also removed. The remaining SNPs (4080) were used for population genomic analyses, except for nucleotide diversity (π) and demography analyses, for which no MAF filtering was applied.

Population structure and diversity analyses
To obtain an overview of the spatial pattern of diversity, we examined the population genetic structure with ADMIXTURE version 1.30 (Alexander et al., 2009) at K = 1-20 and repeated 20 times using random seeds. The optimal number of putative genetic groups (K value) was assessed using the delta-K method (Evanno et al., 2005). The estimated membership coefficients of each individual were aligned and visualized across 20 independent runs using CLUMPAK (Kopelman et al., 2015). We also used TESS3, which combines matrix factorization and spatial statistics to explore estimates of ancestry coefficients (Caye et al., 2016). TESS3 was run in five replicates at K = 1-20. The optimum K value was assessed using the crossvalidation result. Population relationships were further evaluated using principal component analysis (PCA) as implemented in EIGENSOFT version 6.1.4 (Price et al., 2006) and the neighbor-joining (NJ) method as implemented in PHYLIP version 3.698 (Felsenstein, 2005).
Genetic diversity in each population was estimated as average pairwise nucleotide diversity (π), observed heterozygosity (H O ), and inbreeding coefficient (F IS ). H O , F IS , and population differentiation (F ST ) were calculated using MAFfiltered informative SNPs with the R package HIERFSTAT (Goudet, 2005), while π was calculated using both informative and invariant sites with VCFtools (Danecek et al., 2011). The 95% confidence interval (CI) for F IS was estimated by jackknifing a sample size of 100 SNPs for 10 000 resamples.

Estimated effective migration surface
Spatial patterns in genetic diversity and gene flow were estimated and visualized using estimated effective migration surface (EEMS) (Petkova et al., 2016). This method is based on a stepping-stone model (Kimura & Weiss, 1964) to identify regions where genetic similarity decays slower or faster than expected under pure isolation by distance (IBD), thereby identifying gene flow barriers or corridors. This is achieved by estimating the effective migration rate (q) and the effective diversity rate (m) from pairwise genetic dissimilarities calculated on a geographical grid. We ran three independent analyses using 800 deme size with a burn-in 1 000 000 and Markov chain Monte Carlo (MCMC) length of 20 000 000 iterations in "runeems_snps" program to ensure the convergence of the MCMC chains ( Fig. S2A). We adjusted the mrateMuProposalS2, qEffct-ProposalS2, qSeedsProposalS2, mSeedsProposalS2, and mEffctProposalS2 parameters, such that the acceptance proportions for all parameters, except degrees of freedom, were within 20%-30%, as suggested by the manual. The reliability of EEMS models was evaluated by checking the linear relationship between the observed and fitted dissimilarities for within-and between-demes (Figs. S2B, S2C). Final spatial visualizations illustrating migratory surfaces were plotted using the R package rEEMSplot (Petkova et al., 2016).

Isolation by distance, isolation by resistance, and isolation by environment
To investigate the role of geography and environment in shaping spatial genetic differentiation, we tested IBD, isolation by resistance (IBR), isolation by environment (IBE), and correlations between them (Fig. 2). Normalized pairwise genetic distances F ST /(1−F ST ) were calculated among populations using "hierfstat". Geographic distances were calculated based on latitude and longitude information of populations using R package "geosphere" (Hijmans, 2019). Resistance distances were calculated based on an elevation raster layer and latitude and longitude information of sites by using topoDist in the R package topoDistance (Wang, 2020), the number of directions for movement between cells was set to 8. To calculate environmental distances between populations, we first extracted 19 bioclimatic variables from WorldClim (http://www.worldclim.org/current), representing annual and seasonal temperature and precipitation conditions, and six global UV radiation values (https://www.ufz.de/gluv/ index.php?en=32435) for each sampling location (Table S2). We chose these environmental variables because of their known impacts on plant development and niche diversification in conifer species in this geographical region (Xia et al., 2018;Zhao et al., 2020). All environmental layers were converted to the same resolution at a grid cell size of 30 arc-seconds (c. 1 km 2 ). After considering the ranked importance and correlation among these variables (see Section 2.7), 12 environmental variables ( Fig. 2G) with Pearson correlation coefficient | r | ≤0.8 were retained for the IBE analysis. These 12 selected environmental variables were first scaled and then used to calculate pairwise Euclidean distances between sites.
We assessed the association between the genetic and geographic and environmental distances using the Mantel test in the R package "vegan" (Oksanen et al., 2019), with significance determined using 999 permutations. We undertook a series of redundancy analyses (RDA) to quantify the contribution of geography and environment on population genetic differentiation using vegan. Redundancy analyses combine multiple linear regression analysis with PCA to assess the relative effect of matrices of independent variables to the variation in the dependent variable matrix. In this analysis, the 12 scaled environmental variables (representing IBE) and the distance-based Moran's eigenvector map (six dbMEMs, representing IBD) were used as two independent matrices, and the Hellingertransformed MAF was the dependent matrix. We did not include IBR in RDA because we do not have appropriate resistance matrix for each site. To avoid model overfitting, we undertook forward selection with α = 0.05 on geographic and environmental variables separately. Finally, one geographic variable (dbMEM1) and three environmental variables (BIO4, BIO5, and BIO12) were reserved for RDA. We undertook a series of full and partial RDA model tests to differentiate the independent effects of geography and environment by reciprocally constraining one of the two factors. The significance of these models was determined using the "anova.cca" function based on 999 permutations.

Inference of demographic history
The population genetic structure analyses identified two major groups (West and East) in our samples (see Section 3). To infer the divergence history of these two groups, we used fastsimcoal2 version 2.6 (Excoffier et al., 2013), a coalescent simulation-based method, to establish the most likely demographic scenarios for their past evolution. We tested 12 scenarios, all of which began with the split of the ancestral population into two derived populations but differed in terms of: (i) whether divergence occurred with migration or not, (ii) levels of gene flow between the two groups, and (iii) the mode of population size change after the split (Fig. S3). Because missing data can bias the estimates of site frequency spectrum (SFS), we used a down-sampling procedure to generate the folded SFS using a Python script available on Dryad (Papadopoulou & Knowles, 2015) with minor modification (https://github.com/huiliu/Bioinformatics-Scripts/blob/master/Scripts/Python/ sampleDownMSFS_Hui_final.py). The final folded SFS contained 1 142 666 sites from GBS including low-frequency alleles and invariants. Considering the low confidence of rare alleles, we ran fastsimcoal2 based on SFS with "--nosingleton." For each model, we performed 50 independent runs with 100 000 coalescent simulations as well as 40 expectationconditional maximization cycles to estimate the global maximum-likelihood of each model. The best-fitting model was selected based on the maximum value of likelihood over the 50 independent runs using Akaike's weight of evidence (Excoffier et al., 2013). The goodness of fit of the best model was assessed by comparing the observed SFS with expected SFS, which was obtained by 100 000 coalescent simulations under the maximum-likelihood estimates of population parameters. Parameter confidence intervals of the best model were obtained by 100 parametric bootstraps, with 50 independent runs in each bootstrap. We assumed a mutation rate of 7 × 10 −10 per site per year and a generation time of 50 years for the genus Pinus (Willyard et al., 2007) to convert model parameters to absolute values. Population BD was removed from this analysis due to its uncertain grouping (see Section 3).

Gradient forest predictions of genomic offset
To identify the environmental variables that best explained the distribution of genetic variation, we undertook gradient forest (GF) analyses using R package "gradientForest" (Ellis et al., 2012). We removed SNPs that were polymorphic in fewer than five populations to ensure robust regression. Each GF model was tested using 500 regression trees per SNP while setting all the other parameters at default values. At first, all 25 environmental variables were included in GF models. After comparing the ranked importance, 12 environmental variables (Fig. 2G) with Pearson correlation coefficient , resistance distance (B), and environmental distance (C), and correlation between geographic, resistance, and environmental distances (D-F). Gray shading around each regression line represents the 95% confidence interval. G, Ranked importance of 25 environmental variables based on gradient forest analysis (below the diagonal), and Pearson's correlation coefficient between these variables (above the diagonal). Asterisks (*) indicate the 12 top-ranked, uncorrelated environment variables (Pearson's | r | < 0.8). H, Results of redundancy analyses that partition the sources of genetic variation among populations in P. bungeana into environment (env.), geography (geog.), and their combined effects. *P ≤ 0.05, **P ≤ 0.01, ***P ≤ 0.001. | r | ≤ 0.8 were retained to build the final GF models and predict the genomic composition of each grid point within the distribution range of P. bungeana. The resulting multidimensional genetic patterns in geographic space were summarized using PCA with the first three principal components (PCs) assigned to red, green, and blue, respectively. The similarity of color represents the similarity of the expected genetic composition. This allowed us to visualize the differences in allele frequencies along environmental gradients across space.
We used GF to predict the genomic offset of P. bungeana following the method of Fitzpatrick & Keller (2015). Genomic offset is a measure of mismatch between current and predicted genomic compositions under future environmental projections using contemporary genotype-environment associations as a baseline (Fitzpatrick & Keller, 2015;Rellstab et al., 2021). To identify the spatial regions where genotype-environment relationships are most likely to be affected by climate change, we first used the current GF model to predict genetic compositions by year 2070 under two representative concentration pathway (RCP) scenarios, a milder RCP2.6 and a more severe RCP8.5, representing up to 2°C and 4°C temperature increase, respectively, by year 2100. Four of the 12 environmental layers (UVB1, UVB2, UVB3, and UVB5) were not available for future projection. We left them unchanged and used the projections for the other eight variables (BIO1, BIO3, BIO4, BIO5, BIO12, BIO13, BIO15, and BIO19) to predict future genomic compositions. We calculated the Euclidean distances between the current and future genomic compositions to represent the scale of genetic change needed to match environmental change, with higher values indicating greater genomic vulnerability of the population (Fitzpatrick & Keller, 2015).

Genetic diversity and population structure
Genotyping by sequencing on the 200 trees generated a total of 119 080 484 paired-end reads, of which 115 067 919 (96.6%) reads passed initial quality filters. After removing 14 lowdepth individuals, 186 valid individuals from 20 populations were used for the subsequent analyses, with 6-10 individuals per population (Table 1). We recovered more than 36 million sites, but stringent filtering reduced it to 1 142 818 sites. After removing nonpolymorphic loci, 30 300 SNPs remained; a further MAF 0.05 filter shrank it down to the final number of 4080 SNPs.
Analysis of genetic variation, as measured by average nucleotide diversity (π) and observed heterozygosity (H O ), showed low values with an overall π of only 0.0014 across all 1 142 818 sites and an average H O of 0.2746 across 4080 polymorphic SNPs (Table 1). Inbreeding coefficient F IS varied among local stands; most stands had an F IS close to zero except for three populations (QS, QY, WZS; Table 1) where high positive F IS were detected, indicating high levels of inbreeding. ADMIXTURE detected K = 2 as the optimal number of genetic clusters in the 20 populations (Fig. S4A), which divided the populations into an east (E) group (including 15 populations) and a west (W) group (including five populations), with high degrees of admixture detected in the transition zone of the two groups (Figs. 1B, 1F). TESS3 revealed a northeastsouthwest differentiation at K = 2, similar to the ADMIXTURE result (Fig. 1B). The PCA corroborated the ADMIXTURE pattern, except for population BD, which was placed more into W cluster (Fig. 1C). The first two PC eigenvalues were significant (Tracy-Widom test, P < 0.001) and explained 29.53% of the total genetic variance (17.56% and 11.97% for PC1 and PC2, respectively). The NJ tree revealed a similar grouping to that by PCA, with population BD placed in the W group (Fig. 1D). Considering the PCA and NJ results and the geographical location of BD, we assigned this population into the W cluster in the following analysis. The W group included the populations in the west Qinling, whereas the E group covered a much larger geographic area, including eastern Qinling, eastern Daba, and Taihang Mountains (Fig. 1B).
Genetic differentiation (F ST ) ranged from 0.039 to 0.528 between populations, with high values observed between the most southern population LH and the others (Fig. 1E). F ST between W and E group was 0.098, and the overall value across all populations was 0.234. Populations within the E group had weaker differentiation (F ST = 0.180), while those within W group showed an F ST as high as 0.236, suggesting longer isolation history and limited gene flow even among closely located sites in the west. Of the nine mtDNA loci and nine cpSSR loci analyzed, only one cpSSR locus, PTS15169, was variable. Genotyping of the PTS15169 locus in 104 trees from 12 populations detected only two haplotypes (Fig. S1). The most abundant haplotype A was shared by 11 populations, whereas haplotype B was present in only three populations (JP, AK, and LT) that are located relatively close to each other (Fig. S1). The limited distribution of haplotype B seems to suggest a recent mutation and/or migration. The overall haplotype diversity was 0.019. Our sampling for cpDNA analysis was limited, several populations had only two individuals analyzed. Nonetheless, we observed fixation of alternative haplotypes in populations with good sample sizes, which suggests restricted gene flow mediated by pollen (cpDNA is paternally inherited in Pinus; Neale & Sederoff, 1989) and a strong founder effect from stochastic events.

Isolation by distance, resistance, and environment
Mantel tests on genetic distance versus geographical, resistance, and environmental distances revealed good linear relationships with r = 0.3752 (P = 0.001), 0.7457 (P = 0.001), and 0.3871 (P = 0.025), respectively (Figs. 2A-2C), indicating a similar pattern of IBD and IBE, but stronger IBR. Autocorrelation between geographical, resistance, and environmental distances was strong (r = 0.4413-0.5842, P < 0.01; Figs. 2D-2F) and thus the observed IBD, IBR, and IBE could be confounded. We therefore used RDA to dissect the independent effects of environment and geography.
All RDA models performed using the three retained environmental (BIO4, BIO5, and BIO12) and one geographic (dbMEM1) variables after forward selection were highly significant (P ≤ 0.001). The RDA results showed that, when controlling for IBD, the exclusive contribution of IBE (partial model F~env. | geo. in Fig. 2H) explained 8.45% of the variation. Comparably, when controlling for IBE, the exclusive contribution of IBD (partial model F~geo. | env.) only explained 2.26% of the variation. A total of 18.67% of the variation could be explained by the two components ("total explained"), of which 7.96% was the result of their joint effect ("total confounded"), leaving a large proportion (81.33%) of the variation that cannot be explained by these two factors. This portion of the variation was likely the effect of drift and/or unsampled environmental variables.

Migration barriers
The EEMS contours highlighted a few potential barriers to migration resulting from either historical or contemporary patterns of gene flow (Fig. 3A). One significant barrier is in the central part of the Qinling Mountains, where the populations along this mountain chain were separated into western and eastern domains. The west Qinling also showed high resistance to gene flow. Two additional barriers were indicated, one on the northern side and the other on the southeast margin of Qinling. Another strong barrier was at the intersection between the Qinling and Taihang Mountains, which constrains the connectivity between southern and northern populations (Fig. 3A). All these barriers were supported by high posterior probabilities (>0.90) of migration parameters (Fig. 3B). These scattered migration barriers corroborate with the detected significant IBR (Fig. 2B). There are areas on the southern side of Qinling and the eastern edges of Taihang (Fig. 3A, blue areas), where the effective migration rates tend to be slightly higher than expected under an IBD model. Those regions are generally at lower elevations along valleys (e.g., WZS, YTS, and YL).

Demographic history
We tested 12 demographic models, of which the best-fitting model, based on Akaike's weight of evidence, was a twopopulation isolation-with-migration model (Figs. 4, S3). The expected SFS obtained from the best-fitting model projected the observed SFS well (Fig. S5), suggesting that the demographic parameters recovered are good estimates of the past population history. The estimates of effective population sizes, divergence time, gene flow, and their associated 95% CIs are listed in Table 2. The model estimated a divergence time of 4.88 (95% CI,) Ma between the W and E group. The estimated effective population size (N e ) of the common ancestor of the two groups was 3.45 × 10 4 (95% CI, 1.26 × 10 4 -7.25 × 10 4 ). The two groups experienced strong bottlenecks after splitting from the ancestral population, with N e down to 0.79 × 10 4 and 0.43 × 10 4 , respectively. Both groups experienced an instantaneous size change at approximately 50.5 ka (95% CI, 52.33-49.50 ka), at which the W group experienced another four-fold bottleneck whereas the E group expanded by 30-fold. The current effective population sizes of W and E group were estimated to be 0.20 × 10 4 and 1.46 × 10 5 , respectively. The historical gene flow between the two groups was asymmetrical with 1.28 migrants per generation from W to E, with almost none in the opposite direction. Current gene flow between the two groups increased to approximately 4-6 migrants per generation ( Fig. 4; Table 2), suggesting that climate oscillations during the Quaternary likely caused range shifts and thus exchange of genes.

Prediction of genomic vulnerability to climate change
Because signals of IBE were obtained in RDA analysis, we performed GF simulations to extrapolate the genotypeenvironment relationships across the distribution range. The GF models revealed significant differences in genetic composition along latitude and longitude, revealing clear genetic turnover between the west, east, and north regions of the distribution (Fig. 5A). Temperature seasonality (BIO4), annual precipitation (BIO12), annual mean temperature (BIO1), UVB seasonality (UVB2), and precipitation of wettest month (BIO13) were most strongly correlated with the observed genomic variation (Fig. 5B). We simulated the genomic change needed to track the predicted climate change by the year 2070 under RCP2.6 and RCP8.5. The GF modeling predicted that the southeastern margin is at high risk even at the mild RCP2.6 scenario, indicating where the populations are most vulnerable to climate-induced selective pressure (Fig. 5C). Under a more severe climate change scenario, RCP8.5, the degree of genomic offset increased and almost the whole southern distribution and part of the central region are vulnerable (Fig. 5D). Mapping of the shifts in climate conditions under RCP2.6 and RCP8.5 illustrate that temperature seasonality weakens and precipitation strengthens from the current conditions in the southern margins (Fig. S6).

Extremely low genetic diversity and strong population structure in Pinus bungeana
This study presents the first genome-wide population investigation in P. bungeana and revealed a few striking results that characterize its demographic history and genetic status. Examination of over 1 million sites in the sampled trees gave an average nucleotide diversity π of 0.0014, which is several fold lower compared to other Pinus species, such as Pinus sylvestris L. (π = 0.0045; Hall et al., 2021), Pinus taeda L. (ϴ π = 0.0033; Eckert et al., 2013), and Pinus pinaster Ait. (π syn = 0.0044; Grivet et al., 2017). This low level of nucleotide diversity is similar to that of the relict flatneedle pine Pinus krempfii Lecompte (π sil = 0.0011; , which is restricted to only two provinces in Vietnam. The low genetic diversity is further shown by mtDNA and cpDNA analyses. We found no variation over nine mtDNA segments (11 423 bp) among 16 individuals sampled range-wide. In other Asian pine species, these same mtDNA segments revealed high polymorphism and that mtDNA haplotype diversities in Pinus densata Mast., Pinus tabuliformis Carr., and Pinus yunnanensis Franch. were 0.704, 0.824, and 0.748, respectively (Wang et al., 2011). A similar result was obtained for cpDNA. The haplotype diversity over nine cpSSR loci was 0.019 in P. bungeana but 0.935, 0.989, 0.868, and 0.911 at these same loci in P. densata, P. tabuliformis, P. yunnanensis, and P. krempfii, respectively (Wang et al., 2011. The nucleotide polymorphism in nuclear, mitochondrial, and chloroplast genomes of P. bungeana was lowest among the pine species studied thus far. This low genetic diversity for P. bungeana is in accordance with a relict history characterized by demographic bottlenecks and long-term isolation. Prolonged isolation in small populations can lead to increased inbreeding due to mating among related individuals and/or due to recent expansion from a limited number of colonizing individuals . Indeed, high inbreeding coefficient F IS was observed in three populations. Increased inbreeding negatively affects fitness in conifers (Koelewijn et al., 1999). Forest inventory reported low natural regeneration rate of P. bungeana (Zhao et al., 2014), which might well be the consequence of inbreeding depression in this species, resulting in either high empty seed rate and/or low seedling survival and recruitment. Further clarification of inbreeding and its fitness consequences would require crossing experiments followed by fitness examination at different life stages.
Another significant result from this study is the high population differentiation among and within geographical regions, with an overall F ST of 0.234. ADMIXTURE identified two major genetic clusters that divide the distribution into W and E groups. In contrast to the E group that showed a more homogeneous ancestral component and weaker differentiation (F ST = 0.180), high F ST (0.236) was observed within the relatively restricted W group. Within the E group, excluding two populations (AK and NZ) from eastern Qinling decreased the F ST among the remaining populations in Taihang to 0.126. This suggests a recent colonization history in the E cluster of populations north of the Qinling Mountains, and a more ancient isolation history among the west Qinling populations. Our result is similar to an early study on P. bungeana based on seven nuclear SSR loci, which reported a range-wide F ST of 0.215 and a distinct separation between west Qinling populations and the rest of the distribution (Zhao et al., 2014). This level of differentiation is  Note: M1 W→E , M1 E→W , migration rate from west to east and from east to west group after split from the ancestor, respectively; M2 W→E , M2 E→W , current migration rates; N1 W , N1 E , effective population size of west and east group after split from the ancestor, respectively; N2 W , N2 E , current effective population size of west and east group, respectively; N ANC , effective population size of ancestor of west and east groups; T 1 , time of split between the west and east group; T 2 , time of population size changes within the west and east group.
much higher than in most pine species (e.g., Lu et al., 2016;Xia et al., 2018;Hall et al., 2021). Pines are wind-pollinated outcrossing plants and often have extensive gene flow across large distribution ranges, which result in high levels of genetic diversity but low levels of population differentiation. The observed high F ST in P. bungeana is thus likely the result of habitat fragmentation that reduced population connectivity even at regional scales. Spatial analysis revealed a strong IBR and identified some significant barriers to gene flow. One prominent barrier is in the central part of the Qinling Mountains, which divided the populations into the E and W clusters. This barrier seems to coincide with a group of high peaks (Mount Taibai) that rise to 3767 m in central Qinling, which effectively blocked W-E migration. In the western Qinling, complex terrains further restricted gene flow even among neighboring populations, resulting in high F ST in this cluster of populations. The other gene flow barriers also corresponded with topographical changes in the landscape. These scattered gene flow barriers in a mountainous landscape could distort a clear pattern of IBD. Indeed, we could only attribute 2.26% of the genetic variation to the effect of IBD. Complex mountain terrains on one hand impose barriers to gene flow; however, they can also facilitate migration along valley corridors, promoting genetic homogenization. Thus, uneven mosaic spatial genetic structure is often characteristic of complex mountain landscapes.

Biogeographic history
Our demographic simulations suggested that the ancestral population size of P. bungeana was modest; it experienced drastic contraction approximately 4.88 (95% CI, 8.54-3.44) Ma and divided into E and W groups, each with a very small population size. This time coincides with the rapid growth of the Qinling Mountains from the late Miocene and early Pliocene (7.28-3.60 Ma) to the Pleistocene, but west Qinling had already been formed by 10.5 Ma (Ge et al., 2012). Our results point to the origin of P. bungeana in western Qinlingthe more complex genetic composition and high F ST in the W group corroborate this hypothesis. The uplift of the Qinling Mountains, particularly the block of high peaks in central Qinling, led to the separation and evolution of two genetic clusters. The E cluster was likely first established by limited migrants from the west, creating a significant bottleneck and reduced effective population size. The very low one-directional gene flow from W to E in this period further supports this hypothesis. Small populations are susceptible to loss of genetic diversity due to genetic drift. Several relict conifers, such as Pinus torreyana Parry ex Carr. (Ledig & Conkle, 1983), Picea martinezii T.F.Patt. (Ledig et al., 2000), and Pinus krempfii , are thought to have experienced extreme genetic bottlenecks, dramatically reducing their genetic diversity. Severe and prolonged bottlenecks are expected to have stronger impact on mt-and cp-genome diversity due to their smaller Ne relative to nuclear genome. This could partly explain the very low cpDNA and mtDNA polymorphism observed in this species.
The E group of P. bungeana experienced rapid growth beginning approximately 50 ka, but the W group continued to contract. The growth of the E group during the late Pleistocene likely resulted from an "out of Qinling" migration that expanded northward along the north-south orientation of the Taihang Mountains. This time point correlates with the Pleistocene global climate fluctuations. Species range expansions during repeated glaciations in the Pleistocene were made possible by the presence of glacial refugia (Hewitt, 2000). Thus, much of the distribution north of Qinling likely represents more recent colonization from east Qinling; their homogeneous genetic composition and close relationship with east Qinling populations support this view. The shared chlorotype B between population JP (W group) and populations AK and LT (E group) could thus be explained by this W to E and north migration.
The continued decline of the W group was likely driven by climate changes and habitat degradation during the late Pleistocene, as most of the remnant populations retreated to tops of the mountains with patchy and confined populations (Zhao et al., 2014). These scattered populations have been subjected to long-term genetic isolation and related genetic drift effects that continuously enhanced among-population genetic differentiation even within small geographic distances in west Qinling. The extant distribution of P. bungeana in Qinling thus appears to be the result of long-term range retraction and local persistence as relict and marginal populations.
Biogeographic studies of northern China flora often identify a significant role of the Qinling Mountains in intraspecific diversification and as refugia during the Quaternary glaciations (e.g., Huang et al., 2016;Xia et al., 2018;Shahzad et al., 2020). The heterogeneous landscapes of Qinling permit population persistence under different microor regional climates through altitudinal shifts or changes in slope orientation. Such landscapes, when combined with relatively stable climate conditions through time, tend to aggregate a large number of pre-LGM relicts and endemic species (Hampe & Jump, 2011). The concentration of rare plant species in the Qinling Mountains make it a wellrecognized biodiversity hotspot, and thus a critical conservation area (Huang et al., 2016).

Vulnerability to climate change
We found a larger contribution of IBE than IBD to the observed population diversity. This implies that climate adaptation, among other neutral stochastic forces, played a role in genetic differentiation. The GF modeling suggested that temperature, precipitation, and UV variables contributed to the genetic differentiation. A similar set of environmental factors has been identified in other conifers from this region that shows strong genotype-environment associations (Xia et al., 2018;Zhao et al., 2020). Spatial mapping of the genotype-environment associations showed clear genetic turnover between the southwest, southeast, and northern regions (Fig. 5A). These regions represent distinct climate zones separated by Qinling, moist subtropical climate to the south and dry temperate climate to the north (Rost, 1994;Xia et al., 2018), which provide support for divergent selection operating in different regions. Heritable phenotypic variations associated with climate, such as cone and seed morphometric traits, are reported among provenances of P. bungeana, providing additional evidence for adaptation to local climate (Li et al., 2002). However, capturing the loci underlaying polygenic adaption requires intensive coverage of the genome and a large sample size, which is beyond the scope of this study.
Based on current genotype-environment relationships, we simulated the genomic offset under future climatic conditions by calculating the genetic distances between current and future theoretically expected compositions (Fitzpatrick & Keller, 2015). Gradient forest identified the southeastern margin of P. bungeana as the most vulnerable and therefore likely to face challenges to match and adapt to the climate change. Under more severe (RCP8.5) climate change conditions, almost the entire southern range became vulnerable. Given the limited migration potential in and across the Qinling Mountains, climate change could be imposing an enhanced threat for the conservation of this species. If both migration and adaptive responses fail (due to very low genetic diversity), and current habitats cease to provide suitable environmental conditions for species persistence, then population extinction becomes the most likely outcome.
We are aware of the limitations in applying genomic offset for conservation as many assumptions in offset modelling are difficult to decide with certainty Rellstab et al., 2021). One challenge is to define adaptive and neutral genetic variation for complex traits like local adaptation. This issue is well illustrated in the study by Hall et al. (2021) where the authors investigate genotype-phenotype (frost hardiness) associations in Scots pine across 10 000 SNPs. The joint marker estimated heritability of the trait variation as high as 0.56, demonstrating the ability of genomic SNPs in capturing the genetic variation in the trait, and yet few significant loci are detected. Another study using exome sequencing recovered 134 000 SNPs in 5000 Norway spruce trees selected for wood quality properties (Chen et al., 2021); however, only 55 loci are significantly associated with the phenotypic variation. These examples support the theoretical expectation of the inheritance of complex traitsfor traits controlled by a large number of loci, each with very small effects, the power to detect individual causal loci is extremely low unless sample sizes and genome coverage are unrealistically large (Gienapp, 2020).
In this study, we did not have a priori SNPs associated with adaptation, nonetheless RDA detected significant contribution of the environment to genomic variation. This implies adaptive variation exists in the genome, thus using the full set of SNPs should to some extent capture the signal of genomic offset of P. bungeana populations to future climate change. Our genomic offset projections, together with the estimation of shifts in climate variables over the distribution range, could nonetheless serve as a risk prognosis of decline for management plans of relict and endangered species. Tests validating the trends of maladaptation using multisite common gardens are a necessary next step.

Conservation considerations
Increasing variability of future climate imposes new challenges for the conservation of endangered tree species such as P. bungeana with extremely low genetic diversity and patchy distribution. Knowledge about how endangered species persist and evolve in restricted geographical areas has become an integral part of developing effective conservation strategies. Populations north of Qinling are better connected with a large effective population size, which can secure more genetic diversity in the long run and thus buffer stochastic events. In contrast, a long history of bottlenecks, genetic drift, and limited gene flow in the west Qinling populations could result in further loss of genetic diversity in the future, which would enhance their risk of extinction due to demographic or environmental stochasticity. Effective population size is critical for saving relict populations from extinction, thus priority should focus on improving long-term population growth in situ (Fady et al., 2016). Given the limited migration potential arising from topographic and landscape constraints, assisted migration by mixing seedlots from the isolated stands in west Qinling would aid regional restoration while preserving the relict genetic resource of the west. For genetic reinforcement and rescue in the southeast, gene flow within the eastern lineage might be sufficient. This strategy could enlarge the effective population size and minimize inbreeding depression in and isolated populations. The risk of maladaptation should not be a big concern if the distance of translocation is modest.
Ex situ common gardens should be set up in suitable habitats to host range-wide collections for studies on adaptive traits variation and translocation effects, and for carrying out controlled crosses between populations to produce new genetic recombination for testing and restoration. Seeds and pollen from ancient trees that are preserved in temples and gardens could be utilized as an additional gene pool in crosses. Crossing experiments between populations are valuable for assessing the putative loss of fitness in the extant populations and the prospect of elevating inbreeding depression in small relict populations. Finally, direct human extirpation through landscape trans-formation remains a major cause of decline in this species. Therefore, forestry authorities must manage and protect the remnant forests through legislation.

Supplementary Material
The following supplementary material is available online for this article at http://onlinelibrary.wiley.com/doi/10.1111/jse. 12821/suppinfo Fig. S1. Distribution of chlorotype diversity. Fig. S2. Reliability of estimated effective migration surface (EEMS) models. Fig. S3. Twelve tested demographic models. Fig. S4. Analyses of population structure. Fig. S5. Goodness of fit of the best-fitting demographic model inferred by fastsimcoal2. Fig. S6. Shifts in climate conditions from current year to 2070. Table S1. Primers used for chloroplast DNA sequencing. Table S2. Environmental parameters used in this study.