Genome-wide accumulations of non-random adaptive point mutations drive westward evolution of Helicobacter pylori.

Retnakumar RJ; Chettri P; Lamtha SC; Sivakumar KC; Dutta P; Sen P; Biswas S; Agarwal N; Nath AN; Devi TB; Thapa N; Tamang JP; Chattopadhyay S

doi:10.1186/s12866-025-03944-2

← 뒤로

Genome-wide accumulations of non-random adaptive point mutations drive westward evolution of Helicobacter pylori.

1/5 보강

BMC microbiology 📖 저널 OA 100% 2022~2026 2025 Vol.25(1) p. 229

Retnakumar RJ, Chettri P, Lamtha SC, Sivakumar KC, Dutta P, Sen P

📖 무료 전문 🟢 PMC 전문 PMC12013172

PubMed ↗ DOI ↗ BibTeX ↓ RIS ↓

📝 환자 설명용 한 줄

[BACKGROUND] For last seven decades we remained convinced that the natural point mutations occur randomly in the genome of an organism.

이 논문을 인용하기

↓ .bib ↓ .ris

APA Retnakumar RJ, Chettri P, et al. (2025). Genome-wide accumulations of non-random adaptive point mutations drive westward evolution of Helicobacter pylori.. BMC microbiology, 25(1), 229. https://doi.org/10.1186/s12866-025-03944-2

MLA Retnakumar RJ, et al.. "Genome-wide accumulations of non-random adaptive point mutations drive westward evolution of Helicobacter pylori.." BMC microbiology, vol. 25, no. 1, 2025, pp. 229.

PMID 40263995 ↗

DOI 10.1186/s12866-025-03944-2

Abstract

[BACKGROUND] For last seven decades we remained convinced that the natural point mutations occur randomly in the genome of an organism. However, our whole genome sequence analyses show that for the gastric pathogen Helicobacter pylori, which causes peptic ulcer and gastric cancer, accumulations of point mutations in the genome are non-random and they contribute to its unidirectional evolution. Based on the oncoprotein CagA, the pathogen can be classified into Eastern (East Asian countries like China and Japan; high incidence of gastric cancer) and Western (Europe, Africa, South-West Asian countries like India; low incidence of gastric cancer) types.

[RESULTS] We have found a unique high-altitude Himalayan region, Sikkim (an Indian state bordering China, Nepal and Bhutan), where the evolving Eastern and Western H. pylori types co-exist and show the signs of genetic admixtures. Here, we present genomic evidence for more virulent Eastern-H. pylori getting converted to less virulent Western-H. pylori by accumulating non-random adaptive point mutations.

[CONCLUSION] The lesser virulence of the westernized H. pylori is beneficial since this pathogen typically remains colonized in the stomach for decades before causing terminal diseases like gastric cancer. Moreover, the mutation-driven westward evolution of H. pylori is a global phenomenon, which occurred in the geographical regions where people from Eastern and Western ethnicities met and cohabited. The identified evolution of virulent Eastern H. pylori strains to lesser virulent Western variants by accumulation of point mutations also provides insight into the pathogenic potentials of different H. pylori strains.

🏷️ 키워드 / MeSH 📖 같은 키워드 OA만

📖 전문 본문 읽기 PMC JATS · ~85 KB · 영문

Background

Background
H. pylori colonises the human stomach for decades before causing severe diseases like peptic ulcer and gastric cancer [1, 2]. During long-term colonization, the bacterium adapts to the changing milieu of the stomach and gradually co-evolves with the respective human [3–8]. The co-evolution and co-migration of H. pylori with humans of various ethnic background, for the past 60,000–100000 years, has led to the emergence of different populations and subpopulations of H. pylori strains with distinct multilocus sequence types (MLST) and core genome patterns [9–13].

Owing to the higher rates of recombination and point mutation than most bacterial species, H. pylori genomes evolve remarkably fast and are known to show panmictic population structures [14–16]. A unique DNA polymerase I and faster mutation rate also allow the bacterium to form quasispecies [17–21]. The rapid evolution of H. pylori genome has been observed in the stomachs of human and experimental animals [4, 22, 23]. As a result several beneficial adaptive mutations are accumulated in the H. pylori genome over the years [24].
The organization of the EPIYA segments in the H. pylori oncoprotein CagA has been distinctly maintained among the strains and consequently the global strains are classified into ‘Eastern’ or of ‘Western’ types [25] (Fig. 1a). The Eastern type CagA carries the D-segment, while the Western type CagA carries the C-segment in their respective C-terminal EPIYA-repeat regions, where the tyrosine (Y) residues in the ‘EPIYA’ motifs get phosphorylated within the gastric epithelial cells, initiating several intracellular signalling cascades that lead to gastric cancer [25–27]. The D-segment carrying ‘Eastern-Specific Sequence’ (ESS) of CagA is typically found only in H. pylori strains isolated from East-Asian countries like Japan and China [28, 29] (Fig. 1a). The H. pylori strains from almost all other parts of the world (e.g. the countries in Europe, Africa, North and South America; also, the countries in West, Mid and South Asia including India) express the CagA with C-segment or ‘Western-Specific Sequence’ (WSS) (Fig. 1a). Upon Tyrosine phosphorylation, the Eastern CagA binds more strongly to cellular SHP2 than the Western CagA and initiates a stronger intracellular response, which accounts for the higher prevalence of gastric cancer in East-Asia [28, 29].
Among the various determinants influencing the clinical outcome of H. pylori infection, the phylogeographic variations in the key virulence genes of the bacteria, like cagA, plays a significant role. It is worth noting that in Asia, the Eastern CagA-prevalent eastern regions and the Western CagA-prevalent Southwestern regions are geographically separated by the Himalayas, which, from ancient times, have created a partial barrier for human and H. pylori co-migration [30, 31] (Fig. 1b). Limited migrations, however, occurred through the mountain passes of the Himalayas [31]. In the extreme Northeastern India, a mountainous state, Sikkim, shares borders with China, Bhutan and Nepal [32]. The H. pylori strains that colonized the Sikkim residents remained overlooked. Historically, the state received settlers from either side of the Himalayan ranges, partly due to the spread of ‘Buddhism’ and the trades through a branch of the ‘Silk Route’ [32–34]. They had limited admixtures but also maintained certain distinctiveness in each ethnic group. The cultural admixture in Sikkim has increased in modern times and the state is now more open to the external world due to improved tourism. Currently, the state is populated with a wide range of people, of which Nepali, Lepcha and Bhutia are the major ethnic groups [35].
It is not known whether the H. pylori strains from Sikkim express Eastern CagA (like the adjacent country, China) or Western CagA (like the adjacent Indian state, West Bengal), or a combination of both. We hypothesized that the unique geographical and cultural overlaps of the East and West that exist in the high altitudes of Himalayas can potentially alter the H. pylori genome through interaction with host populations. Hence, Sikkim H. pylori strains would provide valuable insights into H. pylori genome evolution and would facilitate understanding the survival advantages of the pathogen. By genome-wide comparative study based on the housekeeping genes, core genes and virulence genes we could identify a unique ‘atypical’ clade of Sikkim Eastern strains that does not overlap with any other strains, including the typical Eastern strains. Incorporations of tolerable point mutations in the genome are crucial for progressive evolution, adaptation and survival of a species [36–38]. Recently, non-random point mutations were demonstrated in the non-coding regions of the plant genome [39]. Searching the genetic basis of the existence of unique atypical Sikkim Eastern clade revealed several non-random Western-specific mutations within the coding regions of the different genes in Sikkim Eastern H. pylori genomes, including cagA. This phenomenon was also observed in other geographical regions where the Eastern and Western cultures met and co-existed for a certain period of time. It was also noted that, typically, the ‘Eastern H. pylori strains’ accumulated ‘Western-specific mutations’, and gradually evolved unidirectionally to become ‘Western-like H. pylori strains’, thereby achieving lower virulence and better fitness within the gastric niches. From the pathogen’s perspective, over the years, the genome might have gained and retained mutations that in turn provided an adaptive edge for prolonged colonization within relevant host populations.

Methods

Methods

Ethical approval
The study was approved by the Institutional Human Ethics Committee of the Sir Thutob Namgyal Memorial (STNM) Hospital, Gangtok, Sikkim (04/18-C/STNMH/19) and Rajiv Gandhi Centre for Biotechnology (IHEC/1/2018/12; RGCB/IHEC/250/2023/08).

Study population
Sikkim is a land-locked state in North East India. It shares borders with Nepal, China, Bhutan and remains connected to the other Indian states through only one Indian state, West Bengal [32]. The study population includes patients of both sexes (Male: 65; Female: 65) of mean age 40.6 years (youngest: 19 years, oldest: 82 years) from all four districts (East, West, North and South) of Sikkim. They visited the Department of Gastroenterology, STNM Hospital, Gangtok, Sikkim with various upper gastrointestinal discomforts. The patients did not take any antibiotics for four weeks before endoscopy. Also, the patients with active GI bleeding and the patients who took proton pump inhibitors or H2 receptor antagonists in the past week of endoscopy were excluded from the study. Written informed consents were obtained from all enrolled patients. During endoscopy, the clinical diagnoses (non-ulcer dyspepsia, gastritis, peptic ulcer disease and gastric cancer) were made by a single gastroenterologist (Supplementary Table 2). A rapid urease test was performed to check the H. pylori infection status.

Collection of biopsies
During upper GI endoscopy, the gastric biopsies were collected and stored with Brucella broth containing glycerol in a − 80 °C freezer until they were used for the isolation of H. pylori.

Isolation of H. pylori
Gastric biopsies were inoculated on brain heart infusion (BHI) agar (BD Difco, Sparks, Maryland, US) containing antibiotic cocktail, Dent (Oxoid, Basingstoke, England), 0.4% IsoVitaleX (Difco BBL, Sparks, Maryland, US), 7% calf serum and 2% charcoal. The plates were then incubated under microaerobic conditions (5% oxygen and 10% carbon dioxide) at 37 °C for 5 days. H. pylori appeared as typical fine translucent colonies. Single colonies were picked up and expanded on the plates containing the same medium but without charcoal. The isolated pure H. pylori strains were harvested in BHI broth containing 20% glycerol and were preserved in the − 80 °C freezer for further use.

Isolation of genomic DNA and genotyping
Genomic DNA was isolated from the lawn of H. pylori cultures grown on BHI-agar plates [40]. The vacA alleles vacAs1/vacAs2, vacAm1/vacAm2) and the presence or absence of cagA gene were detected by a multiplex PCR [41] (Supplementary Table 3). The vacAi1/vacAi2 status was detected by a separate PCR as described elsewhere [42] (Supplementary Table 3). The cagA 3’ end was amplified by PCR and sequenced by the Sanger method.

Whole genome sequencing of the H. pylori strains
For 25 Sikkim H. pylori strains, Whole Genome Sequencing (WGS) was performed using Illumina Novaseq 6000 (Supplementary Table 4). Bcl2fastq program was used for base calling the fastq reads. The short reads obtained after base calling were used for bacterial genome assembly using the Unicycler (v0.5.0) pipeline. The k-mer range for SPAdes was selected based on the length of the input reads to maximise the chance of getting an ideal assembly (k-mer range: 27, 47, 63, 77, 89, 99, 107, 115, 121, 127; Median read length: 250). After assembly, a score was given to the assembly graph for each k-mer using the number of contigs. Various cleaning protocols were used to remove overlaps and merge the segments. Paired-end information was used to perform repeat resolution (RR) and produce contigs from the assembly graph. A bridge was created from the path if the graph path contains two or more anchor contigs. The graph was applied in decreasing order of quality and the most supported option is used. The depth of coverage was estimated using SAMtools after alignment of each of the raw reads with H. pylori reference strain 26695.
The Galaxy (https://usegalaxy.org/) platform was used to analyse the whole genomes using Prokka (Galaxy Version 1.14.6 + galaxy1) and Roary (Galaxy Version 3.13.0 + galaxy2). The WGS of the 25 strains from Sikkim and 131 strains from different regions of the world were analysed (Supplementary Table 5). The Roary tool sorts the genes in the pangenome into core genes, soft core genes, shell genes and cloud genes. The ‘core genome’ includes the conserved set of genes that are present in all the H. pylori genomes used for analysis. The genes present in 95 to 99% of the genomes are classified as soft core genes and those in 15 to 94% of the genomes as shell genes. Cloud genes are those that are present in only 0 to 14% of the genomes. As the core genome represents the functional backbone of the genome, it can be used as a reliable marker for studying the evolutionary divergence and genetic stability of different populations within the species. The summary statistics file obtained from Roary analysis shows the number of genes in the four categories, core genes, soft core genes, shell genes, cloud genes and the total number of genes in the pangenome. The core genes were identified from the Gene_Presence_Absence file in.csv format which contains information such as gene names and gene annotations. The sequences of the 747 core genes were extracted from the Prokka annotation output and were used for further analyses. The 747 core genes from 25 Sikkim strains and 131 global strains were further used for performing comparative phylogenetic analyses, SNP distance analyses and similarity index.

Core genome similarity heatmap
A FASTA file containing concatenated and aligned core gene sequences from 156 strains of H. pylori from diverse geographical regions of the world was used for analysis. Python (version 3.12) was utilized to compute the percent identity and generate a similarity matrix derived from the aligned and concatenated core gene sequences. Aligned sequences were retrieved from the FASTA file using the Biopython library (version 1.83). The sequences were parsed and their headers and corresponding aligned sequences were stored separately for further analyses. Percent identity was selected as a Similarity metric, calculated as the percentage of identical residues between two sequences. For each pair of sequences, the percent identity was calculated using a custom function. This function computes the number of identical residues between two sequences and expresses it as a percentage of the sequence length. A similarity matrix was constructed using NumPy (version 1.26.3) to represent pairwise similarities between all sequences. The matrix size was determined based on the number of sequences, with diagonal elements set to 100% (indicating self-similarity). The similarity matrix, along with sequence headers, was exported to a CSV file format. Each cell in the matrix contained the percent identity value between the corresponding sequences. A core genome Heatmap was created using the Heatmap function from the ComplexHeatmap package based on the similarity matrix in R software (version 4.3.2) and RStudio (version 4.3.2).

Chord diagram
Python, along with Matplotlib (version 3.10.0) and NumPy (version 1.26.4) libraries were used for data wrangling and visualization of the circular chord diagram, which was created to visually illustrate the genetic structures and interactions between two groups of strains. The reciprocal of Euclidean distance was used to measure the similarity between a pair of groups of strains. Thicker chords indicate higher similarity between the two groups.

Comparative phylogenetic analysis
The gene sequences were aligned using MUSCLE alignment program in MEGA11 software [43]. The MUSCLE alignment was done using default settings with Gap Open penalty of − 400.00 and a Gap Extend penalty of 0.00. The UPGMA cluster method was used for iteration and the maximum number of iterations given was 16. The minimum diagonal length (lambda) was set at 24. For the cumulative core genes and virulence genes, the maximum likelihood method was used for making phylogenetic trees. A total of 747 core genes from 156 strains were used for the core gene phylogenetic analysis. Each of the gene sequences was aligned using MUSCLE tool, in MEGA 11 and was trimmed to obtain a continuous ORF for all the strains. The individual core genes were concatenated for each of the strains. The concatenated core gene sequence was used for generating the phylogenetic tree. The alignment file was converted to PHYLIP format and was used to generate a maximum likelihood tree by PhyML (version 3.1). The number of bootstrapped data was set to 100. The branch length and the substitution model parameters were optimized and the Generalized Time Reversible (GTR) substitution model was used for the analysis. The output tree was uploaded to iTOL (https://itol.embl.de) for visualization and analysis. Further, the virulence genes in the H. pylori genomes were identified using the Virulence Factor Database. Similar approach was used for the virulence gene pool, which comprises of 54 genes from 111 strains. The cagA tree was also generated using the maximum likelihood method using 25 Sikkim strains and 287 global H. pylori strains.
For the neighbor joining tree the method of phylogenetic analyses the p-distance and the bootstrap value were set at 1000. Transitions and transversions were included in the substitutions, with a uniform rate among sites. For making the MLST phylogenetic tree, the sequences of 7 housekeeping genes (atpA, efp, mutY, trpC, ureI, ppa and yphC) were concatenated for all 342 strains. The vacA phylogenetic tree was generated using the strains with vacAm1 allelic type, which includes 12 Sikkim strains and 168 global H. pylori strains. The ureA and ureB phylogenetic trees were generated with 256 strains and 257 strains, respectively. The recA phylogenetic tree was generated using 315 strains, including 25 Sikkim strains.

Core gene SNP distance graph
An SNP distance matrix was generated for each of the 747 core genes for 156 H. pylori strains using the SNP distance matrix tool (Galaxy Version 0.8.2 + galaxy0) available in the Galaxy server (usegalaxy.org). The values in the individual triangular matrices for each gene were arranged in a single column, which represents the SNP distance score corresponding to a particular gene. The mean SNP distance values corresponding to each gene were plotted to compare the variability between each of the core genes.

Analysis of synonymous and non-synonymous mutations
From the 25 Sikkim H. pylori strains 24 were used for estimating the non-synonymous vs the synonymous change rates for the housekeeping (trpC, ureI, atpA, mutY, ppa, yphC, efp) and virulence (ureA, ureB, vacA, cagA, oipA) genes. The strain SMVE8 was excluded from the analyses since this strain carries a premature stop codon in the cagA gene. For calculating the non-synonymous and the synonymous changes, three different methods were employed.
For the first method, which computes individual non-synonymous (Ka) and synonymous (Ks) values, we used TBtools-II (Toolbox for Biologists, v2.012) [44]. The Ka Ks calculator was utilized with two primary input files. The first file comprised a generated FASTA file, consolidating cagA gene nucleotide sequences from all 24 strains into a comprehensive dataset. Subsequently, a strain combination file was created as the second input, listing the names of all potential pairwise combinations, resulting in a total of 576 unique combinations. The creation of this comprehensive strain combination file was facilitated through a custom R code, ensuring exhaustive coverage of all possible combinations. The software processed this information, systematically calculating the Ka, Ks, and Ka/Ks values for each specified pair outlined in the strain combination file. Upon submission of the two requisite input files into the Ka Ks calculator, a detailed output file was generated, presenting comprehensive information on individual Ka, Ks, and Ka/Ks values for each specific pair. To generate the heatmaps for Ka/Ks, Ks and Ka for the cagA genes of 24 strains raw data comprising the specific mutation rate values for all 576 unique strain pairs were extracted from the custom Python script generated output file. Subsequently, distinct CSV files were generated, each containing the respective mutation rate values for all combinations. These CSV files served as the input data for heatmap generation. Heatmaps were generated using R software (version 3.3.0 +) and RStudio (version: 2023.09.0 + 463), employing a custom code developed for the creation of separate heatmaps. The resulting heatmaps took on a square format, constituting a 24 by 24 matrix that comprehensively represented all unique strain combinations in both row-wise and column-wise orientations. Each data point within the heatmap corresponded to one of the 576 unique substitution mutation rate values, providing a visual representation of the genetic variations across the strains.
The second method involved the usage of a custom Python script to calculate the changes in the first two bases in the codon (dCp1,p2) and the changes in the third base of the codon (dCp3) and their ratios (dCp1,p2/dCp3). The script imports necessary libraries including csv for CSV file manipulation and SeqIO from the Biopython library (version 1.83) for parsing sequence files. The calculate_dCp1,p2_dCp3 function reads the input FASTA file, storing sequences in a dictionary. It then calculates the total numbers of first two bases of the codons and third base of the codons based on the length of the first sequence. Utilizing nested loops, it compares codons between pairs of strains, counting the changes that occurred in the first two bases of the codons and the last bases of the codons. Subsequently, it computes the substitution rates dCp1,p2, dCp3, and dCp1,p2/dCp3 ratio for each strain pair, storing the results in a dictionary.
The third method involved calculating the ‘rate of per codon non-synonymous’ (dpcN) and the ‘rate of per codon synonymous’ (dpcS) changes as well as their ratio (dpcN/dpcS) for a gene. To compute the dpcN, dpcS and dpcN/dpcS custom Python scripts were developed. The scripts were written in Python (version 3.12.1) and executed within the Integrated Development and Learning Environment (IDLE) (version 3.12.1), ensuring a precise and accurate assessment of genetic variations within the genes. The Python script computed the dpcN/dpcS, utilizing the given FASTA file containing the DNA sequences. It first imported necessary libraries, including csv for handling CSV files and SeqIO from the Biopython library for parsing sequences. The code defined a standard genetic code dictionary mapping codons to amino acids. The calculate_ dpcN _dpcS function reads the input FASTA file, storing sequences in a dictionary. It then iterated over each pair of strains and calculated the actual numbers of non-synonymous and synonymous changes occurring in a gene. Within this loop, it further iterated over the codons, comparing corresponding codons between the pair of strains. If codons differed, it checked whether the substitutions resulted in non-synonymous or synonymous changes based on the genetic code dictionary. The function calculated dpcN, dpcS, and dpcN/dpcS values (which depend on the number of codons in the sequences) for each strain pair and stored the results in a dictionary. For each method, the obtained significances were corrected by the Holm-Sidak test. The codon usage for the cagA gene was calculated by using a previously described method [45].

Identification of mutations
The whole length CagA of equal number of Sikkim Western strains (SM21, SM5, SM51, SM8, SM81, SM88, SMVS8, SMVW1 and SMVW19) and Eastern strains (SM24, SM75, SM77, SMVE1, SMVE3, SMVE4, SMVW4, SMVN2 and SMVS3) was used for the analysis. The mutations were determined for the Sikkim strains with respect to the corresponding typical strains, USU101 and F32 for Western and Eastern strains, respectively. The nucleotide sequences were arranged in triplets and were translated to amino acids (https://web.expasy.org/translate/). The numbers of synonymous and non-synonymous mutations were then determined from the data.
To identify the specific mutations in CagA between the Western and Eastern strains from Sikkim, the strains were compared with corresponding regions of typical Western and typical Eastern strains. The frequencies of the Western-specific and Eastern-specific point mutations were determined and represented as Logo images (https://weblogo.berkeley.edu/logo.cgi). The statistical significance of the frequency of the motifs was analysed by Chi-square test and p-values were determined. A similar analysis was carried out with CagA of strains from other regions like Indonesia, Myanmar, Okinawa (Japan), Alaska and South America. Further, the non-synonymous and synonymous mutations in the specific regions of recA, ureB, ureA, atpA, mutY and ppa genes were also determined.

Molecular dynamics simulation studies of CagA
Molecular dynamics (MD) simulations were performed to investigate the interactions of CagA variants complexed with a membrane using the Desmond module of Schrödinger (Desmond Molecular Dynamics System, version 3.1) [46]. Each CagA protein was optimally positioned and oriented within an orthorhombic box of explicit water using the CHARMM-GUI Membrane Builder module [47]. The box was solvated with a lipid bilayer made up of a combination of dipalmitoylphosphatidylcholine, phosphatidylserine, and cholesterol and surrounded by explicit water on either side. Overlapping water molecules were deleted, and the systems were neutralized with Na+ ions. The OPLS- 2005 force field and midpoint method were used in the MD simulation to update the positions and velocities of atoms in large biological systems [48, 49]. Periodic boundary conditions were applied, the PME method for electrostatics, a 10 Å cutoff for Lennard–Jones interactions, and SHAKE algorithm were used. A relaxation simulation was performed using an NPT ensemble at a temperature of 300 K and maintained using the Nose–Hoover thermostat, and pressure was set to 1 bar and maintained through the Martyna–Tobias Klein pressure bath. The relaxed system was then simulated for a total of 200 ns with a time step of 1.2 ps. Trajectories were recorded every 10 ps. RMSD was used to evaluate the stability of the system using Schrödinger (2019).

SNP distance estimation for the core genes and MLST genes
Relatedness and genetic distance between the H. pylori strains with respect to the core genes and MLST genes were determined using Tseemann/snp-dists, a computational tool available in GitHub. This involves the identification of single nucleotide polymorphisms (SNPs) among core and MLST genes from the concatenated sequences for the various strains, resulting in a high-dimensional SNP distance matrix.

t-SNE plot for MLST genes and core genes
The t-distributed Stochastic Neighbor Embedding (t-SNE) plots were generated using the SNP distance matrix created for the MLST genes and the core genes. The t-SNE plots were generated using a specific algorithm made by a custom Python script. The script is based on the scikit-learn library for t-SNE plot implementation. The algorithm transformed the high-dimensional SNP distance data into a lower-dimensional (3-dimensional) space. The analyses generated comprehensive 3D visual plots that provide intuitive representation of the genetic relationships and evolutionary patterns observed among the global H. pylori strains. The Euclidean distance between the centroids of the different clusters in the t-SNE 3D plots were determined.

Results

Results

Existence of both Western and Eastern H. pylori strains in Sikkim
The C-terminal repeat region sequence of the CagA protein is a clear indicator for the Western and Eastern H. pylori strains. Due to the unique geographical location of Sikkim, which connects the regions having either Eastern or Western CagA, we expected that some of the H. pylori strains isolated from Sikkim would express Western CagA, while some would express Eastern CagA (Fig. 1a and b). When the PCR-amplified 3’-end of the cagA gene from 25 Sikkim H. pylori strains were sequenced by Sanger method and the DNA sequences were converted to protein sequences, we noticed that 16 of them are Western (carrying TIDDL motif; AB-C: 56% and AB-C–C: 8%) and 9 of them are Eastern (carrying TIDFD motif, AB-D: 24% and ABB-D: 12%) (Fig. 1c-g). Since we found both Western and Eastern CagA expressing strains in the same geographical region, we thought that the whole genome sequence (WGS) analyses of these strains might be useful in appreciating the admixture between the Western and Eastern H. pylori and their subsequent evolution.

Typical and atypical Eastern H. pylori strains in Sikkim
We performed WGS using Illumina NovaSeq 6000 for all 25 H. pylori strains and these strains were analysed along with 131 H. pylori WGS from different parts of the world using the outputs generated by Prokka and Roary (Supplementary Fig. 1). In the heatmap, the Western and Eastern CagA expressing strains are shown in blue and red dots, respectively (Fig. 2a). The core genomes of some of the Western CagA expressing Sikkim strains (SMVE8, SM81, SM51, SM21, SM88), are clustered with strains from India (Manipal and West Bengal) and Nepal. A few other Western strains (SMVS8, SMVW19, SMVW12, SMVW18) were also close to the strains from India (Manipal, West Bengal and Ladakh). Another Sikkim Western-strain, SM87, remained proximal to the strains from Germany and India (Manipal). The Sikkim Western-strains (SM8, SMVN1, SMVS1 and SM5), showed core genome similarity with strains from India (Manipal) as well as strains from Lithuania, Nepal, and Australia. SMVW1 and SMVW10, two Sikkim Western strains, remained close to the strains from Spain, Germany and Democratic Republic of the Congo. Sikkim Eastern-strains, SM24 and SMVE1, remained proximal to strains from East Asia (Japan and China) and South America (Peru). Surprisingly, the core genomes of 7 other Sikkim Eastern CagA expressing strains (SM75, SM77, SMVE3, SMVE4, SMVS3, SMVN2, and SMVW4) exhibited similarities to the Western-strains from Sikkim and other geographical regions like India (Manipal, West Bengal and Ladakh), Nepal and Russia suggesting both geographic clustering and admixture. These 7 strains are named atypical Eastern, while the Sikkim Eastern strains, which are closely related to the strains from China and Japan, are now named typical Eastern. Then we grouped the core genomes of different strains on the basis of geographical origins and created a circular chord diagram to measure similarities among the groups based on the reciprocals of the Euclidean distances. The thicknesses of the chords indicate the strengths of similarities. The typical Eastern strains of Sikkim showed a strong influence from the East Asia (Fig. 2b). The Sikkim atypical Eastern-strains, on the other hand, were influenced by Eastern-strains from Sikkim and East Asia as well as the Western-strains from India, Nepal and Europe. In the Maximum Likelihood phylogenetic tree of the core genome, the Sikkim typical Eastern-strains are clustered with the East Asian strains, while Sikkim atypical Eastern-strains are clustered separately and proximally to the Sikkim Western-strains (Fig. 2c). Similar findings were observed in the tree made from the concatenated nucleotides of all H. pylori virulence genes (H. pylori-virulome) (Fig. 2d). These findings, along with the findings obtained from the phylogenetic analysis of MLST, recA, ureA, ureB, cagA and vacA genes, confirmed that the Sikkim Eastern-strains are of two types, viz. typical Eastern and atypical Eastern (Supplementary Fig. 2 and 3). Although the Sikkim typical Eastern, atypical Eastern and Western strains are clustered separately for each gene, the adjacent strains vary widely among the genes showing the plasticity of the H. pylori genomes (Supplementary Table 1). This prompted us to look into the mutations in each gene carefully.

Genome-wide inequality in point mutations
The single nucleotide polymorphism (SNP) distance scores based on each of the 747 core genes were compared among all 156 strains (including 25 from Sikkim) and they showed remarkable variabilities for each H. pylori core gene (Fig. 2e). The genes involved in fumarate respiration, (frdC), ribosomal proteins (rpsS, rpsM, rplW, rpsJ and rpsR), associated with translation/tRNA (infA and miaB) and molybdenum ABC transporter (modA) had relatively low mean SNP values. Genes associated with SAM metabolism (cmoA), pyruvate metabolism (porB), stress-associated genes (nfuA), urease maturation factor (hypB), ABC transporter (glnH) and exodeoxyribonuclease (exoA), showed moderate SNP values. Genes that showed higher SNP values include, L-glutamate gamma-semialdehyde dehydrogenase (rocA), bifunctional DNA-directed RNA polymerase subunit beta-beta'(rpoBC), protein translocase subunit (secD), carbamoyltransferase (hypF), DNA polymerase I (polA), tRNA ligase (valS, pheT and alaS) and cation efflux system protein (czcA). Wide variations in the mean SNP-distance scores of the core genes indicate different mutation rates for different genes across the genome.

Distinct rates of non-synonymous and synonymous changes for Western-H. pylori and Eastern-H. pylori
Seven housekeeping genes (trpC, ureI, atpA, mutY, ppa, yphC, efp) and 5 virulence genes (ureA, ureB, vacA, cagA, oipA) were employed to investigate the evolutionary dynamics patterns by examining non-synonymous (Ka) and synonymous (Ks) substitutions. SMVE8, a Western strain that carries a premature stop codon in cagA, was excluded from the analyses (Supplementary Fig. 4a). Another Western strain, SMVN1, with a point mutation in cagA at the second codon leading to a frameshift without any premature stop codon was included in the analyses since due to a second mutation it regained the CagA coding sequence (Supplementary Fig. 4b). The gene sequences of 24 Sikkim strains were used for pairwise comparisons, which gave 576 unique combinations for each gene. ureA and ureB code for the structural subunits of urease enzyme, which is indispensable for acid-adaptation and gastric colonization and hence are highly conserved across H. pylori strains [50]. As expected, the Ka/Ks ratios for most of the virulence genes, except for ureA and ureB, are higher than the Ka/Ks ratios for the housekeeping genes (Fig. 3a). The Ka/Ks ratios are highest for the major two virulence genes, vacA and cagA (Fig. 3a). The higher values of both Ka and Ks for the vacA and cagA genes further confirm the finding (Supplementary Fig. 5a-d). Likewise, the ratios of the nucleotide changes at the first two bases of the codons (dCp1,p2) and the third bases (dCp3) are higher for the vacA and cagA as compared to the other genes (Fig. 3b, Supplementary Fig. 5e-h). The ratio of the rate of per codon non-synonymous (dpcN) changes vs the rate of per codon synonymous (dpcS) changes is highest for the cagA (Fig. 3d; Supplementary Fig. 5i-l). Notably, for cagA, the individual points representing the Ka between two strains and the Ks between two strains are segregated in two distinct groups (either high or low values) in the box plot (Supplementary Fig. 5a-d). Since this indicated that two distinct cagA populations might have two distinct rates of acquiring non-synonymous and synonymous mutations, we analysed the Ka/Ks, Ka and Ks by heatmaps (Fig. 3d-f). All three heatmaps clearly indicated that the Eastern-cagA and Western-cagA are significantly different in their rates of incorporating both non-synonymous and synonymous mutations (Fig. 3d-f). Although the overall codon usage patterns for the Western-cagA and Eastern-cagA are not remarkably different, for the Western-cagA, the cysteine is encoded mostly by TGC and rarely by TGT (Supplementary Fig. 6a). Interestingly, the usage of cysteine is completely absent (neither by TGC nor by TGT) for Eastern-cagA of Sikkim and several countries in East Asia (Supplementary Fig. 6b-f). Thus, the cagA of Western and Eastern strains are different in their codon usage and rate of acquiring point mutations.

Sikkim Eastern-cagA acquires non-random Western-specific non-synonymous point mutations
We noticed that the sequence of amino acids between the second and third EPIYA motifs vary between Western CagA and Eastern CagA. For the CagA of 26695 (a standard Western strain) and 9 more Western strains it is VNAKIDRLNQIAS, while for the CagA of TN2 (a standard Eastern-strain) and 9 more Eastern strains, it is VSAKIDQLNEATS (Fig. 4a and b). Noticeably, for this region, Western CagA vs Eastern CagA differ in Asparagine (N) vs Serine (S) at position 926; Arginine (R) vs Glutamine (Q) at position 931; Isoleucine (I) vs Alanine (A) at position 935; Alanine (A) vs Threonine (T) at position 936. In the same amino acid positions, some of the Sikkim CagA proteins show very distinct substitutions since the Eastern-specific amino acids (S, Q, A, T) were substituted only by Western-specific amino acids (N, R, I, A) and not by any random amino acids (Fig. 4b). For each site, the probability of an Eastern-specific amino acid getting substituted by a Western-specific amino acid is significantly higher than the probability of a Western-specific amino acid getting substituted by an Eastern-specific amino acid (Supplementary Fig. 7a-d).
The exact nucleotide substitutions (e.g. A to G and C to T) in the Eastern cagA genes leading to Western-specific non-synonymous changes have been identified (Supplementary Fig. 7e). A maximum likelihood phylogenetic tree was generated using the corresponding nucleotide sequence (39 bp) (Fig. 4c). The Sikkim Western strains, SM8, SM21, SM88, SM51 and SMVW1 share a cluster with the typical Western strains. Sikkim Western strains such as SM5, SM81, SMVW18 and SMVW19, formed a clade very close to the other Western strains. The Sikkim Eastern strains, however, formed two distinct clusters. The two typical Sikkim Eastern strains, SM24 and SMVE1, are clustered with the typical East Asian strains. However, the seven other Sikkim Eastern strains (SM75, SM77, SMVE3, SMVE4, SMVN2, SMVW4 and SMVS3), which are atypical, formed a unique clade between the typical Western and typical Eastern clusters (Fig. 4c). Then we wanted to know whether these 7 Eastern cagA are indeed of Eastern origin and going through the westward evolution by acquiring Western-specific mutations, or are they of Western origin, but going through the eastward evolution by acquiring Eastern-specific mutations.
One of these two possibilities was eliminated based on the following observations. First, all of these 7 CagA carry the TIDFD motif of either AB-D or ABB-D types, which are present only in Eastern CagA (Fig. 1f and g). Second, the high dpcN-dpcS and dpcN values, and low dpcS value for the Sikkim Eastern cagA (including these 7 cagA) clearly indicate that the rate of acquiring non-synonymous mutations in Sikkim Eastern-cagA is higher than the Sikkim Western cagA (Fig. 4d-f). Third, the total number of non-synonymous (tNc) vs the total number of synonymous (tSc) changes in the Sikkim Eastern CagA is significantly higher than the Sikkim Western CagA (Fig. 4g). And fourth, the numbers of Western-specific mutations in the C-terminal regions of the Sikkim Eastern CagA proteins are significantly higher than the numbers of Eastern-specific mutations in the C-terminal regions of the Sikkim Western CagA proteins (Fig. 4h and i). Based on the above observations, and from the data obtained from the analyses of other CagA regions representing both C- and N-termini (Supplementary Fig. 8–10), we conclude that the 7 atypical cagA that formed a unique clade (Fig. 4c) are of Eastern origin and have accumulated non-random, Western-specific, non-synonymous point mutations in specific positions in the course of the westward evolution.

Westernization of the Eastern-CagA brings adaptive advantages
Molecular dynamics (MD) simulations were employed to investigate the interactions of Eastern-CagA and Western-CagA with the inner surface of the host cell membrane (Supplementary Fig. 11a-d). Root-mean-square deviation (RMSD) values were calculated to assess the structural stability of the CagA variants with the membrane (Supplementary Fig. 11e-i). Sikkim-typical Eastern CagA (SMVE1) has a significantly lower RMSD value (1.1 Å) than the Sikkim-typical Western CagA (3 Å for SM21 and 2.34 Å for SMVW19) indicating that the Eastern CagA forms a more stable complex with membrane and it has a lower degree of movement of the atoms than the Western CagA. Interestingly, the SM75 CagA, which we know is of Eastern origin, but became westernized with several key Western-specific non-synonymous point mutations, has intermediate RMSD values (1.4 Å) suggesting it has less stable interaction with membrane than the typical Eastern CagA.
The potential energy values were computed by accounting for the Lennard–Jones and electrostatic interactions between the CagA atoms and their surrounding membrane environment (Fig. 5a). The extracted potential energy values of each CagA variant were then compared to determine the relative energy of each variant and how their structures affect the forces and torques on the surrounding membrane. Sikkim typical Eastern CagA (SMVE1) exhibited stable lower energy (− 318.60 ± 1.90 kcal/mol) while, the Sikkim Western CagA proteins (SMVW19 and SM21) showed much higher energies (− 205 ± 2.37 kcal/mol and − 202 ± 1.54 kcal/mol) upon binding to the membrane, indicating a more favorable interaction with the surrounding membrane environment for the Eastern CagA than the Western CagA. This is not unexpected since the intracellular action of CagA depends on its ability to interact with the host inner cell membrane and the Eastern CagA is known for its stronger biological response than the Western CagA. We also found that the Sikkim atypical Eastern CagA (SM75), which carries Western-specific mutations, displayed an intermediate level of energy (− 274.36 ± 2.19 kcal/mol) upon membrane binding suggesting that the altered protein has less stable binding with the membrane and less potential to alter intracellular pathways than the typical Eastern CagA (Fig. 5a). Similarly, the westward evolving CagA of SM75 also has intermediate numbers of hydrogen bonds (Fig. 5b) and van der Waals interactions (Fig. 5c) as compared to the typical Eastern (stronger) and typical Western (weaker) CagA proteins. The atypical Eastern CagA, like the Western CagA proteins, also has more fluctuations in the hydrogen bonds and van der Waals interactions than the typical Eastern CagA. Taken together, our data suggest that when an Eastern CagA accumulates key non-random Western-specific mutations at specific sites and gets westernized, it becomes less virulent than its original Eastern version.

H. pylori CagA-Westernization is a global phenomenon
Populations in regions like Indonesia, Myanmar, Okinawa island, Alaska and Peru, like Sikkim, consist of native ethnic groups that have undergone admixtures between the Eastern and Western populations in the recent past due to trade, war and migration. Hence these populations were selected for searching similar mutations due to the possible existence of multiple distinct gene pools. The CagA of H. pylori strains from South East Asian countries like Indonesia and Myanmar; strains from Alaska at the North Western extremity of North America; strains from Peru of South America, and strains from Okinawa, Japan were analyzed. Like Sikkim, for the Eastern CagA of South East Asian countries like Indonesia and Myanmar, the N gets substituted by S at the 926 position; R gets substituted by Q at the 931 position; I gets substituted by A at the 935 position; and A gets substituted by T at the 936 position. Interestingly, the Western CagA of several regions show Eastern signatures suggesting that they are originally of Eastern origin, but have evolved to become Western CagA. For example, the Western CagA of Alaska and Peru carry both Eastern and Western-specific motifs at the same positions (926, 931, 935 and 957) (Fig. 5d). Additionally, the Peru Western-CagA proteins carry Eastern-specific motifs at positions 1006 and 1009. The Okinawa Western CagA proteins also show the presence of Eastern-specific motifs at position 957. This indicates that the non-random variations in CagA, exhibited by the Sikkim strains are also present in several other specific geographical regions where admixtures between the Eastern and Western populations have occurred. The data indicates that the Eastern H. pylori strains, with its natural tendency to acquire more non-synonymous mutations, show increased ability to modify a key virulence protein for better adaptation in the available host population.

The entire Eastern-H. pylori genome accumulates non-random Western-specific point mutations
We wanted to test whether the accumulations of Western-specific point mutations are restricted to the cagA gene, or similar non-random point mutations are also incorporated in other H. pylori genes. Our results show that like Eastern cagA, Eastern vacA (encoding vacuolating cytotoxin, a virulence effector protein) and Eastern recA (encoding recombinase, required for recombination and DNA repair) also have Western-specific substitutions (Fig. 6a and b, Supplementary Fig. 12–13). The Eastern ureB (encoding UreB subunit of urease, an enzyme necessary for the survival of the bacterium), although did not show any non-synonymous changes, showed Western-specific synonymous changes (Fig. 6c, Supplementary Fig. 14). Likewise, several housekeeping genes and virulence genes of the Sikkim Eastern strains also showed incorporations of Western-specific synonymous mutations suggesting that the accumulations of Western-specific mutations are not just non-synonymous mutations (Fig. 6d-k). For housekeeping gene atpA, at 192nd position, the Sikkim Eastern strains possessed Western-specific adenine in addition to guanine, which is typically seen in Eastern strains (Fig. 6d). Similar mutation was seen in mutY at 378 th position (Fig. 6e). The Sikkim Eastern strains have acquired Western-specific thymine in addition to cytosine in ppa at 16 th position (Fig. 6f). In recA, the Western-specific cytosine is present at 697 th and 699 th position (Fig. 6g). Western-specific synonymous mutations were also identified in virulence genes like cagA, vacA, ureA and ureB. Sikkim Eastern cagA has Western-specific cytosine at 637 th and 639 th position and vacA has Western-specific adenine at 495 th position (Fig. 6h and i). Similar mutations were found in ureA (Western-specific adenine at 99 th position) and ureB (Western-specific thymine and adenine at 420 th and 423rd positions, respectively) (Fig. 6j and k). This suggests that the Western-specific synonymous and non-synonymous mutations are common for many genes of the Eastern H. pylori strains, but occurs at a higher frequency in cagA. CagA is a key virulence factor that interacts with multiple host proteins and is a key determinant of H. pylori disease outcome. The higher proportion of Western-specific non-synonymous mutations in Eastern cagA strongly suggests adaptive transformation to a less virulent phenotype, which is identified as a global phenomenon (Fig. 5d).
To obtain a broader perspective on the acquisition of point mutations by the different H. pylori strains, we took another approach. We made an SNP-distance matrix using the concatenated 7 housekeeping MLST gene sequences (trpC, ureI, atpA, mutY, ppa, yphC, efp) of 342 H. pylori strains from different parts of the world. This matrix was used to generate a t-SNE plot (Supplementary Fig. 15a). The Western strains from Sikkim were mostly associated with hpEurope and hpAsia2. The Sikkim-typical Eastern strains, SMVE1 and SM24 were clustered along with the hpEAsia strains. However, the Sikkim-atypical Eastern strains, SM75, SM77, SMVN2, SMVE3, SMVS3, SMVE4 and SMVW4, which carry Western-specific mutations in cagA, vacA, recA and ureB, formed a significantly distinct cluster. This cluster is distant from the hpEAsia cluster and closer to the Sikkim Western, hpAsia2 and hpEurope clusters. The calculated Euclidean distances between the H. pylori populations further confirmed the finding (Supplementary Fig. 15b). Finally, we tested if these 7 atypical Eastern-strains also carry Western-specific mutations throughout their genomes. For this we generated a similar t-SNE plot based on the SNP-distance matrix of 747 concatenated core genes (distributed throughout the genome) of 156 H. pylori strains (Fig. 7a). The Sikkim-typical Eastern strains, SM24 and SMVE1, were seen distinctly in the plot close to strains from East Asia and Latin America. Consistent to the previous finding, the seven Sikkim-atypical Eastern strains (SM75, SM77, SMVN2, SMVE3, SMVS3, SMVE4 and SMVW4), formed a distinct cluster (Sikkim Eastern-Cluster II) also in core genome t-SNE plot (between the Sikkim Eastern-Cluster I and the Sikkim Western Cluster). The Euclidean distance between the Sikkim Eastern-Cluster I and Sikkim Eastern-Cluster II is 7.4, while the distance between the Sikkim Eastern-Cluster II and Sikkim Western-Cluster is 9.3 (Fig. 7b-c, Supplementary Fig. 16). It is worth mentioning that the core genomes of the Okinawa strains are also closer to the Western strains (Euclidean distance: 9.12) than East Asian strains (Euclidean distance: 10.8) (Supplementary Fig. 16). All these findings showed that some of the Sikkim Eastern strains, which express the atypical CagA, have accumulated genome-wide non-random, Western-specific adaptive point mutations.

Discussion

Discussion
Survival of a species in different niches must depend on its adaptations. For a species, acquiring mutations that contribute to gain better fitness should be one of the key mechanisms for adaptations [36–38]. However, finding the evidence of natural point mutations that accumulate in microbial genomes leading to evolutionary shift is challenging [38]. We have been able to isolate H. pylori strains from a high-altitude geographical location, Sikkim, where different forms of evolving H. pylori genomes presently co-exist.
Sikkim is a multilingual and multi-ethnic Indian state, which is heavily influenced by the cultures (including diet) from both East Asia and South Asia [32, 34, 35]. Since H. pylori strains remain colonised in the stomachs of infected individuals for decades and get transmitted to other individuals only by close contacts, finding both typical Eastern and typical Western H. pylori types in Sikkim was not too surprising. However, we also noticed that only 36% of the Sikkim H. pylori strains express the Eastern-CagA, while 64% of them express the Western-CagA. Among the minority of the Sikkim H. pylori strains that express Eastern-CagA, a significant fraction (77.8%) is unique atypical Eastern. These atypical Eastern CagA proteins carry many synonymous and non-synonymous, non-random, Western-specific point mutations. For the Sikkim atypical Eastern-CagA, the key Eastern-specific amino acids get replaced only by the key Western-specific amino acids and not at random by any other amino acids. These unique mutation-driven adaptations allow the evolving H. pylori strains to express, initially an atypical Eastern-CagA, and eventually a more evolved Western-CagA to achieve lesser virulence by minimizing the interactions with host gastric epithelial membrane. Importantly, although the Western-specific mutation accumulation is highest in cagA, it is not restricted only to the cagA. Rather, it occurs throughout the Eastern H. pylori genomes involving both virulence and housekeeping genes. Our analyses also revealed similar Westernization of the Eastern-H. pylori strains in other geographical regions. For example, in Myanmar and Indonesia, the H. pylori strains are still being evolved from the Eastern-CagA to Western-CagA expressing strains, while in Peru and Okinawa, the original Eastern-CagA expressing strains are now mostly evolved to become the Western-CagA expressing strains.
The small genome size [51], high mutation rate [14], intermediate generation time [52], long-term colonization [2, 53] in a particular host, lack of any non-human host [54] and transmission only through close human contacts (like within the family) [55], allow H. pylori to provide valuable information also on the humans that carry the infection [56]. Indeed, several articles in the last two decades suggested co-evolution and co-migration of H. pylori with humans from Africa to the rest of the world [12, 13, 57, 58]. Our study, which showed westernization of Eastern H. pylori, sheds new light on the original “out of Africa” theory of humans and H. pylori co-migration [57, 59]. Existence of two distinct H. pylori types viz. Western (Africa, Europe, South and West Asia) and Eastern (East Asia) clearly indicate that two different humans must have existed independently since for H. pylori, humans are the only host and H. pylori being a microaerophilic bacterium does not exist in the environment. It is also very unlikely that the ancient Western humans carried Western H. pylori to East Asia and those strains gradually evolved as Eastern H. pylori since our data clearly showed that it is the Eastern H. pylori, which evolved to become Western H. pylori. Therefore, it is likely that when the two distinct human populations carrying two distinct H. pylori types met and coexisted in certain geographical regions like Sikkim, due to the close human contacts, the Western and Eastern H. pylori strains had plenty of opportunities for admixtures. These genetic admixtures always result in a unidirectional westward evolution of Eastern H. pylori by the genome-wide accumulation of non-random Western-specific point mutations for achieving lesser virulence and better fitness, as observed with the unique atypical Eastern strains from Sikkim. Over the years it was noted that disparity exists between the incidence of gastric cancer and the estimated prevalence of H. pylori infection (around 50% of the global population), designated as ‘the enigma’, which is identified in population from Africa, and Asian countries like India, Thailand, Bangladesh and Pakistan [60]. The unidirectional evolution of the Eastern strains into less virulent Western strains and the existence of the transition forms partially explain ‘the enigma’ associated with H. pylori disease outcomes. The identified mutations in the Sikkim atypical Eastern strains can be used as potential biomarkers to further extend the conventional H. pylori classification and further sub-type the Eastern strains into typical ones and its less virulent ‘westernized’ variant.
Here we used the various possibilities of H. pylori whole genome analysis that revealed accumulation of Western-specific, non-random point mutations in the Eastern strains may provide survival advantage for the bacteria. However, it is known that the clinical outcome of the H. pylori infection varies across different populations and is influenced by a spectrum of factors including, H. pylori genotype, host immune response and genetics, diet, medication, life style etc. [61]. It is important to note that due to the limited sample size from Sikkim, correlation between the identified mutations and the clinical presentation of the patients could not be drawn, which is a limitation of the study. Even though, ‘westernization’ of Eastern H. pylori strains was convincingly shown in multiple populations that underwent significant genetic admixture in history, future studies involving a broader, global population will add further insights regarding this phenomenon. Further studies with larger sample size and in vivo infection models, which are beyond the scope of the present study, would be helpful to substantiate the current findings.

Supplementary Information

출처: PubMed Central (JATS). 라이선스는 원 publisher 정책을 따릅니다 — 인용 시 원문을 표기해 주세요.

🏷️ 같은 키워드 · 무료전문 — 이 논문 MeSH/keyword 기반

Disentangling Links Between Lung Cancer and Infectious Pneumonia via Real-World Data and Integrative Genomics.
Human mutation 2026 Diao YF 외 📖 OA
Saturation Genome Editing Targeting Mutations in HCT 116 Colon Carcinoma Cells for Pooled SNV Functional Profiling in Diploid Cancer Model.
Current issues in molecular biology 2026 Hamada T 외 📖 unpaywall
A Swedish Haplotype GWAS in Familial and Sporadic Site-Specific Colorectal Cancer.
International journal of molecular sciences 2026 Vermani L 외 📖 OA
Journey Towards Piloting Helicobacter pylori Screen-and-Treat to Address Health Inequities in Aotearoa New Zealand.
Helicobacter 2026 Teng A 외 📖 unpaywall
The Role of CRISPR and Its Therapeutic Applications in Glioblastoma.
International journal of molecular sciences 2026 Fayed S 외 📖 OA
The Role of Nuclear APOBEC Enzymes in Neoplastic Progression of Ulcerative Colitis.
Clinical and experimental gastroenterology 2026 Alsøe L 외 📖 OA