
Chromosome structural variation analysis reveals lung cancer-associated gene regulatory networks in rheumatoid arthritis patients.

BMC Medical Genomics, 2025, Vol. 19(1), p. 23

Li H, Ding L, Liao R, Li N, Hong X, Jiang Z


Cite this article

Li H, Ding L, et al. (2025). Chromosome structural variation analysis reveals lung cancer-associated gene regulatory networks in rheumatoid arthritis patients. BMC Medical Genomics, 19(1), 23. https://doi.org/10.1186/s12920-025-02273-7
PMID 41437054

Abstract

[BACKGROUND] Chromosomal structural variations (CSVs) that comprise multiple gene mutations are important determinants for multiple diseases. However, the relationship between CSVs, rheumatoid arthritis (RA), and lung cancer is not well understood.

[MATERIALS AND METHODS] In this study, we analyzed CSV associations and differences between RA and RA with lung cancer (RA LC) using genome sequencing, with RA-associated interstitial lung disease (RA ILD) as a disease control. First, we analyzed the CSVs of each individual. Then, we identified common CSVs within each disease group and finally analyzed specific CSVs between different diseases. Gene Ontology/KEGG terms, canonical pathways, and feature gene sets were used for the functional annotation and analysis of CSV-related pathways.

[RESULTS] Cell size regulation and axon guidance were mutated in all disease groups. Protein deubiquitination was mutated in RA LC, while the negative regulation of extractable stroma and protein catabolism was mutated in RA ILD. Characterization of clinical data also revealed correlations with these specific pathways.

[CONCLUSION] This study identifies common and specific CSVs and associated pathways for RA, LC, and ILD, uncovering key genetic factors that provide new insights into their diagnosis and treatment.


Introduction

Compared to the general population, patients treated with biologic therapies have an increased risk of developing lung cancer, with a standardized incidence ratio (SIR) of 1.35 [1.14; 1.60] [1]. Connective tissue diseases contribute to a poorer prognosis in lung cancer patients, particularly in those with interstitial lung disease (ILD) and Qi deficiency, both of which are associated with reduced survival in some cases [2]. Rheumatoid arthritis (RA) is a systemic inflammatory autoimmune disease with a global incidence of approximately 0.5–1.0% [3–5]. Its main features are persistent synovitis, systemic inflammation, and the presence of autoantibodies, which ultimately lead to joint injury, disability, and complications in multiple organ systems [6, 7]. Smoking prevalence in patients with RA exceeds that reported in the general population. In addition, RA patients who smoke are more likely to develop lung cancer than smokers without RA [8]. In 2016, Curtis et al. found that among 5671 patients treated with tofacitinib, the most common malignancy was lung cancer (n = 24, 0.42%) [9]. In 2022, Chatzidionysiou et al. found 590 cases of lung cancer in 44,101 RA patients (56/100000), a higher rate than in the general population (HR 1.76, 95% CI 1.60 to 1.93); this increased risk remained after adjustment for smoking [10]. Although the mechanisms underlying this phenomenon are not yet fully understood, genetic factors appear to play a critical role [11].
Genetic factors contribute to 60% of RA susceptibility [12, 13]. More than 100 loci are associated with RA, and 80% of the risk variants are located in non-coding regions, where they affect gene expression and mRNA stability [14]. The HLA class II genes are associated with RA susceptibility (accounting for 30%) [15]. Inherited mutations in HLA-DRB1 and IRX1 were shown to contribute to the pathogenesis of RA [16, 17], and polymorphisms in other classical HLA genes such as HLA-DPB1, HLA-B, and HLA-A also increased RA susceptibility [18]. Synonymous mutations in the non-classical HLA gene HLA-DOA are an independent risk factor for anti-citrullinated protein antibody (ACPA)-positive RA [19]. In addition, heterogeneous single nucleotide polymorphisms in the gene encoding the protein tyrosine phosphatase PTPN22, which lies outside the HLA region, were associated with RA [20]. A functional variant (R620W) of PTPN22, which is widely expressed in hematopoietic cells, has been shown to cause a two-fold increased risk of RA and other autoimmune diseases [21]. Xu Huji et al. first identified MMEL1 and CTLA4 as susceptibility genes for RA in Han Chinese [22, 23]. A further international collaborative study uncovered 101 RA pathogenic loci, which shed light on the development of new drugs [24]. A genome-wide association meta-analysis revealed that RA-associated single nucleotide polymorphisms (SNPs) show significant enrichment in the enhancer regions of T cells and NK cells [25]. A study based on the Wellcome Trust Case Control Consortium (WTCCC) SNP datasets identified 49 SNPs presumed to be associated with RA [26]. In addition to HLA-DRB1 and PTPN22, TRAF1-C5 and STAT4 have also been linked to RA susceptibility [27–29]. These studies demonstrate that genetic variation is closely related to the pathogenesis of RA.
Previous studies on RA have primarily focused on mutations in individual genes or amino acids and have paid little attention to chromosomal variations. However, CSVs involve many genes and are a crucial component of genetic variation. It is therefore important to analyze CSVs at the genome-wide level in RA patients, as this can improve our understanding of the pathogenic mechanisms.

Materials and methods


Patients and controls
Patients and normal controls were recruited from the Department of Rheumatology, Shenzhen People’s Hospital, the second clinical medical college, Jinan University, China, from January 2020 to March 2022. All participants were over 18 years old and provided written informed consent. Since RA patients are primarily female (80%), we mainly enrolled female patients (90%) in this study. Patients with RA strictly fulfilled the 2010 ACR/EULAR classification criteria, and their medical histories and biochemical test results were obtained from the hospital information system. Peripheral blood mononuclear cells (PBMCs) were stored at −80℃ in a biospecimen bank. This study was approved by the ethics committee of Shenzhen People’s Hospital, the second clinical medical college, Jinan University, China, and was conducted in accordance with the Declaration of Helsinki. Informed consent was obtained from all subjects and/or their legal guardian(s).

Whole-genome sequencing (WGS) and quality control (QC)
Genome sequencing was conducted by BGI (BGI Genomics Co., Ltd., Shenzhen, China) using the DNBSEQ platform for high-throughput sequencing. This platform ensures that each sample meets the required data output standards. Quality control (QC) for the samples involved multiple steps. Initially, DNA samples were fragmented using either a Covaris ultrasonication system or a fragmentation enzyme to produce DNA fragments of approximately 350 bp. These fragments then underwent end repair, addition of an "A" base to the 3' ends, and ligation of sequencing adapters. For some libraries, linear amplification (LM-PCR) was performed, although this step was omitted for PCR-free libraries. After amplification or adapter ligation, the products were subjected to single-strand separation and circularization, and the circularized libraries underwent rolling circle amplification (RCA) to produce DNA nanoballs (DNBs). The DNBs were then quality-controlled and, upon passing QC, sequenced on the DNBSEQ platform. The sequencing generated paired-end reads stored in FASTQ format as raw data. On average, each sample produced 100,471.76 Mb of raw bases. After removing low-quality reads, each sample yielded an average of 658,872,994 clean reads (98,830.95 Mb), with high sequencing quality indicated by Q20 = 96.87% and Q30 = 92.36%. The average GC content was 40.90%.
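The reported yield can be cross-checked with simple arithmetic. The sketch below assumes a read length of 150 bp, which the text does not state; under that assumption the clean-read count reproduces the reported clean-base figure of 98,830.95 when that figure is read in megabases (about 98.8 Gb per sample).

```python
# Sanity check of the reported sequencing yield.
# ASSUMPTION: reads are 150 bp long; the read length is not given in the text.
CLEAN_READS = 658_872_994   # average clean reads per sample (from the text)
READ_LENGTH_BP = 150        # assumed value, for illustration only


def total_bases(reads: int, read_length: int) -> int:
    """Total number of sequenced bases for a given read count and length."""
    return reads * read_length


def to_megabases(bases: int) -> float:
    """Convert a base count to megabases (1 Mb = 1e6 bases)."""
    return bases / 1e6


def to_gigabases(bases: int) -> float:
    """Convert a base count to gigabases (1 Gb = 1e9 bases)."""
    return bases / 1e9


bases = total_bases(CLEAN_READS, READ_LENGTH_BP)
print(f"{to_megabases(bases):,.2f} Mb")
print(f"{to_gigabases(bases):.2f} Gb")
```

Dividing the gigabase figure by a genome size gives a rough mean depth; the genome size to use is a further assumption and is left to the reader.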

Identification of chromosomal structural variations (CSV)
CSVs were detected using BreakDancer with default parameter settings. This method relies primarily on the distances between paired-end reads and their alignment orientations, using an algorithm based on discordant read pairs to identify structural variations. The command used for detecting structural variations was: breakdancer_max sample.cfg > sample.out.
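The discordant read-pair idea described above can be illustrated with a minimal sketch. This is not BreakDancer's implementation: the `ReadPair` type, the `classify` function, the three-standard-deviation cutoff, and the use of start-to-start distance as the span are all hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    """One aligned read pair (hypothetical, simplified record)."""
    chrom1: str
    pos1: int
    strand1: str  # "+" or "-"
    chrom2: str
    pos2: int
    strand2: str


def classify(pair: ReadPair, mu: float, sigma: float, n_sd: float = 3.0) -> str:
    """Label a read pair by the kind of variant its discordance suggests.

    mu and sigma are the mean and standard deviation of the library's
    normal mate-to-mate span (assumed known, e.g. estimated from the data).
    """
    if pair.chrom1 != pair.chrom2:
        return "translocation"
    # Sort mates by position so the orientation test does not depend on
    # which mate is listed first.
    (p1, s1), (p2, s2) = sorted(
        [(pair.pos1, pair.strand1), (pair.pos2, pair.strand2)]
    )
    if s1 == s2:
        return "inversion"
    if (s1, s2) != ("+", "-"):
        return "everted"  # outward-facing mates, e.g. tandem duplication
    span = p2 - p1
    if span > mu + n_sd * sigma:
        return "deletion"   # mates map farther apart than expected
    if span < mu - n_sd * sigma:
        return "insertion"  # mates map closer together than expected
    return "concordant"


pair = ReadPair("chr1", 1000, "+", "chr1", 6000, "-")
print(classify(pair, mu=350.0, sigma=30.0))
```

A real caller additionally clusters many discordant pairs that support the same breakpoint before reporting a variant; a single pair is only weak evidence.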

Functional annotation and pathway analysis
DAVID Bioinformatics Resources 6.8 (https://david.ncifcrf.gov/) was employed for functional annotation of the key differential genes primarily associated with either RA or lung cancer [30]. Pathways were enriched using web-based knowledge databases KEGG, GO and BioCarta [31]. KEGG pathway mapping was utilized to generate signaling pathways.
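Enrichment tools of this kind ask whether a gene list contains more members of a pathway than chance would predict. DAVID reports a modified Fisher's exact p-value (the EASE score); the sketch below uses the plain hypergeometric upper tail instead, as a simplified illustration. The function name and the example counts are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """Probability of seeing at least k pathway genes in a list of n genes.

    N: number of genes in the background
    K: number of background genes annotated to the pathway
    n: number of genes in the query list
    k: number of query genes annotated to the pathway
    """
    upper = min(n, K)
    total = comb(N, n)
    # Sum the hypergeometric probabilities for k, k+1, ..., upper hits.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical example: 12 of 600 query genes fall in a 100-gene pathway,
# against a background of 20,000 genes.
p = hypergeom_enrichment_p(k=12, n=600, K=100, N=20_000)
print(f"p = {p:.3g}")
```

Exact integer binomial coefficients are used so that the tail sum does not lose precision before the final division.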

Protein-protein interaction (PPI) analysis
In order to investigate the interactions between proteins, we established a PPI network in the STRING database (Version 11.0; http://string-db.org/). The analysis involves identifying protein-protein interactions, determining their binding strengths, and characterizing the resulting protein complexes.

Clinical data
Safety assessment was performed using routine blood tests, liver and kidney function tests, and immunoglobulin measurements. Serological markers included rheumatoid factor (RF), C-reactive protein (CRP), erythrocyte sedimentation rate (ESR), anti-CCP antibodies, complement C4 (C4), and neutrophil and lymphocyte counts.

Statistical analysis
Statistical analysis was conducted using SPSS 17.0 software (SPSS, Chicago, USA). Violin plots and box plots were used to visualize the distribution of key variables across different disease groups and were generated using GraphPad Prism 5. The measurement data were expressed as “r” with a 95% confidence interval (CI). Normally distributed data were compared using ANOVA or t-tests, while nonparametric methods were applied to data that did not follow a normal distribution. Data were presented as mean ± standard deviation (SD). The choice of statistical method was determined according to the number of samples. A Bonferroni correction was applied to adjust for multiple testing, and a p-value of < 0.05 was considered statistically significant.
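The Bonferroni adjustment mentioned above can be sketched in a few lines. This is a generic illustration, not the SPSS procedure used in the study; the function name and the example p-values are hypothetical.

```python
def bonferroni(p_values: list[float], alpha: float = 0.05) -> list[tuple[float, bool]]:
    """Return (adjusted p-value, significant?) for each raw p-value.

    Each p-value is multiplied by the number of tests and capped at 1.0;
    a test is called significant if its adjusted value is below alpha.
    """
    m = len(p_values)
    results = []
    for p in p_values:
        p_adj = min(1.0, p * m)
        results.append((p_adj, p_adj < alpha))
    return results


# Hypothetical raw p-values from three pairwise group comparisons.
for p_adj, significant in bonferroni([0.01, 0.02, 0.5]):
    print(f"adjusted p = {p_adj:.3f}, significant = {significant}")
```

Comparing the adjusted p-value with alpha is equivalent to comparing the raw p-value with alpha divided by the number of tests.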

Results


Determine the chromosomal structural variation of the individual patient
The patients in this study were divided into three groups: primary rheumatoid arthritis (RA), rheumatoid arthritis with lung cancer (RA LC), and rheumatoid arthritis with interstitial lung disease (RA ILD). The human reference genome (GRCh38/hg38) was used as a control to identify the differences in CSVs between groups. Briefly, the analysis proceeded as follows: CSVs were first called for each individual, then intersected to obtain the CSVs common to each group, and finally compared to determine the specific differences between groups. The inactivated genes in the CSVs were subsequently analyzed. The signaling pathways associated with each disease were then identified through enrichment analysis of the critical genes.
Deletion was the primary type of chromosomal structural variation in RA, RA LC, and RA ILD. The number of structural variants in the three diseases was similar; the median for RA ILD was the smallest, at 88.1% of that for RA (Fig. 1A). The length characteristics of the three groups were consistent, with similar medians and quartiles (Fig. 1B). Among them, the structural variants of RA LC had the longest average length and those of RA the shortest.

Exclude individual differences to determine the chromosomal variation for each disease
Chromosomal variation may differ from patient to patient because of differences in genetic material and physiological conditions. To exclude individual differences, we intersected the chromosomal variants of patients in the same group. This yielded 1072, 1702, and 1117 genes in RA, RA LC, and RA ILD, respectively (Fig. 2).
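The within-group intersection step can be sketched as a set operation over per-patient gene lists. The function name and the toy gene symbols below are hypothetical and serve only to show the operation; they are not data from the study.

```python
def common_genes(per_patient: list[set[str]]) -> set[str]:
    """Genes affected by a structural variant in every patient of a group."""
    if not per_patient:
        return set()
    return set.intersection(*per_patient)


# Toy example: three patients in one disease group (hypothetical genes).
group = [
    {"GENE_A", "GENE_B", "GENE_C"},
    {"GENE_A", "GENE_C", "GENE_D"},
    {"GENE_A", "GENE_C"},
]
print(sorted(common_genes(group)))
```

A strict intersection keeps only genes seen in every patient, so the result shrinks as the group grows; a frequency threshold would be a looser alternative, though the text describes an intersection.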

Determination of specific CSVs in each disease and related functional analysis
After obtaining each group’s mutated genes, the differences between different diseases were subsequently analyzed. RA LC had the most specific variant genes (610). RA and RA ILD had 110 and 151 specific variant genes, respectively. The three diseases shared a total of 723 common variant genes (Fig. 3). Next, the specific genetic factors between different diseases were explored by analyzing and mining the functions of the above key genes.
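The split into shared and group-specific genes corresponds to the regions of a three-set Venn diagram. A minimal sketch follows, using hypothetical gene symbols rather than the study's gene lists; the function name is likewise an illustrative choice.

```python
def partition_groups(groups: dict[str, set[str]]) -> tuple[set[str], dict[str, set[str]]]:
    """Split group gene sets into genes shared by all groups and
    genes found in exactly one group."""
    names = list(groups)
    shared = set.intersection(*(groups[name] for name in names))
    specific = {}
    for name in names:
        # Union of every other group's genes.
        others = set().union(*(groups[o] for o in names if o != name))
        specific[name] = groups[name] - others
    return shared, specific


# Toy example with hypothetical genes.
groups = {
    "RA": {"G1", "G2", "G3"},
    "RA LC": {"G1", "G3", "G4", "G5"},
    "RA ILD": {"G1", "G2", "G6"},
}
shared, specific = partition_groups(groups)
print(sorted(shared))
print({name: sorted(genes) for name, genes in specific.items()})
```

Genes present in exactly two groups fall into neither output, which is why the shared and specific counts need not add up to a group's total.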
The preceding analysis showed that each disease had specific chromosomal variants, but their functions were unclear. We first identified the top 10 statistically enriched terms, including Gene Ontology (GO)/Kyoto Encyclopedia of Genes and Genomes (KEGG) terms, canonical pathways, and hallmark gene sets. RA had 110 specific variant genes, and functional analysis yielded three immunity-related terms: regulation of complement cascade, herpes simplex virus 1 infection, and MHC class II antigen presentation. In addition, determination of left/right symmetry and lipoprotein metabolic process were also obtained (Fig. 4A). Regulation of complement cascade, determination of left/right symmetry, and herpes simplex virus 1 infection had the highest significance (Fig. 4B). We then selected a subset of representative terms from this cluster and converted them to a network layout (Fig. 4C). Molecular Complex Detection (MCODE) analysis identified the top 3 components: determination of left/right symmetry, MHC class II antigen presentation, and regulation of complement cascade.
RA LC had 610 specific variant genes, and enrichment analysis yielded key terms including negative regulation of cell differentiation, protein deubiquitination, abacavir transmembrane transport, glycogen synthesis and degradation, and regulation of T cell tolerance induction (Fig. 5A). The significance analysis showed that protein deubiquitination had the highest significance, with –log10(P) = 10.0 (Fig. 5B). MCODE analysis also identified protein deubiquitination as one of the essential components (Fig. 5C), which has not previously been reported in RA LC research.
RA ILD had 151 specific variant genes, and enrichment analysis yielded the following key terms: negative regulation of protein catabolic process, extracellular matrix organization, IL8 CXCR2 pathway, epithelial cell migration, class B/2 (secretin family receptors), response to tumor necrosis factor, regulation of synaptic vesicle exocytosis, neuronal system TRKR pathway, and regulation of neuron projection development (Fig. 6A). The significance analysis showed that negative regulation of protein catabolic process, extracellular matrix organization, IL8 CXCR2 pathway, and epithelial cell migration were statistically significant (Fig. 6B). MCODE analysis also identified the IL8 CXCR2 pathway as a hub connecting neuronal system, regulation of synaptic vesicle exocytosis, and response to tumor necrosis factor. Extracellular matrix organization and epithelial cell migration formed independent components (Fig. 6C).

Specific genetic regulatory network in different disease pathways
All three diseases have pulmonary manifestations, and specific genetic features play a vital role. We performed an intergroup comparison to explore the relationships and differences in key chromosomal variations between the diseases. Since RA is the underlying disease in all groups, the variant genes common to all three (RA|RA LC|RA ILD) formed the largest set. RA LC, RA ILD, and RA followed in that order; the arcs in the middle of the plot connect genes that belong to the same pathways (Fig. 7A). The larger number of variant genes specific to RA LC suggests more cancer-associated genetic factors, whereas RA ILD has fewer. There are more links between RA LC and RA ILD than between either and RA. The functional analysis of multiple groups showed that RA|RA LC|RA ILD was enriched for the most functional terms (Fig. 7B), with the highest significance for axon guidance. Protein-protein interaction analysis of the common genes in RA|RA LC|RA ILD revealed axon guidance and cardiac muscle tissue development in central positions acting as hubs, while MHC class II protein complex assembly and regulation of ion transport were at the periphery (Fig. 7C). MHC class II molecules present peptide fragments to T cells for immune recognition and thereby activate adaptive immunity in vivo, so MHC class II-restricted antigen presentation is critical for CD4+ T cell-dependent immune responses [32] (Fig. 7D). Regulation of complement cascade and determination of left/right symmetry were enriched in RA. The complement system is an essential component of the innate immune system. It is essential for defense against microbial infections and for the clearance of immune complexes and damaged cells [33]. Complement has been implicated in the pathogenesis of RA; elevated levels of complement activation products have been measured in the plasma, synovial fluid, and synovial tissue of patients. Complement polymorphisms are associated with RA in genome-wide association studies [34].
Activation of the complement system drives local inflammatory responses through metabolic reprogramming of synovial fibroblasts [35]; synovial C3d levels were increased in active RA joints compared to controls [36]. These results suggest that complement activation plays an essential role in the induction and development of RA. Studies in recent years have demonstrated that anti-TNF agents are effective for the treatment of RA, and the reduction of complement activation may be one of the mechanisms by which TNFα inhibitors exert their effectiveness in inflammatory arthritis [37]. Therefore, inhibition of complement activation may become one of the targets for the treatment of RA. Complement abnormalities have been studied to some extent; however, evidence for an association between determination of left/right symmetry and RA is still lacking. We annotated the locations of key mutated genes in the body's immune pathways that are critical for antigen presentation.
The analysis of CSV highlighted the crucial role of MHC class II molecules in immune recognition and the complement system in the pathogenesis of RA. Building on this, we further analyzed the changes in various hematological biomarker levels across the three disease groups. Hematological clinical data revealed significant variations in biomarker levels, emphasizing the role of these immune components. Rheumatoid factors were elevated in RA and RA LC, with median values of 86.55 and 213.50, respectively, while RA ILD had a notably lower median value of 18.8. This elevation in rheumatoid factors suggests pronounced immune dysfunction, likely linked to dysregulated MHC class II-mediated antigen presentation. C-reactive protein (CRP) levels were significantly higher in RA ILD and RA LC, being three and two times greater, respectively, than in RA. All three disease groups had CRP levels exceeding the normal threshold of 5 mg/L, indicating heightened systemic inflammation and complement activation. The erythrocyte sedimentation rate (ESR) showed a similar pattern, being lower in RA compared to RA ILD and RA LC, with the latter two groups indicating more acute inflammatory processes. Additionally, the levels of neutrophils and lymphocytes were lower in RA ILD than in RA and RA LC, reflecting distinct inflammatory profiles, potentially driven by immune system dysregulation. These findings, combined with genetic regulatory network analysis, underscore the complex interplay between MHC class II molecules, the complement system, and disease-specific pathways in RA, RA LC, and RA ILD (Fig. 8).

Discussion

This study aimed to identify differential CSVs among RA, RA LC and RA ILD. RA was chosen as the underlying disease because patients with RA are known to be at increased risk of malignancy [38–40]. Pulmonary manifestations are common in RA, and the genetic causes of lung disease in patients with RA were explored in this study.
While meta-analyses suggest that patients with RA have an increased risk of lung cancer compared with the general population [41, 42], the role of genetic factors is not clear. Therefore, we focused on the functions of disease-associated chromosomal variants encompassing genes and found some variations and associations between different diseases.
We observed that protein deubiquitination was absent in RA LC. Deubiquitinating enzymes are a large class of proteases, and changes in deubiquitinating enzymes have been linked to tumor cell proliferation and survival. For example, downregulation of the deubiquitinating enzyme USP12 promotes mouse and human lung tumor growth and an immunosuppressive microenvironment [43]. Compared with healthy controls, the genes specific to RA are enriched in axon guidance [44]. Abnormal levels of Netrin 1, an axon guidance factor, have been detected in the synovial fluid of patients with RA. Netrin 1 has been shown to be a key regulator of osteoclast differentiation. Although the related mechanism in osteoclasts is largely unknown, it could be a novel therapeutic target for RA bone destruction [45, 46]. MHC class II antigen presentation is strongly associated with immune function. Antigenic peptide-loaded MHC class II molecules are constitutively expressed on the surface of professional antigen-presenting cells (APCs), including dendritic cells, B cells, macrophages, and thymic epithelial cells [47]. The observed elevations in rheumatoid factor in RA LC suggest that MHC class II molecules play a critical role in the immune response.
Interestingly, the loss of extracellular matrix organization was significantly enriched in RA ILD. The lung extracellular matrix plays a vital role in the normal structure of the lung [48], and its properties are essential in response to changes in lung disease. However, they are poorly understood. Here we found null mutations associated with the extracellular matrix organization in interstitial lung disease, including proteins VIT, ST7, ADAMTSL2, SPINK5, EFEMP2, HPSE2, COL28A1, LAMA2, SLIT2 and FRAS1.
A limitation of this study is the small sample size, which limits the statistical power of the comparisons. The incidence of RA is 0.5–1.0% [3–5], and the average incidence of lung cancer is 0.059% [49], so few patients have both diseases. Another limitation is sex predilection: RA is more common in women and lung cancer is more common in men. As RA is the underlying disease in this study, most of the patients are women (89%), and male patients are underrepresented.

Conclusion

Determining the genetic factors of RA, LC and ILD is a significant challenge that plays a vital role in detecting and treating diseases. Based on the comprehensive analysis of chromosome variation, we described the genetic characteristics of RA compared with related lung diseases, and revealed the related hematological characteristics. Identifying genetic factors in defined disease groups can not only explain the limitations of current pathogenesis but also provide insights for discovering promising targets and pathways.


Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.