
Etiology of gene expression-based subtypes of breast cancer in the Ghana Breast Health Study.

International Journal of Cancer · 2026 · Vol. 158(11), pp. 2890-2899 · Open access
Sources: PubMed · DOI · PMC · OpenAlex (last updated 2026-04-28)
OpenAlex topics: Breast Cancer Treatment Studies · Cancer Risks and Factors · BRCA gene mutations in cancer

Hurson AN, Butler EN, Hamilton AM, Shah KK, Sienso BA, Ocansey GA

📝 One-line summary for patients

Breast cancers are heterogeneous and largely classified using immunohistochemistry of estrogen receptor expression.

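🔬 Key clinical statistics (automatically extracted from the abstract; verify against the original)
  • Sample size (n): 595 cases, 2,096 controls
  • Study design: Case-control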

Cite this paper

APA Hurson, A. N., Butler, E. N., et al. (2026). Etiology of gene expression-based subtypes of breast cancer in the Ghana Breast Health Study. International Journal of Cancer, 158(11), 2890-2899. https://doi.org/10.1002/ijc.70332
MLA Hurson, Amber N., et al. "Etiology of gene expression-based subtypes of breast cancer in the Ghana Breast Health Study." International Journal of Cancer, vol. 158, no. 11, 2026, pp. 2890-2899.
PMID 41544200
DOI 10.1002/ijc.70332

Abstract

Breast cancers are heterogeneous and largely classified using immunohistochemistry of estrogen receptor expression. However, research suggests RNA-based subtyping, including intrinsic (luminal vs. non-luminal) and TP53-based subtypes, may offer additional etiologic insight. TP53 mutant tumors, often more aggressive and non-luminal, are common among women of African descent. We examined possible heterogeneity for RNA-based luminal/non-luminal and TP53 subtypes among women of west African ancestry. We analyzed 595 invasive breast cancer cases and 2096 controls in the Ghana Breast Health Study. RNA was extracted from formalin-fixed paraffin-embedded tumor samples and profiled via nCounter® Breast Cancer 360™. Tumors were classified as luminal (N = 278) vs. non-luminal (N = 282) and TP53 wildtype-like (N = 324) vs. mutant-like (N = 271) using the PAM50 assay and a validated RNA signature, respectively. Case-control odds ratios and 95% confidence intervals were estimated using polytomous logistic regression. Etiologic heterogeneity was assessed in case-only analyses. Higher parity was more protective for luminal than non-luminal tumors (p-heterogeneity = .05). Older age at menarche and alcohol use ≥6 months were associated with elevated risk of luminal, but not non-luminal tumors (p-heterogeneity = .01). Similar trends were observed for TP53 wildtype-like tumors, though not statistically significant. Cross-classification of PAM50/TP53 showed that higher parity, older age at menarche, and alcohol use ≥6 months were more strongly associated with luminal/TP53 wildtype-like than other subtypes. RNA-based breast cancer subtyping suggests TP53 refines breast cancer etiologic heterogeneity in a sub-Saharan African population. The high prevalence of aggressive, mostly TP53-mutant tumors in this population underscores the need for further studies to clarify etiologic heterogeneity.



INTRODUCTION

It is well established that breast cancer is a heterogeneous disease, with clinically, molecularly and pathologically defined subtypes that have very different etiologies and outcomes.1-3 A recent systematic review found suggestive to convincing evidence of breast cancer risk factor heterogeneity across estrogen receptor (ER) subtypes, with consistent patterns across multiple racial and ethnic populations.4
Incidence and mortality rates of breast cancer (and of its molecular subtypes) vary considerably by race and ethnicity, despite consistent subtype-specific risk factor effects across populations.4,5 Data on cancer incidence in sub-Saharan Africa are extremely limited, but in the US, Black women have a higher incidence of aggressive breast cancer and higher mortality rates for all subtypes compared to White women.6,7 Nonbiological factors (e.g., socioeconomic status and access to health care) clearly contribute to racial, ethnic, and geographic heterogeneity in breast cancer incidence and mortality rates; however, differences persist after controlling for these factors, suggesting that tumor biology and genetics may also play a role. The hypothesis that West African ancestry influences breast cancer biology is supported by research showing higher rates of an aggressive breast cancer phenotype in West Africa than in other African regions,8,9 as well as by observations of elevated tumor mutational burden and distinct immunologic profiles among breast cancer patients of West African descent.10,11
The majority of breast cancer subtyping for clinical and etiologic research purposes has been based on immunohistochemical markers of ER status. However, emerging data indicate that RNA-based subtyping, encompassing intrinsic (luminal vs. non-luminal) and TP53 subtypes, may provide additional etiologic resolution beyond ER status. For example, within a racially diverse population-based study in the US, we previously assessed the relative contribution of different tumor markers to the heterogeneity of effects of established breast cancer risk factors.12 RNA-based TP53 and immunohistochemistry (IHC)-based ER accounted for more heterogeneity than other markers, had specific risk factor profiles, and had independent and combined effects. However, these markers have never been applied in etiologic studies of African populations.
Breast cancer incidence rates are increasing in sub-Saharan Africa,13 with a high incidence of tumors with aggressive characteristics,9 mirroring patterns among African American women in the US. The Ghana Breast Health Study (GBHS) is a population-based case-control study designed to investigate breast cancer etiology among women of West African ancestry and to further our understanding of aggressive breast tumor subtypes. A prior analysis in this population evaluated associations between reproductive risk factors and ER-based breast cancer subtypes14 and found no statistically significant differences by ER status. However, after stratification by age (<50 vs. ≥50 years), associations of some reproductive factors (parity and breastfeeding) differed by ER subtype, following the patterns expected from previous literature. Herein we extend that work by using gene expression-based classifiers to categorize breast tumors into intrinsic (luminal vs. non-luminal) and TP53 (mutant-like vs. wildtype-like) subtypes, to further resolve etiologic factors not fully captured by IHC- or ER-based classification schemes.

MATERIALS AND METHODS


Study Population
The Ghana Breast Health Study (GBHS) is a population-based case-control study run from 2013 to 2015. The study was conducted in collaboration with the three Ghanaian hospitals responsible for treating most of the country’s breast cancer cases: Korle Bu Teaching Hospital (KBTH) in Accra, as well as Komfo Anokye Teaching Hospital (KATH) and Peace and Love Hospital (PLH) in Kumasi.15 Cases were defined as women who, within the preceding year, presented with a breast lump suspected to be malignant and were subsequently referred either for biopsy at one of the study hospitals or for clinical management. Controls were women aged 18 to 74 years and never diagnosed with breast cancer who were identified through household enumeration of census-defined geographic areas within Ghana’s Ashanti, Central, Eastern, and Greater Accra regions. All participants, both cases and controls, were required to meet the following criteria: (1) female sex, (2) age 18–74 years, (3) residence for at least one year in the defined catchment areas surrounding Kumasi and Accra, and (4) completion of an in-person interview.
Recruitment started in 2013 and ended in October 2015 with the enrollment (i.e., interviewing) of 2,202 cases and 2,106 controls, who were matched on age and region of residence as previously described.15 Participation rates exceeded 90% for both cases and controls.15 Suspected breast cancer cases were enrolled at the time of biopsy; ultimately, 1,071 of the 2,202 enrolled cases received a pathologically confirmed breast cancer diagnosis.

Risk Factor Information
Data on breast cancer risk factors in the GBHS were collected through a standardized interview-based questionnaire covering a broad range of participant characteristics and risk factors.15 In the present analysis we evaluated well-established breast cancer risk factors that have been shown to be related to risk in studies of other (predominantly European-ancestry) populations: age at menarche, parity, age at first birth, breastfeeding duration, oral contraceptive (OC) use, body size, family history of breast cancer, and alcohol use.4 Breastfeeding duration is defined as the median number of months of breastfeeding per pregnancy. Body size was classified using a previously published 9-level pictogram scale.16 Alcohol use was assessed with two questions: “Has there ever been a time in your life when you had at least one drink a month?” and “For how long did you have at least one drink a month: would you say less than six months, or six months or longer?”
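As an illustration of how the two derived exposures above could be coded from questionnaire answers, here is a minimal sketch. The function names, the `None` convention for women with no pregnancies, and the three alcohol category labels are hypothetical. The source specifies only the median-per-pregnancy definition and the two alcohol questions.

```python
from statistics import median
from typing import Optional, Sequence


def breastfeeding_months_per_pregnancy(months_by_pregnancy: Sequence[float]) -> Optional[float]:
    """Median months of breastfeeding across a woman's pregnancies.

    Hypothetical convention: returns None when there are no pregnancies.
    """
    if not months_by_pregnancy:
        return None
    return median(months_by_pregnancy)


def alcohol_category(ever_monthly_drinker: bool, six_months_or_longer: Optional[bool]) -> str:
    """Combine the two alcohol questions into never / <6 months / >=6 months.

    ever_monthly_drinker: ever had at least one drink a month.
    six_months_or_longer: answer to the duration question (asked of drinkers only).
    """
    if not ever_monthly_drinker:
        return "never"
    if six_months_or_longer is None:
        raise ValueError("duration answer missing for an ever-drinker")
    return ">=6 months" if six_months_or_longer else "<6 months"
```

For example, three pregnancies breastfed for 12, 18, and 24 months give a median of 18 months.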

Tumor biopsy tissue collection
A total of 4-8 needle core biopsies (14-gauge) were taken from cases prior to any treatment and processed into formalin-fixed paraffin-embedded (FFPE) blocks for diagnostic purposes using standardized protocols.15 Blocks that were not needed for diagnosis were sent to the NCI for research (75% of cases had one block and 25% had 2-5 blocks). Centralized histopathology review of H&E-stained sections was conducted for the tumor blocks sent to the U.S., and 1,071 biopsy samples were confirmed as malignant tumors.
For this study, inclusion criteria were 1) pathologically confirmed invasive cases and, 2) availability of biopsy tissue blocks. To ensure high-quality mRNA expression data, biopsy tissue blocks containing less than 10% tumor tissue (as determined by pathology review of H&E sections) were excluded (N=327). This left 745 tissue blocks eligible for analysis (Supplementary Figure 1).

Clinicopathological data
Methods for obtaining information on IHC markers have been described previously.14 Briefly, IHC marker status was obtained from pathology departments in Ghana for 69% of cases. Tumors were classified as positive for estrogen receptor (ER) and progesterone receptor (PR) if ≥10% of cells showed positive staining. HER2 status was considered positive if staining was 3+; borderline and negative cases were considered HER2 negative. Agreement between IHC assays performed in pathology departments in Ghana and those performed at the NCI laboratory was high (79% for ER, 65% for PR, and 78% for HER2; p<0.01).14 Tumor size was determined by clinical palpation at diagnosis, and histologic grade was assessed through centralized pathology review.
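The IHC scoring rules above can be expressed as a short sketch. Two details are assumptions rather than statements from the source: "borderline" HER2 is taken to mean an IHC score of 2+, and a tumor is called hormone receptor positive if either ER or PR is positive.

```python
def hormone_receptor_positive(percent_stained: float) -> bool:
    """ER or PR is positive when >=10% of cells stain positive."""
    return percent_stained >= 10.0


def her2_positive(ihc_score: int) -> bool:
    """HER2 is positive only at 3+; negative (0, 1+) and borderline (assumed 2+) are negative."""
    if ihc_score not in (0, 1, 2, 3):
        raise ValueError(f"unexpected HER2 IHC score: {ihc_score}")
    return ihc_score == 3


def hormone_receptor_status(er_percent: float, pr_percent: float) -> str:
    """Assumed convention: HR+ if either ER or PR is positive, otherwise HR-."""
    positive = hormone_receptor_positive(er_percent) or hormone_receptor_positive(pr_percent)
    return "HR+" if positive else "HR-"
```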

RNA-based gene expression
Gene expression was analyzed using the Breast Cancer 360™ panel on the NanoString nCounter® platform, which measures the expression of 776 genes involved in key breast cancer pathways and processes.17 Gene expression data were generated in two batches and then cleaned and normalized using Remove Unwanted Variation (RUVg)18 as previously described.19,20 For normalization, we used 11 of the 17 available housekeeping genes with Pearson correlation coefficients (r) ≥ 0.85 (i.e., SF3A1, MRPL19, POLR2A, ABCF1, PUM1, TBC1D10B, SDHA, OAZ1, UBB, PSMC4, and TBP) and k = 1 factor of unwanted variation. Of the 745 tissue blocks eligible for RNA extraction, 595 passed normalization (80% passing rate; Supplementary Figure 1).
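To make the RUVg step concrete, here is a minimal NumPy sketch of the general idea, loosely modeled on the `RUVg` function in the Bioconductor RUVSeq package. Unwanted factors are estimated by SVD of the gene-centered log-expression of the control (housekeeping) genes. Every gene is then regressed on those factors, and the fitted component is subtracted. The natural log with an offset of 1, the function name, and the simulated data are assumptions. The study's actual pipeline (refs 19, 20) may differ in its details.

```python
import numpy as np


def ruvg_normalize(counts: np.ndarray, control_idx: list[int], k: int = 1):
    """Remove k factors of unwanted variation estimated from control genes.

    counts: samples x genes matrix of raw counts.
    control_idx: column indices of the housekeeping (control) genes.
    Returns (corrected log-expression, W), where W holds the estimated factors.
    """
    y = np.log(counts + 1.0)                  # log scale with offset 1 (assumed)
    y_centered = y - y.mean(axis=0)           # center each gene across samples
    u, _, _ = np.linalg.svd(y_centered[:, control_idx], full_matrices=False)
    w = u[:, :k]                              # samples x k unwanted factors
    # Regress every gene on W and subtract the fitted unwanted component.
    alpha, *_ = np.linalg.lstsq(w, y, rcond=None)
    corrected = y - w @ alpha
    return corrected, w


# Simulated example: a batch shift of 0.8 (log scale) affects every gene.
rng = np.random.default_rng(0)
batch = np.repeat([0.0, 1.0], 10)             # 20 samples in two batches
base = rng.normal(5.0, 1.0, size=30)          # 30 genes
log_expr = base + 0.8 * batch[:, None] + rng.normal(0.0, 0.1, size=(20, 30))
counts = np.exp(log_expr)
corrected, w = ruvg_normalize(counts, control_idx=list(range(11)), k=1)
```

Because the first 11 simulated genes carry almost nothing but the batch shift, the single factor `w` tracks the batch. Removing it also removes the shift from the non-control genes.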
A research version of the PAM50 molecular subtype predictor21 was used to classify tumors as luminal A, luminal B, HER2-enriched, basal-like or normal-like. Tumor subtype was then dichotomized as luminal (luminal A or luminal B) or non-luminal (HER2-enriched or basal-like). Few tumors were classified as normal-like (n=35 [6%]), typically reflecting low tumor content; therefore, only estimates for luminal and non-luminal subtypes are shown. We also applied a previously validated RNA signature that aggregates information on 48 TP53-dependent genes to classify TP53 status (mutant-like or wildtype-like) based on a similarity-to-centroid approach, as previously described.22
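The subtype assignments above can be sketched in a few lines. The PAM50 label strings ("LumA", "LumB", "Her2", "Basal", "Normal") are assumed. For the TP53 classifier, the sketch uses Pearson correlation as the similarity measure and hypothetical 4-gene centroids, because the source does not give the 48-gene centroids or the exact similarity metric. The joint label corresponds to the four cross-classified subtypes used in the case-control analyses.

```python
import math
from typing import Optional, Sequence

# Dichotomization described in the text; normal-like tumors are set aside (None).
PAM50_TO_LUMINAL = {
    "LumA": "luminal",
    "LumB": "luminal",
    "Her2": "non-luminal",
    "Basal": "non-luminal",
    "Normal": None,
}


def luminal_status(pam50_call: str) -> Optional[str]:
    return PAM50_TO_LUMINAL[pam50_call]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def nearest_centroid(sample: Sequence[float], centroids: dict[str, Sequence[float]]) -> str:
    """Assign the label whose centroid is most correlated with the sample (assumed metric)."""
    return max(centroids, key=lambda label: pearson(sample, centroids[label]))


def joint_subtype(tp53: str, luminal: Optional[str]) -> Optional[str]:
    """Cross-classify TP53 status and luminal status; None if luminal status is undefined."""
    return None if luminal is None else f"TP53 {tp53}/{luminal}"


# Hypothetical centroids over 4 genes, for illustration only.
CENTROIDS = {
    "mutant-like": [2.0, -1.0, 1.5, -2.0],
    "wildtype-like": [-2.0, 1.0, -1.5, 2.0],
}
```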

Statistical Methods
To determine whether the characteristics of individuals included in the analysis differed from those that were excluded (based on percentage of tumor tissue for RNA analysis), we compared frequencies of patient/clinical characteristics and risk factors between cases with 0, >0 to <10, and ≥10% tumor tissue using a chi-square test. We also assessed the degree of agreement between RNA- and IHC-based subtyping schema by calculating the percent agreement between PAM50 intrinsic subtype and ER status, as well as between PAM50 intrinsic subtype and hormone receptor status.
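Both comparisons above reduce to simple arithmetic on tables, sketched below in pure Python. The concordance rule (luminal matches ER-positive, non-luminal matches ER-negative) is an assumption about how agreement was scored. The chi-square function returns the Pearson statistic and its degrees of freedom. A p-value could then be obtained from a chi-square survival function such as `scipy.stats.chi2.sf(stat, dof)`. In the study, table rows would be the tumor-content groups (0, >0 to <10, ≥10%) and columns the categories of a characteristic.

```python
from typing import Sequence

# Assumed concordance rule between RNA- and IHC-based calls.
CONCORDANT = {("luminal", "ER+"), ("non-luminal", "ER-")}


def percent_agreement(pam50: Sequence[str], er: Sequence[str]) -> float:
    """Percentage of tumors whose PAM50 luminal status matches ER status."""
    pairs = list(zip(pam50, er))
    agree = sum((p, e) in CONCORDANT for p, e in pairs)
    return 100.0 * agree / len(pairs)


def chi_square(table: Sequence[Sequence[float]]) -> tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof
```

For example, `percent_agreement(["luminal", "luminal", "non-luminal", "non-luminal"], ["ER+", "ER-", "ER-", "ER-"])` gives 75.0.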
In case-control analyses, polytomous unconditional logistic regression models were used to estimate odds ratios (OR) and 95% confidence intervals (CI) for associations between breast cancer risk factors and tumor subtypes defined by RNA-based TP53 status (wildtype-like/mutant-like) and by PAM50 intrinsic subtype (luminal/non-luminal). To evaluate the combined effects of these two markers, we estimated associations between risk factors and the four possible breast cancer subtypes defined by cross-classifying RNA-based TP53 status and PAM50 intrinsic subtype (TP53 wildtype-like/luminal, TP53 wildtype-like/non-luminal, TP53 mutant-like/luminal, TP53 mutant-like/non-luminal). In additional case-control analyses, we estimated associations of OC use and alcohol use, which had not previously been reported for the GBHS, with risk of ER-defined subtypes.14
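To show what each subtype-specific OR from the polytomous model represents, consider the simplest case. With a single binary exposure and no covariates, the model is saturated. Its maximum-likelihood OR for each subtype then equals the crude OR from a 2×2 table of that subtype against the shared control group, with a Woolf (log-scale Wald) 95% CI. The sketch below uses that special case with hypothetical counts. The study's estimates were adjusted for site, age, education, and the other risk factors, which requires fitting the full multinomial model.

```python
import math
from dataclasses import dataclass


@dataclass
class Counts:
    exposed: int
    unexposed: int


def subtype_odds_ratio(subtype: Counts, controls: Counts, z: float = 1.959964):
    """Crude OR and Woolf 95% CI for one subtype versus the shared control group."""
    log_or = math.log((subtype.exposed * controls.unexposed) / (subtype.unexposed * controls.exposed))
    se = math.sqrt(
        1 / subtype.exposed + 1 / subtype.unexposed + 1 / controls.exposed + 1 / controls.unexposed
    )
    return math.exp(log_or), math.exp(log_or - z * se), math.exp(log_or + z * se)


# Hypothetical counts: every subtype is compared with the same controls.
controls = Counts(exposed=100, unexposed=200)
cases = {"luminal": Counts(30, 30), "non-luminal": Counts(20, 40)}
results = {name: subtype_odds_ratio(c, controls) for name, c in cases.items()}
```

Here the luminal OR is 2.0 and the non-luminal OR is 1.0. Sharing one control group is what makes the subtype ORs directly comparable.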
Case-case p-values were estimated to test for etiologic heterogeneity of risk factor associations for TP53 mutant-like compared to wildtype-like cases, and for non-luminal compared to luminal cases. Given the high correlation between several breast tumor markers, we conducted a sensitivity analysis in which additional tumor characteristics (tumor grade, ER status, PR status, HER2 status, and tumor size) were modeled simultaneously with RNA-based TP53 status and PAM50 intrinsic subtype, to estimate the independent contribution of each marker to heterogeneity in the risk factors of interest.
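The case-case heterogeneity test can be illustrated with crude counts. For 2×2 counts, the case-only OR comparing mutant-like with wildtype-like cases equals the ratio of the two subtype-specific case-control ORs, because the control counts cancel. Its Wald p-value therefore tests whether the subtype ORs differ. The counts below are hypothetical, and the study's p-values came from covariate-adjusted models.

```python
import math


def case_case_test(mut_exposed: int, mut_unexposed: int, wt_exposed: int, wt_unexposed: int):
    """Case-only OR (mutant-like vs wildtype-like) and a two-sided Wald p-value."""
    log_or = math.log((mut_exposed * wt_unexposed) / (mut_unexposed * wt_exposed))
    se = math.sqrt(1 / mut_exposed + 1 / mut_unexposed + 1 / wt_exposed + 1 / wt_unexposed)
    z = log_or / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal p-value
    return math.exp(log_or), p


# Hypothetical counts: 20/100 mutant-like vs 40/100 wildtype-like cases exposed.
or_het, p_het = case_case_test(20, 80, 40, 60)
```

In this example the case-case OR is 0.375 (p ≈ 0.002). That means the exposure is less strongly associated with mutant-like than with wildtype-like tumors.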
For all analyses, risk factors were modeled as dichotomous variables, and models were adjusted for study site, age (as a continuous variable), and education, and were mutually adjusted for all risk factors. All statistical tests were two-sided. Analyses were conducted in R software version 4.2.0 (R Foundation for Statistical Computing).

RESULTS

RESULTS
Table 1 describes the frequencies of characteristics for controls and breast cancer cases. Compared to controls, cases had later ages at first birth, fewer months breastfeeding, more OC use, and a higher frequency of breast cancer family history. Among cases, aggressive tumor characteristics (e.g., hormone receptor negative status, grade 3), as well as younger ages at first birth were more frequent among TP53 mutant-like compared to wildtype-like cases, as well as among non-luminal compared to luminal cases (Supplementary Table 1). Characteristics of cases with tumor samples eligible for RNA extraction (i.e., ≥10% tumor tissue) were similar to those with ineligible samples (<10% tumor tissue), except with regard to tumor size and ever use of alcohol (Supplementary Table 2). Consistent with previous studies,23-25 high agreement was observed between RNA- and IHC-based subtyping schema (79% agreement, Supplementary Table 3).
Differences in risk factor patterns were observed across breast cancer subtypes defined by RNA-based TP53 status (Table 2). Alcohol use for ≥6 months was associated with TP53 wildtype-like (OR [95% CI] = 1.51 [1.03, 2.20]) but not with mutant-like breast cancer (p-heterogeneity = 0.06). Although the differences across subtypes were not statistically significant in case-case analyses, later age at menarche (1.37 [1.06, 1.78]) and higher parity (0.67 [0.50, 0.90]) were associated with TP53 wildtype-like but not with mutant-like breast cancer. Additionally, ever use of alcohol showed a suggestive association with TP53 wildtype-like status (1.28 [0.99, 1.66]) and no association with mutant-like status.
Risk factor patterns also differed across luminal/non-luminal subtypes (Table 3). Age at menarche ≥16 years (1.27 [0.96, 1.67], p-het = 0.01), ≥3 births (0.63 [0.46, 0.86], p-het = 0.05), and alcohol use for ≥6 months (1.75 [1.19, 2.58], p-het = 0.01) were associated with luminal but not non-luminal breast cancer. In further analysis of all PAM50 subtypes (Supplementary Table 4), higher parity and alcohol use for ≥6 months were more strongly associated with the luminal B than with the luminal A subtype. Although case-case comparisons did not reach statistical significance, ever use of OCs was associated with the non-luminal (1.68 [1.15, 2.46]) but not the luminal subtype, and ever use of alcohol was associated with the luminal (1.30 [0.99, 1.71]) but not the non-luminal subtype. Further analysis showed that the association between OC use and non-luminal subtypes was limited to basal-like cases, while ever use of alcohol was primarily associated with luminal B tumors (Supplementary Table 4).
Table 4 reports the associations of risk factors with breast cancer subtypes defined by the joint classification of RNA-based TP53 status and luminal/non-luminal subtypes. Inclusion of both markers helped to clarify the association with age at menarche and parity, such that later age at menarche and higher parity were associated with the luminal/TP53 wildtype-like subtype (1.31 [0.98, 1.75] and 0.61 [0.44, 0.85], respectively) but not the other subtypes. For certain risk factors, the subtype heterogeneity was adequately captured by only one of the two markers. For instance, ever use of OCs was associated with an increased risk of non-luminal subtype, regardless of TP53 status (although the association with TP53 wildtype-like did not reach statistical significance). Similarly, ≥6 months duration of alcohol use was associated with an increased risk of luminal subtype, regardless of TP53 status (although the association with TP53 mutant-like did not reach statistical significance).
To understand the contributions of multiple correlated tumor characteristics, including TP53 status and PAM50 subtype, in determining risk factor associations, we modeled these markers simultaneously with other highly correlated breast tumor characteristics in case-only logistic regression models with risk factors as outcomes (Supplementary Table 5). The markers with the strongest independent contribution to the subtype heterogeneity for age at menarche were PR (p=0.05), HER2 (p=0.03), and to a lesser extent ER (p=0.08). For parity, it was HER2 (p=0.05). The markers contributing to heterogeneity for OC use were PAM50 (p=0.02) and to a lesser extent TP53 (p=0.10). For ever use of alcohol it was grade (p=0.01) and for <6 months duration of alcohol use it was grade (p=0.01) and PR (p=0.03).
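The case-only models above treat each risk factor as the outcome and enter several correlated tumor markers together as predictors. Each marker's Wald p-value therefore reflects its contribution conditional on the others. The following is a minimal sketch using a hand-written Newton-Raphson logistic fit in NumPy on simulated data. The marker names, effect sizes, and the absence of covariates (the study also adjusted for site, age, and education) are assumptions for illustration. In practice a statistics package would be used.

```python
import math

import numpy as np


def fit_logistic(x: np.ndarray, y: np.ndarray, n_iter: int = 25):
    """Logistic regression by Newton-Raphson; returns coefficients, SEs, Wald p-values.

    x: n x p design matrix (include a column of ones for the intercept).
    y: length-n array of 0/1 outcomes (here, a dichotomized risk factor).
    """
    beta = np.zeros(x.shape[1])
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-x @ beta))
        weights = prob * (1.0 - prob)
        information = x.T @ (x * weights[:, None])
        step = np.linalg.solve(information, x.T @ (y - prob))
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    prob = 1.0 / (1.0 + np.exp(-x @ beta))
    cov = np.linalg.inv(x.T @ (x * (prob * (1.0 - prob))[:, None]))
    se = np.sqrt(np.diag(cov))
    p = np.array([math.erfc(abs(b / s) / math.sqrt(2)) for b, s in zip(beta, se)])
    return beta, se, p


# Simulated cases with two correlated markers; only non-luminal status affects the outcome.
rng = np.random.default_rng(1)
n = 4000
tp53_mutant = rng.integers(0, 2, n)
non_luminal = (rng.random(n) < np.where(tp53_mutant == 1, 0.8, 0.2)).astype(float)
x = np.column_stack([np.ones(n), tp53_mutant, non_luminal])
true_beta = np.array([-0.5, 0.0, 0.7])
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-x @ true_beta))).astype(float)
beta, se, p = fit_logistic(x, y)
```

Although the two simulated markers are strongly correlated, the joint model attributes the association to the marker that actually drives it.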
Analyzing all pathologically confirmed invasive cases by ER status (N=926, Supplementary Table 6) revealed similar subtype-specific patterns, with a stronger association between ever OC use and risk of ER-negative than with ER-positive subtype (1.56 [1.18, 2.05] and 1.24 [0.92, 1.68], respectively), and an association between alcohol use duration ≥6 months and ER-positive (1.52 [1.08, 2.12]), but not ER-negative breast cancer.

DISCUSSION

We investigated the contribution of two distinct biological processes to the etiologic heterogeneity of breast cancer in a population-based study of Ghanaian women. The estrogen-dependent pathway, represented in this study by PAM50 subtype (luminal/non-luminal), is commonly used to characterize breast cancer subtype heterogeneity across populations, particularly for reproductive, anthropometric, and medical history factors.4 There is also growing evidence for the importance of the DNA repair pathway, represented in this study by TP53 status, in breast cancer etiology.12,26-30 The present analysis demonstrates that in the GBHS, these tumor markers independently and jointly define breast cancer subtypes with distinct risk factor associations.
TP53 and luminal subtypes were found to be jointly driving the subtype heterogeneity for parity and age at menarche in our study. In line with our findings, the dual action of these pathways has been observed for the association with parity12,27,28,30 and age at menarche27,28 within several racially and ethnically diverse population-based studies in the US. Among Chinese women, the combination of TP53 and luminal/non-luminal subtypes refined parity-related breast cancer etiologic heterogeneity beyond any of the markers individually.26 Additionally, prior studies have found markers of TP53 and ER status to be jointly informative when characterizing subtype heterogeneity for pre- and post-menopausal BMI,27,28 breastfeeding,27,30 and menopausal status.27
In the current study, OC and alcohol use demonstrated suggestive evidence for etiologic heterogeneity by luminal and TP53 status. OC use appears to be primarily acting through the estrogen-dependent pathway and alcohol use through the DNA-repair pathway to impact breast cancer risk. However, the analysis of the joint classification of RNA-based TP53 and luminal/non-luminal subtypes (Table 4), along with case-case analyses across multiple correlated tumor characteristics (Supplementary Table 5) suggest that the relationships may be more nuanced, and both pathways could be involved. Prior studies of US populations have had mixed findings, with breast cancer subtype associations for OC use reportedly driven by TP53 status,12 ER status,12 or jointly by TP53 and ER.28 In the only prior study of TP53 and ER tumor markers with alcohol use (which included a US study population), this risk factor was not observed to be acting through either of the two pathways.12 When interpreting the results of this study, it is important to note that the reported prevalence of alcohol use in the GBHS was low compared to high income countries. Only 8% of controls reported drinking ≥1 drink per month for over 6 months. Further studies are needed to conclusively determine the impact of OC and alcohol use on risk and their underlying biological pathways.
The availability of biopsy samples was a major strength of the study, providing a unique opportunity to apply gene expression-based classifiers to a population for which molecular data are rarely available. Most previous studies have used IHC staining to classify TP53 status,27,28,30,31 which misses many mutations that do not lead to p53 protein overexpression.1,22,32 IHC classification can therefore be problematic when evaluating the etiology of breast tumors with an aggressive phenotype, because such tumors are more likely to carry TP53 mutation types that are not associated with protein overexpression (e.g., nonsense and frameshift mutations).1 RNA-based TP53 classification may thus be preferable in etiologic studies, as it captures downstream transcriptional activity and is more sensitive to pathway changes caused by these diverse mutation types. In a US population, RNA-based methods were more sensitive than DNA- or IHC-based methods for classifying TP53 mutant-like tumors, particularly among Black and younger women, who are more likely to be diagnosed with tumors that have aggressive features.12 Another key strength was the high participation rate of cases and controls (>90%), which reduces the potential for selection bias and strengthens the internal validity of the study findings.
This study was not without limitations. Gene expression data were not available for all breast cancer cases. While this may reduce the precision of our estimates, it is unlikely to affect validity, as the characteristics of cases eligible for RNA extraction generally did not differ substantially from those of ineligible cases (Supplementary Table 2). Although tumor size and frequency of alcohol use differed between these groups, the observed associations were similar when stratified by tumor size (Supplementary Tables 7a-7b). The limited sample size required dichotomization of risk factors. Further, the numbers of subjects were small when cases were cross-classified by both tumor markers, which reduced our power to detect associations with the less common marker combinations (i.e., TP53 wildtype-like/non-luminal and TP53 mutant-like/luminal).
Much of the current understanding of breast cancer etiology among African women derives from studies of African American women, whose genetic structure reflects a mixture of African and non-African ancestry. To the best of our knowledge, this is the first study of risk factor associations with TP53 subtypes in an indigenous African population. Such studies are important because Black women are known to have a higher frequency of TP53 mutations and p53 protein expression than White women.33-37 There is also evidence of racial differences in TP53 mutation type, with a higher proportion of nonsense and indel mutations among Black than non-Black cases.12 Molecular epidemiology studies in unscreened populations such as the GBHS are valuable because their cases likely reflect the natural history of breast cancer in Black women more accurately. It has been suggested that screening can obscure the natural history of breast cancer by detecting tumors that, owing to their inherent biology, would otherwise not have come to clinical attention.38,39
In sum, using high-quality RNA expression data, we have shown that RNA-based TP53 status and PAM50 intrinsic subtype are useful breast tumor markers for describing etiologic heterogeneity. Cross-classification of these markers further refines the subtype-specific risk factor associations, which in turn could inform potential mechanisms by which targeted risk reduction could be achieved. To further characterize heterogeneity of breast cancer phenotypes and improve understanding of etiologic mechanisms in African-ancestry populations, additional studies integrating data on tumor (e.g., histologic grade, Ki67) and tumor microenvironment (e.g., immune, inflammation, wound repair) markers will be required. As we showed in the current study, aggressive tumor characteristics, e.g., grade, TP53, and PAM50, are highly correlated. Given the preponderance of high-grade tumors (which often precludes further stratification), as well as the challenges of procuring molecular assays among women in sub-Saharan Africa, further studies are warranted to uncover cost-effective markers with sufficient dynamic range to allow the identification of epidemiologically and clinically relevant breast cancer subtypes in this population. Owing to the heterogeneity of genetic structure and exposure profiles across African populations, the results of this study will need to be considered together with those from future studies in populations of other African regions.

Supplementary Material

Source: PubMed Central (JATS). The license follows the original publisher's policy; please cite the original article when quoting.

