Psychometric Properties of the Breast Cancer Awareness Measure (Breast-CAM): A Systematic Review and Meta-Analysis.

Cancers, 2026, Vol. 18(6)

Fejer A, Atbaei MA, Zand A, Varjas T, Kiss Z


Cite this article

APA: Fejer A, Atbaei MA, et al. (2026). Psychometric Properties of the Breast Cancer Awareness Measure (Breast-CAM): A Systematic Review and Meta-Analysis. Cancers, 18(6). https://doi.org/10.3390/cancers18060956
MLA: Fejer A, et al. "Psychometric Properties of the Breast Cancer Awareness Measure (Breast-CAM): A Systematic Review and Meta-Analysis." Cancers, vol. 18, no. 6, 2026.
PMID: 41899558

Abstract

Breast cancer awareness is essential for early detection and timely help-seeking among women and represents a key component of multidisciplinary breast cancer prevention. The Breast Cancer Awareness Measure (Breast-CAM) is widely used to assess awareness of breast cancer symptoms, risk factors, and screening behaviors. Its measurement quality across populations has not yet been comprehensively evaluated. As Breast-CAM is a population-reported measurement instrument, evaluation using a standardized framework for measurement properties is required. This systematic review and meta-analysis aimed to assess the psychometric properties of the Breast-CAM across diverse populations and cultural adaptations, in accordance with COSMIN (COnsensus-based Standards for the selection of health Measurement INstruments) methodological standards. Major bibliographic databases and trial registries were systematically searched for peer-reviewed English-language studies published between 2010 and 2025 that evaluated at least one psychometric property of the Breast-CAM in adult women. Methodological quality was assessed using the COSMIN Risk of Bias checklist. Measurement properties were evaluated according to COSMIN criteria, and the certainty of evidence was graded using a modified GRADE approach. Meta-analysis was performed when data were sufficiently comparable. Seventeen studies met the inclusion criteria for narrative synthesis, of which eleven were included in a meta-analysis, representing fourteen cultural adaptations of the instrument. A descriptive random-effects meta-analysis of reported Cronbach's α yielded a pooled estimate of 0.89 (95% confidence interval 0.85-0.92). This value should be interpreted cautiously, as structural validity was frequently insufficient across cultural adaptations, limiting interpretation of internal consistency according to COSMIN guidance.
Other measurement properties, including reliability and measurement error, were frequently inadequately assessed or unreported. The certainty of evidence ranged from very low to moderate. Content validity was generally rated as sufficient, although certainty of evidence was low. Despite the high pooled α estimate, the reliability of Breast-CAM cannot be firmly established because structural validity was frequently insufficient across cultural adaptations. In accordance with the COSMIN ceiling rule, internal consistency was not considered sufficient in the absence of adequate structural validity. Key measurement properties, including test-retest reliability, measurement error, and responsiveness, were rarely evaluated. Further high-quality psychometric studies, particularly in culturally diverse populations, are needed to address these gaps and support appropriate use of the instrument in research and public health practice.


1. Introduction

Breast cancer is the most common cancer among women worldwide, with ~2.3 million cases and ~670,000 deaths each year globally [1,2,3]. Rates are predicted to rise further to ~3 million cases and ~1 million deaths by 2040–2050 based on demographic projection models that account for population growth and aging [4]. Beyond its high incidence, breast cancer imposes a significant burden on global public health [3]. Despite substantial advances in breast cancer therapy, disparities in survival persist worldwide. These disparities are associated with differences in awareness, early detection, and access to timely diagnosis and treatment [5,6]. Breast cancer awareness and screening behaviors have been linked to earlier detection and earlier presentation [7]. Public knowledge of breast cancer warning signs, risk factors, and screening recommendations may contribute to effective prevention strategies, psychosocial support, and population-level cancer control.
Accurate measurement of awareness is essential for evaluating health promotion efforts, designing interventions, and informing policy. These aims require instruments that are not only psychometrically robust but also culturally appropriate. The Breast Cancer Awareness Measure (Breast-CAM), developed in 2009 by Cancer Research UK in collaboration with King’s College London and University College London [8], was created as a standardized tool to assess awareness of breast cancer symptoms, risk factors, and help-seeking behaviors. Since its introduction, Breast-CAM has been translated and adapted for use in multiple linguistic and cultural contexts. Yet, the quality and methodological consistency of these adaptations—particularly their reliability, structural validity, and responsiveness—have not been synthesized. Most existing evidence comes from single-country validation studies, limiting conclusions about the instrument’s consistency across settings. As a result, researchers and practitioners face uncertainty when selecting appropriate versions for use at local, regional, or international levels. To address this gap, this systematic review and meta-analysis aims to evaluate the psychometric properties of the Breast Cancer Awareness Measure (Breast-CAM) in adult women across diverse populations and settings. We specifically assessed reliability, validity, and responsiveness, and examined cultural adaptations for non-UK and multilingual populations. By synthesizing the psychometric evidence for the Breast Cancer Awareness Measure, this study supports multidisciplinary breast cancer research by informing prevention strategies, awareness-based interventions, and population-level early detection efforts.

2. Materials and Methods

2.1. Followed Guidelines
This systematic review was conducted and reported in accordance with the PRISMA 2020 statement for reporting systematic reviews [9] and the PRISMA-COSMIN for OMIs 2024 extension for systematic reviews of outcome measurement instruments [10]. PRISMA-COSMIN is a recently developed reporting guideline specifically intended for systematic reviews evaluating measurement properties of health-related instruments. The completed PRISMA-COSMIN checklist is provided as Supplementary Material (Supplementary File S1: PRISMA-COSMIN 2024 checklist for systematic reviews of outcome measurement instruments). The database search strategy adhered to the PRISMA-S extension [11], which provides structured guidance for transparent reporting of search methods. Methodological appraisal and synthesis followed the COSMIN framework [12], including use of the COSMIN Risk of Bias checklist [13] for assessing study quality, the COSMIN criteria for rating measurement properties, and the COSMIN-adapted GRADE approach to evaluating the certainty of evidence [12]. The study protocol was registered in PROSPERO (CRD420251158142).

2.2. Eligibility Criteria
Studies were eligible for inclusion if they met the following criteria. The target population consisted of adult women (≥18 years) from any country or setting. Eligible studies employed the Breast Cancer Awareness Measure (Breast-CAM), including the original instrument or any culturally adapted version. Quantitative study designs were included if they evaluated at least one psychometric property of the Breast-CAM or reported its cross-cultural adaptation. Psychometric properties of interest included reliability (internal consistency, test–retest or inter-rater reliability, and measurement error), validity (content, structural, construct, or criterion validity), and responsiveness, as defined by COSMIN guidelines. Studies published in English between January 2010 and October 2025 were considered eligible.
Studies were excluded if they involved children, adolescents, mixed-gender samples, or male or transgender participants. The Breast Cancer Awareness Measure was originally developed to assess breast cancer awareness among women. According to the instrument developers, a version designed to measure men’s awareness of breast cancer in women is currently under development [8]. The use of the Breast-CAM in populations other than women therefore does not align with its intended and validated application. Publications were also excluded if they reported breast cancer awareness levels without assessing measurement properties or adaptation procedures, or if they were qualitative studies, narrative reviews, or conference abstracts, or lacked an accessible full text. Studies not employing the Breast-CAM or using unrelated awareness measures were excluded. Non-English language publications were excluded to ensure accurate COSMIN-based assessment of psychometric properties, as some such studies lacked sufficient methodological detail and reliance on translated data could have affected interpretation. This decision was considered necessary to maintain methodological rigor. Multiple culturally adapted versions of the Breast-CAM from diverse linguistic and regional populations were nevertheless included in this review. However, limiting the review to English-language publications may still have introduced language bias [14]: validation studies published only in local languages may have been missed, so conclusions regarding global cross-cultural validity should be interpreted with caution. Studies that did not report sufficient psychometric or adaptation data to allow extraction or COSMIN evaluation were also excluded.

2.3. Information Sources/Electronic Databases
A comprehensive search was conducted across multiple electronic databases and trial registries to identify studies evaluating the psychometric properties of the Breast-CAM. The following databases were systematically searched between 15 September and 15 October 2025, with coverage from January 2010 to 15 October 2025: PubMed/MEDLINE, Scopus, the Web of Science Core Collection, Embase, CINAHL (via EBSCOhost), the Cochrane Library, and Google Scholar. The following trial and study registries were also searched: ClinicalTrials.gov, the WHO International Clinical Trials Registry Platform (ICTRP), the ISRCTN Registry, and the EU Clinical Trials Register (EU-CTR). The final database and registry searches were completed on 15 October 2025 (Table 1).

2.4. Additional Sources
Preprints identified during the search were screened but were not eligible for inclusion because they did not report psychometric properties of the Breast-CAM. All information sources and search dates were documented in line with PRISMA-S [11] and PRISMA-COSMIN OMI 2024 guidelines [10].

2.5. Search Strategy
We searched using customized strategies for each source. The following section outlines the exact terms, filters, and limits used to capture all relevant Breast-CAM studies. The search strategy was developed and independently reviewed by two members of the research team to identify empirical studies evaluating the psychometric properties of the Breast Cancer Awareness Measure (Breast-CAM). Elements of the PICO framework were adapted to structure the search strategy, focusing on the target population, construct, and outcome measurement instrument: (“breast cancer”/exp OR “female”/exp) AND (“bcam” OR “breast cancer awareness measure” OR “breast cam” OR “validated questionnaire”/mj OR “questionnaire”/exp OR “tool”/exp) AND (“breast cancer awareness” OR “breast cancer knowledge”) AND [2010–2025]/py. Reference lists of included studies and citation tracking were screened to ensure that key relevant studies were captured.

2.6. Database Search Overview
Searches were performed in PubMed/MEDLINE, Scopus, Web of Science, Embase, CINAHL (via EBSCOhost), the Cochrane Library, and Google Scholar (screening the first 200 results ranked by relevance) using customized strategies for each platform, accounting for differences in indexing and syntax. The 200-result limit for Google Scholar was applied because Google Scholar ranks results by relevance, so relevant studies are typically concentrated within the initial results, and because comprehensive searches had already been conducted in the other bibliographic databases to ensure broad coverage. Each database was searched for peer-reviewed studies published in English between January 2010 and 15 October 2025, focusing on adult women. Full search strings and database-specific adaptations are provided in Supplementary File S2 (Search strategy) to ensure transparency and reproducibility.

2.7. Selection Process
All search results were first imported into Zotero, where duplicate records were identified and removed. De-duplicated records were then exported to ASReview (Active Learning for Systematic Reviews) [15], which served as a screening support tool during the title and abstract screening phase by prioritizing records based on relevance predictions. An initial training set was created by manually labeling a subset of records as relevant or irrelevant based on the predefined eligibility criteria. The model was updated as screening progressed, using additional records screened and labeled by the reviewers, allowing gradual refinement of relevance predictions. Screening was conducted using the default machine-learning model with human-in-the-loop active learning. In total, 1229 records (after duplicate removal) were screened with the support of ASReview, as shown in the PRISMA flow diagram. The tool was used only to prioritize the order of review, while all inclusion and exclusion decisions were made by the reviewers. Screening continued until no further potentially relevant records were identified, and included studies were checked against the screened set to ensure that relevant studies were not missed.
Screening was conducted in two stages:
1. Title and abstract screening: Two reviewers (from a team of five) independently screened titles and abstracts within ASReview. The tool was used only to prioritize records; all inclusion and exclusion decisions were made by the reviewers based on the predefined eligibility criteria. Disagreements were resolved through discussion.
2. Full-text screening: The same two reviewers independently assessed full-text articles for eligibility. A third reviewer was available to adjudicate unresolved disagreements, although consensus was achieved in most cases.
The eligibility criteria described above were applied consistently. ASReview was used solely to support efficiency and did not replace independent reviewer judgment or predefined eligibility criteria.

2.7.1. Data Collection Process
Data extraction was carried out using a structured form developed in line with COSMIN recommendations [12]. The form was piloted on four representative studies (two describing development or cultural adaptation of the Breast-CAM and two evaluating its measurement properties) to ensure that all relevant domains were captured; minor revisions were made for clarity.
Two reviewers independently extracted data from each included study, recording:
• Study design and setting;
• Population characteristics;
• Version and language of the Breast-CAM;
• Measurement properties assessed;
• Statistical methods used and numerical psychometric outcomes.

Extraction followed PRISMA-COSMIN OMI reporting guidance [10]. Discrepancies were resolved by discussion, with a third reviewer available if needed. Formal inter-rater agreement statistics (e.g., Cohen’s kappa) were not calculated. No automation or AI tools were used for data extraction. All data were managed in Microsoft Excel.

2.7.2. Data Items
From each included study, we extracted:
• Study characteristics: First author, year, country, design (development, translation, validation), and sample size.
• Population characteristics: Key demographics (e.g., age range, recruitment setting) and inclusion criteria where available.
• Instrument details: Breast-CAM version, language, and any reported adaptation procedures.
• Measurement properties and results: Internal consistency, test–retest reliability, construct validity, responsiveness, statistical methods, and main findings (α, ICC, factor loadings, hypothesis-testing results).

2.8. Study Risk of Bias Assessment
The methodological quality of the included studies was assessed using the COSMIN Risk of Bias checklist [13] for studies on measurement properties, in line with PRISMA-COSMIN OMI 2024 guidance [10].
Two reviewers independently evaluated each study across all relevant measurement properties (e.g., content validity, structural validity, internal consistency, reliability, measurement error, hypothesis testing, responsiveness, cross-cultural validity). Each property was rated using the COSMIN four-point scale (very good, adequate, doubtful, inadequate), applying the “worst score counts” principle [13]. When a study reported several measurement properties with different quality ratings, each property was assessed separately according to COSMIN criteria. Differences between reviewers were resolved through discussion, with a third reviewer consulted if needed. The final rating for each property was based on the reviewers’ agreement.
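The "worst score counts" principle described above can be illustrated with a minimal sketch. The function name and the example item ratings are hypothetical and serve only to show the logic; they are not taken from the review's extraction sheet.

```python
# Four-point COSMIN scale, ordered from best to worst as listed in the text.
RATING_SCALE = ["very good", "adequate", "doubtful", "inadequate"]


def worst_score_counts(item_ratings):
    """Return the worst rating among the checklist items for one measurement property."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    # A higher index in RATING_SCALE means a worse rating.
    return max(item_ratings, key=RATING_SCALE.index)


# Hypothetical item-level ratings for one property in one study.
example_items = ["very good", "adequate", "doubtful", "very good"]
overall = worst_score_counts(example_items)
```

In this hypothetical example, a single "doubtful" item determines the rating of the whole property, regardless of how many items were rated "very good".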
All ratings were recorded in a structured Excel sheet (Supplementary File S3—COSMIN Risk of Bias and Measurement Properties Extraction Sheet), which included study characteristics, Breast-CAM version, psychometric properties evaluated, COSMIN ratings per property, and an overall risk of bias judgment. No automation or AI was used in this step.

Measurement Properties
In line with COSMIN methodology [12,16], content validity reflects the degree to which items adequately represent the breast cancer awareness construct for the target population. Structural validity assesses whether the dimensional structure of the instrument corresponds to the construct being measured. Internal consistency evaluates the interrelatedness of items within a unidimensional scale, while reliability reflects score stability in stable individuals over time. Measurement error quantifies random and systematic error not attributable to true change. Construct and criterion validity assess whether scores behave as theoretically expected in relation to other variables or a gold standard, respectively. Responsiveness evaluates the ability of the instrument to detect meaningful change over time, and cross-cultural adaptation examines equivalence across language and cultural versions [12].
Each measurement property of the Breast-CAM was evaluated in accordance with COSMIN standards for outcome measurement instruments [12]. For each included study, we examined evidence for internal consistency, reliability, structural validity, content validity, and hypothesis-testing construct validity. Measurement results were judged using the COSMIN criteria for good measurement properties, applying thresholds such as Cronbach’s α or intraclass correlation coefficients (ICC) ≥ 0.70 [12], appropriate model-fit indices for CFA or IRT models, and adequate factor loadings for structural validity. While α values ≥ 0.70 were considered acceptable, values exceeding 0.95 were interpreted with caution, as very high internal consistency may suggest item redundancy. Based on these criteria, each finding was classified as sufficient (+), insufficient (−), or indeterminate (?), following COSMIN decision rules [12,17].
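As a minimal sketch of how the thresholds above might be applied to a single reported Cronbach's α, consider the functions below. The function names are hypothetical, and treating a result as indeterminate when structural validity is not supported is a simplified reading of the COSMIN decision rules, stated here as an assumption; it is not the review's actual rating procedure.

```python
def rate_internal_consistency(alpha, structural_validity_supported):
    """Classify one Cronbach's alpha as '+', '-', or '?'.

    Assumption: without supporting evidence for structural validity the
    result is treated as indeterminate, whatever the size of alpha.
    """
    if not structural_validity_supported:
        return "?"
    return "+" if alpha >= 0.70 else "-"


def flag_possible_redundancy(alpha):
    """Flag values above 0.95, which the text interprets with caution."""
    return alpha > 0.95


# Hypothetical values, not results from any included study.
rating_a = rate_internal_consistency(0.89, structural_validity_supported=False)
rating_b = rate_internal_consistency(0.82, structural_validity_supported=True)
```

Under these assumptions, a high α does not by itself yield a sufficient rating, which mirrors the caution expressed about interpreting pooled α values.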
Two reviewers independently assigned ratings, resolving disagreements through discussion; a third reviewer was available when consensus could not be reached. Ratings were synthesized by measurement property to provide an overall picture of the evidence base. Where quantitative pooling was not feasible due to methodological heterogeneity or incomplete reporting, results were narratively synthesized instead. The certainty of evidence for each measurement property was subsequently evaluated using the COSMIN-adapted GRADE approach, considering risk of bias, inconsistency, imprecision, and indirectness. All evaluations and grading decisions were performed manually to ensure transparent, context-appropriate interpretation. The Summary of Measurement Properties (SOMP) and COSMIN Rating Criteria are presented in Table 2.

2.9. Synthesis Methods
After full-text inclusion, studies were grouped according to the measurement properties they reported (e.g., internal consistency, structural validity, reliability, content validity). Only studies providing empirical data suitable for COSMIN-based rating were included in the synthesis for that property.
Studies that evaluated multiple properties contributed to each relevant synthesis. Where studies described properties qualitatively (e.g., “reliable” or “valid” without numerical data), they were included in narrative summaries but excluded from quantitative pooling. This approach followed COSMIN recommendations to ensure that each synthesis was based on analyzable and methodologically acceptable data [12].
For measurement properties with sufficient homogeneous quantitative data, we conducted a random-effects meta-analysis using the metafor package (version 4.8.0) in R (version 4.5.1; R Foundation for Statistical Computing, Vienna, Austria) [18]. For internal consistency, Cronbach’s α coefficients were pooled using appropriate variance estimation methods (e.g., Bonett’s approach), and 95% confidence intervals were obtained from random-effects models [19,20]. Cronbach’s α is influenced by the number of items and the underlying factor structure. Accordingly, values from different Breast-CAM versions may not be directly comparable and were interpreted cautiously.
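The pooling step can be sketched as follows. This is an illustrative Python approximation, not the authors' R code: it assumes Bonett's transformation, y = −ln(1 − α) with sampling variance 2k / ((k − 1)(n − 2)) for k items and n respondents, and it swaps in a DerSimonian-Laird estimator for the between-study variance, whereas the estimator used in the review is not stated in the text. The input triples are hypothetical.

```python
import math


def bonett_transform(alpha, n_items, n_subjects):
    """Transform alpha to y = -ln(1 - alpha) and return (y, sampling variance)."""
    if not (alpha < 1.0 and n_items > 1 and n_subjects > 2):
        raise ValueError("requires alpha < 1, n_items > 1, n_subjects > 2")
    y = -math.log(1.0 - alpha)
    v = 2.0 * n_items / ((n_items - 1.0) * (n_subjects - 2.0))
    return y, v


def pool_random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling (assumed estimator)."""
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    mu = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    return mu, se, tau2


def pooled_alpha(studies, z=1.96):
    """Pool (alpha, n_items, n_subjects) triples; return (estimate, lower, upper, tau2)."""
    pairs = [bonett_transform(a, k, n) for a, k, n in studies]
    mu, se, tau2 = pool_random_effects([p[0] for p in pairs], [p[1] for p in pairs])

    def back(y):
        # Back-transform from the Bonett scale to the alpha scale.
        return 1.0 - math.exp(-y)

    return back(mu), back(mu - z * se), back(mu + z * se), tau2


# Hypothetical (alpha, number of items, sample size) triples; not data from the review.
studies = [(0.85, 20, 300), (0.90, 25, 150), (0.88, 18, 420)]
estimate, lower, upper, tau2 = pooled_alpha(studies)
```

Pooling on the transformed scale and back-transforming keeps the confidence limits below 1, which a direct average of α values would not guarantee.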
Statistical heterogeneity was examined using Q, τ², and I². When pooling was not feasible due to heterogeneity in methods, populations, or Breast-CAM versions, we used narrative synthesis. In these cases, we summarized: (a) direction and magnitude of findings, (b) COSMIN Risk of Bias ratings, and (c) sufficiency ratings (+/−/?). Certainty of evidence for each measurement property was assessed using the COSMIN-modified GRADE approach [12].
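For orientation, the three heterogeneity statistics can be computed from study-level effects and sampling variances as sketched below. The sketch assumes Cochran's Q, the DerSimonian-Laird moment estimator for the between-study variance, and the classical definition of I² as (Q − df) / Q; statistical packages may use other estimators, so values need not match software output exactly. The inputs are hypothetical.

```python
def heterogeneity(effects, variances):
    """Return (Q, tau2, I2 in percent) for study effects and sampling variances."""
    k = len(effects)
    if k < 2:
        raise ValueError("at least two studies are required")
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = k - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return q, tau2, i2


# Hypothetical transformed effects and variances; not data from the review.
q, tau2, i2 = heterogeneity([0.0, 1.0], [0.01, 0.01])
```

Q tests whether the observed dispersion exceeds what sampling error alone would produce, τ² expresses the between-study variance on the scale of the effects, and I² expresses the share of total variability attributed to between-study differences.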
Where sufficient studies were available, we explored potential sources of heterogeneity qualitatively, considering differences in language versions, sample characteristics, and study design. Because of the limited number of studies per subgroup, formal subgroup meta-analysis was not performed.
To assess robustness, we conducted leave-one-out sensitivity analyses for the internal consistency meta-analysis, re-estimating pooled Cronbach’s α after sequentially removing each study. This tested whether any single study had a disproportionate influence on summary estimates.
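The leave-one-out procedure can be sketched generically as below. To keep the example short, the pooling function is a plain inverse-variance weighted mean, used here only as a stand-in for the random-effects model; the labels and values are hypothetical.

```python
def inverse_variance_mean(effects, variances):
    """Stand-in pooling function: inverse-variance weighted mean."""
    weights = [1.0 / v for v in variances]
    return sum(w * y for w, y in zip(weights, effects)) / sum(weights)


def leave_one_out(labels, effects, variances, pool=inverse_variance_mean):
    """Re-estimate the pooled value after omitting each study in turn."""
    if len(effects) < 2:
        raise ValueError("at least two studies are required")
    results = {}
    for i, label in enumerate(labels):
        kept_effects = effects[:i] + effects[i + 1:]
        kept_variances = variances[:i] + variances[i + 1:]
        results[label] = pool(kept_effects, kept_variances)
    return results


# Hypothetical study labels, effects, and variances; not data from the review.
loo = leave_one_out(["A", "B", "C"], [1.0, 2.0, 3.0], [1.0, 1.0, 1.0])
```

If the re-estimated values stay close to the full-sample estimate for every omitted study, no single study dominates the summary; a large shift after removing one study would indicate disproportionate influence.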

2.10. Certainty Assessment
We assessed the certainty of the evidence for each measurement property of the Breast-CAM using the COSMIN-modified GRADE approach [12]. This method evaluates four domains: risk of bias, inconsistency, imprecision, and indirectness. Risk of bias was judged using the COSMIN Risk of Bias checklist [13], focusing on the appropriateness of psychometric methods, sample adequacy, and adherence to recommended validation procedures. Inconsistency was assessed by comparing the direction and magnitude of results across cultural versions and analytical approaches. Imprecision was evaluated according to total sample size and the width of confidence intervals, with higher certainty assigned to results supported by larger and more precise estimates. Indirectness was considered when study populations or validation contexts differed from the intended application of the Breast-CAM as an awareness tool.
Certainty ratings were assigned at the level of each measurement property, not at the study level, and were expressed as high, moderate, low, or very low. Following COSMIN guidance, the ceiling rule was applied such that the certainty of evidence for internal consistency did not exceed that of structural validity. This reflects the principle that reliability is interpretable only when dimensionality is adequately supported, regardless of the magnitude of pooled Cronbach’s α values [12]. Evidence was downgraded when one or more domains displayed serious limitations. Two reviewers performed the GRADE assessments independently; disagreements were resolved by discussion, with a third reviewer available when necessary. No deviations from the COSMIN framework were introduced. Table 3 presents the COSMIN-modified GRADE decision rules used to rate the certainty of evidence for each measurement property.
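The ceiling rule and the downgrading logic can be expressed compactly. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the number of levels to downgrade per limitation is left as an input because the decision rules in Table 3 are not reproduced here.

```python
# Certainty levels ordered from lowest to highest.
CERTAINTY_LEVELS = ["very low", "low", "moderate", "high"]


def downgrade(level, steps):
    """Lower a certainty level by a number of steps, never below 'very low'."""
    index = max(0, CERTAINTY_LEVELS.index(level) - steps)
    return CERTAINTY_LEVELS[index]


def apply_ceiling_rule(internal_consistency, structural_validity):
    """Cap the certainty for internal consistency at that of structural validity."""
    return min(internal_consistency, structural_validity, key=CERTAINTY_LEVELS.index)


# Hypothetical ratings, not the review's results.
capped = apply_ceiling_rule("high", "low")
lowered = downgrade("high", 2)
```

With these hypothetical inputs, internal consistency rated "high" on its own is capped at "low" because structural validity reached only "low" certainty.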

2.11. Formulating Recommendations
Recommendations on the use of the Breast Cancer Awareness Measure (Breast-CAM) were developed using a predefined framework. This framework linked the certainty of evidence for each measurement property to its intended use. Certainty was assessed using the COSMIN-modified GRADE approach [12].
Following COSMIN guidance [12], measurement properties with high, moderate, or low certainty evidence were considered suitable for population-level applications. These included awareness monitoring and evaluation of public health interventions. Measurement properties supported by very low certainty evidence were not considered suitable for individual-level use. This included clinical decision-making and the assessment of change over time. A summary of recommended applications of the Breast-CAM based on certainty of evidence is presented in Table 4.
Final recommendations were based on three factors: the certainty of evidence, the specific measurement property, and the intended context of use. This approach ensured consistency with COSMIN standards for outcome measurement instruments.

3. Results

3.1. Study Selection
A total of 1446 records were identified across all database searches and supplementary sources. After removing duplicates, 1229 records were screened by title and abstract. A total of 115 full-text articles were assessed for eligibility using predefined COSMIN criteria [12].
Following full-text screening, 17 study reports met the inclusion criteria, each providing empirical evidence on at least one psychometric property of the Breast Cancer Awareness Measure (Breast-CAM). These studies represented a range of cultural and linguistic adaptations, including versions from the UK, Oman, Kenya, Iran, China, Malaysia, Pakistan, Turkey, Greece, Brunei, Sudan, and Palestine. A LitMaps visualization illustrating conceptual linkages across the 17 included studies (e.g., cultural adaptation methods, measurement properties assessed, and study designs) is provided in Figure 1 to support visualization of the evidence landscape.
Of the 17 included studies, 11 reported Cronbach’s α estimates that were sufficiently comparable to allow a descriptive random-effects meta-analysis of reported α values. Estimates were considered comparable when they referred to the total scale or clearly defined subscales of the Breast-CAM and were calculated in adult female populations. They also needed to include sufficient statistical information (e.g., sample size) to permit variance estimation. The remaining studies contributed to the narrative syntheses of other measurement properties such as structural validity, content validity, reliability, construct validity, responsiveness, and measurement error.
All included study reports evaluated the Breast-CAM or one of its culturally adapted versions. No alternative OMIs met the inclusion criteria for this review, which focused on the Breast-CAM and its adaptations.
A PRISMA flow diagram [9] summarizing the search, screening, and inclusion process is presented in Figure 2.
A total of 88 full-text articles were excluded for predefined reasons. A complete list of excluded articles and reasons for exclusion is provided in Supplementary Material (Supplementary Appendix S1).

3.2. OMI Characteristics
The included studies used multiple culturally adapted versions of the Breast Cancer Awareness Measure (Breast-CAM), all derived from the original UK instrument. Versions were translated and adapted to local languages and contexts and were administered as self-report questionnaires in community or healthcare settings. Wording, format, and item numbers varied slightly across versions. The core domains were similar and generally covered symptom recognition, risk factors, screening awareness, and barriers to help-seeking.
Many adaptations used recognized translation and cultural-adaptation procedures, such as forward–backward translation, expert review, and pilot testing [37]. In some cases, documentation was incomplete and these procedures were not always described in detail.
Samples in the development or validation studies generally included adult women recruited from community, clinical or screening settings. These instruments were designed for population-level assessment. They are useful for describing and comparing patterns of breast cancer awareness across different domains, but not for precise interpretation of individual scores. This is consistent with COSMIN guidance on reliability and interpretability requirements and established measurement principles [12,38].
Key characteristics of each version, including settings, translation procedures, and assessed psychometric properties, are summarized in Table 5.

3.3. Interpretability Aspects for Each Included OMI
Across the 17 included studies, interpretability information for the various Breast-CAM versions was limited and often inconsistently reported. Most studies described item-level percentages of correct recognition for symptoms, risk factors, or screening behaviors rather than providing summary scores, measures of score distribution, or thresholds for meaningful improvement [13]. Specifically, no studies established minimal important change (MIC) or minimal important difference (MID) values, preventing interpretation of whether observed score differences represent meaningful change. Only a few adaptations, such as the Persian and Chinese versions, included numeric score distributions (e.g., means, medians, IQRs) that allow a clearer understanding of where respondents typically scored.
Missing data reporting was generally poor. Most authors mentioned high completion rates, but only a minority provided exact percentages. Formal floor or ceiling analyses were rare. In several studies, classical symptoms showed high recognition (ceiling-like patterns), while age-related risk or less common signs showed very low recognition (floor-like patterns). Consequently, the ≥15% threshold recommended by COSMIN could not be consistently applied because sufficient quantitative data were usually not reported [12,38].
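Had total scores been reported, a floor or ceiling check could be carried out along the following lines. This is a minimal sketch that applies the ≥15% threshold as stated above to the proportion of respondents at the lowest and highest possible scores; the scores and score range are hypothetical.

```python
def floor_ceiling_proportions(scores, min_score, max_score):
    """Return the proportions of respondents at the lowest and highest possible score."""
    if not scores:
        raise ValueError("scores must not be empty")
    n = len(scores)
    floor = sum(1 for s in scores if s == min_score) / n
    ceiling = sum(1 for s in scores if s == max_score) / n
    return floor, ceiling


def exceeds_threshold(proportion, threshold=0.15):
    """Apply the 15% threshold to one proportion."""
    return proportion >= threshold


# Hypothetical total scores on a 0-10 scale; not data from any included study.
scores = [0, 0, 5, 10, 10, 10, 3, 4, 6, 7]
floor_share, ceiling_share = floor_ceiling_proportions(scores, 0, 10)
```

The check needs respondent-level or fully tabulated total scores, which is why item-level percentages alone, as reported in most included studies, do not allow it.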
Change-score interpretability was available only for intervention studies, which reported improvements after educational programs or mobile-app exposure. None of the included studies established an MIC or MID [12], and no study attempted to define patient-important thresholds, responder definitions, or cut-points for interpretation of total or domain scores.
Interpretability evidence across settings remains limited and inconsistent, with most Breast-CAM versions reporting only descriptive percentages rather than the structured interpretability metrics recommended by COSMIN [12]. Future validation studies should determine minimal important change (MIC) or minimal important difference (MID) values to allow meaningful interpretation of score levels and changes over time. Interpretability results for each Breast-CAM version are presented in Table 6.

3.4. Feasibility Aspects for Each Included OMI
Across the 17 studies, all BCAM versions were feasible to use in both research and community settings, consistent with feasibility considerations recommended for outcome measurement instruments [38,39]. Most instruments were delivered as paper questionnaires, either self-completed or interviewer-administered when literacy was limited. This flexibility made the measures usable across a wide range of populations, including older adults in the UK, women with low literacy in Kenya and Oman, and rural groups in Sudan and Palestine. Completion was consistently high, and none of the studies reported major difficulties in understanding or responding to items.
The length of the instruments varied only slightly. Full BCAM versions included symptoms, risk-factor, age-risk and help-seeking blocks, while several adaptations used shorter modules tailored to local education or intervention programs. Despite these differences, all versions remained manageable and were considered appropriate for routine data collection.
Response formats were simple. Nearly all questionnaires relied on yes/no/don’t know or brief multiple-choice options, with occasional Likert-type questions. No study reported the need for special mental or physical abilities beyond basic comprehension or the ability to respond orally during interviews. Interviewer administration was especially valuable in settings where literacy was variable.
Scoring procedures were straightforward. Many studies used summed scores or simple counts of correct responses, while others relied only on item-level percentages. Psychometric studies that used derived domain scores or 0–100 transformations described these procedures clearly, and none required specialized software for routine scoring.
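The simple scoring approach described above (a count of correct responses, optionally rescaled to 0–100) can be sketched as follows. The item names and the answer key are hypothetical, and treating "don't know" as incorrect is an assumption rather than a documented Breast-CAM scoring rule.

```python
def awareness_score(responses, answer_key):
    """Return (raw count of correct answers, score rescaled to 0-100).

    Any response that does not match the key, including "don't know"
    or a missing answer, is counted as incorrect (an assumption).
    """
    if not answer_key:
        raise ValueError("answer_key must not be empty")
    raw = sum(1 for item, key in answer_key.items()
              if responses.get(item) == key)
    return raw, 100.0 * raw / len(answer_key)


# Hypothetical four-item symptom block.
answer_key = {"lump": "yes", "nipple_discharge": "yes",
              "skin_dimpling": "yes", "breast_pain": "yes"}
responses = {"lump": "yes", "nipple_discharge": "don't know",
             "skin_dimpling": "yes", "breast_pain": "yes"}
raw_score, scaled_score = awareness_score(responses, answer_key)
```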
All adaptations acknowledged Cancer Research UK as the original source of the BCAM. No copyright restrictions affected feasibility.
The BCAM and its international adaptations were practical, acceptable and easy to administer across diverse populations and study designs. Feasibility results for each Breast-CAM version are presented in Table 7.

3.5. Study Characteristics
Seventeen studies met the inclusion criteria and reported at least one measurement property of BCAM or its adapted versions. The original BCAM was developed in the United Kingdom to assess women’s awareness of breast cancer symptoms [8]. It was later used without modification in an intervention trial with older women [22].
Several studies focused on adapting BCAM to local cultural and linguistic contexts. A full Arabic adaptation was completed in Oman [24], using forward–back translation, expert review, and pilot testing. The validated Arabic version was then used in community samples [23] and applied in tailored health education programs [25]. Comparable translation and validation processes were carried out for Swahili in Kenya [26], Persian in Iran [27], Mandarin Chinese in China [28], Malay in Malaysia [30] and Turkish in Turkey [33].
Other studies used BCAM as an outcome tool rather than re-evaluating its psychometric performance [22,29,32]. A smaller group of studies introduced localized modifications to fit regional contexts, such as the Sudanese adaptation [34], a Greek version developed for rural communities [35], and a modified Arabic version used in a nationwide Palestinian survey [36].
Across all studies, most instruments were administered to adult women in community, clinic, screening, or digital health settings. Only a subset reported quantitative measurement properties suitable for meta-analysis (e.g., Cronbach’s α with sufficient structural validity). The full characteristics of each included study, including adaptation method, sample details, and psychometric evidence, are summarized in Table 8.

3.6. Assessment of Risk of Bias in Studies
Risk of bias assessments showed considerable variation across the included studies. Instruments that performed a full psychometric evaluation generally demonstrated stronger methodological quality. The Persian BCAM [27] and the Turkish BCAM-Tr [33] received very good ratings across multiple domains, reflecting the use of established structural validity procedures, high internal consistency within confirmed factor structures, and appropriate test–retest reliability. The Chinese C-BCAM [28] validation article also showed a solid methodological foundation, although gaps remained in responsiveness and measurement error.
A second group of studies focused primarily on translating and adapting the Breast-CAM for local use, with more limited evaluation of its measurement properties. These studies, including the Arabic adaptations [23,24,25] and the Swahili BCAM validation [26], demonstrated that the instrument was culturally and linguistically appropriate and provided evidence for internal consistency and, in some cases, structural validity. However, key properties such as measurement error and responsiveness were rarely examined, and reliability testing was applied inconsistently across studies. As a result, methodological quality ratings for these studies generally fell within the adequate to doubtful range.
The weakest evidence emerged in studies that modified BCAM items or used translated versions without re-establishing psychometric characteristics. Modified Arabic [34] and Greek [35] versions, for example, were implemented after small pilots or expert opinion but lacked structural validation and repeated measures, resulting in predominantly inadequate ratings. Across the dataset, measurement error and responsiveness were the least frequently reported domains, and reliability was inconsistently assessed outside comprehensive validation studies. Hypothesis testing and cross-cultural adaptation showed the most variability, with higher ratings when predefined expectations were evaluated rather than exploratory associations.
Methodological quality tended to align with study purpose: validation studies were stronger, while applications of BCAM as an outcome tool or context-specific modifications without re-assessment showed greater risk of bias. The COSMIN risk-of-bias ratings for studies that reported evaluable measurement properties are shown in Table 9.

3.7. Results of Individual Studies
Structural validity evidence was mainly available from the subset of studies that conducted factor analysis and reported fit indices. The Persian BCAM [27] provided the most comprehensive evidence, combining exploratory and confirmatory factor analyses with excellent model fit (e.g., RMSEA = 0.046, CFI = 0.984). The Chinese [28] and Turkish [33] versions also demonstrated acceptable factor structures and fit statistics, supporting dimensional stability. Versions that were translated and implemented without testing latent structure—such as the Omani [23,24,25], Sudanese [34], and Greek [35] adaptations—were rated insufficient for structural validity regardless of their reliability values. This follows the COSMIN rule that internal consistency cannot be judged without evidence that items form a unidimensional factor.
Internal consistency was the most consistently supported measurement property across the included BCAM adaptations. Most instruments demonstrated alpha values above the COSMIN threshold of 0.70, which indicates generally high reported internal consistency estimates within individual versions. The Persian [27], Turkish [33], Malay [30], and Chinese BCAM [28,29] versions stood out, with Cronbach’s α values typically ranging from 0.84 to 0.94, indicating strong internal coherence across items and subscales. The Swahili BCAM [26] was the exception: α values differed meaningfully between domains. Knowledge items clustered well (α ≈ 0.80), but the “external barriers” domain produced a lower coefficient (α ≈ 0.60). Because COSMIN recommends domain-specific assessment when structural assumptions are unclear, this pattern was rated as indeterminate rather than insufficient.
Reliability evidence, although less frequently reported than internal consistency, supported acceptable stability in several versions. The Persian [27] and Turkish [33] BCAMs demonstrated strong test–retest reliability (ICC ≈ 0.84–0.89), and the Arabic-BCAM-A validation study [25] demonstrated high inter-rater reliability (r = 0.97). Other adaptations did not include repeated-measures testing, resulting in indeterminate ratings for this property.
Measurement error was directly quantified in one instrument (Persian BCAM [27]), where SEM and SDC were reported. However, MIC was not established, so measurement error remained indeterminate under COSMIN.
Construct validity varied widely. Several studies relied on demographic regressions or post hoc associations (e.g., age, education, income), which do not meet COSMIN criteria because they do not evaluate predefined hypotheses about the direction or magnitude of relationships. Such exploratory approaches were rated as indeterminate. Only versions that explicitly compared theoretically distinct groups—such as the Turkish [33] known-groups analysis—or evaluated discriminant and convergent validity, as in the Persian BCAM [18], were rated sufficient.
Criterion validity evidence was limited and based on indirect comparators. The Persian BCAM [27] demonstrated strong ability to distinguish between medical experts and the general population (AUC = 0.822), and the original Arabic adaptation [24] showed meaningful correlations with established indicators.
Responsiveness was limited; only the Malay BCAM used in a mobile-app intervention [32] demonstrated pre–post changes.
Cross-cultural adaptation procedures were generally well executed across the translated versions. Most studies applied internationally accepted translation frameworks, such as forward–back translation, expert panel review, and cognitive interviewing, as seen in the Arabic, Persian, Chinese, Malay, Turkish, and Greek adaptations. These procedures generally fulfilled COSMIN criteria for cultural equivalence, resulting in adequate to very good ratings. A few studies relied on expert review alone (e.g., the Sudanese adaptation), which led to lower ratings due to limited pilot testing or lack of cognitive debriefing.
The reported results for each measurement property, as assessed in the included studies, are summarized in Table 10.

3.8. Results of Synthesis

3.8.1. Reliability (Internal Consistency, Test–Retest Reliability, Measurement Error)
A random-effects meta-analysis was performed on 11 Breast-CAM studies that reported Cronbach’s α estimates and provided sufficiently comparable data for quantitative synthesis. Where a single publication reported more than one independent internal consistency estimate, each estimate was entered separately in the meta-analysis. The pooled Cronbach’s α was 0.89 (95% CI 0.85–0.92). In line with COSMIN guidance, this pooled estimate was not interpreted as evidence of sufficient internal consistency because structural validity was frequently insufficient or inconsistently assessed across studies. Differences in item composition, adaptation procedures, and study populations further contributed to heterogeneity [40] (Figure 3). Individual α values ranged from 0.82 to 0.96, indicating generally high reported internal consistency estimates across studies. Substantial heterogeneity was observed (I² = 97.2%; Q = 250.2, p < 0.001; τ² = 0.055), indicating very high between-study variability. This suggests that differences in cultural adaptations, item composition, and study populations contributed meaningfully to variation in the reported α estimates. The 95% prediction interval (0.74–0.95) reflects variability in reported α values across studies. These values should not be interpreted as confirmation of adequate internal consistency in the absence of confirmed structural validity. Funnel plot inspection and trim-and-fill analysis showed no evidence of small-study effects, and the pooled estimate remained unchanged after trim-and-fill (Figure 4). Interpretation of these analyses is limited by the small number of studies and the substantial heterogeneity. In line with COSMIN guidance, internal consistency was rated as indeterminate (?) because structural validity was insufficient or inconsistently assessed across versions. Applying the COSMIN ceiling rule, the certainty of evidence for internal consistency did not exceed that for structural validity and was judged as low.
The pooled estimate represents a descriptive summary of reported α values rather than definitive evidence of sufficient internal consistency across Breast-CAM versions.
Each point represents the study-specific α estimate with its 95% confidence interval, scaled by inverse-variance weight. Multiple independent estimates from a single study are shown separately. The pooled random-effects estimate (α = 0.89, 95% CI: 0.85–0.92) is indicated on the summary line.
The results of the meta-analysis are summarized in Table 11, which presents the pooled effect size, heterogeneity indices, COSMIN overall rating, and GRADE certainty assessment.
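For readers who want to see how a pooled α and the accompanying heterogeneity indices can be obtained, the sketch below shows one common approach. It assumes a Bonett-type transformation, ln(1 − α), with approximate sampling variance 2k/((k − 1)(n − 2)) for k items and n respondents, combined with a DerSimonian-Laird random-effects estimator. The review does not state which transformation or estimator was used, so this is an illustrative assumption, and the input values are hypothetical.

```python
import math


def pool_alpha(alphas, n_items, n_subjects, z=1.96):
    """Random-effects pooling of Cronbach's alpha (illustrative sketch).

    Assumptions: Bonett-type transform y = ln(1 - alpha) with variance
    2k / ((k - 1)(n - 2)), and a DerSimonian-Laird estimate of tau^2.
    """
    y = [math.log(1.0 - a) for a in alphas]
    v = [2.0 * k / ((k - 1) * (n - 2)) for k, n in zip(n_items, n_subjects)]

    # Fixed-effect weights and Cochran's Q.
    w = [1.0 / vi for vi in v]
    sum_w = sum(w)
    y_fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum_w
    q = sum(wi * (yi - y_fixed) ** 2 for wi, yi in zip(w, y))
    df = len(y) - 1

    # DerSimonian-Laird between-study variance and I^2 (in percent).
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0

    # Random-effects weights, pooled estimate, and standard error.
    w_re = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))

    # Back-transform; the transform is decreasing, so the bounds swap.
    def back(t):
        return 1.0 - math.exp(t)

    return {
        "alpha": back(mu),
        "ci": (back(mu + z * se), back(mu - z * se)),
        "q": q,
        "i2": i2,
        "tau2": tau2,
    }


# Hypothetical inputs, not data from the included studies.
pooled = pool_alpha(
    alphas=[0.84, 0.94, 0.88, 0.91],
    n_items=[11, 20, 15, 25],
    n_subjects=[300, 450, 200, 600],
)
```

Note that τ² is expressed on the transformed scale, so it is not directly comparable with differences between raw α values.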
At the study level, most Breast-CAM adaptations reported α ≥ 0.70 at scale or subscale level, including the Persian [27], Turkish [33], Malay [32], Chinese [28,29] and several Arabic versions [23,24,34], which often showed coefficients between 0.84 and 0.94. An exception was the Swahili [26] “external barriers” subscale, where α was around 0.60, while symptom domains remained acceptable. In versions where structural validity was not assessed, COSMIN recommends treating internal consistency findings as indeterminate rather than definitively sufficient, which is reflected in the summary ratings.
Evidence for test–retest reliability was more limited. Only a small number of versions—including the Persian [27], Turkish [33], and Chinese [28] Breast-CAM—reported test–retest metrics, typically with ICC or correlation coefficients ≥ 0.70 over appropriate intervals [41]. Sample sizes for retest subsamples were modest, and not all studies clearly prespecified intervals or conditions of administration. Other versions reported inter-rater correlations or cross-group comparisons under the label of “reliability,” which does not meet COSMIN criteria for test–retest reliability. When synthesized, reliability beyond internal consistency was rated as insufficient or indeterminate, and the certainty of evidence was judged very low, mainly due to imprecision, small sample sizes, and methodological limitations.
Measurement error was directly quantified in only one instrument (the Persian Breast-CAM [27]), where SEM and SDC statistics were reported but no minimal important change (MIC) was defined [38]. Without MIC, COSMIN recommends classifying measurement error as indeterminate. For all other versions, measurement error was either not estimated or only partially addressed through floor and ceiling effects. As a result, measurement error was rated indeterminate (?) overall, with very low certainty of evidence.
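To make the relationship between these statistics concrete, the sketch below uses the standard formulas SEM = SD × √(1 − ICC) and SDC = 1.96 × √2 × SEM, and then applies the rule described above: without an MIC the rating stays indeterminate. The function names and the numerical inputs are hypothetical, and the "+" / "−" branch (SDC smaller than MIC counts as sufficient) is a simplified reading of the COSMIN criterion rather than a full implementation.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from a score SD and a reliability ICC."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def smallest_detectable_change(sem, z=1.96):
    """Individual-level SDC: z * sqrt(2) * SEM."""
    return z * math.sqrt(2.0) * sem


def rate_measurement_error(sdc, mic=None):
    """Simplified rating: '?' without an MIC, '+' if SDC < MIC, else '-'."""
    if mic is None:
        return "?"
    return "+" if sdc < mic else "-"


# Hypothetical values: SD of 10 points on a 0-100 score, ICC of 0.84.
sem = sem_from_icc(sd=10.0, icc=0.84)
sdc = smallest_detectable_change(sem)
rating = rate_measurement_error(sdc)
```

With these hypothetical inputs the SEM is about 4 points and the SDC about 11 points, so an individual change smaller than roughly 11 points could not be distinguished from measurement noise.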

3.8.2. Validity (Content Validity, Structural Validity, Construct Validity, Criterion Validity)
Across versions, content validity was generally supported but unevenly reported. Most adaptations employed formal translation and cultural adaptation procedures, such as forward–backward translation, expert panel review, and small-scale pilot or cognitive testing. Some versions (e.g., Turkish [33], Chinese [28], Persian [27]) additionally reported item- and scale-level content validity indices (I-CVI, S-CVI) [42] with values in the acceptable or excellent range. Other adaptations described the process more briefly, without quantitative indices or user-testing details. When synthesized, content validity for the Breast-CAM was judged sufficient (+) at the overall instrument level. However, the certainty of evidence was low, reflecting indirectness due to limited involvement of target users in some settings and incomplete reporting across several studies.
Structural validity was systematically evaluated in only a subset of Breast-CAM versions. The Persian [27] and Chinese [28] instruments provided the most rigorous evidence, combining exploratory and confirmatory factor analyses with acceptable fit indices (e.g., RMSEA ≤ 0.06; CFI ≈ 0.94–0.98) [43] and coherent factor structures for warning signs, risk factors, and barriers. The Swahili BCAM [26] supported a plausible factor solution through EFA, and the Turkish version [33] reported CFA results for an 11-item one-factor model with fit indices around conventional thresholds. Most other adaptations, including Arabic [23,24], Pakistani [31], Sudanese [34], and Greek versions [35], did not evaluate latent structure beyond face or content evaluation. When considered together, these findings led to an overall COSMIN rating of insufficient (−) for structural validity, with low certainty, driven by the small number of high-quality factor-analytic studies and the absence of dimensional testing in many versions.
Findings for construct validity (hypothesis testing) were mixed. A few versions of the Breast-CAM, such as the Persian [27] and Turkish [33] adaptations, evaluated predefined hypotheses related to discriminant or known-groups validity [44]. These studies compared experts with lay women or health professionals with the general population and generally confirmed the expected differences or associations. Many studies relied on post hoc associations with demographic variables (e.g., age, education, income, previous contact with breast cancer) without formally stating a priori hypotheses or expected effect sizes. Under COSMIN criteria, such exploratory analyses cannot be rated as clear evidence of construct validity. Construct validity was synthesized as inconsistent or indeterminate (±/?), with low certainty of evidence due to inconsistent methods and a lack of prespecified hypotheses in most studies.
Criterion validity was rarely examined, and no true gold standard for breast cancer awareness exists. The Persian Breast-CAM [27] reported an AUC of 0.822 [45] in discriminating experts from general women, and one Arabic version [24] showed moderate correlations with an external criterion (r ≈ 0.58). The comparators used were not universally established standards, and such analyses were not replicated across settings. As a result, criterion validity was rated as insufficient (−) or indeterminate (?), with very low certainty. This limits confidence in the Breast-CAM’s performance when compared with external benchmarks.
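The AUC reported for a known-groups comparison of this kind can be read as the probability that a randomly chosen expert scores higher than a randomly chosen member of the general population, with ties counted as one half. The sketch below computes that quantity directly from pairwise comparisons. The scores are hypothetical, and this is not a reconstruction of the Persian study's analysis.

```python
def auc_from_scores(expert_scores, lay_scores):
    """AUC as the share of expert/lay pairs in which the expert scores higher.

    Ties contribute 0.5, which makes this equal to the Mann-Whitney
    U statistic divided by the number of pairs.
    """
    if not expert_scores or not lay_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for e in expert_scores:
        for s in lay_scores:
            if e > s:
                wins += 1.0
            elif e == s:
                wins += 0.5
    return wins / (len(expert_scores) * len(lay_scores))


# Hypothetical awareness scores for two small groups.
auc = auc_from_scores(expert_scores=[9, 10, 8, 7], lay_scores=[5, 7, 6, 9])
```

An AUC of 0.5 would indicate no discrimination between the groups and 1.0 perfect separation.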
Cross-cultural adaptation [46] processes, while not graded as a measurement property, were generally adequate in most versions and form an important foundation for interpreting the validity evidence. The uneven application of cognitive interviewing, user involvement, and pilot testing across settings contributed to the downgrading of certainty for several validity domains.

3.8.3. Evidence on Responsiveness
Evidence for responsiveness of the Breast-CAM was limited to a small number of studies. The original UK intervention [21] format and the Malay app- or intervention-based versions [32] reported pre–post increases in awareness scores following educational programs, with changes in the expected direction. These results suggest that the instrument can detect change at the group level in the context of structured awareness interventions [12]. Designs were typically single-group pre–post or involved limited follow-up periods, and no study established MIC values or systematically linked change scores to meaningful behavioral outcomes (e.g., help-seeking, screening uptake), which limits interpretability of responsiveness under COSMIN standards [12,38].
When synthesized, responsiveness was therefore rated sufficient (+) in specific intervention contexts but supported by low certainty of evidence overall, due to imprecision, indirectness, and limited design diversity. The Breast-CAM appears suitable for evaluating changes in awareness in research or program settings, but the responsiveness evidence is not yet strong enough to guide individual-level interpretation of change.

3.9. Study-Level Evidence Synthesis
The detailed study-level ratings for each measurement property are presented in Supplementary Table S1, which summarizes the psychometric properties of individual BCAM versions in accordance with COSMIN recommendations. For each version, we report only those measurement properties that were empirically assessed, without imputing missing domains. For multidimensional instruments (e.g., Persian [27] and Chinese versions [28,29]), results are presented at the subscale level to ensure conceptual clarity and appropriate interpretation. Studies are organized chronologically (2016–2024) to illustrate the evolution of BCAM adaptation, structural models, and methodological rigor across cultural contexts. In versions where specific measurement properties were not evaluated, the table uses “–” to distinguish missing evidence from insufficient findings. COSMIN overall ratings (+/–/±/?) and PROM-adapted GRADE levels reflect synthesis per measurement property, preserving transparency in the evaluation of each domain [10,12,13].

3.10. Result of Inconsistency
No formal subgroup or meta-regression analyses were carried out. Although studies reported factors such as age, education, and setting, these characteristics were closely tied to specific Breast-CAM versions, making statistical comparisons inappropriate. Instead, we examined heterogeneity descriptively [47]. Versions that included proper structural validity testing (e.g., the Persian [27], Chinese [29], and Turkish [33] adaptations) showed more stable reliability estimates, while versions without factor analysis produced more variable results. Instruments adapted through rigorous translation and expert review also performed more consistently than minimally modified translations. Some variation was linked to sample types: large community samples tended to yield more stable findings than small groups such as students or app users. Differences between studies appeared to stem mainly from methodological variation rather than from true differences between population subgroups.

3.11. Sensitivity Analyses
A leave-one-out sensitivity analysis was performed to examine the robustness of the pooled internal consistency estimate [47]. Removing each study in turn led to only small changes in Cronbach’s α, which remained between 0.87 and 0.90 compared with the full-model estimate of 0.89. This indicates that no single study had a meaningful influence on the pooled result [48,49]. The substantial heterogeneity therefore likely reflects real differences between Breast-CAM versions rather than instability of the meta-analysis (Figure 5).
Each point represents the recalculated pooled reliability estimate after removal of one study from the meta-analysis. The dashed line indicates the original pooled α (0.89). Minimal variation in α across iterations (0.87–0.90) suggests that no single study unduly influenced the overall estimate, supporting the robustness of internal consistency findings.
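The leave-one-out procedure can be sketched as follows: each estimate is dropped in turn and the random-effects pooled value is recomputed from the remainder. The pooling helper uses a DerSimonian-Laird estimator, which is an assumption because the review does not name its estimator. The study labels, effect estimates, and variances are hypothetical values on an arbitrary (already transformed) scale.

```python
def dl_pool(effects, variances):
    """DerSimonian-Laird random-effects pooled estimate (illustrative)."""
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    return sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)


def leave_one_out(labels, effects, variances):
    """Pooled estimate after omitting each study in turn, keyed by label."""
    results = {}
    for i, label in enumerate(labels):
        rest_effects = effects[:i] + effects[i + 1:]
        rest_variances = variances[:i] + variances[i + 1:]
        results[label] = dl_pool(rest_effects, rest_variances)
    return results


# Hypothetical inputs, not data from the included studies.
labels = ["Study A", "Study B", "Study C", "Study D", "Study E"]
effects = [-1.83, -2.81, -2.12, -2.41, -2.04]
variances = [0.0074, 0.0047, 0.0108, 0.0035, 0.0090]
loo = leave_one_out(labels, effects, variances)
full = dl_pool(effects, variances)
```

Comparing each entry of `loo` with `full` shows how far the pooled value moves when a single study is omitted. A narrow spread indicates that no single study drives the result, but it says nothing about whether the heterogeneity itself is explained.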

3.12. Heterogeneity Exploration
High heterogeneity was observed [48] in the meta-analysis of internal consistency. Breast-CAM versions differed in item content, scale length, and response formats, and these differences likely contributed to the variability. Cultural adaptation procedures also differed, and study populations and settings were heterogeneous. Because of the small number of included studies and the methodological diversity [47] of the instruments, subgroup or meta-regression analyses were not conducted. Heterogeneity was therefore examined narratively and through sensitivity analyses. These analyses showed that no individual study had a meaningful influence on the pooled estimate.

3.13. Certainty of Evidence
The certainty of evidence for each measurement property of the Breast-CAM was assessed using the COSMIN-modified GRADE framework. Ratings were applied at the level of the synthesized measurement property rather than individual studies. Four domains—risk of bias, inconsistency, imprecision, and indirectness—were evaluated, and downgrading occurred when any domain showed serious limitations. As recommended by COSMIN, the ceiling rule was applied, meaning the certainty rating for internal consistency could not exceed that of structural validity [10,12]. A summary of all certainty ratings is presented in Table 12.
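The ceiling rule described above amounts to taking the lower of two ordered certainty levels. The sketch below encodes that logic. The four-level ordering follows standard GRADE terminology, and the function name is hypothetical.

```python
CERTAINTY_LEVELS = ["very low", "low", "moderate", "high"]


def apply_ceiling_rule(internal_consistency, structural_validity):
    """Cap the certainty for internal consistency at that of structural validity."""
    ic_rank = CERTAINTY_LEVELS.index(internal_consistency)
    sv_rank = CERTAINTY_LEVELS.index(structural_validity)
    return CERTAINTY_LEVELS[min(ic_rank, sv_rank)]


# Example: a nominally high rating is capped by low structural validity.
capped = apply_ceiling_rule("high", "low")
```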

3.13.1. Internal Consistency
Reported Cronbach’s α values for the Breast-CAM were generally high [40] across culturally adapted versions. The pooled estimate was 0.89 (95% CI 0.85–0.92), exceeding commonly used thresholds. Because structural validity was insufficient or inconsistently assessed across versions, internal consistency was rated as indeterminate under COSMIN criteria. Certainty of evidence was downgraded for inconsistency—largely due to versions without structural validity assessment—and for methodological heterogeneity. In line with the COSMIN ceiling rule, the certainty of evidence for internal consistency did not exceed that for structural validity.

3.13.2. Structural Validity
Evidence for structural validity was of low certainty. Only three studies used EFA and CFA to confirm factor structure [43], while most versions reported α values without testing dimensionality. Downgrades reflected risk of bias (incomplete construct testing) and inconsistency in reported models.

3.13.3. Content Validity
Content validity was supported by low-certainty evidence. Twelve studies described cultural adaptation and expert review, and seven provided clear evidence of relevance and comprehensibility. Downgrades reflected indirectness (e.g., limited use of population-specific cognitive interviews) and reporting gaps, particularly the lack of I-CVI/S-CVI [14,42] indices in some studies.

3.13.4. Reliability (Test–Retest)
Very low-certainty evidence suggests acceptable test–retest reliability. Only two studies reported ICC values > 0.70 [41]. Small samples, inconsistent intervals, and partial reporting led to downgrades for imprecision and inconsistency.

3.13.5. Measurement Error
Measurement error evidence was of very low certainty. One study reported SEM and SDC, but without MIC values, interpretability [38] was limited. Evidence was downgraded for imprecision and indirectness.

3.13.6. Construct Validity/Hypothesis Testing
Low-certainty evidence supports construct validity. Four of eight studies confirmed predefined hypotheses [44]; the others relied on exploratory associations with demographics. Downgrades reflected inconsistency and the absence of prespecified hypotheses.

3.13.7. Criterion Validity
Criterion validity evidence was rated very low. No study compared the Breast-CAM against a true gold standard, and only two studies used indirect comparators [45]. These comparators captured related aspects of awareness but did not represent an established reference standard. As a result, the evidence was downgraded for indirectness and lack of an appropriate criterion.

3.13.8. Responsiveness
Low-certainty evidence suggests that the Breast-CAM is responsive to educational interventions. Three of four studies demonstrated meaningful pre–post improvements. Certainty was downgraded for indirectness, as MIC values were not established [38].

3.13.9. Cross-Cultural Adaptation (Not Graded)
Cross-cultural adaptation procedures [46] were described in 14 studies. Seven followed full translation–review–pilot protocols, while others used partial or undocumented methods. Adaptation was not graded because it is a preparatory step rather than a psychometric property.
Overall, reported Cronbach’s α values for the Breast-CAM were generally high across culturally adapted versions. At the overall instrument level, internal consistency was rated as indeterminate due to insufficient or inconsistent evidence for structural validity. Evidence for structural validity, content validity, construct validity, and responsiveness was generally supported by low certainty. Evidence for reliability and measurement error was rated as very low certainty. Although cross-cultural adaptation procedures were often reported, they were not graded because they represent a preparatory step rather than a psychometric measurement property. A summary of overall ratings and certainty of evidence for each measurement property is presented in Table 13.

3.14. Recommendations for Use and Research
Based on the synthesized evidence, the Breast-CAM may be used selectively, depending on its intended purpose and the strength of the available measurement evidence. The Breast-CAM is most appropriate for population-level assessment, including community surveys, public health monitoring, and evaluation of awareness-raising programs. Across adaptations, internal consistency estimates were generally high [12]. However, COSMIN ratings for internal consistency were classified as indeterminate because evidence for structural validity was limited. Several studies still demonstrated responsiveness to educational interventions. These findings support its use for tracking group-level changes in awareness rather than for individual decision-making [38].

3.14.1. Versions with the Strongest Empirical Support
Where rigorous measurement performance is required, priority should be given to adaptations that established structural validity and followed comprehensive cultural adaptation procedures. These include the Persian BCAM [27], Chinese BCAM [28], and Turkish BCAM [33]. These versions confirmed underlying dimensional structures (via EFA/CFA), reported strong internal consistency (α > 0.80), and demonstrated clearer methodological transparency [13,38]. They are therefore well suited for research and program evaluation contexts where robust psychometric evidence is essential.

3.14.2. Versions Appropriate for Exploratory or Preliminary Use
Some adaptations—such as the Arabic [23,24,25], Swahili [26], Greek [35], and Malay intervention-based version [32]—show acceptable internal consistency and reasonable cultural adaptation [38]. However, because evidence for structural validity and reliability is limited, these versions are best used for exploratory or descriptive studies, with cautious interpretation of results [12].

3.14.3. Versions Not Recommended for Individual-Level Interpretation
Adaptations with incomplete reporting or insufficient psychometric evidence—such as the Pakistani study [31], the Sudanese study [34], and the Chinese outcome-use study without psychometric re-evaluation [29]—are not recommended for applications requiring individual-level interpretation or screening decisions [12]. These versions may still be used for generating hypotheses or preliminary descriptive work.

3.14.4. Limitations Applicable Across All Versions
Across the evidence base, documentation of test–retest reliability, measurement error, and interpretability thresholds was limited. No version has established criterion validity, and minimal important change values are unavailable [10]. As a result, absolute score interpretation and cross-population comparisons should be made cautiously, regardless of version.

4. Discussion

To our knowledge, this review is the first to systematically synthesize—and, where possible, meta-analyze—the psychometric performance of the Breast Cancer Awareness Measure (Breast-CAM) across culturally adapted versions. Previous validations have been conducted in individual settings, including Arabic [24,25], Swahili [26], Persian [27], Chinese [28,29], Turkish [33], Malay [32], Sudanese [34], and Greek contexts [35]. While these studies provide valuable insight into local adaptation, none examined whether the instrument performs consistently across populations or synthesized psychometric evidence within a unified framework. As a result, questions regarding cross-cultural comparability and overall measurement quality have remained unresolved. Patterns of breast cancer incidence and mortality vary across regions and age groups, underscoring the need for culturally sensitive awareness tools [50].
Across the included studies, internal consistency was the most frequently reported measurement property [13]. Most adaptations reported Cronbach’s α values above commonly accepted thresholds [39], and the pooled estimate reflected this pattern. These findings align with the intended conceptual structure of the Breast-CAM and suggest that reported internal consistency estimates for core awareness domains were generally high. However, because structural validity was infrequently assessed, confidence in internal consistency as a measurement property cannot be clearly established under COSMIN standards [51]. Evidence for several other psychometric properties remains limited [10,51]. Test–retest reliability was rarely assessed, and measurement error and interpretability thresholds were largely unexamined. Responsiveness was evaluated in only a small number of studies [51]. Given the influence of educational exposure, cultural beliefs, and socio-behavioral factors on awareness, these gaps are understandable. Nevertheless, they limit confidence in the instrument’s ability to perform consistently over time. The available evidence supports the use of the Breast-CAM primarily as a group-level assessment tool in community surveys and evaluations of awareness interventions [12]. At present, the evidence is insufficient to support its use in clinical decision-making, individual risk assessment, or screening decisions.

4.1. Limitations of the Evidence Included in the Review
Several limitations should be considered when interpreting the findings of this review. Although extensive database and registry searches were conducted, restriction to English-language publications may have introduced language bias and led to the omission of locally developed or unpublished adaptations of the Breast-CAM. In addition, culturally adapted versions reported in gray literature [52], theses, or program evaluations may not have been captured, potentially under-representing adaptations from low- and middle-income settings. None of the included studies formally assessed measurement invariance [53] across language or cultural versions, limiting the ability to determine whether pooled estimates reflect equivalent constructs across populations. Pooling psychometric estimates across heterogeneous adaptations is challenging because versions differ in item content, scoring, and study design. Cronbach’s α depends on the number of items and the underlying factor structure. Because Breast-CAM adaptations varied in dimensional structure, internal consistency estimates across studies may not be directly comparable, and pooled α values should be interpreted cautiously. The body of evidence available to evaluate the Breast-CAM is also limited in scope and depth. Most validation studies relied on small convenience samples drawn from university populations, outpatient clinics, or community education programs. This sampling approach limits generalizability, particularly to underserved or higher-risk populations in which awareness gaps may be most pronounced. Demographic characteristics such as socioeconomic status, literacy level, and access to healthcare were reported inconsistently, making it difficult to assess whether the instrument performs similarly across diverse groups.
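The dependence of Cronbach’s α on item count noted above can be illustrated with a minimal sketch. It uses the standardized form of α, which depends only on the number of items k and the mean inter-item correlation, α = k·r̄ / (1 + (k − 1)·r̄). The function name, the item counts, and the correlation of 0.30 are hypothetical values chosen for illustration; they are not taken from any Breast-CAM adaptation or from the included studies.

```python
def standardized_alpha(n_items: int, mean_inter_item_r: float) -> float:
    """Standardized Cronbach's alpha from the item count and the mean inter-item correlation."""
    if n_items < 2:
        raise ValueError("alpha requires at least two items")
    return (n_items * mean_inter_item_r) / (1 + (n_items - 1) * mean_inter_item_r)


# Hypothetical scale lengths; the mean inter-item correlation is held fixed at 0.30,
# so any change in alpha is driven by the number of items alone.
for k in (5, 10, 20):
    print(f"{k} items: alpha = {standardized_alpha(k, 0.30):.2f}")
```

Under these assumptions, α rises from roughly 0.68 with 5 items to roughly 0.90 with 20 items even though the items are no more strongly related to one another. This is one reason α values from adaptations with different item sets or dimensional structures may not be directly comparable.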
Reporting of psychometric properties varied across the included studies. Internal consistency was almost always reported, but other properties central to COSMIN standards—such as structural validity, test–retest reliability, responsiveness, and measurement error—were often assessed inconsistently or described in limited detail [12,13]. Construct validity was frequently examined using descriptive analyses rather than predefined hypotheses or theory-driven approaches. Similar limitations have been noted in recent evaluations of outcome measurement instrument reviews, which highlight wide variation in methodological quality and recurrent shortcomings in psychometric assessment. Together, these issues reduce confidence in individual study findings and constrain the interpretation of pooled estimates, reinforcing the importance of close adherence to COSMIN guidance [51].
Adaptation procedures contributed further variability. Several studies relied on forward–back translation without incorporating cognitive interviews or direct engagement with target populations, raising concerns about conceptual equivalence across cultures. The predominance of cross-sectional designs also limits insight into temporal stability and sensitivity to change. Geographical representation was uneven, with stronger coverage in the Middle East and East Asia but limited evidence from regions with high breast cancer mortality. Collectively, these limitations constrain external validity and highlight the need for more rigorous and geographically diverse validation efforts.

4.2. Limitations of the Review Processes Used
Several methodological aspects of this review may influence how the findings are interpreted. Although the search strategy was comprehensive and covered multiple databases and trial registries, no psychometric evaluations beyond the peer-reviewed literature were identified. Studies that used the Breast-CAM only as an outcome measure, without assessing its measurement properties, were excluded. The synthesis therefore reflects the published psychometric evidence and may not capture unpublished or locally developed adaptations.
Limiting the review to English-language publications may have introduced language bias. Although multiple culturally adapted versions of the Breast-CAM from diverse linguistic and regional populations were included, validation studies published exclusively in local-language journals may not have been captured. This may limit the comprehensiveness of the review and should be considered when interpreting conclusions regarding global cross-cultural validity. Screening was conducted by a team of four reviewers, with support from ASReview to prioritize likely relevant records. While machine-learning assistance [16,54] improved efficiency, it relies on training inputs, and studies using atypical terminology may have required manual identification. Although all disagreements were resolved through consensus, the combination of automation and reviewer judgment introduces a small risk of selection bias.
The completeness of the synthesis was shaped by limitations in primary reporting. Key properties such as measurement error, temporal stability, and factorial structure were frequently missing, precluding more detailed psychometric evaluation. Planned subgroup or comparative analyses were not feasible due to insufficient or inconsistent data. These limitations do not undermine the overall conclusions but reduce the certainty with which findings can be generalized [10].

4.3. Implications for Practice, Policy, and Future Research
The findings of this review suggest that the Breast-CAM is well suited for population-level awareness assessment, particularly in community surveys and public health program evaluation [8]. Across multiple cultural adaptations (including Arabic [23,24], Swahili [26], Persian [27], Chinese [28], Malay [32], Turkish [33], Sudanese [34], and Greek versions [35]), reported Cronbach’s α values for symptom recognition items were generally high [13], although evidence for structural validity was limited. The instrument can support identification of knowledge gaps, monitoring of awareness initiatives, and planning of targeted public health messaging.
Greater caution is warranted when interpreting domains related to beliefs, barriers, and help-seeking intentions. Variation in these domains likely reflects genuine contextual differences—such as stigma, cultural norms, and access to screening—rather than weaknesses in the instrument itself [55]. Policymakers should therefore avoid direct cross-country comparisons and interpret findings in relation to local sociocultural and health-system contexts.
Future research should prioritize strengthening the psychometric foundation of the Breast-CAM. Structural validity should be evaluated systematically using confirmatory factor analysis or item-response theory rather than relying on internal consistency alone. Longitudinal studies are needed to assess temporal stability and establish minimal important change values. Cultural adaptation processes should incorporate cognitive interviewing, stakeholder engagement, and multi-group invariance testing [53] to ensure conceptual equivalence beyond literal translation. Greater emphasis should be placed on under-represented populations—including older adults, low-literacy groups, rural communities, and women facing structural barriers to screening—to ensure the instrument remains relevant and equitable across diverse settings. Table 14 summarizes the key priorities for strengthening the psychometric evaluation and future application of the Breast-CAM identified in this review.

5. Conclusions

This systematic review and meta-analysis is the first comprehensive synthesis of psychometric evidence on the Breast Cancer Awareness Measure across culturally adapted versions. The findings suggest that reported internal consistency estimates for the Breast-CAM are generally high across different settings, supporting its use in population-level assessments and public health program evaluation. Evidence for other measurement properties remains limited and inconsistently reported. Structural validity, test–retest reliability, responsiveness, and measurement invariance have not been systematically examined. Until these gaps are addressed, the Breast-CAM should be used with caution for purposes beyond group-level assessment. Further psychometric research using rigorous and longitudinal designs is needed to support broader application of the instrument.

Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article when quoting.
