Development of a robust corpus for automated evaluation of online health information in Chinese using the DISCERN scale.
APA
E T, Li X, et al. (2026). Development of a robust corpus for automated evaluation of online health information in Chinese using the DISCERN scale. Journal of the American Medical Informatics Association: JAMIA, 33(2), 316-325. https://doi.org/10.1093/jamia/ocaf175
MLA
E T, et al. "Development of a robust corpus for automated evaluation of online health information in Chinese using the DISCERN scale." Journal of the American Medical Informatics Association: JAMIA, vol. 33, no. 2, 2026, pp. 316-325.
PMID
41223037
Abstract
[OBJECTIVE] To develop the first comprehensive, standardized annotated corpus of Chinese online health information (OHI) using the full 16-item DISCERN instrument and to establish a reliable annotation process that supports automated quality assessment.
[MATERIALS AND METHODS] We assembled 510 web-sourced articles on breast cancer, arthritis, and depression. All articles were independently annotated by three trained raters using the DISCERN scale. Annotation followed a four-step workflow: data collection and preprocessing, rater training, iterative annotation, and quality control. Raters were calibrated through consensus sessions and designated calibration articles. The Dawid-Skene model aggregated individual annotations into final consensus scores. Original five-point ratings were retained and also binarized (scores 1-3 as low quality, 4-5 as high quality) to enable both fine-grained and coarse evaluation for machine learning.
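The dual-scale labeling described above can be sketched in a few lines. This is a minimal illustration of the stated binarization rule (1-3 low, 4-5 high); the function name and the item-to-score dictionary layout are illustrative assumptions, not part of the published corpus pipeline.

```python
def binarize_discern(scores):
    """Map five-point DISCERN item scores to binary quality labels.

    Per the paper's scheme: scores 1-3 -> "low", scores 4-5 -> "high".
    `scores` is a dict mapping a DISCERN item id to its consensus score.
    The original five-point values are kept alongside for fine-grained use.
    """
    binary = {}
    for item, s in scores.items():
        if not 1 <= s <= 5:
            raise ValueError(f"DISCERN scores must be in 1-5, got {s} for {item}")
        binary[item] = "high" if s >= 4 else "low"
    return binary


# Hypothetical example: two DISCERN items with consensus scores 5 and 3.
labels = binarize_discern({"q1": 5, "q16": 3})
```

Retaining both scales lets a downstream classifier be trained either as a binary filter or as a five-class ordinal model from the same annotations.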
[RESULTS] Initial annotation of a 60-article pilot produced low agreement (mean Krippendorff's α ≈ 0.022) due to subjective variability. Successive calibration exercises improved agreement markedly, culminating in a corpus-wide Krippendorff's α of 0.834. Consensus scores correlated strongly with individual rater scores, confirming annotation robustness. The dual-scale design yielded a relatively balanced distribution of labels across topics, with roughly equal representation of low- and high-quality articles, and preserved granularity for detailed DISCERN analysis.
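The reliability statistic reported above, Krippendorff's α, is computed from a coincidence matrix over all rater pairs within each unit. The sketch below is a minimal nominal-data version for illustration; the paper's five-point DISCERN ratings would ordinarily use an ordinal or interval distance function, which this simplification omits.

```python
from collections import Counter
from itertools import combinations


def krippendorff_alpha_nominal(ratings):
    """Krippendorff's alpha for nominal data.

    `ratings` is a list of units; each unit is a list of category labels,
    one per rater. Units with fewer than two ratings are dropped.
    Returns alpha = 1 - D_o / D_e (observed vs. expected disagreement).
    """
    units = [u for u in ratings if len(u) >= 2]
    o = Counter()  # coincidence matrix: weighted pair counts, both directions
    for u in units:
        m = len(u)
        for a, b in combinations(u, 2):
            o[(a, b)] += 1 / (m - 1)
            o[(b, a)] += 1 / (m - 1)
    n_c = Counter()  # marginal totals per category
    for (a, _), w in o.items():
        n_c[a] += w
    n = sum(n_c.values())
    d_o = sum(w for (a, b), w in o.items() if a != b)
    d_e = sum(n_c[a] * n_c[b] for a in n_c for b in n_c if a != b) / (n - 1)
    return 1.0 if d_e == 0 else 1 - d_o / d_e


# Hypothetical toy data: three articles, two raters each.
alpha = krippendorff_alpha_nominal([[1, 1], [1, 2], [2, 2]])
```

Perfect agreement yields α = 1, while agreement no better than chance yields α ≈ 0, which is why the pilot value near 0.022 signaled essentially chance-level consistency before calibration.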
[DISCUSSION] Our iterative calibration approach and consensus modeling effectively addressed the subjective ambiguity inherent in quality assessment. The binary and five-class labeling strategies facilitate flexible downstream applications, allowing automated systems to perform both broad filtering and nuanced quality differentiation. The high inter-rater reliability demonstrates that rigorous training and consensus methods can overcome domain-specific annotation challenges.
[CONCLUSION] The resulting Chinese OHI corpus, annotated via a standardized DISCERN framework and refined through iterative calibration, provides a robust benchmark for training and evaluating machine learning models. This resource lays the foundation for scalable, reliable automated quality assessment of OHI in Chinese public health settings.
MeSH Terms
Humans; China; Internet; Machine Learning; Consumer Health Information; Breast Neoplasms; Data Curation; Depression