
Transformer-assisted convolutional feature extraction with deep representation learning models for lung and colon cancer diagnosis using histopathological images.

Scientific Reports, 2026, 16(1): 4026 (Open Access)

Jayanthi S, Kaur I, Laxmi Lydia E, Kumar KV, Joshi GP, Cho W


Cite this paper

APA: Jayanthi S, Kaur I, et al. (2026). Transformer-assisted convolutional feature extraction with deep representation learning models for lung and colon cancer diagnosis using histopathological images. Scientific Reports, 16(1), 4026. https://doi.org/10.1038/s41598-025-34160-7
PMID: 41501109

Abstract

Cancer is one of the deadliest diseases and can be caused by metabolic anomalies or a convergence of inherited disorders. Lung and colon cancer (LCC) are posited as the most common causes of death and disability in the modern world. Identifying a tumour at an initial stage, before it spreads further inside the body, lessens the chance of death. Histopathological images (HIs) are extensively applied by medical professionals for identification and are highly essential in predicting patients' chances of survival. Usually, detecting cancer using HIs requires a lengthy expert evaluation, but advanced technology allows for faster and more efficient diagnosis. Recently, artificial intelligence (AI) and deep learning (DL) methods have been prevalently utilized for quick inspection, decision making, and effectual handling of high-dimensional data, such as multi-dimensional anatomical images and videos. In this manuscript, a Lung and Colon Cancer Diagnosis via Transformer-Assisted Convolutional Feature Extraction and Deep Representation Learning (LCCD-TCFEDRL) technique using HI analysis is proposed. The aim is to develop an effective diagnostic model for LCC by utilizing advanced analytical methods to improve early detection accuracy and support improved treatment outcomes. Initially, the guided image filtering (GIF) model is employed in the image pre-processing stage to enhance the quality of images by eliminating noise. Furthermore, the CoAtNet method is utilized for feature extraction to recognize and isolate the most relevant information from raw data. Finally, the bidirectional temporal convolutional network (BiTCN) with the Adan optimizer (AO) is employed for the LCC classification process. The LCCD-TCFEDRL methodology is evaluated on the LCC HIs dataset, and the comparison study portrayed a superior accuracy value of 99.36% over existing models.



Introduction
The World Health Organisation (WHO) identifies cancer as the primary cause of death globally. Cancer cells exhibit genetic mutations, autonomous growth, and significant metastatic power1. The lung and colon are among the most affected organs and account for the highest death rates. The probability of both LCCs occurring together is around 17%. Although this occurrence is uncommon, without early detection the spread of cancer cells between these two organs is highly likely2. At present, suitable treatment and prompt identification are the only ways to decrease cancer deaths. The sooner an individual is diagnosed, the better the treatment process and the higher the likelihood of recovery and survival3. Cancer has numerous causes, ranging from tobacco, alcohol consumption, and high body mass index to physical carcinogens such as UV exposure, as well as genetic and biological cancer-causing agents. Nevertheless, the cause may differ from person to person4. General cancer symptoms include pain, nausea, fatigue, breathing difficulties, persistent cough, muscle pain, weight loss, bruising, and bleeding5. Therefore, it is difficult to confirm the existence of cancer without a detailed diagnostic process. In many situations, a person shows minor to no symptoms in the initial phases, and by the time symptoms become noticeable, it is often already too late6.
Analyzing HIs was conventionally a labour-intensive process for health professionals diagnosing colon and lung cancer. However, this process is now automated using current technological tools, reducing time and effort. For early identification and prolonged survival, traditional therapies remain options. Enhanced monitoring for people with simultaneous LCC is sensible, and further genetic and epidemiological analysis is vital to determine a possible link between these two cancers7. Currently, numerous computer-aided diagnosis (CAD) systems are available for automatically checking for signs of cancer development in the lung and colon, and AI makes such diagnostic systems feasible8. The popular AI techniques for identifying LCC from HI analysis are machine learning (ML) and DL models. AI has demonstrated tremendous ability in the diagnostic field and offers a proficient replacement for classic diagnostic techniques. As a subfield of ML, DL supports this procedure by allowing machines to understand and learn from high-dimensional data, such as images and videos, through algorithms inspired by the brain's functional structures. For effective disease identification in medical practice, precise interpretation of histological images is vital, and it can be automated with ML models, especially DL. DL methods are designed to allow machines to process high-dimensional data such as multi-dimensional anatomical images and videos9.

Paper contributions
In this article, a Lung and Colon Cancer Diagnosis via Transformer-Assisted Convolutional Feature Extraction and Deep Representation Learning (LCCD-TCFEDRL) technique is proposed by using HI analysis. The main contributions of this study are mentioned below:

The guided image filtering technique is integrated during the pre-processing stage to improve the visual quality of HIs by effectually eliminating noise. This improves feature clarity and ensures that critical cellular structures are preserved, contributing to more accurate feature extraction and ultimately improving the performance of the LCC classification model.

The LCCD-TCFEDRL approach utilizes the CoAtNet model for effectually capturing both convolutional and attention-based representations, enabling the model to learn rich spatial and contextual features. This dual capability improves the representation power of the network, allowing it to better distinguish between subtle discrepancies in histopathological structures, which significantly enhances the accuracy and robustness of LCC classification.

The LCCD-TCFEDRL methodology implements the BiTCN model optimized with the AO technique for performing accurate and efficient LCC classification by capturing temporal dependencies in feature sequences. This model allows for efficient processing of contextual patterns in both forward and backwards directions, improving its decision-making capability and mitigating training instability, thereby enhancing classification precision across all cancer classes.

The novelty of the LCCD-TCFEDRL model is in the deliberate adaptation and integration of guided image filtering for noise reduction, CoAtNet for hybrid convolutional-attention feature extraction, and BiTCN optimized with the AO technique for effective classification within the framework. This incorporation assists in capturing rich spatial and temporal features, and improves image quality, optimization efficiency resulting in significantly higher diagnostic accuracy for LCC compared to existing methods.

Outline of the work
The remainder of this paper is organized as follows. Section "Prior work on LCC analysis" reviews the relevant network models and related literature; Section "Materials and methods" details the experimental method, comprising the proposed framework and processing steps; Section "Experimental validation" presents and discusses the experimental outcomes; Section "Conclusion" completes the study by summarising key findings and outlining current limitations.

Prior work on LCC analysis
Ozdemir et al.10 presented a hybrid DL approach that incorporates convolutional neural networks (CNNs) alongside vision transformers (ViTs). By enhancing and incorporating block and grid attention mechanisms (AMs) with InceptionNeXt, the approach efficiently extracts large-scale and fine-grained features from CT scans. Attallah11 introduced a CAD technique that merges deep features from EfficientNetB0, MobileNet, and ResNet-18 with multiple handcrafted feature extraction techniques. Vittal12 proposed a context-aware multi-image fusion (CA-MIF) model for the pre-processing step, followed by U-Net++ for segmentation, DeepLabv3+ for refined lesion analysis, and ResNet-50 for final classification. Dash et al.13 presented a hybrid approach utilizing advanced deep transfer learning (TL) with EfficientNet and a masked autoencoder for image-based distribution estimation (MADE), employing EfficientNetB7 for final classification.
Alsulami et al.14 introduced a new method named LCC detection by Swin Transformer (ST) with an ensemble model on histopathologic images (LCCST-EMHI). The approach integrated bilateral filtering (BF), BiLSTM with Multi-Head Attention (BiLSTM-MHA), a Double Deep Q Network (DDQN), and a Sparse Stacked Autoencoder (SSA) optimized through the Walrus Optimization Algorithm (WaOA). Obayya et al.15 developed an innovative biomedical image analysis for LCC detection through a tuna swarm algorithm with DL (BILCCD-TSADL). The technique involved Gabor Filtering (GF)-based preprocessing, GhostNet-based feature extraction, adaptive Firefly Algorithm Optimization (AFAO)-based tuning, and Echo State Network (ESN)-based classification. Kassem16 suggested a Snake Optimiser with a DL-driven Disease Detection technique for Colorectal Cancer (SODL-DDCC). The model integrated BF for noise elimination, Inceptionv3 for feature extraction with Snake Optimization (SO) for tuning, and a Graph Convolution Network (GCN) for final classification. Ragab et al.17 recommended a novel Self-Upgraded Cat Mouse Optimiser with ML-based LC Classification (SCMO-MLLCC) for CT imaging, in which Gaussian Filtering is utilized for noise removal, the densely connected DenseNet-201 architecture is used for feature extraction with the slime mould algorithm (SMA) for parameter tuning, and an Elman Neural Network (ENN) performs final classification. A comparative analysis of existing LCC studies is presented in Table 1.

Though the reviewed studies are efficient, they exhibit several limitations, such as a lack of local-global representations and high computational requirements. Several studies show reduced scalability and consistency across varied HIs because they depend on handcrafted features or multi-stage pipelines. Furthermore, techniques based on TL or ensemble models need substantial parameter tuning and hence do not generalize well across diverse datasets. Some techniques are limited by insufficient class balance and computational cost, and optimization-based models introduce additional complexity. A few models are sensitive to pre-processing, as segmentation-based processes rely on accurate ROI masks. Many existing models also offer restricted end-to-end adaptability, highlighting a research gap in unified architectures capable of robust LCC classification.

Materials and methods

Model overview
In this study, an LCCD-TCFEDRL technique is proposed by using HI analysis. The LCCD-TCFEDRL technique involves various processes, including image pre-processing to improve the image quality. Next, the CoAtNet method is employed in the feature extraction procedure to select the most relevant information. The last stage of the LCCD-TCFEDRL model utilizes BiTCN for LCC classification using the AO model to improve the result. Figure 1 illustrates the overall process of the LCCD-TCFEDRL model.

Guided image filtering
Initially, the GIF model is employed to improve image quality by eliminating noise. Filtering is essential for removing unwanted content from images; it relies on pixel values, and several kinds of filters compute a weighted average of pixel values18. The model is selected for its edge-preserving smoothing capability, making it highly appropriate for enhancing medical images without blurring critical anatomical boundaries. It also utilizes a guidance image to refine the output, ensuring better structure preservation, and excels at handling noise while maintaining fine details, making it appropriate for preprocessing in biomedical image analysis tasks.

Bilateral filter (BF)
BF is an edge-preserving image smoothing filter. A weighted average of neighbouring pixels substitutes the intensity of a given pixel, with weights drawn from a Gaussian distribution that depends on pixel value, depth, colour, and more. The main benefit of the model is that it maintains edges, though it can create staircase artefacts, a cartoon-like appearance, and false edges. After filtering, the output pixel value is given by
$q_i = \frac{1}{K_i} \sum_{j \in \omega_i} W_{ij}(I)\, p_j,$
where $\omega_i$ indicates the set of pixels surrounding pixel $i$, $p$ is the image being filtered, and $I$ is the guidance image; in BF, the intensity or colour of the input itself serves as the guidance. The joint bilateral filter kernel is
$W_{ij}^{bf}(I) = \exp\!\left(-\frac{\lVert x_i - x_j \rVert^2}{\sigma_s^2}\right) \exp\!\left(-\frac{\lVert I_i - I_j \rVert^2}{\sigma_r^2}\right).$
Here, $K_i$ denotes a normalizing factor such that $\sum_j W_{ij}(I) = K_i$, and $x_i$ represents pixel coordinates. The parameters $\sigma_s$ and $\sigma_r$ modify the sensitivity to spatial similarity and range similarity, respectively.
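To make the kernel concrete, here is a minimal, illustrative NumPy sketch of the joint bilateral filter (function and parameter names are assumptions, not from the paper):

```python
import numpy as np

def bilateral_filter(p, I, sigma_s=2.0, sigma_r=0.1, radius=3):
    """Joint bilateral filter: smooth image p using guidance image I.

    Weights combine spatial closeness and guidance-intensity similarity,
    so edges present in the guidance image are preserved in the output.
    """
    H, W = p.shape
    out = np.zeros_like(p, dtype=float)
    for i in range(H):
        for j in range(W):
            i0, i1 = max(0, i - radius), min(H, i + radius + 1)
            j0, j1 = max(0, j - radius), min(W, j + radius + 1)
            yy, xx = np.mgrid[i0:i1, j0:j1]
            # spatial Gaussian on pixel distance
            ws = np.exp(-((yy - i) ** 2 + (xx - j) ** 2) / (2 * sigma_s ** 2))
            # range Gaussian on guidance-intensity difference
            wr = np.exp(-((I[i0:i1, j0:j1] - I[i, j]) ** 2) / (2 * sigma_r ** 2))
            w = ws * wr
            # normalise by K_i so the weights sum to one
            out[i, j] = np.sum(w * p[i0:i1, j0:j1]) / np.sum(w)
    return out
```

On a noisy step-edge image, the flat regions are smoothed while the step itself stays sharp, because the range term suppresses averaging across the edge.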

Guided filter (GF)
The GF is an advanced variant of the BF. Using the same notation, $p$ is the image being filtered, $I$ is the guidance image, and $q$ is the output. Let $q$ be a linear transform of $I$ in a window $\omega_k$ centred at pixel $k$:
$q_i = a_k I_i + b_k, \quad \forall i \in \omega_k,$
where $(a_k, b_k)$ are linear coefficients presumed to be constant in $\omega_k$. The output image $q$ is the input image $p$ with unnecessary elements $n$, such as noise and textures, removed:
$q_i = p_i - n_i.$
To counter a huge value of $a_k$, the following cost function is minimised in the window $\omega_k$:
$E(a_k, b_k) = \sum_{i \in \omega_k} \left[ (a_k I_i + b_k - p_i)^2 + \epsilon a_k^2 \right],$
where $\epsilon$ denotes a regularisation parameter; Eq. (5) employs the ridge regression methodology, whose solution is
$a_k = \dfrac{\frac{1}{|\omega|} \sum_{i \in \omega_k} I_i p_i - \mu_k \bar{p}_k}{\sigma_k^2 + \epsilon}, \qquad b_k = \bar{p}_k - a_k \mu_k.$
Here, $\mu_k$ and $\sigma_k^2$ are the mean and variance of $I$ in $\omega_k$, $\bar{p}_k$ is the mean of $p$ in $\omega_k$, and the pixel count in $\omega_k$ is $|\omega|$. Likewise, the values of $(a_k, b_k)$ differ across windows, so they are calculated for every window in the image.
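The closed-form solution above translates into a short, illustrative NumPy implementation; the box-mean helper uses an integral image so each window mean costs O(1) (names and defaults are assumptions):

```python
import numpy as np

def _box_mean(x, r):
    """Mean over a (2r+1)x(2r+1) window, clipped at image borders."""
    H, W = x.shape
    s = np.zeros((H + 1, W + 1))
    s[1:, 1:] = np.cumsum(np.cumsum(x, axis=0), axis=1)  # integral image
    i0 = np.clip(np.arange(H) - r, 0, H)[:, None]
    i1 = np.clip(np.arange(H) + r + 1, 0, H)[:, None]
    j0 = np.clip(np.arange(W) - r, 0, W)[None, :]
    j1 = np.clip(np.arange(W) + r + 1, 0, W)[None, :]
    area = (i1 - i0) * (j1 - j0)
    return (s[i1, j1] - s[i0, j1] - s[i1, j0] + s[i0, j0]) / area

def guided_filter(I, p, r=3, eps=1e-2):
    """Guided filter: fit q = a*I + b per window (ridge regression, Eq. 5),
    then average the coefficients over all windows covering each pixel."""
    mu_I, mu_p = _box_mean(I, r), _box_mean(p, r)
    var_I = _box_mean(I * I, r) - mu_I ** 2          # sigma_k^2
    cov_Ip = _box_mean(I * p, r) - mu_I * mu_p
    a = cov_Ip / (var_I + eps)                        # a_k
    b = mu_p - a * mu_I                               # b_k
    return _box_mean(a, r) * I + _box_mean(b, r)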

CoAtNet-based feature extraction
For the feature extraction procedure, the LCCD-TCFEDRL methodology implements the CoAtNet model to recognize and isolate the most relevant information from raw data19. This fusion model is chosen for its effectual capability to improve both accuracy and generalization, specifically on intricate histopathological textures, and it shows efficiency over other hybrid CNN-Transformer models due to its balanced architecture. BiTCN, in turn, is appropriate for histopathological data as it captures contextual dependencies across spatial patterns in image sequences, offering better feature propagation and robustness than conventional recurrent models; its parallel structure also ensures faster training and inference, which is critical for medical applications. Figure 2 indicates the flow of the CoAtNet technique.

CoAtNet integrates the power of CNNs with the capability of transformers, upgrading image classification to incorporate both strengths. In this framework, local patterns are acquired with convolution operations while the global context is also considered. The main principle of the CoAtNet method is integrating CNNs with Transformers through depth-wise convolutions20. Several strengths motivate this integration:
Convolutional Layers: A convolution layer carries an inductive bias that induces better generalization in scenarios with limited data. Local spatial patterns are processed by the convolution layer to extract local spatial features from the data. Textures and edges are identified by applying a filter to image areas called receptive fields. By concentrating on localized features, the convolution layer is specifically capable of extracting vital data from small patches of the image.
Depth-wise Convolution: An aggregated area is produced using a fixed kernel. Given input features $X \in \mathbb{R}^{H \times W \times C}$, where $H$ indicates height, $W$ denotes width, and $C$ represents the channel count, every channel $c$ is convolved with a distinct filter $K_{\cdot,\cdot,c}$, resulting in an output feature map $Y$. The depth-wise convolution at location $(i, j)$ for channel $c$ is specified in Eq. (8):
$Y_{i,j,c} = \sum_{m=1}^{k_h} \sum_{n=1}^{k_w} K_{m,n,c}\, X_{i+m,\, j+n,\, c},$
where $k_h$ and $k_w$ refer to the 2D convolution kernel height and width, and $K_{m,n,c}$ refers to the kernel value for channel $c$ at position $(m, n)$. In this operation, local spatial data is stored by combining features inside a small, localized region of the image. While depth-wise convolutions are beneficial for acquiring local context inside an image, they are not effective at capturing long-range, image-wide dependencies. Convolutional networks hold the benefits of being efficient and capable of generalizing well, particularly on restricted datasets; primarily, this means they can learn from few instances because their framework is ideal for examining local patterns.
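A direct, illustrative NumPy rendering of the per-channel operation in Eq. (8), assuming 'valid' padding (function and variable names are not from the paper):

```python
import numpy as np

def depthwise_conv2d(x, k):
    """Depth-wise convolution: each channel c is convolved with its own
    kernel k[:, :, c]; channels are never mixed, so the layer needs only
    kh*kw*C weights instead of kh*kw*C*C' for a full convolution."""
    H, W, C = x.shape
    kh, kw, _ = k.shape
    out = np.zeros((H - kh + 1, W - kw + 1, C))
    for c in range(C):                        # one filter per channel
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j, c] = np.sum(x[i:i + kh, j:j + kw, c] * k[:, :, c])
    return out
```

The per-channel independence is what keeps the operation cheap; channel mixing, when needed, is typically added via a separate 1x1 convolution.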
Self-Attention Layers: Self-attention is proficient at modelling global dependencies and surpasses other layers in this regard, making it a strong choice for handling massive datasets. Self-attention layers examine spatial relations across every spatial location within a single image rather than concentrating on local patterns, identifying long-range dependencies through pairwise relations between any two points. The methodology is proficient at managing large, complex data and modelling intricate patterns, making it specifically beneficial when scaling up. This integration enhances the scalability and generalization of CoAtNet in dealing with diverse data sizes. In CoAtNet, AMs are incorporated with the convolution layers; consequently, depth-wise convolution and self-attention are considered complementary operations for processing spatial data. To combine these two models, CoAtNet employs relative positional embeddings to assess attention and integrates convolutional kernels into the computation of self-attention. In this way, the translation equivariance of convolution is combined with attention-based modelling, preserving the benefits of both methodologies. Finally, the CoAtNet-generated feature maps are flattened channel-wise while conserving the spatial arrangement to form sequential tokens. These tokens, maintaining local-to-global contextual data, are sent to BiTCN as ordered sequences for temporal-spatial modeling.
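The pairwise-relation computation described above can be sketched in a few lines of NumPy; the projection matrices and shapes are illustrative, and CoAtNet's relative position bias inside the softmax logits is omitted for brevity:

```python
import numpy as np

def self_attention(x, wq, wk, wv):
    """Scaled dot-product self-attention over a token sequence (N, d).

    Every token attends to every other token, so dependencies are
    modelled across the whole image rather than a local receptive field.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)             # pairwise relations (N x N)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)        # softmax over keys
    return w @ v                               # convex mix of value rows
```

Because each output row is a convex combination of value rows, every output stays within the range of the inputs, while the mixing weights adapt to content rather than position.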

LCC diagnosis using BiTCN
Finally, the BiTCN is designed for the LCC classification process, with the AO model enhancing the classification performance21. This model is efficient at capturing both local and global spatial patterns in HIs and, unlike conventional CNNs, integrates temporal convolutional mechanisms in both directions. It also improves contextual comprehension across image regions, refining the classification boundaries by optimizing key parameters. BiTCN utilizes two directional dilated convolutional networks: one TCN encodes the upcoming covariates, while the other encodes the previous covariates and the historical values of the sequence. Hence, the method can learn temporal structure from the data, and the use of convolution preserves processing efficiency. Bi-directional dilated convolution uses a comparable number of future and prior instances for forecasting.
The model comprises two independent convolution routes: one addresses the forward data and the other the backward data. The outcomes of these two routes are typically concatenated at the end and applied to forecast an output. In other words, the method implements both backward and forward convolution on the input data.
Bi-directional dilated convolution extracts feature data at the cost of latency, since backward convolution uses information from future instants. The residual network architecture is a good solution to the gradient explosion and vanishing problems. A residual link with a short-circuit connection is added to the method to prevent the loss of significant new features in the information extraction process, thereby enhancing model stability. The residual link is presented in Eq. (9):
$o = \mathrm{Activation}\big(x + F(x)\big),$
where $F(\cdot)$ denotes the residual mapping and $x$ refers to the input. Through the residual link, the gradient is successfully prevented from vanishing.
The residual link captures an input value and transforms it into an output through the module series; the calculation is presented in Eq. (10):
$x^{(l+1)} = x^{(l)} + F\big(x^{(l)}\big).$
Assuming Bi-TCN contains a stack of $L$ residual blocks, the output is expressed in Eqs. (11, 12):
$h_T^{(l)} = \mathrm{Activation}\big(h_T^{(l-1)} + F\big(h_T^{(l-1)}\big)\big), \quad l = 1, \ldots, L,$
where $h_T^{(l)}$ denotes the output of the $l$-th residual block, $l$ signifies the serial number of the residual block, and $T$ represents the sequence length. The Bi-TCN residual block includes two residual units, and every residual unit includes batch normalization (BN), an activation function, a residual connection, bi-directional dilated convolution, and a dropout layer.
Bi-directional dilated convolution is applied to extract features from both the backward and forward directions of the input data. Weight normalization through BN standardizes the input of each hidden layer to evade the gradient vanishing problem, and the activation function further alleviates it. The dropout incorporation successfully resolves the overfitting problem of the model, while residual connectivity ensures that adding deep layers does not cause the model's performance to decline.
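A minimal, illustrative sketch of the two-path idea: one causal dilated convolution reads the sequence forward, a second reads it backward, and the two streams are stacked. The weights and shapes are assumptions for illustration, not the paper's architecture:

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """Causal dilated 1-D convolution: output t sees x[t], x[t-d], x[t-2d], ..."""
    T, K = len(x), len(w)
    y = np.zeros(T)
    for t in range(T):
        for k in range(K):
            idx = t - k * dilation
            if idx >= 0:
                y[t] += w[k] * x[idx]
    return y

def bidirectional_tcn_layer(x, w_f, w_b, dilation=1):
    """BiTCN idea: a forward path over x and a backward path over the
    reversed sequence; the two feature streams are stacked per step."""
    fwd = dilated_conv1d(x, w_f, dilation)
    bwd = dilated_conv1d(x[::-1], w_b, dilation)[::-1]
    return np.stack([fwd, bwd], axis=-1)
```

Note how step t of the forward stream depends only on past positions, while the backward stream gives the same step access to future positions; this is the latency trade-off mentioned above.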
By incorporating the modified adaptive optimization and Nesterov momentum models and introducing decoupled weight decay, the Adan optimizer (AO) is obtained22. In the optimization field, the Nesterov momentum model is the counterpart of the classic momentum model.
The Nesterov momentum model has a fast theoretical rate of convergence for smooth and generally convex problems and can conceptually handle large batch sizes. Rather than computing the gradient at the current point, the Nesterov method uses the momentum to find an extrapolation point, computes the gradient there, and then continues accumulating momentum. The most crucial drawback is that computing the gradient at the extrapolation point requires duplicating the model parameters while the current point is being updated and demands an artificial back-propagation (BP) pass at the extrapolation point.
To retain the full benefits of the Nesterov momentum model, the AO investigators derived the final AO by incorporating a modified Nesterov momentum into an adaptive optimizer with weight attenuation. To resolve the burden of duplicated model parameters, they first rewrote the momentum update, as presented in Eq. (14):
$m_k = (1-\beta_1)\,m_{k-1} + \beta_1\!\left[g_k + (1-\beta_1)(g_k - g_{k-1})\right],$
where $g_k$ denotes the gradient at step $k$ and $\beta_1$ the momentum decay rate.

Incorporating the rewritten Nesterov momentum into an adaptive-class optimizer, substituting the update of $m_k$ by the averaging form above, and utilizing the second-order moment $n_k$ to scale the learning rate leads to a simplified form of the AO method, as displayed in Eq. (15):
$n_k = (1-\beta_3)\,n_{k-1} + \beta_3\!\left[g_k + (1-\beta_1)(g_k-g_{k-1})\right]^2, \qquad \theta_{k+1} = \theta_k - \frac{\eta}{\sqrt{n_k}+\epsilon} \circ m_k.$

While the update of $m_k$ mixes the gradient with the gradient difference, in real applications it is often necessary to treat these two physically different quantities individually. Therefore, the investigators introduced the gradient-difference momentum $v_k$, as presented in Eq. (16):
$v_k = (1-\beta_2)\,v_{k-1} + \beta_2\,(g_k - g_{k-1}),$
where $\beta_2$ denotes its decay rate.
Following the concept of L2 regularisation decoupling, AO adopts a weight-attenuation tactic in which every iteration shrinks the parameters by a first-order estimate of the regularised objective $\frac{\lambda}{2}\lVert\theta\rVert^2$, as presented in Eq. (17):
$\theta_{k+1} = (1+\lambda\eta)^{-1}\,\bar{\theta}_{k+1},$
where $\lambda$ is the weight-decay coefficient and $\bar{\theta}_{k+1}$ the parameter value before decay.

Since L2 weight regularisation is very smooth, it is unnecessary to fold it into the first-order estimate of the training loss; the decay is applied separately instead. The final iteration of AO thus becomes Eq. (18):
$\eta_k = \frac{\eta}{\sqrt{n_k}+\epsilon}, \qquad \theta_{k+1} = (1+\lambda\eta)^{-1}\!\left[\theta_k - \eta_k \circ \big(m_k + (1-\beta_2)\,v_k\big)\right].$
Algorithm 1 illustrates the AO technique.
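As a hedged sketch, one step of the AO update can be written as follows; this follows the published Adan update rule, and the decay-rate defaults are illustrative rather than the paper's Table 2 values:

```python
import numpy as np

def adan_step(theta, g, g_prev, state, lr=1e-2, b1=0.02, b2=0.08, b3=0.01,
              wd=0.0, eps=1e-8):
    """One Adan-style update.

    m: first moment of gradients; v: moment of gradient differences;
    n: second moment that scales the learning rate; decoupled weight
    decay divides the result by (1 + wd * lr).
    """
    m, v, n = state
    diff = g - g_prev
    m = (1 - b1) * m + b1 * g                 # gradient momentum
    v = (1 - b2) * v + b2 * diff              # gradient-difference momentum
    u = g + (1 - b2) * diff
    n = (1 - b3) * n + b3 * u * u             # second-order moment
    eta = lr / (np.sqrt(n) + eps)             # per-coordinate step size
    theta = (theta - eta * (m + (1 - b2) * v)) / (1 + wd * lr)
    return theta, (m, v, n)
```

Iterating this step on a simple quadratic loss drives the parameter toward the minimum without ever computing a gradient at an extrapolated point, which is the practical advantage over vanilla Nesterov momentum.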

Table 2 summarises the key hyperparameters used in the AO model. The learning rate controls how much the model weights are updated during training and is set to 0.01 for balanced convergence. The $\beta_1$, $\beta_2$, and $\beta_3$ parameters are decay rates that regulate the momentum, gradient-difference, and adaptive second-moment components, respectively, helping the optimizer smooth updates and adapt to changing gradients. Lastly, $\epsilon$ is a small constant added for numerical stability to prevent division by zero during calculations. These hyperparameters collectively ensure stable and efficient model training with AO.

The final AO model is obtained by incorporating the two developments above, Eqs. (15) and (17), into the original version of AO. The fitness function (FF) choice is a significant aspect governing the performance of the AO model. The hyperparameter search uses a solution-encoding approach to estimate the efficacy of each candidate solution. The AO system prioritizes precision as the foremost standard in designing the FF, as demonstrated below:
$FF = \max(prec_n), \qquad prec_n = \frac{TP}{TP + FP},$
where $TP$ and $FP$ characterize the true positive and false positive values.
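The FF above amounts to computing precision from prediction counts; a minimal illustrative helper (names assumed):

```python
import numpy as np

def precision_fitness(y_true, y_pred, positive=1):
    """Fitness used to rank candidate solutions: precision = TP / (TP + FP).

    Returns 0.0 when the classifier makes no positive predictions, so the
    value is always defined during the search.
    """
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    return tp / (tp + fp) if (tp + fp) else 0.0
```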

Experimental validation

Dataset details
The performance simulation of the LCCD-TCFEDRL model is examined under the LCC HIs dataset23. The method runs on Python 3.6.5 with an i5-8600k CPU, 4GB GPU, 16GB RAM, 250GB SSD, and 1 TB HDD, using a 0.01 learning rate, ReLU, 50 epochs, 0.5 dropout, and batch size 5. The dataset comprises 25,000 samples under five class names as signified below in Table 3. Figure 3 specifies the sample images. Figure 4 illustrates the five different classes of LCC images.

Result analysis
Figure 5 depicts the confusion matrices created by the LCCD-TCFEDRL methodology on the 80%:20% and 70%:30% TRPHE/TSPHE splits of the LCC HIs dataset. The figure illustrates that most samples are correctly classified into their respective categories. The low off-diagonal values (0.2%) depict minimal misclassifications, and this consistency across both TRPHE and TSPHE under the 80:20 and 70:30 splits highlights the robustness and reliability of the model, illustrating that it generalizes well without significant overfitting or data leakage. The outcomes state that the LCCD-TCFEDRL model effectively recognizes each class.

Table 4 and Fig. 6 present the LCC detection results of the LCCD-TCFEDRL model at the 80:20 and 70:30 splits of the LCC HIs dataset. On 80% TRPHE, the LCCD-TCFEDRL model obtains an average accuracy of 99.36%, precision of 98.40%, recall of 98.40%, F-score of 98.40%, 99.00% on the further reported metric, and a Kappa of 99.07%. Besides, on 20% TSPHE, the LCCD-TCFEDRL methodology obtains an average accuracy of 99.35%, precision of 98.37%, recall of 98.37%, F-score of 98.37%, 98.98% on the further metric, and a Kappa of 99.06%. Also, on 70% TRPHE, the LCCD-TCFEDRL methodology obtains an average accuracy of 99.19%, precision of 97.97%, recall of 97.97%, F-score of 97.97%, 98.73% on the further metric, and a Kappa of 98.80%. At last, on 30% TSPHE, the LCCD-TCFEDRL methodology obtains an average accuracy of 99.20%, precision of 98.00%, recall of 98.00%, F-score of 98.00%, 98.75% on the further metric, and a Kappa of 98.81%.

In Fig. 7, the training (TRAIN) and validation (VALID) outcomes of the LCCD-TCFEDRL method at the 80:20 split of the LCC HIs dataset are displayed. The figure indicates that the TRAIN and VALID values demonstrate upward tendencies, specifying the potential of the LCCD-TCFEDRL methodology with outstanding performance across numerous iterations. Furthermore, the TRAIN and VALID curves remain close throughout the epochs, signifying minimal over-fitting and demonstrating the excellent performance of the LCCD-TCFEDRL methodology.

In Fig. 8, the TRAIN and VALID loss curves of the LCCD-TCFEDRL approach on the 80:20 split of the LCC HIs dataset are depicted. The TRAIN and VALID values show downward tendencies, indicating the capability of the LCCD-TCFEDRL approach to balance fitting across the two sets. The persistent decline further confirms the heightened performance of the LCCD-TCFEDRL methodology and progressively refines the prediction outcomes.

In Fig. 9, the precision-recall (PR) analysis of the LCCD-TCFEDRL model at the 80:20 split of the LCC HIs dataset provides insight into its performance by mapping precision against recall for each class. The figure illustrates that the LCCD-TCFEDRL approach consistently attains improved PR values among the different classes, signifying its capability to maintain a high proportion of true positive predictions. The consistent improvement in PR results across all classes denotes the efficacy of the LCCD-TCFEDRL approach in the classification process.

In Fig. 10, the ROC analysis of the LCCD-TCFEDRL methodology on the 80:20 split of the LCC HIs dataset is inspected. The findings indicate that the LCCD-TCFEDRL methodology accomplishes elevated ROC outcomes for all classes, showing a strong capability to distinguish the classes. This consistent trend of increased ROC values across numerous classes implies the efficacious performance of the LCCD-TCFEDRL model in class prediction, underlining the robustness of the classification process.

Comparative discussion under diverse datasets
Table 5 and Fig. 11 depict the comparative study of the LCCD-TCFEDRL technique with current methodologies on various metrics under the LCC HIs dataset22,24. The outcomes emphasize that existing methods, such as CNN, AlexNet, ResNet, LeNet, GoogleNet, ResNet50 + SVM RBF, and DITNN, show the worst performance, whereas the LCCD-TCFEDRL model attained a higher accuracy, precision, recall, and F-score of 99.36%, 98.40%, 98.40%, and 98.40%, correspondingly.

Table 6 and Fig. 12 indicate the comparison assessment of the LCCD-TCFEDRL technique with existing models under the LCC dataset25,26. The Graph-Sparse Principal Component Analysis Network (GS-PCANET) technique illustrated scores of 90.80%, 93.59%, 91.82%, and 95.43% on the four reported metrics. Likewise, the Densely Connected Convolutional Network (DenseNet169) approach registered 95.00%, 90.48%, 95.13%, and 93.37%. Furthermore, the ColonNet approach indicated 93.60%, 91.48%, 94.95%, and 90.57%, while CNN reached 90.00%, 90.58%, 93.38%, and 90.06%. Additionally, the CNN with Efficient Channel Attention Network (CNN + ECA-Net) approach portrayed 94.01%, 91.27%, 95.75%, and 93.95%. However, the LCCD-TCFEDRL approach highlighted superior values of 98.49%, 98.50%, 98.34%, and 98.49%.

Table 7 and Fig. 13 specify the comparison evaluation of the LCCD-TCFEDRL methodology with existing approaches under the LCC image dataset27,28. The CNN with convolutional block attention module (CNN + CBAM) methodology attained 96.92%, 94.70%, 94.87%, and 85.63% on the four reported metrics, while DenseNet121 obtained 87.00%, 93.87%, 95.77%, and 86.13%. Likewise, VGG16 attained lesser values of 89.73%, 87.96%, 89.49%, and 96.09%. Additionally, InceptionV3 achieved 92.11%, 85.44%, 91.79%, and 94.63%, while Xception reported 93.23%, 88.81%, 89.51%, and 95.29%. However, the LCCD-TCFEDRL model emphasized superior values of 97.83%, 97.83%, 97.84%, and 97.89%.

In Table 8 and Fig. 14, the processing time (PT) of the LCCD-TCFEDRL model is compared with the existing models. The LCCD-TCFEDRL method obtains a lower PT of 6.73 s, while the CNN, AlexNet, ResNet, LeNet, GoogleNet, ResNet50 + SVM RBF, and DITNN approaches obtain greater PTs of 12.28 s, 10.75 s, 13.14 s, 9.39 s, 11.14 s, 10.62 s, and 13.28 s, respectively.

Table 9 and Fig. 15 demonstrate the error analysis of the LCCD-TCFEDRL approach against existing models. The LeNet model recorded the highest error of 23.40% on one metric while showing comparatively lower errors of 7.55%, 6.55%, and 4.98% on the others. ResNet followed with errors of 6.23%, 6.55%, 3.47%, and 7.66%, indicating imbalanced predictive capabilities. Other models such as CNN, AlexNet, GoogleNet, and DITNN recorded errors below 10%, ranging from 3.23% to 7.66%, illustrating limited classification reliability. The LCCD-TCFEDRL technique recorded an error of only 0.64% on one metric and 1.60% on all other metrics, further highlighting its superior efficiency. These results underline the challenges in existing models and emphasize the requirement for robust solutions.

Table 10 and Fig. 16 indicate the ablation study of the LCCD-TCFEDRL methodology. The baseline BiTCN model attained an accuracy of 97.49%, precision of 96.46%, recall of 96.13%, and F-score of 96.14%. When CoAtNet is integrated for feature extraction without optimization, the performance improves to an accuracy of 98.16%, precision of 96.96%, recall of 96.85%, and F-score of 96.94%. Conversely, incorporating the AO optimizer without the feature extractor results in an accuracy of 98.71%, precision of 97.63%, recall of 97.60%, and F-score of 97.72%. Finally, the full LCCD-TCFEDRL model, which incorporates both feature extraction and optimization, achieves the best performance with an accuracy of 99.36%, precision of 98.40%, recall of 98.40%, and F-score of 98.40%, clearly demonstrating the benefit of integrating both components. Compared to the other configurations, the LCCD-TCFEDRL model consistently outperforms across all metrics, validating the efficiency of the model and its components in enhancing classification accuracy and robustness.

Table 11 specifies the analysis of the LCCD-TCFEDRL technique against various recent models in terms of FLOPs, GPU memory usage, and inference time29. The LCCD-TCFEDRL technique indicates superior efficiency with the lowest FLOPs of 0.34G, GPU memory usage of 879 MB, and the fastest inference time of 4.98 s. In contrast, models such as ResNet50 and Attention-InceptionResNet-V2 exhibit significantly higher FLOPs of 9.13G and 15.00G and longer inference times of 16.17 s and 8.33 s, respectively. Even lightweight models such as MobileNet and MobileNetV2 show higher resource demands and slower inference than the LCCD-TCFEDRL method, validating its effectiveness for fast, resource-efficient deployment.

Conclusion
In this paper, an LCCD-TCFEDRL methodology is proposed using HI analysis. The aim is to develop a successful diagnostic technique for LCC using advanced analytical methods to improve early detection and treatment outcomes. Initially, the GIF model is employed in the image pre-processing step to increase the quality of images by eliminating noise. Additionally, the CoAtNet model is used for the feature extraction process. Finally, the BiTCN with AO is implemented for classification. The LCCD-TCFEDRL methodology is evaluated on the LCC HIs dataset, and the comparison study portrayed a superior accuracy value of 99.36% over existing models.

Source: PubMed Central (JATS). Licensing follows the original publisher's policy; please cite the original article.
