Diagnosis of liver fibrosis using a 3D CNN model based on ultrasound dynamic videos

Meng Wenyi; Li Renjie; He Danni; Mao Muyi; Li Wentao; Xu Zuofeng

doi:10.12464/j.issn.0253-9802.2026-0255

Abstract:

Objective To explore the value of a three-dimensional convolutional neural network (3D-CNN) model based on dynamic ultrasound videos for the non-invasive diagnosis of liver fibrosis.

Methods Ultrasound videos and static images of patients with liver fibrosis who visited the Seventh Affiliated Hospital of Sun Yat-sen University between December 13, 2023, and September 23, 2025, were retrospectively collected and randomly divided into training, validation, and test sets at a ratio of 7:1.5:1.5. Diagnostic models for liver fibrosis staging were constructed from static and dynamic ultrasound images using a 2D convolutional neural network (2D-CNN), a 2D-CNN combined with a long short-term memory network (2D-CNN+LSTM), and a 3D-CNN. These models were compared with commonly used non-invasive diagnostic indicators such as shear wave elastography (SWE), the fibrosis-4 (FIB-4) index, and the aspartate aminotransferase-to-platelet ratio index (APRI). Five-fold cross-validation on the training set was used for model selection and parameter tuning. Final model performance was evaluated on the independent test set by calculating the mean values of the area under the receiver operating characteristic curve (AUC), accuracy, and other metrics. Model interpretability and features were analyzed and visualized with gradient-weighted class activation mapping (Grad-CAM).
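The split and comparator calculations described in the Methods can be sketched in Python. This is an illustrative sketch under stated assumptions, not the authors' code: it assumes a patient-level split with integer ratios 70:15:15 in which the test set takes the rounding remainder (this reproduces the reported 77/16/17 counts for 110 patients), a contiguous five-fold partition of the training set, and a hypothetical random seed. All function names are hypothetical. `fib4` and `apri` follow the standard definitions of the two indices; the AST upper limit of normal used for APRI is passed as a parameter because the abstract does not state the value used.

```python
import math
import random


def split_patients(patient_ids, ratios=(70, 15, 15), seed=0):
    """Randomly split patient IDs into train/validation/test lists (hypothetical helper).

    Ratios are integers out of their sum (7:1.5:1.5 -> 70:15:15) to avoid
    floating-point rounding; the test set absorbs any remainder.
    """
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # hypothetical seed; the abstract does not report one
    n = len(ids)
    total = sum(ratios)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test


def k_fold(ids, k=5):
    """Partition a list into k contiguous folds whose sizes differ by at most one."""
    base, extra = divmod(len(ids), k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)  # the first `extra` folds get one more item
        folds.append(ids[start:start + size])
        start += size
    return folds


def fib4(age_years, ast_u_l, alt_u_l, platelets_1e9_l):
    """FIB-4 = (age [years] x AST [U/L]) / (platelets [10^9/L] x sqrt(ALT [U/L]))."""
    return (age_years * ast_u_l) / (platelets_1e9_l * math.sqrt(alt_u_l))


def apri(ast_u_l, ast_uln_u_l, platelets_1e9_l):
    """APRI = (AST / AST upper limit of normal) / platelets [10^9/L] x 100."""
    return (ast_u_l / ast_uln_u_l) / platelets_1e9_l * 100


if __name__ == "__main__":
    train, val, test = split_patients(range(110))
    print(len(train), len(val), len(test))
    print([len(f) for f in k_fold(train)])
```

Running the script prints `77 16 17` and `[16, 16, 15, 15, 15]`. Folds stratified by fibrosis stage would be a reasonable alternative with so few patients, but the abstract does not say whether stratification was used.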

Results A total of 110 patients were included: 77 in the training set, 16 in the validation set, and 17 in the test set. The AUC of the 3D-CNN model was 0.847, higher than those of the 2D-CNN+LSTM model (0.761) and the 2D-CNN model (0.753). For identifying significant liver fibrosis, the 3D-CNN model achieved an accuracy of 76.47%, a precision of 78.03%, a recall of 75.69%, and an F1 score of 75.71%. It achieved a better balance between precision and recall, and its F1 score was higher than those of the other models and of the commonly used non-invasive diagnostic indicators. The 3D-CNN model converged well during training, showing high training stability and generalization ability. Grad-CAM showed that the 3D-CNN model focused strongly on the liver capsule region, which closely overlapped with the region attended to in clinical diagnosis.
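The abstract does not report a confusion matrix or say how precision, recall, and F1 were averaged. As a sketch, assuming binary classification (significant vs non-significant fibrosis) with macro averaging over the two classes, the hypothetical 2×2 matrix below reproduces all four reported values on 17 test patients; it is not taken from the study and is not necessarily the only matrix that fits. It also shows why the reported F1 (75.71%) is below the harmonic mean of the reported precision and recall (about 76.84%): macro F1 averages per-class F1 values, and because the harmonic mean is concave, that average cannot exceed the F1 recomputed from the averaged precision and recall. The function name `macro_metrics` is hypothetical.

```python
def macro_metrics(confusion):
    """Accuracy plus macro-averaged precision, recall and F1 (hypothetical helper).

    confusion[i][j] = number of patients whose true class is i and predicted class is j.
    """
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(k))
    precisions, recalls, f1s = [], [], []
    for c in range(k):
        tp = confusion[c][c]
        predicted_c = sum(confusion[r][c] for r in range(k))  # column sum
        actual_c = sum(confusion[c])  # row sum
        p = tp / predicted_c if predicted_c else 0.0
        r = tp / actual_c if actual_c else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0  # per-class F1
        precisions.append(p)
        recalls.append(r)
        f1s.append(f1)
    return {
        "accuracy": correct / total,
        "precision": sum(precisions) / k,  # unweighted mean over classes
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,  # mean of per-class F1, not 2PR/(P+R)
    }


# Hypothetical matrix, NOT reported in the abstract: rows = true class, columns =
# predicted class, order [significant fibrosis, non-significant fibrosis].
example = [[5, 3],
           [1, 8]]

if __name__ == "__main__":
    m = macro_metrics(example)
    for name, value in m.items():
        print(f"{name}: {value:.2%}")
    # Harmonic mean of macro precision and recall, for comparison with macro F1
    p, r = m["precision"], m["recall"]
    print(f"2PR/(P+R): {2 * p * r / (p + r):.2%}")
```

Running it prints accuracy 76.47%, precision 78.03%, recall 75.69%, f1 75.71%, and 2PR/(P+R) 76.84%.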

Conclusions The deep learning model based on 3D dynamic videos showed better overall performance for diagnosing liver fibrosis than models based on 2D static images and than commonly used non-invasive diagnostic indicators such as SWE, FIB-4, and APRI, suggesting good potential as a clinical diagnostic aid.
