Abstract:
Objective To explore the value of a three-dimensional convolutional neural network (3D-CNN) model based on dynamic ultrasound videos for the non-invasive diagnosis of liver fibrosis.
Methods Ultrasound videos and static images of patients with liver fibrosis who visited the Seventh Affiliated Hospital of Sun Yat-sen University between December 13, 2023, and September 23, 2025, were retrospectively collected, and the patients were randomly divided into training, validation, and test sets at a 7:1.5:1.5 ratio. Diagnostic models for liver fibrosis staging were constructed from static and dynamic ultrasound images using a 2D convolutional neural network (2D-CNN), a 2D-CNN combined with a long short-term memory network (2D-CNN+LSTM), and a 3D-CNN. These models were compared with commonly used non-invasive diagnostic indicators: shear wave elastography (SWE), the fibrosis-4 (FIB-4) index, and the aspartate aminotransferase-to-platelet ratio index (APRI). Five-fold cross-validation on the training set was used for model selection and parameter tuning. Final model performance was evaluated on an independent test set using the area under the receiver operating characteristic curve (AUC) and accuracy. Model interpretability was analyzed, and learned features were visualized, using Gradient-weighted Class Activation Mapping (Grad-CAM).
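For readers unfamiliar with the two serum-based comparators, the following is a minimal sketch of the standard FIB-4 and APRI formulas. The function names, argument names, and patient values are hypothetical and chosen for illustration only; the abstract does not state which upper limit of normal (ULN) for AST or which cut-offs the authors used.

```python
import math


def fib4(age_years, ast_u_l, alt_u_l, platelets_1e9_l):
    """FIB-4 = (age x AST) / (platelet count x sqrt(ALT)).

    Age in years, AST and ALT in U/L, platelet count in 10^9/L.
    """
    return (age_years * ast_u_l) / (platelets_1e9_l * math.sqrt(alt_u_l))


def apri(ast_u_l, ast_uln_u_l, platelets_1e9_l):
    """APRI = (AST / ULN of AST) x 100 / platelet count.

    The ULN is laboratory-specific; it is passed in rather than assumed.
    """
    return (ast_u_l / ast_uln_u_l) * 100.0 / platelets_1e9_l


# Hypothetical patient, not taken from the study.
print(fib4(age_years=50, ast_u_l=80, alt_u_l=64, platelets_1e9_l=100))  # 5.0
print(apri(ast_u_l=80, ast_uln_u_l=40, platelets_1e9_l=100))            # 2.0
```

Both indices need only routine blood test values, which is why they are common baselines for imaging-based models.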
Results A total of 110 patients were included, with 77 in the training set, 16 in the validation set, and 17 in the test set. The 3D-CNN model achieved an AUC of 0.847, significantly higher than those of the 2D-CNN+LSTM model (0.761) and the 2D-CNN model (0.753). For significant liver fibrosis, the 3D-CNN model achieved an accuracy of 76.47%, a precision of 78.03%, a recall of 75.69%, and an F1 score of 75.71%. It struck a better balance between recall and precision, and its F1 score was higher than those of the other models and of the commonly used non-invasive diagnostic indicators. Training of the 3D-CNN model converged well and remained stable, and the model generalized well. Grad-CAM visualization showed that the 3D-CNN model concentrated on the liver capsule region, which closely matched the region of interest in clinical diagnosis.
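The reported accuracy of 76.47% is consistent with 13 of 17 test cases being classified correctly (13/17 ≈ 0.7647), assuming one prediction per test patient. The sketch below shows how accuracy and macro-averaged precision, recall, and F1 can be computed from predicted labels. Macro-averaging is an assumption here, since the abstract does not state the averaging scheme, and the labels in the example are hypothetical, not study data.

```python
def macro_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1s = [], [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    k = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / k,
        "recall": sum(recalls) / k,
        "f1": sum(f1s) / k,
    }


# Hypothetical labels: 1 = significant fibrosis, 0 = no significant fibrosis.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(macro_metrics(y_true, y_pred))

# Arithmetic check on the reported accuracy for a 17-patient test set.
print(round(13 / 17 * 100, 2))  # 76.47
```

With a test set this small, each misclassified patient shifts accuracy by about 5.9 percentage points, which is worth keeping in mind when comparing models.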
Conclusions The 3D-CNN deep learning model based on dynamic ultrasound videos shows better overall performance in diagnosing liver fibrosis than models based on 2D static images and commonly used non-invasive diagnostic indicators such as SWE, FIB-4, and APRI, and it shows strong potential as a clinical diagnostic aid.