Assessing the generalizability of temporally-coherent echocardiography video segmentation
Yida Chen ’22
Xiaoyan Zhang, Christopher M. Haggerty
Joshua Stough, Computer Science Department
Ciffolillo Healthcare Technology Inventors Program
Existing deep-learning methods achieve state-of-the-art segmentation of multiple heart substructures from 2D echocardiography videos, an important step in the diagnosis and management of cardiovascular disease. However, these methods generally perform frame-level segmentation and ignore the temporal coherence of heart motion between frames, which is a useful signal in clinical protocols. In this study, we implement temporally consistent video segmentation, which has recently been shown to improve performance on the multi-structure annotated CAMUS dataset. We show that data augmentation further improves results, yielding performance consistent with prior state-of-the-art work. Our 10-fold cross-validation shows that video segmentation improves agreement with clinical indices, with smaller median absolute errors for left ventricular end-diastolic volume (6.4 ml), end-systolic volume (4.2 ml), and ejection fraction (EF, 3.5%). In segmenting key cardiac structures, video segmentation achieves a mean Dice overlap of 0.93 on the left ventricular endocardium, 0.95 on the left ventricular epicardium, and 0.88 on the left atrium. To assess clinical generalizability, we further apply the CAMUS-trained video segmentation models, without tuning, to a larger, recently published clinical dataset, EchoNet-Dynamic. On the 1274 patients in its test set, we obtain absolute errors in EF of 6.3% ± 5.4%, confirming the reliability of this scheme. Because the EchoNet-Dynamic videos are annotated only for the left ventricular endocardium, this effort extends generalizable, multi-structure video segmentation to a large clinical dataset at little cost.
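The results above are reported as Dice overlap, ejection fraction, and median absolute error. The following is a minimal sketch of how these metrics can be computed. It assumes binary masks stored as nested lists of 0/1 values and volumes already estimated in ml. The volume-estimation step is not described in this abstract and is not shown. The function names `dice`, `ejection_fraction`, and `median_abs_error` are hypothetical illustrations, not the authors' evaluation code.

```python
from statistics import median


def dice(pred, truth):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) for two same-shaped binary masks.

    Masks are assumed (for this sketch) to be nested lists of 0/1 values.
    """
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    inter = 0  # pixels labeled foreground in both masks
    total = 0  # foreground pixel count of pred plus that of truth
    for row_p, row_t in zip(pred, truth):
        for p, t in zip(row_p, row_t):
            inter += 1 if (p and t) else 0
            total += (1 if p else 0) + (1 if t else 0)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * inter / total


def ejection_fraction(edv_ml, esv_ml):
    """EF in percent from end-diastolic (EDV) and end-systolic (ESV) volumes."""
    return 100.0 * (edv_ml - esv_ml) / edv_ml


def median_abs_error(predicted, reference):
    """Median of |predicted - reference| over paired measurements."""
    return median(abs(p - r) for p, r in zip(predicted, reference))


if __name__ == "__main__":
    pred = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]   # 4 foreground pixels
    truth = [[0, 1, 1], [0, 1, 0], [0, 0, 0]]  # 3 foreground pixels, 3 shared
    print(dice(pred, truth))              # 2*3 / (4+3) = 6/7
    print(ejection_fraction(120.0, 50.0)) # 100 * 70 / 120
```

In this example the two masks share 3 foreground pixels, so Dice is 6/7 ≈ 0.857. EF errors such as the 6.3% figure above are differences in percentage points between a predicted and a reference EF.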