FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark

Publication Year: 2021
Publication Type: JournalArticle


The automatic generation of long and coherent medical reports from medical images (e.g., chest X-ray and fundus fluorescein angiography (FFA)) has great potential to support clinical practice. Researchers have explored advanced methods from computer vision and natural language processing that incorporate medical domain knowledge to generate readable medical reports. However, existing medical report generation (MRG) benchmarks lack both explainable annotations and reliable evaluation tools, which hinders research in two ways. First, existing methods can only predict reports without an accurate explanation, undermining the trustworthiness of the diagnostic methods. Second, comparing the reports predicted by different MRG methods is unreliable when it relies on natural-language generation (NLG) evaluation metrics. To address these issues, we propose an explainable and reliable MRG benchmark based on FFA Images and Reports (FFA-IR). Specifically, FFA-IR is large, with 10,790 reports and 1,048,584 FFA images from clinical practice; it includes explainable annotations based on a schema of 46 categories of lesions; and it is bilingual, providing both English and Chinese reports for each case. In addition to the widely used NLG metrics, we propose a set of nine human evaluation criteria for assessing the generated reports. We envision FFA-IR as a testbed for explainable and reliable medical report generation. We also hope that it can broadly accelerate medical imaging research and facilitate interaction between the fields of medical imaging, computer vision, and natural language processing.


@inproceedings{li2021ffa,
    title={FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark},
    author={Li, Mingjie and Cai, Wenjia and Liu, Rui and Weng, Yuetian and Zhao, Xiaoyun and Wang, Cong and Chen, Xin and Liu, Zhong and Pan, Caineng and Li, Mengke and others},
    booktitle={Thirty-fifth Conference on Neural Information Processing Systems Datasets and Benchmarks Track (Round 2)},
    year={2021}
}



© 2021 Flora Salim - CRUISE Research Group.