AI Rivals Expert Radiologists at Detecting Brain Hemorrhages

Richly Annotated Training Data Vastly Improves Deep Learning Algorithm’s Accuracy

By Laura Kurtzman

An algorithm developed by scientists at UC San Francisco and UC Berkeley did better than two out of four expert radiologists at finding tiny brain hemorrhages in head scans – an advance that one day may help doctors treat patients with traumatic brain injuries (TBI), strokes and aneurysms.

A deep learning algorithm recognizes abnormal CT scans of the head in neurological emergencies in 1 second. The algorithm also classifies the pathological subtype of each abnormality: red - subarachnoid hemorrhage, purple - contusion, green - subdural hemorrhage.

The continued increase in diagnostic imaging studies, including 3D imaging studies such as computed tomography (CT), means that radiologists are looking at thousands of images each day, searching for tiny abnormalities that can signal life-threatening emergencies. The number of images from each brain scan can be so large that on a busy day, radiologists may opt to scroll through some large 3D stacks of images using mice with frictionless wheels, almost like viewing a movie. But it could be much more efficient – and potentially more accurate – if artificial intelligence technology could pick out the images with significant abnormalities, so radiologists could examine them more closely.

“We wanted something that was practical, and for this technology to be useful clinically, the accuracy level needs to be close to perfect,” said Esther Yuh, MD, PhD, associate professor of radiology at UCSF and co-corresponding author of the study, published the week of Oct. 21, 2019, in Proceedings of the National Academy of Sciences (PNAS). “The performance bar is high for this application, due to the potential consequences of a missed abnormality, and people won’t tolerate less than human performance or accuracy.”

The algorithm the team developed took just one second to determine whether an entire head scan contained any signs of hemorrhage. It also traced the detailed outlines of the abnormalities it found – demonstrating their location within the brain’s 3D structure. Some spots may be on the order of 100 pixels in size, in a 3D stack of images containing over a million pixels, and even expert radiologists sometimes miss them, with potentially grave consequences.

The algorithm found some small abnormalities that the experts missed. It also noted their location within the brain, and classified them according to subtype, information that physicians need to determine the best treatment. And the algorithm provided all of this information with an acceptable level of false positives – minimizing the amount of time that physicians would need to spend reviewing its results.

Yuh said one of the hardest things to achieve with the AI technology was the ability to determine whether an entire exam, consisting of a 3D “stack” of approximately 30 images, was normal. 

Esther Yuh, MD, PhD, co-corresponding author of the study.

“Achieving 95 percent accuracy on a single image, or even 99 percent, is not OK, because in a series of 30 images, you’ll make an incorrect call on one of every 2 or 3 scans,” she said. “To make this clinically useful, you have to get all 30 images correct – what we call exam level accuracy. If a computer is pointing out a lot of false positives, it will slow the radiologist down, and may lead to more errors.”
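The arithmetic behind exam-level accuracy can be sketched in a few lines of Python. This is a rough illustration rather than the study's own calculation: it assumes that errors on individual images are independent, which real scans may not satisfy, and the function name is hypothetical.

```python
def exam_level_accuracy(per_image_accuracy: float, images_per_exam: int = 30) -> float:
    """Probability that every image in an exam is called correctly.

    Simplifying assumption (not from the study): errors on different
    images are independent, so the per-image probabilities multiply.
    """
    return per_image_accuracy ** images_per_exam


# Compare a few per-image accuracy levels for a 30-image exam.
for acc in (0.95, 0.99, 0.999):
    exam_acc = exam_level_accuracy(acc)
    print(f"per-image {acc:.1%} -> exam-level {exam_acc:.1%}")
```

Under that assumption, 95 percent per-image accuracy leaves only about one exam in five entirely correct, and even 99 percent leaves roughly a quarter of exams with at least one wrong call.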

The radiology experts said the algorithm’s ability to find very small abnormalities and demonstrate their location in the brain was a substantial advance.

“The hemorrhage can be tiny and still be significant,” said Pratik Mukherjee, MD, PhD, professor of radiology at UCSF. “That’s what makes a radiologist’s job so hard, and that’s why these things occasionally get missed. If a patient has an aneurysm, and it’s starting to bleed, and you send them home, they can die.”

Jitendra Malik, PhD, the Arthur J. Chick Professor of Electrical Engineering and Computer Sciences at Berkeley, said the key was choosing which data to feed into the model. The new study made use of a type of deep learning model known as a fully convolutional neural network, or FCN, which the team trained on a relatively small number of images, in this case 4,396 CT exams. But the training images used by the researchers were packed with information, because each small abnormality was manually delineated at the pixel level. The richness of this data – along with other steps that prevented the model from misinterpreting random variations or “noise” as meaningful – created an extremely accurate algorithm.

The scientists could have fed the network an entire stack of images, or one complete image, all at once. Instead, they fed it only a portion, or “patch,” of an image at a time, contextualizing each patch with the images that directly preceded and followed it in the stack. Viewing an image in patches is also how people read text or look at a computer screen, and this enabled the network to learn from the relevant information in the data without “overfitting,” that is, drawing conclusions based on insignificant variations that were also present in the data. They called their model PatchFCN.
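The idea of a patch with its neighboring slices as context can be sketched with NumPy. This is a minimal illustration, not the PatchFCN code: the function name, the 64-pixel patch size, the 512-by-512 slice size and the handling of the first and last slices are all assumptions made for the example.

```python
import numpy as np


def extract_patch_with_context(stack, slice_idx, top, left, patch_size=64):
    """Return a (3, patch_size, patch_size) array holding the same in-plane
    patch from the previous, current, and next slice of a 3D stack.

    Assumption for this sketch: at the first or last slice, the nearest
    available slice is reused as the missing neighbor.
    """
    depth = stack.shape[0]
    neighbors = [max(slice_idx - 1, 0), slice_idx, min(slice_idx + 1, depth - 1)]
    return stack[neighbors, top:top + patch_size, left:left + patch_size]


# Synthetic stand-in for one exam: 30 slices of 512 x 512 pixels (sizes assumed).
stack = np.random.default_rng(0).normal(size=(30, 512, 512)).astype(np.float32)
patch = extract_patch_with_context(stack, slice_idx=10, top=128, left=256)
print(patch.shape)  # (3, 64, 64)
```

Each patch covers only a small region of one image, yet it still carries the slices directly before and after it, so a model sees local detail together with its immediate 3D context.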

“We took the approach of marking out every abnormality – that’s why we had much, much better data,” said Malik, a co-corresponding author of the study. “Then we made the best use possible of that data. That’s how we achieved success.”

A noted computer vision expert, Malik said he gets many more requests to collaborate on research than he can honor, but he agreed to work on the project because of its great potential to help patients.

The authors are now applying the algorithm to CT scans from trauma centers across the country that are enrolled in a research study led by UCSF’s Geoffrey Manley, MD, PhD, a professor of neurosurgery in the UCSF Department of Neurological Surgery and member of the UCSF Weill Institute for Neurosciences.

“Given the large number of people who suffer from traumatic brain injury every day and are rushed to the emergency department, this has very big clinical importance,” Malik said. “That convinced me to work on this problem.”

Authors: Weicheng Kuo, PhD, Christian Häne, PhD, and Jitendra Malik, PhD, of UC Berkeley; and Esther Yuh, MD, PhD, and Pratik Mukherjee, MD, PhD, of UCSF.

Funding: This study was supported in part by the California Initiative to Advance Precision Medicine (California Governor’s Office of Planning and Research). C.H. also received funding from Swiss National Science Foundation Early Postdoc. Mobility Fellowship 165245. Amazon Web Services provided computing time.

Disclosures: E.L.Y. and P.M. are named inventors on US Patent and Trademark Office No. 62/269,778, “Interpretation and Quantification of Emergency Features on Head Computed Tomography,” filed by the Regents of the University of California. W.K., C.H., P.M., J.M., and E.L.Y. currently have a provisional patent application titled “Expert-Level Detection of Acute Intracranial Hemorrhage on Head CT scans” and a patent application under review titled “Interpretation and Quantification of Emergency Features on Head Computed Tomography.”

The University of California, San Francisco (UCSF) is exclusively focused on the health sciences and is dedicated to promoting health worldwide through advanced biomedical research, graduate-level education in the life sciences and health professions, and excellence in patient care. UCSF Health, which serves as UCSF’s primary academic medical center, includes top-ranked specialty hospitals and other clinical programs, and has affiliations throughout the Bay Area.