The integration of artificial intelligence (AI) algorithms into radiology practices is rapidly increasing, with many promising applications on the horizon. However, successfully adopting these technologies requires more than just implementation – it necessitates a robust strategy for monitoring the performance of AI models after they have been deployed in clinical workflows.

This was the focus of a recent webinar hosted by the Society for Imaging Informatics in Medicine (SIIM), titled “AI-Enabled Radiology Practices: Strategies for Effective Post-Production Monitoring of AI Algorithms.” The webinar featured insightful presentations from experts at leading healthcare institutions, who shared their approaches and lessons learned in monitoring AI algorithms in clinical practice.

Setting the Stage: An Overview of AI in Radiology

To provide context, Dr. Timothy Kline, Assistant Professor of Radiology at Mayo Clinic, and the webinar’s moderator, outlined the current landscape of AI applications in radiology. Many of the algorithms currently in use target what he termed “low-hanging fruit” tasks, such as worklist prioritization, image denoising, object detection, and segmentation.

However, the field is rapidly progressing toward more ambitious “moonshot” applications, including differentiating benign from malignant tumors, predicting tumor subtypes, early disease detection, and personalized treatment planning – all based on radiological images.

Integrating these tools into clinical practice requires careful consideration of how the AI will function – as a first or second reader, for triaging and pre-screening, or potentially even in a replacement mode. However, Dr. Kline acknowledged that removing human radiologists from the loop entirely is still a distant prospect.

At Mayo Clinic, Dr. Kline highlighted the importance of a collaborative approach involving radiologists, data scientists, technologists, educators, and business analysts to ensure seamless integration and provide high-quality metrics for monitoring algorithm performance.

Post-Production Monitoring: A Governance Perspective

Amina Elahi, Information Systems Application Manager at Penn Medicine, emphasized the critical role of governance in ensuring the success and sustainability of an AI program. Penn Medicine’s diverse AI governance committee oversees every aspect of the process, from approving product demonstrations to monitoring performance after deployment.

Elahi shared an example of Penn Medicine’s AI-enabled pathway for analyzing and diagnosing liver steatosis (fatty liver disease) from abdominal CT scans. This end-to-end approach begins with an HL7 message sent from the electronic health record (EHR) to the vendor-neutral archive (VNA), which then instructs the AI server to process the images. The AI server generates image overlays, summary statistics, and structured reports, which are seamlessly integrated into the radiologist’s reporting workflow.
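The orchestration Elahi described can be sketched as a simple event-driven pipeline. The snippet below is a minimal illustration, not Penn Medicine’s actual implementation; the endpoint URLs, the HL7 field position, and the REST-style calls are all assumptions for demonstration.

```python
import requests  # illustrative HTTP transport; real integrations often use DICOM or vendor APIs

# Hypothetical endpoints -- not Penn Medicine's actual systems.
VNA_URL = "https://vna.example.org/studies"
AI_SERVER_URL = "https://ai.example.org/jobs"

def handle_hl7_order(raw_message: str) -> None:
    """Sketch of an HL7 v2-triggered AI pipeline: parse the order from the EHR,
    look the study up in the VNA, and submit an AI processing job."""
    # HL7 v2 messages are pipe-delimited segments separated by carriage returns.
    segments = {seg.split("|")[0]: seg.split("|") for seg in raw_message.split("\r") if seg}
    # The accession number often travels in OBR-3, but field usage varies by site.
    accession = segments["OBR"][3]

    # Ask the VNA where the study lives (illustrative REST call).
    study = requests.get(f"{VNA_URL}/{accession}", timeout=10).json()

    # Instruct the AI server to process the images; it returns overlays,
    # summary statistics, and a structured report for the reporting workflow.
    job = requests.post(AI_SERVER_URL, timeout=10,
                        json={"study_uid": study["study_uid"],
                              "task": "liver_steatosis"}).json()
    print(f"Submitted AI job {job['job_id']} for accession {accession}")
```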

Continuous feedback from radiologists and referring clinicians is vital for monitoring and improving the algorithm’s performance. Elahi highlighted the importance of establishing a rubric for evaluating the algorithm’s reliability, accuracy, bias, usability, and clinical utility during initial blind reader studies.
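A rubric like the one Elahi described could be captured in a simple structure so that blind reader study scores are recorded consistently across cases. The five dimensions below come from the talk; the 1–5 scale and field layout are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class AlgorithmRubric:
    """Illustrative reader-study scorecard; the 1-5 scales are an assumption."""
    case_id: str
    reliability: int       # consistency of output across repeat runs
    accuracy: int          # agreement with the reader's ground truth
    bias: int              # performance parity across patient subgroups
    usability: int         # ease of use within the reporting workflow
    clinical_utility: int  # impact on diagnostic or treatment decisions

    def mean_score(self) -> float:
        return (self.reliability + self.accuracy + self.bias
                + self.usability + self.clinical_utility) / 5
```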

Monitoring Performance and Data Drift

Matheus Ferreira, Lead Data Scientist and MLOps Manager at Dasa, presented his team’s approach to monitoring their in-house MRI denoising model, called “Ace.” This model takes low-quality, low-NEX (number of excitations) MRI images as input and generates synthetic, high-quality counterparts, enabling faster scan times without compromising image quality.

Ferreira showcased a comprehensive monitoring dashboard that provides insights into various aspects of the algorithm’s performance. The “Inference Overview” section displays metadata about incoming studies, such as MRI manufacturers, machine types, and institutions, helping identify potential causes of underperformance.
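A minimal version of such an “Inference Overview” could aggregate per-study metadata with pandas. The field names and records below are assumptions, not Dasa’s schema; the point is simply grouping inference outcomes by acquisition context.

```python
import pandas as pd

# Hypothetical per-study inference log; field names are illustrative.
records = [
    {"study_id": "S1", "manufacturer": "GE", "machine": "1.5T", "institution": "A", "ok": True},
    {"study_id": "S2", "manufacturer": "Siemens", "machine": "3T", "institution": "B", "ok": False},
    {"study_id": "S3", "manufacturer": "Siemens", "machine": "3T", "institution": "B", "ok": False},
]
df = pd.DataFrame(records)

# Success rate per manufacturer/machine/institution highlights where the
# model underperforms, e.g. a scanner type underrepresented in training data.
overview = (df.groupby(["manufacturer", "machine", "institution"])["ok"]
              .agg(studies="count", success_rate="mean"))
print(overview)
```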

Monitoring for data drift is a critical component of Ferreira’s team’s strategy. They visualize the mean, median, and standard deviation of input image voxel values over time, as significant shifts in these metrics could indicate changes in the scanning environment that may impact model performance.
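A sketch of that check: compute voxel statistics for each incoming study and compare a recent window against the accumulated baseline. The window size and z-score threshold below are illustrative choices, not the values Dasa uses.

```python
import numpy as np

def voxel_stats(volume: np.ndarray) -> dict:
    """The summary statistics tracked over time for drift detection."""
    return {"mean": float(np.mean(volume)),
            "median": float(np.median(volume)),
            "std": float(np.std(volume))}

def drifted(history: list[dict], window: int = 50, z_thresh: float = 3.0) -> bool:
    """Flag drift when the recent mean voxel value departs from the baseline
    by more than z_thresh baseline standard deviations.
    (Window size and threshold are illustrative assumptions.)"""
    if len(history) < 2 * window:  # need enough history to form a baseline
        return False
    baseline = np.array([h["mean"] for h in history[:-window]])
    recent = np.array([h["mean"] for h in history[-window:]])
    z = abs(recent.mean() - baseline.mean()) / (baseline.std() + 1e-9)
    return z > z_thresh
```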

To monitor model drift, Ferreira’s team employs similarity metrics between the input and output images. If the output is too similar to the input, it may indicate insufficient noise removal, while excessive dissimilarity could signify the removal of important information from the image.
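One way to implement such a check is with structural similarity (SSIM) and a two-sided acceptance band. The choice of SSIM and both thresholds are assumptions here, since the specific metric Dasa uses was not named in the talk.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def check_denoising(input_img: np.ndarray, output_img: np.ndarray,
                    lo: float = 0.60, hi: float = 0.98) -> str:
    """Two-sided similarity check (thresholds are illustrative).
    Too similar -> the model may not have removed enough noise;
    too dissimilar -> it may have removed real anatomy."""
    score = ssim(input_img, output_img,
                 data_range=float(output_img.max() - output_img.min()))
    if score > hi:
        return f"ALERT: output nearly identical to input (SSIM={score:.3f})"
    if score < lo:
        return f"ALERT: output diverges sharply from input (SSIM={score:.3f})"
    return f"OK (SSIM={score:.3f})"
```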

Real-time notifications and alerts are also integrated into the monitoring process, ensuring prompt intervention when issues arise.
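Alerts like these are typically wired to a team messaging channel. The sketch below posts to a hypothetical Slack-style webhook; the endpoint and payload format are assumptions.

```python
import requests

WEBHOOK_URL = "https://hooks.example.org/monitoring"  # hypothetical endpoint

def send_alert(message: str) -> None:
    """Push a monitoring alert to the team channel so issues are
    triaged promptly rather than discovered at the next review."""
    requests.post(WEBHOOK_URL, json={"text": message}, timeout=5)

# e.g. send_alert(check_denoising(input_img, output_img))
```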

User Feedback and Clinical Validation

Dr. Walter Wiggins, a neuroradiologist at Greensboro Radiology, emphasized the importance of thorough real-world clinical validation of AI tools to ensure they perform as intended for the diverse patient populations encountered in practice. FDA clearance, while necessary, is not sufficient – Wiggins’ team has declined to use some FDA-cleared tools due to unexpectedly poor real-world performance.

Wiggins shared examples of how AI algorithms can enhance radiologists’ detection capabilities, such as identifying subtle pulmonary emboli on complex scans with distracting pathologies or detecting incidental findings that may have been overlooked during focused examinations.

Radiologist training and feedback are crucial components of Wiggins’ team’s monitoring strategy. Rather than integrating AI outputs directly into the PACS, they use vendor-supplied widgets to capture user interactions and feedback, which can inform improvements to education materials and algorithm refinements.

Defining what constitutes an “auditable result” is essential – should feedback be collected on true positives, false positives, false negatives, or all of the above? Wiggins’ team has opted to solicit feedback on true positives as well, especially in cases where the radiologist might not have detected the finding without AI assistance.
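One way to make the “auditable result” definition concrete is a feedback record that tags each case by outcome category. The schema below is an illustration, not the format of the vendor widgets Wiggins’ team uses.

```python
from dataclasses import dataclass
from enum import Enum

class Outcome(Enum):
    TRUE_POSITIVE = "TP"   # AI flagged a real finding
    FALSE_POSITIVE = "FP"  # AI flagged something spurious
    FALSE_NEGATIVE = "FN"  # AI missed a real finding

@dataclass
class AuditableResult:
    """Illustrative feedback record captured for each AI-flagged case."""
    case_id: str
    outcome: Outcome
    ai_caught_it_first: bool  # finding might have been missed without AI
    radiologist_comment: str = ""
```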

Leveraging Natural Language Processing for Monitoring

One innovative approach Wiggins’ team is exploring is monitoring algorithm performance through natural language processing (NLP) of radiology reports. By comparing the radiologist’s impression (treated as a soft label or ground truth) with the algorithm’s prediction, cases where there is disagreement can be flagged for adjudication by the AI innovation team.
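A minimal sketch of that comparison, assuming the report impression has already been reduced to a binary label for the finding in question:

```python
def flag_for_adjudication(report_label: bool, ai_label: bool, case_id: str) -> bool:
    """Treat the radiologist's impression as a soft label and flag
    any disagreement with the AI prediction for human review."""
    if report_label != ai_label:
        print(f"Case {case_id}: report/AI disagreement -> route to AI innovation team")
        return True
    return False
```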

Wiggins demonstrated how a private instance of ChatGPT could be used to extract structured findings from radiology reports, facilitating this comparison. The team is also investigating using similar NLP techniques to detect hallucinations or inconsistencies in AI-assisted reporting tools.
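A hedged sketch of that extraction step using the OpenAI Python client follows. The model name, prompt, and output schema are assumptions, and in practice this would run against a private, HIPAA-compliant deployment rather than the public API.

```python
import json
from openai import OpenAI

client = OpenAI()  # in practice, pointed at a private, compliant deployment

def extract_findings(report_text: str) -> dict:
    """Ask an LLM to reduce a free-text radiology report to structured
    labels (the prompt and schema are illustrative assumptions)."""
    response = client.chat.completions.create(
        model="gpt-4",  # placeholder model name
        messages=[
            {"role": "system",
             "content": "Extract findings from this radiology report as JSON, "
                        'e.g. {"pulmonary_embolism": true}. Respond with JSON only.'},
            {"role": "user", "content": report_text},
        ],
    )
    return json.loads(response.choices[0].message.content)
```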

Key Takeaways and Best Practices

The webinar presentations highlighted several key best practices for effective post-production monitoring of AI algorithms in radiology:

  1. Establish a strong governance structure: A diverse committee overseeing all aspects of the AI program, from product evaluation to monitoring, is essential for ensuring success and sustainability.
  2. Conduct thorough clinical validation: Real-world performance should be evaluated on representative data from the practice, as FDA clearance alone may not guarantee optimal performance.
  3. Implement robust monitoring dashboards: Comprehensive dashboards should track various performance metrics, data drift indicators, and user feedback over time, enabling timely intervention when issues arise.
  4. Leverage user feedback: Radiologists are the “canaries in the coal mine” – their feedback on algorithm performance, usability, and clinical utility is invaluable for refining algorithms and education materials.
  5. Explore innovative monitoring techniques: Methods like NLP of radiology reports and similarity metric analysis can provide valuable insights into algorithm performance and data drift.
  6. Foster collaboration: Effective monitoring requires close collaboration between radiologists, data scientists, IT professionals, and other stakeholders, leveraging their diverse expertise.

As AI continues to advance in radiology, post-production monitoring will become increasingly crucial to ensure these powerful tools deliver on their promise of improving patient care while maintaining safety and efficacy. The strategies and lessons shared in this webinar provide a valuable roadmap for radiology practices seeking to implement robust monitoring processes for their AI algorithms.
