SIIM
Scientific Abstracts
Concept Enhanced Searching of Radiology Reports
 
Authors:

Jonathan Thirman, Yale University; Pat Mongkolwat, PhD; Alex Kogan; David Channin, MD

 
Background:

For the past 114 years, and until medical image interpretation moves to structured reporting with controlled terminologies, the majority of the description of image pixel meaning is captured as free text in diagnostic imaging reports. Natural language processing (NLP) of these radiology reports has been the subject of extensive research.

A common request from radiology faculty unfamiliar with the complexities of NLP, however, is for a simple radiology report search tool that can be used to identify cases for research and education. This paper describes the creation of such a report search tool using straightforward, freely available, open source software.

 
Evaluation:

We extracted 3.3 million free text radiology reports directly from the Sybase (Sybase Inc., Dublin, CA) database of our GE Centricity PACS (GE Healthcare Integrated IT Solutions, Barrington, IL) test system. To de-identify the reports, the metadata headers and footers added by the radiology information system (RIS; Cerner RadNet, Kansas City, KS) were removed. In addition, the remaining report text was searched for the patient's name, medical record number, and social security number. Any instance of identifying information was replaced with masking characters. The final de-identified free text report was converted to HTML.
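The masking step described above can be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the function name `deidentify`, its parameters, the choice of `*` as the masking character, and the `<pre>`-based HTML wrapper are all assumptions made for the example.

```python
import html
import re

def deidentify(report_text, name_parts, mrn, ssn, mask_char="*"):
    """Mask known identifiers in a report body and wrap the result in simple HTML.

    Hypothetical helper: the identifiers are assumed to be known per report
    (for example, from the database row the report was extracted from).
    """
    # One regex alternative per known identifier; empty values are skipped.
    patterns = [re.escape(p) for p in list(name_parts) + [mrn] if p]
    # Accept the SSN with or without dashes.
    digits = re.sub(r"\D", "", ssn)
    if len(digits) == 9:
        patterns.append("-?".join([digits[:3], digits[3:5], digits[5:]]))
    if patterns:
        combined = re.compile(r"\b(?:" + "|".join(patterns) + r")\b", re.IGNORECASE)
        # Replace each hit with masking characters of the same length.
        report_text = combined.sub(lambda m: mask_char * len(m.group(0)), report_text)
    return "<html><body><pre>" + html.escape(report_text) + "</pre></body></html>"
```

Exact-match masking of this kind only catches identifiers that appear as recorded; misspelled names or reformatted numbers would need additional rules.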

 

We obtained a copy of the Unified Medical Language System (UMLS; National Library of Medicine (NLM), Bethesda, MD) Metathesaurus and used its MetamorphoSys configuration tool to create a subset of the UMLS containing 260,975 concepts related to medical imaging. We then used MMTx, the Java version of the NLM's MetaMap program, to map each radiology report to the concepts in the UMLS subset.[1] These concept mappings were placed into new metadata headers of the HTML text reports. After medical concepts were assigned to each report, Lucene (lucene.apache.org) was used to index the reports for search.
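One way to picture the step of placing concept mappings into HTML metadata headers is the sketch below. The abstract does not specify the header format, so the `umls-concept` meta name, the `id|name` content layout, and the function name are assumptions, and the concept identifier in the usage example is a made-up placeholder rather than a real UMLS identifier.

```python
import html

def add_concept_headers(report_html, concepts):
    """Insert one <meta> tag per (concept_id, preferred_name) pair.

    Hypothetical format; assumes the report has a bare <html> opening tag
    and no existing <head> element.
    """
    metas = "".join(
        '<meta name="umls-concept" content="{}|{}">'.format(
            html.escape(concept_id, quote=True),
            html.escape(name, quote=True))
        for concept_id, name in concepts
    )
    # Place the new <head> immediately after the opening <html> tag.
    return report_html.replace("<html>", "<html><head>" + metas + "</head>", 1)
```

For example, `add_concept_headers("<html><body><pre>text</pre></body></html>", [("C0000001", "Example finding")])` yields a document whose head carries the placeholder concept, which an indexer could then treat as a separate searchable field.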

 

The system is hosted on the Apache Tomcat server (Apache Software Foundation, http://tomcat.apache.org) using JavaServer Pages (JSP; Sun Microsystems, Mountain View, CA, http://www.java.sun.com).

 

Using this infrastructure, our system offers simple text searches of radiology reports, concept searches where UMLS concepts and/or free text may be searched, and an advanced search that allows filtering results by modality, age, sex, and procedure name.
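The advanced search described above combines a text match with metadata filters. A minimal sketch of that combination, assuming an in-memory list of records instead of the Lucene index the system actually uses, might look like this; the `IndexedReport` fields and the `advanced_search` signature are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class IndexedReport:
    accession: str
    modality: str
    age: int
    sex: str
    procedure: str
    text: str

def advanced_search(reports, term, modality=None, sex=None,
                    min_age=None, max_age=None, procedure=None):
    """Return reports containing `term` that also pass every supplied filter."""
    hits = []
    for r in reports:
        # Case-insensitive substring match stands in for a real text query.
        if term.lower() not in r.text.lower():
            continue
        if modality is not None and r.modality != modality:
            continue
        if sex is not None and r.sex != sex:
            continue
        if min_age is not None and r.age < min_age:
            continue
        if max_age is not None and r.age > max_age:
            continue
        if procedure is not None and procedure.lower() not in r.procedure.lower():
            continue
        hits.append(r)
    return hits
```

Filters left as `None` are ignored, so the same function covers both the simple text search and the filtered search.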

 

In the concept search, the user can enter concept terms, and the system dynamically looks them up in the Metathesaurus using MMTx. In addition, once a concept is identified, the user interface can display the Metathesaurus hierarchy in which the concept is found, allowing rapid selection of related concepts. To provide a fast response time for a concept search from a web page, the names of the concepts and the relationships between them were translated to C source files, which were compiled into a native Windows library.
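The hierarchy browsing described above amounts to walking a precomputed table of concept relationships. The sketch below shows the idea with a plain dictionary mapping each concept to its children; it is not the authors' compiled C library, and the `children` table layout and single-letter identifiers are placeholders chosen for illustration.

```python
from collections import deque

def descendants(children, concept_id):
    """Return all concepts below `concept_id`, breadth first, without duplicates.

    `children` is a hypothetical precomputed table: concept id -> list of child ids.
    """
    seen = {concept_id}          # guards against cycles and repeated parents
    order = []
    queue = deque(children.get(concept_id, ()))
    while queue:
        cid = queue.popleft()
        if cid in seen:
            continue
        seen.add(cid)
        order.append(cid)
        queue.extend(children.get(cid, ()))
    return order
```

Because the table is built once ahead of time, each lookup is a handful of dictionary accesses, which is the same motivation the abstract gives for precompiling the names and relationships.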

 

The system returns the results of searches as lists of studies whose reports match the search criteria, ranked by relevance of the reports to the search terms. The results include the modality, the procedure name, the age of the patient, the date of the study, and the accession number. The display of the results allows browsing of the de-identified reports, along with the mapped concepts for each report. The accession number can only be used manually by authorized individuals, via secure clinical information systems, to retrieve study specifics including patient demographics, images, and complete reports. The de-identified search results may be downloaded as comma separated values (CSV) files. Lastly, the system provides complete auditing and logging of use, including the searches performed and the specific reports that were viewed.
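A ranked result list with the fields named above could be exported to CSV along the following lines. The column names, the `score` field, and the dictionary-based result records are assumptions for this sketch; the abstract does not describe the actual file layout.

```python
import csv
import io

# Hypothetical column names; "score" stands in for the search engine's relevance value.
FIELDS = ["modality", "procedure", "age", "study_date", "accession", "score"]

def results_to_csv(results):
    """Sort results by descending relevance and return them as CSV text."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    buf = io.StringIO()
    # extrasaction="ignore" drops any keys that are not in FIELDS.
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(ranked)
    return buf.getvalue()
```

Only de-identified fields are written, so the file can be shared while the accession number remains the sole link back to the clinical systems.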

 
Discussion:

Using the open source Lucene text search engine library, we have provided simple text searching of radiology reports and limited associated metadata for several years. Mapping the radiology reports to UMLS Metathesaurus concepts adds another level of sophistication to the search, and is a straightforward task using open source tools provided by the National Library of Medicine. The mapping process takes between 0.5 and 180 seconds per report, so mapping a large corpus such as ours took over 1 month of processing time. Using these tools, we mapped 3.3 million radiology reports to 260,975 concepts in a subset of the thesaurus. The maximum number of concepts for a single report was 39; the mean was 20.
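The processing time quoted above can be checked with simple arithmetic. The sketch below assumes a single serial mapping process and a 30-day month, neither of which is stated in the abstract, so the numbers are a plausibility check and not a reported measurement.

```python
REPORTS = 3_300_000        # corpus size from the abstract
MIN_SECONDS = 0.5          # fastest per-report mapping time from the abstract
SECONDS_PER_DAY = 86_400

def serial_days(reports, seconds_per_report):
    """Days needed to map `reports` one after another at a fixed per-report time."""
    return reports * seconds_per_report / SECONDS_PER_DAY

def implied_mean_seconds(reports, total_days):
    """Average per-report time implied by a total serial run of `total_days`."""
    return total_days * SECONDS_PER_DAY / reports

print(round(serial_days(REPORTS, MIN_SECONDS), 1))   # about 19.1 days at the minimum
print(round(implied_mean_seconds(REPORTS, 30), 3))   # about 0.785 s per report
```

Under these assumptions, even the fastest case takes roughly 19 days, and a run of about one month implies an average close to the lower end of the 0.5 to 180 second range, so most reports must map quickly.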

This relatively simple search meets many of the needs of faculty, fellows, residents, and students in everyday research and educational activities. The downside of simple search and indexing is that advanced NLP functionality, such as text classification, entity recognition (including negation), and information extraction, is absent.

 
Conclusion:

Although they are not a replacement for sophisticated natural language processing, simple search and indexing tools can be used by faculty to identify cases for research and education. Such a system can provide an interim service until more complete natural language processing systems become commonly available. Even after the inevitable transition to fully structured reporting, large collections of radiology reports will remain, and NLP and even simple searching and indexing will prolong their usefulness.

 
References:

1. Aronson AR. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. Proc AMIA Symp. 2001:17-21.