
Automatically Expanding the Synonym Set of SNOMED CT using Wikipedia

Daniel R. Schlegel, Chris Crowner, and Peter L. Elkin
Department of Biomedical Informatics, University at Buffalo

Our Use Case for SNOMED

• Language Understanding for:
  • Decision support
    • Models for clinical prediction rules, disorders, etc.
  • Information retrieval
    • Inclusion / Exclusion criteria
    • "5 minute studies"


Research Problem

• Automatic understanding of medical text relies on terminologies
  • Coverage comes from both primary terms and synonyms
• SNOMED CT is very commonly used
  • Has about 400,000 terms and 230,000 synonyms
  • -> More synonyms would be helpful

• In some subdomains of medicine, Wikipedia is known to:
  • Have excellent coverage
  • Have accuracy similar to curated web sources
  • Already be used in practice

So, we'd like to leverage Wikipedia to enhance the synonym set of SNOMED CT.



• Wikipedia Redirects
• Naïve Synonym Harvesting
• Initial Evaluation
• Problems + Refinements
• Wikipedia Categories
• Final Evaluation
• Conclusion


Wikipedia Redirects

• Wikipedia articles must have unique names
• There are "shadow" articles with no content which simply redirect to another article
  • Their titles are very often synonyms of the target article's title
  • Example: "Heart attack" redirects to "Myocardial infarction"
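A minimal Python sketch of how redirect pages can be turned into candidate synonyms: invert the redirect-to-target mapping so that each article collects the titles that redirect to it. The function name `synonyms_from_redirects` and the toy data are hypothetical; apart from the Heart attack example above, the entries are illustrative and not taken from Wikipedia. A real pipeline would read the redirects from a Wikipedia dump instead.

```python
from collections import defaultdict


def synonyms_from_redirects(redirects: dict[str, str]) -> dict[str, list[str]]:
    """Group redirect titles by the article they point to.

    `redirects` maps each redirect ("shadow") title to its target article title.
    The returned lists are candidate synonyms for each target's title.
    """
    candidates = defaultdict(list)
    for source, target in redirects.items():
        candidates[target].append(source)
    # Sort each list so the output does not depend on input order.
    return {target: sorted(sources) for target, sources in candidates.items()}


# Toy data (hypothetical, except the "Heart attack" example from the slide).
redirects = {
    "Heart attack": "Myocardial infarction",
    "Cardiac infarction": "Myocardial infarction",
    "Brain attack": "Stroke",
}
print(synonyms_from_redirects(redirects))
```

Redirects are not always synonyms (Wikipedia also uses them for misspellings and related sub-topics), so these lists are only candidates.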


Simple Matching Strategy
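A minimal sketch of what a simple matching strategy could look like, assuming a match is an exact comparison of a SNOMED CT preferred term with a Wikipedia article title after case-folding and whitespace normalization (the normalization is an assumption, not stated on the slide). The data shapes, the function `harvest_new_synonyms`, and the concept IDs are hypothetical. A redirect title counts as a new synonym only if the concept does not already have it.

```python
def normalize(term: str) -> str:
    # Assumption: compare case-insensitively with collapsed whitespace.
    return " ".join(term.lower().split())


def harvest_new_synonyms(
    snomed: dict[str, tuple[str, set[str]]],
    wiki_synonyms: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Return, per matched concept, redirect titles it does not already have.

    snomed: concept id -> (preferred term, existing synonyms)
    wiki_synonyms: article title -> redirect titles pointing at it
    """
    title_index = {normalize(title): title for title in wiki_synonyms}
    harvested: dict[str, list[str]] = {}
    for concept_id, (preferred, existing) in snomed.items():
        title = title_index.get(normalize(preferred))
        if title is None:
            continue  # no article title matches this concept's term
        known = {normalize(t) for t in existing | {preferred}}
        new = [r for r in wiki_synonyms[title] if normalize(r) not in known]
        if new:
            harvested[concept_id] = new
    return harvested


# Hypothetical toy inputs.
snomed = {
    "concept-1": ("Myocardial infarction", {"Cardiac infarction"}),
    "concept-2": ("Some unmatched disorder", set()),
}
wiki_synonyms = {"Myocardial infarction": ["Cardiac infarction", "Heart attack"]}
print(harvest_new_synonyms(snomed, wiki_synonyms))
```

Nothing in this sketch filters out redirects that are related but not synonymous, which is the weakness the initial evaluation exposes.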


Initial Evaluation

• 43,580 exact matches between SNOMED CT and Wikipedia
  • 42,958 concepts had new synonyms from redirects
  • Extracted 446,053 new synonyms
• Random sample of 100 matches, containing 988 new synonyms
  • 407 synonyms (41.2%) were good
  • 360 synonyms (36.4%) were related but incorrect
  • 221 synonyms (22.4%) were completely unrelated
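The percentages follow from the sample counts: each category's count divided by the 988 sampled synonyms, so the share of good synonyms (the precision of the naïve strategy on this sample) is 407/988 ≈ 41.2%. A small Python check; the helper name `percentage_breakdown` is hypothetical.

```python
def percentage_breakdown(counts: dict[str, int]) -> dict[str, float]:
    """Share of each label as a percentage of the total, rounded to 0.1."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


sample = {"good": 407, "related but incorrect": 360, "unrelated": 221}
print(sum(sample.values()))  # 988
print(percentage_breakdown(sample))
```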

This isn't very good: we need to do better.



• Understand the organization of Wikipedia better
  • Category hierarchy
• Analyze initial evaluation results
  • Classify matching errors
  • Look for solutions
• Implement solutions and re-evaluate
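These slides do not say how the category hierarchy is used. As a hypothetical illustration only: Wikipedia's category graph is not a strict tree and can contain cycles, so collecting the articles under a root category (for example, to check whether a matched article sits in a medical subtree) needs a visited set. The function `articles_under`, its `max_depth` parameter, and the toy graph are all hypothetical.

```python
from collections import deque
from typing import Optional


def articles_under(
    root: str,
    subcategories: dict[str, list[str]],
    articles: dict[str, list[str]],
    max_depth: Optional[int] = None,
) -> set[str]:
    """Breadth-first collection of articles in `root` and its subcategories.

    The `seen` set keeps cycles in the category graph from looping forever;
    `max_depth` optionally limits how many levels below `root` are visited.
    """
    seen = {root}
    queue = deque([(root, 0)])
    found: set[str] = set()
    while queue:
        category, depth = queue.popleft()
        found.update(articles.get(category, ()))
        if max_depth is not None and depth >= max_depth:
            continue  # do not expand below the depth limit
        for sub in subcategories.get(category, ()):
            if sub not in seen:
                seen.add(sub)
                queue.append((sub, depth + 1))
    return found


# Hypothetical toy graph with a cycle back to the root.
subcategories = {
    "Diseases": ["Cardiovascular diseases"],
    "Cardiovascular diseases": ["Ischemic heart diseases"],
    "Ischemic heart diseases": ["Diseases"],
}
articles = {
    "Cardiovascular diseases": ["Hypertension"],
    "Ischemic heart diseases": ["Myocardial infarction"],
}
print(sorted(articles_under("Diseases", subcategories, articles)))
```

Breadth-first order makes `max_depth` straightforward, since every category is first reached at its shortest distance from the root.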


