Insertion Procedures

NCI Metathesaurus

June 18, 2009

Test Insertions

Real Insertions

Test Insertions

Purpose: To test the accuracy of the data when an insertion of a new/updated source is made into the NCI Metathesaurus.

For example, when a new version of the NCI Thesaurus is created, it is inserted into the NCI Metathesaurus on a test server. Because of the various errors that can occur with the automated process, it is necessary to spot check the data after the test insertion to note any problems and fix them before the data is inserted into the actual Metathesaurus.

The test insertion also shows how much editing will be required once the real insertion is completed. GOAL: get rid of extra/unnecessary editing. If the numbers look high, we can examine the data to see whether editing can be reduced by telling Apelon about possible “global fixes” for specific data. If the numbers look too low, or off in any other way, that may indicate a problem with the insertion.

Key Personnel:

Brian Carlsen – Alameda (west)

Laura Roth – LH-MSD Project Manager (on site part-time at NCI)

Lori Whiteman – LH-MSD Deputy Project Manager (on site at NCI)

Carol Creech – LH-MSD (off-site, but good for backup questions)

Process:

Alameda staff send an email to the NCI-MEME list indicating that the test insertion has been completed. The email contains a series of rows reporting “rough” numbers for different types of records, e.g., demotions, replacements, merges, etc. The numbers are “rough” partly because the test database used for the test insertion never exactly matches the real database, which keeps changing with ongoing editing.

For NCI Thesaurus insertions:

1. Retired Kind concepts are stripped out (per Gilberto Fragoso – email 12/1/05 on NCI-MEME).

2. Laura and Carol can create checklists to review the insertion from various tables created in the EMS.

a. Full SYN (synonym) terms stay with the PT.

b. Make sure that the previous version appears in {}. For example, if the new insertion is NCI2005_09, make sure that the previous version, NCI2005_05 PT and SY appear in the list of atoms surrounded by {}.

c. Make sure RELs appear to go in the correct direction.

Steps:

1. Change the MID to “test” or whichever test database Alameda specifies (might be testsrc, testsw, etc.)

2. Run the Matrixinit script by clicking that link within the EMS (be sure you are in the test database). This helps to make sure the status of the concepts is correct. No report will appear, but running the script ensures the data is clean and in the right place. NEED TO DO EVERY TIME WE TEST.

a. NOTE: When using the updated EMS version 3.0, you need to use the “big” Matrixinit that is run from the MID Maintenance section of the Additional Online Tools Index. You can access that link from the main MEME Editing News page.

3. Run the ME partition script by clicking that link within the EMS. This ensures that concepts are in the correct bins (e.g., ME, QA, etc.) NEED TO DO EVERY TIME WE TEST.

4. Then begin creating checklists from all the various bins to ensure that the insertion was OK. BE SURE TO CHECK ALL BINS!!

a. For example, on the PDQ check, Laura had me check the leftovers bin in ME and also the nci_mrg bin (formerly nci_pt_mrg; in EMS 3.0 it is nci_mrg) to make sure the insertion did not merge a lot of NCI PTs together. Numbers should be low: we usually have fewer than 8-10 items in there (that are pending corrections in TDE…). More than that may mean that Alameda still needs to run the proper script.

b. Also check the deleted_cui bin to make sure that the numbers are low. That helps us make sure that no big sections of NCIT are missing (e.g., CTCAE data).

5. If there are problems, email Laura first (until I get more experience) and she will email Brian Carlsen and cc the NCI-MEME list for resolution.

6. LOOK FOR:

a. Did the old version get “turned off”, i.e., appear in {}?

i. NOTE: We may not see many “old/previous” versions, because the newer script for safe replacements just updates the SAB. Old-version atoms will mainly show up when their string differs slightly from the current atom.

b. Any bad merging? Look for things that stick out, like the time country abbreviations were merged with genes.

c. Are the RELs in the right direction? (e.g., read them from the bottom up…from the REL to the concept)

d. Are the “N” (“needs review”) atoms necessary? You might see only 1 “N” and find that the atoms are all FULL_SYNs with the same code, so editing may not be necessary. We could then ask Apelon to mark those as Reviewed.

e. Do the counts seem reasonable? Are there too many demotions?

f. SEE NLM Checklist (on paper) for more.

7. Once all problems are resolved, we can tell Apelon to insert for real. Then we begin our regular editing process.
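The bin checks in step 4 can be roughed out as a small script. This is only a sketch under stated assumptions: the counts are typed in by hand from the EMS pages (nothing here connects to the EMS or the MID), the function name and the dictionary layout are made up for illustration, and the limit for deleted_cui is an assumed placeholder, since the procedure only says the number should be "low".

```python
# Hypothetical helper: counts are copied by hand from the EMS bin pages.
# Nothing in this sketch talks to the EMS, the MID, or any Alameda script.

# Upper limits for bins that should stay small after a test insertion.
# nci_mrg: the procedure says we usually see fewer than 8-10 items.
# deleted_cui: the procedure only says "low"; 10 is an assumed placeholder.
BIN_LIMITS = {
    "nci_mrg": 10,
    "deleted_cui": 10,
}


def flag_high_bins(bin_counts, limits=BIN_LIMITS):
    """Return (bin, count, limit) for each bin that is over its limit.

    A bin with no recorded count is also flagged (count is None),
    as a reminder to check all bins.
    """
    flagged = []
    for bin_name, limit in limits.items():
        count = bin_counts.get(bin_name)
        if count is None:
            flagged.append((bin_name, None, limit))
        elif count > limit:
            flagged.append((bin_name, count, limit))
    return flagged


if __name__ == "__main__":
    # Example counts, invented for illustration only.
    for bin_name, count, limit in flag_high_bins({"nci_mrg": 42, "deleted_cui": 3}):
        print(f"{bin_name}: count={count}, expected at most {limit}")
```

Anything the script flags still needs a person to look at the bin; a high count is a reason to email Laura, not proof of a bad insertion.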

TO FIND CHECKLISTS (e.g., created by Alameda for the 09D insertion…)

- Go to WMS

- Look for Checklists link on left

- Bins should be obvious, based on test insertion email notice.

Real Insertions

After Alameda inserts a new source update into the NCI Metathesaurus, whether it is NCI or another source, we must:

• Run Matrixinit from EMS page

o NOTE: When using the updated EMS version 3.0, you need to use the “big” Matrixinit that is run from the MID Maintenance section of the Additional Online Tools Index. You can access that link from the main MEME Editing News page.

• Run ME partition from EMS page

• Compare the ME bin counts against the numbers in the note from the MEME insertion to make sure they look OK.

• Spot check various bins to make sure the content looks OK and there are not any weird instances of ‘why is this here’ or ‘why is this separated out’ or ‘this looks strange’.

o If you find something strange, make a note of it, with example concepts that show the problem, and include that in your response to the MEME message (using the automated form for Real Insertion: In Progress)

• Go to the QA Bin

o Click Generate for the nci_mrg bin (formerly nci_pt_mrg). This checks for concepts with multiple NCI PTs, i.e., whether the insertion followed the rule that concepts should not have NCI PTs merged. (Some items will stay in the bin until NCI Thesaurus is fixed. These have concept notes so you know they are okay.)

• Note any strange things or questions and reply to the MEME message database.
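Comparing the ME bin counts with the insertion note can be sketched the same way. The example below assumes both sets of numbers have been typed in by hand as dictionaries; the function name, the row names, and the 10% tolerance are all illustrative assumptions, not part of the procedure. Because the reported numbers are only "rough", the tolerance should be set by whoever is doing the check.

```python
# Hypothetical helper: both dictionaries are filled in by hand,
# one from the insertion note and one from the ME bin counts in the EMS.

def compare_counts(reported, observed, tolerance=0.10):
    """Return {name: (reported, observed)} for counts that look off.

    A count is flagged when it is missing on either side, or when the
    observed value differs from the reported value by more than
    `tolerance` (a fraction of the reported value; 0.10 is an assumption).
    """
    mismatches = {}
    for name in sorted(set(reported) | set(observed)):
        rep = reported.get(name)
        obs = observed.get(name)
        if rep is None or obs is None:
            mismatches[name] = (rep, obs)
            continue
        baseline = max(rep, 1)  # avoid dividing by zero for empty rows
        if abs(obs - rep) / baseline > tolerance:
            mismatches[name] = (rep, obs)
    return mismatches


if __name__ == "__main__":
    # Numbers invented for illustration only.
    note = {"demotions": 200, "merges": 50}
    bins = {"demotions": 205, "merges": 90}
    for name, (rep, obs) in compare_counts(note, bins).items():
        print(f"{name}: note says {rep}, EMS shows {obs}")
```

Any mismatches, along with example concepts, would go into the reply to the MEME message as described above.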
