Strategic Matching -- Tech Note #9

Case Selection Versus Case Classification

Applies to: CODES 2000 Version 2.1+, LinkSolv Version 2.2+.
Last updated: Thursday October 21, 2004.

SUMMARY

One or both of your tables to be linked may include fields that identify (almost all) of the records you expect to link.  For example, suppose that you are linking police motor vehicle crash reports to EMS ambulance run reports.  The EMS records might include a field such as mechanism of injury that can be used to identify motor vehicle crash victims.  You can use this information in two different ways.  First, you can select cases for linkage based on the information.  This approach can reduce the size of the tables to be linked, consequently reducing the time to run each match pass and the amount of information required for a high quality match.  

A second approach is to keep all cases, use the information to set a flag that identifies those cases you expect to link, and use this flag as a match variable.  You may have to create a flag in one of the tables that is always "Y".  This will increase match weights for candidate record pairs with flag agreements while allowing for the possibility of a match based on other evidence for those pairs with flag disagreements.  The latter approach can reduce the number of false negatives caused by incorrect or missing information in the fields used for case selection.  It is nearly equivalent to the former approach for the cases set to "Y". 

PROCEDURE

First, create a crash flag field named Crash in the Crash table that is equal to "Y" for all records.  You can do this in your SelectCrash query by including a column in design view like this:

    Crash:"Y"

Second, create a crash flag field named Crash in the EMS or Hospital table that is equal to "Y", "N", or Null for all records.  You can do this with the Access IIf function for simple cases.  For example, suppose that the EMS table includes a field for mechanism of injury named MECHAN, where MECHAN=1 denotes a motor vehicle crash victim.  Then you can create a crash flag field in your SelectEMS query by including a column in design view like this:

    Crash:IIf(MECHAN=1,"Y","N")

You may want to create a Visual Basic function for more complicated cases.

Third, create standard versions of Crash and EMS.  Copy both flag fields "As Is".

Fourth, create frequency tables for all standard Crash and EMS fields.

Fifth, replace the frequency table for the Crash field in the Crash database with the frequency table for the Crash field in the EMS or Hospital database.  Note that the field may be called Crash1 after standardization.  This step is necessary because the current releases of CODES 2000 and LinkSolv assumes that frequency distributions for linked records are equal to frequency distributions for records in Table B (see Tech Note #4).  This is a reasonable assumption if you select only motor vehicle crash victims for Table B but not if you take all cases reported by EMS or Hospitals.

Open the Crash database in your project folder.  Select Tables in the database window.  Rename the table FreqTblCrashCrash as OldFreqTblCrashCrash.  Import the table FreqTblEMSCrash from the EMS database in your project folder.  Or, import the table FreqTblHospitalCrash from the Hospital database.  Rename the imported table as FreqTblCrashCrash. 

Fourth, include the field Crash as a match field in every match pass.  Note that the field may be called Crash1 after standardization.

EXAMPLE

Here are the results of linking Crash to EMS using Injured and Crash flags.  The flags were used only for case classification, not case selection.  Only appropriate Crash records had Injured = "Y" while all EMS records had Injured = "Y".  All Crash records had Crash = "Y" while only appropriate EMS records had Crash = "Y".

You can see that a significant number of linked pairs were found for all possible combinations of flag values, even when both flags were "N" (92 cases, or 2.6% of all linked pairs).  Only 2,110 cases (60.4%) had both flags equal to "Y".  All of the other cases would have become false negatives had both flags been used for case selection.

ANALYSIS

Suppose that the Crash table contains 100,000 records and the EMS table contains 40,000 records.  Also suppose that 10,000 of the EMS records (25%) have MECHAN=1 and we expect that all of them should match to Crash records.

The first approach is to use the Crash flag for case selection.  Then the prior odds that a random pair of records is a true match using our standard formula are:

Matched / Unmatched = 10,000 / (100,000 X 10,000 - 10,000) = 1 / 99,999 

We want posterior odds for a match given observed agreements and disagreements to be at least 9 / 1 (90% probability), so the minimum acceptable odds ratio is

Posterior Odds / Prior Odds = (9 / 1) / (1 / 99,999) = 899,991

The minimum acceptable match weight (cutoff weight) is

log base 2 (899,991) = 19.78

The second approach is the use the Crash flag for matching.  Then the prior odds that a random pair of records is a true match are:

Matched / Unmatched = 10,000 / (100,000 X 40,000 - 10,000) = 1 / 399,999 

Now the minimum acceptable odds ratio is

Posterior Odds / Prior Odds = (9 / 1) / (1 / 399,999) = 3,599,991

The minimum acceptable match weight (cutoff weight) is

log base 2 (3,599,991) = 21.78

If the EMS Crash field value "Y" is fairly reliable, say 0.01 error probability, then its agree weight for "Y" is approximately

log base 2 (0.99 / 0.25) = 1.99

The agree weight from this new match field almost exactly offsets the increase in cutoff weight for a 9 to 1 match due to using the complete EMS table instead of selecting just the motor vehicle cases.

The disagree weight for the Crash field is approximately

log base 2 (0.01 / 0.75) = -6.23

 

 
© Copyright 2000-2008 Strategic Matching, Inc. LinkSolv is a trademark of Strategic Matching, Inc.  Microsoft, Windows, and Access are trademarks of Microsoft Corporation. Last modified: Tuesday July 22, 2008.