TUPE653 - Poster Exhibition
Solving the problem of different individuals sharing a unique identifier in the Ryan White HIV/AIDS program client-level data set by developing an algorithm to calculate 'match scores' for record pairs
D. Isenberg1, G. Fant1, J. Milberg1, M. O'Brien-Strain2, W. Addison2, E. Coombs2
1HRSA, HIV/AIDS Bureau, Rockville, United States, 2Mission Analytics Group, Inc., San Francisco, United States
Background: A unique identifier containing no personally identifiable information was developed to support the collection of care and treatment data for Ryan White HIV/AIDS Program clients. After deployment, however, different individuals were still found to share the same identifier. We therefore developed a de-duplication algorithm to estimate the likelihood that two records sharing the same unique identifier in fact belong to the same person.
Methods: The proposed algorithm used the Fellegi-Sunter (F-S) model to calculate match scores for pairs of records sharing the same unique identifier, based on agreement between the data elements of the two records. The methodology involved selecting which data elements to include in scoring, estimating the m_k and u_k terms (the probabilities that data element k agrees when the two records do, and do not, belong to the same person), and setting a threshold for the “match score.” Four alternative approaches for implementing the algorithm were considered. The algorithms were written in SAS and applied to an SPSS file, provided by HRSA's HIV/AIDS Bureau (HAB), that contained Ryan White HIV/AIDS Program Services Report (RSR) data on Ryan White clients submitted by providers.
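The scoring step can be illustrated with a minimal Python sketch of a Fellegi-Sunter match score. It is not the SAS code used in this work: the data element names, the m_k and u_k values, the base-2 logarithm, and the threshold of 0 are all hypothetical choices made only to show how agreement and disagreement weights combine into a single score.

```python
import math

# Hypothetical data elements with illustrative (m_k, u_k) estimates.
# m_k: P(element k agrees | records belong to the same person)
# u_k: P(element k agrees | records belong to different people)
# These names and values are assumptions, not taken from the RSR analysis.
FIELD_PARAMS = {
    "birth_year": (0.95, 0.02),
    "gender": (0.98, 0.50),
    "race": (0.90, 0.30),
    "zip3": (0.85, 0.05),
}


def match_score(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum the Fellegi-Sunter log-likelihood-ratio weights over data elements."""
    score = 0.0
    for field, (m, u) in params.items():
        if rec_a[field] == rec_b[field]:
            # Agreement weight: positive whenever m_k > u_k.
            score += math.log2(m / u)
        else:
            # Disagreement weight: negative whenever m_k > u_k.
            score += math.log2((1 - m) / (1 - u))
    return score


def is_match(rec_a, rec_b, threshold=0.0, params=FIELD_PARAMS):
    """Accept the pair as a match when its score reaches the threshold.

    The default threshold is a placeholder; in practice it would be set
    from the tolerated false positive and false negative rates.
    """
    return match_score(rec_a, rec_b, params) >= threshold


if __name__ == "__main__":
    a = {"birth_year": 1970, "gender": "M", "race": "White", "zip3": "941"}
    b = {"birth_year": 1970, "gender": "M", "race": "Black", "zip3": "941"}
    print(match_score(a, b), is_match(a, b))
```

Data elements with a large ratio of m_k to u_k (here, the hypothetical birth_year) contribute the most evidence when they agree, which is why the choice of elements and of the threshold drives the share of record pairs accepted as matches.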
Results: In unpublished 2009 RSR data, Ryan White providers submitted 717,927 client records, of which 386,507 were not attached to a uniquely identified individual. When the de-duplication algorithm was applied, the percentage of record pairs accepted as a “match” ranged from 53.0% to 72.6%.
Conclusions: One alternative approach to the de-duplication algorithm showed promise for increasing the percentage of record pairs accepted as a “match” after the false positive and false negative rates were adjusted so that their combined contribution would not exceed 9%. HRSA's HIV/AIDS Bureau (HAB) decided to adopt this de-duplication algorithm, where appropriate, in the preparation of RSR data for analysis. This process highlights an important consideration in developing unduplicated client counts for people living with HIV/AIDS (PLWHA) as the basis for public health decision-making.