Introduction
From Electronic Health Records (EHR) to claims processing, and from clinical trials to research organizations, ICD-10 codes are the backbone of accurately cataloging health conditions, symptoms, and procedures. These codes are incredibly useful, but if they are not properly validated they can lead to inaccuracies, wasted time, poor patient outcomes, and erroneous analysis. This is where data quality checks come into play. This article explains key steps to improve data quality by validating ICD-10 codes.
What is an ICD-10 code?
The International Classification of Diseases, 10th Revision, popularly known as ICD-10, is a clinical cataloging system. In the US, healthcare providers use it to classify and code all diagnoses, symptoms, and procedures recorded in conjunction with hospital care. ICD-10 is developed by the World Health Organization (WHO) and groups health conditions into categories of similar diseases, under which more specific conditions are listed, thus mapping nuanced diseases to broader morbidities.
The US version of ICD-10, created by NCHS (National Center for Health Statistics) and CMS (Centers for Medicare & Medicaid Services), consists of two medical code sets:
- ICD-10-PCS (Procedural Coding System)
- ICD-10-CM (Clinical Modification)
ICD-10-PCS is used only in hospital inpatient settings to report inpatient procedures, while ICD-10-CM is used to report diagnoses in all healthcare settings. ICD-10-CM codes represent conditions and diseases, related health problems, abnormal findings, signs and symptoms, injuries, external causes of injuries and diseases, and social circumstances.
Where are ICD-10 codes commonly used?
ICD-10 codes are primarily used in the following settings:
- Medical providers use Electronic Health Records (EHR) systems to collect and store patient health information, along with the corresponding ICD-10 codes, digitally. This gives healthcare providers quick and accurate access to patients’ medical histories and conditions.
- Insurance claims depend on ICD-10 codes to determine the amount of reimbursement healthcare providers will receive. The codes also help insurance companies identify trends in healthcare and track the prevalence of certain diseases or conditions.
- Research organizations use ICD-10 codes in their studies to ensure consistent and accurate data about diseases, their causes, and effects.
- Pharmaceutical companies use ICD-10 codes to identify potential patients for clinical trials and to track outcomes in drug safety studies.
ICD-10-CM Code Structure
ICD-10-CM codes are alphanumeric and up to seven characters long. Each code is segmented in a standardized way, which means it can be broken down into smaller, bite-sized pieces. Not all ICD-10-CM codes use all seven characters.
- The first three characters form the category, which describes the general type of injury or disease. The first character is always a letter, and each letter represents a different group of diseases or relevant considerations. For instance:
– A & B represent Infectious and Parasitic Diseases which include conditions like HIV, malaria, and tuberculosis.
– E represents endocrine, nutritional, and metabolic diseases.
– U is reserved for emergency code additions, such as U07.1 for COVID-19. This shows the flexibility of the system to accommodate new diseases or conditions as they emerge.
– V, W, X, and Y represent External Causes of Morbidity, indicating how and where a patient was injured.
– Z represents factors influencing health status and contact with health services. This can include codes for aftercare, ongoing care for long-term conditions, or social circumstances that may influence a patient’s health.
- Characters 4–6 in the ICD-10 code provide additional information regarding the etiology (cause or origin), anatomical site (part of the body), and severity of a condition. Each of these characters adds more specificity to the diagnosis, allowing for precise classification and treatment.
- Character 7, the final character, is called an extension. It provides information about the nature of the specific encounter. It is not required for all codes, and its usage can vary based on the category of codes.
– If the service is for an initial encounter, it is designated by the letter “A”.
– Services provided for a subsequent encounter are designated by the letter “D”.
– Treatment for a condition that arises directly as a result of a previous condition, also known as a sequela encounter, is designated by the letter “S”.
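The 7th-character values listed above can be captured in a small lookup table. The following is a minimal Python sketch, not an official mapping: `SEVENTH_CHARACTER` and `encounter_type` are hypothetical names, and the table holds only the A, D, and S values described here. Some categories, such as fractures, define additional 7th-character values, so anything outside the table is returned as `None` rather than guessed.

```python
from typing import Optional

# Hypothetical lookup table covering only the three 7th-character values
# described above; other categories (e.g. fractures) define more values.
SEVENTH_CHARACTER = {
    "A": "initial encounter",
    "D": "subsequent encounter",
    "S": "sequela",
}


def encounter_type(code: str) -> Optional[str]:
    """Return the encounter type encoded in the 7th character, if any.

    Returns None when the code has no 7th character or when the character
    is not one of the values in SEVENTH_CHARACTER.
    """
    compact = code.strip().upper().replace(".", "")  # ignore the decimal point
    if len(compact) != 7:
        return None
    return SEVENTH_CHARACTER.get(compact[6])


print(encounter_type("T33.02XA"))  # initial encounter
print(encounter_type("E11.9"))     # None
```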
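When written out, ICD-10 codes longer than three characters have a decimal point after the third character. This separates the category of the code from the characters that describe the subcategory or specificity of the condition. Many claims and EHR extracts, however, store codes without the decimal point, so a dataset may contain both forms of the same code.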
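Because the same code can appear as T33.02XA in one source and T3302XA in another, it helps to normalize codes to a single form before comparing or joining them. This sketch assumes the only formatting differences are the decimal point, letter case, and surrounding whitespace; `to_dotted` and `to_undotted` are hypothetical helper names.

```python
def to_undotted(code: str) -> str:
    """Remove the decimal point and normalize case, e.g. 'e11.9' -> 'E119'."""
    return code.strip().upper().replace(".", "")


def to_dotted(code: str) -> str:
    """Put the decimal point after the third character, e.g. 'E119' -> 'E11.9'."""
    compact = to_undotted(code)
    if len(compact) <= 3:
        return compact  # a 3-character category has no decimal point
    return f"{compact[:3]}.{compact[3:]}"


print(to_dotted("T3302XA"))  # T33.02XA
print(to_undotted("e11.9"))  # E119
```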
For example, a patient receiving active treatment for superficial frostbite of the nose would be assigned the ICD-10 code T33.02XA.
T indicates injury, poisoning, and certain other consequences of external causes. 33 indicates superficial frostbite. 02 indicates that it affects the nose. Codes in this category require a 7th character, so the placeholder X fills the sixth position and puts the extension in the correct place. A indicates that this is the initial encounter.
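The breakdown of T33.02XA can be expressed as a small parsing function. This is an illustrative sketch: `CodeParts` and `split_code` are hypothetical names, and the function only slices the code by position (characters 1-3, 4-6, and 7) without checking whether the code is valid.

```python
from typing import NamedTuple, Optional


class CodeParts(NamedTuple):
    category: str           # characters 1-3: the general type of condition
    detail: str             # characters 4-6: etiology, site, severity (or "X")
    seventh: Optional[str]  # character 7: the extension, if present


def split_code(code: str) -> CodeParts:
    """Split an ICD-10-CM code into its positional segments."""
    compact = code.strip().upper().replace(".", "")
    return CodeParts(
        category=compact[:3],
        detail=compact[3:6],
        seventh=compact[6] if len(compact) == 7 else None,
    )


print(split_code("T33.02XA"))
# CodeParts(category='T33', detail='02X', seventh='A')
```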
EHR Data & Claims Data
Research organizations often procure anonymized data from EHR aggregators, entities that collect EHR or claims data from various healthcare systems and combine it into larger anonymous databases with more diverse and extensive data. Because the data comes from many different systems, its quality can be inconsistent, and it may require cleaning and validation to ensure accuracy and reliability, particularly if it will be used for research or analysis.
Data Quality Checks
Assessing the quality of a claims or EHR dataset can be challenging because of the volume and complexity of the data.
The following checks can be conducted to confirm the quality of the dataset:
- Understanding the Data: The first step in the process involves getting a comprehensive understanding of the structure and content of the claims or EHR dataset. This includes knowing what each field represents and how the data is organized.
- Duplicate Checks: Duplicate records can distort the data and lead to inaccurate analyses. Most datasets have one or more unique keys, along with a date column, that can be used to check for duplicates.
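A duplicate check on a unique key plus a date column can be sketched with the standard library alone. The column names `record_id` and `service_date` and the sample rows are made up for illustration; substitute the dataset's actual key fields.

```python
from collections import Counter
from typing import Dict, Iterable, Mapping, Tuple

# Hypothetical column names: replace with the dataset's own unique key(s)
# and date column.
KEY_FIELDS = ("record_id", "service_date")


def find_duplicates(
    records: Iterable[Mapping[str, str]],
    key_fields: Tuple[str, ...] = KEY_FIELDS,
) -> Dict[Tuple[str, ...], int]:
    """Return each key that appears more than once, with its count."""
    counts = Counter(tuple(rec[field] for field in key_fields) for rec in records)
    return {key: n for key, n in counts.items() if n > 1}


# Made-up sample rows for illustration.
rows = [
    {"record_id": "R1", "service_date": "2023-01-05", "icd10_code": "E11.9"},
    {"record_id": "R1", "service_date": "2023-01-05", "icd10_code": "E11.9"},
    {"record_id": "R2", "service_date": "2023-01-06", "icd10_code": "I10"},
]
print(find_duplicates(rows))  # {('R1', '2023-01-05'): 2}
```

If the data is already in a pandas DataFrame, `DataFrame.duplicated(subset=[...], keep=False)` flags the same rows; the plain-Python version simply makes the logic explicit.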
Validation of the ICD-10 code format
The next check is to validate the structure and format of the ICD-10 codes that appear in the claims or EHR database. A simple rules engine to check validity can be built in three steps:
Define the Rules
Based on our understanding of the structure and format of ICD-10-CM diagnosis codes, we can define the following rules:
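- Rule 1: The code should be between 3 and 7 characters long, not counting the decimal point.
- Rule 2: The first character should be a letter. ‘U’ codes are rare but valid: emergency additions such as U07.1 (COVID-19) appear in ICD-10-CM data, so they should not be rejected outright.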
- Rule 3: The second character should be a number.
- Rule 4: The third character can be alphanumeric.
- Rule 5: The fourth to seventh characters, if present, can be alphanumeric.
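The five rules can also be written as separate checks, which makes it possible to report which rule a code breaks rather than just pass or fail. This is a hypothetical sketch for ICD-10-CM diagnosis codes: `failed_rules` and its `allow_u` parameter are illustrative names. Because emergency U codes such as U07.1 (COVID-19) do appear in ICD-10-CM data, U is accepted by default, and `allow_u=False` applies a stricter form of Rule 2 that excludes it. A decimal point is ignored only when it sits after the third character.

```python
import string
from typing import List

LETTERS = set(string.ascii_uppercase)
DIGITS = set(string.digits)
ALNUM = LETTERS | DIGITS


def failed_rules(code: str, allow_u: bool = True) -> List[int]:
    """Return the numbers of the rules (1-5) that a code breaks."""
    c = code.strip().upper()
    # Ignore a decimal point only in its expected position (after character 3).
    if len(c) > 4 and c[3] == ".":
        c = c[:3] + c[4:]
    failed = []
    # Rule 1: 3 to 7 characters, decimal point excluded.
    if not 3 <= len(c) <= 7:
        failed.append(1)
    # Rule 2: first character is a letter (optionally excluding "U").
    if not (c[:1] in LETTERS and (allow_u or c[0] != "U")):
        failed.append(2)
    # Rule 3: second character is a digit.
    if c[1:2] not in DIGITS:
        failed.append(3)
    # Rule 4: third character is alphanumeric.
    if c[2:3] not in ALNUM:
        failed.append(4)
    # Rule 5: characters 4-7, if present, are alphanumeric.
    if not all(ch in ALNUM for ch in c[3:7]):
        failed.append(5)
    return failed


print(failed_rules("T33.02XA"))  # []
print(failed_rules("1E11"))      # [2, 3]
```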
Implement the Rules
We can implement these rules in the programming language of our choice. For instance, we could express them as a regular expression in Python, or use regular expressions directly in SQL on databases that support them (PostgreSQL, for example, provides the ~ operator).
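As one possible implementation, the rules can be folded into a single Python regular expression. This is a sketch under the assumption that codes are ICD-10-CM diagnosis codes with an optional decimal point after the third character; `ICD10_CM_FORMAT` and `is_valid_format` are hypothetical names. The pattern accepts any letter in the first position, including U, because emergency codes such as U07.1 are valid ICD-10-CM codes; replacing the first `[A-Z]` with `[A-TV-Z]` gives a stricter check that excludes U.

```python
import re

# Rules 1-5 as a single pattern (ICD-10-CM diagnosis codes):
#   [A-Z]                  Rule 2: first character is a letter
#   [0-9]                  Rule 3: second character is a digit
#   [A-Z0-9]               Rule 4: third character is alphanumeric
#   (?:\.?[A-Z0-9]{1,4})?  Rule 5: up to four more alphanumerics, optionally
#                          after a decimal point (Rule 1: 3-7 characters)
ICD10_CM_FORMAT = re.compile(r"[A-Z][0-9][A-Z0-9](?:\.?[A-Z0-9]{1,4})?")


def is_valid_format(code: str) -> bool:
    """True if the code is well formed; says nothing about whether it exists."""
    # Upper-casing first is a normalization choice, not part of the rules.
    return ICD10_CM_FORMAT.fullmatch(code.strip().upper()) is not None


print(is_valid_format("T33.02XA"), is_valid_format("E11."))  # True False
```

The same pattern can be applied in SQL on databases with regular-expression support. In PostgreSQL, for example, assuming codes are stored in upper case, `WHERE code !~ '^[A-Z][0-9][A-Z0-9](\.?[A-Z0-9]{1,4})?$'` selects the rows that fail it; the explicit anchors are needed because `~` and `!~` search anywhere in the string.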
Test the Rules
Once the rules engine is built, test it with a variety of valid and invalid ICD-10 codes to ensure it works as expected, and then apply it to the dataset to identify invalid codes.
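A small table of known-good and known-bad codes is enough to exercise each rule before running the engine over real data. The sketch repeats a hypothetical regular-expression check so that it runs on its own; the test cases are hand-picked examples, and `failing_cases` and `invalid_codes` are illustrative names.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

ICD10_CM_FORMAT = re.compile(r"[A-Z][0-9][A-Z0-9](?:\.?[A-Z0-9]{1,4})?")


def is_valid_format(code: str) -> bool:
    return ICD10_CM_FORMAT.fullmatch(code.strip().upper()) is not None


# Hand-picked cases: (code, expected result, what the case exercises)
TEST_CASES: List[Tuple[str, bool, str]] = [
    ("T33.02XA", True, "full 7 characters with placeholder X"),
    ("T3302XA", True, "same code without the decimal point"),
    ("E11.9", True, "4 characters"),
    ("I10", True, "3-character category"),
    ("E1", False, "too short (Rule 1)"),
    ("E11.99999", False, "too long (Rule 1)"),
    ("3E11", False, "starts with a digit (Rule 2)"),
    ("EE11", False, "second character not a digit (Rule 3)"),
    ("E11.", False, "dangling decimal point"),
]


def failing_cases() -> List[str]:
    """Return descriptions of test cases where the engine disagrees."""
    return [desc for code, expected, desc in TEST_CASES
            if is_valid_format(code) != expected]


def invalid_codes(codes: Iterable[str]) -> Counter:
    """Count the codes in a dataset column that fail the format check."""
    return Counter(code for code in codes if not is_valid_format(code))


print(failing_cases())  # []
print(invalid_codes(["E11.9", "E1", "I10", "E1", "3E11"]))
```

Running `invalid_codes` over the diagnosis column of the dataset gives a frequency count of malformed values, which helps spot recurring formatting problems.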
Date of Service & Code Existence
We need to ensure that each code is valid for its date of service. ICD-10-CM codes are updated annually, with the new set taking effect on October 1 in the US (plus occasional April 1 updates), so it is important to use the version in effect on the date of service. We also need to check that the code actually exists in the ICD-10 code set, since a code can be well formed and still not exist.
To do such a validation, we can build an ICD-10 database locally by downloading the official ICD-10 code files from a public source and loading them into a database table. This table then acts as a reference point to cross-check the validity of the codes.
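Mapping a date of service to the right code-set version can be sketched as a fiscal-year calculation. This assumes the US convention that each annual ICD-10-CM release takes effect on October 1 and is named for the following fiscal year; it deliberately ignores mid-year updates, which a production check would need to handle. `icd10cm_fiscal_year` is a hypothetical name.

```python
from datetime import date


def icd10cm_fiscal_year(service_date: date) -> int:
    """Return the fiscal year of the ICD-10-CM release in effect on a date.

    Assumes the annual release takes effect on October 1 (so October 1, 2023
    falls in FY2024) and ignores any mid-year updates.
    """
    if service_date.month >= 10:
        return service_date.year + 1
    return service_date.year


print(icd10cm_fiscal_year(date(2023, 9, 30)))  # 2023
print(icd10cm_fiscal_year(date(2023, 10, 1)))  # 2024
```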
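A local reference table can be as simple as one SQLite table keyed by code and fiscal year. The sketch uses Python's built-in `sqlite3` module; the table name `icd10cm_codes`, its columns, and the helper functions are hypothetical, and it assumes codes are stored without the decimal point and that each release takes effect on October 1.

```python
import sqlite3
from datetime import date
from typing import Iterable, Tuple


def build_reference_table(conn: sqlite3.Connection,
                          rows: Iterable[Tuple[str, int, str]]) -> None:
    """Create and fill a reference table of (code, fiscal_year, description)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS icd10cm_codes ("
        " code TEXT NOT NULL,"  # stored without the decimal point
        " fiscal_year INTEGER NOT NULL,"
        " description TEXT,"
        " PRIMARY KEY (code, fiscal_year))"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO icd10cm_codes VALUES (?, ?, ?)", rows
    )
    conn.commit()


def code_exists(conn: sqlite3.Connection, code: str, service_date: date) -> bool:
    """Check that a code is in the code set in effect on the date of service."""
    # Assumes releases take effect on October 1 (mid-year updates ignored).
    fiscal_year = service_date.year + (1 if service_date.month >= 10 else 0)
    compact = code.strip().upper().replace(".", "")
    row = conn.execute(
        "SELECT 1 FROM icd10cm_codes WHERE code = ? AND fiscal_year = ?",
        (compact, fiscal_year),
    ).fetchone()
    return row is not None
```

Loading each year's file as a separate set of rows keeps older versions available, so historical dates of service are checked against the codes that were valid at the time.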
ICD-10 Database with list of codes
The ICD-10 codes can be referenced or downloaded from the following sources:
- The World Health Organization (WHO), the source of the ICD-10 coding system, provides a searchable database of the international version on its website. US data is coded with the ICD-10-CM and ICD-10-PCS modifications instead.
- The Centers for Disease Control and Prevention (CDC) provides a comprehensive ICD-10 code lookup tool, including the codes related to mortality on their website.
- The National Center for Health Statistics (NCHS) offers a downloadable version of the ICD-10 coding system.
- The CMS (Centers for Medicare & Medicaid Services) provides a complete list of the ICD-10 codes for the current fiscal year.
- Various medical coding and billing software providers offer searchable databases of ICD-10 codes as part of their services.
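Once a code file has been downloaded, it needs to be parsed before it can be loaded into a table. This sketch assumes a plain-text layout in which each line holds a code, whitespace, and then the description; check the layout (and encoding) of the file you actually download, since it may differ. `parse_code_lines` and `load_code_file` are hypothetical names, and the path passed to `load_code_file` is whatever local file you saved.

```python
from typing import Dict, Iterable


def parse_code_lines(lines: Iterable[str]) -> Dict[str, str]:
    """Parse 'CODE<whitespace>DESCRIPTION' lines into a {code: description} dict."""
    codes: Dict[str, str] = {}
    for line in lines:
        parts = line.strip().split(maxsplit=1)
        if not parts:
            continue  # skip blank lines
        code = parts[0].upper().replace(".", "")  # store without the decimal point
        codes[code] = parts[1] if len(parts) > 1 else ""
    return codes


def load_code_file(path: str) -> Dict[str, str]:
    """Read a downloaded code file (assumed layout; adjust encoding if needed)."""
    with open(path, encoding="utf-8") as handle:
        return parse_code_lines(handle)
```

The resulting dictionary can then be turned into (code, fiscal year, description) rows for a database reference table.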
Also note that ICD-10 codes are updated every year. Keep the local reference table current, but retain earlier versions as well, so that older dates of service can be validated against the code set in effect at the time.
Conclusion
In conclusion, ICD-10 codes are indispensable in healthcare, and understanding and validating their structure plays a crucial role in modern healthcare and medical research. Errors or inaccuracies in this data can lead to skewed conclusions and misguided strategies. Maintaining high data quality is therefore a critical focus in research: it involves rigorous data collection, validation, and cleaning, along with robust analytical methods that minimize bias and error. By prioritizing data quality, researchers can ensure that their work yields reliable, accurate, and meaningful results.
Unravelling the World of ICD-10 was originally published in Walmart Global Tech Blog on Medium.