Quality Care Measurement: Understanding Statistical Significance

February 22, 2023

The amount of healthcare data is growing at an astounding rate, and this data presents a valuable opportunity for healthcare organizations. Quality improvement administrators use statistical significance to determine whether observed differences in quality are real or are likely to have occurred by chance.

Using healthcare data to build better quality improvement programs

Healthcare organizations use data to improve patient care, reduce costs, predict and prevent medical issues, and improve patient-clinician relationships. To assess quality performance fairly, they must understand statistical significance.

Statistical significance refers to the level of certainty that deviations in quality are meaningful, beyond normal fluctuations in the data. A statistical significance test is used by quality administrators to determine whether observed outcomes differ significantly from those predicted by risk-adjustment models. Using risk adjustment, quality benchmarks are tailored based on the clinical and demographic characteristics of the patient population evaluated.

Comparing observed and expected outcomes in healthcare data can identify meaningful deviations in measures such as mortality, readmissions, complications, and length of stay. In a robust quality improvement program, it is crucial that risk-adjusted benchmarks are applied together with a measure of statistical significance. Without such a measure, deciding when a deviation in performance is meaningful is left to individual interpretation. Statistical significance is a formalized, industry-accepted method for quantifying these differences. To better measure statistical significance, healthcare organizations are increasingly turning to artificial intelligence (AI) solutions.
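As a minimal sketch of the observed-versus-expected comparison, the snippet below sums hypothetical risk-adjusted probabilities to get an expected event count and divides the observed count by it. The patient values, the variable names, and the `observed_expected` helper are illustrative assumptions, not taken from any particular risk-adjustment model.

```python
# Hypothetical risk-adjusted readmission probabilities for ten patients,
# as a risk model might output them (illustrative values only).
predicted_risk = [0.05, 0.10, 0.20, 0.08, 0.15, 0.30, 0.12, 0.07, 0.25, 0.18]

# Hypothetical actual outcomes: 1 = readmitted, 0 = not readmitted.
observed_outcome = [0, 0, 1, 0, 0, 1, 0, 0, 1, 0]


def observed_expected(outcomes, risks):
    """Return (observed count, expected count, O/E ratio)."""
    if len(outcomes) != len(risks):
        raise ValueError("outcomes and risks must have the same length")
    observed = sum(outcomes)
    # The expected count is the sum of each patient's predicted probability.
    expected = sum(risks)
    if expected == 0:
        raise ValueError("expected count must be greater than zero")
    return observed, expected, observed / expected


observed, expected, oe_ratio = observed_expected(observed_outcome, predicted_risk)
print(f"observed={observed}, expected={expected:.2f}, O/E={oe_ratio:.2f}")
```

With these made-up values the observed count is about twice the expected count. With only ten patients, though, a ratio alone cannot show whether that gap is meaningful or simply chance, which is the question a significance test answers.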

Introducing the hypothesis

Hypothesis testing is the standard scientific method for determining whether the proportions or averages of observed outcomes differ from what was expected.

During hypothesis testing, the research question is framed as two competing statements: the null hypothesis and the alternative hypothesis. It is not enough to assert that observed and expected outcomes differ; the analysis must show, with a stated degree of certainty, that they do. The null hypothesis always assumes that observed performance does not differ from expected performance. The alternative hypothesis states that there is a difference between observed and expected performance.

The goal is to determine whether there is sufficient evidence, at a chosen level of confidence, to reject the null hypothesis in favor of the alternative hypothesis, which is the actual research question: that observed and expected values are different.
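For an outcome measured as a rate, the two hypotheses can be written compactly. The symbols here are assumptions introduced for illustration: p is the true underlying event rate of the evaluated population, and p_0 is the expected rate from the risk-adjusted benchmark.

```latex
H_0 : p = p_0
\qquad \text{versus} \qquad
H_1 : p \neq p_0
```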

Calculating confidence

It is critical to set a threshold for rejecting the null hypothesis before the test is run. This threshold is generally set at a 95% confidence level, or equivalently a 5% significance level.

The significance level is the probability of rejecting the null hypothesis when it is actually true, that is, the probability of a false positive. A confidence level of 95% therefore means accepting a 5% chance of flagging a difference between observed and expected performance that is really due to chance. The researcher, or quality administrator, is responsible for determining the acceptable level of error for their organization for a hypothesis test. AI solutions can expedite this work by identifying where to focus performance improvement efforts and by surfacing areas of opportunity, without requiring staff to sift through data where variation is not significant. Organizations may also choose to monitor care regularly to ensure systematic variation is not increasing.
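To make the link between confidence level and rejection threshold concrete, here is a small sketch that converts a confidence level into its significance level and the matching two-tailed critical value of a standard normal test statistic. It assumes a z-test; the `critical_z` helper is a hypothetical name, and other tests use different reference distributions.

```python
from statistics import NormalDist


def critical_z(confidence_level):
    """Two-tailed critical z-value for a confidence level such as 0.95."""
    if not 0 < confidence_level < 1:
        raise ValueError("confidence_level must be between 0 and 1")
    alpha = 1 - confidence_level  # significance level
    # Split alpha across both tails, then look up the upper-tail cutoff.
    return NormalDist().inv_cdf(1 - alpha / 2)


for level in (0.90, 0.95, 0.99):
    print(f"confidence={level:.0%}  alpha={1 - level:.2f}  "
          f"critical z={critical_z(level):.3f}")
```

At 95% confidence the cutoff is roughly 1.96, so a test statistic further than that from zero in either direction would lead to rejecting the null hypothesis. Raising the confidence level pushes the cutoff outward. That reduces false positives but makes real differences harder to detect.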

Test statistics and p-values

Once the acceptable confidence level is set, a test is conducted to determine whether there is sufficient evidence to reject the null hypothesis. A variety of statistical tests are available to answer this question, depending on the conditions of the data and the nature of the comparison. In a two-tailed test, the observed proportion or average is compared with the expected value, and a deviation in either direction, better or worse than expected, counts as evidence against the null hypothesis.
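One common choice for comparing an observed rate with an expected rate is a two-tailed one-sample z-test for a proportion, sketched below. This is an illustrative assumption rather than the specific test any given quality program uses. It relies on a normal approximation that suits reasonably large samples, and it treats the expected rate as a single fixed benchmark. The function name and the example figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_tailed_proportion_test(observed_events, n, expected_rate):
    """Return (z statistic, two-tailed p-value) for observed vs expected rate."""
    if n <= 0 or not 0 < expected_rate < 1:
        raise ValueError("need n > 0 and 0 < expected_rate < 1")
    observed_rate = observed_events / n
    # Standard error of the rate if the null hypothesis (expected_rate) is true.
    standard_error = sqrt(expected_rate * (1 - expected_rate) / n)
    z = (observed_rate - expected_rate) / standard_error
    # Two-tailed: count deviations of this size in either direction.
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical example: 60 readmissions among 400 patients, 10% expected.
z, p_value = two_tailed_proportion_test(observed_events=60, n=400, expected_rate=0.10)
print(f"z={z:.2f}, p-value={p_value:.4f}")
```

In this made-up case the observed rate of 15% sits a little more than three standard errors above the expected 10%, and the p-value is well below 0.05. For small populations or very rare events, an exact binomial test is generally a safer choice than the normal approximation.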

Null hypothesis acceptance or rejection

To reject the null hypothesis based on a test, the p-value produced must be less than the chosen significance level (5% at 95% confidence). If the p-value is less than the significance level, we can reject the null hypothesis that observed and expected values are the same and conclude that there is a statistically significant difference between observed and expected performance. If it is not, we fail to reject the null hypothesis; this does not prove that performance matches expectations, only that the data do not show a significant difference.

Statistical significance can be measured for both large and small data sets or populations. Smaller populations have greater variation and sampling error, so significance testing is especially important when evaluating them. The statistical test accounts not only for population size but also for the standard deviation of the expected values. As the size of the evaluated population increases, a given difference between observed and expected values is more likely to be statistically significant.
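The effect of population size can be illustrated by holding the observed and expected rates fixed and varying only the number of patients. The sketch below uses a two-tailed z-test for a proportion under a normal approximation, which is an assumed choice of test. The rates, sample sizes, and the `p_value_for_rates` helper are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

ALPHA = 0.05  # significance level for 95% confidence


def p_value_for_rates(observed_rate, expected_rate, n):
    """Two-tailed p-value for an observed rate vs an expected rate at size n."""
    standard_error = sqrt(expected_rate * (1 - expected_rate) / n)
    z = (observed_rate - expected_rate) / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Same gap every time: 12% observed against 10% expected.
for n in (50, 200, 1000, 5000):
    p = p_value_for_rates(0.12, 0.10, n)
    verdict = "significant" if p < ALPHA else "not significant"
    print(f"n={n:>5}  p-value={p:.4f}  {verdict}")
```

With these figures, the same two-point gap is not significant for 50 or 200 patients but becomes significant for 1,000 and 5,000. The standard error shrinks as the population grows, so a deviation of the same size stands out more clearly from normal fluctuation.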

Improving decisions

Flawed data and analysis often lead to poor decisions. Hypothesis testing offers a systematic, data-driven method for determining statistical significance. The process is to state a clearly articulated hypothesis, or research question; set a confidence level; calculate a test statistic and its p-value; and check whether the p-value falls below the corresponding significance level. Only then can sound decisions be made regarding the significance of the difference between observed and expected performance.

If you need more insights into quality care measures and data sets, we have a team of specialists and resources to help guide your endeavors. Explore Virtual OfficeWare Healthcare Solutions to learn more. For questions, please contact us at 412.424.2260 | info@vowhs.com.