In the last essay, I focused on the core concept that the data doesn’t speak for itself, and how that flawed expectation leads organizations to expect a level of usefulness from the data that it cannot necessarily support. Here, I will discuss the mistaken belief that data-is-data-is-data: the assumption that all data points are equivalent, which can lead an organization to expect a level of consistency that the data cannot necessarily support.
Saying that not all data is created equal won’t come as a surprise to many people reading this. But for those who are a bit perplexed by this sentiment, remember that every number reported on a pillar is the product of some act.
- It can be a very specific act, like when your bank adds up all the charges and payments to your credit card and applies the interest calculation. It is dependent upon certain rules, but it is consistent.
- It can be approximate. When you set your oven to 375, how do you know it is really 375 and not simply fluctuating between 365 and 385? This is why cooks will tell you to stop opening your oven to check on your food, since every peek slows the cooking process.
- It can even be aspirational. Ask a child how old they are and they will say, “I’m seven… AND A HALF” or “I’m gonna be eleven!” This doesn’t just affect kids, either. Should we all pull out our driver’s licenses and see if the self-reported weight and height are accurate?
The fact that numbers have different origins does not mean that those numbers are useless. I am not saying that all knowledge is fleeting, truth is hopeless, and we live in a relativist hellscape where the loudest survive and we are all destined to die alone.1 It does, though, mean that on a pillar, when one number is red and another one is green, you should understand the nature of that number before you ascribe a lot of weight to its color or even start building an action plan to address it. Understanding the origin of a number is extremely important if you want to MOVE that number in a positive direction.
All of these concerns revolve around a core concept of accuracy. How confident are we that the number reported is reflective of reality? After all, the pillar is the genesis for many process improvement strategies, so knowing the accuracy of that number is important in determining what we need to do.
Some numbers are the product of a population, and some are the product of a sample. This is the most important of all distinctions, since it tells us whether the score reported is an approximation, which in turn helps us determine how important any movement in that number is. Many of the numbers on a pillar (depending, of course, on what your pillar contains) are hard counts of actual things. For example, actual financial data, accounting for any outstanding accounts-payable and -receivable, counts as a population number. On the other hand, any survey data is based upon a sample. That is, HCAHPS data may seek to survey all patients who are discharged, but not all discharged patients take the survey. So, the data approximates what the real population number is, plus or minus a margin of error. This is not to say that the number is WRONG, so much as it comes with wiggle room. So, if the survey data says that 75% of patients love us, plus or minus 3%, it means that, were we to poll the entire population, we are 95% confident that the population number would be somewhere between 72% and 78%. The reason this is important is that I have seen senior leaders in organizations lose their minds over a two-percent shift in a month of patient experience data, when any statistician would tell you that this shift is well within the margin of error.2
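The arithmetic behind that wiggle room is simple enough to sketch in a few lines of Python. The sample size and proportion below are illustrative, not actual HCAHPS figures; the point is just that the margin of error comes from the sample, not the number itself:

```python
import math

def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion.

    p_hat: the observed proportion (e.g. 0.75 for 75% favorable)
    n:     the number of survey responses (NOT the number of discharges)
    z:     z-score for the confidence level (1.96 for 95%)
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Illustrative: 75% favorable from 800 returned surveys
moe = margin_of_error(0.75, 800)          # roughly 0.03, i.e. +/- 3 points
low, high = 0.75 - moe, 0.75 + moe        # the "real" number plausibly
print(f"{low:.2f} to {high:.2f}")         # lies anywhere in 0.72-0.78
```

Notice that a one- or two-point month-over-month shift sits comfortably inside that interval, which is exactly why a statistician shrugs at it.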
The margin of error relates to another concern: false precision. In one of the Holiday Decoration essays, I spoke of the concern that a leader might scream someone down for poor performance, not appreciating that the 50% they scored was off of two surveys. This is an example of what I call false precision. The calculation is accurate, but it doesn’t satisfactorily explain the real world. If three patients out of seven give you the desired response, it is accurate to say that the percentage is 42.857143%, but referencing it in that way implies an importance or accuracy that the number does not possess. This is obviously an exaggeration, but when there are small populations, or very few fall-outs, or both, those percentages can be confusing. Adding decimal places does not make the number more valuable. It just seems that way.
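One modest defense against false precision is to report the raw counts next to the rate and round to a sensible precision. A minimal sketch (the reporting format is my own suggestion, not a standard):

```python
def report_rate(hits: int, total: int) -> str:
    """Report a rate to the whole percent, alongside its raw counts,
    so a tiny denominator isn't dressed up as a precise measurement."""
    pct = 100 * hits / total
    return f"{pct:.0f}% ({hits} of {total})"

# Three desired responses out of seven surveys
print(report_rate(3, 7))   # "43% (3 of 7)" -- not 42.857143%
```

Seeing “3 of 7” immediately tells the reader how fragile the percentage is; 42.857143% tells them nothing extra.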
Another concern is the complexity of the data’s source. Above, I mentioned that any pillar number is the product of an act. Sometimes, this is a simple count. Sometimes it is a percentage, which is based upon two counts, numerator and denominator. Sometimes it is a relative percentage or a difference of two percentages, which is the product of four counts. The more steps to a data’s recipe, the more hidden variability there is.
For example, a very popular number for organizations to track is the Net Promoter Score™.3 Organizations will ask a patient to rate their experience on an 11-point scale of 0-to-10. They will then break those responses into three groups: those who gave a 9 or 10, those who gave a 7 or 8, and those who gave a 6 or lower. The logic is that those who gave a 9 or 10 are happy and likely to PROMOTE you in the community. Those who gave you a 7 or 8 are neutral or passive and not likely to talk about you in the community at all. Those who gave you a 0-6 are likely saying NEGATIVE things about you in the community, if they are talking about you at all. So, the organization subtracts the percentage of 0-6 responses from the percentage of 9-10 responses, and the result is called the Net Promoter Score. Say a hospital has 78% of patients giving them a 9 or 10, 18% giving a 7 or 8, and 4% at the bottom; that organization’s NPS™ is 74. Not only does this number carry with it one percentage subtracted from another, but it also carries with it a host of assumptions, mostly having to do with how numbers are categorized. I would always get questions from staff like “Why isn’t an 8 good enough?” or “Shouldn’t 6 and 5 be considered average?” to which I would respond, “Take it up with Fred Reichheld.”
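The calculation described above can be sketched in a few lines. The cut-points (9-10 promoters, 7-8 passives, 0-6 detractors) are the standard NPS buckets from the text; the example ratings are made up to reproduce the 78/18/4 split:

```python
def net_promoter_score(ratings: list[int]) -> float:
    """Percentage of promoters (9-10) minus percentage of detractors (0-6).
    Passives (7-8) count in the denominator but in neither bucket."""
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100 * (promoters - detractors) / len(ratings)

# Illustrative: 78 nines-and-tens, 18 sevens-and-eights, 4 at the bottom
ratings = [9] * 78 + [8] * 18 + [5] * 4
print(net_promoter_score(ratings))   # 74.0
```

Note how much is buried in those two comparison operators: move the promoter line from 9 down to 8 and the same survey responses produce a very different score.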
This complexity can be even more confusing when looking at Quality measure definitions. Some quality measures are easy to define. You fit in the age range and either you did or did not get a mammography in the past two years. You are over the target age and did or did not get a pneumonia vaccine. But many have more pieces to the puzzle. For example, consider a measure like optimal diabetes management. To be compliant, diabetic patients need to do several things, such as maintaining an appropriate A1c number, being a non-smoker, taking a statin if needed, getting an eye exam, etc. Failing any ONE of these marks a patient as non-compliant. So, a patient who is trying to manage their diabetes, but didn’t get an eye exam, is counted the same as another patient who doesn’t care and is failing at every element in the measure. Both are fall-outs, but their stories are different. This doesn’t mean that the data is wrong, but counting that number as equivalent to a mammography or pneumonia vaccine seems obfuscatory verging on dishonest.
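The all-or-nothing logic is the whole story here, and it is worth seeing how bluntly it behaves. The element names and thresholds below are placeholders for illustration, not an official measure definition (actual A1c targets vary by program and patient):

```python
def optimal_diabetes_care(patient: dict) -> bool:
    """All-or-nothing composite: every element must pass, or the
    patient is a fall-out. Elements and thresholds are illustrative."""
    return all([
        patient["a1c"] <= 8.0,            # glycemic control (target varies)
        not patient["smoker"],            # tobacco-free
        patient["statin_if_indicated"],   # on a statin when appropriate
        patient["eye_exam_done"],         # retinal exam completed
    ])

# Missed only the eye exam...
near_miss = {"a1c": 7.1, "smoker": False,
             "statin_if_indicated": True, "eye_exam_done": False}
# ...versus failing every single element
total_miss = {"a1c": 10.2, "smoker": True,
              "statin_if_indicated": False, "eye_exam_done": False}

print(optimal_diabetes_care(near_miss), optimal_diabetes_care(total_miss))
```

Both patients come back `False` and land in the same non-compliant bucket, even though one missed by a single checkbox and the other missed everything. The measure, by design, cannot tell their stories apart.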
When these numbers are all reported side-by-side without paying attention to their varying definitions, people can be forgiven if they think all these nails require the same hammer. Further, because we have close fails (the diabetic patient who met four of five objectives, or the patient who gives the hospital an ‘8’ on the Hospital Overall Rating question instead of a ‘9’), sometimes people will argue the definition. “Why did the target A1c value move from 8 down to 7, and now to 6.5 in some cases?” “What if a patient won’t get vaccines because of a deeply held personal belief?” “Why don’t 8s matter?” If you want to have a lively conversation, ask a nurse what they think of their organization’s definition of a patient fall. This does not mean that the data is garbage, but it does raise the question of whether it is reasonable to compare finance, service, and quality numbers so blithely.
Finally, the least appreciated element in the act of generating the data is the human element. This is not to say that people are seeking to lie or deceive, but that human perception, bias, and error can impact the resulting data. Workflows only work consistently if they are utilized consistently. Look no further than the hospital’s incident reporting software. Every hospital has a means for capturing nonstandard events, from good catches and near misses, to service issues, to clinical issues, both minor and major. Every time an organization does a global reeducation on the process of logging in their tool, they see a bump in the number of reported events. This is not evidence that things have suddenly gotten worse, but evidence that people were reminded that they need to document events. Or they realized that things they didn’t think were worthy of documentation actually need to be documented. Human behavior and human error drive many of the data points you review, but you often don’t realize it.
Some will say that this error in the data is consistent over time, so it washes itself out. The analogy is that if a scale is off by ten pounds, it may not be accurate, but its consistent inaccuracy allows for that data to be meaningfully trended. This is, I suppose, true, if all other things are equal. Except that all other things are never equal. Staff turnover, a change in management focus, a refresher training course, or a change in how something is documented can all change the level and direction of the inaccuracies.
I will reiterate, for the third or fourth time here, that this essay is not challenging the use of data or saying the data is inaccurate. I am simply calling out the fact that all this data sits side-by-side on a pillar or dashboard, and leaders generally aren’t savvy enough to appreciate the subtle differences, let alone have the time to thoughtfully review them. Heck, many leaders may only really understand the data in their own pillar piece and have no understanding of the other numbers. They then can’t figure out why Quality can’t fix the hospital-acquired infection problem, or why Service can’t just be nicer.
You may not have the power in your organization to correct your senior leaders’ faulty understanding of the data, but in understanding these issues, you can at least ask the right questions about how to improve it. When all else fails, I only ask you to not perpetuate that same knee-jerk reaction as you spread the message to those around you.
1Though if you do believe this, my friend John Gnida would say, “Great. You finally accept that which I have been telling you for decades.”
2All of this can be exacerbated by a number of other factors, leading to margin of error numbers greater than 10%. All of this deserves more attention another day.
3To be clear, the Net Promoter Score (NPS) is a trademark owned by Bain & Company, NICE Systems, and Fred Reichheld. This means you can do the calculation, but you cannot call it an NPS without seeking consent from the trademark owners. Not sure how aggressively they enforce their trademark, though, since I see NPS all over the place and never see NPS™ anywhere. I will try to avoid legal exposure, though, by acknowledging the trademark holders.