I remember being shocked when I first realized just how much surveying happens in healthcare. This might sound odd, given that I worked for a survey vendor; surveying the patients and employees of healthcare organizations was exactly what we did. But the more I worked with hospitals, the more I learned that the surveying a contracted vendor did was often just the tip of the iceberg. In fact, back when I was working at a patient experience vendor, even if a client was utilizing all our products, we were not doing that much outreach.1 But when I was on-site with clients, I was surprised by how much data collection was happening. Whether it was surveying targeted to specific patient populations, surveying of fellow employees, or internal validation work designed to measure compliance, it felt like everyone was running an instrument to evaluate performance. I was also surprised by how little oversight these home-brew surveys had.
You might think that this horrified me, and it did, though perhaps not for the reasons you might imagine. Discovering how many times a hospital was putting itself and its patients under the magnifying glass was surprising. But what really struck me was how little self-reflection was going into this research. It was not the work itself, though if you read my essays on transactional relationships, you know my broad concern with oversampling patients. It was that, without understanding, let alone addressing, the biases associated with this self-surveying, much of this data was useless.
Now don't get me wrong. I know that all these surveys could not easily be farmed out to an independent research firm. Some cover experiences that the big firms are not equipped to survey. And in some cases, the area needing the survey would manage it alone because they didn't know any other way. They would not even reach out to the hospital's own PX team, marketing team, or analytics team for help or oversight.
Some departments see surveying more as a means to an end than an end in itself. Some survey simply to maintain accreditation. For example, lab services can get accredited through a few different agencies. Last time I checked, all but one accrediting agency required some sort of satisfaction measurement as part of its process. The language is often vague: no standard question list, cadence, or target audience. So even within one health system, each individual lab, satisfying its own accrediting body, would conduct a survey essentially to check a box. Since this work is done infrequently and usually focused on a specific department, the data is useless for benchmarking or for collaboration, even across a single hospital system.
Further, many of these departments do not have the resources to hire an outside vendor. Most research firms establish flat pricing for surveys, where you pay a set amount for every targeted area. The research firm then turns around and does census sampling across that entire patient population. For large patient populations this can be extremely attractive and cost-effective.2 But the same math that makes this attractive for large populations also prices out small populations. I once worked with a series of Skilled Nursing Facilities (SNFs), and the flat fee for census sampling of the entire SNF patient population was greater than most of the facilities' yearly operating margins. In other words, the act of surveying their patients would send them from black to red on the balance sheet.
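To make that math concrete, here is a toy calculation (in Python, because that is what I reach for). Every number in it is made up; the only point is that a flat fee divided by a small patient count gets ugly fast.

```python
# Toy illustration of why flat-fee census sampling prices out small units.
# The fee and patient volumes below are invented for this sketch.

FLAT_FEE_PER_AREA = 20_000  # hypothetical annual vendor fee per surveyed area ($)

def cost_per_survey(annual_patient_volume: int) -> float:
    """Effective cost per distributed survey under a flat fee."""
    return FLAT_FEE_PER_AREA / annual_patient_volume

# The same fee looks very different to a high-volume ED and a small SNF.
for name, volume in [("Large emergency dept", 40_000), ("Small SNF", 400)]:
    print(f"{name}: {volume:,} patients -> ${cost_per_survey(volume):,.2f} per survey")

# Large emergency dept: 40,000 patients -> $0.50 per survey
# Small SNF: 400 patients -> $50.00 per survey
```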
In other cases, the desire is just to create a short-term validation tool. An organization has started an initiative, and it needs a way to evaluate whether there is compliance with that measure. For example, I am not sure that there has ever been a hand-hygiene initiative that did NOT include some sort of log or documentation of compliance. These generally run for a couple of months to verify people are doing what they need to do, though it is also true that these surveys can take on a life of their own. Senior leader rounding is another popular initiative that generates a log. These, though, generally suffer a different fate: they are surveys that never die and instead get loaded with all sorts of side-quests, like measuring whiteboard validation, hourly rounding, or collaborative rounding. These surveys suffer from urban sprawl, where there is so much ancillary stuff that the core of the survey gets lost. But more on that another day.
There is a rabbit hole3 I could descend into regarding how many of these surveys are poorly constructed, do not measure what you intend, or do not tell you what you actually want to know. I will skirt it for the moment and instead focus on one of the biggest issues that is rarely discussed in this universe of home-brews: the bias associated with self-documentation.
The Hawthorne Effect
The Hawthorne Effect is the concept that the very act of measurement can change behavior.4 That is, if I know my behavior is being monitored, I am likely to do everything according to the textbook standard. I will strive to be the best version of myself that I can be. So, if a nursing team is told that there will be an audit during their shift checking on whiteboard completion, it is likely that every nurse will make sure their whiteboards are updated and complete. Not that they don't already, but by stating that this is a measurement priority today, you have elevated it in their consciousness. Plus, no one wants to have their manager say, “I told you we were going to be reviewing this, and you STILL screwed it up!” Even if you don't announce the review in advance, it does not take long before nurses notice people with clipboards and word gets around.
Once, I worked with an outpatient lab that was doing its own hand-out/mail-in home-brew survey of its patients to measure must-haves for accreditation. To their credit, in order to avoid constantly resurveying frequent flyers, and to avoid having to data-enter thousands of surveys every week, the lab decided to distribute surveys only at certain times on certain days. So, it was not surprising that from 10-11 on Mondays, from 12-1 on Wednesdays, and from 2-3 on Thursdays, the staff was more attentive to validating patient names, labeling lab samples in the patient's view, and discussing when lab results would become available. How do I know this? Because when I had a chance, I observed behavior at the lab and compared performance during the survey window against performance outside it. While it was not a night-and-day difference, I did notice that staff were much better at narrating their behaviors, saying, “I am labeling this specimen in your presence to assure accuracy” or, “John, I know we see you here for lab work every week, but just to be formal, would you please state your name and birthdate?” when the stack of surveys was next to the other paperwork. It may simply have been that the survey's presence reminded them of the best-practice behavior, and not some Machiavellian plan to skew the data. That doesn't matter. What matters is that the behavior was not consistent.
Now, if this focus on formality bred muscle memory and, over the long run, got staff more aligned with best-practice ideals, we would gladly accept this Hawthorne Effect. But too often, the focus on best-practice behaviors is only temporary. Indeed, by shining a light on the measurement windows, you let the staff relax when the window closes. “Shoot! I forgot to tell them when the lab results would drop. Thank goodness it is after 3pm.”
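As an aside, if you wanted to put a number on a gap like the one I observed, a simple two-proportion comparison is enough. Here is a minimal sketch; the counts are hypothetical, invented purely to show the mechanics.

```python
# A minimal sketch of quantifying a Hawthorne-style gap: compare compliance
# observed inside the survey window against compliance outside it.
# The counts below are hypothetical, invented for illustration.
from math import sqrt

def two_proportion_z(hits_a: int, n_a: int, hits_b: int, n_b: int) -> float:
    """z statistic for the difference between two observed proportions."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Say 88 of 100 in-window observations were compliant vs. 70 of 100 outside.
z = two_proportion_z(88, 100, 70, 100)
print(f"z = {z:.2f}")  # z = 3.12, a gap that large is unlikely to be chance
```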
The Sorry/Not Sorry Omission
The behaviors associated with the Hawthorne Effect are at least positive. While there may be no research showing that it has lasting impacts on behavior, at least the behaviors themselves are aligned with best practice. But what if there is a different way to sculpt the survey outcomes? In the lab survey protocol I just mentioned, the idea was that every patient who had lab work done in the measurement window would get a survey. The lab tech was supposed to hand the survey to the patient, with a postage-paid envelope, and encourage them to provide their feedback.
But did everyone who was supposed to get a survey get one? No. And, as you might have guessed, it may have been, at least at times, intentional. When I was observing lab behavior, I noticed a distinct difference in how surveys were distributed. Some patients would get the survey with a “We really would like to hear how we did today, so please fill out this survey,” while others would get the survey with no comment encouraging participation, and still others would not get the survey at all. Again, while some of this may be a simple lack of consistent execution, staff members joked with me and with each other about how they would sometimes “forget” to hand out the survey if the patient was especially grouchy, if the needle-stick went poorly, or if there were other reasons this patient might respond with negative scores. This confession contained no remorse. This sorry/not sorry response indicated that the staff did not take the data collection seriously, or perhaps that they were worried about being dressed down for low scores. Either way, the resulting data was not valuable.
The Evil “Ish” in Observations
You might think that, even with the Hawthorne Effect, you can capture useful data with a self-documentation approach to the survey. I can agree with this sentiment, provided you do one key thing: train the observers in how to observe. The problem is that often we don't. Either we think that the observation and documentation are easy, or we are uncomfortable telling a clinician how to observe and document clinical behaviors. While clinical staff may know clinical behaviors, they don't necessarily know how to document them using this tool in this situation. The reality is that in most settings, clinical or otherwise, there are not two possible outcomes, right and wrong, but three: right, right-ish, and wrong. This middle gray zone can capture a lot of variation, like
- OK, it wasn’t right-right, but no harm, no foul.
- They did what they were supposed to do but missed an opportunity to connect with a patient.
- Yeah, they didn’t do it perfectly THAT TIME, but that staff member usually does do it perfectly, so…
I am not a clinician, so I am not going to determine whether these gray-zone actions constitute normal variation or policy violations. I only know that they ruin consistent documentation.
Let us take a simple example. Hand-hygiene protocols are often the subject of audits in healthcare. Observers will walk around with clipboards, adding tick marks when staff do or do not follow the proper protocol. This could not be simpler. I walk around the hallways, watch when people go into a patient's room, and see whether they use the hand sanitizer (1) upon entry, (2) before touching a patient, (3) before beginning a clean procedure, (4) after risk of bodily fluid exposure, (5) after touching a patient, and (6) after touching a patient's surroundings.
This seems super simple, as taking a pump or two of foam from the dispenser on the wall with every triggering event is obvious. At least until it isn't. Since not every patient contact will involve each of those actions, most forms have a box for COMPLIANT, a box for NOT COMPLIANT, and perhaps a box for notes. *Spoiler alert*: since the notes section is not quantitative, it is rarely reviewed by any leadership.
So, consider:
- Is this a PASS or a FAIL if a nurse sanitizes immediately upon entering the room, but does not use foam after handling the Get Well Soon card that the patient got from their grandkids?
- Is it a PASS or FAIL if the phlebotomist did not foam in, but put on rubber gloves before taking a blood sample?
- Is it a PASS or FAIL if the nurse manager comes into the room, does not hand-sanitize, but also does nothing other than talk to the patient?
- Is it a PASS or FAIL if the behavior observed is that of food service or housekeeping staff?
Of course, these are all trick questions, since it doesn’t matter what you would call them. It matters whether you think EVERYONE doing the documentation will score all of these events the same way.
If you do not have protocols for the observations (what to look for, what matters most, what to do when someone is in the “ish-zone”), then you will have a significant amount of data-collection error, since the ‘letter of the law’ folks and the ‘spirit of the law’ folks will score things differently.
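What might writing those rulings down actually look like? Here is a minimal sketch. The outcome names and the rulings themselves are mine, invented for illustration; a clinician would set the real ones. The point is only that every observer applies the same pre-agreed answer to the same event.

```python
# A sketch of codifying the "ish-zone" instead of forcing everything
# into PASS/FAIL. All outcome names and rulings here are invented.
from enum import Enum

class Outcome(Enum):
    PASS = "pass"        # protocol followed as written
    ISH = "ish"          # low-risk deviation; recorded, not hidden
    FAIL = "fail"        # protocol not followed
    EXCLUDE = "exclude"  # out of scope for this audit

# Rulings for the gray-zone events, decided once, in advance, in writing.
RULINGS = {
    "sanitized on entry, touched get-well card, no re-foam": Outcome.ISH,
    "no foam-in, but gloved before blood draw": Outcome.FAIL,
    "no sanitizing, talked to patient, no contact": Outcome.ISH,
    "food service or housekeeping staff": Outcome.EXCLUDE,
}

def score(event: str) -> Outcome:
    """Apply the pre-agreed ruling; unlisted events get escalated, not guessed."""
    if event not in RULINGS:
        raise KeyError(f"No ruling for {event!r}; escalate to the protocol owner")
    return RULINGS[event]
```

Whether a gloved-but-unfoamed blood draw is truly a FAIL is a clinical call, not mine. What matters is that the call gets made once, in writing, before anyone picks up a clipboard.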
This variation extends to any rules about execution. Most organizations will tell a leader that they need to perform, say, 100 observations a week, but not tell the leader how to spread those out. Should they be validating every staff member every week, or just at least once a month? Should they be observing on third shift or weekends, or just 0800-1700 Monday through Friday? Are all staff included, or only employed staff? CNAs as well? Doctors as well?
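One simple fix is to allocate the quota across explicit day/shift strata up front, so the awkward hours cannot quietly drop out. A sketch, with placeholder numbers:

```python
# A sketch of spreading a weekly observation quota across day/shift strata
# so nights and weekends get covered. Quota and strata are placeholders.
from itertools import product

WEEKLY_QUOTA = 100
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
SHIFTS = ["day", "evening", "night"]

def build_schedule(quota: int) -> dict:
    """Allocate observations as evenly as possible across every stratum."""
    strata = list(product(DAYS, SHIFTS))      # 21 day/shift combinations
    base, extra = divmod(quota, len(strata))  # 100 -> 4 each, 16 left over
    return {s: base + (1 if i < extra else 0) for i, s in enumerate(strata)}

schedule = build_schedule(WEEKLY_QUOTA)
print(schedule[("Sat", "night")])  # -> 4; even Saturday nights get a share
```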
Another variance is that leaders often impose a different standard when rounding on their A-team staff versus their C-team staff. Some of this can be intentional: you trust the high-performers more than the rest, so you focus on your middle-performers. Other times it is unintentional: you might not catch an oversight by a high-performer because you miss it in their seamless delivery, but you are eagle-eyed with other staff, whose delivery is less organic and therefore easier to pick apart.
None of this means that these self-surveys are garbage. It is that any observation is subject to humanness. If you don't create clear rules, don't train those rules into everyone administering the survey, and don't create a mechanism for accurately capturing “ish” behaviors, you will not generate useful data. My recommendations are:
- Tap into in-house expertise on survey design. Certainly there is someone in PX, business analytics, or marketing who can give you tips on useful construction.
- Train your observers (including those giving the surveys to patients) in what the proper protocols are, including explaining WHY this work is being done.
- Review the process as you go. Just because a survey looked good on paper doesn’t mean that there won’t be any problems when it is executed.
- Just because a survey is functioning well now doesn't mean that it always will. Sometimes people will want to add a question “just for a quarter” that ends up on the survey forever.
No self-observation strategy is so easy that it cannot fail. Don't fall into the trap of thinking that measures or behaviors are so simple, so ingrained, that they do not need clear measurement protocols.
1Long before the days of rampant census sampling (where any patient who had any encounter with the hospital would get a survey), we did random-sample surveying. This meant that, depending on the volume of the care space, it was unlikely that an emergency department or primary care clinic patient would even get outreach to take the survey. Like so many other endnotes in so many other essays, this one could be an essay someday. But not today.
2Or not. There are a lot of twists and turns in this pricing, which, again, would be a good essay topic for another time.
3For the record, whenever I realize that there is an aside on statistics and measurement that threatens to derail my essay and I call out the potential spiral, you can be sure that 100% of the time a line from Elbow's song “Grounds for Divorce” is playing in my head: “There's a hole in my neighborhood down which of late I cannot help but fall.”
4I would call this the Heisenberg uncertainty principle for social science, but (a) that probably doesn't help people understand the concept, and (b) my parents, both chemists and loyal readers of this website, would roll their eyes at the comparison and send me a long email telling me how this is not exactly right.