As a self-proclaimed numbers nerd, there is not much I like more than doing secondary analysis on data. Once you have identified broad patterns in your data, like the overall scores are going up, or the doctors’ scores are going down, the next logical question is to determine the reason why the data is moving in that way. For me, this is where the fun begins. It is also where the challenge begins. I have promised (or threatened, depending on your point of view) to talk about some interesting avenues for secondary analysis, but it occurred to me that it would be valuable to discuss the key elements for a successful analytics effort. Or, in the language of the cooking shows my wife and I love, we should start with discussing the mise en place. So, in order to execute successful secondary analysis, you need to have a few things, some more obvious than others.1
Data
Some might think that this is too obvious to mention, but not all data is constructed to answer all questions. Even if this seems obvious to you, it is often not obvious to those who ask for any analysis. I have lost count of the times that someone said, “Well, can’t you just get that from the EHR?” as if it is an oracle or magic eight-ball that will magically spit out an answer instead of a dataset with a complex architecture that does not necessarily lend itself to secondary analysis outside of what it thinks is interesting or important.
There are several important issues with data (its structure, its variables, its code dictionary), but the most basic issue is the unit of analysis for the data. This is the focus of the data set. If you want to understand what patients think, then you need to look at patient data. If you want to understand what patients in a med/surg unit think, then you need both patient and discharge unit data. If you want to understand what patients discharged from med/surg on the weekend think, then you need patient, unit, and day of discharge. So, if I told you that General Hospital’s score for the year was 78% and asked you how important the med/surg discharges were to that score, you couldn’t answer that question without being able to break the data out by unit and, ideally, also knowing the number of responses. This may seem obvious, and I may seem needlessly pedantic, but the biggest challenge a data analyst has is getting requests for analysis from people who think data-is-data-is-data. You cannot manage expectations or collaborate on useful questions if you forget the limitations of the data you have.
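To make the General Hospital example concrete, here is a minimal sketch of how a unit-level breakout might look. Everything in it is hypothetical: the unit names, the response counts, and the counts of favorable ("top-box") answers are invented so that the overall score comes out to 78%. The point is only that you need both the unit and the number of responses before you can say how much one unit matters.

```python
# Hypothetical counts by discharge unit. "top_box" is the number of
# favorable responses. These figures are invented for illustration.
units = {
    "med/surg": {"responses": 400, "top_box": 296},
    "ICU": {"responses": 100, "top_box": 85},
    "maternity": {"responses": 250, "top_box": 210},
    "ortho": {"responses": 250, "top_box": 189},
}


def score(top_box, responses):
    """Percent of responses that were favorable."""
    return 100.0 * top_box / responses


def breakdown(units):
    """Return the overall score plus, for each unit, its own score,
    its share of all responses, and the overall score without it."""
    total_n = sum(u["responses"] for u in units.values())
    total_top = sum(u["top_box"] for u in units.values())
    rows = {}
    for name, u in units.items():
        rest_n = total_n - u["responses"]
        rest_top = total_top - u["top_box"]
        rows[name] = {
            "unit_score": score(u["top_box"], u["responses"]),
            "share_of_responses": 100.0 * u["responses"] / total_n,
            # Undefined if this unit supplied every response.
            "score_without_unit": score(rest_top, rest_n) if rest_n else None,
        }
    return score(total_top, total_n), rows


overall, rows = breakdown(units)
print(f"overall: {overall:.1f}%")
for name, r in rows.items():
    print(
        f"{name}: score {r['unit_score']:.1f}%, "
        f"{r['share_of_responses']:.0f}% of responses, "
        f"overall without it {r['score_without_unit']:.1f}%"
    )
```

With these invented numbers, med/surg supplies 40% of the responses and scores below the hospital as a whole, so removing it lifts the overall score. With only the single 78% figure in hand, none of that is visible.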
A lot of secondary analysis looks at a comparison of two datasets, or two datasets linked together, or one dataset against some external milestones. Therefore, another thing to remember with the data is the timeframe of its collection. For example, one might want to know if the act of a clinician relicensing or taking any continuing education units (CEUs) has any impact on clinical, quality, or even patient experience measures. To analyze that, one needs to make sure the timelines match up. This is one challenge if, as in Wisconsin, those relicensing windows are set in stone, and a different challenge if that window is staggered according to other factors. This is compounded by the fact that real-time data isn’t usually real-time and the most recent data-drop may not be particularly recent. I spoke about this in one of the essays on pillars and pillar data, but it bears repeating. For example, as of this writing, the most recent HCAHPS data from CMS available for download is for calendar year 2024.
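As a rough sketch of what "making the timelines match up" can mean in practice, the snippet below labels each survey as falling before or after a clinician's relicensing date, and flags clinicians whose relicensing date falls after the last date the data covers. The clinician IDs, dates, field names, and the `data_through` cutoff are all hypothetical.

```python
from datetime import date

# Hypothetical relicensing dates per clinician.
relicensed_on = {
    "dr_a": date(2024, 3, 1),
    "dr_b": date(2024, 9, 1),
}

# Hypothetical survey records tied to a clinician and a visit date.
surveys = [
    {"clinician": "dr_a", "visit_date": date(2024, 1, 15), "top_box": 0},
    {"clinician": "dr_a", "visit_date": date(2024, 4, 10), "top_box": 1},
    {"clinician": "dr_b", "visit_date": date(2024, 5, 20), "top_box": 1},
]

# Last date covered by the most recent data drop (an assumption).
data_through = date(2024, 6, 30)


def label_period(surveys, relicensed_on, data_through):
    """Tag each survey as 'before' or 'after' the clinician's relicensing.

    Surveys for clinicians with no known date, or whose date is later
    than the data coverage, are tagged 'event_outside_data' because no
    before/after comparison is possible for them yet.
    """
    labeled = []
    for s in surveys:
        event = relicensed_on.get(s["clinician"])
        if event is None or event > data_through:
            period = "event_outside_data"
        elif s["visit_date"] < event:
            period = "before"
        else:
            period = "after"
        labeled.append({**s, "period": period})
    return labeled


labeled = label_period(surveys, relicensed_on, data_through)
for row in labeled:
    print(row["clinician"], row["visit_date"], row["period"])
```

A staggered window simply means each clinician gets their own date in the lookup. A fixed window means everyone shares one date, which makes it harder to separate the effect of relicensing from anything else that happened at the same time.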
Data engineer
This is one of the terms tossed around in analytics that can have fuzzy definitions, so, for me, a data engineer is the person who can assemble the dataset for analysis. They are important, especially if you want to link multiple datasets together, recode variables to meet your needs, or create new variables. Often the questions you want to answer involve elements that are not in your main dataset. For example, I have been asked to break out data by:
- Whether the patient was in the ICU at some point during their stay
- Whether the patient was readmitted within 30 days (to see if the HCAHPS data can predict readmission)
- Whether the patient received care from any of the hospitalists during their stay
All these questions have value but also involve variables that are not likely contained within the patient survey data. A data engineer, then, is responsible for connecting data from multiple sources to answer these questions. Even now, many organizations don’t have anyone on staff who can do this. Even if you have an analytics team staffed with people who have the skills, the data in hospitals can be so compartmentalized that they may not have access to all the datasets they need. This is not an insult to IT departments keeping a tight rein on things; HIPAA and hackers require IT departments to scrutinize data safety intensely. I usually do most of this myself, not because I am awesome, but because most organizations do not have these folks around and it is easier to submit a data request and construct my own database than it is to train someone else (who likely does not have the time to learn anyway).
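For readers who have never seen what this linking looks like, here is a minimal sketch. It assumes the survey file and an encounter extract share a key, called `encounter_id` here. That key, the field names, and the records are all hypothetical, and real linking usually involves messier matching than this.

```python
# Hypothetical survey responses, one row per returned survey.
surveys = [
    {"encounter_id": "E001", "nurse_top_box": 1},
    {"encounter_id": "E002", "nurse_top_box": 0},
    {"encounter_id": "E003", "nurse_top_box": 1},
]

# Hypothetical encounter extract carrying variables the survey lacks.
encounters = [
    {"encounter_id": "E001", "had_icu_stay": True, "saw_hospitalist": True},
    {"encounter_id": "E002", "had_icu_stay": False, "saw_hospitalist": True},
]


def link(surveys, encounters, key="encounter_id"):
    """Attach encounter variables to each survey row that has a match.

    Returns the linked rows and the keys of surveys with no matching
    encounter, so the unmatched ones can be investigated rather than
    silently dropped.
    """
    by_key = {row[key]: row for row in encounters}
    linked, unmatched = [], []
    for s in surveys:
        match = by_key.get(s[key])
        if match is None:
            unmatched.append(s[key])
            continue
        linked.append({**match, **s})
    return linked, unmatched


linked, unmatched = link(surveys, encounters)
print(f"{len(linked)} surveys linked, unmatched: {unmatched}")
```

Keeping the list of unmatched surveys is a deliberate choice. If a large share of surveys fail to link, any breakout by ICU stay or hospitalist is describing only the patients who happened to match.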
Creative data analyst
I mean no disrespect to my brothers and sisters who work as data analysts, but not all data analysts are created equal. All analysts can answer a direct question they are given. Not all function when they are given a broad question to answer. This is the difference between:
- Tell me if patients over the age of 65 give statistically significantly different answers to the nursing questions than those under the age of 65.
- Explain why some patients are happier with our nurses than others.
The first question is straightforward and, depending on the state of the data, easily answered. The second question is a fishing expedition. What the analyst will find depends on what they are looking for. They may look at patient demographics, or discharge units, or weekday vs. weekend patients, or employed nurses vs. staffing agency nurses, hospitalists vs. other doctors, etc. They were charged with a broad task, and the only limits are their own ability to conceive of a possible explanation and the limit of the data to answer that question. Part of this exploration is being able to prioritize some possibilities over others. Part is being able to look at early results and determine if the differences observed are only noise, or if there is some signal in the mix as well. Most of this comes from experience, past explorations, or who is asking for the analysis.2 This is not a dig at those who are green or who have never been given the opportunity to spread their wings. After all, most of what I have learned is by simply asking questions, building datasets, and answering those questions. Using statistical tools can be taught but knowing how they can be used is a learning process.
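The first, direct question can be sketched in a few lines. The example below uses a pooled two-proportion z-test, which is one reasonable choice among several rather than the only correct one, to compare the share of favorable ("top-box") nursing answers between two age groups. The counts are invented, and the test treats the age split simply as "65 and over" versus "under 65".

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two proportions.

    x1, x2 are counts of favorable answers; n1, n2 are group sizes.
    Uses the pooled proportion for the standard error. Not meaningful
    when the pooled proportion is exactly 0 or 1.
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical counts: favorable nursing answers out of surveys returned.
older_top, older_n = 330, 400      # patients 65 and over
younger_top, younger_n = 290, 400  # patients under 65

z, p_value = two_proportion_z_test(older_top, older_n, younger_top, younger_n)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

The second, broader question has no equivalent one-function answer. It amounts to deciding which of many such comparisons are worth running and then judging which differences are signal and which are noise.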
Time
With everything said above, time may seem an obvious need. Even so, it will generally take you more time than you think. I estimate that 80% of the time I spend doing secondary analysis is in preparing the dataset. Whether it is adding additional data from another source, or recoding variables to match your needs, or even creating new variables based upon other variables, prepping and validating the data takes a lot of time, and this doesn’t even cover the write-up at the end to explain the analysis to your audience. (Please see earlier rants on ‘the data speaks for itself.’) Oddly, time will usually be less of a problem on big projects, because you (and your boss) will know that this is a heavy lift. The problem is the little things that your boss will IM you about as they are going into a meeting, the things that you think “will just take ten minutes” until you look at the clock and see you have been at it for an hour. This is where having the reputation for being a data-whisperer can bite you, since your boss is likely putting “?” in the chat window every five minutes. Or maybe that is just me.
Open-minded leadership
All analysis can be interesting to a numbers nerd, but it only has value if it resonates with leadership. To that end, you need to always keep in mind your audience as you do this work. The obvious concern is stats literacy, but there are other important elements in evaluating an audience.
Everyone wants easy analysis that ends with a clear smoking gun. The real world, though, is not that helpful. So, one thing to evaluate is your leadership’s appetite for analysis that is informative, but not obvious, otherwise known as the grey area in analysis. One time, I identified an element that had an important impact on how a minority of patients felt about their primary care clinic. I will save the specifics of this research for another time, but essentially, the presentation initiated a battle between those who saw the IMPORTANT IMPACT and those who saw MINORITY OF PATIENTS. This generated a lot more heat than light and it was my fault for not properly preparing for what I thought would be a no-brainer improvement focus.
Further, everyone has their own set of previously established beliefs. Some accurately reflect reality. Some may have been true at one point but are no longer true. Others have never been true. They all can get in the way of any conversation about analysis. There is no analysis that won’t challenge someone’s sacred cow or pet project, so there is always someone in the room who may feel challenged by what you reveal. An open-minded leadership will remove the blinders, though, and consider what the data tells them and incorporate it into their world view. I was working with a large hospital on their sepsis mortality.3 Their clinical leadership was adamant that their problem with sepsis mortality was due to other hospitals shipping them sepsis patients who were ‘circling the drain’ so that they would die on this hospital’s watch. I pulled the data and showed that they had great mortality numbers for sepsis patients who were transferred in. Their problem was with the sepsis patients who came through their own emergency department. Months later, I still heard that debunked claim being cited for poor performance. It is hard to improve if you are working on the wrong things.
Statistics package
One might think that this entry would be higher on the list. Instead, I was tempted to not even add it. I am only adding it because otherwise some might think it was an oversight. I certainly am a fan of SPSS (formally known as IBM SPSS Statistics since IBM bought them in 2009, but they will always be plain-old SPSS to me). I use MS Access for most of my data engineering needs. But a lot of exploration can happen in Excel or even on a vendor’s portal. In fact, I rarely need all of the bells and whistles that SPSS provides. Too often, people get enamored with packages with big price tags. Your best move, though, is to demonstrate proof of concept with the Microsoft suite of software; then, having demonstrated value, you will find it easier to get leadership to spend money on a package. It is definitely easier than trying to squeeze funding out based simply on a promise of future value.
With all of these things in place, you can explore all sorts of interesting questions. As with everything, moderation is the key, but some good secondary analysis can help inspire people to work smarter or help explain that behaviors do matter. It can identify the real opportunities and create more targeted and precise action plans. It cannot answer all questions, but it can reveal to leadership that they are currently only scratching the surface with all the data they have at their disposal.
1I acknowledge at the outset that I am opening myself up to slings and arrows from both the MEGOs who will find this overly detailed and complex as well as the statisticians who will say, “yeah, that is not exactly right…” For now, just know that I am aiming for the middle ground. If I miss, well, I encourage you to lodge your complaints in the comments section.
2A Chief Nursing Officer, for example, may be more interested in elements that have a connection to the nursing staff. This doesn’t mean that this is only where you look, but it probably should be where you start looking.
3I feel like I may have told this story before but am too lazy to check. My apologies if you already know how the story turns out.