As healthcare companies continue to extend their distributed systems, observability becomes necessary for ensuring the system’s proper performance and remediating problems quickly. However, implementing observability can create additional challenges concerning Protected Health Information (PHI), which may lead to compliance issues for the company.
In this article, we’ll examine some of the challenges of implementing observability in healthcare and offer suggestions for addressing those challenges.
“Wait, what is observability?”
Simply put, observability is using your system’s metrics and logs to evaluate and reason about the health of your system in real time. In this context, ‘the health of your system’ refers to how well the entire system can handle the demand put on it.
Observability differs from application performance monitoring, which is primarily concerned with notifying the operations team when something is wrong. Observability goes further, correlating data about the application, the service requests and responses, the network, and the underlying infrastructure across all distributed components so that the team has a more complete view to help them understand why the issue is occurring.
“Ok, and why is observability crucial in healthcare?”
The large distributed systems maintained by healthcare companies have a pivotal role not just in the functioning of the business but in the health and well-being of the patients and families who rely on them. Let’s face it, if your favorite online retailer or social media platform has an issue and becomes unusable for a while, the company may lose money, but for you, it’s mostly just frustrating and inconvenient.
But suppose your healthcare provider or insurance company has an issue that prevents you or the doctors from logging in, scheduling an appointment, getting lab results, etc. In that case, it can have a tangible impact on you.
People rely heavily on these systems. When they aren’t functioning correctly, it can affect people’s lives, well-being, and even security.
“Got it! So why can observability be challenging in healthcare?”
In simple terms, an observability platform gathers and exposes as much data as possible to facilitate reasoning about the system’s performance. Still, HIPAA regulations dictate that certain pieces of information be protected from exposure, and failing to protect this information adequately may result in significant liability for the company.
Various factors contribute to the challenge of protecting this information. We’ll briefly examine how four of them converge to exacerbate the problem.
Greedy data ingestion
Observability platforms tend to be greedy, pulling in as much data and metadata as possible to facilitate remediation. This is good because it allows us to understand the context around performance issues. The ingested data comes from the traces, logs, metrics, and events of several sources: front-end applications, back-end applications, infrastructure resources, and the network itself. Metrics from these sources generally aren’t an issue because they’re numeric aggregates (e.g., CPU%). However, logs, traces, and events are essentially text-based records and may contain sensitive information.
PHI leakage
HIPAA defines 18 types of personal identifiers. When combined with health information, they constitute PHI and must be kept confidential. However, many of these identifiers are so common in application development (e.g., name, email, IP address) that they are often included in system telemetry, either directly by logging the information or indirectly as system parameters and metadata. Because they are so common, this data is typically included without a second thought and can easily slip through code reviews.
Communication gaps
Distributed systems comprise several applications typically developed by several teams across the organization. In a sprawling delivery organization, continually communicating data governance policy and enforcing that policy can be challenging, especially if the organization focuses on feature delivery. Even with the best intentions, communication gaps can occur, resulting in PHI leakage.
Limited operations and platform engineering personnel
Operations and platform engineering teams are typically the ones who would use an observability platform and, in the case of PHI exposure, would be the ones to raise a red flag. But with so many teams writing and deploying code and so few operations personnel, protected data can easily slip through and be exposed.
Remember the I Love Lucy episode where Lucy and Ethel work the conveyor belt in the chocolate factory? I’m dating myself; I know. But if you’ve seen it, you get the picture.
“We’ll just get a BAA. Then we don’t have to worry about it.”
A Business Associate Agreement (BAA) is a contract between a HIPAA-covered entity like a healthcare company and one of its partners. It states that the partner company will maintain PHI security and overall HIPAA compliance. This is required by HIPAA for any partner company that handles PHI, enabling the healthcare company to send PHI to the partner company without taking on additional liability.
I always recommend partnering with an observability platform provider that will sign a BAA, but simply having the BAA in place may not be sufficient. Depending on your company’s policies and personnel, it may be necessary to closely govern access to that data and data transfer to and from the platform.
“So, how can we address these challenges?”
Having implemented observability for a large healthcare provider, I suggest implementing a ‘zero-PHI’ policy for your observability system, even if you have a BAA with a platform provider.
Observability is primarily about maintaining or improving the performance of a system overall and addressing issues at scale quickly and efficiently. PHI doesn’t help accomplish this.
I have heard objections like, “We need to have the user’s email address in these logs for troubleshooting.” This shows a misunderstanding of the scope of what observability is trying to address. Observability isn’t concerned with why a particular user can’t log in but why 35% of all logins failed between 7-8 pm, for example.
Determining why a particular user can’t log in is a tier 1 support issue where knowing the email address may help the support team resolve the customer’s problem. Explaining why 35% of all logins failed, by contrast, is a use case for observability, and knowing the email addresses behind the failed attempts does not help us understand or address the issue.
Or consider the difference between determining why Mrs. Jones encountered an error while trying to schedule an appointment vs. discovering that 90% of appointment scheduling attempts fail when the system receives more than XX requests per minute and being able to determine quickly why that’s happening.
This is what we mean when we say observability enables us to address performance issues “at scale.” In each comparison, the latter scenario is an example of where observability becomes valuable to companies with large distributed systems, and neither of these is helped by the presence of PHI in the system.
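To make the “at scale” point concrete, here is a minimal sketch of the kind of question observability answers. The event shape (hour, endpoint, success) and the function name are assumptions made up for this illustration, not any platform’s format. Notice that no user identity is needed to arrive at the failure rate.

```python
from collections import defaultdict


def failure_rate_by_hour(events):
    """Return {hour: share of failed requests} from identity-free events."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for event in events:
        totals[event["hour"]] += 1
        if not event["success"]:
            failures[event["hour"]] += 1
    return {hour: failures[hour] / totals[hour] for hour in totals}


# Hypothetical sample: 20 login attempts in the 7 pm hour, 7 of them failed,
# followed by 10 successful attempts in the 8 pm hour.
events = [{"hour": 19, "endpoint": "/login", "success": i >= 7} for i in range(20)]
events += [{"hour": 20, "endpoint": "/login", "success": True} for _ in range(10)]

rates = failure_rate_by_hour(events)
```

The events carry only what is needed to reason about the system as a whole. An email address in each event would add compliance risk without changing the result.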
A better solution is to implement a zero-PHI policy for your observability platform. For those tier 1 issues where PHI is helpful, write those errors and logs to a separate system instead of mixing them with observability data. This may seem like a lot, but exposing protected information in a way that leads to a HIPAA compliance incident can be a huge liability for a healthcare organization. I advise conducting a risk assessment to determine whether this approach is right for your company.
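One way to keep the two kinds of logs apart is to give them separate loggers with separate destinations. The sketch below uses Python’s standard logging module. The logger names, the record_login_failure helper, and the in-memory streams are all hypothetical stand-ins; in a real system the first destination would be your observability pipeline and the second an access-controlled support log store.

```python
import io
import logging

# In-memory stand-ins for the two destinations (assumptions for illustration).
telemetry_stream = io.StringIO()  # would feed the observability platform
support_stream = io.StringIO()    # would feed a restricted tier 1 support store

telemetry_log = logging.getLogger("telemetry")
support_log = logging.getLogger("support.phi")

for logger, stream in ((telemetry_log, telemetry_stream),
                       (support_log, support_stream)):
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records away from shared root handlers
    logger.addHandler(logging.StreamHandler(stream))


def record_login_failure(email: str, reason: str, request_id: str) -> None:
    """Log one failed login twice: identity goes only to the support log."""
    # Observability: what failed and why, with no identity attached.
    telemetry_log.info("login_failed reason=%s request_id=%s", reason, request_id)
    # Tier 1 support: identity included, joinable to telemetry by request_id.
    support_log.info("login_failed email=%s request_id=%s", email, request_id)


record_login_failure("pat@example.com", "bad_password", "req-123")
```

The shared request ID is a design choice worth considering: it lets the support team find the matching telemetry for a single customer’s problem without the email address ever reaching the observability platform.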
“What else should we be aware of?”
Here are a few more things to consider.
Access
If your observability platform contains PHI, HIPAA and internal policies may dictate who may access that data. You’ll need to consider how to structure the data in the platform to ensure unauthorized personnel do not gain access to it.
I also strongly suggest de-identifying the data before it is ingested into the platform (see ‘Centralized Ingestion’ below). This is especially important if broad access to the data has been granted.
Service design
There’s often a legitimate reason for a business to view performance data alongside user analytics, and observability platforms continue to roll out features that allow companies to do this. Storing this data for a retail business may not be an issue, but it can cause compliance issues in a healthcare scenario.
For instance, a username used as a URL path parameter in an API may be considered PHI when saved as metadata with the search term ‘breast cancer’ from a query string.
You’ll want to pay attention to how your services are designed. Watch for any of the PHI data points being used as URL path parameters or query string parameters, or being included in HTTP headers. These fields are visible to an observability platform and will be exposed as part of the service request details.
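Where a service can’t be redesigned right away, request details can be scrubbed before they are recorded. The following is a rough sketch using only Python’s standard library. The parameter names, header names, and path prefixes are assumptions chosen for this example, so you would replace them with the ones your own services use.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical names for illustration; adjust to your own services.
SENSITIVE_QUERY_KEYS = {"q", "search", "email", "name", "dob"}
SENSITIVE_HEADERS = {"authorization", "cookie", "x-user-email"}
# Path segments whose following segment identifies a person.
IDENTIFIER_PARENTS = {"users", "patients"}


def scrub_url(url: str) -> str:
    """Replace identifying path segments and sensitive query values."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    for i in range(1, len(segments)):
        if segments[i - 1] in IDENTIFIER_PARENTS and segments[i]:
            segments[i] = "{id}"
    query = [
        (key, "REDACTED" if key.lower() in SENSITIVE_QUERY_KEYS else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, "/".join(segments),
         urlencode(query), parts.fragment)
    )


def scrub_headers(headers: dict) -> dict:
    """Mask the values of headers that commonly carry identity."""
    return {
        name: "REDACTED" if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }


clean_url = scrub_url(
    "https://api.example.com/users/jsmith/search?q=breast+cancer&page=2"
)
clean_headers = scrub_headers({"X-User-Email": "j@example.com", "Accept": "*/*"})
```

Replacing the username with a placeholder has a side benefit: requests group under one route in the platform, which is usually what you want for performance analysis anyway.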
Centralized Ingestion
Log entries and log files are places where user data is frequently found. Developers naturally write error messages containing user information to log files, which then become a primary data source for observability along with metrics, traces, and events.
Consider a central ingestion point within the control of your own company where business rules can be run against logs, traces, and events to de-identify the data before sending it to your observability platform. A centralized OpenTelemetry collector could be one option.
Since this ingestion point handles sensitive data, there is a risk of compromise or failure, so ensuring the security and resilience of this endpoint is essential.
Also, for those systems where you are intentionally storing PHI, make sure the storage is scalable, and if there are max limits on the available space, implement alerts that will notify you before reaching those limits.
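To give a feel for the de-identification rules a central ingestion point might run, here is a small sketch in plain Python. It is not OpenTelemetry collector configuration, and the record shape, attribute names, and patterns are assumptions for illustration. The patterns cover only a few of the 18 identifiers and will miss things like names, so treat them as a starting point rather than a complete rule set.

```python
import re

# Illustrative patterns only; order matters because each runs on the
# output of the previous one.
PATTERNS = [
    ("EMAIL", re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
    ("SSN", re.compile(r"\b\d{3}-\d{2}-\d{4}\b")),
    ("IPV4", re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
    ("PHONE", re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b")),
]
# Hypothetical attribute names that are dropped outright.
DROP_ATTRIBUTES = {"user.email", "user.name", "patient.mrn"}


def redact_text(text: str) -> str:
    """Replace every pattern match with a bracketed label."""
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


def deidentify(record: dict) -> dict:
    """Return a copy of a log record that is safer to forward."""
    attributes = {
        key: redact_text(value) if isinstance(value, str) else value
        for key, value in record.get("attributes", {}).items()
        if key not in DROP_ATTRIBUTES
    }
    return {
        **record,
        "message": redact_text(record.get("message", "")),
        "attributes": attributes,
    }


cleaned = deidentify({
    "message": "Login failed for jane.doe@example.com from 10.1.2.3",
    "attributes": {
        "user.email": "jane.doe@example.com",
        "http.status_code": 401,
        "client.note": "call 555-123-4567",
    },
})
```

Dropping known attributes by name and pattern-matching free text are complementary: the first is precise but only catches what you already know about, while the second is broader but can both miss data and over-redact.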
Platform output
Observability platforms don’t just ingest data. A full-featured platform will include mechanisms for raising alerts and sending notifications when the system’s performance degrades. The data surfaced in the alerts and notifications is often customizable, but if your system is storing PHI, the default settings may render that sensitive data into the body of the notification.
Also, if PHI is included in the notification and you’re integrating with a third-party platform for notifications (e.g., PagerDuty or Slack), do you have a BAA with that third party? And what about the notification recipients: are they authorized to view PHI?
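If you build or customize notification payloads yourself, an allowlist is a simple guard: only fields you have explicitly approved ever leave the platform. The field names and the build_notification helper below are hypothetical, and the snippet doesn’t call any real PagerDuty or Slack API. It only shows how the payload might be filtered before being handed to one.

```python
# Hypothetical field names; only these are ever copied into a notification.
ALLOWED_ALERT_FIELDS = (
    "service", "environment", "severity", "error_rate", "window", "runbook_url",
)


def build_notification(alert: dict) -> dict:
    """Copy only approved fields, so new or unexpected fields are excluded."""
    return {
        field: alert[field] for field in ALLOWED_ALERT_FIELDS if field in alert
    }


payload = build_notification({
    "service": "scheduling-api",
    "severity": "critical",
    "error_rate": 0.9,
    "window": "5m",
    # A raw log sample is the kind of field that can carry PHI by default.
    "sample_log": "Booking failed for jane.doe@example.com",
})
```

An allowlist fails safe. If someone adds a new field to the alert later, it stays out of notifications until it has been reviewed, whereas a blocklist would let it through.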
Monitor, prevent, remove
Perhaps you’re already using an observability platform and want to prevent PHI ingestion or get it under control. You may be able to use the platform’s alert features to identify when and where PHI enters the system and notify you immediately so that you can take action. And if the platform includes pre-ingestion rules for dropping data, you may be able to prevent the ingestion there.
However, if preventing ingestion is your goal, I wouldn’t recommend these data-dropping features as your primary tool. Use them as a safeguard while you work toward ensuring the data isn’t sent to the platform to begin with.
Finally, monitors can still miss data being ingested, so I suggest auditing your data regularly. If you find PHI already in the system, your observability platform should allow you to either delete the data or reconfigure data retention periods so that it can be removed as soon as possible.
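A regular audit can be as simple as running detectors over a sample of exported records and counting hits by source, which tells you which team to talk to. This is a sketch under assumptions: the record shape (service and message keys) is invented for the example, how you export records depends on your platform, and the two detectors are far from exhaustive.

```python
import re
from collections import Counter

# Illustrative detectors; extend them to cover the identifiers you care about.
DETECTORS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def audit(records) -> Counter:
    """Count records containing each kind of identifier, keyed by service."""
    findings = Counter()
    for record in records:
        message = record.get("message", "")
        service = record.get("service", "unknown")
        for kind, pattern in DETECTORS.items():
            if pattern.search(message):
                findings[(service, kind)] += 1
    return findings


# Hypothetical exported sample.
findings = audit([
    {"service": "auth", "message": "reset sent to pat@example.com"},
    {"service": "auth", "message": "token refreshed"},
    {"service": "billing", "message": "verify 123-45-6789 failed"},
    {"service": "auth", "message": "locked out kim@example.org"},
])
```

The report names only the service and the kind of identifier found, never the matched value, so the audit output itself doesn’t become another place where PHI leaks.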