Color Health Elevates Healthcare Analytics with a Documentation-First Approach

This health technology company improved data lineage and boosted team productivity with Coalesce Catalog

Company: Color Health
HQ: Burlingame, CA
Industry: Healthcare
Top Results:
1 full day of work saved every 2 weeks by automating data lineage tracing for table columns
50% of data assets archived after identifying unused or redundant tables, dashboards, and reports
Improved data trust and governance through better visibility and alignment

“Prior to implementing a proper data catalog, establishing data lineage for just one column would take up to a full day.”

Suzette Puente
Senior Analytics Engineering Manager, Color Health

This customer story was contributed by Suzette Puente, Senior Analytics Engineering Manager at Color Health. Color Health revolutionizes healthcare by making preventive care more accessible and streamlined. From cardiovascular health to cancer prevention, Color Health aims to provide at-home tests, vaccinations, and tailored care suggestions based on a patient’s health history and risk factors.

Introduction

At Color Health, data fuels our mission to help everyone live the healthiest life that science and medicine can offer. As the Senior Analytics Engineering Manager, I lead a subset of the data team that focuses on creating intuitive and user-friendly data sources, empowering stakeholders with access to valuable insights for making informed decisions. With our collection of health histories, risk factors, and other essential data points, we transform raw data into something meaningful that improves individualized healthcare and measures the impact of our product on large populations.

Our data team is composed of business analysts, data engineers, and analytics engineers. We’re driven to make healthcare data as accessible and actionable as possible, not just for the data team, but for everyone from product managers to genetics counselors to lab specialists.

We have to be agile as we’re a smaller team in comparison to other departments. Our scrappy operating model allows us to be efficient, yet flexible, in responding to the ever-changing landscape of healthcare data.

Challenges: Shortcomings in data documentation

When I first joined Color Health, our internal data documentation practices were in their infancy. Our spreadsheet version of a data catalog, while well intentioned, had become challenging to maintain. It wasn’t aligned with our data-driven culture, and its effectiveness had room for improvement. As a new data scientist, I had a lot of questions about the data landscape and its dependencies that I couldn’t easily answer with our existing system.

Color was scaling fast, especially during the pandemic, as we spearheaded COVID-19 testing initiatives. The data team was pushing out a wealth of data to meet the company’s need for informed decision-making. The lack of a structured data catalog posed a significant bottleneck. External stakeholders—from product managers to lab specialists—were unable to self-serve with data, even though they were more than capable of doing so. Simple issues, like missing field definitions, became major roadblocks.

From a development perspective, the absence of a catalog was costing us time—a lot of it. Prior to implementing a proper data catalog, establishing data lineage for just one column would take up to a full day. Especially during early phases of new service offerings, it is common for our team to make daily model changes, so tracking lineage was consuming time that could have been better spent on more strategic tasks.

Solution: Choosing the right data catalog

Selecting a data catalog was a decision with far-reaching implications for our data infrastructure. I initiated the process by identifying potential solutions that could integrate with our existing BI tools, like Metabase. Coalesce Catalog (formerly CastorDoc) emerged as a strong contender, aligning well with our tech stack.

We ran brief demos with other tools, but Catalog’s UI was a game-changer. The focus on usability extended beyond the data team to non-technical users within the organization, aligning with our aim to make data accessible and impactful across departments.

Adoption rates serve as a key metric for assessing the value of a new tool. In our case, both high-level and more technical stakeholders outside of the data team have become frequent users of Catalog, including product and program managers who leverage the solution to customize existing reports for their specific needs. To further drive adoption, we introduced a series of “Data Masterclass” tutorials, which led to a measurable increase in usage across different departments.

“The UI in Catalog is so much cleaner and inviting, making it user-friendly not just for my team but also for non-technical folks.” —Suzette Puente, Senior Analytics Engineering Manager, Color Health

Results: Improved productivity, streamlined governance

Integrating Catalog into our infrastructure had an important impact on how we handle data. From an operational standpoint, we’ve seen a drastic reduction in the time spent on lineage mapping. What used to take a day now takes mere minutes, freeing up valuable resources for other high-impact tasks, like generating insights about the patient experience and how our services can be improved.

Warehouse cleanup

Before Catalog, our collection of data assets was much messier. With Catalog, we now have a clear picture of report and dashboard volume, what’s being used, and what’s just sitting there. We decluttered by eliminating redundant and unused data assets, making it easier for our teams to focus on generating actionable insights. It’s not just about lineage; it’s about purposeful data usage. We’ve since archived approximately 50% of our data assets, which has greatly reduced search stress for our stakeholders.

Documentation discipline

Although documentation is often viewed as a tedious task, Catalog has transformed it into a more manageable process. It identifies gaps, prompts updates, and holds us accountable. This resulted in a disciplined approach to data governance that keeps our team moving in the same direction.

Proactive operations

Before, when there were changes to our production models, we’d find out the hard way. Now, with Catalog, we see the ripple effects instantly. This has enabled us to be proactive, fixing downstream issues before they become problems. It’s made a big difference in how we operate and has established data trust within the company.

Naming standards

Consistent naming conventions were a key concern for us at Color because they are the foundation of clear communication across the organization. Catalog makes it easy to search for fields and rigorously enforce naming standards. This level of consistency removes barriers to data usage and ensures that everyone is speaking the same “data language.”

What’s next?

This quarter, we are leveraging Catalog’s knowledge pages more effectively, linking them to other internal resources for a consolidated view. Additionally, we’re exploring how to best document our business metrics within the solution and how we can make Catalog the starting point for information about our data.

“Building lineage for columns used to eat up a full day, every two weeks—that’s a 10% productivity hit. Thanks to Catalog’s automatic tracing, I’ve gained that time back.” —Suzette Puente, Senior Analytics Engineering Manager, Color Health

Note: This customer success story was originally published on November 1, 2023.