Challenges: The need for a central data repository
When I joined JW Player in 2021, the company held a summit where all employees involved with data gathered to discuss data issues and ways to enhance our data strategy. During the summit, colleagues highlighted how painful it was that the company had no central repository for documentation.
To address this, we conducted interviews with stakeholders, including customer support, account managers, support engineers, data engineers, product managers, marketing, and anyone who might benefit from a data catalog. One of the most pressing issues that emerged was data discovery, particularly for non-technical people who found it difficult to navigate the data warehouse.
“One of the most pressing issues that emerged was data discovery, particularly for non-technical people who found it difficult to navigate the data warehouse.” —Emily Hopper, Senior Data Scientist, JW Player
As a new employee, I struggled with this problem myself. Even after six weeks of onboarding, I had no way to find new information other than asking colleagues, and different people often gave different answers to the same question.
Given a request like “I need this information split out by account key,” I would have to coordinate with colleagues across several teams to find all the right tables, then join backend-relevant segmentation data to performance monitoring data and to customer-support-relevant segments.
We were primarily looking for lineage tools so that data producers would understand what downstream processes could be impacted by any changes made upstream.
We thought lineage would also be useful for building a single source of truth. People in the organization use different names and terms for the same things, which can be hard to follow. For example, metric X may be known by two different names across two different teams. We thought that if we could easily trace the source of metrics, it would be easier to know when metrics were the same (or different).
We were also looking for a central repository for documentation, because ours was dispersed across various platforms, such as GitHub markdown pages, Google Docs, Confluence, custom knowledge websites, and Slack. With documentation this scattered, it was difficult to find answers when we had questions.