When a data platform moves from vision through development and consumption into maintenance, the scope of data quality changes.
Discussions about data quality usually involve three groups of people: data engineering, data governance, and data operations. However, their expectations and use cases differ.
It is not that there are three types of data quality. Well, there are, but that is not the point. If your data platform will serve users for years, it will pass through different lifecycle (maturity) stages.
At first, the data owners will be the only group that understands their data sources. Data quality in the discovery phase can help profile and validate these data sources, so that the effort required for ingestion and data cleaning is estimated correctly.
Once the development of the platform begins, data engineering will be the most engaged team, and they will prefer to integrate data quality into the data pipelines.
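To make "data quality in the pipeline" concrete, here is a minimal sketch of a quality gate that a pipeline step could call before loading a batch. It is only an illustration: the names `CheckResult`, `check_not_null`, `check_unique`, and `run_quality_gate`, as well as the `id` and `email` columns, are hypothetical and not taken from any particular tool.

```python
from dataclasses import dataclass


@dataclass
class CheckResult:
    name: str
    passed: bool
    detail: str


def check_not_null(rows, column, max_null_ratio=0.0):
    """Pass when the share of missing values in `column` is within the limit."""
    nulls = sum(1 for row in rows if row.get(column) is None)
    ratio = nulls / len(rows) if rows else 0.0
    return CheckResult(
        name=f"not_null({column})",
        passed=ratio <= max_null_ratio,
        detail=f"{nulls}/{len(rows)} null",
    )


def check_unique(rows, column):
    """Pass when the non-null values in `column` contain no duplicates."""
    values = [row[column] for row in rows if row.get(column) is not None]
    duplicates = len(values) - len(set(values))
    return CheckResult(
        name=f"unique({column})",
        passed=duplicates == 0,
        detail=f"{duplicates} duplicate value(s)",
    )


def run_quality_gate(rows, checks):
    """Run every check; raise ValueError so the pipeline step fails on bad data."""
    results = [check(rows) for check in checks]
    failed = [result for result in results if not result.passed]
    if failed:
        summary = "; ".join(f"{r.name}: {r.detail}" for r in failed)
        raise ValueError(f"Data quality gate failed: {summary}")
    return results


# Hypothetical checks for a batch with `id` and `email` columns.
CHECKS = [
    lambda rows: check_not_null(rows, "email"),
    lambda rows: check_unique(rows, "id"),
]

if __name__ == "__main__":
    batch = [{"id": 1, "email": "a@example.com"}, {"id": 2, "email": "b@example.com"}]
    for result in run_quality_gate(batch, CHECKS):
        print(result.name, "passed" if result.passed else "failed", result.detail)
```

Raising an exception is one possible design choice: it stops the load when a check fails, which suits engineers who own the pipeline. Later lifecycle stages may prefer to record the results and alert instead of blocking.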
Next, the platform is handed over to the consumers: data analysts and data scientists. They want to analyze the quality of the cleansed data they consume, and they need a simple way to connect and run a few data quality checks.
Now, the real testing starts. Data stewards and data quality engineers take over and want a no-code tool to validate the data from the business perspective.
Finally, when the platform matures and transitions to regular usage, the data operations and support teams will take over. They will need even simpler tools to configure data observability and the incident management workflow.
My hint: do not view the current requirements from the perspective of your role. Instead, consider the whole lifecycle and choose a data quality process that will work long term.
The infographic shows the engagement level of each role during the data platform lifecycle and which steps require managing data quality.