The famous "4 types of data analytics" can also be applied to data quality maturity levels.
A few years ago, I was asked to provide a value proposition for implementing data quality. The business sponsor wanted to gauge how complete the vision was by mapping it to the data analytics levels. I just found my notes and turned them into this presentation.
First of all, the steps shown on the stairs reflect what is feasible with the current toolset and can be achieved within a reasonable time and budget. You are welcome to add more ideas in the comments, but this approach was enough to secure a budget for a big data quality initiative.
The key takeaways:
𝗗𝗲𝘀𝗰𝗿𝗶𝗽𝘁𝗶𝘃𝗲 data quality is like basic reporting. After your data quality checks are executed, you can look into the logs to find issues. If you use a data observability platform, you can look at the summary screens that show a list of recent issues and a list of affected data assets.
𝗗𝗶𝗮𝗴𝗻𝗼𝘀𝘁𝗶𝗰 data quality takes the exploration one step further. You will need a data quality data warehouse to store all data quality metrics. Your data consumers (users) will use data quality dashboards to explore the list of affected assets further, analyzing trends and finding similar assets that were also affected.
𝗣𝗿𝗲𝗱𝗶𝗰𝘁𝗶𝘃𝗲 data quality adds a new set of data tools to "see the future": anomaly detection over time series. In fact, many data observability vendors provide both the descriptive and predictive levels, but their diagnostic level lacks the customization needed to create data domain-specific data quality dashboards that allow drilling down into an issue.
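To make "anomaly detection over time series" more concrete, here is a minimal sketch using a trailing-window z-score. It is only one simple technique, not what any particular vendor implements. The function name, the window size, the threshold, and the sample row counts are all assumptions made up for illustration.

```python
from statistics import mean, stdev


def detect_anomalies(values, window=7, threshold=3.0):
    """Return the indexes of readings that deviate strongly from the
    trailing window of previous readings (a simple z-score test)."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: treat any change at all as an anomaly.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical daily row counts for one table; day 8 has a sudden drop.
daily_row_counts = [1000, 1010, 990, 1005, 995, 1002, 998, 1001, 400, 1003]
print(detect_anomalies(daily_row_counts))  # [8]
```

A real platform would likely account for seasonality and exclude past anomalies from the baseline, but the idea is the same: compare each new metric reading with what recent history suggests it should be.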
Finally, 𝗽𝗿𝗲𝘀𝗰𝗿𝗶𝗽𝘁𝗶𝘃𝗲 data quality is about consolidating many repeating and similar data quality issues into data quality incidents. Done well, it limits the number of incidents assigned to data teams.
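As a rough illustration of turning repeating issues into incidents, the sketch below groups issues that share the same table and the same failed check into one incident. The grouping key, the `Issue` fields, and the sample data are assumptions for this example; a real platform may group by other attributes such as severity or data domain.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Issue:
    table: str  # affected data asset (hypothetical field)
    check: str  # name of the failed data quality check (hypothetical field)
    day: str    # date the issue was detected


def group_into_incidents(issues):
    """Group issues sharing the same table and check into one incident."""
    incidents = defaultdict(list)
    for issue in issues:
        incidents[(issue.table, issue.check)].append(issue.day)
    return dict(incidents)


# Hypothetical issues: the same check fails on the same table three days in a row.
issues = [
    Issue("orders", "null_percent", "2024-01-01"),
    Issue("orders", "null_percent", "2024-01-02"),
    Issue("orders", "null_percent", "2024-01-03"),
    Issue("customers", "row_count", "2024-01-02"),
]

incidents = group_into_incidents(issues)
print(f"{len(issues)} issues -> {len(incidents)} incidents")
```

Here four issues collapse into two incidents, so the data team receives two assignments instead of four.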
If you found this post interesting, you can visit my website by clicking the "Visit my website" link below my profile. You will learn about DQOps, an open-source data quality platform that focuses heavily on analytics and exploring the root causes of data quality issues by setting up a data quality data warehouse for every user, even free users.