Data products should be treated as independent solutions with clear inputs and outputs and an owner, provisioned by a self-service platform.
by Piotr Czarnas
A data product is a logical unit that encapsulates all knowledge about a use case. Within a data mesh, it is one of the building blocks, responsible for a single step in the flow of data. We can have source-oriented data products that expose data from business applications, aggregation-oriented products in the middle of the stack, and consumer-oriented data products that publish transformed and cleansed data to their consumers: business users, data analysts, and data scientists.
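The idea of a data product with an owner and clear inputs and outputs can be sketched as a small descriptor. This is only an illustration under stated assumptions: the class, field, and dataset names below are hypothetical and do not come from any particular data mesh platform.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class ProductKind(Enum):
    """The three kinds of data products described in the text."""
    SOURCE = "source-oriented"
    AGGREGATE = "aggregation-oriented"
    CONSUMER = "consumer-oriented"


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor: a name, an owner, and declared inputs and outputs."""
    name: str
    owner: str
    kind: ProductKind
    inputs: tuple[str, ...] = ()
    outputs: tuple[str, ...] = ()

    def problems(self) -> list[str]:
        """Return the reasons this descriptor is incomplete (empty list if none)."""
        issues = []
        if not self.owner.strip():
            issues.append("missing owner")
        if not self.inputs:
            issues.append("no declared inputs")
        if not self.outputs:
            issues.append("no published outputs")
        return issues


# Illustrative values only; the dataset names are made up.
crm_customers = DataProduct(
    name="crm_customers",
    owner="sales-data-team",
    kind=ProductKind.SOURCE,
    inputs=("crm.accounts",),
    outputs=("customers_raw",),
)
incomplete = DataProduct(name="draft", owner="", kind=ProductKind.CONSUMER)
```

A platform could refuse to publish any product for which `problems()` returns a non-empty list, so a product without an owner or a declared interface never reaches consumers.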
All data products within a data mesh should adhere to federated computational governance, ensuring they meet the same standards. They should also reuse shared infrastructure components. However, this doesn't mean engaging the data operations team for every deployment. Instead, data products can benefit from economies of scale by using a self-service data platform. This platform publishes all the necessary assets within the infrastructure, empowering the data product team to work independently and efficiently.
After the data product is published, it should behave like a product you can find in a shop. When choosing a product, you look at the box, read the list of ingredients, and rely on recommendations you have heard, because you want to buy trustworthy products.
The products you prefer are usually developed with a product thinking mindset: the product owners care about their customers and want to give them the best experience. In the context of data products, that means documentation, an easy-to-access interface, and trustworthy data that is monitored with data quality checks, with the resulting data quality score published as a data trust score.
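As a rough illustration of how check results could roll up into a published trust score, the sketch below assumes the score is simply the percentage of data quality checks that passed. That formula, and the names `CheckResult` and `trust_score`, are assumptions made for this example, not the definition used by any specific tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CheckResult:
    """Outcome of one data quality check (hypothetical structure)."""
    check_name: str
    passed: bool


def trust_score(results: List[CheckResult]) -> Optional[float]:
    """Percentage of passed checks, or None when no checks were run.

    Assumption: every check has equal weight.
    """
    if not results:
        return None
    passed = sum(1 for r in results if r.passed)
    return round(100.0 * passed / len(results), 1)


# Illustrative results for one table; the check names are made up.
results = [
    CheckResult("row_count_above_minimum", True),
    CheckResult("customer_id_not_null", True),
    CheckResult("email_format_valid", False),
    CheckResult("no_duplicate_customer_id", True),
]
score = trust_score(results)  # 3 of 4 checks passed
```

Returning `None` when no checks have run keeps an unmonitored product from being displayed with a misleading score of 0 or 100.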
To achieve that, the data product needs a team that feels empowered to own and manage the solution. The product team must be given ready-made tools so that it does not have to reinvent tools and processes. That is the purpose of the bottom part of the data stack: a self-service data platform that provisions data products and configures all required integration points.
The data product team can then focus on the code that delivers its value, the data, and the interfaces for receiving and publishing transformed data.
If you want to explore more about data quality, visit the DQOps website at https://dqops.com/. It is an open-source data quality platform that fits well into the data product model. DQOps can be fully automated, and the configuration of data quality checks is stored in YAML files, making provisioning much simpler.