Learn how Mario Barajas and his team, determined to improve operations and processes with data, increased the value of data products and drove data engineering excellence at Vista.
A decentralized data architecture that organizes and manages data around business-specific domains helps Vista scale its data needs. Data mesh makes this distributed data management approach possible, and it is also enabling us to promote a culture of data engineering excellence across our organization.
Unlike traditional, monolithic data infrastructures, data mesh offers a domain-driven, self-serve design. It also simplifies data product design, offering flexible data integration and interoperable functionality that empower every domain across Vista.
Since data product teams within a domain can extract data from the data mesh to create their data products, common query patterns and tables will inevitably emerge. Drawing a parallel with the Pareto Principle for illustration purposes, 80% of the requirements will likely come from 20% of the tables and associated queries.
We leveraged this simple observation by creating a shared data layer of base tables for the data product teams in the Manufacturing, Supply Chain & Product (MSCP) domain, which led to several advantages:
- Simplifying and speeding up data product development: Data product teams no longer need to reinvent the wheel around core, unambiguous data concepts. Instead, the base tables translate into faster development and more accessible data product logic.
- Adding informative value: Every base table enforces primary keys and aims to provide a higher level of data quality that data product teams can rely on. Data integrity and correctness are a priority, and their value is carefully preserved.
- Addressing domain-wide business logic and underlying calculations: Where possible, the MSCP domain specifies key definitions to be shared and formalizes them in the base table layer.
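To make the primary-key guarantee above concrete, here is a minimal sketch of how such a check could work. The `orders` table, its columns, and the `check_primary_key` helper are hypothetical illustrations, not Vista's actual base tables (in a dbt-based stack this would more likely be a `unique`/`not_null` test):

```python
def check_primary_key(rows, key_columns):
    """Return True if the given columns uniquely identify every row."""
    seen = set()
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if key in seen:
            return False  # duplicate key: the base table's contract is broken
        seen.add(key)
    return True

# A toy "orders" base table with order_id as its primary key
# (illustrative data, not Vista's schema).
orders = [
    {"order_id": 1, "plant": "A", "status": "shipped"},
    {"order_id": 2, "plant": "B", "status": "in_production"},
]

assert check_primary_key(orders, ["order_id"])
```

Because every base table enforces a key like this, downstream teams can join and aggregate without first deduplicating the data themselves.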
In the MSCP domain at Vista, core concepts constitute the basic building blocks of most data products. For instance, consider a basic customer order-manufacturing-shipping process flow. In this case, customers order items from the Vista website. The respective plant then fulfils these orders by organizing various items into packages using printed sheets.
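The order-manufacturing-shipping flow above can be sketched as a couple of toy records. All names and the packaging rule here are hypothetical illustrations of the domain concepts, not Vista's actual model:

```python
from dataclasses import dataclass

@dataclass
class Order:
    order_id: int
    items: list          # items the customer ordered on the website

@dataclass
class Package:
    package_id: int
    order_id: int        # the order this package fulfils
    items: list          # items the plant grouped together for production

def fulfil(order: Order, package_size: int = 2) -> list:
    """Split an order's items into packages, as a plant would (toy rule)."""
    packages = []
    for i in range(0, len(order.items), package_size):
        packages.append(Package(package_id=len(packages) + 1,
                                order_id=order.order_id,
                                items=order.items[i:i + package_size]))
    return packages

order = Order(order_id=42, items=["mug", "t-shirt", "business cards"])
packages = fulfil(order)  # 3 items, package_size=2 -> 2 packages
```

Concepts like these (orders, packages, manufacturing events) are exactly what the base tables expose as clean, domain-wide building blocks.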
Regarding MSCP’s master data, the base tables provide clear and intuitive data organization. However, they offer more than that. Data product teams can also access entities, domain information, and events around this master data, such as manufacturing events and domain-specific views of outside processes like orders or complaints.
How do base tables differ from classic data warehouses?
A collection of master and transactional tables is reminiscent of classic concepts such as dimensions, facts, data marts, and data warehouses. However, a dedicated data warehouse for the domain could become a bottleneck, especially if data product teams were forced to build their data products only from it. This optionality is precisely what makes base tables different.
Another key difference is that, unlike classic data warehouse modelling techniques, each base table provides a highly denormalized, independent data set, so users do not need to join facts and dimension artefacts. As a result, query performance improves dramatically for analysts and data product teams compared with querying MIS Data Contracts directly.
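The difference can be illustrated with a small in-memory SQLite sketch. The table and column names are hypothetical, not Vista's actual schema; the point is that the denormalized base table answers the same question as the fact/dimension pair without the consumer having to know or perform the join:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- classic warehouse style: a fact table plus a dimension table
    CREATE TABLE dim_plant  (plant_id INTEGER PRIMARY KEY, plant_name TEXT);
    CREATE TABLE fact_orders (order_id INTEGER PRIMARY KEY, plant_id INTEGER);
    INSERT INTO dim_plant  VALUES (1, 'Plant A'), (2, 'Plant B');
    INSERT INTO fact_orders VALUES (10, 1), (11, 2);

    -- base-table style: one wide, denormalized table
    CREATE TABLE base_orders (order_id INTEGER PRIMARY KEY, plant_name TEXT);
    INSERT INTO base_orders VALUES (10, 'Plant A'), (11, 'Plant B');
""")

# Warehouse style: the consumer must know the model and write the join.
joined = con.execute("""
    SELECT f.order_id, d.plant_name
    FROM fact_orders f JOIN dim_plant d ON f.plant_id = d.plant_id
    ORDER BY f.order_id
""").fetchall()

# Base-table style: the same answer from a single table, no join needed.
flat = con.execute(
    "SELECT order_id, plant_name FROM base_orders ORDER BY order_id"
).fetchall()

assert joined == flat  # identical results, simpler query for the consumer
```

On a real warehouse, skipping joins like this is also what drives the query-performance gains mentioned above.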
While we are actively enriching and improving the base table data layer by adding more information and enhancing its performance, the purpose of the base tables is not to force data product teams to use them. Instead, the base tables serve as an optional set of tables for the teams. Moreover, as part of our data engineering excellence, our goal is to drive base table adoption by making the added value obvious, not through a mandate.
Base tables must also evolve
Like any other data layer, base tables are meant to evolve, and the MSCP domain has already gone through this process. In the previous version, data product teams quickly understood the value of base tables. They drove changes that, unfortunately, turned the base tables into a repository of code for specific data product teams. This led to an ambiguous data model and created dependencies that, over time, impacted performance and reliability.
We learned from these mistakes and developed the latest version of the base tables. Every base table now encapsulates an intuitive and unambiguous domain concept.
In addition, the tables are operationally and architecturally independent. If we make a mistake, it will hopefully surface soon, and as more information and learning unfold, we will correct it in the next iteration. This approach forms a constant learning loop in which we can focus on the quality of the solutions and drive new standards wherever possible.
As the base tables spread throughout the domain, the data product teams’ dependencies on them will inevitably grow, and making core changes that ripple through these dependencies will become challenging. We aim to alleviate this by providing base tables through an independent and intuitive model that makes sense to business users: domain concepts modelled as tables with straightforward primary keys.
Using base tables to drive data engineering excellence
A key challenge for domains in a data mesh architecture is to drive data engineering excellence practices specifically and practically across the data product teams at Vista. These practices cannot remain vague within the domain: the data engineering (DE) teams must define core standards and experiment. The fact that data product teams are empowered adds another challenge when implementing domain-wide DE standards, making cross-team support and knowledge sharing crucial, since different data product teams will manage priorities and deployment velocities differently.
At MSCP, we have used the base tables as a critical platform to drive data engineering best practices. Some of these include:
- Getting into the details: MSCP’s base tables have their own repository, organized according to agreed-upon standards. This code represents an actionable, tangible example for implementation and comparison.
- Covering Vista’s core modern data stack: Vista’s modern data stack keeps expanding and includes GitLab, Snowflake, dbt, Looker, and Airflow. The base tables code covers these technologies too, serving as a detailed implementation template for them.
- Reaching consensus: All DEs in the domain are involved in developing the base tables. When making decisions, the DE team discusses them in the context of a single question: do we think this is the best thing we can do, worth replicating in the data product teams?
- Encouraging conversations: The data model is essential to any analytical solution. Every merge request is an opportunity to promote healthy and educational debates across the domain. We have implemented a code review process that is strictly enforced in the base tables repository, allowing data product teams to build on it by trying extensions or minor adjustments that make sense to them.
The code base of the base tables embodies core standards for the domain, and it also highlights the spaces worth experimenting with at the data product team level. This code base is thus a living organism that helps the DE team achieve DE excellence. It also evolves continuously as the data product teams experiment and move what works, whether technologies or processes, into the base table layer.
Want to help us apply data and analytics to solve more data engineering challenges? Explore our career opportunities in Vista Data & Analytics.
Interested in data engineering? Learn more in previous instalments of this series!