
How To Ensure the Data Mesh Doesn’t Create A Data Mess?


Many new data concepts have emerged in the last few years, such as Data Mesh and Data Fabric (the subject of a future post), that seek to solve a common problem: data needs to be distributed across the entire organization, and users want to access it faster. The idea that we need a more integrated and distributed data environment is well accepted in data analytics circles. One of the main reasons is to create more citizen data scientists and power users in the business lines who can potentially generate their own insights.

Wikipedia, which for Data Mesh is as good a source as any, defines it as a sociotechnical approach to building a decentralized data architecture by leveraging a domain-oriented, self-serve design. With data mesh, the responsibility for analytical data is shifted from the central data team to the domain teams, supported by a data platform team that provides a domain-agnostic data platform. Other definitions from around the web include this one: the Data Mesh is a highly decentralized, self-service architecture in which business lines across the enterprise control their data sets.

The promise of distributing data to drive insight presumes that the data is of good quality and that business domains have the readiness, maturity, and skills to harness it and to “self-serve” insights that drive business impact. Most CDAOs, CAOs, and CDOs embrace the vision of distributing data and insights to increase business impact. In fact, most of us advocate centralizing first, to stabilize and quality-assure the data and to build platforms with gold-standard data, and only then moving to a hybrid model in which the platforms are well maintained centrally but access and teams are decentralized and linked to the business lines. This is and has always been the goal, but whether a conceptual architecture and theory such as Data Mesh can get us there needs to be heavily debated. This is exciting and, in the spirit of transformation, warrants further study and research with CDAOs.

One observation is that Data Mesh appears to have been put forward as a conceptual or theoretical idea without being well defined and without its strengths and weaknesses being pointed out. Technology circles have a history of failed adoptions, of CRM platforms and more, so as we journey into this, we don’t want to fall into “build it, and they will come” or go from Data Mesh to Data Mess. Ok, so let’s define the Data Mesh from what is known so far. Hopefully, we can debunk some of the nebulous shiny-object syndrome surrounding the Mesh, so that we can go forward with our eyes wide open, asking good questions and adopting the best parts of the Mesh wherever possible. First and foremost, and I will say this throughout this piece, someone will need to put forward a tested, commercially viable Data Mesh solution, which does not exist to date. Well, that’s the spirit of test and learn, I suppose. Ok, so here we go. Are you ready to fasten your seat belts? If I could bring back Janice (the “OMG” lady) from the series Friends right now, I would.

The Data Mesh is a theoretical concept or construct that says the following:

Data Mesh is a philosophy or a theory to drive architectures. I have not yet seen how this architecture manifests in a transparent way.

Data is a strategic asset. Ok, no issue with that premise.

There is no technological solution prescribed for the Data Mesh as of yet. This could be problematic as Data Mesh is not a tested construct, especially across industries.

Data can be self-describing. The idea that data can be discovered and understood in the product sense can be problematic in some industries, as it presumes that the business users know and understand the data and can back up the data engineers and analysts in a centralized platform team. I can buy this one if you are in a Silicon Valley software company, but not if you are in Banking or Financial Services, where some product managers don’t even have advanced Excel skills. End-user maturity is still evolving. Data Mesh advocates should define these and other dependencies.

Provisioned for access. Ok, I can buy this part, but just because you can supply data doesn’t mean that the end users understand the data and know how to use it.

FAIR Data: Findable, Accessible, Interoperable, and Reusable. Ok, it certainly sounds good if it works smoothly. However, if it results in tons of duplication and the data isn’t as well defined as promised, then what we were trying to solve with Data Mesh may instead cause what I call the “Wild West Data Effect (WWDE)”, with data replicated and flying around the organization. It is easy to say, “Oh, go ahead and duplicate the data,” but shouldn’t it be planned duplication? Does duplicated data exist in the “Mesh-o-Verse” forever?

Some experts use the term knowledge graph interchangeably with Data Mesh. No issue with this, but I prefer a well-defined technology solution.

Whether or not Data Mesh (DM) is an authentic architecture remains to be seen.

DM assumes centralized database structures/teams don’t work. I am not sure I agree that centralized teams and platforms don’t work or are a roadblock; I think it is more about how CDAOs and CDOs link the team to the business through the operating model, governance, and partnerships. Also, these statements don’t seem to mention that Operational Data was generally more in the CIO’s and CTO’s remit and not always managed within the central data team, except for governance and records management. The use cases for Operational “running the business” data also tend to be different from Analytical Data use cases. Using a word like Mesh may not really describe what is actually happening. Can we really Mesh everything together? Again, this is a case of show and tell and testing.

Data Pipelines are Fragile. I agree: they are fragile and difficult to manage. Many newer tools should be part of the Data Mesh discussion, yet most vendors don’t mention them. Where is the discussion of RPA, Pega, Immuta, Matillion, and more?

Data Engineers in the COE for Data don’t know the Data well as they aren’t using it. My POV is that this depends on the talent architecture and whether it takes experience and industry background into account. Vendors making statements like this may be over-generalizing, and the claim needs to be revisited.

Analytical Data is Different from Operational Data. This point I agree with. But not all data needs to be returned to the data warehouse or data lake. It depends on what you want to do and where you want to do it. Many source systems have operational reporting for operational data, and many also have dashboards. So, this goes back to defining use cases and having a blueprint/data strategy for what you want to do and where. I believe some of the vendor commentary around this point needs to be analyzed, and firms need to go back to basics and probe Data Mesh vendor roadmaps for completeness of vision. What parts of the DM actually exist in any ecosystem?

There are many monolithic and centralized data repositories. I don’t think many firms have even gotten to ETL, especially not globally, let alone ELT and Data Mesh. Much of the dialogue deals with Fortune 50 companies, not even Fortune 1000 companies. Maturity and skill sets need to be factored in to assess whether a firm is Mesh-ready.

Data Mesh seems to downplay the fact that Data Analytics is a professional competency. The belief is that DA is a bottleneck and is not connected to execution, which in most cases is far from true. If the skill sets genuinely existed in the business lines, this shift would have happened by now. So we need to examine all of the connected roles in IT and Operations to really understand the full picture of bottlenecks and of centralizing versus decentralizing. Also, Data Mesh proponents disregard the reasons why centralized functions evolved, which include creating a career path, knowledge sharing, and best practices.

Domain-Driven Data Ownership Architecture: I agree with this point if the domains, via data stewards, can drive their architecture, but I have not seen this often. Domain teams are often familiar with their data but have no idea how to create data products or do analytics, let alone data modeling. I chuckle when I hear simple comments like “let’s change the paradigm.” I wish we could have a world where everyone knew analytics and engineering. That would genuinely be nirvana.

Data as a product (Data domains create the products themselves). This is a great idea, but how do we connect these products across all the data, given that we still want to be customer-centric? As long as this doesn’t create data product silos, then fine. Most vendors who talk about data products don’t think about enterprise or customer centricity. It would be good to have Data Mesh advocates and researchers explain how to connect customer data to product data and cross multiple domains (LOB data areas). Using the term Data Product could be very confusing to business users, as we have been talking about customer views for a long time. This needs to be better defined than I have seen in the business press.

Data should be served and useable at the source. It sounds great. I would like to see how this will work without recreating the processes/tools in DA COEs. I would love to see how vendors push these capabilities upstream to the source. I agree that this would be a significant step change when and if this is technically possible and domains/product owners have the skills to manage this.

Data moves around, and we can’t get to one source of truth. Agree that it has been an elusive goal and only partially achieved (It’s more mature in the marketing domain). I would love to understand how vendors who are commenting on data pipelines are coming up with an architecture to make the internal implementation of domains and domain-oriented distribution a reality. How do we ensure consistent data definitions?

We don’t need the data catalog to have usable data. If so, what are the alternatives?

Too many misunderstood terminologies, such as Metadata. In the Data Mesh, the Metadata Layer still exists. However, DM advocates suggest using simple English and less jargon to describe terms like Metadata, master data, catalog, etc. Amen to this one; I agree, but you still need to meet the parameters of what Metadata provides.

The Data Engineering team still sets up the infrastructure. Yes, they will need to, but Data Mesh seems to accuse Data Engineers of holding the business back from using the data, and I disagree with this idea. This depends on the org and engagement models and governance. It also goes back to the level of leadership and advocacy for the function and top management support.

Domain teams in the business can put their data into the lake themselves. I look forward to this day.

Decentralize storage with centralized infrastructure. How will data governance, policies, and controls work in this DA environment?

From Specialists to Generalists. This will require a massive push in training and education, and it will work better in tech companies. I would love business and domain users to have the statistical and technical skills to create data products. This change will require new job families, education, and training, with significant investment. Also, academic institutions are not currently up to speed on these bleeding-edge ideas and cannot yet provide a training source and talent pool. Vendors and firms will need to develop their own curriculum and training.

Responsibility for Quality and Security shifts back to the business lines under the Data Mesh. It will be interesting to see how the Data Mesh assures standards and defines security and quality aspects going forward. I agree with this trend as an extension of the Data Steward concept already in progress under Data Governance.
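To make two of the questions above concrete (can data really be self-describing, and how do we ensure consistent data definitions across domains?), here is a minimal sketch in Python. It is an illustration only, not a Data Mesh standard or any vendor's product: the `DataProduct` descriptor, its fields, and both helper functions are hypothetical names invented for this example.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical descriptor a domain team might publish with its data."""
    name: str
    domain: str
    owner: str          # accountable data steward (assumed field)
    description: str
    # business term -> plain-English definition
    definitions: dict = field(default_factory=dict)


def missing_metadata(product):
    """Return the descriptor fields left empty, i.e. where the product
    does not describe itself."""
    gaps = [name for name in ("owner", "description")
            if not getattr(product, name).strip()]
    if not product.definitions:
        gaps.append("definitions")
    return gaps


def conflicting_definitions(products):
    """Return business terms that different domains define differently,
    mapped to the domains involved."""
    seen = defaultdict(dict)  # term -> {normalized definition: [domains]}
    for product in products:
        for term, text in product.definitions.items():
            normalized = " ".join(text.lower().split())
            seen[term.lower()].setdefault(normalized, []).append(product.domain)
    return {
        term: sorted(d for domains in variants.values() for d in domains)
        for term, variants in seen.items()
        if len(variants) > 1
    }


# Two made-up domains that both use the term "active customer".
cards = DataProduct(
    name="card_accounts", domain="cards", owner="steward-a",
    description="Open card accounts",
    definitions={"Active customer": "Customer with a transaction in the last 90 days"},
)
deposits = DataProduct(
    name="deposit_accounts", domain="deposits", owner="",
    description="Open deposit accounts",
    definitions={"active customer": "Customer with an open account"},
)

print(missing_metadata(deposits))                  # fields deposits left blank
print(conflicting_definitions([cards, deposits]))  # terms the domains disagree on
```

Even in this toy form, something has to run checks like these across every domain, which is the governance question a decentralized model still needs to answer.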

In summary, if we are serious about the Data Mesh, we need to build an entirely new business case and work through all of the global concerns that the Data Mesh presents. For me, Data Mesh is currently a theory that could turn into an official architecture or a set of guiding principles, many of which exist today but are still evolving. As of now, the Data Mesh has raised more questions than answers. It does not necessarily point out the differences between Operational and Analytical data, which, in my mind, still have different fit-for-purpose use cases. Changing everyone’s mind will take more than one vendor coining a term to flip the current paradigms on their head; it will take significantly more research and testing. Said differently, we need data about the Data Mesh (case studies, success stories, and more).

I recommend that a new model be put in place that assesses data analytics maturity, like the one that EDM or IIA uses, and then overlays when the organization should consider the Data Mesh. For now, I am not recommending this as a go-to construct for low-maturity organizations. I would almost say that the organization must have built at least one data warehouse and had centralized data governance and a centralized team for Gen1 before it can even think about turning the world upside down in a Data Mesh paradigm. There are examples of similar constructs, like Agile at Scale, that did not go so well for lower-maturity firms or for non-tech companies. I think we need to keep an open mind about Data Mesh but define it better and test our way into it. As you can see from the above, there are many gaps to fill and questions to answer.
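The maturity overlay recommended above could start as something as simple as a readiness gate. The sketch below is a hypothetical illustration: the `OrgProfile` fields, the 1-to-5 maturity scale, and the minimum score of 3 are assumptions made for this example, not the actual scoring used by EDM or IIA.

```python
from dataclasses import dataclass


@dataclass
class OrgProfile:
    """Hypothetical summary of where an organization stands today."""
    maturity_score: int             # assumed scale: 1 (low) to 5 (high)
    has_built_data_warehouse: bool
    has_central_governance: bool
    has_central_data_team: bool


def mesh_readiness(profile, min_maturity=3):
    """Return (ready, blockers). The prerequisites mirror the Gen1 list in
    the text; min_maturity=3 is an assumed threshold, not a published one."""
    blockers = []
    if not profile.has_built_data_warehouse:
        blockers.append("no data warehouse built yet")
    if not profile.has_central_governance:
        blockers.append("no centralized data governance")
    if not profile.has_central_data_team:
        blockers.append("no centralized data team")
    if profile.maturity_score < min_maturity:
        blockers.append(
            f"maturity score {profile.maturity_score} is below {min_maturity}"
        )
    return (not blockers, blockers)


# A made-up low-maturity firm: it has a warehouse and a team, but no governance.
low_maturity_firm = OrgProfile(
    maturity_score=2,
    has_built_data_warehouse=True,
    has_central_governance=False,
    has_central_data_team=True,
)
ready, blockers = mesh_readiness(low_maturity_firm)
print(ready, blockers)
```

The value of a gate like this is less the score than the list of blockers, which tells a low-maturity firm what to build before revisiting the Mesh.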

I look forward to your thoughts and comments. What has your experience to date been with the Data Mesh and how far away do you think you are from adopting this concept?


Tony Branda



