Customer-centric companies like Vista are in one long wrestling match – sweating over how to protect personal data and explore analytics in a way that’s good for everyone. Add in the exponential nature of data mesh architecture and an ever-changing data world, and balancing innovation with privacy protocols requires fancy footwork. DnA’s Jannik Podlesny, Principal for Data Governance, Architecture and Technology, and Andrew Graziani, VP, Chief Security & Privacy Officer, reflect on where the Vista team is at.
A shared problem
Startups and tech players are focused on fresh tooling around data privacy and anonymization. Yet, as far as we know, there isn’t an established solution that ensures scalability and resilience. And no one is immune to the issue. We’ve seen the impact of privacy compromises across multiple industries, with the exposure of tens of thousands of health records from a clinical laboratory network and the de-anonymization of search history resulting in a class-action lawsuit against a sizeable internet company. Users trust us with their data, and Vista dedicates huge resources to mitigating any potential risk of exposure. It’s what we owe our customers, and being trustworthy is part of who we are as a brand.
The reality of being first
Not many – if any – companies have fully implemented a large-scale data mesh structure. Vista is recognized as one of the first, which adds another dimension to the data privacy challenge. Here, we weigh the flexibility and autonomy of data product teams higher than anything else. We love the mesh, but complexity comes with it. You wave goodbye to the ‘easier’ central governance afforded by the traditional, centralized Relational Database Management System (RDBMS), yet the need to comply with the EU General Data Protection Regulation (GDPR) or the California Consumer Privacy Act (CCPA) remains. What replaces the RDBMS defines the kind of company you want to be, and data privacy and security protocols become the backbone.
Mesh-specific challenges
Pulling up data lineage and scanning for personally identifiable information (PII) ceases to be straightforward. A decentralized structure comes with a highly fragmented data landscape: customer details and design preferences need to be found and evaluated. Harvard Professor Latanya Sweeney’s work long since demonstrated that a mix of date of birth, gender and ZIP code could uniquely identify 87% of the US population, so our vigilance has to be inexhaustible, given that data might be duplicated by autonomous domain teams. And PII could be processed as a side product of what Vista does best – seamless, customized design at scale for small businesses.
The nub of the problem is to locate combined pieces of information that qualify as PII, yet aren’t suspect alone. These are known as quasi-identifiers because, taken together, they can re-identify individuals. Finding them gets really hard – NP-hard or W[2]-complete, which will excite fans of computational complexity theory. And then there’s the tricky issue of data deletion. Will that become impractical in a distributed mesh topology as data circulates without a central gateway?
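To make the quasi-identifier idea concrete, here is a minimal brute-force sketch in Python. It is an illustration under stated assumptions, not Vista’s scanner: the function names, the toy records and the `threshold` parameter are all hypothetical. It checks column combinations from smallest to largest and reports the minimal sets whose values single out every row.

```python
from collections import Counter
from itertools import combinations


def unique_fraction(rows, columns):
    """Share of rows whose values on `columns` occur exactly once."""
    if not rows:
        return 0.0
    keys = [tuple(row[c] for c in columns) for row in rows]
    counts = Counter(keys)
    return sum(1 for key in keys if counts[key] == 1) / len(rows)


def minimal_quasi_identifiers(rows, columns, threshold=1.0):
    """Brute-force search, smallest column sets first.

    A combination is reported when at least `threshold` of the rows are
    unique on it. Supersets of an already reported set are skipped, so
    only minimal combinations are returned.
    """
    found = []
    for size in range(1, len(columns) + 1):
        for combo in combinations(columns, size):
            if any(set(hit) <= set(combo) for hit in found):
                continue
            if unique_fraction(rows, combo) >= threshold:
                found.append(combo)
    return found


# Toy, made-up records (hypothetical column names).
rows = [
    {"birth_year": 1980, "gender": "F", "zip": "02139"},
    {"birth_year": 1980, "gender": "M", "zip": "02139"},
    {"birth_year": 1975, "gender": "F", "zip": "02139"},
    {"birth_year": 1975, "gender": "F", "zip": "10001"},
]

print(minimal_quasi_identifiers(rows, ["birth_year", "gender", "zip"]))
```

In this toy data no single column and no pair of columns singles out every record, but the three columns together do. With n columns there are 2^n - 1 non-empty combinations to consider, which is why an exhaustive search like this one stops being practical on wide tables.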
Engineering a solution
If the central question is how to discover attributes that uniquely identify people, the first step is transparency – highlighting their existence in Vista’s data meshes. One option is to deploy a continuous PII scanner to search for hidden quasi-identifiers, label them correctly in the data catalogue, and proactively raise awareness of their presence. With this knowledge, our role-based access control (RBAC) paradigm grants access to data depending on what job you do – reducing the ‘blast radius’ should oversharing ever occur, and enforcing high, zero-trust security standards with two-factor authentication (2FA).
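As a rough sketch of how catalogue labels and role-based access could fit together, the Python below denies access by default and requires a verified second factor. Every name in it (the roles, the labels, `CatalogueEntry`, `can_access`) is a hypothetical placeholder chosen for illustration, not Vista’s actual catalogue or access model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    """A data product as listed in the catalogue, with sensitivity labels."""
    name: str
    labels: frozenset = frozenset()


# Hypothetical mapping from role to the labels that role may read.
ROLE_CLEARANCE = {
    "data_analyst": frozenset({"public"}),
    "privacy_engineer": frozenset({"public", "quasi_identifier", "pii"}),
}


def can_access(role, entry, second_factor_verified):
    """Deny by default: unknown roles and missing 2FA never get access."""
    if not second_factor_verified or role not in ROLE_CLEARANCE:
        return False
    # Every label on the entry must be covered by the role's clearance.
    return entry.labels <= ROLE_CLEARANCE[role]


# A scanner is assumed to have labelled this entry as a quasi-identifier.
orders = CatalogueEntry("orders_by_zip", frozenset({"quasi_identifier"}))

print(can_access("data_analyst", orders, True))
print(can_access("privacy_engineer", orders, True))
print(can_access("privacy_engineer", orders, False))
```

The design choice worth noting is that the label, once set by the scanner, is what drives the decision: a newly discovered quasi-identifier narrows access without anyone having to rewrite per-dataset rules.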
Step two is reducing the amount of information we hold. Do we need this data point for analytical purposes? To recommend a new personalized product, for instance? If it’s not valuable, let’s delete or descope it – archive it in a bunker and lock the door.
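One simple way to picture this step is an allow-list: a field stays only if someone has documented why it is needed. The Python sketch below assumes a hypothetical `PURPOSES` register and made-up field names; it illustrates the idea and does not describe a Vista system.

```python
# Hypothetical register: field name -> documented analytical purpose.
PURPOSES = {
    "product_category": "personalized recommendations",
    "order_total": "revenue analytics",
}


def minimize(record, purposes=PURPOSES):
    """Keep only fields with a documented purpose; report what was dropped."""
    kept = {name: value for name, value in record.items() if name in purposes}
    dropped = sorted(set(record) - set(purposes))
    return kept, dropped


record = {"product_category": "business cards", "order_total": 42.0, "birth_year": 1980}
kept, dropped = minimize(record)
print(kept)
print(dropped)
```

Returning the dropped field names alongside the kept data makes the descoping decision visible, so it can be reviewed instead of happening silently.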
The exponential growth of finding privacy-exposing quasi-identifiers