CIO.com (a sister publication) had an intriguing story about ChatGPT and a potential conflict with the European Union’s GDPR rules. The sad reality is that although the story accurately describes the EU issue, the GDPR kerfuffle amounts to barely a rounding error when it comes to generative AI (genAI) and how it is obliterating all compliance rules — especially those involving privacy.
Enterprise IT leaders have never had an especially strong mastery of IT visibility for apps, data and tools. But the instant a company welcomes genAI tools into its environment (an event that, for most enterprises, happened last year), it can kiss any belief that it still controls its assets goodbye. And without comprehensive data control and strong data visibility, regulatory compliance is impossible.
Let’s start with the GDPR situation. As CIO.com explained it: “The EU’s strict privacy rules require that companies allow individuals access to personal information held about them, as well as ensuring that such data is accurate. This requires a long audit trail to every piece of information stored about European citizens. When it comes to AI-generated content, such trails often go cold. With regard to information generated by ChatGPT, (the entity suing) alleges there is no legal redress for so-called hallucinations when it comes to personal information.”
That’s true, but hallucinations are just the most obvious examples. The part of GDPR at issue here is the Right To Be Forgotten: if an EU citizen (5 million of whom live in the US) formally asks for personal data about them to be removed, companies are supposed to comply.
Long before genAI and the large language models that underpin it became popular, enterprise IT teams were already struggling to comply with such rules. The usual culprits for enterprise IT visibility problems are cloud, IoT, mobile, third parties, remote sites (including home offices) and all manner of shadow IT.
The Right To Be Forgotten is simply not compatible with how modern enterprises function, especially in the United States. Consider a typical scenario: you’re an analyst in the marketing department trying to make sense of some unusual customer patterns. It’s getting late, so you decide to finish the number-crunching at home. You copy the data to a personal cloud account for easy access when you resume working.
The next morning, on the train heading to the office, you remember something you wanted to try with the data. You access that cloud folder on your phone, do a little more analysis and then save the files.
When you get back to your desk, you download the latest files to your work desktop machine and keep working.
Think of the ways that data can go astray and be out of reach for IT. Your home computer uses a consumer-grade backup service, and overnight, those files were copied to that service. That consumer-grade backup service has its own offsite backup mechanisms, along with a separate disaster recovery service. That file with sensitive PII about customers is now in all those locations.
That sensitive data was also on your phone, where it gets automatically backed up to the handset manufacturer’s cloud, which has its own backup and disaster recovery arrangements.
Two days later, an EU citizen (who happens to be one of your customers) submits a right-to-be-forgotten request, and your team eventually learns of it. They delete the references they can locate on key enterprise systems, including the half-dozen corporate cloud environments they know about.
But what about all of those other locations?
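To see why that question answers itself, look at what an erasure job can actually reach. Below is a minimal sketch in Python; the store names and the `erase_subject` connector are hypothetical illustrations, not any real product’s API. The job dutifully covers every system in IT’s inventory, and nothing else.

```python
# Minimal sketch of a right-to-be-forgotten erasure job.
# All store names and connectors here are hypothetical; a real
# deployment would call each system's own deletion API.

KNOWN_STORES = [
    "crm_prod",        # primary CRM database
    "warehouse",       # analytics data warehouse
    "s3_marketing",    # corporate cloud bucket
    "backup_tier1",    # managed enterprise backups
]

def erase_subject(store: str, subject_id: str) -> bool:
    """Pretend connector: delete all records for subject_id in one store."""
    print(f"erasing {subject_id} from {store}")
    return True  # assume the known system honors the delete

def handle_erasure_request(subject_id: str) -> None:
    for store in KNOWN_STORES:
        erase_subject(store, subject_id)
    # The job finishes "successfully" here, yet every copy that left
    # the inventory (a personal cloud folder, a consumer backup
    # service, a phone maker's sync) is never touched, because IT
    # has no record that it exists.

handle_erasure_request("eu-customer-4711")
```

The job reports success, but the copies sitting in the personal cloud folder, the consumer backup service and the handset maker’s sync service were never in the inventory to begin with.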
That example involves just an employee trying to get work done. Let’s try one with a customer or a prospect. One of your senior sales representatives is working with another company on a $1 billion sales deal. The two start discussing sensitive contract points in text threads and will likely also review preliminary contract drafts.
Once the draft gets to a late stage, it goes to the legal department and everything is hopefully captured. But what about all of those text discussions between two personal smartphones? Compliance is not only about regulatory issues. What if the deal later goes bad and there is litigation? The opposing counsel will seek discovery, including all discussions about contract terms. Are you even remotely able to fully comply? (I’ll save you time: No, there is no way you can fully comply.)
That was all true back in 2019. In 2020, the pandemic hit, and cloud and remote activity soared. In 2023, genAI tools, built on AI techniques that had been around in various forms since the 1960s, grabbed headlines and were suddenly on the must-have list of every enterprise board member. The less decision-makers knew about AI, the more they wanted genAI.
Full data visibility was difficult in 2019, virtually impossible in 2020, and now, with genAI cropping up in just about every division and working group, it has crossed the line into fully impossible.
Why does genAI so thoroughly obliterate data control? There are five relevant elements, all distinct (a checklist sketch follows the list):
- What enterprise data can genAI access?
- Who (among employees, contractors, customers and anyone with any level of privileged access) can access that data? Is there some attempt, any attempt, to limit who can access what segments of that data?
- How was the genAI system trained? What information did it examine to answer queries? Did any of that training information include data about EU citizens? (Yes, EU data was almost certainly within the training set.)
- Is the genAI system training on new data you share? Will it try to learn from your queries and potentially share that information with competitors?
- What about the massive databases behind genAI tools? Can enterprises that license these systems review the database? Can they delete details from it? If not (and the answer will almost always be “no”), how is GDPR compliance even possible? How is any compliance possible?
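To make those five questions concrete, here is an illustrative due-diligence record, again sketched in Python. The field names and the sample answers are assumptions for the sake of argument, not any vendor’s actual posture; the point is that for a typical genAI tool today, the answers that matter for GDPR come back negative.

```python
from dataclasses import dataclass

# Illustrative due-diligence record for a genAI tool. Field names
# and the sample answers below are assumptions for discussion only.

@dataclass
class GenAIComplianceCheck:
    data_accessible: str             # 1. what enterprise data can it reach?
    access_controls: bool            # 2. is access segmented by user/role?
    training_data_documented: bool   # 3. is the training corpus auditable?
    trains_on_our_inputs: bool       # 4. does it learn from our queries?
    vendor_db_erasable: bool         # 5. can we delete records from it?

    def gdpr_feasible(self) -> bool:
        # Erasure and access rights can't be honored unless the vendor's
        # database can be audited and purged on request.
        return self.training_data_documented and self.vendor_db_erasable

typical_tool = GenAIComplianceCheck(
    data_accessible="anything an employee pastes into a prompt",
    access_controls=False,
    training_data_documented=False,
    trains_on_our_inputs=True,
    vendor_db_erasable=False,
)
print(typical_tool.gdpr_feasible())  # False
```

Run that checklist against most genAI tools on the market and the result is the same.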
With the arrival of genAI virtually everywhere, the compliance genie is already out of the bottle.