AI spots legal problems with tech T&Cs in GDPR research project
Technology is the proverbial double-edged sword. And an experimental European research project is ensuring this axiom cuts very close to the industry's bone indeed, by applying machine learning to critically sift big tech's privacy policies and see whether AI can automatically identify violations of data protection law.
The still-in-training privacy policy and contract parsing tool, called 'Claudette' (short for automated clause detector), is being developed by researchers at the European University Institute in Florence.
They now also have the support of European consumer organization BEUC, for a 'Claudette meets GDPR' project, which applies the tool specifically to evaluate compliance with the EU's General Data Protection Regulation.
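The researchers haven't published Claudette's internals here, but as a rough sketch of what sentence-level clause classification can look like in practice, here is an illustrative Python example using scikit-learn; the training sentences, labels and model choice are assumptions made for the sake of the example, not details of Claudette's actual design.

```python
# Illustrative sketch only: a sentence-level classifier that scores privacy
# policy sentences as "potentially problematic" (1) or not (0). This is NOT
# Claudette's actual architecture; the data and model are hypothetical.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled training sentences drawn from privacy policies.
train_sentences = [
    "We may share some of your data with selected partners.",      # problematic
    "We delete your account data within 30 days of your request.", # fine
]
train_labels = [1, 0]

# TF-IDF word/bigram features feeding a simple linear classifier.
model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),
    LogisticRegression(max_iter=1000),
)
model.fit(train_sentences, train_labels)

# Score unseen policy sentences: higher probability = more likely problematic.
new_sentences = ["By using this website you agree to our privacy policy."]
print(model.predict_proba(new_sentences)[:, 1])
```

In a real system the training set would need thousands of expert-labeled sentences, which is exactly the bottleneck the researchers flag later in this piece.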
Early results from this project have been released today, with BEUC saying the AI was able to automatically flag a range of problems with the language being used in tech T&Cs.
The researchers set Claudette to work analyzing the privacy policies of 14 companies in all, namely Google, Facebook (and Instagram), Amazon, Apple, Microsoft, WhatsApp, Twitter, Uber, Airbnb, Booking, Skyscanner, Netflix, Steam and Epic Games, saying this group was selected to cover a range of online services and sectors.
And also because they are among the biggest online players that, to quote the researchers, "should be setting a good example for the market to follow". Ahem, should.
The AI analysis of the policies was carried out in June, after the update to the EU’s data protection rules had come into force. The regulation tightens requirements on obtaining consent for processing citizens’ personal data by, for example, increasing transparency requirements — basically requiring that privacy policies be written in clear and intelligible language, explaining exactly how the data will be used, in order that people can make a genuine, informed choice to consent (or not consent).
In theory, all 15 parsed privacy policies (Facebook's and Instagram's are counted separately) should have been compliant with GDPR by June, as the regulation came into force on May 25. However some tech giants are already facing legal challenges to their interpretation of 'consent'. And it's fair to say the law has not vanquished the tech industry's fuzzy language and logic overnight. Where user privacy is concerned, old, ugly habits die hard, clearly.
But that’s where BEUC is hoping AI technology can help.
It says that, out of a combined 3,659 sentences (80,398 words), Claudette marked 401 sentences (11.0%) as containing unclear language, and 1,240 (33.9%) as containing "potentially problematic" clauses or clauses providing "insufficient" information.
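For the record, those percentages line up with the raw counts; here is a trivial back-of-the-envelope check of BEUC's figures (the numbers are theirs, only the arithmetic is reproduced):

```python
total = 3659        # combined sentences across the parsed policies
unclear = 401       # flagged as containing unclear language
problematic = 1240  # flagged as potentially problematic / insufficient

print(f"unclear:     {unclear / total:.1%}")      # -> 11.0%
print(f"problematic: {problematic / total:.1%}")  # -> 33.9%
```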
BEUC says identified problems include:
- Not providing all the information which is required under the GDPR’s transparency obligations. “For example companies do not always inform users properly regarding the third parties with whom they share or get data from”
- Processing of personal data not happening according to GDPR requirements. “For instance, a clause stating that the user agrees to the company’s privacy policy by simply using its website”
- Policies are formulated using vague and unclear language (i.e. using language qualifiers that really bring the fuzz, such as "may", "might", "some", "often" and "possible"), "which makes it very hard for consumers to understand the actual content of the policy and how their data is used in practice". A simple illustration of this kind of qualifier flagging follows this list.
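As a rough illustration of that third point, a vague-language check can be as simple as scanning policy sentences for hedging qualifiers. The word list and helper below are assumptions for the sake of the example, not the rules Claudette actually applies.

```python
import re

# Hedging qualifiers of the kind BEUC highlights; an illustrative list only.
VAGUE_TERMS = {"may", "might", "some", "often", "possible"}

def flag_vague(sentence: str) -> list[str]:
    """Return any vague qualifiers found in a policy sentence."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return sorted(VAGUE_TERMS & words)

example = "We may share some of your information with partners where possible."
print(flag_vague(example))  # -> ['may', 'possible', 'some']
```

A keyword scan like this only surfaces candidate sentences, of course; deciding whether a hedge actually obscures how data is used still takes a trained classifier or a human reviewer.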
The bolstering of the EU's privacy rules, with GDPR tightening the consent screw and supersizing penalties for violations, was intended precisely to prevent this kind of thing. So it's pretty depressing, though hardly surprising, to see the same ugly T&C tricks still being used to try to sneak consent by keeping users in the dark.
We reached out to two of the largest tech giants whose policies Claudette parsed — Google and Facebook — to ask if they want to comment on the project or its findings.
A Google spokesperson said: “We have updated our Privacy Policy in line with the requirements of the GDPR, providing more detail on our practices and describing the information that we collect and use, and the controls that users have, in clear and plain language. We’ve also added new graphics and video explanations, structured the Policy so that users can explore it more easily, and embedded controls to allow users to access relevant privacy settings directly.”
At the time of writing Facebook had not responded to our request for comment.
Commenting in a statement, Monique Goyens, BEUC’s director general, said: “A little over a month after the GDPR became applicable, many privacy policies may not meet the standard of the law. This is very concerning. It is key that enforcement authorities take a close look at this.”
The group says it will be sharing the research with EU data protection authorities, including the European Data Protection Board. Nor is it ruling out bringing legal action against law benders itself.
But it’s also hopeful that automation will — over the longer term — help civil society keep big tech in legal check.
Although, where this project is concerned, it also notes that the training data set was small, conceding that Claudette's results were not 100% accurate, and says more privacy policies would need to be manually analyzed before policy analysis could be conducted by machines alone.
So file this one under ‘promising research’.
“This innovative research demonstrates that just as Artificial Intelligence and automated decision-making will be the future for companies from all kinds of sectors, AI can also be used to keep companies in check and ensure people’s rights are respected,” adds Goyens. “We are confident AI will be an asset for consumer groups to monitor the market and ensure infringements do not go unnoticed.
“We expect companies to respect consumers’ privacy and the new data protection rights. In the future, Artificial Intelligence will help identify infringements quickly and on a massive scale, making it easier to start legal actions as a result.”
For more on the AI-fueled future of legal tech, check out our recent interview with Mireille Hildebrandt.