Thursday, November 21, 2019

Enabling access, erasure, and rectification rights in AI systems

Reuben Binns, our Research Fellow in Artificial Intelligence (AI), discusses the challenges organisations may face in implementing mechanisms in AI systems that allow data subjects to exercise their rights of access, rectification and erasure.

Under the General Data Protection Regulation (GDPR), individuals have a number of rights relating to their personal data. These rights apply to personal data used at various points in the development and deployment lifecycle of an AI system, including personal data:

- contained in the training data;
- contained within the model itself, whether by design or by accident; and
- used as input to the model, or produced as its output, once the model is deployed.

However, even if training data lacks associated identifiers or contact details and has been transformed through pre-processing, it may still be personal data, because it can be used to ‘single out’ the individual it relates to, on its own or in combination with other data, even where it cannot be associated with a customer’s name.
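
As a rough illustration (the column names, values and attributes below are invented for this sketch), the following Python snippet shows how a pre-processed training set containing no names, identifiers or contact details can still single out one individual through a combination of ordinary attributes:

```python
import pandas as pd

# Toy, invented training data: no names, IDs or contact details.
training_data = pd.DataFrame({
    "age_band": ["30-39", "30-39", "40-49", "30-39"],
    "postcode_prefix": ["SK9", "SK9", "M1", "SK9"],
    "occupation": ["teacher", "nurse", "teacher", "teacher"],
    "loan_default": [0, 1, 0, 1],
})

# Attributes a requester says describe them.
claimed_attributes = {
    "age_band": "30-39",
    "postcode_prefix": "SK9",
    "occupation": "nurse",
}

# Keep only the rows matching every claimed attribute.
mask = pd.Series(True, index=training_data.index)
for column, value in claimed_attributes.items():
    mask &= training_data[column] == value
matches = training_data[mask]

if len(matches) == 1:
    # The combination of attributes singles out exactly one record,
    # so that record is likely to be personal data about the requester.
    print("Singled-out record:")
    print(matches)
else:
    print(f"{len(matches)} records match; no individual singled out.")
```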

Organisations do not have to collect or maintain additional personal data to enable the identification of data subjects in training data for the sole purpose of complying with the regulation (as per Article 11 of the GDPR). There may therefore be times when an organisation cannot identify the data subject in the training data (and the data subject cannot provide additional information that would enable their identification), in which case it cannot fulfil a request.

Complying with a request to delete training data would not entail erasing any ML models based on that data, unless the models themselves contain the data or can be used to infer it (situations we cover in the sections below).

For instance, the product offers a customer sees on a website might be driven by a predictive model’s output stored in their profile. Where such data constitutes personal data, it is subject to the rights of access, rectification, and erasure. Whereas individual inaccuracies in training data will usually have only a negligible effect, an inaccurate output of a model could directly affect the data subject.
 

Requests for rectification of model outputs (or the personal data inputs on which they are based) are therefore more likely to be made, and should be treated with a higher priority, than requests for rectification of training data.

Fulfilling requests about data contained by design

Some types of model, such as Support Vector Machines (SVMs), include certain key examples from the training data by design, in order to help distinguish between new examples during deployment. In such cases, a small set of individual training examples is contained somewhere within the internal logic of the model.
The training set would typically contain hundreds of thousands of examples, and only a very small percentage of them would end up being used directly in the model. The chances that one of the relevant data subjects makes a request are therefore very small, but it is possible.

Depending on the particular programming library in which the ML model is implemented, there may be a built-in function to easily retrieve these examples. In such cases, it might be practically possible for an organisation to respond to a data subject’s request. If the request is for access to the data, it could be fulfilled without altering the model. If the request is for rectification or erasure of the data, it could not be fulfilled without re-training the model (either with the rectified data or without the erased data) or deleting the model altogether.
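
As a rough sketch of what this can look like in one widely used library (the dataset and the erased record’s index here are invented for the example), a fitted scikit-learn SVC exposes the training examples it retains as support vectors, and erasure is then handled by re-training without the relevant record:

```python
import numpy as np
from sklearn.svm import SVC

# Toy, invented training data: each row relates to one data subject.
X = np.array([[25, 40_000], [47, 82_000], [35, 56_000],
              [52, 91_000], [29, 48_000], [41, 75_000]], dtype=float)
y = np.array([0, 1, 0, 1, 0, 1])

model = SVC(kernel="linear").fit(X, y)

# The fitted model retains a subset of training examples verbatim.
print("Indices of retained training examples:", model.support_)
print("Retained examples:")
print(model.support_vectors_)

# Suppose the data subject behind row 3 makes a request, and that row
# turns out to be one of the retained support vectors.
requested_index = 3
if requested_index in model.support_:
    # Access: the retained example can be disclosed without altering the model.
    print("Retained record for this subject:", X[requested_index])

# Erasure: the example cannot simply be removed from the fitted model,
# so the model is re-trained without that record instead.
X_erased = np.delete(X, requested_index, axis=0)
y_erased = np.delete(y, requested_index)
model = SVC(kernel="linear").fit(X_erased, y_erased)
```
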
Fulfilling requests about data contained by accident

Aside from SVMs and other models that contain examples from the training data by design, some models might ‘leak’ personal data by accident. In such cases, unauthorised parties may be able to recover elements of the training data, or infer who was in it, by analysing the way the model behaves.

The rights of access, rectification, and erasure may be difficult or impossible to exercise and fulfil in these scenarios. Unless the data subject presents evidence that their personal data could be inferred from the model, the organisation may not be able to determine whether it can be, and therefore whether the request has any basis.

Organisations should regularly and proactively evaluate the likelihood of personal data being inferred from models, in light of the state of the art in technology, so that the risk of accidental disclosure is minimised.
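
As a rough sketch of one such evaluation (not a prescribed method; the synthetic data, model and threshold below are invented for the example), the snippet applies a simple confidence-threshold membership-inference test: if the model is markedly more confident on records it was trained on than on records it was not, that gap suggests its behaviour could help someone infer who was in the training data.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for personal data.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

def confidence_in_true_label(model, X, y):
    """Probability the model assigns to each record's true label."""
    probs = model.predict_proba(X)
    return probs[np.arange(len(y)), y]

train_conf = confidence_in_true_label(model, X_train, y_train)
test_conf = confidence_in_true_label(model, X_test, y_test)

# Guess "member of the training data" whenever confidence exceeds a
# threshold, and compare how often that guess fires on known members
# (training records) versus non-members (held-out records).
threshold = 0.9  # invented for the sketch
member_rate = (train_conf > threshold).mean()
non_member_rate = (test_conf > threshold).mean()

print(f"Members flagged:     {member_rate:.2f}")
print(f"Non-members flagged: {non_member_rate:.2f}")
# A large gap between these rates is a warning sign that the model
# may accidentally reveal who was in its training data.
```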

We would like to hear your views on this topic and genuinely welcome any feedback on our current thinking. Please share your views by leaving a comment below or by emailing us at AIAuditingFramework@ico.org.uk.

Dr Reuben Binns, a researcher working on AI and data protection, joined the ICO on a fixed term fellowship in December 2018. During his two-year term, Reuben will research and investigate a framework for auditing algorithms and conduct further in-depth research activities in AI and machine learning.


