The development of AI systems requires, in some cases, carrying out a DPIA: one is mandatory where the envisaged processing is likely to result in a high risk to the rights and freedoms of natural persons (Article 35 GDPR).
In its guidelines on the DPIA, the European Data Protection Board (EDPB) has identified nine criteria to assist data controllers, i.e. here the AI system providers, in determining whether a DPIA is required. Any processing of personal data meeting at least two criteria on this list should be presumed to be subject to the obligation to carry out a DPIA. Some of these criteria are particularly relevant for processing taking place during the development phase, notably the “innovative use” and “large-scale” criteria examined below.
In all cases, the risks to individuals arising from the creation of a training dataset and its use must be considered: where there are significant risks, in particular of data misuse or data breach, or where the processing may give rise to discrimination, a DPIA must be carried out even if fewer than two of those criteria are met; conversely, a DPIA need not be carried out where several criteria are met but the controller can establish with sufficient certainty that the processing of the personal data in question does not expose individuals to a high risk.
On the basis of these criteria, the CNIL has published a list of processing operations for which a DPIA is mandatory (for more information, see the CNIL’s website). Several of these may rely on artificial intelligence systems, such as those involving profiling or automated decision-making: in such cases, a DPIA is always required.
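As an informal illustration of the screening logic just described, the following Python sketch encodes the two-criteria presumption together with its two exceptions and the CNIL mandatory list. The criterion labels paraphrase the nine EDPB criteria, and the function is a hypothetical screening aid only: the assessment under Article 35 GDPR always requires case-by-case legal judgement.

```python
# Hypothetical screening aid for the EDPB two-criteria presumption; the
# labels below paraphrase the nine EDPB criteria and carry no legal weight.

EDPB_CRITERIA = {
    "evaluation_or_scoring",
    "automated_decision_with_significant_effect",
    "systematic_monitoring",
    "sensitive_or_highly_personal_data",
    "large_scale_processing",
    "matching_or_combining_datasets",
    "vulnerable_data_subjects",
    "innovative_use_of_technology",
    "prevents_exercise_of_rights_or_services",
}

def dpia_presumed_necessary(met_criteria: set[str],
                            on_cnil_mandatory_list: bool = False,
                            high_risk_identified: bool = False,
                            low_risk_demonstrated: bool = False) -> bool:
    """Screening heuristic only, not a legal determination."""
    if on_cnil_mandatory_list:      # e.g. profiling, automated decision-making
        return True                 # DPIA always required
    if high_risk_identified:        # e.g. misuse, breach or discrimination risk
        return True                 # DPIA required even below two criteria
    if low_risk_demonstrated:       # absence of high risk established with
        return False                # sufficient certainty rebuts the presumption
    return len(met_criteria & EDPB_CRITERIA) >= 2
```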
Is the use of an artificial intelligence system an “innovative use”?

Innovative use is one of the nine criteria that can trigger a DPIA. It is assessed in the light of the state of technological knowledge, and not only of the context of the processing (a processing operation can be very “innovative” for a given organisation, because of the technological novelty it brings to it, without being an innovative use in general). The use of artificial intelligence systems does not systematically amount to an innovative use or to the application of new technological or organisational solutions, so not all processing using an AI system will meet this criterion. To determine whether the technique used falls within such uses, two categories of systems should be distinguished:
AI systems relying on well-established techniques whose conditions of use, and in particular the associated risks, are well understood. Example: certain regression or clustering techniques, or model architectures such as random forests, where the risks associated with their use are known;
generative AI systems trained on large amounts of data, whose behaviour cannot be anticipated in all situations.
By way of illustration, a research project aimed at developing natural language processing tools for clinical applications in the medical field, based on large volumes of data (transcripts of audio data, clinical studies, medical results, etc.), may constitute an innovative use, particularly given the uncertainty as to the results to be obtained.
Is the training of an artificial intelligence system “large-scale” processing?

Large-scale processing is one of the nine criteria that can trigger a DPIA. While the development of an AI system often relies on the processing of a large amount of data, this does not necessarily fall within the scope of large-scale processing, which aims to “process a considerable amount of personal data at regional, national or supranational level [and which may] affect a significant number of data subjects” (recital 91 GDPR). For AI systems in particular, it will be necessary to determine whether the development involves a very large number of people.
Examples:
A research organisation wants to build a large dataset of landscape photos (mountains, oceans, deserts, cities, etc.) to improve the performance of computer vision systems. Some of these images feature individuals, who are sometimes recognisable.
Even if the dataset contains millions of images covering the entire surface of the planet, if the number of images containing recognisable individuals (and therefore personal data) is limited (for example, to a few thousand), the processing will not be considered “large-scale”. A DPIA may nevertheless be required under the other criteria, which must still be checked.
Where a provider of a conversational agent compiles a dataset to train its large language model (LLM) from a considerable volume of publicly accessible personal data on the Internet, collected through web scraping techniques, the processing can be described as “large-scale” (both situations are illustrated in the sketch below).
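As a rough illustration of the reasoning in the two examples above, the sketch below screens a dataset by the number of recognisable data subjects rather than by its raw size. The numeric threshold, the field names and the simplification that each appearance is a distinct person are all assumptions made for the example: neither the GDPR nor the CNIL sets a quantitative cut-off for “large scale”.

```python
# Illustrative only: recital 91 GDPR points to the number of data subjects
# affected, not to the raw size of the dataset; the threshold is an assumption.

def count_data_subjects(images: list[dict]) -> int:
    """Only images containing recognisable individuals involve personal data.
    Simplification: each appearance is counted as a distinct person."""
    return sum(img.get("recognisable_persons", 0) for img in images)

def looks_large_scale(images: list[dict],
                      subject_threshold: int = 100_000) -> bool:
    # Millions of landscape photos with only a few thousand recognisable
    # persons stay below the (assumed) threshold, as in the first example;
    # web-scraped data covering much of the public Internet, as in the LLM
    # example, clearly exceeds it.
    return count_data_subjects(images) >= subject_threshold

# Usage: 3,000 images contain one recognisable person each; the millions of
# remaining landscape shots contain no personal data at all.
photos = [{"recognisable_persons": 1}] * 3_000
assert not looks_large_scale(photos)
```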
Risk criteria introduced by the EU AI Act

The European AI Act aims to provide a legal framework for the development and deployment of AI systems within the European Union. It distinguishes several categories of systems according to their level of risk: prohibited systems, high-risk systems, systems subject to transparency obligations, and minimal-risk systems. The CNIL considers that, for all the high-risk systems covered by the AI Act, a DPIA will be presumed necessary where their development or deployment involves the processing of personal data.
The DPIA may draw on the documentation required by the AI Act, provided that the elements required by the GDPR (Article 35 GDPR) are included. More precise rules on the relationship between these two sets of requirements are being developed at European level, with the CNIL’s active participation, and will be the subject of subsequent publications. This work will aim, in particular, to avoid any duplication of obligations on actors by prioritising the reuse of elements produced under one framework for the other.
Moreover, the CNIL considers that the development of a foundation model or a general-purpose AI system, whose uses cannot in most cases be exhaustively identified, requires a DPIA where it involves the processing of personal data. Indeed, although these models and systems are not considered high-risk by default under the AI Act, their dissemination and their future uses could entail risks for the persons whose data were processed during development, or for the persons concerned by their use.
Carrying out a DPIA for foundation models and general-purpose AI systems will facilitate the compliance of the processing implemented by their users. In this respect, sharing or publishing the completed DPIA may facilitate the compliance of all the actors involved, particularly where open-source models are disseminated or systems are made available to the public.
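The CNIL positions set out in this subsection reduce to a simple decision rule, sketched below. The enum and parameter names are illustrative assumptions rather than AI Act terminology.

```python
from enum import Enum, auto

class AIActRiskCategory(Enum):
    # The four levels distinguished by the AI Act, as summarised above.
    PROHIBITED = auto()
    HIGH_RISK = auto()
    TRANSPARENCY_OBLIGATIONS = auto()
    MINIMAL_RISK = auto()

def dpia_presumed_by_cnil(category: AIActRiskCategory,
                          processes_personal_data: bool,
                          is_foundation_or_general_purpose: bool = False) -> bool:
    """CNIL position as described above: a DPIA is presumed necessary for
    high-risk systems, and for foundation models or general-purpose AI
    systems, whenever personal data is processed."""
    if not processes_personal_data:
        return False
    return (category is AIActRiskCategory.HIGH_RISK
            or is_foundation_or_general_purpose)
```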
Defining the scope of the DPIA

The scope of the DPIA may differ depending on the provider’s knowledge of the use that will be made of the AI system it develops, whether by itself or by a third party.
Where the operational use of the AI system in the deployment phase is identified from the development phase

If the system provider is also the data controller for the deployment phase and the operational use of the AI system in that phase is identified from the development stage, it is recommended to carry out a single general DPIA covering the entire processing. The provider can then supplement this DPIA with the risks associated with each of the two phases.
If the provider is not the data controller for the deployment phase but identifies the purpose of use in that phase, it may propose a corresponding DPIA template. This allows it, in particular, to take into account certain risks that are easier to identify during the development phase. However, the user of the AI system, as controller, remains obliged to carry out its own DPIA, for example on the basis of the provider’s template.
It should be noted that, in some cases, it is not possible to determine precisely and in advance the conditions under which the system will be deployed. For example, some risks can only be reassessed after a calibration phase of the AI system under its deployment conditions. The DPIA will then have to be updated iteratively as the characteristics of the processing are defined at the deployment stage.
Where the operational use of the AI system in the deployment phase is not clearly identified in the development phase

In this case, the provider of the system will only be able to carry out its impact assessment for the development phase. It will then be up to the controller of the deployment phase to analyse, in view of the characteristics of the processing, whether a DPIA is necessary for that phase. If the deployment phase pursues multiple purposes, the controller may adapt the same general DPIA to each of the specific use cases.
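The three scoping situations described in this section can be condensed into the illustrative helper below; the field and function names are assumptions made for the sketch, not terms used by the CNIL.

```python
from dataclasses import dataclass

@dataclass
class ProviderContext:
    controller_for_deployment: bool   # provider is also controller at deployment
    deployment_use_identified: bool   # operational use known from development

def dpia_scope(ctx: ProviderContext) -> str:
    """Summarises the scoping guidance above; not a legal determination."""
    if ctx.controller_for_deployment and ctx.deployment_use_identified:
        return "single general DPIA covering development and deployment"
    if ctx.deployment_use_identified:
        return ("DPIA for the development phase, plus a DPIA template offered "
                "to the deploying controller, who must still carry out its own")
    return ("DPIA limited to the development phase; the deploying controller "
            "assesses separately whether a DPIA is needed for deployment")
```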