
Responsible and Trustworthy Data Science - Evaluation framework

The evaluation framework below is the result of the participatory work initiated in the spring of 2019 by Labelia Labs (ex- Substra Foundation) and ongoing since then. It is based on the identification of the risks that we are trying to prevent by aiming for a responsible and trustworthy practice of data science, and best practices to mitigate them. It also brings together for each topic technical resources that can be good entry points for interested organisations.

Last update: first half of 2023.

Evaluation framework to assess the maturity of an organisation

The evaluation is composed of the following 6 sections:


Section 1 - Protecting personal or confidential data and complying with regulatory requirements

[Data privacy and regulatory compliance]

The use of personal or confidential data carries the risk of exposure of such data, which can have very detrimental consequences for the producers, controllers or subjects of such data. Particularly in data science projects, they must therefore be protected and the risks of their leakage or exposure must be minimised. Additionally, AI models themselves can be attacked and must be protected. Finally, regulatory requirements specific to AI systems must be identified and known, and the data science activities of the organisation must be compliant.

[⇧ back to the list of sections]
[⇩ next section]


Q1.1 : Applicable legislation and contractual requirements - Identification
With regard to personal or confidential data, the legal, statutory, regulatory and contractual requirements in force and concerning your organisation are:

R1.1 :
(Type: single answer)
(Select one answer only, which best corresponds to the level of maturity of the organisation on this topic)

  • 1.1.a Not yet identified
  • 1.1.b Partially identified or in the process of identification
  • 1.1.c Identified
  • 1.1.d Identified and known by our collaborators
  • 1.1.e Identified, documented and known by our collaborators
Expl1.1 :

It is crucial to put in place processes to know and follow the evolution of applicable regulations (very specific in certain fields, for example in the banking sector), and to document the approaches and choices made to ensure compliance for each data science project. Interesting example(s): Welfare surveillance system violates human rights, Dutch court rules.

Resources1.1 :

Q1.2 : Applicable legislation and contractual requirements - Compliance approach
In order to meet these requirements, the approach adopted by your organisation is:

R1.2 :
(Type: single answer)
(Select one answer only, which best corresponds to the level of maturity of the organisation on this topic)

  • 1.2.a Informal, based on individual responsibility and competence
  • 1.2.b Formalized and accessible to all collaborators
  • 1.2.c Formalized and known by collaborators
  • 1.2.d Formalized, known by employees, documented for each processing of personal or confidential data
Expl1.2 :

The point is to examine how personal or confidential data are managed (storage, access, transfer, protection, responsibilities...) and to document the choices made.


Q1.3 : Applicable legislation and contractual requirements - Regulatory surveillance
Is a regulatory surveillance process in place, either internally or via a specialised service provider, to find out about applicable changes that have an impact on your organisation?

R1.3 :
(Type: single answer)
(Select one answer only, which best corresponds to the level of maturity of the organisation on this topic)

  • 1.3.a We do not really monitor the regulatory environment
  • 1.3.b We keep an informal watch, each employee sends back information via internal communication channels
  • 1.3.c We have a formal surveillance, with identified collaborators in charge and a documented process
Expl1.3 :

In addition to identifying regulations and compliance approaches, it is important to set up a surveillance process to know and follow the evolution of applicable regulations (which can be very specific in certain sectors). Interesting example(s): Welfare surveillance system violates human rights, Dutch court rules.

Resources1.3 :

Q1.4 : Applicable legislation and contractual requirements - Auditing and certification
Has the organisation's compliance with personal and confidential data requirements been audited and is it recognised by a certification, label or equivalent?

R1.4 :
(Type: single answer)
(Select one answer only, which best corresponds to the level of maturity of the organisation on this topic)

  • 1.4.a Yes
  • 1.4.b No
  • 1.4.c We are currently preparing an upcoming audit or certification of our organisation's compliance with personal and confidential data requirements
  • 1.4.d Not at the organization level, but it is the case for at least one project
Expl1.4 :

In many sectors there are specific compliance requirements. It is generally possible to formalise an organisation's compliance through certification or a specialised audit, or by obtaining a label (e.g. AFAQ "Protection des données personnelles", ISO 27701).


Q1.5 : Data minimisation principle
In data science projects, the data minimisation principle should guide the collection and use of personal or confidential data. How is it implemented in your organisation?

R1.5 :
(Type: single answer)
(Select one answer only, which best corresponds to the level of maturity of the organisation on this topic)
(Specific risk domain: use of personal or confidential data)

  • 1.5.a We take care not to use any personal or confidential data. We are not concerned by this risk area
  • 1.5.b We need to use personal or confidential data in certain projects and the data minimisation principle is then systematically applied
  • 1.5.c Employees are aware of the data minimisation principle and generally apply it
  • 1.5.d The "who can do the most can do the least" reflex with regard to data still exists here and there within our organisation. In some projects, we keep datasets that are much richer in personal and confidential data than what is strictly useful to the project
  • 1.5.e Employees are aware of the data minimisation principle, but it is not applied as a general standard. However, we pay particular attention to implementing personal data-related risk mitigation measures (e.g. pseudonymising some features with identifiers and a separate correspondence table, splitting datasets into multiple tables kept apart)
Expl1.5 :

The data minimisation principle is sometimes also referred to as "privacy by design". It is one of the pillars of the GDPR in the European Union.
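As an illustration of the risk mitigation measure mentioned in 1.5.e, the sketch below shows pseudonymisation with a separate correspondence table. It is a minimal example with hypothetical column names and data, not a complete anonymisation scheme:

```python
import uuid

import pandas as pd

def pseudonymise(df, column):
    # Build a correspondence table: one random pseudonym per distinct value.
    mapping = {value: uuid.uuid4().hex for value in df[column].unique()}
    out = df.copy()
    out[column] = out[column].map(mapping)
    # The correspondence table must be stored separately from the working
    # dataset, with stricter access controls.
    correspondence = pd.DataFrame(
        {"pseudonym": list(mapping.values()), column: list(mapping.keys())}
    )
    return out, correspondence

patients = pd.DataFrame({"name": ["Alice", "Bob", "Alice"],
                         "measurement": [1.2, 3.4, 5.6]})
pseudo, table = pseudonymise(patients, "name")
```

Note that pseudonymised data generally remains personal data under the GDPR, since re-identification is possible via the correspondence table.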


The following elements within this section apply only to organisations that did not select the first response to R1.5. Organisations not concerned are therefore invited to move on to Section 2.


Q1.6 : Project involving new processing of personal or confidential data
(Condition: R1.5 <> 1.5.a)
For each processing of personal or confidential data required in the framework of a data science project within your organisation:

R1.6 :
(Type: multiple responses possible)
(Select all the answer items that correspond to practices in your organisation)

  • 1.6.a We elaborate a Privacy Impact Assessment (PIA)
  • 1.6.b We implement data protection measures (in particular concerning the transfer, storage and access to the data concerned)
  • 1.6.c We contractualise relations with suppliers and customers and the responsibilities that arise from them
  • 1.6.d We have not yet set up an organised approach to these subjects
Expl1.6 :

The Privacy Impact Assessment (PIA) is a method for assessing the impact of a data processing, similar to traditional risk assessment methods. In certain cases, for example where a processing operation presents high risks to the rights and freedoms of natural persons, the GDPR makes it obligatory to carry out a PIA before the processing operation is carried out.


Q1.7 : Machine Learning security - Knowledge level
(Condition: R1.5 <> 1.5.a)
Machine Learning security (ML security) is a constantly evolving field. In some cases, AI models learned from confidential data may reveal elements of that confidential data (see articles cited in resources). Within your organisation, the general level of knowledge of collaborators working on data science projects about vulnerabilities related to ML models and the techniques to mitigate them is:

R1.7 :
(Type: single answer)
(Select one answer only, which best corresponds to the level of maturity of the organisation on this topic)

  • 1.7.a Complete beginner
  • 1.7.b Basic
  • 1.7.c Confirmed
  • 1.7.d Expert
Expl1.7 :

The state of the art in ML security is constantly evolving. While data scientists are now generally familiar with the membership inference attack (see proposed resources), new attacks are published regularly. While it is impossible to guard against all vulnerabilities at all times, it is crucial to be aware of them and to keep a watch on them. The article Demystifying the Membership Inference Attack is for example an interesting entry point in the context of sensitive data.

Resources1.7 :

Q1.8 : Machine Learning security - Implementation
(Condition: R1.5 <> 1.5.a)
Still on the subject of vulnerabilities related to ML models and techniques to mitigate them:

R1.8 :
(Type: multiple responses possible)
(Select all the answer items that correspond to practices in your organisation)

  • 1.8.a We keep a technical watch on the main attacks and measures to mitigate them
  • 1.8.b Employees receive regular information and training to help them develop their skills in this area
  • 1.8.c In some projects, we implement specific techniques to reduce the risks associated with the models we develop (for example: differential privacy, distillation, etc.)
  • 1.8.d On each project, the vulnerabilities that apply to it and the techniques implemented are documented (e.g. in the lifecycle documentation of each model, see Section 4 and Element 4.1 for more information on this concept)
  • 1.8.e We have not yet set up an organised approach to these subjects
Expl1.8 :

The state of the art in ML security is constantly evolving. While data scientists are now generally familiar with the membership inference attack (see proposed resources), new attacks are published regularly. While it is impossible to guard against all vulnerabilities at all times, it is crucial to be aware of them and to keep a watch on them. The article Demystifying the Membership Inference Attack is for example an interesting entry point in the context of sensitive data.

Depending on the level of risk and sensitivity of the projects, certain technical approaches to guard against them will be selected and implemented. It is important to follow the evolution of research and state-of-the-art practices, and to document the choices made, to constitute a model lifecycle documentation.

Resources1.8 :

Q1.9 : Notification of safety incidents to the regulatory authorities
(Condition: R1.5 <> 1.5.a)
In the event that a model the organisation has developed is used by or accessible to one or more external stakeholders, and a new vulnerability is published, that vulnerability may apply to the model and thus create a risk of exposure of personal or confidential data:

R1.9 :
(Type: single answer)
(Select one answer only, which best corresponds to the level of maturity of the organisation on this topic)

  • 1.9.a We have not yet put in place a procedure for such cases
  • 1.9.b We have a process describing the course of action in such cases
  • 1.9.c We have a process describing the course of action in such cases, which references the authorities to whom we must report
  • 1.9.d We have a process describing the course of action in such cases, which references the authorities to whom we must report, and which includes communication to the stakeholders of whom we have contact details
Expl1.9 :

In some sectors there are obligations to report safety incidents to the regulatory authorities (e.g. in France: CNIL, ANSSI, ARS, etc.). An interesting entry point: Notifications of safety incidents to regulatory authorities: how to organise and who to contact on the CNIL website.



Section 2 - Preventing bias, developing non-discriminatory models

[Biases and discrimination]

The use of AI models learned from historical data can be counterproductive when historical data are contaminated by problematic phenomena (e.g. quality of certain data points, non-comparable data, social phenomena undesirable due to the time period, etc.). A key challenge for responsible and trustworthy data science is to respect the principle of diversity, non-discrimination and equity (described for example in section 1.5 of the EU Ethics Guidelines for Trustworthy AI). It is therefore essential to question this risk and to study the nature of the data used, the conditions under which they were produced and collected, and what they represent. Among other things, in some cases a specification of the equity sought between populations must also be defined. The equity of a model can be defined in several ways that may be inconsistent with each other, and the interpretation of performance scores must therefore be made within the framework of one of these definitions.

[⇧ back to the list of sections]
[⇩ next section]


Q2.1 : Gathering and assembling data samples into training and validation datasets
Often an initial phase of data science projects consists in gathering and assembling data samples into training and validation datasets. In many cases this presents difficulties and is a source of risks. Regarding this particular activity, has your organisation defined, documented and operationalised an approach or method taking into account in particular the following:

R2.1 :
(Type: multiple responses possible)
(Select all the answer items that correspond to practices in your organisation)

  • 2.1.a We operate informally on this subject and rely on the practices of each collaborator involved
  • 2.1.b Our approach includes methods to prevent poisoning attacks when collecting and gathering data samples
  • 2.1.c Our approach includes methods to check and make sure when necessary that datasets include samples of rare events
  • 2.1.d Our approach includes methods to complete missing values in datasets
  • 2.1.e Our approach includes methods to handle erroneous or atypical data samples values
Expl2.1 :

Obtaining and preparing datasets is a core activity in every data science project. Each data point can have an impact on the learning, and it is thus crucial to define and implement a conscious, coherent, concerted approach to mitigate the risk of learning and testing on problematic datasets.

Resources2.1 :
  • (Technical guide) Tour of Data Sampling Methods for Imbalanced Classification
  • (Software & Tools) Pandas Profiling: creates HTML profiling reports from pandas DataFrame objects. The pandas df.describe() function is useful but too basic for extensive exploratory data analysis; pandas_profiling extends the pandas DataFrame with df.profile_report() for quick data analysis
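The practices listed in 2.1.c, 2.1.d and 2.1.e can be illustrated with a minimal pandas sketch. The data is a toy example, and median imputation plus naive random oversampling are only two of many possible strategies (see the resources above for more elaborate sampling methods):

```python
import pandas as pd

# Toy dataset with one missing value and a rare positive class.
df = pd.DataFrame({
    "feature": [1.0, 2.0, None, 4.0, 5.0, 6.0],
    "label":   [0,   0,   0,    0,   0,   1],
})

# Complete missing values with the column median (one possible strategy).
df["feature"] = df["feature"].fillna(df["feature"].median())

# Naive random oversampling of the rare class to rebalance the training set.
majority = df[df["label"] == 0]
minority = df[df["label"] == 1]
balanced = pd.concat(
    [df, minority.sample(n=len(majority) - len(minority),
                         replace=True, random_state=0)],
    ignore_index=True,
)
```

Whatever strategy is chosen, documenting it (which values were imputed, which samples were duplicated or reweighted) is what makes the resulting datasets auditable later.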

Q2.2 : Analysis of the training data
Within data science projects and when developing training datasets, reflection and research on problematic phenomena (e.g. quality of certain data points, data that are not comparable due to recording tools or processes, social phenomena that are undesirable due to time, context, etc.) can be crucial to prevent bias that undermines the principle of non-discrimination, diversity and equity. Your organisation:

R2.2 :
(Type: single answer)
(Select one answer only, which best corresponds to the level of maturity of the organisation on this topic)

  • 2.2.a Operates informally on this subject and relies on the practices of each collaborator involved
  • 2.2.b Does not have a documented approach to the subject, but the collaborators involved are trained on the risks and best practices on the subject
  • 2.2.c Has a documented approach that is systematically implemented
Expl2.2 :

The point is to make sure these questions are considered, and therefore to examine the training data, the way in which it was produced, etc. For example:

  • sensor or capture bias, e.g. if the sensors used to acquire and record data points are not identical throughout the capture process and lifecycle, or between controlled training data and real data;
  • paying special attention to data labels and annotations: how were they generated? What level of quality and reliability? Who are the authors of these annotations or labels? Labels have to be coherent with the modelling objectives and the intended domain of use of the model.
Resources2.2 :

Q2.3 : Evaluation of the risk of population bias and discrimination against certain social groups
In the context of data science projects, the nature of the project, the data used for the project and/or the thematic environment of the project can foster a risk of population bias against certain social groups (gender, origin, age, etc.). Evaluating first for each project if it is subject or not to such a risk seems key (in which case mitigation measures can be then contemplated). On that topic, your organisation:

R2.3 :
(Type: single answer)
(Select one answer only, which best corresponds to the level of maturity of the organisation on this topic)
(Specific risk domain: discrimination against certain social groups)

  • 2.3.a Operates informally and relies on the practices of each collaborator involved to evaluate if there is a risk
  • 2.3.b Does not have a documented approach to the subject, but the collaborators involved are knowledgeable and trained on the risks and best practices on the subject
  • 2.3.c Has a documented approach that is systematically implemented to evaluate this type of risk
Expl2.3 :

Configurations with risks of potential discriminations against social groups are particularly sensitive for the organisation and its counterparts. It requires special attention and the use of specific methodologies. In certain cases it is obvious if this risk has to be considered or not (e.g. projects on behavioral data on a population of users or customers, vs. projects on oceanographic or astronomical data), whereas in some cases it might be less obvious. It is therefore important to consider the question for each project.


Q2.4 : Preventing population bias and discrimination
(Condition: R2.3 <> 2.3.b)
In cases where the AI models your organisation develops are used in thematic environments where there is a risk of population bias or discrimination against certain social groups (gender, origin, age, etc.):

R2.4 :
(Type: multiple responses possible)
(Select all the answer items that correspond to practices in your organisation)
(Specific risk domain: discrimination against certain social groups)

  • 2.4.a We are not involved in cases where AI models are used in thematic environments with risks of population bias or discrimination against certain social groups (gender, origin, age, etc.) | (Concerned / Not concerned)
  • 2.4.b We pay special attention to the identification of protected attributes and their possible proxies (e.g. studying one by one the variables used as model inputs to identify the correlations they might have with sensitive data)
  • 2.4.c We carry out evaluations on test data from different sub-populations in order to identify possible problematic biases
  • 2.4.d We select and implement one or more justice and equity measure(s) (fairness metrics)
  • 2.4.e We use data augmentation or re-weighting approaches to reduce possible biases in the data sets
  • 2.4.f The above practices that we implement are duly documented and integrated into the model lifecycle documentation of the models concerned
  • 2.4.g We have not yet put in place any such measures
Expl2.4 :

It is a question of systematically questioning, for each data science project and according to the objective and target use of the model one wants to develop, the features that may directly or indirectly be the source of a risk of discriminatory population bias. The term "protected attribute" or "protected variable" is used to refer to attributes whose values define sub-populations at risk of discrimination. A note on the use of synthetic data, data augmentation or re-weighting approaches to reduce possible biases in datasets: when such techniques are used, it is important to make them explicit, otherwise there is a risk of losing information on how a model was developed.
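Practices 2.4.b, 2.4.c and 2.4.d can be sketched as follows. The data is a toy example; a simple correlation screen and the demographic parity difference are only one possible choice of proxy check and fairness metric among several, and the right choice depends on the project:

```python
import pandas as pd

applications = pd.DataFrame({
    "gender":   ["F", "M", "F", "M", "F", "M"],
    "zip_code": [75001, 93200, 75001, 93200, 93200, 75001],
    "income":   [40, 35, 42, 30, 28, 50],
    "approved": [1, 0, 1, 0, 0, 1],
})

# 1) Screen numerical features for association with the protected attribute
#    (categorical features would call for e.g. Cramer's V instead).
protected = (applications["gender"] == "F").astype(int)
proxy_screen = applications[["zip_code", "income"]].corrwith(protected).abs()

# 2) Demographic parity difference: the gap in positive-decision rates
#    between the sub-populations defined by the protected attribute.
rates = applications.groupby("gender")["approved"].mean()
dp_difference = abs(rates["F"] - rates["M"])
```

Different fairness metrics (demographic parity, equalised odds, predictive parity...) can be mutually incompatible, which is why the metric chosen, and why it was chosen, should be documented per project.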

Resources2.4 :

Q2.5 : Links between modelisation choices and bias
(Condition: R2.3 <> 2.3.b)
Recent work has shown the role that modeling and learning choices can play in the formation of discriminatory bias. Differential privacy, compression, the choice of the learning rate, early stopping mechanisms for example can have disproportionate impacts on certain subgroups. Within your organisation, the general level of knowledge of collaborators working on data science projects on this topic is:

R2.5 :
(Type: single answer)
(Select one answer only, which best corresponds to the level of maturity of the organisation on this topic)
(Specific risk domain: discrimination against certain social groups)

  • 2.5.a We are not involved in cases where AI models are used in thematic environments with risks of discrimination against certain social groups (gender, origin, age, etc.) | (Concerned / Not concerned)
  • 2.5.b Complete beginner
  • 2.5.c Basic
  • 2.5.d Confirmed
  • 2.5.e Expert
Expl2.5 :

While the datasets used to train and evaluate a model require particular attention to prevent discriminatory biases, recent work shows that modelling choices have to be taken into account too. The article "Moving beyond “algorithmic bias is a data problem”" suggested in the resources synthesizes very well how the learning algorithm, the model structure, the use or not of differential privacy, compression, etc. can have consequences on the fairness of a model. Extracts:

  • A key reason why model design choices amplify algorithmic bias is because notions of fairness often coincide with how underrepresented protected features are treated by the model
  • [...] design choices to optimize for either privacy guarantees or compression amplify the disparate impact between minority and majority data subgroups
  • [...] the impact of popular compression techniques like quantization and pruning on low-frequency protected attributes such as gender and age and finds that these subgroups are systematically and disproportionately impacted in order to preserve performance on the most frequent features
  • [...] learning rate and length of training can also disproportionately impact error rates on the long-tail of the dataset. Work on memorization properties of deep neural networks shows that challenging and underrepresented features are learnt later in the training process and that the learning rate impacts what is learnt. Thus, early stopping and similar hyper-parameter choices disproportionately and systematically impact a subset of the data distribution.

These topics require strong expertise and few practitioners are familiar with them yet. In the context of this evaluation element, the recommendation is to learn about them, become aware of the complex trade-offs they imply, consider them during concrete projects rather than setting them aside, and follow how the state of the art evolves and what best practices emerge.

Resources2.5 :


Section 3 - Assessing model performance rigorously

[Performance evaluation]

The performance of the models is crucial for their adoption in products, systems or processes. Performance evaluation must therefore be rigorous.

[⇧ back to the list of sections]
[⇩ next section]


Q3.1 : Separation of test datasets
In data science projects and when developing test datasets, it is of utmost importance to ensure non-contamination by training data. Your organisation:

R3.1 :
(Type: multiple responses possible)
(Select all response items that correspond to practices in your organisation. Please note that some combinations would not be coherent)

  • 3.1.a Operates informally on this subject and relies on the competence and responsibility of the collaborators involved
  • 3.1.b Has a documented and systematically implemented approach to isolating test datasets
  • 3.1.c Uses a tool for versioning and tracing the training and test datasets used, thus enabling the non-contamination of test data to be checked or audited at a later stage
  • 3.1.d The train-test split technical choices implemented are evaluated, documented and integrated into the model lifecycle documentation of the concerned models
Expl3.1 :

Ensuring that training and test datasets are kept separate is a principle known and mastered by most organisations. It can however be tricky in some particular configurations (e.g. continuous learning, privacy-preserving federated learning...).
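One possible way to make non-contamination auditable, in the spirit of 3.1.c, is to keep row-level fingerprints of each dataset version. The sketch below is deliberately simplified; real pipelines would typically rely on dedicated data-versioning tools rather than a hand-rolled hash:

```python
import hashlib

import pandas as pd

def fingerprints(df):
    # One stable hash per row; storing these alongside each dataset version
    # makes later non-contamination checks and audits possible without
    # keeping a copy of the raw data.
    rows = df.astype(str).agg("|".join, axis=1)
    return {hashlib.sha256(row.encode()).hexdigest() for row in rows}

train = pd.DataFrame({"x": [1, 2, 3], "y": [0, 1, 0]})
test = pd.DataFrame({"x": [4, 5], "y": [1, 0]})

# An empty intersection means no test row appears verbatim in the train set.
contaminated = not fingerprints(train).isdisjoint(fingerprints(test))
```

Exact-match hashing only catches verbatim duplicates; near-duplicates (e.g. the same record with slightly different formatting) require fuzzier checks.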


Q3.2 : Privacy-preserving distributed learning projects
In the case of data science projects based on distributed or federated learning on multiple datasets and whose confidentiality must be preserved:

R3.2 :
(Type: single answer)
(Select one answer only, which best corresponds to the level of maturity of the organisation on this topic)
(Specific risk domain: federated learning on sensitive data)

  • 3.2.a We do not participate in privacy-preserving distributed learning projects | (Concerned / Not concerned)
  • 3.2.b We master and implement approaches to develop test datasets in such a way that there is no cross-contamination between training and test data from different partners
  • 3.2.c At this stage we do not master the methods for developing test datasets in such a way that there is no cross-contamination between training and test data from the different partners
Expl3.2 :

In this type of distributed learning project under conditions where the data is kept confidential, the question arises of how to compose a test dataset, making sure that it does not also appear in the training dataset (e.g. at another partner's premises).
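A much-simplified sketch of a cross-contamination check between partners: each partner hashes its record identifiers locally with a shared salt, and only the hashes are exchanged. The identifiers and salt below are hypothetical, and production setups would rather use private set intersection protocols, since salted hashes of low-entropy identifiers remain vulnerable to brute-force guessing:

```python
import hashlib

def record_hashes(record_ids, salt):
    # Each partner hashes its record identifiers locally; only the hashes
    # are exchanged, never the raw confidential records.
    return {hashlib.sha256((salt + rid).encode()).hexdigest()
            for rid in record_ids}

SHARED_SALT = "shared-secret-salt"  # hypothetical value agreed between partners
partner_a_training = record_hashes(["patient-001", "patient-002"], SHARED_SALT)
partner_b_test = record_hashes(["patient-002", "patient-003"], SHARED_SALT)

# Any overlap flags candidate test records that also appear in a
# partner's training data.
cross_contamination = partner_a_training & partner_b_test
```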

Resources3.2 :

Q3.3 : Analysis of validation and test data
Within data science projects and when developing validation or test datasets, reflection and research on problematic phenomena (e.g. quality of certain data points, data that are not comparable due to recording tools or processes, social phenomena that are undesirable due to time, context, etc.) can be crucial for the meaning of performance scores. Your organisation:

R3.3 :
(Type: single answer)
(Select one answer only, which best corresponds to the level of maturity of the organisation on this topic)

  • 3.3.a Operates informally on this subject and relies on the practice of each collaborator involved
  • 3.3.b Does not have a documented approach to the subject, but the collaborators involved are trained on the risks and best practices on the subject
  • 3.3.c Has a documented approach that is systematically implemented
Expl3.3 :

The use of AI models that have been validated and tested on historical data can be counterproductive when the historical data in question is contaminated by problematic phenomena. It seems essential to question this risk and to study the nature of the data used, the conditions under which they were produced and assembled, and what they represent.


Q3.4 : Performance validation
Does your organisation implement the following approaches:

R3.4 :
(Type: multiple responses possible)
(Select all the answer items that correspond to practices in your organisation)

  • 3.4.a When developing a model, we choose the performance metric(s) prior to actually training the model, from among the most standard metrics possible
  • 3.4.b The implementation of robustness metrics is considered and evaluated for each modelling project, and applied by default in cases where the input data may be subject to fine-grain alterations (e.g. images, sounds)
  • 3.4.c The above practices that we implement are documented and integrated into the model lifecycle documentation of the concerned models, including the performance metrics chosen
  • 3.4.d We have not yet introduced any such measures
Expl3.4 :

On choosing metrics upstream, see for example the risk of p-hacking / data dredging. On robustness, an intuitive definition is that a model is robust when its performance is stable when the input data is disturbed. For more information see the technical resources indicated.
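A simple robustness proxy in the spirit of 3.4.b measures how stable predictions are under small input perturbations. The toy classifier and noise scale below are illustrative only; more rigorous evaluations would use adversarial perturbations rather than random noise:

```python
import numpy as np

def perturbation_stability(predict, X, noise_scale=0.01, n_trials=20, seed=0):
    # Fraction of predictions left unchanged by small Gaussian input noise:
    # a simple proxy for robustness to fine-grain alterations.
    rng = np.random.default_rng(seed)
    baseline = predict(X)
    per_trial = [
        np.mean(predict(X + rng.normal(scale=noise_scale, size=X.shape))
                == baseline)
        for _ in range(n_trials)
    ]
    return float(np.mean(per_trial))

# Toy classifier thresholding the first feature; two of the four points
# sit close to the decision threshold and may flip under noise.
predict = lambda X: (X[:, 0] > 0.5).astype(int)
X = np.array([[0.1], [0.9], [0.49], [0.51]])
score = perturbation_stability(predict, X)
```

A score close to 1 means predictions barely move under perturbation; points near the decision boundary are the ones that drag the score down.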

Resources3.4 :

Q3.5 : Monitoring model performance over time
In cases where AI models developed by your organisation are used in production systems:

R3.5 :
(Type: multiple responses possible)
(Select all response items that correspond to practices in your organisation. Please note that some combinations would not be coherent)
(Specific risk domain: use of AI models in production systems)

  • 3.5.a The models we develop are not used in production systems | (Concerned / Not concerned)
  • 3.5.b Performance is systematically re-evaluated when the model is updated
  • 3.5.c Performance is systematically re-evaluated when the context in which the model is used evolves, which may create a risk on the performance of the model due to the evolution of the input data space
  • 3.5.d The distribution of input data is monitored, and performance is regularly re-evaluated on the basis of updated test data
  • 3.5.e Random checks are carried out on predictions to check their consistency
  • 3.5.f We do not systematically set up this type of measure
Expl3.5 :

Even with a stable model, there is a risk that after a certain time the input data no longer matches the target distribution (population & distribution): for example, a variable that is no longer filled in at the same frequency as before by users of an information system. It is therefore necessary to regularly re-evaluate the performance of a model in its context of use. Monitoring the performance of models over time is also particularly important in cases of continuous learning, where there is a risk of model degeneration.
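Monitoring the distribution of input data, as in 3.5.d, can be sketched with the Population Stability Index (PSI), a metric commonly used for drift monitoring. The data below is synthetic, and the frequently quoted alert threshold of ~0.2 is a rule of thumb, not a universal constant:

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    # Bin edges come from the reference distribution's quantiles.
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    # Clip live data into the reference range so outliers land in edge bins.
    actual = np.clip(actual, edges[0], edges[-1])
    e = np.histogram(expected, bins=edges)[0] / len(expected)
    a = np.histogram(actual, bins=edges)[0] / len(actual)
    # Floor the proportions to avoid log(0) on empty bins.
    e, a = np.clip(e, 1e-6, None), np.clip(a, 1e-6, None)
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)  # input variable at training time
live = rng.normal(0.5, 1.0, 5000)       # same variable in production, shifted
drift_score = population_stability_index(reference, live)
```

In practice such a score would be computed per input variable at a regular cadence, with alerts feeding into the re-evaluation process described above.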

Resources3.5 :

Q3.6 : Decision making and ranges of indecision
For the definition of decision thresholds for models or automatic systems based on them, your organisation:

R3.6 :
(Type: multiple responses possible)
(Select all response items that correspond to practices in your organisation. Please note that some combinations would not be coherent)

  • 3.6.a Operates informally on this subject, depending upon the collaborators involved
  • 3.6.b Has a documented approach that is systematically implemented
  • 3.6.c Takes into account the possibility of maintaining ranges of indecision in certain cases
  • 3.6.d The choices made for each model and implemented are documented and integrated into the lifecycle documentation of the models concerned.
Expl3.6 :

The study and selection of relevant decision thresholds for a given data science problem (threshold selection) is linked to the metrics selected. As discussed in the resources section of this evaluation issue, in some cases it may be worth considering the possibility of defining ranges of indecision.
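The indecision range mentioned in 3.6.c can be sketched as a three-way decision rule; the thresholds below are illustrative and would in practice be chosen from the metrics and risk analysis of the project:

```python
def decide(score, lower=0.4, upper=0.6):
    # Scores inside the indecision range [lower, upper] are routed to
    # human review instead of being forced into a binary decision.
    if score < lower:
        return "reject"
    if score > upper:
        return "accept"
    return "human_review"

decisions = [decide(s) for s in (0.12, 0.55, 0.91)]
```

Widening the range trades automation rate for fewer low-confidence automatic decisions; that trade-off is worth documenting in the model lifecycle documentation.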

Resources3.6 :

Q3.7 : Audits by independent third parties and verifiable claims
When your organization communicates on the results or performance of an AI system, and makes it a marketing and communication argument to its stakeholders:

R3.7 :
(Type: single answer)
(Select one answer only, which best corresponds to the level of maturity of the organisation on this topic)
(Specific risk domain: external communication on the performance of AI systems)

  • 3.7.a We do not communicate or do not need to communicate on the results or the performance of our AI systems, or do not use the results or performance of our AI systems as an argument to our stakeholders, we are not concerned by this assessment element | (Concerned / Not concerned)
  • 3.7.b We communicate on the results or the performance of our AI systems and rely on them for our development without first having our work audited by an independent third party, without making evidence available
  • 3.7.c We have our work audited by an independent third party, or we make evidence available, before communicating our results and relying on them with our stakeholders
Expl3.7 :

Developing an AI model, and determining a meaningful and reliable benchmark performance measure, is a complex challenge. It is therefore often difficult for an organisation to assert that it has achieved excellent results and to claim them with certainty. Moreover, it can be even more difficult to make evidence publicly available without revealing valuable information about the organisation's intellectual property and the value of the work carried out. In such cases, it is recommended to have an audit carried out by an independent third party (e.g. on security, privacy, fairness, reliability), in order to secure the results the organisation wishes to claim.

Resources3.7 :


Section 4 - Ensuring model reproducibility and establishing the chain of accountability

[Model documentation]

An AI model is a complex object that can evolve over time. Tracing the stages of its development and evolution allows one to create a model lifecycle documentation, which is a prerequisite for reproducing or auditing a model. Furthermore, using automatic systems based on models whose rules have been "learned" (and not defined and formalised) questions the way organisations operate. It seems essential to guarantee a clear chain of responsibility, of natural or legal persons, for each model.

[⇧ back to the list of sections]
[⇩ next section]


Q4.1 : Lifecycle end-to-end documentation of ML models
Ensuring the traceability of all steps of the development of an AI model enables building up a model lifecycle documentation. Within your organisation, a lifecycle documentation of models is fed and maintained within the framework of data science projects, throughout the phases of data collection, design, training, validation and exploitation of the predictive models:

R4.1 :
(Type: single answer)
(Select one answer only, which best corresponds to the level of maturity of the organisation on this topic)

  • 4.1.a At this stage we have not implemented any such approach
  • 4.1.b This information exists and is recorded so as not to be lost, but it may be scattered and is not versioned
  • 4.1.c This information is compiled in a single document which systematically accompanies the model
  • 4.1.d This information is compiled in a single document which systematically accompanies the model and is versioned
Expl4.1 :

This concept of "model lifecycle documentation" of a learned AI model can take the form, for example, of a reference document containing all the important choices and the entire history of model development (data used, pre-processing carried out, type of learning and model architecture, hyperparameters selected, decision thresholds, test metrics, etc.), and the internal processes organising this activity. In particular, it is useful to include the trade-offs that have been made and why (e.g. precision-specificity, performance-privacy or performance-computing cost trade-offs).
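As an illustration, such a lifecycle record can be as simple as a versioned JSON document stored alongside the model artefact; dedicated tools (MLflow, DVC, etc.) cover this in a more complete way. All field names and values below are hypothetical:

```python
import hashlib
import json

# illustrative lifecycle record for a hypothetical model
lifecycle_record = {
    "model_name": "churn_classifier",
    "version": "1.3.0",
    "training_data": {"source": "crm_extract_2023q1", "rows": 125000},
    "preprocessing": ["drop_missing_values", "standard_scaling"],
    "architecture": "gradient_boosting",
    "hyperparameters": {"n_estimators": 200, "max_depth": 4},
    "decision_threshold": 0.62,
    "test_metrics": {"auc": 0.87, "precision": 0.81},
    "trade_offs": "simpler model chosen to ease interpretability",
    "created_at": "2023-04-12",
}

payload = json.dumps(lifecycle_record, sort_keys=True, indent=2)
# a content hash makes the record tamper-evident once stored with the model
record_id = hashlib.sha256(payload.encode()).hexdigest()[:12]
print(f"lifecycle record {record_id} ready to be versioned")
```

Committing this file to version control alongside the model artefact gives the versioning described in answer 4.1.d.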

Resources4.1 :
  • (Software & Tools) Substra Framework: an open source framework offering distributed orchestration of machine learning tasks among partners while guaranteeing secure and trustless traceability of all operations
  • (Software & Tools) MLflow: an open source platform to manage the ML lifecycle, including experimentation, reproducibility, deployment, and a central model registry
  • (Software & Tools) DVC: an Open-source Version Control System for Machine Learning Projects
  • (Software & Tools) DAGsHub: a platform for data version control and collaboration, based on DVC
  • (Software & Tools) Model lifecycle template: template for Data Scientists to help collect all the information in order to trace the lifecycle from end to end of a model, 2020, Joséphine Lecoq-Vallon
  • (Academic paper) System-Level Transparency of Machine Learning, 2022, Meta AI: System Cards aims to increase the transparency of ML systems by providing stakeholders with an overview of different components of an ML system, how these components interact, and how different pieces of data and protected information are used by the system

Q4.2 : Conditions and limitations for using a model
In the context of data science projects, the "conditions and limits of validity" of a model designed, trained and validated by the organisation:

R4.2 :
(Type: multiple responses possible)
(Select all response items that correspond to practices in your organisation. Please note that some combinations would not be coherent)

  • 4.2.a Are not systematically documented; this relies on the practices of each collaborator involved
  • 4.2.b Are systematically made explicit and documented
  • 4.2.c Are versioned
  • 4.2.d Contain a description of the risks involved in using the model outside its "conditions and limits of validity"
  • 4.2.e The documents presenting these "conditions and limits of validity" systematically accompany the models throughout their life cycle
Expl4.2 :

The aim is to make explicit, and attach to the model, a description of the context of use for which it was designed and in which its announced performance is meaningful. This concept of "conditions and limits of validity" can take the form of a synthetic document or a specific section in the model lifecycle documentation.
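One possible way to operationalise these "conditions and limits of validity" is to encode them as machine-checkable bounds verified before each prediction. The feature names and bounds below are purely illustrative:

```python
# illustrative validity domain declared alongside a hypothetical model
VALIDITY = {
    "age": (18, 75),        # model validated on adult customers only
    "income": (0, 200000),  # in euros per year
}

def check_validity(features):
    """Return the list of features outside the declared validity domain."""
    violations = []
    for name, (lo, hi) in VALIDITY.items():
        value = features.get(name)
        if value is None or not (lo <= value <= hi):
            violations.append(name)
    return violations

print(check_validity({"age": 34, "income": 42000}))  # in-domain request
print(check_validity({"age": 16, "income": 42000}))  # out-of-domain: age
```

Out-of-domain requests can then be refused, flagged or routed to a human, instead of silently returning a prediction the model was never validated for.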

Resources4.2 :
  • (Academic paper) Model Cards for Model Reporting, M. Mitchell, S. Wu, A. Zaldivar, P. Barnes, L. Vasserman, B. Hutchinson, E. Spitzer, I. D. Raji, T. Gebru, January 2019
  • (Web article) Model Cards from Google is an open and scalable framework, and offers 2 examples: To explore the possibilities of model cards in the real world, we've designed examples for two features of our Cloud Vision API, Face Detection and Object Detection. They provide simple overviews of both models' ideal forms of input, visualize some of their key limitations, and present basic performance metrics.
  • (Web article) Model Cards for AI Model Transparency, Salesforce: examples of Model Cards used and published by Salesforce
  • (Software & Tools) AI FactSheets 360, an IBM Research project to foster trust in AI by increasing transparency and enabling governance: Increased transparency provides information for AI consumers to better understand how an AI model or service was created. This allows a consumer of the model to determine if it is appropriate for their situation. AI Governance enables an enterprise to specify and enforce policies describing how an AI model or service should be constructed and deployed.

Q4.3 : Analysis and publications of incidents reports
In data science projects, when unexpected behaviour of a model is observed:

R4.3 :
(Type: multiple responses possible)
(Select all response items that correspond to practices in your organisation. Please note that some combinations would not be coherent)

  • 4.3.a At this stage we do not analyse the incidents or unexpected behaviour observed
  • 4.3.b We analyse incidents or unexpected behaviour encountered, but don't publish or share it
  • 4.3.c We analyse incidents or unexpected behaviour encountered and publish them when relevant (e.g. article, blog)
  • 4.3.d We get involved in clubs, networks or professional associations in the field of data science, and give feedback on incidents of unexpected behaviour that we observe
Expl4.3 :

Understanding or even mastering the behaviour of a learned AI model is a complex challenge. Lots of research is being done to develop methods and tools in this area, but much remains to be done. The sharing by practitioners of the unexpected incidents and behaviours they encounter contributes to the progress of the community.

Resources4.3 :

Q4.4 : Value chain and chain of accountability
In the case of data science projects where several actors, including actors internal to the organisation (teams, departments, subsidiaries), are involved throughout the value and accountability chains:

R4.4 :
(Type: multiple responses possible)
(Select all response items that correspond to practices in your organisation. Please note that some combinations would not be coherent)
(Specific risk domain: roles and responsibilities in data science projects are divided up among multiple actors)

  • 4.4.a Within our organisation, data science projects are carried out end-to-end by autonomous teams, including the elaboration of datasets and the exploitation of models for its own account. Consequently, for each project, an autonomous team is solely responsible | (Concerned / Not concerned)
  • 4.4.b We systematically identify the risks and responsibilities of each of the internal and external stakeholders with whom we work
  • 4.4.c We systematically enter into contracts with upstream (e.g. data suppliers) and downstream (e.g. customers, model-using partners) players
  • 4.4.d We do not systematically implement this type of measure
Expl4.4 :

It is important to ensure that organisations upstream and downstream of the chain identify and take responsibility for their segments of the value chain.


Q4.5 : Subcontracting of all or part of the data science activities
Data science activities subcontracted to a third party organisation(s) are subject to the same requirements your organisation applies to itself:

R4.5 :
(Type: single answer)
(Select one answer only, which best corresponds to the level of maturity of the organisation on this topic)
(Specific risk domain: subcontracting of data science activities)

  • 4.5.a Not concerned, we do not subcontract these activities | (Concerned / Not concerned)
  • 4.5.b Yes, our responses to this evaluation take into account the practices of our subcontractors
  • 4.5.c No, our answers to this evaluation do not apply to our subcontractors and on certain points they may be less advanced than us
Expl4.5 :

As in the reference frameworks for IS management (ISO 27001) or the GDPR in the European Union, it is important not to dilute responsibilities in uncontrolled subcontracting chains. This applies, for example, to consultants or freelancers who reinforce an internal team on a data science project. It is possible, for instance, to ask subcontractors to carry out the same evaluation on their own account and share their results with you.


Q4.6 : Distribution of the value creation
In the case of data science projects where several partners work alongside your organisation to develop a model, and that model is or will be the subject of an economic activity:

R4.6 :
(Type: multiple responses possible)
(Select all response items that correspond to practices in your organisation. Please note that some combinations would not be coherent)
(Specific risk domain: roles and responsibilities in data science projects are divided up among multiple actors)

  • 4.6.a Our organisation carries out its data science activities autonomously, including the development of datasets and the exploitation of models for its own account. It is therefore not concerned | (Concerned / Not concerned)
  • 4.6.b At this stage we have not structured this aspect of multi-partner data science projects
  • 4.6.c In these cases, we contract the economic aspect of the relationship with the stakeholders involved upstream of the project
  • 4.6.d Our organisation has a policy that responsibly frames the sharing of value with the stakeholders involved
Expl4.6 :

When several partners work together to develop a model, it is important that the distribution of value resulting from an economic activity in which the model plays a role is made explicit and contractualized. In some cases this can be a complex issue, for example when a model is trained in a distributed manner over several datasets.

Resources4.6 :


Section 5 - Using models responsibly and in confidence

[Using the models]

An AI model can be used as an automatic system, whose rules or criteria are not written in extenso and are difficult to explain, discuss or adjust. Using automatic systems based on AI models whose rules have been "learned" (and not defined and formalised) therefore questions the way organisations design and operate their products and services. It is important to preserve the responsiveness and resilience of organisations using those AI models, particularly in dealing with situations where AI models have led to an undesirable outcome for the organisation or its stakeholders. In addition, efforts are needed on the interpretation and explanation of the choices made using these systems.

[⇧ back to the list of sections]
[⇩ next section]


Q5.1 : Exploitation of AI models for one's own account
If your organisation uses AI models on its own behalf:

R5.1 :
(Type: multiple responses possible)
(Select all response items that correspond to practices in your organisation. Please note that some combinations would not be coherent)
(Specific risk domain: use of AI models, provision or operation of AI model-based applications for customers or third parties)

  • 5.1.a Our organisation does not use ML models on its own behalf | (Concerned / Not concerned)
  • 5.1.b A register of AI models identifies all the models used by the organisation and is kept up-to-date
  • 5.1.c For each model there is an owner defined, identifiable and easily contactable
  • 5.1.d For each model, we systematically carry out a risk assessment following any incidents, failures or biases
  • 5.1.e Monitoring tools are put in place to ensure continuous monitoring of systems based on AI models and can trigger alerts directly to the team in charge
  • 5.1.f For each model, we define and test a procedure for suspending the model and a degraded operating mode without the model, in order to prepare for the case where the model is subject to failure or unexpected behaviour
  • 5.1.g For each model, we study its entire lifecycle (all the steps and choices that led to its development and evaluation), as well as its conditions and limits of validity, in order to understand the model before using it
  • 5.1.h We always use the models for uses in accordance with their conditions and limits of validity
  • 5.1.i We have not yet put in place such measures
Expl5.1 :

Using automatic systems based on models whose rules have been "learned" (and not defined and formalised) questions the way organisations design and operate their products and services. It is important to assess the consequences and reactions in the event of an incident. Furthermore, it is important that the persons in charge are clearly identified so that no stakeholder is left helpless in the face of an unexpected or inappropriate consequence. Finally, it is important to consider the "conditions and limits of validity" of the models used to ensure that the intended use is appropriate.


Q5.2 : Development of AI models on behalf of third parties
If your organisation provides or operates AI model-based applications to customers or third parties:

R5.2 :
(Type: multiple responses possible)
(Select all response items that correspond to practices in your organisation. Please note that some combinations would not be coherent)
(Specific risk domain: use of AI models, provision or operation of AI model-based applications for customers or third parties)

  • 5.2.a Our organisation neither provides its customers or third parties with applications based on ML models, nor operates such applications on behalf of third parties | (Concerned / Not concerned)
  • 5.2.b A register of AI models identifies all models or applications used by its customers and/or by the organisation on behalf of third parties, and is kept up-to-date
  • 5.2.c For each model or application for a customer or a third party we have a defined, identifiable and easily reachable owner
  • 5.2.d For each model or application for a customer or a third party, we systematically carry out a risk assessment resulting from possible incidents, failures, biases, etc., in order to identify the risks involved
  • 5.2.e Monitoring tools are in place to ensure continuous monitoring of ML systems and can trigger alerts directly to the responsible team
  • 5.2.f For each model or application for a customer or a third party, we define and test a procedure for suspending the model and a degraded operating mode without the model, in order to prepare for the case where the model is subject to failure or unexpected behaviour
  • 5.2.g For each model or application for a client or third party, we study its entire lifecycle and its conditions and limits of validity to understand the model before using it
  • 5.2.h We supply our customers or operate on their behalf with models or applications for uses in accordance with their conditions and limits of validity
  • 5.2.i We have not yet put in place such measures
Expl5.2 :

Using automatic systems based on models whose rules have been "learned" (and not defined and formalised) questions the way organisations design and operate their products and services. It is important to assess the consequences and reactions in the event of an incident. Furthermore, it is important that the persons in charge are clearly identified so that no stakeholder is left helpless in the face of an unexpected or inappropriate consequence. Finally, it is important to consider the "conditions and limits of validity" of the models used to ensure that the intended use is appropriate.


Q5.3 : Management of problematic predictions, bypass process, human agency
Automatic systems, especially when based on AI models, are generally used in production to gain efficiency. By nature, they occasionally generate undesirable results for the organisation and its stakeholders (e.g. a wrong prediction), as they will never achieve 100% performance.

R5.3 :
(Type: single answer)
(Select one answer only, which best corresponds to the level of maturity of the organisation on this topic)
(Specific risk domain: use of AI models, provision or operation of AI model-based applications for customers or third parties)

  • 5.3.a Our organisation does not use AI models on its own behalf or on behalf of its clients, and does not provide its clients with applications based on AI models | (Concerned / Not concerned)
  • 5.3.b We implement AI models in integrated automatic systems, without mechanisms to overcome or avoid undesirable results due to model predictions
  • 5.3.c We integrate, in automatic systems based on AI models, the functionalities to manage these cases of undesirable results. For such cases, we set up mechanisms allowing a human operator to go against an automatic decision to manage such undesirable results or incidents
  • 5.3.d In addition to incident management mechanisms, in automatic systems based on AI models, a human operator is called upon when the confidence interval for the automatic decision is not satisfactory
  • 5.3.e We systematically apply the principle of "human agency", the outputs of the AI models that we implement are used by human operators, and do not serve as determinants for automatic decisions
Expl5.3 :

Using automatic systems based on models whose rules have been "learned" (and not defined and formalised) questions the way organisations design and operate their products and services. It is important to preserve the responsiveness and resilience of the organisation.

Resources5.3 :

Q5.4 : Explainability and interpretability
Within data science projects aiming at developing AI models:

R5.4 :
(Type: multiple responses possible)
(Select all response items that correspond to practices in your organisation. Please note that some combinations would not be coherent)

  • 5.4.a Our organisation is not yet familiar with the methods and tools for explaining and interpreting AI models
  • 5.4.b We are interested in the explainability and interpretability of AI models and are in dialogue with our stakeholders on this subject
  • 5.4.c We ensure that the models we develop provide, when relevant, at least a confidence level together with each prediction made
  • 5.4.d We determine the best trade-offs between performance and interpretability for each model we develop, which sometimes leads us to opt for a model that is simpler to explain to stakeholders
  • 5.4.e We master and implement advanced approaches for the explainability and interpretability of models
Expl5.4 :

Explainability and interpretability are key issues, in line with the growing demands for transparency, impartiality and accountability. In some cases, regulations even require it. Technical resources such as SHAP or LIME provide a first introduction to the topic (see resources associated with this assessment element).
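As a dependency-free illustration of the idea behind such tools, the sketch below estimates each feature's local contribution as the change in the model output when that feature is replaced by a baseline value (an occlusion-style attribution); the linear scoring function and feature names are made up for the example:

```python
def model(x):
    """Hypothetical scoring function standing in for a trained model."""
    return 0.5 * x["age"] + 0.3 * x["income"] - 0.2 * x["debt"]

def attribute(x, baseline):
    """Local contribution of each feature: output change when occluded."""
    contributions = {}
    for name in x:
        perturbed = dict(x)
        perturbed[name] = baseline[name]  # replace one feature by baseline
        contributions[name] = model(x) - model(perturbed)
    return contributions

x = {"age": 0.8, "income": 0.6, "debt": 0.4}          # normalised inputs
baseline = {"age": 0.0, "income": 0.0, "debt": 0.0}   # reference point
for name, c in attribute(x, baseline).items():
    print(f"{name}: {c:+.2f}")
```

Libraries such as SHAP generalise this intuition with sound theoretical foundations (Shapley values) and support for non-linear models.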

Resources5.4 :

Q5.5 : Transparency towards stakeholders interacting with an AI model
Your organisation uses for its own account, provides to its customers, or operates on behalf of its customers applications based on AI models with which users can interact. What measures does it implement to inform users?

R5.5 :
(Type: multiple responses possible)
(Select all response items that correspond to practices in your organisation. Please note that some combinations would not be coherent)
(Specific risk domain: use of AI models, provision or operation of AI model-based applications for customers or third parties)

  • 5.5.a Our organisation does not use AI models on its own behalf or on behalf of its clients, and does not provide its clients with applications based on AI models | (Concerned / Not concerned)
  • 5.5.b Users are not informed that they are interacting with an AI model developed with machine learning methods
  • 5.5.c An information notice is made available in the terms and conditions of the system or an equivalent document, freely accessible
  • 5.5.d The system or service makes it explicit to the user that an AI model is being used
  • 5.5.e The system or service provides the user with additional information on the results it would have provided in slightly different scenarios (e.g. "counterfactual explanations" such as the smallest change in input data that would have resulted in a given different output)
  • 5.5.f We are pioneers in using public AI registers, enabling us to provide transparency to our stakeholders and to capture user feedback
Expl5.5 :

Using automatic systems based on models whose rules have been "learned" (and not defined and formalised) questions the functioning of organisations but also the relationship of users to digital systems and services. In most cases it is important to inform users that they are not interacting with conventional business rules.

Resources5.5 :

Q5.6 : Logging predictions from AI models
If your organisation provides or operates AI model-based applications to customers or third parties, to enable auditability of such applications and facilitate their continuous improvement, it is key to implement predictions logging. On that topic:

R5.6 :
(Type: single answer)
(Select one answer only, which best corresponds to the level of maturity of the organisation on this topic)
(Specific risk domain: use of AI models, provision or operation of AI model-based applications for customers or third parties)

  • 5.6.a Our organisation does not use AI models on its own behalf or on behalf of its clients, and does not provide its clients with applications based on AI models | (Concerned / Not concerned)
  • 5.6.b Logging predictions from AI models used in production is not yet systematically implemented
  • 5.6.c We systematically log all predictions from AI models used in production (coupled with the input data and the associated models references)
Expl5.6 :

Using automatic systems based on AI models whose rules have been learned questions the way organisations design and operate their products and services. It is important to preserve the responsiveness and resilience of organisations using those AI models, particularly in dealing with situations where AI models have led to an undesirable outcome for the organisation or its stakeholders. To that end, logging predictions from AI models used in production (coupled with the input data and the associated model references) is key to enable ex-post auditability on concrete use cases. It should be noted that predictions might involve personal data and be regulated by the GDPR. Anonymisation of processed data, when logged and made available to customers or internal operators, could be part of a solution to avoid leaking sensitive information.
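A minimal sketch of such prediction logging might look as follows; in a real system the records would go to an append-only store, and personal data in the inputs may first need to be anonymised. All names and values are illustrative:

```python
import datetime
import json

PREDICTION_LOG = []  # stands in for an append-only log store

def log_prediction(model_ref, inputs, output):
    """Record a prediction together with its inputs and model reference."""
    PREDICTION_LOG.append({
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "model_ref": model_ref,  # e.g. model name + version + artefact hash
        "inputs": inputs,        # may require anonymisation (GDPR)
        "output": output,
    })

log_prediction("churn_classifier:1.3.0", {"age": 34, "income": 42000}, 0.71)
print(json.dumps(PREDICTION_LOG[-1], indent=2))
```

Coupling each record with the model reference is what makes it possible, ex post, to replay a disputed prediction against the exact model version that produced it.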



Section 6 - Anticipating, monitoring and minimising the negative externalities of data science activities

[Negative externalities]

The implementation of an automatic system based on an AI model can generate negative social and environmental externalities. Awareness of this is essential, as well as anticipating, monitoring and minimising the various negative impacts.

[⇧ back to the list of sections]


Q6.1 : Environmental impact (energy consumption and carbon footprint)
About the environmental impact of the data science activity in your organisation:

R6.1 :
(Type: multiple responses possible)
(Select all the answer items that correspond to practices in your organisation)

  • 6.1.a At this stage we have not studied specifically the environmental impact of our data science activity or our AI models
  • 6.1.b We have developed indicators that define what we want to measure regarding the energy consumption and the carbon footprint of our data science activity or our models
  • 6.1.c We measure our indicators regularly
  • 6.1.d We include their measurements in the model identity cards
  • 6.1.e Monitoring our indicators on a regular basis is a formalised and controlled process, from which we define and drive improvement objectives
  • 6.1.f We consolidate an aggregated view of the energy consumption and carbon footprint of our data science activities
  • 6.1.g This aggregated view is taken into account in the global environmental impact evaluation of our organization (e.g. carbon footprint, regulatory GHG evaluation, Paris Agreement compatibility score...)
  • 6.1.h The energy consumption and carbon footprint of our data science activity or our models is made transparent to our counterparts and the general public
Expl6.1 :

It is important to question and raise awareness of environmental costs. In particular, one can: (i) measure the environmental cost of data science projects, (ii) publish their environmental impact transparently, making explicit the split between training and production phases, (iii) improve on these indicators by working on different levers (e.g. infrastructure, model architecture, transfer learning, etc.). It has been demonstrated that such choices can impact the carbon footprint of model training by a factor of 100 to 1000 (see resources below).
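As an illustration of how such an indicator can be estimated, the sketch below derives a training run's carbon footprint from measured energy use and grid carbon intensity. All figures are assumptions for the example; tools such as CodeCarbon automate this kind of measurement:

```python
# illustrative figures for a hypothetical training run
gpu_power_kw = 0.3        # average power draw of the training hardware, kW
training_hours = 48       # wall-clock duration of the run
pue = 1.5                 # datacentre Power Usage Effectiveness (assumption)
carbon_intensity = 0.06   # kgCO2e per kWh (low-carbon grid, assumption)

energy_kwh = gpu_power_kw * training_hours * pue
footprint_kg = energy_kwh * carbon_intensity
print(f"{energy_kwh:.1f} kWh, {footprint_kg:.2f} kgCO2e")
```

The same computation applied to the production (inference) phase allows the split mentioned above to be made explicit, and the per-model figures can feed the aggregated view of answer 6.1.f.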

Resources6.1 :

Q6.2 : Social impact
In some cases, the implementation of an automatic system based on an AI model can generate negative externalities on upstream stakeholders (e.g. annotation of data) and on downstream stakeholders (e.g. automation of certain positions). Whenever you plan to develop or use an AI model:

R6.2 :
(Type: single answer)
(Select one answer only, which best corresponds to the level of maturity of the organisation on this topic)

  • 6.2.a At this stage we are not looking at the social impact of our data science activity or our AI models
  • 6.2.b In some cases we study the social impact
  • 6.2.c We study the social impact in each project
  • 6.2.d We study the social impact in each project and it is documented in the lifecycle documentation of each model
  • 6.2.e We study the social impact in each project, it is documented in the lifecycle documentation of each model, and we systematically engage in a dialogue with the relevant stakeholders upstream and downstream the value chain.
Expl6.2 :

It is important for an organisation to question and exchange with its stakeholders. This applies both downstream (e.g. automation of certain jobs) and upstream (e.g. data annotation tasks that can be psychologically very harsh) of the value chain.


Q6.3 : Ethics and non-maleficence
Within your organisation:

R6.3 :
(Type: multiple responses possible)
(Select all response items that correspond to practices in your organisation. Please note that some combinations would not be coherent)

  • 6.3.a At this stage we have not yet addressed the ethical dimension of our data science projects and activities
  • 6.3.b We are studying the ethical dimension of our data science projects and activities, it is a work in progress
  • 6.3.c Employees involved in data science activities receive training in ethics
  • 6.3.d Our organisation has adopted an ethics policy
  • 6.3.e For projects justifying it, we set up an independent ethics committee or ask for the evaluation of an organisation validating the ethics of the projects
Expl6.3 :

Working with large volumes of data, some of which may be sensitive, and using automatic systems based on models whose rules have been "learned" (and not defined and formalised) raises questions about the way organisations function and the individual responsibility of each contributor. It requires carefully considering the uses of such systems. It is therefore important for the organisation to ensure that ethical issues are not unknown to its collaborators. A recurring example is that some AI systems or services designed to adapt to user behaviour may influence users (e.g. by seeking to maximise their time of use or the money they spend) and present significant risks of manipulation or addiction.

Resources6.3 :