Data Management for Data Protection (GDPR)

Data protection regulations, such as GDPR, are fundamentally about data management

The GDPR Capability Model developed by the CC CDQ provides an action-oriented view of the capabilities that need to be built up and established in order to comply with GDPR's set of requirements. The model also assists data managers in identifying (re)design areas and, with the help of a maturity assessment model, in monitoring progress.

One of the key requirements of data protection is for an organization to achieve a higher level of data transparency throughout the entire data lifecycle. In early stages, organizations must precisely state what data they collect and why, and make sure that they have the necessary authorizations to do so. As they process data, they must be able to provide proof that they do so according to what they stated in the beginning and they must be in a position to communicate all relevant data records to individuals. In the final stages, they must delete personal data, and cope with various retention requirements.

These obligations apply differently at different data processing steps. The research team of the Competence Center Corporate Data Quality (CC CDQ) therefore set out to provide a data-centric view of data protection regulatory requirements by analyzing their implications for the data lifecycle.

Data Lifecycle Management Regarding Data Protection

As a general rule, data lifecycle models depict the entire lifespan of data within an organization and can be broken down into three major steps:

  • The first one, which we call onboarding, covers all activities related to bringing data objects into an organization’s architecture. It contains data collection steps, from the sourcing of data objects to their deployment in data management systems.
  • The second one, which we call data usage, covers all activities related to the usage of data within an organization. It typically includes steps such as managing data quality, updating data objects and putting them to use through business processes.
  • The last step, which we call end-of-life, contains all activities concerning the phasing out of data objects, such as archiving and deletion.

CC CDQ reference data lifecycle model for data protection

The following table shows the impact of main data protection rights and requirements throughout the data lifecycle:

Requirement                   Impacted Data Lifecycle Phase(s)
                              Onboarding   Usage   End-of-Life
Informational duties          X            X       -
Right of access               -            X       -
Right of deletion             -            -       X
Right of rectification        -            X       -
Right of consent              X            X       -
Documentation requirements    -            X       X
Authorization requirements    -            X       -

Research News

Data lifecycle management for data protection is a key topic in CC CDQ's research activities in 2019, and a model that addresses the aforementioned requirements is currently under development. Its first iteration was presented to and evaluated by CC CDQ members during our workshop in Hamburg in May 2019. It is currently being finalized and augmented with (meta)data model extensions.

During the CC CDQ workshop in June 2019 in Zürich, we will hold a session dedicated to the topics of data identification and classification, and will further investigate data model extensions for personal data management.

At the time of data collection, GDPR introduces informational duties that organizations should fulfill. This means that organizations should clearly explain how they intend to use the collected data and ask individuals for consent if the planned processing activities require it.

This has three main implications:

  • Data usage policies must be established before data is collected (purpose of processing) and consent items should be defined beforehand (Define data requirements).
  • These defined purposes and consent items must be presented to the individual in written form at the time of data collection (Source data components).
  • The individual's agreement as well as their answers to consent items must be recorded, ideally as separate data attributes (Create data).

Meeting these three requirements forces organizations to rethink their data collection practices: they must carefully define in advance what data will be collected and why, and plan for how they will demonstrate that processing is compliant. All information that relates to data processing compliance must be captured in order to fulfill accountability requirements. Thus, data models should be amended with new attributes containing compliance information, such as bases for processing, purposes, consent items and duration of processing. Furthermore, the purpose of processing as well as potential consent items must be translated into authorization mechanisms, so that collected data is used in subsequent business processes as agreed upon by the individual (Create data).
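As a minimal sketch of how such compliance attributes could sit next to the data itself, the Python snippet below models a customer record with a basis for processing, purposes, consent answers and a processing end date, plus a purpose check that acts as a simple authorization gate. All names (`CustomerRecord`, `may_process`, the purpose strings) are hypothetical and not part of the CC CDQ model; the rule that a purpose listed as a consent item needs an explicit positive answer is an assumption made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional, Set


@dataclass
class CustomerRecord:
    # Business attributes
    customer_id: str
    name: str
    email: str
    # Compliance attributes (hypothetical names)
    legal_basis: str                                   # e.g. "contract" or "consent"
    purposes: Set[str] = field(default_factory=set)    # purposes stated at collection
    consent_items: Dict[str, bool] = field(default_factory=dict)
    processing_end: Optional[date] = None              # None means no fixed end date


def may_process(record: CustomerRecord, purpose: str, today: date) -> bool:
    """Return True only if the purpose was stated at collection time,
    the processing period has not ended, and any consent item tied to
    the purpose was answered positively."""
    if purpose not in record.purposes:
        return False
    if record.processing_end is not None and today > record.processing_end:
        return False
    # Assumption: a purpose that appears as a consent item needs an explicit "yes".
    if purpose in record.consent_items:
        return record.consent_items[purpose]
    return True


record = CustomerRecord(
    customer_id="C-001",
    name="Jane Doe",
    email="jane@example.com",
    legal_basis="contract",
    purposes={"order_fulfilment", "newsletter"},
    consent_items={"newsletter": False},
    processing_end=date(2025, 12, 31),
)
```

A business process would call `may_process(record, "newsletter", date.today())` before touching the record, so that the answers captured at collection time are enforced rather than merely stored.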

Once data objects are created, they must be deployed on relevant systems within an organization (Deploy data). Depending on the operating model, previously defined authorizations can impact the deployment. For instance, if processing purposes and consent items imply that only one business unit should process the data, then it should only be deployed on this unit's specific systems. If the organization operates a central data store, access should only be given to appropriate users and systems, e.g., in case of automated analytics scenarios (Use data).

During the time of processing, GDPR grants several data-related rights to individuals, which introduce new activities into the data lifecycle:

  • Right of access (Disclose data): At any time, individuals can request a complete extraction of their data records. This means that all storage instances and locations must be known and accessible, so that their contents can be communicated to individuals.
  • Right of rectification (Change data): At any time, possibly following a right of access inquiry, individuals can request that changes be made to their data records. While existing lifecycle models usually consider data updates as an internal activity, it is imperative to include external triggers for updating data in order to comply with data protection regulations.
  • Right of portability (Disclose data): At any time, individuals can request to transfer their data to a third party. In this case, organizations may transmit data records to the individual in a machine-readable format – this response would be similar to that of an access request. In other cases, organizations may be required to transfer data records directly to the third-party recipient, in which case a dedicated process should be set up.

Here, the challenges are threefold. First, in order to operationalize these rights, organizations must gain a precise overview of where personal data objects are stored within their system landscape, and be in a position to pinpoint all storage instances related to one specific individual. This could be achieved, for instance, by recording storage systems as attributes in personal data objects, as well as by indicating a leading one (if applicable). Second, companies must establish channels to communicate with their users/customers about their personal data and to let them exercise their rights. Third, in cases where data portability must occur directly between data processors, organizations must set up data transmission standards and channels for third-party communication.
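The idea of recording storage systems as attributes, with one of them flagged as leading, can be sketched as follows. This is an illustration only: the `LOCATIONS` index, the in-memory `SYSTEMS` stand-in and the function `build_access_export` are hypothetical names, and JSON is assumed here as one possible machine-readable format.

```python
import json
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class StorageLocation:
    system: str        # name of the system holding a copy
    is_leading: bool   # True for the leading system, if one exists


# Hypothetical index: individual ID -> systems that store their data
LOCATIONS: Dict[str, List[StorageLocation]] = {
    "P-42": [
        StorageLocation("crm", is_leading=True),
        StorageLocation("newsletter_tool", is_leading=False),
    ],
}

# Hypothetical stand-in for the systems themselves
SYSTEMS: Dict[str, Dict[str, dict]] = {
    "crm": {"P-42": {"name": "Jane Doe", "email": "jane@example.com"}},
    "newsletter_tool": {"P-42": {"email": "jane@example.com", "subscribed": True}},
}


def build_access_export(person_id: str) -> str:
    """Collect every stored copy for one individual and return it as JSON,
    which could serve both access and portability requests."""
    export = {"person_id": person_id, "records": []}
    for loc in LOCATIONS.get(person_id, []):
        data = SYSTEMS[loc.system].get(person_id)
        if data is not None:
            export["records"].append(
                {"system": loc.system, "leading": loc.is_leading, "data": data}
            )
    return json.dumps(export, indent=2, sort_keys=True)
```

Because the index is keyed by individual, the same lookup could also drive rectification: a change request would be applied to the leading system first and then propagated to the other listed systems.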

As a rule, GDPR requires that data be deleted once the processing purposes associated with it have ended. In practice, this translates into one of the following situations:

  • If the data processing agreement contained an expiration date, data must be deleted once said date has been reached. This can be recorded as an attribute at the time of collection, if applicable.
  • In certain cases, deletion will be triggered by a deletion request from the individual, which is one of GDPR’s data-related rights.
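Both triggers can be expressed as a simple check over recorded attributes. The sketch below assumes a hypothetical `ProcessingAgreement` object that carries an optional expiration date and a flag set when the individual requests deletion; neither name comes from the CC CDQ model.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ProcessingAgreement:
    person_id: str
    expires_on: Optional[date] = None   # recorded at collection, if applicable
    deletion_requested: bool = False    # set when the individual asks for deletion


def due_for_deletion(agreements: List[ProcessingAgreement], today: date) -> List[str]:
    """Return the IDs whose data should enter the end-of-life phase."""
    due = []
    for agreement in agreements:
        expired = agreement.expires_on is not None and today >= agreement.expires_on
        if expired or agreement.deletion_requested:
            due.append(agreement.person_id)
    return due
```

A scheduled job could run this check daily and hand the resulting IDs to the deletion process.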

Fulfilling the right of deletion can be achieved by deleting the related data objects or by stripping them of attributes that enable the isolation of one individual, e.g., name, birth date, address. Data that cannot be associated with a single individual (directly or indirectly) falls outside the scope of data protection regulations. This second option is commonly referred to as anonymization. It should not be confused with pseudonymization in the GDPR sense, where data can still be attributed to an individual with the help of separately kept additional information and therefore remains personal data.
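The attribute-stripping option could look like the sketch below. The set `IDENTIFYING_ATTRIBUTES` is an assumption for illustration; in practice, which attributes allow an individual to be singled out (including indirectly, through combinations of the remaining attributes) has to be assessed per data object.

```python
from typing import Any, Dict

# Assumption: attributes treated as directly identifying in this sketch.
IDENTIFYING_ATTRIBUTES = {"name", "birth_date", "address"}


def strip_identifiers(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record without directly identifying attributes.
    The input record is left unchanged."""
    return {k: v for k, v in record.items() if k not in IDENTIFYING_ATTRIBUTES}
```

For example, `strip_identifiers({"name": "Jane Doe", "birth_date": "1990-01-01", "segment": "B2C"})` keeps only the `segment` attribute.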

The data end-of-life activities pose significant challenges for two main reasons:

  • Although an individual may request deletion of their data, an organization may be required to preserve it in compliance with other legal requirements (Archive data).
  • While deletion of in-production data objects may be achieved relatively easily, the same cannot be said for data that is contained in archives or backups.

Therefore, in order to facilitate compliance of end-of-life activities, organizations should define a clear retention policy by reviewing all data retention requirements applicable to personal data objects and recording them as attributes. Another possibility would be to define data classifications based on retention rules and to record these classifications as per-attribute metadata.
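A retention classification of this kind can be sketched as a lookup from class to minimum retention period, which then decides whether a deletion request leads to deletion or to archiving. The class names and year values in `RETENTION_YEARS` are placeholders, not legal guidance, and `DataObject` and `handle_deletion_request` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict

# Placeholder retention classes: class name -> minimum years to keep
RETENTION_YEARS: Dict[str, int] = {"invoice": 10, "marketing": 0}


@dataclass
class DataObject:
    object_id: str
    retention_class: str
    created_on: date


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def earliest_deletion_date(obj: DataObject) -> date:
    return add_years(obj.created_on, RETENTION_YEARS[obj.retention_class])


def handle_deletion_request(obj: DataObject, today: date) -> str:
    """Return "delete" if no retention rule blocks deletion, else "archive"."""
    return "delete" if today >= earliest_deletion_date(obj) else "archive"
```

Recording the class on the object (or on individual attributes) keeps the decision rule in one place, so a change in retention requirements only touches the lookup table.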

When it comes to backups, organizations can think in terms of the “beyond use” criterion. If backups are stored on magnetic tapes, the data cannot be used in production, and the argument could be made that organizations have a legitimate interest in operating with such a backup policy. When organizations process data deletion requests, they should keep a record containing a unique identifier along with a description of what types of data objects and attributes were deleted. In this way, if a backup is restored, the unique identifier can help identify data that has since been deleted.
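The deletion record described here can be sketched as a small log that is replayed after a restore. The names `DeletionLogEntry` and `reapply_deletions` are hypothetical, and restored data is represented as a plain dictionary keyed by the unique identifier for the sake of brevity.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass(frozen=True)
class DeletionLogEntry:
    person_id: str                       # unique identifier only, no personal data
    deleted_attributes: Tuple[str, ...]  # which attributes were removed


def reapply_deletions(
    restored: Dict[str, Dict[str, Any]], log: List[DeletionLogEntry]
) -> Dict[str, Dict[str, Any]]:
    """Re-apply logged deletions to records restored from a backup.
    The restored dictionary is modified in place and returned."""
    for entry in log:
        record = restored.get(entry.person_id)
        if record is None:
            continue  # this individual is not part of the restored backup
        for attribute in entry.deleted_attributes:
            record.pop(attribute, None)
        if not record:
            del restored[entry.person_id]  # nothing left, so drop the object
    return restored
```

The log itself holds only identifiers and attribute names, so keeping it does not reintroduce the personal data that was deleted.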

If, however, organizations use cloud-based backup systems, more elaborate measures need to be put in place, e.g., pseudonymization functions for data that has been deleted.

The GDPR Capability Model

GDPR requires organizations to consistently document the following:

  • Their customers' personal data that they hold (i.e. the scope – e.g. list of recorded attributes),
  • How the data was acquired (i.e. the origin – e.g. online form or e-mail),
  • How it was and is processed (i.e. the modalities – e.g. advanced analytics),
  • To what end it is processed (i.e. the purpose – e.g. targeted marketing),
  • With whom it is shared (i.e. the transmission – e.g. third parties, such as cloud service providers).

This information must be available at any time for disclosure both to authorities (describing an organization’s overall data processing practices) and to individuals (e.g. when exercising the right of access).
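The five documentation items listed above map naturally onto a record structure. The sketch below is one possible shape, with hypothetical names (`ProcessingRecord`, `undocumented_items`); it assumes that `None` means "not yet documented", whereas an empty list is a valid answer (e.g. data shared with no one).

```python
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class ProcessingRecord:
    scope: Optional[List[str]] = None         # recorded attributes
    origin: Optional[str] = None              # how the data was acquired
    modalities: Optional[List[str]] = None    # how it was and is processed
    purpose: Optional[str] = None             # to what end it is processed
    transmission: Optional[List[str]] = None  # with whom it is shared


def undocumented_items(record: ProcessingRecord) -> List[str]:
    """List the documentation items that have not been filled in yet."""
    return [name for name, value in asdict(record).items() if value is None]


record = ProcessingRecord(
    scope=["name", "email"],
    origin="online form",
    purpose="targeted marketing",
)
```

Here `undocumented_items(record)` reports `modalities` and `transmission` as still missing, which is the kind of gap that should be closed before a disclosure request arrives.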

Organizations must also overhaul the way they seek consent for processing their customers' personal data and make sure that consent can be renewed or withdrawn. Consent requests must be presented separately from general agreements, use simple language (i.e. legal and technical terms must be avoided) and feature visuals/pictograms, if possible. Consent to non-essential processing activities must be offered as granular, opt-in items (e.g. one unticked box per processing activity in the case of an online form). Finally, organizations must document when and how consent was given.
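Documenting when and how consent was given, and allowing it to be withdrawn, suggests an append-only log of consent events rather than a single flag. The following sketch uses hypothetical names (`ConsentEvent`, `ConsentLedger`) and assumes that events are appended in chronological order, so the most recent event per item decides.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class ConsentEvent:
    item: str              # one processing activity, e.g. "newsletter"
    granted: bool          # True = consent given, False = consent withdrawn
    channel: str           # how the answer was collected, e.g. "online form"
    recorded_at: datetime  # when the answer was collected


@dataclass
class ConsentLedger:
    events: List[ConsentEvent] = field(default_factory=list)

    def record(self, item: str, granted: bool, channel: str,
               when: Optional[datetime] = None) -> None:
        """Append a consent answer; earlier events are never overwritten."""
        when = when or datetime.now(timezone.utc)
        self.events.append(ConsentEvent(item, granted, channel, when))

    def is_granted(self, item: str) -> bool:
        """Opt-in default: no recorded answer means no consent."""
        latest = None
        for event in self.events:
            if event.item == item:
                latest = event  # last matching event wins
        return latest is not None and latest.granted
```

Keeping one event per consent item mirrors the granular, one-box-per-activity presentation, and the retained history shows when and through which channel each answer was given or withdrawn.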

The GDPR Capability Model, developed by the CC CDQ, helps data managers identify areas for action and structure projects and initiatives. The model builds on a review and interpretation of legal texts, official guidelines, and industry practice reports. It was validated during break-out sessions and in bilateral projects. The model provides an action-oriented view of the capabilities that need to be built up and established in order to comply with GDPR’s set of requirements. The capabilities refer to both processes (e.g. handling access requests or documenting processing activities) and technical measures (e.g. data identification or data removal). They feature legal references and a description of implications, which data managers can use to communicate their ideas to a broader audience.

The model also assists data managers in identifying (re)design areas and, with the help of a maturity assessment model, in monitoring progress.

Capability Model for GDPR

For any inquiries, or if you would like to take part in one of our sessions, please contact Clément Labadie.
