Why Use Machine Learning to Improve Data Quality?
Growing data volumes put companies under pressure to manage and control their data assets systematically. Common data management practices do not scale well enough to handle these ever-increasing volumes, so companies need to rethink how they manage data. The good news is that substantial progress in artificial intelligence (AI) and machine learning (ML), both in learning from data and in automating repetitive tasks, can support you in your data management activities.
Machine learning has proven its potential in real-world business settings: with an ML-enabled data curation system, the costs of data cleansing, data transformation, and deduplication could be reduced by 90% (Stonebraker, Bruckner, and Ilyas 2013).
How Can Machine Learning Support Our Data Management and Help Us Improve Our Data Quality?
To assess the role and potential of ML in data management, the Competence Center Corporate Data Quality (CC CDQ) collected and analyzed ML use cases from academic research, software vendors, and data management experts (Fadler & Legner, 2018). With our study, we aim to identify typical application scenarios that can help data managers find potential areas of application for ML in data management. So far, we have developed a taxonomy for classifying use cases and derived 11 typical application scenarios for machine learning in data management from 44 collected use cases.
Our study reveals that ML can be applied in all phases of the data life-cycle to achieve the following:
- To create and enrich data assets in an efficient, user-friendly way
- To maintain high-quality data by supporting proactive and reactive data maintenance as well as data unification
- To manage the data life-cycle, especially when it comes to sensitive data and retiring data
- To increase the use of data by making it easier for users, specifically data scientists, to discover data
The following overview provides more details about the data life-cycle phases and ML application scenarios.
Although the use of ML in data management is still at an early stage, the first implementations are very promising. For instance, Bosch has been able to almost completely automate the manual process of commodity code assignment in product master data creation with the help of machine learning. With this approach, Bosch can meet the increasing demand for this assignment task across the enterprise with a scalable solution. Detailed information is available in Bosch's winning application for the CDQ Good Practice Award 2018.
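To make the idea of automated code assignment more concrete, here is a minimal sketch of how a product description could be mapped to a commodity code with a simple text classifier. It is not Bosch's solution, which is not described here. It assumes a multinomial naive Bayes model with Laplace smoothing, and the class name, the training records, and the code labels (FASTENERS, BATTERIES) are all hypothetical placeholders rather than real commodity codes.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Split a product description into lowercase word tokens."""
    return text.lower().split()


class NaiveBayesCodeAssigner:
    """Illustrative multinomial naive Bayes classifier (hypothetical helper)."""

    def __init__(self):
        self.class_counts = Counter()             # records seen per code
        self.token_counts = defaultdict(Counter)  # token frequencies per code
        self.vocabulary = set()                   # all tokens seen in training

    def fit(self, descriptions, codes):
        for text, code in zip(descriptions, codes):
            self.class_counts[code] += 1
            for token in tokenize(text):
                self.token_counts[code][token] += 1
                self.vocabulary.add(token)
        return self

    def predict(self, text):
        total_records = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_code, best_score = None, -math.inf
        for code, record_count in self.class_counts.items():
            # Log prior: how common this code is in the training data.
            score = math.log(record_count / total_records)
            total_tokens = sum(self.token_counts[code].values())
            for token in tokenize(text):
                # Laplace smoothing keeps unseen tokens from zeroing the score.
                count = self.token_counts[code][token]
                score += math.log((count + 1) / (total_tokens + vocab_size))
            if score > best_score:
                best_code, best_score = code, score
        return best_code


# Hypothetical training data: (product description, placeholder code label).
training = [
    ("steel hex bolt m8", "FASTENERS"),
    ("stainless steel screw m6", "FASTENERS"),
    ("zinc plated hex nut", "FASTENERS"),
    ("lithium ion battery pack", "BATTERIES"),
    ("rechargeable battery cell 12v", "BATTERIES"),
    ("alkaline battery aa", "BATTERIES"),
]

model = NaiveBayesCodeAssigner().fit(
    [description for description, _ in training],
    [code for _, code in training],
)

print(model.predict("hex screw m8 steel"))
print(model.predict("battery pack 12v"))
```

In practice, a model like this would be trained on historical, manually assigned records, and low-confidence predictions would typically be routed to a human reviewer rather than applied automatically.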
The bottom line of our analysis is that ML has the potential to significantly enhance data management practices and improve data quality. ML allows for managing data assets in an intelligent and more scalable way, but also disrupts the way data is managed.
How Can I Get Access to Further Research Results?
In order to get access to further information, we kindly ask you to evaluate the benefits and implementation status of the identified application scenarios. Answering our questions should not take longer than 10 minutes.
Benefits of participating in the survey:
- You will get immediate access to a presentation with ML application scenarios and a summary of our current findings
- You will receive the upcoming research paper "Managing Data as an Asset with the Help of Artificial Intelligence" in your inbox as soon as it is published
- You will receive the results of this study
Managing Data as an Asset with the Help of Artificial Intelligence
As soon as our new study is available, you can request it here: Request Study now
Any questions about Machine Learning or comments on our study?