Therefore, in any data system, the pipelines connecting sources and consumers form an N-to-N relation, as shown above. Getting the right data at the right time is not easy. Because of this complexity, data engineers have tended to rely on empirically proven queries and to resist changing them, even when consumer requirements demand it. This makes the data scientist's job much harder: they need free access to diverse datasets to run multiple experiments, yet they end up spending nearly 80% of their time on data wrangling. Static data distribution can therefore be a major roadblock on the path toward a data-driven organization.
A data hub turns the N-to-N structure into an N-to-1-to-N structure by adding a middle layer between data producers and data consumers. Producers publish data to the hub, and consumers subscribe to the topics they are interested in. By connecting to the hub, data engineers can manage all data pipelines in a single view.
Kafka, originally developed at LinkedIn, is a widely known open source platform that implements this idea. Informatica Data Integration Hub is a commercial product built around the same hub concept.
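The publish/subscribe hub described above can be sketched in a few lines of Python. This is only a toy, in-memory illustration and not the API of Kafka or any other product; the names DataHub, subscribe, publish, and pipeline_counts are hypothetical. It also shows why the hub helps: with direct connections the number of pipelines grows with producers times consumers, while with a hub it grows with producers plus consumers.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


class DataHub:
    """Toy in-memory hub (hypothetical): producers publish, consumers subscribe."""

    def __init__(self) -> None:
        # topic name -> list of consumer callbacks
        self._subscribers: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        """Register a consumer callback for a topic."""
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, record: dict) -> int:
        """Deliver a record to every subscriber of the topic; return how many got it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(record)
        return len(handlers)


def pipeline_counts(producers: int, consumers: int) -> Tuple[int, int]:
    """Return (direct point-to-point links, links when everything goes through one hub)."""
    return producers * consumers, producers + consumers


received: List[dict] = []
hub = DataHub()
hub.subscribe("orders", received.append)                               # consumer 1
hub.subscribe("orders", lambda r: received.append({"copy": r["id"]}))  # consumer 2
delivered = hub.publish("orders", {"id": 1, "amount": 30})             # one producer
```

With 5 producers and 8 consumers, for example, pipeline_counts(5, 8) gives 40 direct pipelines versus 13 connections through the hub.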
In MDM, the product matters less. What matters more is working out the schema of the golden record. Where are the sources? Which feature comes from which source? When and how should the data be extracted? How should it be transformed and prepared? Answering such questions by interviewing stakeholders and experts inside and outside the organization is the key mission of MDM.
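One way to record the answers to these questions is a mapping from each golden record field to its source and transformation. The sketch below is purely illustrative: the source systems (crm, web_signup, billing), the field names, and the helpers FieldRule and build_golden_record are all hypothetical, not part of any MDM product.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class FieldRule:
    source: str                        # which system owns this attribute
    source_field: str                  # the column name in that system
    transform: Callable[[str], str]    # how to clean the raw value


# Hypothetical golden record schema for a customer.
GOLDEN_CUSTOMER: Dict[str, FieldRule] = {
    "name": FieldRule("crm", "full_name", str.strip),
    "email": FieldRule("web_signup", "email_addr", lambda v: v.strip().lower()),
    "country": FieldRule("billing", "country_code", str.upper),
}


def build_golden_record(rules: Dict[str, FieldRule],
                        source_rows: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Pull each field from its designated source and apply its transform."""
    record = {}
    for field, rule in rules.items():
        raw = source_rows[rule.source][rule.source_field]
        record[field] = rule.transform(raw)
    return record


rows = {
    "crm": {"full_name": "  Kim Minji "},
    "web_signup": {"email_addr": " Minji.Kim@Example.com "},
    "billing": {"country_code": "kr"},
}
golden = build_golden_record(GOLDEN_CUSTOMER, rows)
```

Keeping the rules as data rather than code means the mapping can be reviewed with the same stakeholders who were interviewed to produce it.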
To prevent data abuse, many governments are strengthening the policies and laws that safeguard private data. Collecting or analyzing personal data that can identify a specific individual is widely prohibited under punitive regulations. In Korea, the resident registration number, similar to the social security number in the US, generally may not be collected or used by applications. Features such as masking, which automatically hides sensitive data, are therefore close to an essential requirement for any data-related product.
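As a rough illustration of what automatic masking means, the sketch below hides two kinds of sensitive values in free text. It assumes an ID layout of six digits, a hyphen, and seven digits, and uses a deliberately simple email pattern; the function name mask_sensitive and both patterns are assumptions for this example, and a real product would need far more thorough detection.

```python
import re

# Assumed ID layout for illustration: 6 digits, hyphen, 7 digits.
ID_PATTERN = re.compile(r"\b\d{6}-\d{7}\b")

# Simplified email pattern: first character of the local part and the domain are captured.
EMAIL_PATTERN = re.compile(
    r"\b([A-Za-z0-9._%+-])[A-Za-z0-9._%+-]*@([A-Za-z0-9.-]+\.[A-Za-z]{2,})\b"
)


def mask_sensitive(text: str) -> str:
    """Replace ID digits with asterisks and hide most of each email's local part."""
    text = ID_PATTERN.sub("******-*******", text)
    text = EMAIL_PATTERN.sub(lambda m: m.group(1) + "***@" + m.group(2), text)
    return text


masked = mask_sensitive("ID 900101-1234567, mail minji@example.com")
```

Text without any matching value passes through unchanged, so the function can be applied to every record before it leaves the hub.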
Soft Line Co., Ltd.
434 Sangam IT Tower, Mapo-gu, Seoul World Cup North Road
Email : email@example.com