MPP has another benefit. Many MPP systems run on commodity hardware, which makes it easy to scale out, adding nodes to increase capacity and performance as needed. Because nodes are cheap to add, many MPP systems keep multiple copies of operational data as a backup. HDFS, for example, stores each block on three nodes by default (the original plus two copies), and Vertica provides a ‘K-safety’ feature that keeps a copy of each node's data on another, adjacent node. Backup is, of course, not exclusive to MPP. But adding backup to an SMP system requires heavy investment, either in scaling up or in a separate backup system, and the cost climbs steeply as data volume grows.
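To make the replication idea concrete, here is a minimal sketch of placing k extra copies of each data segment on the next k nodes around a ring of nodes, then checking that the data survives the loss of any k nodes. This is a simplified model assumed for illustration only: the function names are hypothetical, and it is not Vertica's actual buddy-projection algorithm or HDFS's rack-aware block placement.

```python
from itertools import combinations


def place_replicas(num_segments: int, nodes: list[str], k: int) -> dict[int, list[str]]:
    """Assign each data segment to k + 1 distinct nodes (hypothetical helper).

    Segment i's primary copy goes to node i mod n; its k extra copies go to
    the next k nodes around the ring, i.e. the "adjacent" nodes.
    """
    n = len(nodes)
    if k + 1 > n:
        raise ValueError("need at least k + 1 nodes to hold k + 1 distinct copies")
    return {
        seg: [nodes[(seg + offset) % n] for offset in range(k + 1)]
        for seg in range(num_segments)
    }


def survives_any_k_failures(placement: dict[int, list[str]], nodes: list[str], k: int) -> bool:
    """Return True if every segment keeps a live copy after any k nodes fail."""
    for failed in combinations(nodes, k):
        down = set(failed)
        for holders in placement.values():
            # A segment is lost only if every node holding it is down.
            if all(node in down for node in holders):
                return False
    return True


if __name__ == "__main__":
    nodes = ["node1", "node2", "node3", "node4"]
    one_copy = place_replicas(8, nodes, k=1)      # primary + 1 copy
    print(one_copy[3])                            # ['node4', 'node1']
    print(survives_any_k_failures(one_copy, nodes, 1))  # True
    print(survives_any_k_failures(one_copy, nodes, 2))  # False
    two_copies = place_replicas(8, nodes, k=2)    # primary + 2 copies, like HDFS's default of 3
    print(survives_any_k_failures(two_copies, nodes, 2))  # True
```

The point of the sketch is that the copies live on cheap extra nodes rather than in a separate backup system: raising k buys tolerance to more simultaneous node failures at the cost of k more copies of the data.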
The image above shows an example in which SSOT and MVOT are well balanced: a single data warehouse serves as the SSOT, and multiple data marts source their data from it as MVOT. In reality, however, the boundary between SSOT and MVOT often collapses, for example when data marts draw on sources other than the data warehouse, or when the data warehouse takes data marts as source data. All data must be easily traceable from SSOT to MVOT and back. This requires keeping a history of data transformations, which is called ‘data lineage’.
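As a rough illustration of what keeping lineage enables, the sketch below records each transformation as an edge from a source dataset to a target dataset, so data can be traced downstream (SSOT to MVOT) and upstream (MVOT back to SSOT). The class and dataset names are hypothetical, and this is not the API of any particular lineage tool. It also shows how the two boundary-collapse cases above surface: a data mart with a non-warehouse parent, and a cycle when the warehouse takes a data mart as a source.

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical minimal lineage record: which dataset was derived from which, and how."""

    def __init__(self) -> None:
        self._children: dict[str, set[str]] = defaultdict(set)
        self._parents: dict[str, set[str]] = defaultdict(set)
        self.transforms: dict[tuple[str, str], str] = {}

    def record(self, source: str, target: str, transform: str) -> None:
        """Log that `target` was produced from `source` by `transform`."""
        self._children[source].add(target)
        self._parents[target].add(source)
        self.transforms[(source, target)] = transform

    def _walk(self, start: str, edges: dict[str, set[str]]) -> set[str]:
        # Breadth-first traversal collecting every dataset reachable from `start`.
        seen: set[str] = set()
        queue = deque([start])
        while queue:
            for nxt in edges.get(queue.popleft(), ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def downstream(self, dataset: str) -> set[str]:
        """Everything derived from `dataset` (e.g. SSOT -> MVOT)."""
        return self._walk(dataset, self._children)

    def upstream(self, dataset: str) -> set[str]:
        """Everything `dataset` was derived from (e.g. MVOT -> SSOT)."""
        return self._walk(dataset, self._parents)

    def direct_sources(self, dataset: str) -> set[str]:
        """Immediate parents, useful for spotting marts fed by non-warehouse sources."""
        return set(self._parents.get(dataset, ()))

    def is_cyclic(self, dataset: str) -> bool:
        """True if `dataset` is (indirectly) derived from itself."""
        return dataset in self.downstream(dataset)


if __name__ == "__main__":
    g = LineageGraph()
    g.record("erp.orders", "dw.sales", "cleanse and conform")
    g.record("crm.customers", "dw.sales", "deduplicate customers")
    g.record("dw.sales", "mart.finance", "aggregate by month")
    g.record("dw.sales", "mart.marketing", "aggregate by campaign")
    g.record("web.clicks", "mart.marketing", "join click logs")  # non-warehouse source

    print(sorted(g.downstream("dw.sales")))      # ['mart.finance', 'mart.marketing']
    print(sorted(g.upstream("mart.marketing")))
    print(g.is_cyclic("dw.sales"))               # False
    g.record("mart.finance", "dw.sales", "write back budget figures")
    print(g.is_cyclic("dw.sales"))               # True
```

A real lineage system would also capture timestamps, job identifiers, and column-level mappings; the graph here only shows why recording each transformation makes tracing in both directions a simple traversal.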
As the central piece of a larger platform, every data store must be compatible with the platform's other parts: upstream data ingestion and preparation tools, downstream analysis and visualization tools, and other data stores. Otherwise, it becomes an isolated data silo, which organizations must avoid, because data becomes a strategic asset only when it is well circulated and well enriched.
SOFTLINE Co., Ltd.
2F~3F door, 3 Yonghyeon-ro, Dukyang-gu, Goyang-si, Gyeonggi-do, Korea
Email : email@example.com