The concept of the data warehouse dates back to the last century. With the continuous growth of big data and the development of the Hadoop ecosystem, offline data warehouses built on a Hive/HDFS architecture emerged. In recent years, real-time frameworks such as Storm, Spark (Streaming), and Flink have developed rapidly, and many companies now need a real-time data solution in their systems. In this article, we look at several typical real-time data architectures at Chinese internet companies such as Meituan, NetEase, and OPPO. Their choices of storage and computing engines, as well as the way they divide their warehouses into layers, may offer some useful ideas.

Four examples:

Meituan's Flink-based real-time data warehouse platform

From a functional perspective, Meituan's real-time computing platform covers job management (configuration, publishing, and status monitoring) and resource management. Resource management here means multi-tenant resource isolation, delivery, and deployment.
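To make the multi-tenant isolation idea a little more concrete, here is a minimal sketch of per-tenant quota admission combined with a simple job status lifecycle. The slot-based quota and the `Tenant`, `Job`, `ResourceManager`, and `JobStatus` names are hypothetical illustrations, not Meituan's actual design; a production platform would typically rely on a cluster scheduler (for example YARN queues or Kubernetes namespaces) to enforce the isolation itself.

```python
from dataclasses import dataclass
from enum import Enum


class JobStatus(Enum):
    # Hypothetical lifecycle states for a streaming job.
    CONFIGURED = "configured"
    PUBLISHED = "published"
    RUNNING = "running"
    REJECTED = "rejected"


@dataclass
class Tenant:
    name: str
    slot_quota: int       # hypothetical: max task slots this tenant may hold
    slots_in_use: int = 0


@dataclass
class Job:
    name: str
    tenant: str
    slots: int
    status: JobStatus = JobStatus.CONFIGURED


class ResourceManager:
    """Admits a job only if its tenant stays within that tenant's own quota."""

    def __init__(self, tenants):
        self.tenants = {t.name: t for t in tenants}

    def deploy(self, job: Job) -> JobStatus:
        tenant = self.tenants[job.tenant]
        job.status = JobStatus.PUBLISHED
        if tenant.slots_in_use + job.slots > tenant.slot_quota:
            # Isolation: a tenant cannot borrow capacity from another tenant.
            job.status = JobStatus.REJECTED
        else:
            tenant.slots_in_use += job.slots
            job.status = JobStatus.RUNNING
        return job.status
```

For example, with `ResourceManager([Tenant("ads", 10), Tenant("search", 4)])`, an 8-slot job for `ads` runs, a 6-slot job for `search` is rejected, and a further 2-slot job for `ads` still fits exactly within its quota of 10.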

Traditional data warehouse model

The traditional model typically has four layers, from bottom to top: ODS (Operational Data Store), DWD (Data Warehouse Detail), DWS (Data Warehouse Summary), and ADS (Application Data Store), with Hive or Spark used for queries.
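The responsibility of each layer can be illustrated with a small in-memory sketch. In a real offline warehouse each layer would be a set of Hive tables populated by Hive or Spark SQL jobs; here plain Python lists and dicts stand in for those tables, and the order records, field names, and cleaning rules are hypothetical sample choices.

```python
from collections import defaultdict

# ODS: raw operational records copied as-is from source systems (hypothetical sample data).
ods_orders = [
    {"order_id": "1", "city": " Beijing ", "amount": "30.5", "status": "paid"},
    {"order_id": "2", "city": "Shanghai", "amount": "12.0", "status": "cancelled"},
    {"order_id": "3", "city": "beijing", "amount": "19.5", "status": "paid"},
    {"order_id": "4", "city": "Shanghai", "amount": "8.0", "status": "paid"},
]


def build_dwd(ods_rows):
    """DWD: cleaned, typed, row-level detail (normalize city, cast amount, keep paid orders)."""
    return [
        {"order_id": r["order_id"], "city": r["city"].strip().title(), "amount": float(r["amount"])}
        for r in ods_rows
        if r["status"] == "paid"
    ]


def build_dws(dwd_rows):
    """DWS: summaries aggregated along a shared dimension (here, per-city totals)."""
    summary = defaultdict(lambda: {"orders": 0, "gmv": 0.0})
    for r in dwd_rows:
        summary[r["city"]]["orders"] += 1
        summary[r["city"]]["gmv"] += r["amount"]
    return dict(summary)


def build_ads(dws_rows):
    """ADS: application-facing result (here, cities ranked by GMV for a dashboard)."""
    return sorted(((city, s["gmv"]) for city, s in dws_rows.items()), key=lambda x: -x[1])
```

Chaining the layers as `build_ads(build_dws(build_dwd(ods_orders)))` yields `[("Beijing", 50.0), ("Shanghai", 8.0)]`: each layer reads only from the one below it, which is what lets the layers be rebuilt and reused independently.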

Real-time data warehouse model

In the real-time model, the DWD and DWS layers are usually built on Kafka. For performance reasons, dimension data is usually kept in HBase or in the Tair key-value store, both of which serve low-latency point lookups.
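A common reason for putting dimension data in a KV store is that every event flowing through a Kafka-backed DWD layer may need a dimension lookup. The sketch below shows that enrichment step with a small LRU cache in front of the store to cut remote calls. It is an illustration under assumptions: the event list stands in for a Kafka topic, `fetch` is a hypothetical callable standing in for an HBase Get or Tair read, and the field names are invented. In a Flink job, this role is usually played by a lookup join or async I/O against the dimension store.

```python
from collections import OrderedDict
from typing import Callable, Iterable, Iterator, Optional


class CachedDimLookup:
    """Point lookups against a dimension KV store, fronted by a small LRU cache.

    `fetch` is a hypothetical stand-in for an HBase Get or Tair read.
    """

    def __init__(self, fetch: Callable[[str], Optional[dict]], capacity: int = 1024):
        self.fetch = fetch
        self.capacity = capacity
        self.cache: "OrderedDict[str, Optional[dict]]" = OrderedDict()
        self.remote_calls = 0

    def get(self, key: str) -> Optional[dict]:
        if key in self.cache:
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        self.remote_calls += 1
        value = self.fetch(key)          # cache misses (including absent keys) go to the store
        self.cache[key] = value
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict the least recently used entry
        return value


def enrich(events: Iterable[dict], dims: CachedDimLookup) -> Iterator[dict]:
    """Attach dimension attributes to each fact event; unknown keys yield None fields."""
    for event in events:
        dim = dims.get(event["shop_id"]) or {}
        yield {**event, "shop_name": dim.get("name"), "city": dim.get("city")}
```

With a dict such as `{"s1": {...}, "s2": {...}}` passed as `fetch=shop_dim.get`, repeated events for the same shop hit the cache instead of the store; the cache size trades memory against dimension freshness, since cached entries do not see later updates.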

Quasi-real-time data warehouse model