The concept of the data warehouse dates back to the last century. With the continuous growth of big data and the development of the Hadoop ecosystem, offline data warehouses built on a Hive/HDFS architecture became widespread. In recent years, real-time frameworks such as Storm, Spark (Streaming), and Flink have emerged and developed rapidly, and nearly every company now needs a real-time data solution in its systems. In this article, we look at some typical real-time data architectures at Chinese internet companies such as Meituan, NetEase, and OPPO. Their choices of storage and computing engines, along with the way they divide the warehouse into layers, may offer some inspiration.
Four examples:
From a functional perspective, Meituan's real-time computing platform covers job configuration, publishing, and status management, as well as resource management. Resource management here means multi-tenant resource isolation, delivery, and deployment.
The warehouse is usually divided into four layers, from bottom to top: ODS (Operational Data Store), DWD (Data Warehouse Detail), DWS (Data Warehouse Summary), and ADS (Application Data Store), with Hive or Spark used for queries.
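To make the layering concrete, here is a minimal sketch in plain Python of how one record set might move through the four layers. It is not taken from any of the companies discussed: the order records, field names, and the functions `build_dwd`, `build_dws`, and `build_ads` are all hypothetical, and in-memory lists stand in for Hive tables.

```python
from collections import defaultdict

# Hypothetical raw order records as they might land in the ODS layer.
ODS_ORDERS = [
    {"order_id": "1", "city": " Beijing ", "amount": "30.5"},
    {"order_id": "2", "city": "Shanghai", "amount": "12.0"},
    {"order_id": "3", "city": "Beijing", "amount": "bad"},  # malformed row
    {"order_id": "4", "city": "Beijing", "amount": "9.5"},
]


def build_dwd(ods_rows):
    """DWD: clean and type the raw rows, dropping rows that cannot be parsed."""
    detail = []
    for row in ods_rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # skip rows whose amount is not numeric
        detail.append(
            {
                "order_id": int(row["order_id"]),
                "city": row["city"].strip(),
                "amount": amount,
            }
        )
    return detail


def build_dws(dwd_rows):
    """DWS: summarize the detail rows per city."""
    summary = defaultdict(lambda: {"orders": 0, "revenue": 0.0})
    for row in dwd_rows:
        city_stats = summary[row["city"]]
        city_stats["orders"] += 1
        city_stats["revenue"] += row["amount"]
    return dict(summary)


def build_ads(dws_rows):
    """ADS: shape the summary for one application, here a revenue ranking."""
    return sorted(dws_rows.items(), key=lambda kv: kv[1]["revenue"], reverse=True)


dwd = build_dwd(ODS_ORDERS)
dws = build_dws(dwd)
ads = build_ads(dws)
```

Each function reads only from the layer directly below it, which is the main point of the division: detail data is cleaned once in DWD and then reused by every summary and application built on top.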
In the real-time model, the DWD and DWS layers are usually based on Kafka. For performance reasons, dimension data is usually kept in a KV store such as HBase or Tair.
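The reason for keeping dimension data in a KV store is that each streaming event can then be enriched with a single key lookup. The sketch below illustrates the idea under simplifying assumptions: a Python dict stands in for HBase or Tair, a list stands in for a Kafka topic, and the names `MERCHANT_DIM`, `enrich`, and `summarize` are hypothetical rather than part of any real platform.

```python
from collections import Counter

# Stand-in for a KV store such as HBase or Tair:
# merchant_id -> dimension attributes.
MERCHANT_DIM = {
    "m1": {"city": "Beijing", "category": "food"},
    "m2": {"city": "Shanghai", "category": "hotel"},
}


def enrich(events, dim_store):
    """DWD step: join each fact event with its dimension row by key lookup."""
    for event in events:
        dim = dim_store.get(event["merchant_id"])
        if dim is None:
            # Unknown key: keep the event but mark the dimension as missing.
            dim = {"city": "unknown", "category": "unknown"}
        yield {**event, **dim}


def summarize(enriched_events):
    """DWS step: count orders per city over the enriched stream."""
    counts = Counter()
    for event in enriched_events:
        counts[event["city"]] += 1
    return counts


# Stand-in for events consumed from a Kafka topic.
events = [
    {"order_id": 1, "merchant_id": "m1"},
    {"order_id": 2, "merchant_id": "m2"},
    {"order_id": 3, "merchant_id": "m1"},
    {"order_id": 4, "merchant_id": "m9"},  # merchant missing from the store
]

city_counts = summarize(enrich(events, MERCHANT_DIM))
```

In a real deployment the lookup would be a network call, so it is common to add a local cache or batch the requests. The sketch leaves that out to keep the join itself visible.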