DataOps Blog

Where Change Is Welcome

Enterprise Data Warehouse or EDL? Data Lake, Warehouse, or Lakehouse?

Posted in StreamSets News, February 4, 2021

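As John Zada writes: “There can never be just one interpretation of things. Thus, our paradigms can be overly simplistic, incomplete or inaccurate, removing the complexity from the world which is actually one of its defining qualities.” So can we really evaluate such influential tools in so simple a way?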

Database to Data Warehouse: How It Started

Companies have long posited the merits of building large-scale data infrastructure to support their analytics teams and goals. Operational databases are perfect for collecting data from applications and storing it for reference, but they perform poorly when analytical queries are run against that same data. This is why many companies invest in a separate data platform to run analytics. In 1996, Ralph Kimball published the first edition of The Data Warehouse Toolkit. The previous modeling practice was adequate for accounting for the linear placement and changing of data but lacked the ability to represent complex relationships between data. This was the area where dimensional modeling really excelled, and for that reason it became the fundamental principle for building a data platform for analytics.
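To make dimensional modeling a little more concrete, here is a minimal sketch of a star schema using Python's built-in sqlite3 module. The table names (`fact_sales`, `dim_product`, `dim_date`), the columns, and the sample rows are all hypothetical and chosen only for illustration; a real warehouse would hold far more dimensions and run on a dedicated analytics engine.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table surrounded by two dimension tables (hypothetical names)."""
    conn.executescript(
        """
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            name TEXT,
            category TEXT
        );
        CREATE TABLE dim_date (
            date_id INTEGER PRIMARY KEY,
            year INTEGER,
            month INTEGER
        );
        CREATE TABLE fact_sales (
            sale_id INTEGER PRIMARY KEY,
            product_id INTEGER REFERENCES dim_product(product_id),
            date_id INTEGER REFERENCES dim_date(date_id),
            quantity INTEGER,
            revenue REAL
        );
        """
    )
    # Made-up sample rows, purely for illustration.
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Espresso", "Coffee"), (2, "Latte", "Coffee"), (3, "Green Tea", "Tea")],
    )
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(20210101, 2021, 1), (20210201, 2021, 2)],
    )
    conn.executemany(
        "INSERT INTO fact_sales (product_id, date_id, quantity, revenue) "
        "VALUES (?, ?, ?, ?)",
        [(1, 20210101, 10, 30.0), (2, 20210101, 5, 20.0),
         (3, 20210201, 8, 16.0), (1, 20210201, 4, 12.0)],
    )


def revenue_by_category(conn):
    """Analytical query: join the fact table to a dimension and aggregate."""
    rows = conn.execute(
        """
        SELECT p.category, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_product AS p ON p.product_id = f.product_id
        GROUP BY p.category
        ORDER BY p.category
        """
    ).fetchall()
    return dict(rows)


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
print(revenue_by_category(conn))
```

The fact table holds the measurable events, while each dimension describes one way of slicing them, so new questions (by month, by category) become additional joins rather than schema redesigns.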

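A data warehouse, also known as an enterprise data warehouse or EDW, is a central repository of information that can be analyzed to make better-informed decisions. Data flows into a data warehouse from transactional systems, relational databases, and other sources, typically on a regular cadence. Business analysts, data scientists, and decision makers access the data through business intelligence (BI) tools, SQL clients, and other analytics applications. If you aspire to be a data-driven company, it is a key canonical architecture. Software giants such as IBM and Oracle designed large, expensive data warehouse infrastructure products that supplied the servers, software, and services needed for descriptive analytics. At the same time, companies rushed to hire EDW administrators who were in charge of building the schema and strategizing how all data should flow into the data warehouse.

However, the data warehouse soon became associated with its limitations as companies grew eager to use more and more data. The data warehouse became crowded and bogged down with requests, which killed its performance and tested its ability to deliver on service level agreements. And so the enterprise data warehouse became a four-letter word. This largely drove the invention of the data lake, which was designed to solve the problem of scale. But did the data lake really replace the data warehouse?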

The Rise of the Data Lake

A data lake is a data platform for semi-structured, structured, unstructured, and binary data, at any scale, with the specific purpose of supporting the execution of analytics workloads. A data lake often refers to a data storage system built on the HDFS file system, commonly referred to as Hadoop. The founders of Hadoop were all practitioners of the enterprise data warehouse ecosystem at tech companies (Google and Yahoo). They wanted analytics at a larger scale, implemented in a more cost-effective way than traditional data warehouse solutions. Companies with a data lake could now collect all the data they wanted without worrying about capacity or schema uniformity, and the rush to transition to a data lake architecture was on. Take, for instance, the graphic below, which shows the Google search trends for the two topics between 2005 and 2014.

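[Figure: Hadoop and data warehouse search trends]

At first glance, it looks as though Hadoop replaced the data warehouse market, but in practice that never happened. Ralph Kimball revised The Data Warehouse Toolkit in 2013 to include the concept of the data lake, which was a key point of validation. Yet most companies chose to keep their data warehouse and build a data lake for largely unstructured and streaming data. This was actually a smart decision because, in reality, a data warehouse and a data lake are good for slightly different things, both of which are relevant to modern data architecture. Hadoop also brought its own set of challenges. It was often difficult to operate and required highly specialized, in-demand skills. Many companies struggled to get quick value and to retain data lake professionals, which made the cost of owning a data lake heavy on other dimensions. So did these companies make a mistake? Or was this something that may simply not have been clear at the time?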
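The appeal of collecting data without worrying about schema uniformity is often described as "schema on read": records land in the lake as-is, and each consumer applies the structure it needs at query time. The following is a minimal sketch of that idea in plain Python. The event records and the `read_with_schema` helper are hypothetical, and a real lake would read files from distributed storage rather than an in-memory list.

```python
import json

# Raw events as they might land in a lake: same feed, differing shapes.
# These sample records are invented for illustration.
RAW_EVENTS = [
    '{"type": "click", "user": "u1", "page": "/home"}',
    '{"type": "purchase", "user": "u2", "amount": 19.99}',
    '{"type": "click", "user": "u2", "page": "/cart", "device": "mobile"}',
]


def read_with_schema(raw_lines, fields):
    """Project each raw record onto `fields`, filling missing values with None."""
    for line in raw_lines:
        record = json.loads(line)
        yield {field: record.get(field) for field in fields}


# One consumer cares only about clicks and pages...
clicks = [
    row
    for row in read_with_schema(RAW_EVENTS, ["type", "user", "page"])
    if row["type"] == "click"
]

# ...another reads the same raw data with a different schema.
purchases = [
    row
    for row in read_with_schema(RAW_EVENTS, ["type", "user", "amount"])
    if row["type"] == "purchase"
]

print(clicks)
print(purchases)
```

Nothing was rejected at write time, which is what makes ingestion cheap; the trade-off is that every reader has to cope with missing or unexpected fields.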

The Problem with Data Warehouse vs. Data Lake

The problem with this paradigm is that it treats one approach as wrong and the other as right when, in practice, companies may choose a data lake, a data warehouse, or both for foundationally sound reasons. Here are some thoughts...

When to Use a Data Warehouse

  • Query performance
  • Transactional reporting
  • Dashboards
  • Structured data
  • Data integrity

When to Use a Data Lake

  • Large data volumes
  • Unstructured and semi-structured data
  • Streaming and time-related data
  • Data archiving

Using the Data Lake and Data Warehouse for Analytics

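Another way to think about it is from an analytics perspective. Let’s take the example of a retail store that wants to know more about its customers so it can provide personalized offers. To assemble a customer profile, the company could use data such as transaction history, purchase history, address, name, etc. These are all structured data sources that often live in the enterprise data warehouse (System of Record) and might feed things like company dashboards. Other data, such as website traffic, social media data, geolocation data, and mobile app clickstream data, are all unstructured sources and would likely live in the data lake (Systems of Engagement). Each set of data in its silo reveals only part of the story. For example, it is good to know whether people are praising you on social media, but knowing whether John Smith thinks well of you lets you take action. To learn that, you need to bring the siloed data together.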

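[Figure: EDW and EDL customer 360 personalization]

By merging these data sources, companies can identify users and their behavior, and design automated actions that deliver personalized responses. By leveraging the strengths of both platforms, companies can make better use of their people skills, their platform budgets, and their data governance.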
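As a rough illustration of combining the two silos, the sketch below joins structured customer records (standing in for the warehouse) with engagement events (standing in for the lake) into a single profile per customer. All field names and sample values are hypothetical, and the sketch assumes the lake events already carry a `customer_id`; in practice, resolving identities across systems is usually the hard part.

```python
from collections import defaultdict

# Stand-in for structured System of Record data (hypothetical rows).
warehouse_customers = [
    {"customer_id": 1, "name": "John Smith", "total_purchases": 12},
    {"customer_id": 2, "name": "Jane Doe", "total_purchases": 3},
]

# Stand-in for Systems of Engagement data (hypothetical rows).
lake_mentions = [
    {"customer_id": 1, "channel": "social", "sentiment": 1.0},
    {"customer_id": 1, "channel": "app", "sentiment": 0.5},
    {"customer_id": 3, "channel": "social", "sentiment": -0.5},
]


def build_customer_360(customers, mentions):
    """Attach engagement statistics from the lake to each warehouse customer."""
    scores_by_customer = defaultdict(list)
    for mention in mentions:
        scores_by_customer[mention["customer_id"]].append(mention["sentiment"])

    profiles = {}
    for customer in customers:
        scores = scores_by_customer.get(customer["customer_id"], [])
        profiles[customer["name"]] = {
            "total_purchases": customer["total_purchases"],
            "mention_count": len(scores),
            # None signals "no engagement data", which differs from neutral.
            "avg_sentiment": sum(scores) / len(scores) if scores else None,
        }
    return profiles


profiles = build_customer_360(warehouse_customers, lake_mentions)
print(profiles["John Smith"])
```

With the combined profile, a rule such as "send an offer to frequent buyers with positive sentiment" can be expressed against one record instead of two separate systems.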

From Enterprise Data Platform to Cloud Data Platform

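[Figure: cloud data warehouse integration]

The advent of the public cloud changed everything for data and analytics. Many of the constraints of the enterprise data warehouse were associated with hardware server limitations. When services like Snowflake and Amazon Redshift were launched, they provided a level of scale and performance that was uncharacteristic of traditional data warehouse solutions. Cloud data lake services also removed many common obstacles for users, including managing complex node architectures, and the services on offer greatly reduced the complexity of operating a data lake. This gave way to a concept that EMA research documents as the Unified Analytics Warehouse, stating: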

“Within a few years, nearly every organization that ran a data warehouse also stood up a data lake. The two coexisted. Initially, there was some data sharing between the two platforms, but not much more. Pressured by customer demands to run analytics across both the data lake and the data warehouse, vendors on both sides began working toward a more complete integration of a warehouse and lake.”

Two common approaches by modern vendors took form to address this: the data platform approach (e.g. Snowflake, Amazon, Microsoft, Google, and Databricks) and the query approach (e.g. Dremio, Kyligence, and Asima). Depending on its competency centers, an organization may choose a platform approach or a query approach, whichever best suits the skills on its teams. These approaches provide a unified way to analyze data, so the structure and source of the data no longer matter as much. Given this newfound uniformity, it might be difficult for you to identify which solution is best for your organization, but for this challenge I have good news.

StreamSets Lets You Choose

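StreamSets provides a modern data integration platform for building smart data pipelines. Smart data pipelines are designed to connect to any database, data warehouse, or data lake service and provide quick value by making sure these platforms are full of useful, reliable, and current data. Smart data pipelines facilitate intent-driven design, which means that you build pipelines with a focus on the flow and transformations the data needs and worry about the platform destinations later. You can design for any paradigm and reverse course when your strategy changes. This is an important feature for data engineers, who may otherwise spend a majority of their time changing pipeline dynamics when destinations change. StreamSets not only supports all major data warehouse and data lake platforms, including cloud services, but also lets users build pipelines to multiple destinations.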
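The idea of designing the flow first and choosing destinations later can be sketched in a few lines of generic Python. To be clear, this is not the StreamSets API: `run_pipeline`, the transform functions, and the in-memory "destinations" are all hypothetical names invented here to illustrate the separation between transformations and targets.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]
Transform = Callable[[Record], Record]
Destination = Callable[[List[Record]], None]


def run_pipeline(
    source: Iterable[Record],
    transforms: List[Transform],
    destinations: List[Destination],
) -> List[Record]:
    """Apply every transform to every record, then fan out to all destinations."""
    processed = []
    for record in source:
        for transform in transforms:
            record = transform(record)
        processed.append(record)
    for write in destinations:
        write(list(processed))  # each destination receives its own copy of the list
    return processed


def normalize_email(record: Record) -> Record:
    """Trim whitespace and lowercase the email field."""
    return {**record, "email": str(record["email"]).strip().lower()}


def add_domain(record: Record) -> Record:
    """Derive the email domain as a new field."""
    return {**record, "domain": str(record["email"]).split("@")[1]}


# In-memory stand-ins for a warehouse table and a lake directory.
warehouse_table: List[Record] = []
lake_files: List[List[Record]] = []


def write_to_warehouse(records: List[Record]) -> None:
    warehouse_table.extend(records)


def write_to_lake(records: List[Record]) -> None:
    lake_files.append(records)


source = [{"email": "  John.Smith@Example.com "}]
run_pipeline(
    source,
    [normalize_email, add_domain],
    [write_to_warehouse, write_to_lake],
)
print(warehouse_table)
```

Because the transforms never reference a destination, moving from a warehouse to a lake, or writing to both, only changes the last argument of `run_pipeline`.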

[Figure: Multi-cloud data pipelines in Control Hub]

So if you are still unclear about the best solution for your data and analytics needs, mitigate your risk with smart data pipelines. They put you in control and remove the risk of being locked into a trend that no longer serves you. Because with smart data pipelines the only attachment may be personal, and as John Zada writes, “humans are not particularly flexible when it comes to using their paradigms. Deep down, we are creatures of habit, and sometimes creatures of obsession.”
