{"id":17076,"date":"2022-02-22T09:41:01","date_gmt":"2022-02-22T08:41:01","guid":{"rendered":"https:\/\/www.codemotion.com\/magazine\/?p=17076"},"modified":"2023-05-30T14:42:10","modified_gmt":"2023-05-30T12:42:10","slug":"enabling-the-data-lakehouse","status":"publish","type":"post","link":"https:\/\/www.codemotion.com\/magazine\/devops\/cloud\/enabling-the-data-lakehouse\/","title":{"rendered":"Enabling the Data Lakehouse"},"content":{"rendered":"\n<p>This article by Codemotion and <a href=\"https:\/\/partners.codemotion.com\/deloitte-italy?_ga=2.167535865.696103516.1644917877-393279832.1633096892\" target=\"_blank\" aria-label=\"Deloitte  (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"ek-link\">Deloitte <\/a>shares insights about the characteristics and benefits of Data Lakehouses &#8211; a combination of Data Lakes and Data Warehouses<\/p>\n\n\n\t\t\t\t<div class=\"wp-block-uagb-table-of-contents uagb-toc__align-left uagb-toc__columns-1  uagb-block-427152ac      \"\n\t\t\t\t\tdata-scroll= \"1\"\n\t\t\t\t\tdata-offset= \"30\"\n\t\t\t\t\tstyle=\"\"\n\t\t\t\t>\n\t\t\t\t<div class=\"uagb-toc__wrap\">\n\t\t\t\t\t\t<div class=\"uagb-toc__title\">\n\t\t\t\t\t\t\tTable Of Contents\t\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t<div class=\"uagb-toc__list-wrap \">\n\t\t\t\t\t\t<ol class=\"uagb-toc__list\"><li class=\"uagb-toc__list\"><a href=\"#an-introduction-to-data-lakes\" class=\"uagb-toc-link__trigger\">An introduction to Data Lakes<\/a><li class=\"uagb-toc__list\"><a href=\"#data-lake-v-data-lakehouse-whats-the-difference\" class=\"uagb-toc-link__trigger\">Data Lake v. Data lakehouse. 
What\u2019s the difference?<\/a><ul class=\"uagb-toc__list\"><li class=\"uagb-toc__list\"><a href=\"#acid-transactions\" class=\"uagb-toc-link__trigger\">ACID transactions<\/a><li class=\"uagb-toc__list\"><li class=\"uagb-toc__list\"><a href=\"#how-can-lakehouses-help-with-data-pipelines\" class=\"uagb-toc-link__trigger\">How can Lakehouses help with data pipelines:<\/a><\/li><\/ul><\/li><li class=\"uagb-toc__list\"><a href=\"#how-to-build-enhanced-data-pipelines\" class=\"uagb-toc-link__trigger\">How to Build Enhanced Data Pipelines<\/a><ul class=\"uagb-toc__list\"><li class=\"uagb-toc__list\"><a href=\"#apache-hudi\" class=\"uagb-toc-link__trigger\">Apache Hudi<\/a><ul class=\"uagb-toc__list\"><li class=\"uagb-toc__list\"><a href=\"#copy-on-write\" class=\"uagb-toc-link__trigger\">Copy on Write<\/a><li class=\"uagb-toc__list\"><li class=\"uagb-toc__list\"><a href=\"#merge-on-read\" class=\"uagb-toc-link__trigger\">Merge on Read<\/a><\/li><\/ul><li class=\"uagb-toc__list\"><a href=\"#alternative-solutions\" class=\"uagb-toc-link__trigger\">Alternative solutions<\/a><\/li><\/ul><\/li><\/ul><\/li><li class=\"uagb-toc__list\"><a href=\"#best-practices-for-building-data-lakehouses\" class=\"uagb-toc-link__trigger\">Best Practices for building Data Lakehouses<\/a><\/ul><\/ul><\/ol>\t\t\t\t\t<\/div>\n\t\t\t\t\t\t\t\t\t<\/div>\n\t\t\t\t<\/div>\n\t\t\t\n\n\n<h2 class=\"gb-headline gb-headline-865ca381 gb-headline-text\"><strong>An introduction to Data Lakes<\/strong><\/h2>\n\n\n\n<p>[<em>note: although \u201cdata\u201d is technically a plural noun, in this article, as it is widely the standard in the field, it is used as a collective, singular noun]<\/em><\/p>\n\n\n\n<p>Data Lakes are popular because of <strong>their ability to store large volumes of data cheaply and easily<\/strong>. A data lake is a large storage repository for data of all types, where the data is stored in its natural form, without any pre-processing or transformation. 
The main advantages of data lakes are that they <strong>enable organizations to store and access a much wider range of data types<\/strong> than traditional data warehouses, and can be used for data discovery and analytics.<\/p>\n\n\n\n<p>However, data lakes also have some disadvantages. One common issue is data quality. Because data is often aggregated from multiple sources and stored in its raw form, data quality can be poor. This can lead to inaccurate analysis and decision-making. Real-time operation is another common issue. Because data lakes are designed for storing and analyzing data over long periods of time, <strong>they are not well suited to real-time operations<\/strong>. This slows down business intelligence processes in which data-driven decisions must be made as quickly as possible. A third common issue is performance: data lakes can be slow and cumbersome to use, which can lead to frustration and decreased productivity. The final common issue involves costs and lock-in. Data lakes can be expensive to set up and maintain, and the data can be difficult to export or share with other systems, which can lead to lock-in and decreased flexibility.<\/p>\n\n\n\n<h2 class=\"gb-headline gb-headline-8f55010b gb-headline-text\"><strong>Data Lake v. Data lakehouse. What\u2019s the difference?<\/strong><\/h2>\n\n\n\n<p>To overcome these limitations, many companies are turning to the Data Lakehouse model. <strong>The Data Lakehouse is a more structured and actively managed environment for data lakes<\/strong>, with features that make it easier to use, and get value from, the data. 
The Lakehouse model is an extension of the Data Lake concept, and addresses some of the limitations of traditional Data Lakes.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"614\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2021\/09\/network-3424070_1280-1-1024x614.jpg\" alt=\"\" class=\"wp-image-16680\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2021\/09\/network-3424070_1280-1-1024x614.jpg 1024w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2021\/09\/network-3424070_1280-1-300x180.jpg 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2021\/09\/network-3424070_1280-1-768x461.jpg 768w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2021\/09\/network-3424070_1280-1.jpg 1280w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Data Lakehouses are faster, more scalable, and have several built-in features.<\/figcaption><\/figure>\n\n\n\n<p>Data Lakehouses are a specific type of data lake that have been designed for real-time analysis and operations. <strong>Data Lakehouses are typically faster and more scalable<\/strong> than traditional data lakes, and have built-in features that support real-time ingestion and analysis, such as support for streaming data and time-series data.<\/p>\n\n\n\n<p>Lakehouses are built on a foundation of low-cost big data storage, which enables companies to create effective Data Pipelines with state-of-the-art performance. 
Lakehouses also include features that are essential for managing data at scale, such as:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>ACID transactions for reliable data processing<\/li>\n\n\n\n<li>A global namespace for managing data across multiple data stores<\/li>\n\n\n\n<li>A data catalog to help find and understand data<\/li>\n\n\n\n<li>Data quality and governance features to ensure that data is cleansed and standardized before use<\/li>\n<\/ul>\n\n\n\n<h3 class=\"gb-headline gb-headline-715a8483 gb-headline-text\"><strong>ACID transactions<\/strong><\/h3>\n\n\n\n<p>To understand why ACID transactions are necessary in Data Lakehouses, it\u2019s important to understand what ACID means: a set of properties guaranteeing that transactions are <strong>Atomic, Consistent, Isolated, and Durable<\/strong>. This means that when a transaction is executed, it is completed as a single unit and the data is left in a consistent state. The intermediate states of a transaction in progress are not visible to other transactions. Committed changes are also durable, meaning that they are preserved even in the event of a system failure. 
In particular:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Atomicity: the changes in a transaction are applied as a single unit, so either all of them take effect or none do, and other parties never see a partial change.<\/li>\n\n\n\n<li>Consistency: a transaction always leaves the data in a valid state, i.e., one that meets all the rules defined for it.<\/li>\n\n\n\n<li>Isolation: concurrent transactions are isolated from each other, so that one transaction can&#8217;t interfere with another.<\/li>\n\n\n\n<li>Durability: the results of a committed transaction persist, even if the power goes out or the system crashes.<\/li>\n<\/ul>\n\n\n\n<p>So, how can Lakehouses help companies with data pipelines to support business decisions?<\/p>\n\n\n\n<h3 class=\"gb-headline gb-headline-2791dc56 gb-headline-text\">How can Lakehouses help with data pipelines:<\/h3>\n\n\n\n<p>&#8211; By providing a foundation of low-cost big data storage, Lakehouses make it possible to build data pipelines that are both high-performance and low-cost.<\/p>\n\n\n\n<p>&#8211; The global namespace feature of Lakehouses simplifies data management across multiple data stores, making it easy to keep data in sync.<\/p>\n\n\n\n<p>&#8211; The data catalog feature of Lakehouses provides a single source of truth for understanding data, making it easy to find and use the data.<\/p>\n\n\n\n<p>&#8211; The data quality and governance features of Lakehouses help ensure that data is cleansed and standardized before use, so that users can be sure it meets their business requirements.<\/p>\n\n\n\n<h2 class=\"gb-headline gb-headline-e47d3de5 gb-headline-text\"><strong>How to Build Enhanced Data Pipelines<\/strong><\/h2>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"683\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-2-1024x683.jpg\" alt=\"\" class=\"wp-image-17086\" 
srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-2-1024x683.jpg 1024w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-2-300x200.jpg 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-2-768x512.jpg 768w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-2-1536x1024.jpg 1536w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-2-600x400.jpg 600w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-2.jpg 1920w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Enhanced Data Pipelines offer many benefits.<\/figcaption><\/figure>\n\n\n\n<p>Data lakes provide a single repository for all data, which is essential for data-driven organizations. Ingestion pipelines are a key part of data lake infrastructure, and must be designed for scale, throughput, and reliability. <strong>In particular, this article is interested in \u201cenhanced data pipelines&#8221;<\/strong>, a term used to describe a data pipeline that has been enhanced to include features such as real-time ingestion.&nbsp;<\/p>\n\n\n\n<p>The purpose of an enhanced data pipeline is to improve the performance and efficiency of the data pipeline. In particular, an enhanced data pipeline can help to improve the following:<\/p>\n\n\n\n<p>1. Performance: An enhanced data pipeline offers improved performance by reducing the time it takes to extract, cleanse, and transform the data.<\/p>\n\n\n\n<p>2. Efficiency: An enhanced data pipeline improves efficiency by reducing the amount of storage required to store the data.<\/p>\n\n\n\n<p>3. Scalability: An enhanced data pipeline improves scalability by allowing a pipeline to handle more data.<\/p>\n\n\n\n<p>4. 
Flexibility: An enhanced data pipeline improves flexibility by allowing a pipeline to handle a variety of data formats and ingestion approaches.<\/p>\n\n\n\n<p>How can companies build these ingestion pipelines?&nbsp;<\/p>\n\n\n\n<h3 class=\"gb-headline gb-headline-7ef54e07 gb-headline-text\">Apache Hudi<\/h3>\n\n\n\n<p>One solution is to use Apache Hudi, an open-source framework developed by Uber in 2016 that helps with managing large datasets on distributed file systems. <strong>The framework also provides native support for <em>Atomicity, Consistency, Isolation, and Durability<\/em> (ACID)<\/strong> transactions on your Data Lake. Designed for high throughput and reliability, Apache Hudi can handle large volumes of data. It can be used to ingest data from a variety of sources, including Apache Kafka, Amazon Kinesis, and Amazon S3. Hudi is not tied to a single engine: it integrates with processing engines such as Apache Spark and Apache Flink, and its tables can be queried from engines such as Presto, Trino, and Hive.&nbsp;<\/p>\n\n\n\n<p>Hudi is most commonly run on the widely used Spark framework and supports two types of tables: &#8220;<em>Copy On Write<\/em>&#8221; and &#8220;<em>Merge On Read<\/em>&#8221;.<\/p>\n\n\n\n<h4 class=\"gb-headline gb-headline-1f3083c4 gb-headline-text\"><strong>Copy on Write<\/strong><\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data is stored in a columnar file format (Parquet)<\/li>\n\n\n\n<li>Each write rewrites the affected files, creating a new version of them<\/li>\n\n\n\n<li>Most suitable for read-heavy batch workloads, as the latest version of the dataset is always available<\/li>\n<\/ul>\n\n\n\n<h4 class=\"gb-headline gb-headline-afb949e2 gb-headline-text\"><strong>Merge on Read<\/strong><\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data is stored as a combination of columnar (Parquet) and row-based (Avro) storage files<\/li>\n\n\n\n<li>Row-based delta files are compacted and merged on a regular basis to build new versions of the target columnar 
files<\/li>\n\n\n\n<li>This storage type is better suited for write-heavy streaming workloads<\/li>\n<\/ul>\n\n\n\n<p>One of Hudi\u2019s most useful features is its choice of query modes when reading data in tables: \u201c<em>Last Snapshot<\/em>\u201d, \u201c<em>Incremental<\/em>\u201d and \u201c<em>Point-in-time<\/em>\u201d queries are all supported.<\/p>\n\n\n\n<h3 class=\"gb-headline gb-headline-b8bbc9c0 gb-headline-text\">Alternative solutions<\/h3>\n\n\n\n<p>There are a number of alternative options for building similar real-time data pipelines. <strong>One popular option is Google Cloud Dataflow<\/strong>. Dataflow is a managed data processing service that runs Apache Beam pipelines in parallel, and can be used to process data from a variety of sources, including Apache Kafka, HDFS, and MongoDB.<\/p>\n\n\n\n<p>Another option is Delta Lake, an open-source project providing features and performance similar to those available in Hudi.<\/p>\n\n\n\n<h2 class=\"gb-headline gb-headline-38660e66 gb-headline-text\"><strong>Best Practices for building Data Lakehouses<\/strong><\/h2>\n\n\n\n<p>There are many best practices when developing Data Lakehouses. Here are a few of the most important ones:&nbsp;<\/p>\n\n\n\n<p>1. <em><strong>Develop a data governance plan<\/strong><\/em>. Essential for any data lakehouse, this plan should define who has access to which data, who is responsible for maintaining the data, and how the data will be cleansed and standardized.&nbsp;<\/p>\n\n\n\n<p>2. <em><strong>Create a data catalog<\/strong><\/em>. A key part of any data lakehouse, the catalog helps people find and understand the data stored in the data lake. The catalog should include information about the data, such as the source, format, and schema.&nbsp;<\/p>\n\n\n\n<p>3. <em><strong>Choose the best Data Platform<\/strong><\/em>. A good Data Platform is essential for managing an effective Data Lakehouse. 
Many options exist, mostly cloud-native or hybrid, but it\u2019s also possible to build a Data Platform in an on-premises data center. The choice of solution must take into account specific concerns about availability, cost, security and interoperability.&nbsp;<\/p>\n\n\n\n<p>4. <em><strong>Cleanse and standardize the data<\/strong><\/em>. The quality of the data is one of the most important aspects of a Data Lakehouse. Data should be cleansed and standardized to ensure that it is accurate and trustworthy.&nbsp;<\/p>\n\n\n\n<p>5. <em><strong>Use big data analytics tools<\/strong><\/em>. Such tools are essential for analyzing the data in a data lakehouse. They should include features for data exploration, visualization, and machine learning.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This article by Codemotion and Deloitte shares insights about the characteristics and benefits of Data Lakehouses &#8211; a combination of Data Lakes and Data Warehouses An introduction to Data Lakes [note: although \u201cdata\u201d is technically a plural noun, in this article, as it is widely the standard in the field, it is used as a&#8230; <a class=\"more-link\" href=\"https:\/\/www.codemotion.com\/magazine\/devops\/cloud\/enabling-the-data-lakehouse\/\">Read 
more<\/a><\/p>\n","protected":false},"author":58,"featured_media":17083,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_editorskit_title_hidden":false,"_editorskit_reading_time":6,"_editorskit_is_block_options_detached":false,"_editorskit_block_options_position":"{}","_uag_custom_page_level_css":"","_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","footnotes":""},"categories":[5244],"tags":[5571,9929],"collections":[],"class_list":{"0":"post-17076","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-cloud","8":"tag-big-data","9":"tag-cloud","10":"entry"},"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.9 (Yoast SEO v27.5) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Enabling the Data Lakehouse - Codemotion Magazine<\/title>\n<meta name=\"description\" content=\"Discover how to get the most out of the data lakehouse approach and the differences with traditional data warehouses.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.codemotion.com\/magazine\/devops\/cloud\/enabling-the-data-lakehouse\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Enabling the Data Lakehouse\" \/>\n<meta property=\"og:description\" content=\"Discover how to get the most out of the data lakehouse approach and the differences with traditional data warehouses.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.codemotion.com\/magazine\/devops\/cloud\/enabling-the-data-lakehouse\/\" \/>\n<meta property=\"og:site_name\" 
content=\"Codemotion Magazine\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/Codemotion.Italy\/\" \/>\n<meta property=\"article:published_time\" content=\"2022-02-22T08:41:01+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-05-30T12:42:10+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-1.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1918\" \/>\n\t<meta property=\"og:image:height\" content=\"1079\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Norman Di Palo\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@CodemotionIT\" \/>\n<meta name=\"twitter:site\" content=\"@CodemotionIT\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Norman Di Palo\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/devops\\\/cloud\\\/enabling-the-data-lakehouse\\\/#article\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/devops\\\/cloud\\\/enabling-the-data-lakehouse\\\/\"},\"author\":{\"name\":\"Norman Di Palo\",\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/#\\\/schema\\\/person\\\/55131e26e4c59236d55c04a6bb1363d0\"},\"headline\":\"Enabling the Data Lakehouse\",\"datePublished\":\"2022-02-22T08:41:01+00:00\",\"dateModified\":\"2023-05-30T12:42:10+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/devops\\\/cloud\\\/enabling-the-data-lakehouse\\\/\"},\"wordCount\":1512,\"publisher\":{\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/#organization\"},\"image\":{\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/devops\\\/cloud\\\/enabling-the-data-lakehouse\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/wp-content\\\/uploads\\\/2022\\\/02\\\/Data-Lakehouse-1.jpg\",\"keywords\":[\"Big Data\",\"Cloud\"],\"articleSection\":[\"Cloud\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/devops\\\/cloud\\\/enabling-the-data-lakehouse\\\/\",\"url\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/devops\\\/cloud\\\/enabling-the-data-lakehouse\\\/\",\"name\":\"Enabling the Data Lakehouse - Codemotion 
Magazine\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/devops\\\/cloud\\\/enabling-the-data-lakehouse\\\/#primaryimage\"},\"image\":{\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/devops\\\/cloud\\\/enabling-the-data-lakehouse\\\/#primaryimage\"},\"thumbnailUrl\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/wp-content\\\/uploads\\\/2022\\\/02\\\/Data-Lakehouse-1.jpg\",\"datePublished\":\"2022-02-22T08:41:01+00:00\",\"dateModified\":\"2023-05-30T12:42:10+00:00\",\"description\":\"Discover how to get the most out of the data lakehouse approach and the differences with traditional data warehouses.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/devops\\\/cloud\\\/enabling-the-data-lakehouse\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/devops\\\/cloud\\\/enabling-the-data-lakehouse\\\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/devops\\\/cloud\\\/enabling-the-data-lakehouse\\\/#primaryimage\",\"url\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/wp-content\\\/uploads\\\/2022\\\/02\\\/Data-Lakehouse-1.jpg\",\"contentUrl\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/wp-content\\\/uploads\\\/2022\\\/02\\\/Data-Lakehouse-1.jpg\",\"width\":1918,\"height\":1079},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/devops\\\/cloud\\\/enabling-the-data-lakehouse\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"DevOps\",\"item\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/devops\\\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":
\"Cloud\",\"item\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/devops\\\/cloud\\\/\"},{\"@type\":\"ListItem\",\"position\":4,\"name\":\"Enabling the Data Lakehouse\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/#website\",\"url\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/\",\"name\":\"Codemotion Magazine\",\"description\":\"We code the future. Together\",\"publisher\":{\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/#organization\",\"name\":\"Codemotion\",\"url\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/#\\\/schema\\\/logo\\\/image\\\/\",\"url\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/wp-content\\\/uploads\\\/2019\\\/11\\\/codemotionlogo.png\",\"contentUrl\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/wp-content\\\/uploads\\\/2019\\\/11\\\/codemotionlogo.png\",\"width\":225,\"height\":225,\"caption\":\"Codemotion\"},\"image\":{\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/#\\\/schema\\\/logo\\\/image\\\/\"},\"sameAs\":[\"https:\\\/\\\/www.facebook.com\\\/Codemotion.Italy\\\/\",\"https:\\\/\\\/x.com\\\/CodemotionIT\"]},{\"@type\":\"Person\",\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/#\\\/schema\\\/person\\\/55131e26e4c59236d55c04a6bb1363d0\",\"name\":\"Norman Di 
Palo\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/wp-content\\\/uploads\\\/2024\\\/03\\\/norman-di-palo-100x100.jpeg\",\"url\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/wp-content\\\/uploads\\\/2024\\\/03\\\/norman-di-palo-100x100.jpeg\",\"contentUrl\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/wp-content\\\/uploads\\\/2024\\\/03\\\/norman-di-palo-100x100.jpeg\",\"caption\":\"Norman Di Palo\"},\"description\":\"My name is Norman Di Palo, I\u2019m a Robotics and Artificial Intelligence student, researcher and consultant from Rome, Italy. I'm a public speaker and I've given several talks at tech events. I am founder and consultant for startups in Rome and Palo Alto. I write about my work and research on my blog, that is read by tens of thousands of people. I mostly enjoy robotics, deep learning, design, vinyls, and good coffee.\",\"url\":\"https:\\\/\\\/www.codemotion.com\\\/magazine\\\/author\\\/norman-di-palo\\\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Enabling the Data Lakehouse - Codemotion Magazine","description":"Discover how to get the most out of the data lakehouse approach and the differences with traditional data warehouses.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.codemotion.com\/magazine\/devops\/cloud\/enabling-the-data-lakehouse\/","og_locale":"en_US","og_type":"article","og_title":"Enabling the Data Lakehouse","og_description":"Discover how to get the most out of the data lakehouse approach and the differences with traditional data warehouses.","og_url":"https:\/\/www.codemotion.com\/magazine\/devops\/cloud\/enabling-the-data-lakehouse\/","og_site_name":"Codemotion Magazine","article_publisher":"https:\/\/www.facebook.com\/Codemotion.Italy\/","article_published_time":"2022-02-22T08:41:01+00:00","article_modified_time":"2023-05-30T12:42:10+00:00","og_image":[{"width":1918,"height":1079,"url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-1.jpg","type":"image\/jpeg"}],"author":"Norman Di Palo","twitter_card":"summary_large_image","twitter_creator":"@CodemotionIT","twitter_site":"@CodemotionIT","twitter_misc":{"Written by":"Norman Di Palo","Est. 
reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.codemotion.com\/magazine\/devops\/cloud\/enabling-the-data-lakehouse\/#article","isPartOf":{"@id":"https:\/\/www.codemotion.com\/magazine\/devops\/cloud\/enabling-the-data-lakehouse\/"},"author":{"name":"Norman Di Palo","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/55131e26e4c59236d55c04a6bb1363d0"},"headline":"Enabling the Data Lakehouse","datePublished":"2022-02-22T08:41:01+00:00","dateModified":"2023-05-30T12:42:10+00:00","mainEntityOfPage":{"@id":"https:\/\/www.codemotion.com\/magazine\/devops\/cloud\/enabling-the-data-lakehouse\/"},"wordCount":1512,"publisher":{"@id":"https:\/\/www.codemotion.com\/magazine\/#organization"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/devops\/cloud\/enabling-the-data-lakehouse\/#primaryimage"},"thumbnailUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-1.jpg","keywords":["Big Data","Cloud"],"articleSection":["Cloud"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.codemotion.com\/magazine\/devops\/cloud\/enabling-the-data-lakehouse\/","url":"https:\/\/www.codemotion.com\/magazine\/devops\/cloud\/enabling-the-data-lakehouse\/","name":"Enabling the Data Lakehouse - Codemotion Magazine","isPartOf":{"@id":"https:\/\/www.codemotion.com\/magazine\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.codemotion.com\/magazine\/devops\/cloud\/enabling-the-data-lakehouse\/#primaryimage"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/devops\/cloud\/enabling-the-data-lakehouse\/#primaryimage"},"thumbnailUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-1.jpg","datePublished":"2022-02-22T08:41:01+00:00","dateModified":"2023-05-30T12:42:10+00:00","description":"Discover how to get the most out of the data lakehouse approach and the differences with traditional data 
warehouses.","breadcrumb":{"@id":"https:\/\/www.codemotion.com\/magazine\/devops\/cloud\/enabling-the-data-lakehouse\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.codemotion.com\/magazine\/devops\/cloud\/enabling-the-data-lakehouse\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/devops\/cloud\/enabling-the-data-lakehouse\/#primaryimage","url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-1.jpg","contentUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-1.jpg","width":1918,"height":1079},{"@type":"BreadcrumbList","@id":"https:\/\/www.codemotion.com\/magazine\/devops\/cloud\/enabling-the-data-lakehouse\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.codemotion.com\/magazine\/"},{"@type":"ListItem","position":2,"name":"DevOps","item":"https:\/\/www.codemotion.com\/magazine\/devops\/"},{"@type":"ListItem","position":3,"name":"Cloud","item":"https:\/\/www.codemotion.com\/magazine\/devops\/cloud\/"},{"@type":"ListItem","position":4,"name":"Enabling the Data Lakehouse"}]},{"@type":"WebSite","@id":"https:\/\/www.codemotion.com\/magazine\/#website","url":"https:\/\/www.codemotion.com\/magazine\/","name":"Codemotion Magazine","description":"We code the future. 
Together","publisher":{"@id":"https:\/\/www.codemotion.com\/magazine\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.codemotion.com\/magazine\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.codemotion.com\/magazine\/#organization","name":"Codemotion","url":"https:\/\/www.codemotion.com\/magazine\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/","url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png","contentUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png","width":225,"height":225,"caption":"Codemotion"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/Codemotion.Italy\/","https:\/\/x.com\/CodemotionIT"]},{"@type":"Person","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/55131e26e4c59236d55c04a6bb1363d0","name":"Norman Di Palo","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2024\/03\/norman-di-palo-100x100.jpeg","url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2024\/03\/norman-di-palo-100x100.jpeg","contentUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2024\/03\/norman-di-palo-100x100.jpeg","caption":"Norman Di Palo"},"description":"My name is Norman Di Palo, I\u2019m a Robotics and Artificial Intelligence student, researcher and consultant from Rome, Italy. I'm a public speaker and I've given several talks at tech events. I am founder and consultant for startups in Rome and Palo Alto. I write about my work and research on my blog, that is read by tens of thousands of people. 
I mostly enjoy robotics, deep learning, design, vinyls, and good coffee.","url":"https:\/\/www.codemotion.com\/magazine\/author\/norman-di-palo\/"}]}},"featured_image_src":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-1-600x400.jpg","featured_image_src_square":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-1-600x600.jpg","author_info":{"display_name":"Norman Di Palo","author_link":"https:\/\/www.codemotion.com\/magazine\/author\/norman-di-palo\/"},"uagb_featured_image_src":{"full":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-1.jpg",1918,1079,false],"thumbnail":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-1-150x150.jpg",150,150,true],"medium":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-1-300x169.jpg",300,169,true],"medium_large":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-1-768x432.jpg",768,432,true],"large":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-1-1024x576.jpg",1024,576,true],"1536x1536":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-1-1536x864.jpg",1536,864,true],"2048x2048":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-1.jpg",1918,1079,false],"small-home-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-1.jpg",100,56,false],"sidebar-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-1-180x128.jpg",180,128,true],"genesis-singular-images":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-1-896x504.jpg",896,504,true],"archive-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-1-400x225.jpg",400,225,true],"gb-bl
ock-post-grid-landscape":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-1-600x400.jpg",600,400,true],"gb-block-post-grid-square":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2022\/02\/Data-Lakehouse-1-600x600.jpg",600,600,true]},"uagb_author_info":{"display_name":"Norman Di Palo","author_link":"https:\/\/www.codemotion.com\/magazine\/author\/norman-di-palo\/"},"uagb_comment_info":0,"uagb_excerpt":"This article by Codemotion and Deloitte shares insights about the characteristics and benefits of Data Lakehouses &#8211; a combination of Data Lakes and Data Warehouses An introduction to Data Lakes [note: although \u201cdata\u201d is technically a plural noun, in this article, as it is widely the standard in the field, it is used as a&#8230;&hellip;","lang":"en","_links":{"self":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/17076","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/users\/58"}],"replies":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/comments?post=17076"}],"version-history":[{"count":10,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/17076\/revisions"}],"predecessor-version":[{"id":20986,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/17076\/revisions\/20986"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/media\/17083"}],"wp:attachment":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/media?parent=17076"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/categories?post=17076"},{"taxonomy":"p
ost_tag","embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/tags?post=17076"},{"taxonomy":"collections","embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/collections?post=17076"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}