{"id":20300,"date":"2023-03-07T09:00:00","date_gmt":"2023-03-07T08:00:00","guid":{"rendered":"https:\/\/www.codemotion.com\/magazine\/?p=20300"},"modified":"2023-06-23T15:11:49","modified_gmt":"2023-06-23T13:11:49","slug":"mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud","status":"publish","type":"post","link":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/","title":{"rendered":"MapReduce Not Dead: Here\u2019s Why It\u2019s Still Ruling in the Cloud"},"content":{"rendered":"\n<p>MapReduce is a popular programming model widely used in data services and<a aria-label=\" (opens in a new tab)\" href=\"https:\/\/www.codemotion.com\/magazine\/devops\/cloud\/is-distributed-cloud-the-future-of-cloud-architecture\/\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"ek-link\"> cloud frameworks<\/a>. It plays a central role in the <strong>processing of big data sets, using distributed algorithms and potentially massive parallel operations<\/strong>. Though it is based on quite simple principles, it is recognised by many engineers as one of the most<a aria-label=\" (opens in a new tab)\" href=\"https:\/\/medium.com\/@oscarstiffelman\/a-brief-history-of-mapreduce-97aec97df8ff\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"ek-link\"> important tech innovations<\/a> in recent years. However, some of the originators of the technology have now abandoned it in favour of different data processing frameworks. 
So is MapReduce still as vital as it was a few years ago?<\/p>\n\n\n\n<p>MapReduce is used for a wide variety of data-intensive operations that are vital for today&#8217;s cloud operations and <strong><a aria-label=\"rapidly developing machine learning applications (opens in a new tab)\" href=\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/top-ai-trends-in-software-development-you-need-to-watch-out-in-2023\/\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"ek-link\">rapidly developing machine learning applications<\/a><\/strong>. It can be used for searching, sorting, indexing and numerous statistical procedures. Given the prevalence of cloud-based systems and frameworks, MapReduce is also well suited to multi-cluster, parallel and other distributed environments where performance is key.<\/p>\n\n\n\n<p>For this article, we&#8217;ve teamed up with the leading technology company<a aria-label=\" (opens in a new tab)\" href=\"https:\/\/www.luxoft.com\/\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"ek-link\"> Luxoft<\/a> to understand what MapReduce is and how it&#8217;s used, especially in big data and machine learning.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-introduction-to-mapreduce\"><strong>Introduction to MapReduce<\/strong><\/h2>\n\n\n\n<p>In a world where parallel processing, cloud infrastructures and vast datasets have become the norm, MapReduce plays an essential role in helping engineers sort through masses of information. Its features are well suited to massive networked infrastructures.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-what-is-mapreduce\"><strong>What is MapReduce?<\/strong><\/h3>\n\n\n\n<p>MapReduce replaces some functions that might previously have been carried out by search operations in relational database management systems (RDBMS). 
<strong>But MapReduce is more than simply a replacement for databases.<\/strong> It is not a data storage system as such, and it can work equally well with databases or file systems. It is, rather, an algorithmic framework for processing datasets into more usable structures.<\/p>\n\n\n\n<p>Though there are several different implementations, the term <strong>&#8216;MapReduce&#8217; was first used by Google for its own proprietary web indexing technology<\/strong>. Google used it to supersede a diverse set of algorithms that it had developed piecemeal over many years, and MapReduce brought greater clarity and simplicity to its indexing and analysis. Google filed a patent on it in 2004, but numerous open source implementations have since been developed, including Apache Hadoop and<a aria-label=\" (opens in a new tab)\" href=\"https:\/\/couchdb.apache.org\/\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"ek-link\"> CouchDB<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-how-it-works\"><strong>How it works<\/strong><\/h3>\n\n\n\n<p>At its core, MapReduce is a relatively simple data processing framework that uses a<a aria-label=\" (opens in a new tab)\" href=\"https:\/\/www.jstatsoft.org\/article\/view\/v040i01\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"ek-link\"> Split-Apply-Combine strategy<\/a>. <strong>This approach, also known as &#8216;divide and conquer&#8217;, means breaking a problem down into small chunks, processing each chunk and then reassembling the results<\/strong>. For MapReduce, the problem domain is typically a huge and often unstructured dataset, in which some order must be found or created to make the data usable.<\/p>\n\n\n\n<p>As the name suggests, the functionality can be broken down into two key processes, &#8216;map&#8217; and &#8216;reduce&#8217;, both familiar from functional programming. 
Crucially, both procedures can be carried out either sequentially or in parallel, which yields enormous speed advantages when they are implemented on cloud-based infrastructures.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-advantages-and-disadvantages\"><strong>Advantages and disadvantages<\/strong><\/h3>\n\n\n\n<p>MapReduce has proved more broadly useful than initially intended. Google first introduced (and patented) it just for its indexing operations, but it has since proved invaluable for a whole host of large-scale data problems. <strong>Its logic has been applied to everything from human genome decoding to natural language processing<\/strong>. It also powers the analysis of the huge datasets needed by today&#8217;s AI applications and machine learning systems. In each of these situations, its main benefit is its amenability to massively parallel processing architectures.<\/p>\n\n\n\n<p>However, some consider it too disk-oriented, since intermediate results are written to disk between stages, which can mean a lot of resource-intensive I\/O. It has also been argued that its tendency to rebuild indexes from scratch, rather than make incremental changes, can be unnecessary and wasteful. For these reasons, Google itself has moved on to frameworks such as<a aria-label=\" (opens in a new tab)\" href=\"https:\/\/research.google\/pubs\/pub36726\/\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"ek-link\"> Percolator<\/a> and<a href=\"https:\/\/www.the-paper-trail.org\/post\/2014-06-04-paper-notes-stream-processing-at-google-with-millwheel\/\" class=\"ek-link\"> MillWheel<\/a>, which rely on incremental updates and stream processing respectively rather than batch processing, avoiding full index reconstruction. Google&#8217;s use cases may not be typical, though. Not all data processors favour this approach, and MapReduce still has many adherents.<\/p>\n\n\n\n<p>Implementations do need care with their partition functions, which decide which reducer receives each key: writing excess intermediate data at this stage, or sending a disproportionate share of keys to a few reducers, can seriously impact performance. 
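<\/p>\n\n\n\n<p>To make the map, shuffle and reduce stages and the role of the partition function more concrete, here is a minimal single-process sketch of a word count in Python. It is an illustration under stated assumptions, not the Hadoop API: the function names are hypothetical, and the partitioner is assumed to follow the common hash-modulo scheme (Hadoop&#8217;s default <code>HashPartitioner<\/code> works along these lines), assigning each key to one of <code>num_reducers<\/code> buckets.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>
```python
from collections import defaultdict
from zlib import crc32

# Toy, single-process model of a MapReduce word count.
# All function names are hypothetical; this is not the Hadoop API.


def map_phase(line):
    # Map: emit a (word, 1) pair for every word in one input record.
    return [(word.lower(), 1) for word in line.split()]


def partition(key, num_reducers):
    # Hash partitioner: crc32 is stable across processes, so a given key
    # is always routed to the same reducer, whichever mapper emitted it.
    return crc32(key.encode('utf-8')) % num_reducers


def shuffle(pairs, num_reducers):
    # Shuffle: send each pair to its reducer bucket and group values by key.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return buckets


def reduce_phase(key, values):
    # Reduce: summarise all values for one key; here, a simple count.
    return key, sum(values)


def word_count(lines, num_reducers=3):
    pairs = [pair for line in lines for pair in map_phase(line)]
    results = {}
    for bucket in shuffle(pairs, num_reducers):
        for key, values in bucket.items():
            word, total = reduce_phase(key, values)
            results[word] = total
    return results


docs = ['the cat sat', 'the dog sat', 'The end']
print(sorted(word_count(docs).items()))
# [('cat', 1), ('dog', 1), ('end', 1), ('sat', 2), ('the', 3)]
```
<\/code><\/pre>\n\n\n\n<p>The sketch uses <code>zlib.crc32<\/code> rather than Python&#8217;s built-in <code>hash<\/code>, which is salted per process for strings, so every occurrence of a key lands in the same bucket no matter which process maps it. Changing <code>num_reducers<\/code> changes how the work is spread, not the final counts; a key distribution that piles most pairs into one bucket is exactly the kind of imbalance a poor partition function can cause.<\/p>\n\n\n\n<p>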
But MapReduce still offers a simple model with excellent speed, scalability and flexibility, as well as the resilience and security that are important for public-facing applications.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-mapreduce-and-big-data\"><strong>MapReduce and Big Data<\/strong><\/h2>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"518\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/08\/BigData_2267x1146_trasparent-1024x518.png\" alt=\"Big Data\" class=\"wp-image-7724\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/08\/BigData_2267x1146_trasparent-1024x518.png 1024w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/08\/BigData_2267x1146_trasparent-300x152.png 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/08\/BigData_2267x1146_trasparent-768x388.png 768w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2020\/08\/BigData_2267x1146_trasparent.png 1200w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Big Data is where MapReduce really comes into its own. Wrangling huge datasets can be computationally expensive, so it is vital to make the best use of resources.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-is-mapreduce-the-answer-to-your-big-data-problem\"><strong>Is MapReduce the answer to your big data problem?<\/strong><\/h2>\n\n\n\n<p>The modern digital landscape is made up of vast amounts of data, and machine learning relies on detecting patterns in these enormous datasets. So a technology that can bring order to unstructured data is essential for effective systems. <strong>But such systems need to be able to work at scale<\/strong>. That means taking advantage of massively parallel processing and extensive cloud distribution. 
MapReduce achieves the necessary performance gains by spreading storage and processing across many nodes, typically on top of a distributed file system, rather than relying on centralised processing.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-distributed-algorithms-how-they-work\"><strong>Distributed algorithms: how they work<\/strong><\/h2>\n\n\n\n<p>As noted above, the key operations of the MapReduce framework are quite simple in principle and inherently suited to distributed processing. Processing can take place across nodes within a local network or be distributed globally through cloud infrastructures. Where possible, each node works on data stored locally, which reduces transfer costs across the network. The process rests on three steps:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>The processing nodes apply the <strong>map<\/strong> function to filter or transform input data, based on chosen properties, into intermediate key-value pairs.<\/li>\n\n\n\n<li>Nodes redistribute the data according to the keys generated by the map function, so that all values for a given key end up at the same node. This is known as the <strong>shuffle<\/strong> step.<\/li>\n\n\n\n<li>Data is then processed with the <strong>reduce<\/strong> operation to generate summarised results such as counts, averages or other required outputs.<\/li>\n<\/ol>\n\n\n\n<p><a aria-label=\" (opens in a new tab)\" href=\"https:\/\/hadoop.apache.org\/\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"ek-link\">Apache Hadoop<\/a> is a good example of this approach. It&#8217;s an open source framework for distributed computing that uses the MapReduce model to scale from a single machine to potentially thousands of cloud-based nodes. It supports distributed shuffles and works with distributed file systems such as HDFS to increase performance and to cope with node failures caused by hardware problems.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-how-big-e-commerce-uses-mapreduce\"><strong>How Big e-Commerce uses MapReduce<\/strong><\/h2>\n\n\n\n<p>An important use case of MapReduce is e-commerce. 
Companies like Amazon, eBay and Alibaba use the framework along with cloud technologies to drive sales initiatives. <strong>A typical example is the identification of products to target based on users\u2019 interests or previous buying behaviour<\/strong>. Such assessments can draw on a wide array of data from many different sources.<a aria-label=\" (opens in a new tab)\" href=\"https:\/\/docs.aws.amazon.com\/emr\/latest\/ManagementGuide\/emr-what-is-emr.html\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"ek-link\"> Amazon&#8217;s Elastic MapReduce (EMR)<\/a> is a commonly used service for such applications.<\/p>\n\n\n\n<p>EMR runs on a cluster of Amazon EC2 instances. The master node coordinates the distribution of tasks and data to the core and task nodes that carry out the processing. <strong>The data is typically stored as files on each node and passes sequentially through the processing stages.<\/strong> On completion, the results can be written to a location such as an Amazon S3 bucket. EMR thus provides an easily deployed, efficient and scalable solution for big e-commerce data processing requirements. 
Crucially, EMR can also use server-side and client-side encryption to protect sensitive customer data.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"590\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2021\/12\/headless-ecommerce-e1640018954352-1024x590.webp\" alt=\"Headless CMS Ecommerce\" class=\"wp-image-16643\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2021\/12\/headless-ecommerce-e1640018954352-1024x590.webp 1024w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2021\/12\/headless-ecommerce-e1640018954352-300x173.webp 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2021\/12\/headless-ecommerce-e1640018954352-768x442.webp 768w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2021\/12\/headless-ecommerce-e1640018954352-1536x885.webp 1536w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2021\/12\/headless-ecommerce-e1640018954352-2048x1179.webp 2048w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Companies like Amazon, eBay and Alibaba leverage MapReduce.<\/figcaption><\/figure>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-uses-in-machine-learning-and-multicore\"><strong>Uses in Machine Learning and Multicore<\/strong><\/h2>\n\n\n\n<p><a href=\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/machine-learning\/6-courses-to-dive-deep-into-machine-learning-in-2022\/\" target=\"_blank\" aria-label=\" (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"ek-link\">Machine learning<\/a> depends on large datasets for its effectiveness. But the processing involved can be computationally expensive. Traditionally, programmers looking for performance gains have endeavoured to speed up individual algorithms. 
While great ingenuity has yielded some impressive results, there are limits to how much can be done on a single processor.<\/p>\n\n\n\n<p>We are by now very familiar with the concept of multicore processors, but they are only as useful as the algorithms developed to make good use of them. <strong>For many years, programmers working on real-time systems have needed to consider carefully how to manage multi-threaded applications, and the solutions are not always obvious<\/strong>. In particular, it has taken some work to adapt machine learning processes to use multicore processors. This extends naturally to massive parallelisation across networks, and MapReduce has been key in the development of such approaches.<\/p>\n\n\n\n<p>It is an indication of MapReduce&#8217;s flexibility and broad applicability that it can be used as readily on multicore machines as for parallelisation across networks. <strong>With multicore use, it also requires less failover provision, so the architecture can be lighter<\/strong>. The MapReduce framework can be used to parallelise a number of algorithms that have been important for the development of ML technologies, such as locally weighted linear regression (LWLR), naive Bayes (NB) and neural networks.<\/p>\n\n\n\n<p>As data collection continues to expand, the techniques used to combine and process that data need to scale with it so that ML applications can benefit. <strong>MapReduce is still proving an invaluable framework in the cloud for ML and for the data analytics of all sorts that power today&#8217;s e-Commerce,<a aria-label=\" (opens in a new tab)\" href=\"https:\/\/www.codemotion.com\/magazine\/infographics\/iot-trends-and-buzzwords-today\/\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"ek-link\"> IoT<\/a> and numerous web services<\/strong>. 
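<\/p>\n\n\n\n<p>Algorithms such as naive Bayes suit the MapReduce model because their training statistics are sums: counts computed separately on each chunk of data can simply be added together. The Python sketch below illustrates this under stated assumptions: the tiny labelled dataset, its split into two chunks and all function names are hypothetical, and only the counting step of naive Bayes training is shown, not the probability estimates built from it.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>
```python
from collections import Counter
from functools import reduce

# Sketch of the summation form used to parallelise naive Bayes training:
# each chunk (for example, one per core) is counted independently and the
# partial counts are then added together. Names and data are hypothetical.


def map_chunk(chunk):
    # Map: count labels and (label, word) pairs within one chunk.
    counts = Counter()
    for label, words in chunk:
        counts[('class', label)] += 1
        for word in words:
            counts[('word', label, word)] += 1
    return counts


def reduce_counts(left, right):
    # Reduce: adding Counters merges partial counts. Addition is
    # associative, so chunks may be combined in any grouping.
    return left + right


def train_counts(chunks, mapper=map):
    # mapper defaults to the built-in map; a parallel map with the same
    # calling convention could be swapped in without changing the result.
    return reduce(reduce_counts, mapper(map_chunk, chunks), Counter())


chunks = [
    [('spam', ['win', 'cash']), ('ham', ['meeting', 'today'])],
    [('spam', ['cash', 'now'])],
]
counts = train_counts(chunks)
print(counts[('class', 'spam')], counts[('word', 'spam', 'cash')])  # 2 2
```
<\/code><\/pre>\n\n\n\n<p>With the default built-in <code>map<\/code> everything runs on one core. Passing the <code>map<\/code> method of a <code>concurrent.futures.ProcessPoolExecutor<\/code> as <code>mapper<\/code> is one way to spread the map step across cores, provided the functions live in an importable module so they can be pickled; because counter addition is associative, the totals do not change.<\/p>\n\n\n\n<p>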
To discover more, check out<a aria-label=\" (opens in a new tab)\" href=\"https:\/\/www.luxoft.com\/\" target=\"_blank\" rel=\"noreferrer noopener\" class=\"ek-link\"> Luxoft<\/a> to see how they use MapReduce in innovative real-world technologies.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><a href=\"https:\/\/career.luxoft.com\/locations\/italy\/?utm_source=Codemotion&amp;utm_medium=banner&amp;utm_campaign=articles\" target=\"_blank\" rel=\"noreferrer noopener\"><img loading=\"lazy\" decoding=\"async\" width=\"976\" height=\"251\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/01\/Banner_Luxoft_dec2-v4.png\" alt=\"\" class=\"wp-image-19912\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/01\/Banner_Luxoft_dec2-v4.png 976w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/01\/Banner_Luxoft_dec2-v4-300x77.png 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/01\/Banner_Luxoft_dec2-v4-768x198.png 768w\" sizes=\"auto, (max-width: 976px) 100vw, 976px\" \/><\/a><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>MapReduce is a popular programming model widely used in data services and cloud frameworks. It plays a central role in the processing of big data sets, using distributed algorithms and potentially massive parallel operations. 
Though it is based on quite simple principles, it is recognised by many engineers as one of the most important tech&#8230; <a class=\"more-link\" href=\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/\">Read more<\/a><\/p>\n","protected":false},"author":64,"featured_media":20303,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_editorskit_title_hidden":false,"_editorskit_reading_time":6,"_editorskit_is_block_options_detached":false,"_editorskit_block_options_position":"{}","_uag_custom_page_level_css":"","_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","footnotes":""},"categories":[16],"tags":[],"collections":[],"class_list":{"0":"post-20300","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-big-data","8":"entry"},"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.9 (Yoast SEO v26.9) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Why MapReduce Still Rules the Cloud - Codemotion<\/title>\n<meta name=\"description\" content=\"MapReduce is still a vital tool for big ecommerce companies. It&#039;s also becoming more and more popular in machine learning. 
Read on!\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"MapReduce Not Dead: Here\u2019s Why It\u2019s Still Ruling in the Cloud\" \/>\n<meta property=\"og:description\" content=\"MapReduce is still a vital tool for big ecommerce companies. It&#039;s also becoming more and more popular in machine learning. Read on!\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/\" \/>\n<meta property=\"og:site_name\" content=\"Codemotion Magazine\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/Codemotion.Italy\/\" \/>\n<meta property=\"article:published_time\" content=\"2023-03-07T08:00:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-06-23T13:11:49+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1280\" \/>\n\t<meta property=\"og:image:height\" content=\"853\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Codemotion\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@CodemotionIT\" \/>\n<meta name=\"twitter:site\" content=\"@CodemotionIT\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Codemotion\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"7 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/\"},\"author\":{\"name\":\"Codemotion\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/201bb98b02412383686cced7521b861c\"},\"headline\":\"MapReduce Not Dead: Here\u2019s Why It\u2019s Still Ruling in the Cloud\",\"datePublished\":\"2023-03-07T08:00:00+00:00\",\"dateModified\":\"2023-06-23T13:11:49+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/\"},\"wordCount\":1454,\"publisher\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280.png\",\"articleSection\":[\"Big Data\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/\",\"name\":\"Why MapReduce Still Rules the Cloud - 
Codemotion\",\"isPartOf\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280.png\",\"datePublished\":\"2023-03-07T08:00:00+00:00\",\"dateModified\":\"2023-06-23T13:11:49+00:00\",\"description\":\"MapReduce is still a vital tool for big ecommerce companies. It's also becoming more and more popular in machine learning. Read on!\",\"breadcrumb\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/#primaryimage\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280.png\",\"contentUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280.png\",\"width\":1280,\"height\":853},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.codemotion.com\/magazine\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"AI\/ML\",\"item\":\"https:\/\/www.codemotion.com\/
magazine\/ai-ml\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Big Data\",\"item\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/\"},{\"@type\":\"ListItem\",\"position\":4,\"name\":\"MapReduce Not Dead: Here\u2019s Why It\u2019s Still Ruling in the Cloud\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#website\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/\",\"name\":\"Codemotion Magazine\",\"description\":\"We code the future. Together\",\"publisher\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.codemotion.com\/magazine\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#organization\",\"name\":\"Codemotion\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png\",\"contentUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png\",\"width\":225,\"height\":225,\"caption\":\"Codemotion\"},\"image\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/Codemotion.Italy\/\",\"https:\/\/x.com\/CodemotionIT\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/201bb98b02412383686cced7521b861c\",\"name\":\"Codemotion\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/www.codemotion.com\/maga
zine\/wp-content\/uploads\/2019\/11\/cropped-codemotionlogo-150x150.png\",\"contentUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/cropped-codemotionlogo-150x150.png\",\"caption\":\"Codemotion\"},\"description\":\"Articles wirtten by the Codemotion staff. Tech news, inspiration, latest treends in software development and more.\",\"sameAs\":[\"https:\/\/x.com\/CodemotionIT\"],\"url\":\"https:\/\/www.codemotion.com\/magazine\/author\/codemotion-2\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Why MapReduce Still Rules the Cloud - Codemotion","description":"MapReduce is still a vital tool for big ecommerce companies. It's also becoming more and more popular in machine learning. Read on!","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/","og_locale":"en_US","og_type":"article","og_title":"MapReduce Not Dead: Here\u2019s Why It\u2019s Still Ruling in the Cloud","og_description":"MapReduce is still a vital tool for big ecommerce companies. It's also becoming more and more popular in machine learning. 
Read on!","og_url":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/","og_site_name":"Codemotion Magazine","article_publisher":"https:\/\/www.facebook.com\/Codemotion.Italy\/","article_published_time":"2023-03-07T08:00:00+00:00","article_modified_time":"2023-06-23T13:11:49+00:00","og_image":[{"width":1280,"height":853,"url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280.png","type":"image\/png"}],"author":"Codemotion","twitter_card":"summary_large_image","twitter_creator":"@CodemotionIT","twitter_site":"@CodemotionIT","twitter_misc":{"Written by":"Codemotion","Est. reading time":"7 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/#article","isPartOf":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/"},"author":{"name":"Codemotion","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/201bb98b02412383686cced7521b861c"},"headline":"MapReduce Not Dead: Here\u2019s Why It\u2019s Still Ruling in the Cloud","datePublished":"2023-03-07T08:00:00+00:00","dateModified":"2023-06-23T13:11:49+00:00","mainEntityOfPage":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/"},"wordCount":1454,"publisher":{"@id":"https:\/\/www.codemotion.com\/magazine\/#organization"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/#primaryimage"},"thumbnailUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280.png","articleSection":["Big 
Data"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/","url":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/","name":"Why MapReduce Still Rules the Cloud - Codemotion","isPartOf":{"@id":"https:\/\/www.codemotion.com\/magazine\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/#primaryimage"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/#primaryimage"},"thumbnailUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280.png","datePublished":"2023-03-07T08:00:00+00:00","dateModified":"2023-06-23T13:11:49+00:00","description":"MapReduce is still a vital tool for big ecommerce companies. It's also becoming more and more popular in machine learning. 
Read on!","breadcrumb":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/#primaryimage","url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280.png","contentUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280.png","width":1280,"height":853},{"@type":"BreadcrumbList","@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/mapreduce-not-dead-heres-why-its-still-ruling-in-the-cloud\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.codemotion.com\/magazine\/"},{"@type":"ListItem","position":2,"name":"AI\/ML","item":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/"},{"@type":"ListItem","position":3,"name":"Big Data","item":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/"},{"@type":"ListItem","position":4,"name":"MapReduce Not Dead: Here\u2019s Why It\u2019s Still Ruling in the Cloud"}]},{"@type":"WebSite","@id":"https:\/\/www.codemotion.com\/magazine\/#website","url":"https:\/\/www.codemotion.com\/magazine\/","name":"Codemotion Magazine","description":"We code the future. 
Together","publisher":{"@id":"https:\/\/www.codemotion.com\/magazine\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.codemotion.com\/magazine\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.codemotion.com\/magazine\/#organization","name":"Codemotion","url":"https:\/\/www.codemotion.com\/magazine\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/","url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png","contentUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png","width":225,"height":225,"caption":"Codemotion"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/Codemotion.Italy\/","https:\/\/x.com\/CodemotionIT"]},{"@type":"Person","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/201bb98b02412383686cced7521b861c","name":"Codemotion","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/image\/","url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/cropped-codemotionlogo-150x150.png","contentUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/cropped-codemotionlogo-150x150.png","caption":"Codemotion"},"description":"Articles written by the Codemotion staff. 
Tech news, inspiration, latest trends in software development and more.","sameAs":["https:\/\/x.com\/CodemotionIT"],"url":"https:\/\/www.codemotion.com\/magazine\/author\/codemotion-2\/"}]}},"featured_image_src":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280-600x400.png","featured_image_src_square":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280-600x600.png","author_info":{"display_name":"Codemotion","author_link":"https:\/\/www.codemotion.com\/magazine\/author\/codemotion-2\/"},"uagb_featured_image_src":{"full":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280.png",1280,853,false],"thumbnail":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280-150x150.png",150,150,true],"medium":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280-300x200.png",300,200,true],"medium_large":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280-768x512.png",768,512,true],"large":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280-1024x682.png",1024,682,true],"1536x1536":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280.png",1280,853,false],"2048x2048":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280.png",1280,853,false],"small-home-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280.png",100,67,false],"sidebar-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280-180x128.png",180,128,true],"genesis-singular-images":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280-896x504.png",896,504,true],"archive-featured":["https:\/\/www.codemotion.com\/magazi
ne\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280-400x225.png",400,225,true],"gb-block-post-grid-landscape":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280-600x400.png",600,400,true],"gb-block-post-grid-square":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/03\/cloud-gce1ca54aa_1280-600x600.png",600,600,true]},"uagb_author_info":{"display_name":"Codemotion","author_link":"https:\/\/www.codemotion.com\/magazine\/author\/codemotion-2\/"},"uagb_comment_info":0,"uagb_excerpt":"MapReduce is a popular programming model widely used in data services and cloud frameworks. It plays a central role in the processing of big data sets, using distributed algorithms and potentially massive parallel operations. Though it is based on quite simple principles, it is recognised by many engineers as one of the most important tech&#8230;&hellip;","lang":"en","_links":{"self":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/20300","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/users\/64"}],"replies":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/comments?post=20300"}],"version-history":[{"count":6,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/20300\/revisions"}],"predecessor-version":[{"id":21564,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/20300\/revisions\/21564"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/media\/20303"}],"wp:attachment":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/media?parent=20300"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":
"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/categories?post=20300"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/tags?post=20300"},{"taxonomy":"collections","embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/collections?post=20300"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}