{"id":204,"date":"2019-03-21T13:00:02","date_gmt":"2019-03-21T12:00:02","guid":{"rendered":"https:\/\/www.codemotion.com\/magazine\/light-up-the-spark-in-catalyst-by-avoiding-udf\/"},"modified":"2020-06-12T14:01:06","modified_gmt":"2020-06-12T12:01:06","slug":"light-up-the-spark-in-catalyst-by-avoiding-udf","status":"publish","type":"post","link":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/","title":{"rendered":"Light up the Spark in catalyst by avoiding UDF"},"content":{"rendered":"<p><span class=\"firstcharacter\">W<\/span>e often talk about cloud computing, artificial intelligence and <span id=\"urn:enhancement-93d50e11\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/machine_learning\">machine learning<\/span>, but just as frequently we forget all the software architecture that is behind these projects and how the data is treated, manipulated and managed. The databases are in fact a vital component for every company and their <span id=\"urn:enhancement-fab1f52b\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/management\">management<\/span> is entrusted to very advanced <span id=\"urn:enhancement-b6508718\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/system\">systems<\/span> such as Apache Spark.<\/p>\n<p><a href=\"https:\/\/milan2018.codemotionworld.com\/speaker\/4259\/\" target=\"_blank\" rel=\"noopener noreferrer\">Adi Polak<\/a>, Cloud Developer Advocate at <span id=\"urn:enhancement-9f881c1e\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/microsoft\">Microsoft<\/span>, explored this topic in depth in her talk at <a href=\"https:\/\/milan2018.codemotionworld.com\/conference\/\" target=\"_blank\" rel=\"noopener noreferrer\">Milan Codemotion 2018<\/a>. 
Apache Spark is a unified analytics engine for large-scale <span id=\"urn:enhancement-dd5868d1\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/data_processing\">data processing<\/span>; this project achieves high performance for both batch and streaming data, using a state-of-the-art DAG scheduler, a query optimizer, and a physical execution engine. Spark offers over 80 high-level operators that make it easy to build parallel apps. And you can use it interactively from the Scala, <span id=\"urn:enhancement-ddd2c249\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/python_programming_language\">Python<\/span>, R, and SQL shells.<\/p>\n<p>Spark powers a stack of libraries including SQL and DataFrames, MLlib for <span id=\"urn:enhancement-7b7690c5\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/machine_learning\">machine learning<\/span>, GraphX, and Spark Streaming. You can combine these libraries seamlessly in the same application. Spark can run using its standalone cluster mode, on EC2, on Hadoop YARN, on Mesos, or on <span id=\"urn:enhancement-a02b79a6\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/kubernetes\">Kubernetes<\/span>.<\/p>\n<p>Spark facilitates the <span id=\"urn:enhancement-823cde23\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/implementation\">implementation<\/span> of both iterative algorithms, that visit their data set multiple times in a loop, and interactive\/exploratory data analysis, i.e., the repeated database-style querying of data. The latency of such applications may be reduced by several orders of magnitude compared to a MapReduce implementation (as was common in Apache Hadoop stacks). 
Among the class of iterative algorithms are the training algorithms for machine learning systems, which formed the initial impetus for developing Apache Spark.<\/p>\n<p>The component that supports the entire project is called Spark Core. It provides distributed task dispatching, scheduling, and basic I\/O functionalities, exposed through an <span id=\"urn:enhancement-4232600c\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/application_programming_interface\">application programming interface<\/span> (for <span id=\"urn:enhancement-d8cb9ae1\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/java\">Java<\/span>, <span id=\"urn:enhancement-4c3e7109\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/python_programming_language\">Python<\/span>, Scala, and R) centered on the RDD abstraction. Another \u201ccore\u201d component is Spark SQL that introduced a data abstraction called DataFrames. Spark SQL provides support for structured and semi-structured data and a domain-specific language (DSL) to manipulate DataFrames in Scala, <span id=\"urn:enhancement-5f5507fe\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/java\">Java<\/span>, or <span id=\"urn:enhancement-1059c1d7\" class=\"textannotation disambiguated wl-creative-work\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/python_programming_language\">Python<\/span>. It also provides SQL language support, with command-line interfaces and ODBC\/JDBC server. 
Although DataFrames lack the compile-time type-checking afforded by RDDs, as of Spark 2.0, the strongly typed <span id=\"urn:enhancement-86729dc0\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/data_set\">Dataset<\/span> is fully supported by Spark SQL as well.<\/p>\n<p><a style=\"width: 300px; height: 110px;\" href=\"https:\/\/milan2018.codemotionworld.com\/speaker\/4259\/\"><img decoding=\"async\" class=\"aligncenter wp-image-2474 size-full\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22.png\" alt=\"\" \/><\/a><\/p>\n<p>In particular, Adi Polak told us about Catalyst, the Apache Spark SQL query optimizer, and how to take advantage of it by avoiding UDFs. User-defined functions (UDFs) are a Spark SQL feature for defining new column-based functions that extend the vocabulary of Spark SQL&#8217;s DSL for transforming <span id=\"urn:enhancement-dee7792\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/data_set\">datasets<\/span>.<\/p>\n<p>Catalyst is based on functional programming constructs in Scala and designed with these two key purposes:<\/p>\n<p>&#8211; Easily add new optimisation techniques and features to Spark SQL;<br \/>\n&#8211; Enable external developers to extend the optimizer (e.g. adding data source specific rules, support for new data types, etc.).<\/p>\n<p><a style=\"width: 300px; height: 110px;\" href=\"https:\/\/milan2018.codemotionworld.com\/speaker\/4259\/\"><img decoding=\"async\" class=\"aligncenter wp-image-2474 size-full\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.50.56.png\" alt=\"\" \/><\/a><\/p>\n<p>Catalyst contains a general library for representing trees and applying rules to manipulate them. On top of this framework, it has libraries specific to relational query processing (e.g. 
expressions, logical query plans), and several sets of rules that handle different phases of query execution: analysis, logical optimisation, physical planning, and code generation to compile parts of queries to Java bytecode. For the latter, it uses another Scala feature, quasiquotes, that makes it easy to generate code at runtime from composable expressions. Catalyst also offers several public extension points, including external data sources and user-defined types. In addition, Catalyst supports both rule-based and cost-based optimization.<\/p>\n<p><a style=\"width: 300px; height: 110px;\" href=\"https:\/\/milan2018.codemotionworld.com\/speaker\/4259\/\"><img decoding=\"async\" class=\"aligncenter wp-image-2474 size-full\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.51.59.png\" alt=\"\" \/><\/a><\/p>\n<p>Normally, to manipulate the data in a SQL database with Spark, you can write a custom UDF. However, as Adi Polak reminds us, it is better to use the higher-level standard column-based functions with <span id=\"urn:enhancement-a5d8d4ac\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/data_set\">dataset<\/span> operators whenever possible before resorting to your own custom UDFs: UDFs are a black box to Spark, so it does not even try to optimise them.<\/p>\n<p>In fact, overusing custom UDFs can lead to the loss of constant folding and of predicate pushdown.<\/p>\n<p>Constant folding is the process of recognising and evaluating constant expressions at compile time rather than computing them at runtime. Predicate pushdown is a form of optimisation that can drastically reduce query\/<span id=\"urn:enhancement-d1bbf3a3\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/data_processing\">processing<\/span> time by filtering out data earlier rather than later. 
Depending on the <span id=\"urn:enhancement-370cf5b3\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/data_processing\">processing<\/span> framework, predicate pushdown can optimise your query by doing things like filtering data before it is transferred over the network, filtering data before loading into memory, or skipping reading entire files or chunks of files.<\/p>\n<p>To avoid giving up these two optimisations, we can rely on Catalyst and inspect its work through QueryExecution and explain. QueryExecution represents the execution pipeline of a structured query (as a <span id=\"urn:enhancement-805c73e9\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/data_set\">dataset<\/span>) with execution stages (phases). QueryExecution is the result of executing a LogicalPlan in a SparkSession (and so you could create a <span id=\"urn:enhancement-13f8f857\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/data_set\">dataset<\/span> from a logical operator or use the QueryExecution after executing a logical operator).<\/p>\n<p>Working this way, it is possible to get much better performance from Spark SQL than with UDFs. So Adi Polak recommends using UDFs only as a last resort and, in any case, having a UDF or UDAF perform a single operation, never more than one at a time.<\/p>\n<p>Avoiding UDFs might not generate instant improvements, but at least it will prevent future performance issues, should the code change. Also, by using built-in Spark SQL functions we cut down our testing effort, as everything is performed on Spark\u2019s side. 
These functions are designed by JVM experts, so UDFs are not likely to achieve better performance.<\/p>\n<p>For example, the following code can be replaced with the built-in isNotNull function:<\/p>\n<p><a style=\"width: 300px; height: 110px;\" href=\"https:\/\/milan2018.codemotionworld.com\/speaker\/4259\/\"><img decoding=\"async\" class=\"aligncenter wp-image-2474 size-full\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.54.31-1.png\" alt=\"\" \/><\/a><\/p>\n<p>Another piece of advice from Adi Polak is to look under the hood and analyse Spark\u2019s execution plan with .explain(true). From a <span id=\"urn:enhancement-2624e98b\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/data_set\">Dataset<\/span> or DataFrame object you can call the explain method like this:<\/p>\n<p><code>\/\/always check yourself using<br \/>\ndataframe.explain(true)<\/code><\/p>\n<p>The output of this method is Spark\u2019s execution plan, which is a good way to spot problems in how the query will run.<\/p>\n<p>In order to reduce the number of stages and shuffling, best practice is first to understand the stages and then search for a way to reduce the <span id=\"urn:enhancement-804d9bde\" class=\"textannotation disambiguated wl-thing\" itemid=\"http:\/\/data.wordlift.io\/wl01770\/entity\/complexity\">complexity<\/span>. Adi Polak then showed us an example of calling this method on a query with a UDF:<\/p>\n<p><a style=\"width: 300px; height: 110px;\" href=\"https:\/\/milan2018.codemotionworld.com\/speaker\/4259\/\"><img decoding=\"async\" class=\"aligncenter wp-image-2474 size-full\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-11.10.51.png\" alt=\"\" \/><\/a><\/p>\n<p>From the filtering stage, you can see that casting takes place and it happens each time an entry goes through the UDF. 
In our case, the value is cast to a string.<\/p>\n<p>In the physical plan we see what will actually happen on our executors: the partition filters, the pushed-down filters, the schema and the projection.<\/p>\n<p>And now without UDF:<\/p>\n<p><a style=\"width: 300px; height: 110px;\" href=\"https:\/\/milan2018.codemotionworld.com\/speaker\/4259\/\"><img decoding=\"async\" class=\"aligncenter wp-image-2474 size-full\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-11.15.02.png\" alt=\"\" \/><\/a><\/p>\n<p>As mentioned previously, without a UDF we can benefit from filter pushdown, which happens at the storage level: Spark does not load all the data into memory, because it reads the data only after the storage layer has already filtered out what is not needed.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>We often talk about cloud computing, artificial intelligence and machine learning, but just as frequently we forget all the software architecture that is behind these projects and how the data is treated, manipulated and managed. 
The databases are in fact a vital component for every company and their management is entrusted to very advanced systems&#8230; <a class=\"more-link\" href=\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/\">Read more<\/a><\/p>\n","protected":false},"author":35,"featured_media":205,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_editorskit_title_hidden":false,"_editorskit_reading_time":0,"_editorskit_is_block_options_detached":false,"_editorskit_block_options_position":"{}","_uag_custom_page_level_css":"","_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","footnotes":""},"categories":[16],"tags":[22],"collections":[],"class_list":{"0":"post-204","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-big-data","8":"tag-codemotion-milan","9":"entry"},"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.9 (Yoast SEO v26.9) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Light up the Spark in catalyst by avoiding UDF - Codemotion Magazine<\/title>\n<meta name=\"description\" content=\"Codemotion and Facebook organized the Tech Leadership Training boot camp, heres a personal reportage from one of our attendees.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Light up the Spark in catalyst by avoiding UDF\" \/>\n<meta 
property=\"og:description\" content=\"Codemotion and Facebook organized the Tech Leadership Training boot camp, heres a personal reportage from one of our attendees.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/\" \/>\n<meta property=\"og:site_name\" content=\"Codemotion Magazine\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/Codemotion.Italy\/\" \/>\n<meta property=\"article:published_time\" content=\"2019-03-21T12:00:02+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2020-06-12T12:01:06+00:00\" \/>\n<meta name=\"author\" content=\"Claudio Davide Ferrara\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:image\" content=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22.png\" \/>\n<meta name=\"twitter:creator\" content=\"@CodemotionIT\" \/>\n<meta name=\"twitter:site\" content=\"@CodemotionIT\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Claudio Davide Ferrara\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"6 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/\"},\"author\":{\"name\":\"Claudio Davide Ferrara\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/7e126fb002c44ab81b24ee243dc2a86d\"},\"headline\":\"Light up the Spark in catalyst by avoiding UDF\",\"datePublished\":\"2019-03-21T12:00:02+00:00\",\"dateModified\":\"2020-06-12T12:01:06+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/\"},\"wordCount\":1146,\"publisher\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22.png\",\"keywords\":[\"Codemotion Milan\"],\"articleSection\":[\"Big Data\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/\",\"name\":\"Light up the Spark in catalyst by avoiding UDF - Codemotion 
Magazine\",\"isPartOf\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22.png\",\"datePublished\":\"2019-03-21T12:00:02+00:00\",\"dateModified\":\"2020-06-12T12:01:06+00:00\",\"description\":\"Codemotion and Facebook organized the Tech Leadership Training boot camp, heres a personal reportage from one of our attendees.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/#primaryimage\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22.png\",\"contentUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22.png\",\"width\":1200,\"height\":610},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.codemotion.com\/magazine\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"AI\/ML\",\"item\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/\"},{\"@type\":\"ListI
tem\",\"position\":3,\"name\":\"Big Data\",\"item\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/\"},{\"@type\":\"ListItem\",\"position\":4,\"name\":\"Light up the Spark in catalyst by avoiding UDF\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#website\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/\",\"name\":\"Codemotion Magazine\",\"description\":\"We code the future. Together\",\"publisher\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.codemotion.com\/magazine\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#organization\",\"name\":\"Codemotion\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png\",\"contentUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png\",\"width\":225,\"height\":225,\"caption\":\"Codemotion\"},\"image\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/Codemotion.Italy\/\",\"https:\/\/x.com\/CodemotionIT\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/7e126fb002c44ab81b24ee243dc2a86d\",\"name\":\"Claudio Davide 
Ferrara\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/abd7b110a5518b2f0154fdfb31cefbad4892317672726b9c38c3cd16af635abb?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/abd7b110a5518b2f0154fdfb31cefbad4892317672726b9c38c3cd16af635abb?s=96&d=mm&r=g\",\"caption\":\"Claudio Davide Ferrara\"},\"description\":\"I collaborate with HTML.it as news editor, writing news on the world of Information Technology, domotics, IoT, free software and Linux distributions.\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/author\/claudio-davide-ferrara\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Light up the Spark in catalyst by avoiding UDF - Codemotion Magazine","description":"Codemotion and Facebook organized the Tech Leadership Training boot camp, heres a personal reportage from one of our attendees.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/","og_locale":"en_US","og_type":"article","og_title":"Light up the Spark in catalyst by avoiding UDF","og_description":"Codemotion and Facebook organized the Tech Leadership Training boot camp, heres a personal reportage from one of our attendees.","og_url":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/","og_site_name":"Codemotion Magazine","article_publisher":"https:\/\/www.facebook.com\/Codemotion.Italy\/","article_published_time":"2019-03-21T12:00:02+00:00","article_modified_time":"2020-06-12T12:01:06+00:00","author":"Claudio Davide 
Ferrara","twitter_card":"summary_large_image","twitter_image":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22.png","twitter_creator":"@CodemotionIT","twitter_site":"@CodemotionIT","twitter_misc":{"Written by":"Claudio Davide Ferrara","Est. reading time":"6 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/#article","isPartOf":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/"},"author":{"name":"Claudio Davide Ferrara","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/7e126fb002c44ab81b24ee243dc2a86d"},"headline":"Light up the Spark in catalyst by avoiding UDF","datePublished":"2019-03-21T12:00:02+00:00","dateModified":"2020-06-12T12:01:06+00:00","mainEntityOfPage":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/"},"wordCount":1146,"publisher":{"@id":"https:\/\/www.codemotion.com\/magazine\/#organization"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/#primaryimage"},"thumbnailUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22.png","keywords":["Codemotion Milan"],"articleSection":["Big Data"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/","url":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/","name":"Light up the Spark in catalyst by avoiding UDF - Codemotion 
Magazine","isPartOf":{"@id":"https:\/\/www.codemotion.com\/magazine\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/#primaryimage"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/#primaryimage"},"thumbnailUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22.png","datePublished":"2019-03-21T12:00:02+00:00","dateModified":"2020-06-12T12:01:06+00:00","description":"Codemotion and Facebook organized the Tech Leadership Training boot camp, heres a personal reportage from one of our attendees.","breadcrumb":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/#primaryimage","url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22.png","contentUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22.png","width":1200,"height":610},{"@type":"BreadcrumbList","@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/light-up-the-spark-in-catalyst-by-avoiding-udf\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.codemotion.com\/magazine\/"},{"@type":"ListItem","position":2,"name":"AI\/ML","item":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/"},{"@type":"ListItem","position":3,"name":"Big 
Data","item":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/big-data\/"},{"@type":"ListItem","position":4,"name":"Light up the Spark in catalyst by avoiding UDF"}]},{"@type":"WebSite","@id":"https:\/\/www.codemotion.com\/magazine\/#website","url":"https:\/\/www.codemotion.com\/magazine\/","name":"Codemotion Magazine","description":"We code the future. Together","publisher":{"@id":"https:\/\/www.codemotion.com\/magazine\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.codemotion.com\/magazine\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.codemotion.com\/magazine\/#organization","name":"Codemotion","url":"https:\/\/www.codemotion.com\/magazine\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/","url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png","contentUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png","width":225,"height":225,"caption":"Codemotion"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/Codemotion.Italy\/","https:\/\/x.com\/CodemotionIT"]},{"@type":"Person","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/7e126fb002c44ab81b24ee243dc2a86d","name":"Claudio Davide Ferrara","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/abd7b110a5518b2f0154fdfb31cefbad4892317672726b9c38c3cd16af635abb?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/abd7b110a5518b2f0154fdfb31cefbad4892317672726b9c38c3cd16af635abb?s=96&d=mm&r=g","caption":"Claudio Davide 
Ferrara"},"description":"I collaborate with HTML.it as news editor, writing news on the world of Information Technology, domotics, IoT, free software and Linux distributions.","url":"https:\/\/www.codemotion.com\/magazine\/author\/claudio-davide-ferrara\/"}]}},"featured_image_src":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22-600x400.png","featured_image_src_square":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22-600x600.png","author_info":{"display_name":"Claudio Davide Ferrara","author_link":"https:\/\/www.codemotion.com\/magazine\/author\/claudio-davide-ferrara\/"},"uagb_featured_image_src":{"full":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22.png",1200,610,false],"thumbnail":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22-150x150.png",150,150,true],"medium":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22-300x153.png",300,153,true],"medium_large":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22-768x390.png",768,390,true],"large":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22-1024x521.png",1024,521,true],"1536x1536":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22.png",1200,610,false],"2048x2048":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22.png",1200,610,false],"small-home-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22.png",100,51,false],"sidebar-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.2
2-180x128.png",180,128,true],"genesis-singular-images":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22-896x504.png",896,504,true],"archive-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22-400x225.png",400,225,true],"gb-block-post-grid-landscape":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22-600x400.png",600,400,true],"gb-block-post-grid-square":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/03\/Screen-Shot-2019-03-20-at-10.49.22-600x600.png",600,600,true]},"uagb_author_info":{"display_name":"Claudio Davide Ferrara","author_link":"https:\/\/www.codemotion.com\/magazine\/author\/claudio-davide-ferrara\/"},"uagb_comment_info":0,"uagb_excerpt":"We often talk about cloud computing, artificial intelligence and machine learning, but just as frequently we forget all the software architecture that is behind these projects and how the data is treated, manipulated and managed. 
The databases are in fact a vital component for every company and their management is entrusted to very advanced systems&#8230;&hellip;","lang":"en","_links":{"self":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/204","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/users\/35"}],"replies":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/comments?post=204"}],"version-history":[{"count":3,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/204\/revisions"}],"predecessor-version":[{"id":5496,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/204\/revisions\/5496"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/media\/205"}],"wp:attachment":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/media?parent=204"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/categories?post=204"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/tags?post=204"},{"taxonomy":"collections","embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/collections?post=204"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}