{"id":22735,"date":"2023-08-28T09:30:00","date_gmt":"2023-08-28T07:30:00","guid":{"rendered":"https:\/\/www.codemotion.com\/magazine\/?p=22735"},"modified":"2023-08-24T12:41:28","modified_gmt":"2023-08-24T10:41:28","slug":"python-and-databricks-a-dynamic-duo-for-data-dominance","status":"publish","type":"post","link":"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/","title":{"rendered":"Python and Databricks: A Dynamic Duo for Data Dominance"},"content":{"rendered":"\n<p>Synergizing Data Analysis: Python&#8217;s Versatility in Databricks<\/p>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"h-introduction\">Introduction<\/h1>\n\n\n\n<p>In this article, we\u2019ll describe what DataBricks is and why we may want or need to use it.<\/p>\n\n\n\n<p>Then, we\u2019ll show that DataBricks works with Python, among other languages. In particular, we\u2019ll show how to use the Notebooks that DataBricks provides exactly the way we use Jupyter Notebooks.<\/p>\n\n\n\n<p>Finally, the use case we\u2019ll implement at the end of this article shows how easy it is to use Python for data analysis and Machine Learning predictions in DataBricks Notebooks.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"h-introducing-databricks\">Introducing DataBricks<\/h1>\n\n\n\n<p><a href=\"https:\/\/www.databricks.com\/\">DataBricks<\/a> is a Data Lakehouse Platform that \u201c<a href=\"https:\/\/www.databricks.com\/product\/data-lakehouse\" target=\"_blank\" aria-label=\" (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"ek-link\"><em>combines the best elements of data lakes and data warehouses to help you reduce costs and deliver on your data and AI initiatives faster<\/em><\/a>.\u201d<\/p>\n\n\n\n<p>The Data Lakehouse concept arose only recently (circa 2020) and goes beyond the concepts and structures of Data Warehouses and Data Lakes.<\/p>\n\n\n\n<p>Let\u2019s see 
how.<\/p>\n\n\n\n<p>The <strong>Data Warehouse<\/strong> model dates back to the 1980s. It lets us keep structured data &#8211; meaning text and numbers &#8211; in a structured environment: put simply, a database that uses SQL.<\/p>\n\n\n\n<p>Around 2010, the need to organize unstructured data &#8211; audio, images, and so on &#8211; gave rise to the <strong>Data Lake<\/strong> architecture, which lets us store unstructured data in a single place for further analysis.<\/p>\n\n\n\n<p>One of the big disadvantages of Data Lakes is that, to produce BI reports, we first need <a href=\"https:\/\/en.wikipedia.org\/wiki\/Extract,_transform,_load\">ETL<\/a> processes to structure the data. In other words, we need to create one (or more!) Data Warehouses inside the Data Lake before we can analyze the data and build BI reports.<\/p>\n\n\n\n<p>A Data Lakehouse, instead, is a new system \u201c<a href=\"https:\/\/www.databricks.com\/blog\/2020\/01\/30\/what-is-a-data-lakehouse.html?itm_data=lakehouse-link-lakehouseblog\">that addresses the limitations of data lakes. A lakehouse is a new, open architecture that combines the best elements of data lakes and data warehouses<\/a>\u201d.<\/p>\n\n\n\n<p>In short, the key advantages of Data Lakehouses over Data Lakes are:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>BI support<\/strong>. BI tools can be used directly on the source data.<\/li>\n\n\n\n<li><strong>Data governance<\/strong>. Lakehouses guarantee data integrity through data governance.<\/li>\n\n\n\n<li><strong>Lower data storage costs<\/strong>. 
Lakehouses generally have lower storage costs than Lakes.<\/li>\n<\/ul>\n\n\n\n<p>In a few words, DataBricks is:<\/p>\n\n\n\n<p>\u201c<em>One platform for your data, consistently governed and available for all your analytics and AI<\/em>\u201d.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\" id=\"h-getting-started-with-databricks\">Getting Started with DataBricks<\/h1>\n\n\n\n<p>To start using DataBricks, we first need to create an account <a href=\"https:\/\/www.databricks.com\/try-databricks?itm_data=Homepage-HeroCTA-Trial#account\">here<\/a>. DataBricks is a paid service, but it comes with a 14-day trial license.<\/p>\n\n\n\n<p>If you have an account on a cloud service, like AWS or Azure, you can use it: this comes in handy if you have data stored in one of these services.<\/p>\n\n\n\n<p>If you don\u2019t have one, you don\u2019t need to create it. After filling in the registration fields, you can specify that you don\u2019t have an account on any of the listed services, and DataBricks will let you use its own cloud service instead.<\/p>\n\n\n\n<p>The registration process is quick and easy: in a couple of minutes (maybe less), you are in and can start working with DataBricks. 
For example, you can launch a new notebook or import data from your local machine:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"351\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image1-1024x351.png\" alt=\"data science, python and databricks\" class=\"wp-image-22736\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image1-1024x351.png 1024w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image1-300x103.png 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image1-768x263.png 768w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image1.png 1535w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">DataBricks Dashboard. Image by Author.<\/figcaption><\/figure>\n\n\n\n<p>The very first thing we have to do is load some data in DataBricks before we can actually use it for our purposes. This process in DataBricks is called \u201ccluster creation\u201d because you\u2019ll be assigned some computational resources. 
The process can be started by clicking on \u201cData\u201d in the left sidebar:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"407\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-python-1024x407.gif\" alt=\"\" class=\"wp-image-22738\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-python-1024x407.gif 1024w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-python-300x119.gif 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-python-768x305.gif 768w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-python-1536x610.gif 1536w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Creating a Cluster in Databricks. Image by Author.<\/figcaption><\/figure>\n\n\n\n<p>Here we can choose among three options:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Upload a file<\/strong>. This lets us upload a file from our local machine.<\/li>\n\n\n\n<li><strong>S3<\/strong>. This lets us retrieve data from Amazon S3, Amazon\u2019s scalable object storage.<\/li>\n\n\n\n<li><strong>Other Data Sources<\/strong>. 
Here you can choose between various sources, including Amazon Kinesis, Snowflake, and others.<\/li>\n<\/ul>\n\n\n\n<p>We\u2019ll use a house-price prediction dataset from Kaggle, available <a href=\"https:\/\/www.kaggle.com\/datasets\/harishkumardatalab\/housing-price-prediction?resource=download\">here<\/a>.<\/p>\n\n\n\n<p>We simply download the file, unzip it, and load the CSV file into DataBricks:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"856\" height=\"655\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python-tutorial.png\" alt=\"\" class=\"wp-image-22739\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python-tutorial.png 856w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python-tutorial-300x230.png 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python-tutorial-768x588.png 768w\" sizes=\"auto, (max-width: 856px) 100vw, 856px\" \/><figcaption class=\"wp-element-caption\">Loading a CSV file into DataBricks. Image by Author.<\/figcaption><\/figure>\n\n\n\n<p>As we can see, DataBricks gives us a path that indicates the location of the file.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Python in DataBricks<\/h1>\n\n\n\n<p>Before starting our analyses, whether we\u2019re talking about Machine Learning or analytical discovery, we have to create a new Notebook. We can do it from the dashboard like so:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"407\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Notebook-on-databricks-1024x407.gif\" alt=\"Notebook on databricks. 
Python tutorial\" class=\"wp-image-22740\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Notebook-on-databricks-1024x407.gif 1024w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Notebook-on-databricks-300x119.gif 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Notebook-on-databricks-768x305.gif 768w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Notebook-on-databricks-1536x610.gif 1536w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">Creating a Notebook on Databricks. Image by author.<\/figcaption><\/figure>\n\n\n\n<p>As we can see, the Notebook is graphically similar to Jupyter Notebooks or Notebooks in Google Colaboratory.<\/p>\n\n\n\n<p>If we want, we can see the resources we have available in our cluster. On the left-sidebar we click on \u201cCompute\u201d and this is what we can see:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"231\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Available-resources.-Databricks-and-Python-1024x231.png\" alt=\"\" class=\"wp-image-22741\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Available-resources.-Databricks-and-Python-1024x231.png 1024w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Available-resources.-Databricks-and-Python-300x68.png 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Available-resources.-Databricks-and-Python-768x173.png 768w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Available-resources.-Databricks-and-Python-1536x346.png 1536w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Available-resources.-Databricks-and-Python.png 1880w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption 
class=\"wp-element-caption\">Available resources. Image by author.<\/figcaption><\/figure>\n\n\n\n<p>So, we have a cluster with 2 active cores and 15 GB of memory.<\/p>\n\n\n\n<p>Now, let\u2019s return to our Notebook.<\/p>\n\n\n\n<p>To install a Python library, we use a magic command, which starts with \u201c%\u201d (for example, %pip).<\/p>\n\n\n\n<p>The important thing to know is that DataBricks comes with the most used <a href=\"https:\/\/www.codemotion.com\/magazine\/languages\/python\/\" target=\"_blank\" aria-label=\" (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"ek-link\">Python <\/a>libraries preinstalled. So, for example, if we try to install <a href=\"https:\/\/www.codemotion.com\/magazine\/data-science\/data-analysis-made-easy-mastering-pandas-for-insightful-results\/\" target=\"_blank\" aria-label=\" (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"ek-link\">Pandas<\/a> by typing %pip install pandas, DataBricks will report that Pandas is already installed:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"295\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Notebook-pandas-1024x295.png\" alt=\"notebook on pandas. Tutorial Python and Databricks.\" class=\"wp-image-22742\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Notebook-pandas-1024x295.png 1024w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Notebook-pandas-300x86.png 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Notebook-pandas-768x221.png 768w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Notebook-pandas-1536x443.png 1536w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Notebook-pandas.png 1881w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">A Notebook showing Pandas is already installed in DataBricks. 
Image by Author.<\/figcaption><\/figure>\n\n\n\n<p>So, if we don\u2019t need any particular library, we can import our data and show what\u2019s inside it.<\/p>\n\n\n\n<p>Now, DataBricks also helps us speed up the process involved in loading the data and analyzing it. What we mean is that DataBricks can directly create a new Notebook and add the data to our cluster when we import them.<br><br>So, suppose we want to analyze the data related to the houses, as we said before. This time the name of the file is houses.csv. We can perform the process by clicking on \u201cCreate a table in a Notebook\u201d:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"790\" height=\"725\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image7.png\" alt=\"\" class=\"wp-image-22743\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image7.png 790w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image7-300x275.png 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image7-768x705.png 768w\" sizes=\"auto, (max-width: 790px) 100vw, 790px\" \/><figcaption class=\"wp-element-caption\">Creating a table in a Notebook. 
Image by Author.<\/figcaption><\/figure>\n\n\n\n<p>Then, DataBricks creates a Notebook for us that is ready to use:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"407\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image8-1024x407.gif\" alt=\"databricks and python tutorial iamge 8\" class=\"wp-image-22744\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image8-1024x407.gif 1024w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image8-300x119.gif 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image8-768x305.gif 768w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image8-1536x610.gif 1536w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>So, DataBricks created a ready-to-use Notebook with the first cell written in Spark. This way, when we run it, it displays the data we loaded as we can see.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Analytics and Machine Learning with Python in DataBricks<\/h1>\n\n\n\n<p>Now, we want to make some predictions with Machine Learning using DataBricks.<\/p>\n\n\n\n<p>For the sake of simplicity, we can use the \u201cdiabetes\u201d dataset provided by sklearn.<\/p>\n\n\n\n<p>So, let\u2019s open a new Notebook in DataBricks as we\u2019ve shown earlier, and import all the libraries we need:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-1\" data-shcb-language-name=\"PHP\" data-shcb-language-slug=\"php\"><span><code class=\"hljs language-php\">import pandas <span class=\"hljs-keyword\">as<\/span> pd\r\nimport numpy <span class=\"hljs-keyword\">as<\/span> np\r\n\r\n\r\n<span class=\"hljs-comment\"># Plotting<\/span>\r\nimport seaborn <span class=\"hljs-keyword\">as<\/span> sns\r\nimport matplotlib.pyplot <span class=\"hljs-keyword\">as<\/span> plt\r\n<span class=\"hljs-comment\"># 
Image dimensions<\/span>\r\nplt.figure(figsize=(<span class=\"hljs-number\">10<\/span>, <span class=\"hljs-number\">7<\/span>))\r\n\r\n\r\n<span class=\"hljs-comment\"># Sklearn<\/span>\r\nfrom sklearn.datasets import load_diabetes <span class=\"hljs-comment\">#importing data<\/span>\r\nfrom sklearn.linear_model import LinearRegression\r\nfrom sklearn.model_selection import train_test_split\r\nfrom sklearn import linear_model\r\nfrom sklearn import metrics\r\n\r\n<span class=\"hljs-comment\"># Now, let\u2019s show the dataset:<\/span>\r\n\r\n<span class=\"hljs-comment\"># Import dataset<\/span>\r\ndiab = load_diabetes()\r\n\r\n<span class=\"hljs-comment\"># Define features and label<\/span>\r\nX = diab&#91;<span class=\"hljs-string\">'data'<\/span>]\r\ny = diab&#91;<span class=\"hljs-string\">'target'<\/span>]\r\n\r\n\r\n<span class=\"hljs-comment\">#  Create dataframe from X<\/span>\r\ndf = pd.DataFrame(X, columns=&#91;<span class=\"hljs-string\">\"age\"<\/span>,<span class=\"hljs-string\">\"sex\"<\/span>,<span class=\"hljs-string\">\"bmi\"<\/span>,<span class=\"hljs-string\">\"bp\"<\/span>, <span class=\"hljs-string\">\"tc\"<\/span>, <span class=\"hljs-string\">\"ldl\"<\/span>, <span class=\"hljs-string\">\"hdl\"<\/span>,<span class=\"hljs-string\">\"tch\"<\/span>, <span class=\"hljs-string\">\"ltg\"<\/span>, <span class=\"hljs-string\">\"glu\"<\/span>])\r\n\r\n<span class=\"hljs-comment\"># Add 'progression' from y<\/span>\r\ndf&#91;<span class=\"hljs-string\">'progression'<\/span>] = diab&#91;<span class=\"hljs-string\">'target'<\/span>]\r\n\r\n<span class=\"hljs-comment\">#  Show head<\/span>\r\ndf.head()\r\n\r\n<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-1\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">PHP<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">php<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>And we get:<\/p>\n\n\n\n<figure 
class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1002\" height=\"217\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/table-results-python-and-databricks.png\" alt=\"\" class=\"wp-image-22745\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/table-results-python-and-databricks.png 1002w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/table-results-python-and-databricks-300x65.png 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/table-results-python-and-databricks-768x166.png 768w\" sizes=\"auto, (max-width: 1002px) 100vw, 1002px\" \/><figcaption class=\"wp-element-caption\">The \u201cDiabetes dataset\u201d from sklearn. Image by Author.<\/figcaption><\/figure>\n\n\n\n<p>Now, first of all, let me tell you that I didn\u2019t need to install any of the libraries imported above: they\u2019re all installed in DataBricks.<\/p>\n\n\n\n<p>Now, let\u2019s see if there are any highly correlated features with a correlation matrix like so:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-2\" data-shcb-language-name=\"PHP\" data-shcb-language-slug=\"php\"><span><code class=\"hljs language-php\"><span class=\"hljs-comment\"># Apply mask<\/span>\r\nmask = np.triu(np.ones_like(df.corr()))\r\n\r\n\r\n<span class=\"hljs-comment\"># Show correlation matrix<\/span>\r\ndataplot = sns.heatmap(df.corr(), annot=<span class=\"hljs-keyword\">True<\/span>, fmt=<span class=\"hljs-string\">'.2f'<\/span>, mask=mask)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-2\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">PHP<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">php<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" 
decoding=\"async\" width=\"545\" height=\"387\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image10-1-1.png\" alt=\"Tutorial databricks and python.\" class=\"wp-image-22746\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image10-1-1.png 545w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image10-1-1-300x213.png 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image10-1-1-180x128.png 180w\" sizes=\"auto, (max-width: 545px) 100vw, 545px\" \/><figcaption class=\"wp-element-caption\">The correlation matrix of the \u201cDiabetes dataset\u201d. Image by Author.<\/figcaption><\/figure>\n\n\n\n<p>Now, let\u2019s split the data into train and test sets, fit a linear regression model, and calculate R<sup>2<\/sup>:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-3\" data-shcb-language-name=\"PHP\" data-shcb-language-slug=\"php\"><span><code class=\"hljs language-php\"><span class=\"hljs-comment\"># Define features (all columns except the last one, 'progression')<\/span>\r\nX = df.iloc&#91;:,:<span class=\"hljs-number\">-1<\/span>]\r\n\r\n\r\n<span class=\"hljs-comment\"># Define label<\/span>\r\ny = df&#91;<span class=\"hljs-string\">'progression'<\/span>]\r\n\r\n\r\n<span class=\"hljs-comment\"># Split into train (80%) and test (20%) sets<\/span>\r\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=<span class=\"hljs-number\">0.2<\/span>,random_state=<span class=\"hljs-number\">42<\/span>)\r\n\r\n\r\n<span class=\"hljs-comment\"># Fit the linear regression model<\/span>\r\nreg = LinearRegression().fit(X_train, y_train)\r\n\r\n\r\n<span class=\"hljs-comment\"># Make predictions<\/span>\r\ny_test_pred = reg.predict(X_test)\r\ny_train_pred = reg.predict(X_train)\r\n\r\n\r\n<span class=\"hljs-comment\"># R^2 on both sets<\/span>\r\n<span class=\"hljs-keyword\">print<\/span>(f<span class=\"hljs-string\">'Coeff. 
of determination on train set:{reg.score(X_train, y_train): .2f}'<\/span>) \r\n<span class=\"hljs-keyword\">print<\/span>(f<span class=\"hljs-string\">'Coeff. of determination on test set:{reg.score(X_test, y_test): .2f}'<\/span>) <\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-3\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">PHP<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">php<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>And we get:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-4\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript\">Coeff. of determination on train <span class=\"hljs-keyword\">set<\/span>: 0.53\r\nCoeff. of determination on test <span class=\"hljs-keyword\">set<\/span>: 0.45\r\n<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-4\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>So, the R<sup>2<\/sup> scores are not very convincing (0.53 on the train set and 0.45 on the test set), and in practice we\u2019d try different Machine Learning models to solve this problem.<\/p>\n\n\n\n<p>Anyway, here we\u2019ve shown that a Notebook in DataBricks can be used exactly like any other Notebook.<\/p>\n\n\n\n<h1 class=\"wp-block-heading\">Conclusion<\/h1>\n\n\n\n<p>In this article, we\u2019ve described the importance of DataBricks as a Lakehouse that, among other things, allows us to manage <a href=\"https:\/\/docs.databricks.com\/en\/workflows\/index.html#what-is-databricks-jobs\">complicated ML workflows<\/a>.<\/p>\n\n\n\n<p>We\u2019ve also shown how to use Notebooks in DataBricks with Python. 
As we\u2019ve seen, DataBricks has all the <a href=\"https:\/\/www.codemotion.com\/magazine\/languages\/python-libraries-data-science\/\" target=\"_blank\" aria-label=\"most used Python libraries (opens in a new tab)\" rel=\"noreferrer noopener\" class=\"ek-link\">most used Python libraries<\/a> already installed.<\/p>\n\n\n\n<p>Also, we can use Notebooks in DataBricks the same way we use Jupyter Notebooks, with the added advantage of being able to manage workflows and huge amounts of data, including unstructured data, depending on the problem we\u2019re solving.<\/p>\n\n\n\n<p>To conclude, we may want to use DataBricks when we need:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>To deal with large-scale data processing and analysis.<\/li>\n\n\n\n<li>A collaborative environment for data scientists, analysts, and engineers to work together.<\/li>\n\n\n\n<li>To build end-to-end machine learning pipelines.<\/li>\n\n\n\n<li>To analyze and process real-time data.<\/li>\n\n\n\n<li>To leverage the capabilities of Apache Spark without managing the underlying infrastructure.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Synergizing Data Analysis: Python&#8217;s Versatility in Databricks Introduction In this article, we\u2019ll describe what&nbsp; DataBricks is and why we may want or need to use it. Then, we\u2019ll show that DataBricks can work, among the others, with Python. 
In particular, we\u2019ll show how we can use the Notebooks that DataBricks provides us the exact way&#8230; <a class=\"more-link\" href=\"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/\">Read more<\/a><\/p>\n","protected":false},"author":171,"featured_media":22749,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_editorskit_title_hidden":false,"_editorskit_reading_time":0,"_editorskit_is_block_options_detached":false,"_editorskit_block_options_position":"{}","_uag_custom_page_level_css":"","_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","footnotes":""},"categories":[8457],"tags":[10854,68],"collections":[],"class_list":{"0":"post-22735","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-data-science","8":"tag-databricks","9":"tag-python","10":"entry"},"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.9 (Yoast SEO v26.9) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Python and Databricks: A Dynamic Duo for Data Dominance - Codemotion Magazine<\/title>\n<meta name=\"description\" content=\"Learn how to combine Python and Databricks in this in depth guide with code examples, images, and videos. 
Read on!\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Python and Databricks: A Dynamic Duo for Data Dominance\" \/>\n<meta property=\"og:description\" content=\"Learn how to combine Python and Databricks in this in depth guide with code examples, images, and videos. Read on!\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/\" \/>\n<meta property=\"og:site_name\" content=\"Codemotion Magazine\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/Codemotion.Italy\/\" \/>\n<meta property=\"article:published_time\" content=\"2023-08-28T07:30:00+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1448\" \/>\n\t<meta property=\"og:image:height\" content=\"724\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"Federico Trotta\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@CodemotionIT\" \/>\n<meta name=\"twitter:site\" content=\"@CodemotionIT\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Federico Trotta\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"8 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/\"},\"author\":{\"name\":\"Federico Trotta\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/98d2abaf70e7d106abab1f38bf20f90d\"},\"headline\":\"Python and Databricks: A Dynamic Duo for Data Dominance\",\"datePublished\":\"2023-08-28T07:30:00+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/\"},\"wordCount\":1386,\"publisher\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min.jpg\",\"keywords\":[\"Databricks\",\"Python\"],\"articleSection\":[\"Data Science\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/\",\"name\":\"Python and Databricks: A Dynamic Duo for Data Dominance - Codemotion 
Magazine\",\"isPartOf\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min.jpg\",\"datePublished\":\"2023-08-28T07:30:00+00:00\",\"description\":\"Learn how to combine Python and Databricks in this in depth guide with code examples, images, and videos. Read on!\",\"breadcrumb\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/#primaryimage\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min.jpg\",\"contentUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min.jpg\",\"width\":1448,\"height\":724,\"caption\":\"Databricks and python. A complete guide for data dominance by Federico Trotta. 
Data science\"},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.codemotion.com\/magazine\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Data Science\",\"item\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/data-science\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Python and Databricks: A Dynamic Duo for Data Dominance\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#website\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/\",\"name\":\"Codemotion Magazine\",\"description\":\"We code the future. Together\",\"publisher\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.codemotion.com\/magazine\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#organization\",\"name\":\"Codemotion\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png\",\"contentUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png\",\"width\":225,\"height\":225,\"caption\":\"Codemotion\"},\"image\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/Codemotion.Italy\/\",\"https:\/\/x.com\/CodemotionIT\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.codem
otion.com\/magazine\/#\/schema\/person\/98d2abaf70e7d106abab1f38bf20f90d\",\"name\":\"Federico Trotta\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/69bc8655986054bfe43c7eaa7f00e2ea939b761bd924064ea9b5972568a01714?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/69bc8655986054bfe43c7eaa7f00e2ea939b761bd924064ea9b5972568a01714?s=96&d=mm&r=g\",\"caption\":\"Federico Trotta\"},\"description\":\"I have loved writing since I was a young boy in school, writing detective stories as class exams. Thanks to my curiosity, I discovered programming and AI. Having a burning passion for writing, I couldn't avoid starting to write about these topics, so I decided to change my career to become a Technical Writer. My purpose is to educate people on Python programming, Machine Learning, and Data Science, through writing.\",\"sameAs\":[\"https:\/\/federicotrotta.com\/\",\"https:\/\/www.linkedin.com\/in\/federico-trotta\/?originalSubdomain=it\"],\"url\":\"https:\/\/www.codemotion.com\/magazine\/author\/federico-trotta\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Python and Databricks: A Dynamic Duo for Data Dominance - Codemotion Magazine","description":"Learn how to combine Python and Databricks in this in depth guide with code examples, images, and videos. Read on!","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/","og_locale":"en_US","og_type":"article","og_title":"Python and Databricks: A Dynamic Duo for Data Dominance","og_description":"Learn how to combine Python and Databricks in this in depth guide with code examples, images, and videos. 
Read on!","og_url":"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/","og_site_name":"Codemotion Magazine","article_publisher":"https:\/\/www.facebook.com\/Codemotion.Italy\/","article_published_time":"2023-08-28T07:30:00+00:00","og_image":[{"width":1448,"height":724,"url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min.jpg","type":"image\/jpeg"}],"author":"Federico Trotta","twitter_card":"summary_large_image","twitter_creator":"@CodemotionIT","twitter_site":"@CodemotionIT","twitter_misc":{"Written by":"Federico Trotta","Est. reading time":"8 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/#article","isPartOf":{"@id":"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/"},"author":{"name":"Federico Trotta","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/98d2abaf70e7d106abab1f38bf20f90d"},"headline":"Python and Databricks: A Dynamic Duo for Data Dominance","datePublished":"2023-08-28T07:30:00+00:00","mainEntityOfPage":{"@id":"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/"},"wordCount":1386,"publisher":{"@id":"https:\/\/www.codemotion.com\/magazine\/#organization"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/#primaryimage"},"thumbnailUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min.jpg","keywords":["Databricks","Python"],"articleSection":["Data 
Science"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/","url":"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/","name":"Python and Databricks: A Dynamic Duo for Data Dominance - Codemotion Magazine","isPartOf":{"@id":"https:\/\/www.codemotion.com\/magazine\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/#primaryimage"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/#primaryimage"},"thumbnailUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min.jpg","datePublished":"2023-08-28T07:30:00+00:00","description":"Learn how to combine Python and Databricks in this in depth guide with code examples, images, and videos. Read on!","breadcrumb":{"@id":"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/#primaryimage","url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min.jpg","contentUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min.jpg","width":1448,"height":724,"caption":"Databricks and python. A complete guide for data dominance by Federico Trotta. 
Data science"},{"@type":"BreadcrumbList","@id":"https:\/\/www.codemotion.com\/magazine\/data-science\/python-and-databricks-a-dynamic-duo-for-data-dominance\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.codemotion.com\/magazine\/"},{"@type":"ListItem","position":2,"name":"Data Science","item":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/data-science\/"},{"@type":"ListItem","position":3,"name":"Python and Databricks: A Dynamic Duo for Data Dominance"}]},{"@type":"WebSite","@id":"https:\/\/www.codemotion.com\/magazine\/#website","url":"https:\/\/www.codemotion.com\/magazine\/","name":"Codemotion Magazine","description":"We code the future. Together","publisher":{"@id":"https:\/\/www.codemotion.com\/magazine\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.codemotion.com\/magazine\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.codemotion.com\/magazine\/#organization","name":"Codemotion","url":"https:\/\/www.codemotion.com\/magazine\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/","url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png","contentUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png","width":225,"height":225,"caption":"Codemotion"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/Codemotion.Italy\/","https:\/\/x.com\/CodemotionIT"]},{"@type":"Person","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/98d2abaf70e7d106abab1f38bf20f90d","name":"Federico 
Trotta","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/69bc8655986054bfe43c7eaa7f00e2ea939b761bd924064ea9b5972568a01714?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/69bc8655986054bfe43c7eaa7f00e2ea939b761bd924064ea9b5972568a01714?s=96&d=mm&r=g","caption":"Federico Trotta"},"description":"I have loved writing since I was a young boy in school, writing detective stories as class exams. Thanks to my curiosity, I discovered programming and AI. Having a burning passion for writing, I couldn't avoid starting to write about these topics, so I decided to change my career to become a Technical Writer. My purpose is to educate people on Python programming, Machine Learning, and Data Science, through writing.","sameAs":["https:\/\/federicotrotta.com\/","https:\/\/www.linkedin.com\/in\/federico-trotta\/?originalSubdomain=it"],"url":"https:\/\/www.codemotion.com\/magazine\/author\/federico-trotta\/"}]}},"featured_image_src":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min-600x400.jpg","featured_image_src_square":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min-600x600.jpg","author_info":{"display_name":"Federico 
Trotta","author_link":"https:\/\/www.codemotion.com\/magazine\/author\/federico-trotta\/"},"uagb_featured_image_src":{"full":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min.jpg",1448,724,false],"thumbnail":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min-150x150.jpg",150,150,true],"medium":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min-300x150.jpg",300,150,true],"medium_large":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min-768x384.jpg",768,384,true],"large":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min-1024x512.jpg",1024,512,true],"1536x1536":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min.jpg",1448,724,false],"2048x2048":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min.jpg",1448,724,false],"small-home-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min.jpg",100,50,false],"sidebar-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min-180x128.jpg",180,128,true],"genesis-singular-images":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min-896x504.jpg",896,504,true],"archive-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min-400x225.jpg",400,225,true],"gb-block-post-grid-landscape":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min-600x400.jpg",600,400,true],"gb-block-post-grid-square":["http
s:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/Databricks-and-Python.-Data-science-min-600x600.jpg",600,600,true]},"uagb_author_info":{"display_name":"Federico Trotta","author_link":"https:\/\/www.codemotion.com\/magazine\/author\/federico-trotta\/"},"uagb_comment_info":0,"uagb_excerpt":"Synergizing Data Analysis: Python&#8217;s Versatility in Databricks Introduction In this article, we\u2019ll describe what&nbsp; DataBricks is and why we may want or need to use it. Then, we\u2019ll show that DataBricks can work, among the others, with Python. In particular, we\u2019ll show how we can use the Notebooks that DataBricks provides us the exact way&#8230;&hellip;","lang":"en","_links":{"self":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/22735","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/users\/171"}],"replies":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/comments?post=22735"}],"version-history":[{"count":2,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/22735\/revisions"}],"predecessor-version":[{"id":22748,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/22735\/revisions\/22748"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/media\/22749"}],"wp:attachment":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/media?parent=22735"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/categories?post=22735"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/tags?post=22735"},{"taxonomy":"coll
ections","embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/collections?post=22735"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}