{"id":22677,"date":"2023-08-28T09:30:00","date_gmt":"2023-08-28T07:30:00","guid":{"rendered":"https:\/\/www.codemotion.com\/magazine\/?p=22677"},"modified":"2023-09-01T11:47:07","modified_gmt":"2023-09-01T09:47:07","slug":"python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati","status":"publish","type":"post","link":"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/","title":{"rendered":"Python and DataBricks: the right pair to master your data"},"content":{"rendered":"\n<p>In this article, <strong>we will describe what DataBricks is and why we might need to use it, including with Python.<\/strong><\/p>\n\n\n\n<p>We will show that in <strong>DataBricks <\/strong>we can use <strong>Python <\/strong>as a programming language, among others. In particular, we will show how the Notebooks that DataBricks provides can be used in the same way as Jupyter Notebooks.<\/p>\n\n\n\n<p>Finally, the use case we will implement at the end of this article <strong>will show how easy it is to use Python for data analysis and\/or predictions <\/strong>with Machine Learning using Notebooks in DataBricks.<\/p>\n\n\n\n<h1 class=\"gb-headline gb-headline-da9c2dce gb-headline-text\">Introduction to DataBricks<\/h1>\n\n\n\n<p><a href=\"https:\/\/www.databricks.com\/\">DataBricks<\/a> is a Data Lakehouse platform that &#8220;<a href=\"https:\/\/www.databricks.com\/product\/data-lakehouse\">combines the best elements of Data Lakes and Data Warehouses to help you reduce costs and deliver data and AI solutions faster<\/a>&#8221;.<\/p>\n\n\n\n<p>The concept of the <strong>Data Lakehouse emerged only recently<\/strong> (around 2020) and goes beyond the concepts and structures of the Data Warehouse and the Data Lake. 
Let&#8217;s see how.<\/p>\n\n\n\n<p>The concept and structure of the <strong>Data Warehouse<\/strong> date back to the &#8217;80s. It is a model that lets us manage structured data (that is, text and numbers) in a structured environment (to simplify: a database that uses SQL).<\/p>\n\n\n\n<p>More recently, around 2010, the need to organize unstructured data (audio, images, and so on) gave rise to the <strong>Data Lake<\/strong> architecture. It lets us store unstructured data in a single place for further analysis.<\/p>\n\n\n\n<p>One of the major drawbacks of Data Lakes is that, to create BI reports, we first need <a href=\"https:\/\/it.wikipedia.org\/wiki\/Extract,_transform,_load\">ETL<\/a> to structure the data. That is:<strong> we need to create one (or more!) Data Warehouses that draw their data from the Data Lake.<\/strong> Only after this step can we analyze the data and create BI reports.<\/p>\n\n\n\n<p>A Data Lakehouse, instead, is a new system \u201c<a href=\"https:\/\/www.databricks.com\/blog\/2020\/01\/30\/what-is-a-data-lakehouse.html?itm_data=lakehouse-link-lakehouseblog\">that addresses the limitations of Data Lakes. A lakehouse is a new architecture that combines the best elements of data lakes and data warehouses<\/a>\u201d.<\/p>\n\n\n\n<p>Simplifying, the main advantages of Data Lakehouses over Data Lakes are:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Support for BI tools<\/strong>. BI tools can be used directly on the source data.<\/li>\n\n\n\n<li><strong>Data governance<\/strong>. They ensure data integrity through governance.<\/li>\n\n\n\n<li><strong>Reduced data storage costs<\/strong>. 
Since the data does not have to be copied into separate Data Warehouses, Data Lakehouses generally have lower storage costs than a Data Lake paired with one or more Data Warehouses.<\/li>\n<\/ul>\n\n\n\n<p>In short, DataBricks is:<\/p>\n\n\n\n<p>&#8220;<em>A platform for your data, consistently governed and available for all your analytics and AI<\/em>&#8221;.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><strong>Recommended reading:<a href=\"https:\/\/www.codemotion.com\/magazine\/it\/linguaggi-programmazione\/programmare-con-python\/\" class=\"ek-link\"> <em>How to program with Python: the versatile language that wins everyone over<\/em><\/a><\/strong><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"gb-headline gb-headline-daefa66e gb-headline-text\">How to get started with DataBricks<\/h1>\n\n\n\n<p>To start using DataBricks, we first need to create an account <a href=\"https:\/\/www.databricks.com\/try-databricks?itm_data=Homepage-HeroCTA-Trial#account\">here<\/a>. DataBricks is a paid service, but it offers a 14-day trial license.<\/p>\n\n\n\n<p>If you have an account on a cloud service, such as AWS or Azure, you can use it: this comes in handy if you have data stored in one of these services.<\/p>\n\n\n\n<p>If you don&#8217;t have one, you don&#8217;t need to create it. After filling in the registration fields, you can specify that you don&#8217;t have an account on any of the listed services: DataBricks will then let you use its own cloud service.<\/p>\n\n\n\n<p>The registration process is quick and easy: in a couple of minutes (maybe less), you&#8217;re in and can start working with DataBricks. 
For example, you can launch a new Notebook or import data from your local computer:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"351\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image1-1-1024x351.png\" alt=\"Python and DataBricks\" class=\"wp-image-22768\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image1-1-1024x351.png 1024w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image1-1-300x103.png 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image1-1-768x263.png 768w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image1-1.png 1535w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">The DataBricks dashboard. Image by the Author.<\/figcaption><\/figure>\n\n\n\n<p>Before we can load data into DataBricks and actually use it for our purposes, we need a cluster: creating one is how DataBricks assigns us the computational resources that will run our Notebooks. 
<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"407\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image2-1-1024x407.gif\" alt=\"Python and DataBricks\" class=\"wp-image-22770\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image2-1-1024x407.gif 1024w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image2-1-300x119.gif 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image2-1-768x305.gif 768w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image2-1-1536x610.gif 1536w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">How to create a cluster in DataBricks. Image by the Author.<\/figcaption><\/figure>\n\n\n\n<p>To load our data, we then click on &#8220;Data&#8221; in the left sidebar.<\/p>\n\n\n\n<p>Here we can choose among three options:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Upload a file<\/strong>. This lets us upload a file from our local machine.<\/li>\n\n\n\n<li><strong>S3<\/strong>. This lets us retrieve data from Amazon S3, Amazon&#8217;s scalable object storage service.<\/li>\n\n\n\n<li><strong>Other Data Sources<\/strong>. Here you can choose among various sources, including Amazon Kinesis, Snowflake, and others.<\/li>\n<\/ul>\n\n\n\n<p>We will use a dataset for predicting house prices. 
You can download it from Kaggle <a href=\"https:\/\/www.kaggle.com\/datasets\/harishkumardatalab\/housing-price-prediction?resource=download\">here<\/a>.<\/p>\n\n\n\n<p>We download the file, unzip it, and upload the CSV file to DataBricks:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"856\" height=\"655\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image3.png\" alt=\"Python and DataBricks\" class=\"wp-image-22771\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image3.png 856w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image3-300x230.png 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image3-768x588.png 768w\" sizes=\"auto, (max-width: 856px) 100vw, 856px\" \/><figcaption class=\"wp-element-caption\">How to upload a CSV file to DataBricks. Image by the Author.<\/figcaption><\/figure>\n\n\n\n<p>As you can see, DataBricks assigns the file a path that indicates where it has been saved.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<p><strong>Recommended reading: <em><a href=\"https:\/\/www.codemotion.com\/magazine\/it\/linguaggi-programmazione\/guida-librerie-python-data-science\/\" class=\"ek-link\">Python libraries for Data Science: a complete guide<\/a><\/em><\/strong><\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h1 class=\"gb-headline gb-headline-e840ba4f gb-headline-text\">Python in DataBricks<\/h1>\n\n\n\n<p>Before starting our analyses, whether Machine Learning or data analysis, we need to create a new Notebook. 
We can do it from the dashboard like this:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"407\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image4-1-1024x407.gif\" alt=\"Python and DataBricks\" class=\"wp-image-22773\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image4-1-1024x407.gif 1024w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image4-1-300x119.gif 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image4-1-768x305.gif 768w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image4-1-1536x610.gif 1536w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">How to create a new Notebook in DataBricks. Image by the Author.<\/figcaption><\/figure>\n\n\n\n<p>As we can see, the Notebook looks similar to a Jupyter Notebook or a Google Colaboratory Notebook.<\/p>\n\n\n\n<p>If we want, we can check the resources available in our cluster. 
In the left sidebar, we click on \u201cCompute\u201d, and this is what we see:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"231\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image5-1-1024x231.png\" alt=\"Python and DataBricks\" class=\"wp-image-22774\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image5-1-1024x231.png 1024w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image5-1-300x68.png 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image5-1-768x173.png 768w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image5-1-1536x346.png 1536w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image5-1.png 1880w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">The computational resources available to us. Image by the Author.<\/figcaption><\/figure>\n\n\n\n<p>So, we have been assigned a cluster with 2 active cores and 15 GB of memory.<\/p>\n\n\n\n<p>Now let&#8217;s go back to our Notebook.<\/p>\n\n\n\n<p>To install a Python library, we need to use the <code>%pip<\/code> magic command.<\/p>\n\n\n\n<p>The important thing to know is that DataBricks already comes with the most commonly used <a href=\"https:\/\/www.codemotion.com\/magazine\/it\/linguaggi-programmazione\/programmare-con-python\/\" class=\"ek-link\">Python<\/a> libraries. So, for example, if we want to install <a href=\"https:\/\/www.codemotion.com\/magazine\/it\/scienza-dei-dati\/analisi-dei-dati-pandas\/\" class=\"ek-link\">Pandas<\/a>, we type <code>%pip install pandas<\/code>. 
In this case, DataBricks tells us that Pandas is already installed:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"295\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image6-1024x295.png\" alt=\"Python and DataBricks\" class=\"wp-image-22775\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image6-1024x295.png 1024w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image6-300x86.png 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image6-768x221.png 768w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image6-1536x443.png 1536w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image6.png 1881w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">A Notebook showing that Pandas is already installed in DataBricks. Image by the Author.<\/figcaption><\/figure>\n\n\n\n<p>So, unless we need a particular library, we can import our data and move straight to analyzing it.<\/p>\n\n\n\n<p>DataBricks also helps us speed up loading the data and analyzing it afterwards. In particular, when we import data, DataBricks can directly create a new Notebook and add the data to our cluster.<\/p>\n\n\n\n<p>So, suppose we want to analyze the housing data mentioned earlier. This time the file is named <code>houses.csv<\/code>. 
We can start the process by clicking \u201cCreate a table in a Notebook\u201d:<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"790\" height=\"725\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image7-2.png\" alt=\"Python and DataBricks\" class=\"wp-image-22776\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image7-2.png 790w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image7-2-300x275.png 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image7-2-768x705.png 768w\" sizes=\"auto, (max-width: 790px) 100vw, 790px\" \/><figcaption class=\"wp-element-caption\">Create a table in a Notebook. Image by the Author.<\/figcaption><\/figure>\n\n\n\n<p>DataBricks then creates a ready-to-use Notebook for us:<\/p>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"407\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image8-3-1024x407.gif\" alt=\"Python and DataBricks\" class=\"wp-image-22779\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image8-3-1024x407.gif 1024w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image8-3-300x119.gif 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image8-3-768x305.gif 768w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image8-3-1536x610.gif 1536w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><figcaption class=\"wp-element-caption\">A ready-to-use Notebook. Image by the Author.<\/figcaption><\/figure>\n\n\n\n<p>The first cell of this Notebook uses Spark to read the uploaded file. 
As you can see, as soon as we run it, it displays the data we uploaded.<\/p>\n\n\n\n<h2 class=\"gb-headline gb-headline-f6cff391 gb-headline-text\">Data analysis and Machine Learning with Python in DataBricks<\/h2>\n\n\n\n<p>Now we want to make some predictions with Machine Learning using DataBricks.<\/p>\n\n\n\n<p>For simplicity, we can use the &#8220;Diabetes&#8221; dataset provided by sklearn.<\/p>\n\n\n\n<p>Let&#8217;s open a new Notebook in DataBricks as shown earlier and import all the libraries we need:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-1\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\">import pandas as pd\nimport numpy as np\n\n# Plotting\nimport seaborn as sns\nimport matplotlib.pyplot as plt\n\n# Default size (in inches) for all the figures we create\nplt.rcParams&#91;'figure.figsize'] = (10, 7)\n\n# Sklearn\nfrom sklearn.datasets import load_diabetes  # example dataset\nfrom sklearn.linear_model import LinearRegression\nfrom sklearn.model_selection import train_test_split\n<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-1\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Now, let&#8217;s look at our data:<\/p>\n\n\n<pre class=\"wp-block-code\" 
aria-describedby=\"shcb-language-2\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"># Load the dataset\ndiab = load_diabetes()\n\n# Define features and label\nX = diab&#91;'data']\ny = diab&#91;'target']\n\n# Create a DataFrame from X\n# (tc, ldl, hdl, tch, ltg, glu are the serum measurements sklearn calls s1-s6)\ndf = pd.DataFrame(X, columns=&#91;'age', 'sex', 'bmi', 'bp', 'tc', 'ldl', 'hdl', 'tch', 'ltg', 'glu'])\n\n# Add the 'progression' column from y\ndf&#91;'progression'] = y\n\n# Show the first rows\ndf.head()\n<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-2\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1002\" height=\"217\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image9.png\" alt=\"Python and DataBricks\" class=\"wp-image-22778\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image9.png 1002w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image9-300x65.png 300w, 
https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image9-768x166.png 768w\" sizes=\"auto, (max-width: 1002px) 100vw, 1002px\" \/><figcaption class=\"wp-element-caption\">The sklearn \u201cDiabetes\u201d dataset. Image by the Author.<\/figcaption><\/figure>\n\n\n\n<p>First of all, note that we did not need to install any of the libraries imported above: they are all already installed in DataBricks.<\/p>\n\n\n\n<p>Now, let&#8217;s check whether any features are highly correlated, using a correlation matrix like the following:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-3\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"># Compute the correlation matrix once\ncorr = df.corr()\n\n# Mask the upper triangle, diagonal included (it mirrors the lower one)\nmask = np.triu(np.ones_like(corr, dtype=bool))\n\n# Show the correlation matrix as an annotated heatmap\ndataplot = sns.heatmap(corr, annot=True, fmt='.2f', mask=mask)<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-3\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"545\" height=\"387\" src=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image10-2.png\" alt=\"Python and DataBricks\" class=\"wp-image-22780\" srcset=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image10-2.png 545w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image10-2-300x213.png 300w, https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/image10-2-180x128.png 180w\" sizes=\"auto, (max-width: 545px) 100vw, 545px\" 
\/><figcaption class=\"wp-element-caption\">The correlation matrix of the \u201cDiabetes\u201d dataset. Image by the Author.<\/figcaption><\/figure>\n\n\n\n<p>Now, let&#8217;s split the dataset into train and test sets, fit a linear regression model, and compute R<sup>2<\/sup>:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-4\" data-shcb-language-name=\"Python\" data-shcb-language-slug=\"python\"><span><code class=\"hljs language-python\"># Define features (all columns except the last one)\nX = df.iloc&#91;:, :-1]\n\n# Define label\ny = df&#91;'progression']\n\n# Split into train (80%) and test (20%) sets\nX_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)\n\n# Fit the linear regression model\nreg = LinearRegression().fit(X_train, y_train)\n\n# Make predictions\ny_test_pred = reg.predict(X_test)\ny_train_pred = reg.predict(X_train)\n\n# R^2 on both sets\nprint(f'Coeff. of determination on train set: {reg.score(X_train, y_train):.2f}')\nprint(f'Coeff. of determination on test set: {reg.score(X_test, y_test):.2f}')<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-4\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Python<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">python<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>We get:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-5\" data-shcb-language-name=\"Plain text\" data-shcb-language-slug=\"plaintext\"><span><code class=\"hljs language-plaintext\">Coeff. of determination on train set: 0.53\nCoeff. 
of determination on test set: 0.45\n<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-5\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">Plain text<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">plaintext<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>A test R<sup>2<\/sup> of 0.45 means that the model explains only about 45% of the variance of the target on unseen data. So, these results are not very convincing, and we should try other Machine Learning models to improve them.<\/p>\n\n\n\n<p>In any case, here we have shown that a Notebook in DataBricks can be used just like any other Notebook.<\/p>\n\n\n\n<h1 class=\"gb-headline gb-headline-06d09881 gb-headline-text\">Conclusions<\/h1>\n\n\n\n<p>In this article, <strong>we described the importance of DataBricks as a Lakehouse<\/strong> that, among other things, also lets us manage <a href=\"https:\/\/docs.databricks.com\/en\/workflows\/index.html#what-is-databricks-jobs\">complex ML workflows<\/a>.<\/p>\n\n\n\n<p>We also showed<strong> how to use Notebooks in DataBricks with Python<\/strong>. 
Moreover, we saw that DataBricks comes with the most commonly used Python libraries already installed, which simplifies our work.<\/p>\n\n\n\n<p>As we said,<strong> we can use Notebooks in DataBricks in the same way we use Jupyter Notebooks<\/strong>, but with the advantage of being able to handle complex workflows and huge amounts of data, including unstructured data, depending on the problem we are solving.<\/p>\n\n\n\n<p>In conclusion, <strong>we might want to use DataBricks when we need to:<\/strong><\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Handle large-scale data processing and analysis.<\/li>\n\n\n\n<li>Provide a collaborative environment where data scientists, analysts, and engineers can work together.<\/li>\n\n\n\n<li>Build end-to-end Machine Learning pipelines.<\/li>\n\n\n\n<li>Analyze and process data in real time.<\/li>\n\n\n\n<li>Take advantage of Apache Spark&#8217;s capabilities without managing the underlying infrastructure.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>In this article, we will describe what DataBricks is and why we might need to use it, including with Python. We will show that in DataBricks we can use Python as a programming language, among others. In particular, we will show how the Notebooks that DataBricks provides can be used in the same way as Jupyter Notebooks. 
The use case&#8230; <a class=\"more-link\" href=\"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/\">Read more<\/a><\/p>\n","protected":false},"author":171,"featured_media":22751,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_editorskit_title_hidden":false,"_editorskit_reading_time":0,"_editorskit_is_block_options_detached":false,"_editorskit_block_options_position":"{}","_uag_custom_page_level_css":"","_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","footnotes":""},"categories":[10273],"tags":[10442,10438],"collections":[],"class_list":{"0":"post-22677","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-data-science-it","8":"tag-python-it","9":"tag-sviluppo-software-it","10":"entry"},"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.9 (Yoast SEO v26.9) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Python e DataBricks: come usarli per dominare i dati<\/title>\n<meta name=\"description\" content=\"In questo articolo spiegheremo la sinergizzazione dell&#039;analisi dei dati attraverso la versatilit\u00e0 di Python e DataBricks.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Python e DataBricks: la giusta accoppiata per dominare i dati\" \/>\n<meta 
property=\"og:description\" content=\"In questo articolo spiegheremo la sinergizzazione dell&#039;analisi dei dati attraverso la versatilit\u00e0 di Python e DataBricks.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/\" \/>\n<meta property=\"og:site_name\" content=\"Codemotion Magazine\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/Codemotion.Italy\/\" \/>\n<meta property=\"article:published_time\" content=\"2023-08-28T07:30:00+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2023-09-01T09:47:07+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/01.webp\" \/>\n\t<meta property=\"og:image:width\" content=\"1920\" \/>\n\t<meta property=\"og:image:height\" content=\"1081\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/webp\" \/>\n<meta name=\"author\" content=\"Federico Trotta\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@CodemotionIT\" \/>\n<meta name=\"twitter:site\" content=\"@CodemotionIT\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Federico Trotta\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"9 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/\"},\"author\":{\"name\":\"Federico Trotta\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/98d2abaf70e7d106abab1f38bf20f90d\"},\"headline\":\"Python e DataBricks: la giusta accoppiata per dominare i dati\",\"datePublished\":\"2023-08-28T07:30:00+00:00\",\"dateModified\":\"2023-09-01T09:47:07+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/\"},\"wordCount\":1422,\"publisher\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/01.webp\",\"keywords\":[\"Python\",\"sviluppo software\"],\"articleSection\":[\"Data Science\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/\",\"name\":\"Python e DataBricks: come usarli per dominare i 
dati\",\"isPartOf\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/01.webp\",\"datePublished\":\"2023-08-28T07:30:00+00:00\",\"dateModified\":\"2023-09-01T09:47:07+00:00\",\"description\":\"In questo articolo spiegheremo la sinergizzazione dell'analisi dei dati attraverso la versatilit\u00e0 di Python e DataBricks.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/#primaryimage\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/01.webp\",\"contentUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/01.webp\",\"width\":1920,\"height\":1081},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.codemotion.com\/magazine\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"Data 
Science\",\"item\":\"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/\"},{\"@type\":\"ListItem\",\"position\":3,\"name\":\"Python e DataBricks: la giusta accoppiata per dominare i dati\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#website\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/\",\"name\":\"Codemotion Magazine\",\"description\":\"We code the future. Together\",\"publisher\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.codemotion.com\/magazine\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#organization\",\"name\":\"Codemotion\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png\",\"contentUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png\",\"width\":225,\"height\":225,\"caption\":\"Codemotion\"},\"image\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/Codemotion.Italy\/\",\"https:\/\/x.com\/CodemotionIT\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/98d2abaf70e7d106abab1f38bf20f90d\",\"name\":\"Federico 
Trotta\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/69bc8655986054bfe43c7eaa7f00e2ea939b761bd924064ea9b5972568a01714?s=96&d=mm&r=g\",\"contentUrl\":\"https:\/\/secure.gravatar.com\/avatar\/69bc8655986054bfe43c7eaa7f00e2ea939b761bd924064ea9b5972568a01714?s=96&d=mm&r=g\",\"caption\":\"Federico Trotta\"},\"description\":\"I have loved writing since I was a young boy in school, writing detective stories as class exams. Thanks to my curiosity, I discovered programming and AI. Having a burning passion for writing, I couldn't avoid starting to write about these topics, so I decided to change my career to become a Technical Writer. My purpose is to educate people on Python programming, Machine Learning, and Data Science, through writing.\",\"sameAs\":[\"https:\/\/federicotrotta.com\/\",\"https:\/\/www.linkedin.com\/in\/federico-trotta\/?originalSubdomain=it\"],\"url\":\"https:\/\/www.codemotion.com\/magazine\/author\/federico-trotta\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. 
-->","yoast_head_json":{"title":"Python e DataBricks: come usarli per dominare i dati","description":"In questo articolo spiegheremo la sinergizzazione dell'analisi dei dati attraverso la versatilit\u00e0 di Python e DataBricks.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/","og_locale":"en_US","og_type":"article","og_title":"Python e DataBricks: la giusta accoppiata per dominare i dati","og_description":"In questo articolo spiegheremo la sinergizzazione dell'analisi dei dati attraverso la versatilit\u00e0 di Python e DataBricks.","og_url":"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/","og_site_name":"Codemotion Magazine","article_publisher":"https:\/\/www.facebook.com\/Codemotion.Italy\/","article_published_time":"2023-08-28T07:30:00+00:00","article_modified_time":"2023-09-01T09:47:07+00:00","og_image":[{"width":1920,"height":1081,"url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/01.webp","type":"image\/webp"}],"author":"Federico Trotta","twitter_card":"summary_large_image","twitter_creator":"@CodemotionIT","twitter_site":"@CodemotionIT","twitter_misc":{"Written by":"Federico Trotta","Est. 
reading time":"9 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/#article","isPartOf":{"@id":"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/"},"author":{"name":"Federico Trotta","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/98d2abaf70e7d106abab1f38bf20f90d"},"headline":"Python e DataBricks: la giusta accoppiata per dominare i dati","datePublished":"2023-08-28T07:30:00+00:00","dateModified":"2023-09-01T09:47:07+00:00","mainEntityOfPage":{"@id":"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/"},"wordCount":1422,"publisher":{"@id":"https:\/\/www.codemotion.com\/magazine\/#organization"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/#primaryimage"},"thumbnailUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/01.webp","keywords":["Python","sviluppo software"],"articleSection":["Data Science"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/","url":"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/","name":"Python e DataBricks: come usarli per dominare i 
dati","isPartOf":{"@id":"https:\/\/www.codemotion.com\/magazine\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/#primaryimage"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/#primaryimage"},"thumbnailUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/01.webp","datePublished":"2023-08-28T07:30:00+00:00","dateModified":"2023-09-01T09:47:07+00:00","description":"In questo articolo spiegheremo la sinergizzazione dell'analisi dei dati attraverso la versatilit\u00e0 di Python e DataBricks.","breadcrumb":{"@id":"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/#primaryimage","url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/01.webp","contentUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/01.webp","width":1920,"height":1081},{"@type":"BreadcrumbList","@id":"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/python-e-databricks-la-giusta-accoppiata-per-dominare-i-dati\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.codemotion.com\/magazine\/"},{"@type":"ListItem","position":2,"name":"Data Science","item":"https:\/\/www.codemotion.com\/magazine\/it\/data-science-it\/"},{"@type":"ListItem","position":3,"name":"Python e DataBricks: la giusta accoppiata per dominare i 
dati"}]},{"@type":"WebSite","@id":"https:\/\/www.codemotion.com\/magazine\/#website","url":"https:\/\/www.codemotion.com\/magazine\/","name":"Codemotion Magazine","description":"We code the future. Together","publisher":{"@id":"https:\/\/www.codemotion.com\/magazine\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.codemotion.com\/magazine\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.codemotion.com\/magazine\/#organization","name":"Codemotion","url":"https:\/\/www.codemotion.com\/magazine\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/","url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png","contentUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png","width":225,"height":225,"caption":"Codemotion"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/Codemotion.Italy\/","https:\/\/x.com\/CodemotionIT"]},{"@type":"Person","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/98d2abaf70e7d106abab1f38bf20f90d","name":"Federico Trotta","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/69bc8655986054bfe43c7eaa7f00e2ea939b761bd924064ea9b5972568a01714?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/69bc8655986054bfe43c7eaa7f00e2ea939b761bd924064ea9b5972568a01714?s=96&d=mm&r=g","caption":"Federico Trotta"},"description":"I have loved writing since I was a young boy in school, writing detective stories as class exams. 
Thanks to my curiosity, I discovered programming and AI. Having a burning passion for writing, I couldn't avoid starting to write about these topics, so I decided to change my career to become a Technical Writer. My purpose is to educate people on Python programming, Machine Learning, and Data Science, through writing.","sameAs":["https:\/\/federicotrotta.com\/","https:\/\/www.linkedin.com\/in\/federico-trotta\/?originalSubdomain=it"],"url":"https:\/\/www.codemotion.com\/magazine\/author\/federico-trotta\/"}]}},"featured_image_src":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/01-600x400.webp","featured_image_src_square":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/01-600x600.webp","author_info":{"display_name":"Federico Trotta","author_link":"https:\/\/www.codemotion.com\/magazine\/author\/federico-trotta\/"},"uagb_featured_image_src":{"full":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/01.webp",1920,1081,false],"thumbnail":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/01-150x150.webp",150,150,true],"medium":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/01-300x169.webp",300,169,true],"medium_large":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/01-768x432.webp",768,432,true],"large":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/01-1024x577.webp",1024,577,true],"1536x1536":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/01-1536x865.webp",1536,865,true],"2048x2048":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/01.webp",1920,1081,false],"small-home-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/01.webp",100,56,false],"sidebar-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/01-180x128.webp",180,128,true],"genesis-singular-images":["https:\/\/www.codemotion.com\/ma
gazine\/wp-content\/uploads\/2023\/08\/01-896x504.webp",896,504,true],"archive-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/01-400x225.webp",400,225,true],"gb-block-post-grid-landscape":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/01-600x400.webp",600,400,true],"gb-block-post-grid-square":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/08\/01-600x600.webp",600,600,true]},"uagb_author_info":{"display_name":"Federico Trotta","author_link":"https:\/\/www.codemotion.com\/magazine\/author\/federico-trotta\/"},"uagb_comment_info":0,"uagb_excerpt":"In questo articolo, descriveremo cos&#8217;\u00e8 DataBricks e perch\u00e9 potremmo aver bisogno di usarlo anche con Python. Mostreremo che in DataBricks possiamo utilizzare, tra gli altri, Python come linguaggio di programmazione. In particolare, mostreremo come sia possibile utilizzare i Notebooks che DataBricks ci fornisce allo stesso modo in cui utilizziamo i Jupyter Notebooks. 
Il caso d&#8217;uso&#8230;&hellip;","lang":"it","_links":{"self":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/22677","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/users\/171"}],"replies":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/comments?post=22677"}],"version-history":[{"count":9,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/22677\/revisions"}],"predecessor-version":[{"id":22787,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/22677\/revisions\/22787"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/media\/22751"}],"wp:attachment":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/media?parent=22677"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/categories?post=22677"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/tags?post=22677"},{"taxonomy":"collections","embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/collections?post=22677"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}