Codemotion Magazine

Machine Learning

Why do some machine learning models fail?

An overview of the current use of machine learning algorithms: why these algorithms often fail, and why we should know how ML libraries work.

March 9, 2020 by Leo Sorge

Table Of Contents
  1. A good path from astrophysics to neuroscience
  2. Machine Learning: Linearity is broken
  3. Deciding what model suits you best: real world apps
  4. Conclusions

Everything is automatic and effortless with machine learning algorithms, right? Big data is all you need, after all. You have a dataset, you split it when necessary, you take one machine learning model, you train it and the miracle of a correct classification or prediction shines its light on you, your name, your business. Artificial intelligence is easy, isn’t it? No, it is not: this is only advertising.

“Most Machine Learning talks present beautiful cases of success, but in reality models often fail to deliver the desired performance”, stated Rafael Garcia-Dias in the introduction to his speech at Codemotion Milan 2019. “It is not uncommon to see developers blaming certain models and even blacklisting certain models.”

Garcia-Dias is a research associate at King’s College London whose main focus is on developing machine learning models based on structural MRI to diagnose patients. In many cases, he has found that repeated trial and error is required to find a good combination of data and algorithm, if one exists at all.

Data are worth little without a firm grasp of the problem you are facing. “Only when you know that can you think about your model”, Garcia-Dias clarifies. “Be sure you understand your problem”: if you don’t have enough data, then generate it, even if this proves expensive.

A good path from astrophysics to neuroscience

Automated learning can help in branches of knowledge that are fascinating but hard for the human mind to explore unaided. Garcia-Dias offers striking examples from his own career. He spent time studying the chemical history of galaxies: “With machine learning tools you can understand where the interstellar gas in each of them comes from”. The constraints of your data limit your performance, however: “not all clusters are distinguishable with today’s approaches”.

The second example from Rafael Garcia-Dias’ work is an analysis of MRI scans. “We determine the brain age, then we compare it with the real age of the person”, explains the King’s College researcher; “the results can help diagnose some important diseases in time”.

Machine Learning: Linearity is broken

A common mistake researchers make is to expect the process to be linear. First of all, each model has its own limitations, and the coder must be aware of them if the actual results are to match the desired ones.

One successful example from Rafael Garcia-Dias’ experience is based on the k-means algorithm. It’s important to understand the underlying assumptions of your model. k-means measures similarity with Euclidean distance, and this constrains where the model can be used: it works best on compact, roughly spherical clusters whose features are on comparable scales. It rarely works ‘as is’, and often needs much work on data and parameters. Moreover, there are many viable alternatives, such as GMM and DBSCAN.
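The effect of the Euclidean assumption can be illustrated with a minimal sketch. It is not taken from the talk: the synthetic data, the variable names and the choice of `StandardScaler` and the adjusted Rand index are assumptions made for illustration. One feature separates two groups, while a second, uninformative feature lives on a much larger scale and dominates the distance until the data are standardised.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 200  # points per group (illustrative value)

# Feature 0 separates the two groups; feature 1 is pure noise on a far larger scale.
informative = np.concatenate([rng.normal(0.0, 0.1, n), rng.normal(1.0, 0.1, n)])
noisy = rng.normal(0.0, 100.0, 2 * n)
X = np.column_stack([informative, noisy])
y_true = np.repeat([0, 1], n)


def kmeans_ari(data):
    """Cluster with k-means (k=2) and compare the result with the known groups."""
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(data)
    return adjusted_rand_score(y_true, labels)


# On raw data the Euclidean distance is dominated by the large-scale noise feature.
ari_raw = kmeans_ari(X)
# After standardisation both features contribute on a comparable scale.
ari_scaled = kmeans_ari(StandardScaler().fit_transform(X))

print(f"ARI on raw data:    {ari_raw:.2f}")
print(f"ARI on scaled data: {ari_scaled:.2f}")
```

Standardisation is only one possible preparation step; whether it is appropriate depends on what the features mean in the problem at hand.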

Gaussian Mixture Modelling (GMM) can be seen as a generalisation of the k-means algorithm: it assumes that all the data points are generated from a mixture of a finite number of Gaussian distributions with unknown parameters. The scikit-learn library allows the use of GMM with several alternative strategies.
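As a rough illustration, the sketch below fits scikit-learn's `GaussianMixture` with each of its four `covariance_type` options and compares them by BIC. Reading the "alternative strategies" as covariance types is an assumption, as are the synthetic data and the stretching matrix; the talk's own data and settings are not reproduced here.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Synthetic data (an assumption): three blobs, stretched so they are not spherical.
X, _ = make_blobs(n_samples=600, centers=3, cluster_std=1.0, random_state=0)
X = X @ np.array([[0.6, -0.6], [-0.4, 0.8]])

# Fit one mixture per covariance structure and record its BIC (lower is better).
bic_by_type = {}
for cov_type in ("full", "tied", "diag", "spherical"):
    gmm = GaussianMixture(n_components=3, covariance_type=cov_type, random_state=0)
    gmm.fit(X)
    bic_by_type[cov_type] = gmm.bic(X)

for cov_type, bic in bic_by_type.items():
    print(f"{cov_type:>9}: BIC = {bic:.1f}")

# Unlike k-means, a GMM gives soft assignments: one probability per component.
full_gmm = GaussianMixture(n_components=3, covariance_type="full", random_state=0).fit(X)
responsibilities = full_gmm.predict_proba(X)
print(responsibilities[:3].round(3))
```

The `spherical` option is the closest to the k-means assumption, while `full` lets every component take its own elongated shape.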

Density-based spatial clustering of applications with noise (DBSCAN) groups together points that have many nearby neighbours, and marks points lying in low-density regions as noise.
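A minimal sketch of this behaviour, using scikit-learn's `DBSCAN` on synthetic data, follows. The two-moons dataset, the added outlier and the values of `eps` and `min_samples` are illustrative assumptions, not settings from the talk.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

# Two interleaved half-circles: a shape that centroid-based methods handle poorly.
X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)
# Append one isolated point, far from everything else.
X = np.vstack([X, [[10.0, 10.0]]])

# eps is the neighbourhood radius; min_samples is the density threshold.
labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# scikit-learn marks noise points with the label -1.
n_clusters = len(set(labels) - {-1})
n_noise = int(np.sum(labels == -1))

print(f"clusters found: {n_clusters}, noise points: {n_noise}")
print(f"label of the isolated point: {labels[-1]}")
```

Note that the number of clusters is not set in advance: it follows from `eps` and `min_samples`, which is why these two parameters need care.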

Garcia-Dias tested these three algorithms with different parameters, showing that very small changes can significantly alter the homogeneity score. If you have a feel for your data, you can limit the number of trials you need to carry out.
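This kind of comparison can be sketched as follows. It is only an approximation of the idea: the data are synthetic blobs, and the parameter grid and the labels in `candidates` are hypothetical, since the talk's actual data and settings are not given in this article. The score is scikit-learn's `homogeneity_score`, which requires known ground-truth labels.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import homogeneity_score
from sklearn.mixture import GaussianMixture

# Synthetic data with known group membership (an assumption for illustration).
X, y_true = make_blobs(n_samples=500, centers=4, cluster_std=1.2, random_state=42)

# A small, hypothetical grid: the same three algorithms with slightly different settings.
candidates = {
    "k-means, k=3": KMeans(n_clusters=3, n_init=10, random_state=0),
    "k-means, k=4": KMeans(n_clusters=4, n_init=10, random_state=0),
    "GMM, full": GaussianMixture(n_components=4, covariance_type="full", random_state=0),
    "GMM, spherical": GaussianMixture(n_components=4, covariance_type="spherical", random_state=0),
    "DBSCAN, eps=0.5": DBSCAN(eps=0.5, min_samples=5),
    "DBSCAN, eps=1.0": DBSCAN(eps=1.0, min_samples=5),
}

scores = {}
for name, model in candidates.items():
    labels = model.fit_predict(X)
    # Homogeneity is 1.0 when every cluster contains members of a single true group.
    # DBSCAN's noise label (-1) is treated here as just another cluster.
    scores[name] = homogeneity_score(y_true, labels)

for name, score in scores.items():
    print(f"{name:<16} homogeneity = {score:.3f}")
```

Homogeneity alone rewards splitting the data into many small clusters, so in practice it is usually read together with a complementary measure such as completeness.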

Deciding what model suits you best: real world apps

Many algorithms are available in more than one library. You have to know what is behind their code in order to make good use of them. This great variety of development tools can make the choice difficult. Coding for machine learning can look strange, but it’s more or less like any other kind of programming: if you know one environment, you can learn any other.

Existing libraries can look inadequate for a particular goal, so the researcher may think of writing their own code. Is this usually a mistake?

“I never write new libraries myself”, answers Rafael Garcia-Dias, “because that code is highly optimized and strongly reviewed. But I often look for other libraries in different languages”. Python’s libraries are often surpassed by R’s equivalents, to give one example.

“Great programmers develop great libraries, and algorithms, all stuff that flows in the open-source software pool, sooner or later”. Each of these has its own limitations, which the user must study and understand in order to make the best choice. Rather than writing new code, it’s better to spend time on simple baselines such as dummy classifiers and dummy regressors, which show whether a model has learned anything beyond a trivial rule.
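A minimal sketch of a dummy baseline, assuming scikit-learn's `DummyClassifier` and `DummyRegressor`, is shown below. The synthetic, imbalanced dataset and the logistic regression used as the "real" model are illustrative assumptions, not the speaker's own workflow.

```python
from sklearn.datasets import make_classification, make_regression
from sklearn.dummy import DummyClassifier, DummyRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split

# Imbalanced synthetic data: roughly 90% of samples belong to class 0.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)

# The dummy classifier ignores the features and always predicts the majority class.
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

baseline_acc = baseline.score(X_test, y_test)
model_acc = model.score(X_test, y_test)
baseline_bal = balanced_accuracy_score(y_test, baseline.predict(X_test))
model_bal = balanced_accuracy_score(y_test, model.predict(X_test))

print(f"accuracy:          baseline {baseline_acc:.3f}, model {model_acc:.3f}")
print(f"balanced accuracy: baseline {baseline_bal:.3f}, model {model_bal:.3f}")

# The regression counterpart always predicts the mean of the training targets.
X_reg, y_reg = make_regression(n_samples=200, n_features=5, noise=10.0, random_state=0)
reg_baseline = DummyRegressor(strategy="mean").fit(X_reg, y_reg)
reg_baseline_r2 = reg_baseline.score(X_reg, y_reg)  # R^2 of the mean predictor
print(f"R^2 of the mean predictor: {reg_baseline_r2:.3f}")
```

On imbalanced data the baseline's plain accuracy can look respectable while its balanced accuracy shows that nothing was learned, which is why a model's scores are best read against such a reference.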

Conclusions

To be crystal clear, bad models don’t exist. There are, however, some silly, limiting mistakes to avoid: it’s essential to be aware of the assumptions behind each model, and to have a real feel for your data.

The most important piece of advice is to treat the work as a continuous process: “never quit thinking” is a rule well suited to both AI algorithms and real-life activities.


Tagged as: Codemotion Milan

© Copyright Codemotion srl Via Marsala, 29/H, 00185 Roma P.IVA 12392791005 | Privacy policy | Terms and conditions
