Codemotion Magazine

Codemotion, February 22, 2024

Data Science in Action: Real-World Use Cases and Success Stories 

Big Data
Applied Data science, machine learning, debugging

What is Data Science?

To avoid any misunderstanding, whenever we mention data science, we are referring to the definition provided by MIT (2):

“Data science encompasses a set of principles, problem definitions, algorithms, and processes for extracting non-obvious and useful patterns from large datasets. It is closely related to the fields of data mining and machine learning, but broader in scope.”

Since data science overlaps with many different disciplines, the use cases below show how its broad scope can be narrowed to a specific end, while the success stories provide clear examples of its feasibility in real-world production.

Each case follows the same format: a real-world challenge, the data science behind the solution, and finally, how that solution helped solve the challenge.

Case 1: The Science of American Football

Our first use case examines the connection between data science and sports success, drawing on Amazon Web Services (AWS), a deep well of documented solutions. The staff writer’s report was based on information provided by Elena Ehrlich, a data scientist at AWS.

The Challenge

The National Football League (NFL) has long used metrics to evaluate players, starting with the scouting combine held before every draft. Ehrlich’s system is a natural evolution beyond simply ranking the fastest players by their dash times and judging quarterbacks by how well they can hit non-human targets.

The Science

According to Ehrlich, the approach uses the Spliced Binned-Pareto distribution (SBPD) method to “robustly and accurately model time-series with heavy tailed noise”. In probability, a ‘heavy-tailed’ distribution is one whose tails decay more slowly than those of an exponential distribution, so extreme values occur far more often than a normal (bell-curve) distribution would predict. Such distributions describe scenarios as wide-ranging as weather patterns and anthropological studies covering countrywide populations.
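To make the ‘spliced’ idea more concrete, here is a deliberately simplified sketch, not the published SBPD: it models the body of the data with a histogram (the ‘binned’ part) and splices on a Pareto upper tail above a quantile threshold, estimating the tail index with the classic Hill estimator. The published method is considerably richer; this toy version handles only the upper tail, uses a plain Pareto fit instead of a generalized Pareto, and ignores time dependence entirely. The function names, the 95% threshold, and the bin count are our own assumptions.

```python
import math
import random


def fit_spliced_binned_pareto(data, n_bins=20, tail_q=0.95):
    """Toy, unconditional spliced model: histogram body + Pareto upper tail.

    Simplified illustration, not the published SBPD. Assumes every value is
    positive (required by the Pareto/Hill step) and that the tail is non-empty.
    """
    xs = sorted(data)
    n = len(xs)
    u = xs[int(tail_q * n)]            # splice threshold between body and tail
    body = [x for x in xs if x <= u]
    tail = [x for x in xs if x > u]

    # Body: equal-width histogram bins from the minimum up to the threshold.
    lo = body[0]
    width = (u - lo) / n_bins
    counts = [0] * n_bins
    for x in body:
        counts[min(int((x - lo) / width), n_bins - 1)] += 1

    # Tail: Hill estimator of the Pareto tail index alpha above the threshold.
    alpha = len(tail) / sum(math.log(x / u) for x in tail)
    return {"lo": lo, "u": u, "width": width, "counts": counts,
            "n": n, "p_tail": len(tail) / n, "alpha": alpha}


def survival(model, x):
    """P(X > x) under the fitted spliced model."""
    if x >= model["u"]:
        # Pareto tail decays polynomially, so extreme values stay plausible.
        return model["p_tail"] * (x / model["u"]) ** -model["alpha"]
    if x < model["lo"]:
        return 1.0
    # Body: tail mass plus histogram mass above x (partial bin interpolated).
    k = min(int((x - model["lo"]) / model["width"]), len(model["counts"]) - 1)
    bin_top = model["lo"] + (k + 1) * model["width"]
    above = sum(model["counts"][k + 1:]) + model["counts"][k] * (bin_top - x) / model["width"]
    return model["p_tail"] + above / model["n"]


random.seed(42)
data = [random.paretovariate(2.5) for _ in range(20_000)]   # true tail index is 2.5
model = fit_spliced_binned_pareto(data)
print(round(model["alpha"], 2))   # should land close to 2.5
print(survival(model, 10.0))      # the true value for this Pareto is 10 ** -2.5, about 0.0032
```

The histogram can represent any shape in the center of the data, while the Pareto tail keeps assigning realistic probability to rare, extreme outcomes that a histogram alone would treat as impossible beyond its last bin.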

The Solution

In football terms, this method of data analysis accounts for many more scenarios than previous approaches, including circumstances outside the game itself that affect player performance. The results are also grounded in empirical data, and the model can be tested and refined continually, since every NFL game played produces new data.

SBPD was used in the updated passing metric developed for NFL Next Gen Stats, which was designed to improve on the traditional QB (passer) rating used by the league’s official statisticians. The new model still measures a player’s performance, but it also accounts for variation across different time periods and for the factors that contribute to those changes.
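For comparison, the traditional passer rating is a fixed formula over four per-attempt ratios, each capped to the same range. The sketch below implements the standard NFL passer rating formula; it is not part of SBPD or of the newer metric, and the function and parameter names are our own.

```python
def passer_rating(cmp, att, yds, td, ints):
    """Traditional NFL passer rating, ranging from 0 to about 158.3."""
    def clamp(v):
        # Each component is capped to the range [0, 2.375].
        return max(0.0, min(v, 2.375))

    a = clamp((cmp / att - 0.3) * 5)      # completion percentage
    b = clamp((yds / att - 3) * 0.25)     # yards per attempt
    c = clamp(td / att * 20)              # touchdowns per attempt
    d = clamp(2.375 - ints / att * 25)    # interceptions per attempt
    return (a + b + c + d) / 6 * 100


print(round(passer_rating(cmp=5, att=10, yds=50, td=0, ints=0), 1))
print(round(passer_rating(cmp=20, att=20, yds=300, td=5, ints=0), 1))
```

Every input is a raw count; nothing in the formula knows about game situation, opponent, or other surrounding factors, which is the kind of context the newer, model-based metric is described as taking into account.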

The implications go beyond sports, since the same ideas apply to any highly unpredictable series of events. This is particularly useful for predictive models in markets driven by volatile factors, such as product sales that depend on social media trends.


Case 2: Uber’s Revolutionary Ride Algorithms

As one of the world’s best-known transportation companies, Uber epitomizes the data science success story. This case study offers insight into the methods likely used to manage a platform with virtually unlimited growth potential.

The Challenge

Uber faced the kind of logistical challenges one would expect from the sheer size of its consumer data pool. The need for multiple big data disciplines is apparent from a single ride: the app must estimate the driver’s ETA based on the user’s location, factor in traffic, and quote a fare. This requires geolocation data, personal and financial consumer data, and real-time traffic data working together across millions of transactions per month.
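The single-ride pipeline described above (find a driver, estimate an ETA that reflects traffic, quote a fare) can be sketched in a few lines. This is a toy illustration under loud assumptions, not Uber’s method: it uses straight-line (haversine) distance instead of a road network, a constant average speed, a single `traffic_factor` multiplier standing in for live traffic data, and a made-up fare formula. All names and rates (`Driver`, `nearest_driver_eta`, `estimate_fare`, `base`, `per_km`, `per_minute`, `surge`) are hypothetical.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0


@dataclass
class Driver:
    driver_id: str
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_driver_eta(drivers, rider_lat, rider_lon, avg_speed_kmh=30.0, traffic_factor=1.0):
    """Return (driver_id, eta_minutes) for the closest driver (hypothetical model).

    traffic_factor > 1 stretches the ETA; it stands in for live traffic data.
    """
    def dist(d):
        return haversine_km(d.lat, d.lon, rider_lat, rider_lon)

    best = min(drivers, key=dist)
    eta_minutes = dist(best) / avg_speed_kmh * 60 * traffic_factor
    return best.driver_id, eta_minutes


def estimate_fare(trip_km, trip_minutes, base=2.5, per_km=1.2, per_minute=0.3, surge=1.0):
    """Hypothetical fare: (base + distance charge + time charge) * surge multiplier."""
    return round((base + per_km * trip_km + per_minute * trip_minutes) * surge, 2)


drivers = [Driver("a", 45.4642, 9.1900), Driver("b", 45.4700, 9.2050)]
print(nearest_driver_eta(drivers, 45.4660, 9.1950, traffic_factor=1.5))
print(estimate_fare(trip_km=8.0, trip_minutes=22.0, surge=1.2))
```

Even this toy version needs all three data sources the paragraph lists: locations (rider and drivers), traffic conditions (the multiplier), and pricing inputs (the fare parameters).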

The Science

Uber is secretive about the specifics of its operation, but much of its approach to data science can be inferred from an excerpt of the job qualifications for Uber’s Senior Data Scientist position:

  • Selecting and employing advanced statistical procedures to obtain actionable insights
  • Cross-validating models to ensure their generalizability
  • Designing and analyzing large-scale online experiments and interpreting the results to draw actionable conclusions
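The third qualification, analyzing online experiments, often comes down to comparing a metric between a control group and a treatment group. As a hedged illustration (hypothetical numbers, and not a description of Uber’s internal methodology), a two-sided two-proportion z-test checks whether a difference in conversion rates is larger than chance alone would plausibly produce. The function name and the example scenario are our own.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test for a difference between two conversion rates.

    Returns (z, p_value), using the pooled proportion under the null hypothesis.
    Assumes both groups contain at least some successes and some failures.
    """
    p_a, p_b = successes_a / n_a, successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical experiment: did a redesigned booking screen raise the booking rate?
z, p = two_proportion_z_test(successes_a=1000, n_a=10_000, successes_b=1100, n_b=10_000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

At the scale of a large platform, even small differences in rates become detectable, which is why the design side (sample sizes, randomization, choice of metric) matters as much as the analysis.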

From these three, we can glean that Uber employs ‘advanced statistical procedures’, most likely the company’s proprietary algorithms. Some of these may have been reverse-engineered, but the specific weights Uber applies are probably ever-changing. We can also infer that cross-validation and large-scale online experiments (such as A/B tests) are part of Uber’s toolset.
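Cross-validation is how a team checks that a model “generalizes”: the data is split into k folds, and the model is repeatedly trained on k − 1 folds and scored on the one it never saw. The minimal sketch below applies k-fold cross-validation to a simple least-squares line; the trip distance and duration data are synthetic and hypothetical, and nothing here reflects Uber’s actual models.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def k_fold_mse(xs, ys, k=5, seed=0):
    """Mean held-out squared error of fit_line across k folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)          # shuffle once so folds are random
    folds = [idx[i::k] for i in range(k)]     # k disjoint, near-equal folds
    fold_errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        mse = sum((ys[i] - (a + b * xs[i])) ** 2 for i in fold) / len(fold)
        fold_errors.append(mse)
    return sum(fold_errors) / k


# Synthetic, hypothetical data: trip duration (min) grows with distance (km), plus noise.
rng = random.Random(1)
dist_km = [rng.uniform(1, 20) for _ in range(200)]
duration_min = [4 + 2.5 * d + rng.gauss(0, 3) for d in dist_km]
cv_error = k_fold_mse(dist_km, duration_min)
print(cv_error)   # roughly the noise variance (3 ** 2 = 9) if the line generalizes
```

If the held-out error were much larger than the training error, that would signal overfitting, which is exactly what the qualification “cross-validating models to ensure their generalizability” guards against.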

The Solution

It is reasonable to assume that Uber uses a more specific set of algorithms that weigh different factors according to region. Much of the geolocation work would be handled by the mapping application, which correlates GPS data with vehicle telemetry to pinpoint the nearest drivers and the best routes. On top of this, Uber adds machine learning, artificial intelligence, and route optimization algorithms that continuously draw on real-time data.
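Route optimization over live data can be illustrated with a classic shortest-path algorithm. The sketch below is a hedged example, not Uber’s routing engine: it runs Dijkstra’s algorithm on a tiny hypothetical road graph where each edge cost is a base travel time multiplied by a traffic factor, so congestion can change which route is fastest. The graph layout and names are our own.

```python
import heapq
import math


def fastest_route(graph, source, target):
    """Dijkstra's shortest path; each edge cost is base minutes * traffic multiplier.

    graph: {node: [(neighbor, base_minutes, traffic_multiplier), ...]}
    Returns (path, total_minutes), or (None, inf) if target is unreachable.
    """
    best = {source: 0.0}
    previous = {}
    frontier = [(0.0, source)]
    done = set()
    while frontier:
        minutes, node = heapq.heappop(frontier)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        if node == target:
            break
        for neighbor, base, traffic in graph.get(node, []):
            candidate = minutes + base * traffic
            if candidate < best.get(neighbor, math.inf):
                best[neighbor] = candidate
                previous[neighbor] = node
                heapq.heappush(frontier, (candidate, neighbor))
    if target not in best:
        return None, math.inf
    path = [target]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return path[::-1], best[target]


# Hypothetical road graph: the ring road is normally faster but currently congested.
road_graph = {
    "pickup": [("ring_road", 5, 3.0), ("side_street", 8, 1.0)],
    "ring_road": [("dropoff", 5, 1.0)],
    "side_street": [("dropoff", 6, 1.0)],
}
print(fastest_route(road_graph, "pickup", "dropoff"))
```

Updating the traffic multipliers as new telemetry arrives and re-running the search is the simplest way to picture routing that “draws on real-time data continuously”.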

Case 3: Open-Source Machine Learning 

Our third use case examines a success story that isn’t directly tied to business results or winning, but to the type of innovation that affects everyone. TensorFlow, originally created by Google Research, is an open-source tool built by data scientists for data scientists to develop and manage machine learning workflows. As such, it is a complex framework aimed at deep learning projects and intended for software engineers and practitioners with similar experience.

The Challenge

The challenge for TensorFlow was to turn a field as broad and varied as machine learning into a cohesive product that can meet the demands of almost any task. Because machine learning applies to virtually any field that uses large datasets, building an adaptable solution means considering every reasonable use case.

The Science

In mathematics, a ‘tensor’ is an algebraic object that describes a multilinear relationship between sets of algebraic objects. In TensorFlow, the term is used more practically: a tensor is a multidimensional array (a scalar, vector, matrix, or higher-dimensional array), which makes it a straightforward way to represent data along many dimensions. Representing complex data in such a simple yet versatile way enables the kind of deep analysis that neural networks, built from interconnected nodes, can perform.
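To make the array view of tensors concrete, the sketch below treats nested Python lists as tensors of different ranks and pushes a small batch through one fully connected neural network layer. It is a dependency-free illustration under our own naming: `shape` and `dense` are hypothetical helpers, not TensorFlow functions, and the weights are arbitrary.

```python
def shape(t):
    """Shape of a regular nested-list tensor, e.g. [2, 3] for a 2x3 matrix."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0] if t else None
    return dims


def dense(inputs, weights, bias):
    """Fully connected layer: relu(inputs @ weights + bias).

    inputs: [batch, n_in], weights: [n_in, n_out], bias: [n_out]
    """
    return [
        [max(0.0, sum(x * weights[i][j] for i, x in enumerate(row)) + bias[j])
         for j in range(len(bias))]
        for row in inputs
    ]


batch = [[1.0, 2.0], [0.5, -1.0]]   # rank-2 tensor: 2 examples x 2 features
w = [[1.0, -1.0], [0.5, 0.5]]       # rank-2 tensor: 2 inputs x 2 outputs
b = [0.0, 0.1]                      # rank-1 tensor: one bias per output
out = dense(batch, w, b)
print(shape(batch), shape(w), shape(b), shape(out))
print(out)
```

In TensorFlow itself, the same computation would typically be written with `tf.constant`, `tf.matmul`, and `tf.nn.relu`, or as a `tf.keras.layers.Dense` layer, with shapes tracked and checked by the framework rather than by hand.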

As its name suggests, TensorFlow expresses computations as a graph through which tensors flow from one operation to the next, which makes it well suited to modeling relationships between complex entities. This allows software developers, hardware developers, and even social media marketers to find more specific patterns in large datasets with many varying attributes, and to adjust accordingly.
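The “flow” in the name can be pictured as a small dataflow graph: each node is an operation, and values move along the edges from the inputs to the requested output. The sketch below is a hypothetical, scalar-only toy; it loosely mirrors the TensorFlow 1.x model, where `Session.run` evaluated requested fetches given a `feed_dict`, whereas TensorFlow 2 builds such graphs by tracing Python functions with `tf.function`. The `run` function and the node layout are our own.

```python
from typing import Callable, Dict, List, Tuple

# A node is (operation, names of the nodes it consumes).
Graph = Dict[str, Tuple[Callable[..., float], List[str]]]


def run(graph: Graph, feeds: Dict[str, float], fetch: str) -> float:
    """Compute `fetch` by letting fed values flow through the graph's operations."""
    values = dict(feeds)  # fed inputs act as the graph's sources

    def evaluate(name: str) -> float:
        if name not in values:  # compute each node at most once
            op, inputs = graph[name]
            values[name] = op(*(evaluate(i) for i in inputs))
        return values[name]

    return evaluate(fetch)


# y = relu(w * x + b), written as a small graph of operations
graph: Graph = {
    "wx": (lambda w, x: w * x, ["w", "x"]),
    "z": (lambda wx, b: wx + b, ["wx", "b"]),
    "y": (lambda z: max(0.0, z), ["z"]),
}
print(run(graph, {"w": 2.0, "x": 3.0, "b": 1.0}, fetch="y"))   # 7.0
```

Separating the graph description from its execution is what lets a framework like TensorFlow analyze, optimize, and distribute a computation before any data flows through it.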

The Solution

Nvidia, the world’s best-known graphics card maker, illustrates the value of TensorFlow in production in its article describing the framework. The article covers the benefits of TensorFlow and, by extension, of deep learning algorithms for hardware manufacturers worldwide.

Nvidia uses TensorFlow to model various processes, though it most likely spends the most time building computational simulations of actual hardware. And if Nvidia’s endorsement seems self-serving, given the obvious connection between its GPUs and deep learning, note that other major companies such as Twitter, Airbus, and PayPal also use TensorFlow as a foundational tool.

More on Data Science

Thanks to tech giants like Google and Oracle, along with organizations such as the Linux Foundation, actively supporting open-source development, data science techniques like deep learning and artificial intelligence have become available to everyone. And while some companies might think these concepts are beyond their means, they are likely already applying data science in some capacity every time they use an app for work.

If you’d like to learn more about how data science can be applied to your specific modeling needs, more resources are available here.

References:

  1. Top 15 data-science Software Development Companies in 2023 
  2. Data Science – MIT 
  3. Using data science to help improve NFL quarterback passing scores 
  4. The science behind NFL Next Gen Stats’ new passing metric
  5. TensorFlow – What Is It and Why Does It Matter? 

© Copyright Codemotion srl, Via Marsala, 29/H, 00185 Roma, P.IVA 12392791005