Humans have walked on the Moon. We operate the International Space Station, send rovers to the surface of Mars, and launch hundreds of satellites into the unknown to gather data. We explore the universe both in person and by using machines, through initiatives such as HelloExoWorld.
Our quest to explore the galaxy involves many skilled people. They come from different backgrounds and combine different approaches and experimental techniques to tackle complex questions: Are we alone in the universe? Will we become an interplanetary species? At Codemotion Rome 2019, Horacio Gonzalez presented a talk about HelloExoWorld, an initiative that brings together space exploration, big data, and open source software.
The Kepler Project and NASA datasets
Many satellites are exploring space, moving silently through the cold, dark vacuum and gathering data for research. One of the most famous is the Kepler space telescope, launched by NASA in 2009 and officially operational until 2018.
Kepler can be considered a hero robot. While orbiting the Sun and staring into the Milky Way (our home galaxy), it collected brightness data on over 150,000 stars. Kepler observed these stars with the aim of detecting orbiting planets, ultimately helping to discover habitable planets and, maybe, alien forms of life.
NASA did not keep the data to itself. Following the values of open data, it released around 25 terabytes of data recorded by Kepler, which is freely available on the NASA open data portal.
The Power of Project HelloExoWorld
How data is used as evidence of new planets
One of the most valuable approaches is the “transit method”. When a planet passes in front of its star, it blocks part of the light reaching the telescope, so for a while we observe a lower brightness. To put it simply, we look at how the brightness of a star changes over time and search for short, regularly repeating intervals in which it decreases. This regular drop in brightness is evidence of a planet orbiting the star.
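To make the idea concrete, here is a minimal sketch of the transit method on an idealised, noise-free light curve. Everything in it is an assumption for illustration: the helper names (`synthetic_light_curve`, `find_transits`, `estimate_period`), the box-shaped dip, and the parameter values are hypothetical and not taken from HelloExoWorld or from real Kepler data.

```python
def synthetic_light_curve(n_samples, period, duration, depth, phase, baseline=1.0):
    """Hypothetical toy light curve: constant brightness with periodic box-shaped dips.

    Every `period` samples (starting at `phase`) the flux drops by `depth`
    for `duration` samples, mimicking a planet crossing its star.
    """
    return [
        baseline - depth if (t - phase) % period < duration else baseline
        for t in range(n_samples)
    ]


def find_transits(flux, threshold):
    """Return the index where each run of samples below `threshold` begins."""
    starts = []
    previous_below = False
    for t, value in enumerate(flux):
        below = value < threshold
        if below and not previous_below:
            starts.append(t)
        previous_below = below
    return starts


def estimate_period(starts):
    """Average spacing between consecutive dips (None if fewer than two dips)."""
    if len(starts) < 2:
        return None
    gaps = [b - a for a, b in zip(starts, starts[1:])]
    return sum(gaps) / len(gaps)


flux = synthetic_light_curve(n_samples=300, period=50, duration=3, depth=0.01, phase=10)
starts = find_transits(flux, threshold=0.995)
print(starts)                   # [10, 60, 110, 160, 210, 260]
print(estimate_period(starts))  # 50.0
```

A fixed threshold only works here because the toy curve has no noise and no slow stellar variability; real light curves need the smoothing and anomaly detection steps discussed below.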
A star may host many planets, so its brightness track can be quite messy and hard to segment. The more planets orbit a star, the more interesting the star is, and the harder its brightness profile is to analyse.
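One common way to pull a single planet out of a mixed brightness track is phase folding: cut the series into chunks of a trial period, stack them, and check whether a consistent dip appears. The sketch below is a simplified cousin of box least squares, not necessarily what HelloExoWorld used; the two planets, their periods, depths and the search range are all made up for illustration.

```python
def transit_dip(t, period, duration, depth, phase):
    """Flux lost at sample t by one hypothetical planet with a box-shaped transit."""
    return depth if (t - phase) % period < duration else 0.0


def two_planet_light_curve(n_samples):
    """Toy light curve of a star with two hypothetical planets (made-up parameters)."""
    return [
        1.0
        - transit_dip(t, period=40, duration=2, depth=0.010, phase=5)
        - transit_dip(t, period=75, duration=2, depth=0.004, phase=20)
        for t in range(n_samples)
    ]


def folded_depth(flux, period):
    """Fold the series at a trial period; return the depth of the deepest phase bin."""
    bins = [[] for _ in range(period)]
    for t, value in enumerate(flux):
        bins[t % period].append(value)
    baseline = max(flux)
    deepest = min(sum(b) / len(b) for b in bins)
    return baseline - deepest


def best_period(flux, candidates):
    """Trial period whose folded curve shows the deepest dip (first one wins ties)."""
    return max(candidates, key=lambda p: folded_depth(flux, p))


flux = two_planet_light_curve(600)
print(best_period(flux, range(30, 61)))  # 40
```

At the true period every stacked transit lines up in the same phase bin, while the other planet's dips are spread across bins and mostly average out. Note that integer multiples of a true period (80, 120, ...) fold just as cleanly, which is why the search range here is restricted.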
The Anomaly Detection Problem
The transit method translates into a time series analysis problem and, more specifically, into an anomaly detection problem. Gonzalez described the main phases of the team's approach to anomaly detection. The starting point is a series of brightness values over time for a specific observed star. The first step is to eliminate noise by downsampling, for example with a rolling average. The smoothed curve is then subtracted from the original one, so that short-lived changes in brightness, such as transit dips, stand out more clearly.
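The smooth-then-subtract phases can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the team's WARP10 code: the centred moving average, the window of 21 samples and the "3 standard deviations below the mean" rule for flagging anomalies are our own choices, and the input is a made-up star whose brightness drifts slowly upwards with two short dips.

```python
import math


def rolling_mean(values, window):
    """Centred moving average; the window shrinks at the edges of the series."""
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        segment = values[lo:hi]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def residuals(values, window):
    """Original minus smoothed series: slow trends cancel, short dips remain."""
    return [v - s for v, s in zip(values, rolling_mean(values, window))]


def flag_anomalies(resid, k=3.0):
    """Indices whose residual lies more than k standard deviations below the mean."""
    mean = sum(resid) / len(resid)
    std = math.sqrt(sum((r - mean) ** 2 for r in resid) / len(resid))
    return [i for i, r in enumerate(resid) if r < mean - k * std]


# Hypothetical star: slow upward drift plus two 3-sample dips.
trend = [1.0 + 0.0001 * t for t in range(200)]
dips = {50, 51, 52, 150, 151, 152}
flux = [v - 0.01 if t in dips else v for t, v in enumerate(trend)]
print(flag_anomalies(residuals(flux, window=21)))  # [50, 51, 52, 150, 151, 152]
```

The window must be clearly longer than a transit: if it were comparable to the dip, the smoothed curve would follow the dip and the subtraction would erase it.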
On the technical side, this analysis runs on an open source tool named WARP10, specifically designed to handle time series over massive amounts of data. WARP10 is used at OVH as a cloud infrastructure monitoring tool, and, being open source, it has also been put to use on open science challenges. WARP10’s time series analytics functions include moving averages, ARMA models, hidden Markov models, Fourier transforms and entropy encoding.
When scientists and cloud engineers team up
An interesting aspect of project HelloExoWorld is the synergy between NASA scientists and OVH engineers. Scientists excel at writing complex time series algorithms that take astrophysics into account, and at going deep into the analysis, statistics and visualisation of the Kepler dataset. Cloud engineers, on the other hand, excel at handling massive amounts of data, scaling both hardware and software, and keeping such a huge system monitored and resilient.
Gonzalez explained:
“We scan big amounts of data using standard time series analytics, and identify interesting planets which NASA scientists can study deeper with advanced algorithms.”
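The two-tier workflow in this quote (a cheap, broad scan followed by expert study of a shortlist) can be sketched as below. It is a hypothetical illustration only: `dip_score`, `shortlist` and the tiny in-memory catalogue are invented names and data, and a plain dictionary stands in for what was in reality a massive dataset processed with WARP10.

```python
def dip_score(flux):
    """Hypothetical screening score: depth of the faintest sample below the median."""
    ordered = sorted(flux)
    median = ordered[len(ordered) // 2]
    return median - ordered[0]


def shortlist(light_curves, top_n):
    """Rank stars by dip score and keep the top_n candidates for expert follow-up."""
    ranked = sorted(
        light_curves,
        key=lambda star: dip_score(light_curves[star]),
        reverse=True,
    )
    return ranked[:top_n]


catalog = {
    "star-a": [1.0] * 95 + [0.990] * 5,  # deeper dips
    "star-b": [1.0] * 100,               # flat, nothing to see
    "star-c": [1.0] * 97 + [0.995] * 3,  # shallower dips
}
print(shortlist(catalog, top_n=2))  # ['star-a', 'star-c']
```

The design point is the division of labour: the screening score only has to be cheap and rarely miss a good candidate, because the stars it promotes are then examined with far more careful, scientist-designed methods.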
That’s a perfect example of collaboration between research and industry, striking a delicate balance between scientific inquiry and technological robustness.
Project HelloExoWorld is a glimpse into the unknown universe, but also a glimpse into the future. In my view, what makes this endeavour unique is its combination of community values, openness (open data, open science and open source are all involved), space exploration, big data, cloud computing and, most of all, dreaming big. It is a project in which science and industry lend each other their strengths, curiosity is the main driving value, and looking at the stars is both romantic and useful.
Take a look at Horacio's slide deck. If you're a fan of space exploration, you might also enjoy our celebration of the 50th anniversary of the Moon landing, from this year's Codemotion Milan.