When you read or watch the latest news on the spread of COVID-19, you will often find it accompanied by impressive-looking graphs in convincing designs. What information was used to visualize these COVID-19 data analytics? And is it possible to reproduce what you’ve read?
When you think about the spread of COVID-19, there are a lot of different parameters you could analyze. You could include variables like population density or household numbers, or you could look at weather data, local regulations, and so on.
With all these different sources you might be curious to understand which analysis has been done to draw certain conclusions. And you might want to run your own analytics on COVID-19 data and examine data sets to test your own statements. But where to start?
Mo Haghighi, IBM’s head of Developer Ecosystems in Europe, has prepared a step-by-step workshop series, COVID-19 data analytics with Kubernetes and OpenShift, where you can run your own analytics on COVID-19 data and examine data sets. The different episodes will cover COVID-19 data retrieval, parsing, and analytics.
What will you learn about COVID-19 data analytics?
The series explains how to retrieve COVID-19 data from an authentic source and make it securely available through REST APIs on Kubernetes and OpenShift. The primary applications are developed in Spring Boot, the open source Java-based framework, but you can add more features and apply analytical services to the data in the form of microservices written in different programming languages.
The ultimate goal of the workshop series is to teach developers how to automate the entire application development process, so developers only focus on coding and let OpenShift take care of all the heavy-lifting and tedious tasks in the background.
As mentioned earlier, to simplify the learning journey and make it use-case oriented, these workshops are designed around COVID-19 data analytics.
But before you start on these workshops, let’s have a closer look at some of the definitions.
At a high level, the application is developed in Java on the Spring Boot framework and provides a number of API endpoints for retrieving COVID-19 data per region, country, date, and period.
COVID-19 data is fetched from Johns Hopkins University‘s repository on GitHub, which has been consistently referred to as an authentic source of COVID-19 data by various authorities around the world.
Our application is cloud native, which means it has been built, delivered and operated in a way that is not hard-wired to any infrastructure.
Due to its cloud native architecture, the application has been partitioned into multiple containerized microservices including the data parsers for the number of positive cases, mortality rates, and a user interface. Microservices are containerized by Docker and deployed on Kubernetes and OpenShift.
In summary, our application orchestrates multiple containerized microservices that parse COVID-19 time-series data on the number of positive cases and mortality rates in different countries and regions.
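To make the parsing step concrete, here is a minimal, hypothetical sketch in plain Java of how one row of a Johns Hopkins-style time-series CSV could be read. The class name and column layout are illustrative assumptions, not the workshop's actual parser code, and it deliberately skips the quoted-field handling that real CSV data requires.

```java
// A minimal, hypothetical sketch (not the workshop's actual parser) of reading
// one row of a Johns Hopkins-style time-series CSV. Columns after Lat/Long are
// cumulative daily counts; real rows may contain quoted fields with commas
// (e.g. "Korea, South"), which this sketch does not handle.
public class TimeSeriesRow {
    final String country;
    final int[] dailyCounts;

    TimeSeriesRow(String country, int[] dailyCounts) {
        this.country = country;
        this.dailyCounts = dailyCounts;
    }

    // Parse a line of the form: Province/State,Country/Region,Lat,Long,<count>,<count>,...
    static TimeSeriesRow parse(String csvLine) {
        String[] cols = csvLine.split(",", -1);
        int[] counts = new int[cols.length - 4];
        for (int i = 4; i < cols.length; i++) {
            counts[i - 4] = Integer.parseInt(cols[i].trim());
        }
        return new TimeSeriesRow(cols[1], counts);
    }

    // The counts are cumulative, so the latest total is simply the last column.
    int latest() {
        return dailyCounts[dailyCounts.length - 1];
    }

    public static void main(String[] args) {
        TimeSeriesRow row = TimeSeriesRow.parse(",Italy,41.87,12.56,0,3,20,62");
        System.out.println(row.country + ": " + row.latest()); // prints "Italy: 62"
    }
}
```

In the real application, logic like this would sit inside one parser microservice and be exposed to the others through its REST API.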
In the first workshop you will learn about cloud-native application development, the benefits of microservices architecture, and the motivations behind their vast adoption. Then it’s time for a quick tour of a COVID-19 application and how it is designed.
Would you like to skip the intro and dive directly into the tutorial of your interest? Here’s an overview of the other episodes:
- In the second workshop you will learn about containers and how to use Docker as the de facto standard to containerize and test your applications.
- In the third workshop you will learn about container orchestration, Kubernetes concepts and components, and how to deploy and scale your application on Kubernetes.
- In the fourth workshop you will dive into the Red Hat OpenShift Container Platform and experience how OpenShift simplifies and secures your orchestration tasks by automating the steps taken with Kubernetes. You first use the command-line interface tool to deploy and scale the built containers. Then, you use the OpenShift web console to deploy the application by using only its source code with a few clicks. That powerful feature for developers is called Source-to-Image (S2I).
- In the fifth workshop you explore how Red Hat CodeReady Workspaces on OpenShift helps teams build with speed, agility, security, and, most notably, code in production from anywhere.
- In the sixth workshop you use Red Hat CodeReady Containers to build, test, and deploy your application locally on your machine.
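The hands-on steps in those episodes follow roughly this shape on the command line. The image name, application name, and repository URL below are placeholders, not the workshop's actual values:

```shell
# Episode 2: containerize a microservice with Docker (image name is a placeholder)
docker build -t covid-data-parser:v1 .
docker run -p 8080:8080 covid-data-parser:v1

# Episode 3: deploy and scale the same container on Kubernetes
kubectl create deployment covid-data-parser --image=covid-data-parser:v1
kubectl scale deployment covid-data-parser --replicas=3

# Episode 4: deploy straight from source code on OpenShift with S2I (placeholder repo URL)
oc new-app https://github.com/example/covid-data-parser --name=covid-data-parser
oc expose service covid-data-parser
```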
COVID-19 data analytics and cloud: topics summary
As you have read, several technologies and tools are mentioned above. Before we dive deeper into this tutorial, here is a quick summary of those topics, mainly aiming to show how modern applications, like this COVID-19 analytical application, are designed.
Modern web application development
Modern application development practices require developers to design their applications in a way that follows agile principles, focusing on the application itself rather than where it resides. Following such principles is part of ‘Cloud Native’ application development, which ultimately makes applications highly scalable so that they can be seamlessly upgraded and migrated across multiple hosting platforms.
Cloud native refers to how an application is built and deployed, rather than where the application resides. It basically defines that the application must be built, delivered and operated in a way that it is not hard-wired to any infrastructure.
Cloud native development offers those advantages by relying on a microservices architecture whose components are designed to integrate into any cloud environment. A cloud native application consists of discrete, reusable components known as microservices. Microservices architecture is the building block and most essential ingredient of cloud native applications.
Before we dive deeper into microservices, let’s take a look at how applications used to be developed in the traditional way, as so-called “monolithic applications“.
Almost a decade ago, agility, time to market, and rapid application deployment were not as vital as they are today. Developers built a product and added features to it over time. As new features were added, the application grew bigger and bigger in size and complexity.
Applications were designed to run on top of virtual machines, together with an operating system, libraries, dependencies, and a single database attached to the entire application.
As demand grew, scaling the application literally meant spinning up new VMs and adding more machines to the infrastructure.
Different services in a monolithic application were tightly integrated, and a failure in one part of the application caused the entire application to become unresponsive or unusable for clients.
Failures aside, when making updates, performing maintenance, or adding a new service, the entire application had to be rebuilt and deployed again. If new updates happened to cause the application to fail over time, the entire application had to be brought down in order to fix the problem. Identifying and fixing the error was a tedious process that was never guaranteed to succeed in a short amount of time.
Moreover, developers who worked on a project often had to program in the same programming language and use common platforms and tools to keep their individual parts compatible.
Scaling a single part of a monolithic application was in most cases impossible unless it was deployed on a separate VM; otherwise, additional resources had to be provisioned for the entire application rather than for individual services.
Monolithic applications also disrupted teamwork and prevented collaboration between developers and operations engineers. There was always massive tension between the two teams, with fingers pointed every time something went wrong.
Developers blamed operations engineers for not having a thorough understanding of the architecture and causing their code to break, while operations engineers blamed developers for delivering software that was not scalable or production-ready and demanded too many underlying resources.
Advantages of Microservices
Microservices architecture addresses all of the liabilities that are inherent in monolithic applications.
Microservices architecture advocates partitioning large monolithic applications into smaller, independent services that communicate with each other using HTTP and messages.
Services must be:
- Highly maintainable and testable
- Loosely coupled
- Independently deployable
- Organized around business capabilities
In summary, microservices architecture allows:
- Different parts of the application to evolve on different timelines
- Different parts of the application to be deployed separately
- Developers to choose the technology stack that best fits the purpose of each microservice
- Individual services to scale dynamically at runtime, rather than the entire application
The most obvious advantage, however, is this: if any part of the application fails, the whole application will not necessarily become unavailable or unresponsive to the customer, because the application is not designed and operated as a single entity as it is in a monolithic architecture.
For microservices to be independently deployable and runnable, they must have all their dependencies and libraries integrated, or as we will learn in the next section, they must be ‘containerised’.
A container is a unit of software consisting of the application code packaged with its libraries and dependencies so that it can run anywhere.
If you’re familiar with virtual machines, your first question may be: what are the differences between containers and virtual machines?
VMs have been around for quite a while and are considered the foundation of the first generation of cloud computing. In fact, containers are inspired by VMs, but they are a lighter-weight and more agile way of handling virtualization.
Rather than spinning up an entire virtual machine, a container packages together all the essential ingredients needed to run your app except the operating system. That minimalism alone makes a huge difference when it comes to handling, transferring, and loading containers.
Virtualization is a process to create an abstraction layer over computer hardware that allows the hardware elements of a single computer to be divided into multiple virtual computers.
Containers, on the other hand, virtualize the operating system, so each individual container contains only the application, its libraries, and its dependencies, instead of virtualizing the underlying hardware. In the case of Docker, the de facto containerization tool, this virtualization of the operating system is handled by Docker itself.
Containers have been around for a long time (and developers can create containers without relying on Docker) but Docker makes it easier, simpler, and safer to build, deploy, and manage containers.
Docker is essentially the first toolkit that, due to its simplicity, enabled all developers to build, deploy, run, update, and stop containers using simple commands and work-saving automation.
A container ‘image’ is only a package of code, libraries, and dependencies, like a tar file; it only becomes a container at runtime.
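For example, an image for a Java microservice like ours could be described by a Dockerfile as small as this. The base image and jar name are illustrative assumptions, not the workshop's actual file:

```dockerfile
# Base layer: a slim image with just a Java runtime -- no full guest OS
FROM openjdk:11-jre-slim

# Add the application code with its bundled libraries and dependencies
# (jar name is a placeholder for this sketch)
COPY target/covid-api.jar /app/covid-api.jar

# The container runs this single process when it starts
CMD ["java", "-jar", "/app/covid-api.jar"]
```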
Containerization platforms and tools like Docker have a good view of what’s happening to our containers, but not of our host machine.
A multi-container application must run on a multi-host environment in order to eliminate that single point of failure. If one host goes down, our orchestration tool can shift the load to another host.
We need to be able to create new instances of our individual microservice containers to scale accordingly. When one or more of our services need to be updated, or when we add a new service to the mix, the orchestration platform must be able to automatically schedule new deployments and create new instances of our containers with zero downtime.
Kubernetes is an open-source system for automating deployment, scaling, and management of containerized applications. Kubernetes scales and manages our containers according to the available underlying resources on the host.
It also continually checks our containers to make sure they are healthy, and in case of any failure it takes action to reinstate the deployment, create new instances, or restore the services.
In summary, Kubernetes attempts to continuously reconcile the observed state with the desired state. Users declare what they desire for the running application, such as the number of instances of individual microservices and the resources allocated to them, and Kubernetes observes the actual state of the application and keeps adjusting and re-adjusting resources to make the desired state happen.
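As an illustration, that desired state is typically declared in a manifest like the following (the names and image here are placeholders, not the workshop's actual manifest). Given this, Kubernetes keeps three replicas of the container running and recreates any that fail:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: covid-data-parser          # placeholder name
spec:
  replicas: 3                      # desired state: three instances of this microservice
  selector:
    matchLabels:
      app: covid-data-parser
  template:
    metadata:
      labels:
        app: covid-data-parser
    spec:
      containers:
      - name: parser
        image: covid-data-parser:v1   # placeholder image
        resources:
          requests:
            memory: "128Mi"           # declared resources Kubernetes schedules against
            cpu: "250m"
```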
Learning Kubernetes commands, having to containerise microservices and working from a command line interface are not ideal for developers. In addition, Kubernetes is an open-source project and lacks enterprise features such as regular updates, security releases and application/platform certification.
OpenShift is built on top of Kubernetes and brings along all the brilliant advantages of Kubernetes, but it bundles Kubernetes with features that ultimately provide the best experience to both developers and operations engineers.
OpenShift wraps a number of components around Kubernetes, including an enterprise-grade Linux operating system (RHEL/CoreOS), networking, monitoring, a registry, and, more importantly, authentication and authorisation services.
The above-mentioned components ultimately provide several automated workflows for enhancing security, boosting productivity, and enabling seamless migration across multiple clouds.
Some of the key advantages of OpenShift include a simple web console, a homogeneous architecture and interface across multiple public clouds, and enterprise support.
One of the most distinctive features of OpenShift is its unique and feature-rich web console, which allows developers and operations engineers to carry out various tasks from a simple graphical interface. They can build, deploy, expose, update, and implement almost any task in two separate perspectives: developer and administrator.
Cloud Platform Agnostic
Kubernetes offerings differ from one platform to another. Almost every major cloud provider offers a different flavour of Kubernetes, with different sets of add-ons, plug-ins, and instructions for connecting different components and resources to the application.
In most cases, those instructions are only applicable to that particular platform, and cannot be migrated from one platform to another.
With the OpenShift container platform, developers’ experience and the way they interact with the platform through the web console is identical everywhere. Therefore, building, deploying, and managing applications with the OpenShift container platform is truly “build it once and deploy it anywhere”.
Kubernetes is an open-source project, whereas OpenShift is a product based on an open source project (OKD, the community distribution of Kubernetes that powers OpenShift). Comparing Kubernetes with OpenShift is like the classic example of comparing an engine with a car.
OpenShift includes enterprise support, ecosystem certification, and, most importantly, regular releases and security updates at every level of the container stack and throughout the application lifecycle.
OpenShift for Developers
OpenShift offers an opinionated integration of features to simplify and expedite the application development process. It delights developers in three key areas:
1. Application Templates
Developers want to get started on coding as quickly as possible, rather than spending time learning about different platforms, tools and services, and how to refactor their applications.
OpenShift comes with pre-created quick start application templates that allow developers to build their applications, based on various programming languages, frameworks, and databases, with one click from the user interface. It also allows developers to define their own custom templates.
2. Continuous Integration and Continuous Deployment (CI/CD)
Developers want to focus on coding and not worry about what’s going to happen in the background. Deploying to OpenShift is as easy as clicking a button from the user interface and enabling continuous deployment. OpenShift allows developers to fully control the deployment lifecycle by enabling continuous integration, whether the updates are ‘pushed’ from their Git repository or delivered manually through images or containers.
3. CodeReady Containers
Developers often want to test their applications locally on their own machines before deploying to public clouds. OpenShift offers CodeReady Containers, a local version of OpenShift that runs on Linux, macOS, and Microsoft Windows. CodeReady Containers can significantly expedite the application development process.
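Getting that local cluster running comes down to a few `crc` commands (exact output and resource requirements vary by release):

```shell
crc setup            # prepares the host: virtualization drivers, networking
crc start            # boots a minimal, single-node OpenShift cluster in a local VM
crc status           # reports whether the cluster is up
eval $(crc oc-env)   # puts the matching 'oc' CLI on your PATH
```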
Good luck and happy coding!