
Codemotion Magazine


Sergio Monteleone, February 21, 2019

A journey riding in the cloud(s)

Cloud

The cloud computing landscape is a huge and messy collection of technologies, services and providers. Building elastic distributed systems is fun, but finding your way around such a vast environment can be quite challenging.

At Codemotion Milan 2018, Suman Kumari and Wamika Singh from ThoughtWorks shared some do’s and don’ts of cloud computing, drawn from their experience migrating a monolithic application to the cloud.

An Italian manufacturing company tasked ThoughtWorks with re-engineering a multi-factory, global-scale application on an elastic infrastructure, delivering a proof of concept as soon as possible while keeping operational and maintenance costs low. The main objective of the project was to gain better insight into the production process by collecting, streaming and aggregating the data produced at each plant.

Another project requirement was to design a cloud-agnostic application, so that different cloud providers could be integrated if needed.
Kumari and Singh described their journey by talking about four main topics: the overall system infrastructure, the data streaming architecture, the data retrieval system they used and the DevOps procedures they adopted.

Application Infrastructure

To run and deploy their services, ThoughtWorks went with the well-known and widely adopted approach of containerisation. Containers are portable, isolated and cost-effective, which is why they quickly became the de facto standard for cloud applications. In particular, ThoughtWorks decided to host their containers on a Kubernetes cluster to benefit from features such as auto scaling, automated roll-outs and roll-backs, and service discovery.
The infrastructure was created on AWS with Terraform, and the Kubernetes cluster was provisioned using Kops.
A few YAML files later, the cluster was up and running.

Data Streaming

Several services were evaluated to implement the data streaming infrastructure. In particular, the team evaluated SQS and Kinesis from Amazon before deciding to go with Apache Kafka. Using a self-deployed streaming platform rather than a hosted one kept operational costs low without sacrificing performance.
Kafka was deployed on the Kubernetes cluster with Confluent, using the official Docker images.
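
As a rough illustration of what publishing plant data to Kafka might look like from Python, the sketch below serialises one reading and hands it to a producer. The talk did not detail the message schema or the client library, so the topic name `plant-readings`, the `Reading` fields and the broker address are all hypothetical. The `send(topic, key=..., value=...)` call follows the `KafkaProducer` class of the kafka-python package, which is assumed here rather than confirmed by the speakers.

```python
import json
from dataclasses import asdict, dataclass
from typing import Tuple

TOPIC = "plant-readings"  # hypothetical topic name


@dataclass
class Reading:
    """One measurement from a factory line (illustrative schema)."""
    plant_id: str
    sensor: str
    value: float
    timestamp: str  # ISO 8601, UTC


def encode(reading: Reading) -> Tuple[bytes, bytes]:
    """Return (key, value) as bytes.

    Keying by plant sends all readings of one plant to the same
    partition, so their relative order is preserved.
    """
    key = reading.plant_id.encode("utf-8")
    value = json.dumps(asdict(reading), sort_keys=True).encode("utf-8")
    return key, value


def publish(producer, reading: Reading) -> None:
    """Send one reading through any object that exposes
    send(topic, key=..., value=...), such as kafka-python's KafkaProducer."""
    key, value = encode(reading)
    producer.send(TOPIC, key=key, value=value)


def publish_to_broker(bootstrap_servers: str, reading: Reading) -> None:
    """Publish to a real broker. Needs the kafka-python package and a
    reachable cluster; the import is local so the helpers above can be
    used and tested without either."""
    from kafka import KafkaProducer

    producer = KafkaProducer(bootstrap_servers=bootstrap_servers)
    publish(producer, reading)
    producer.flush()  # block until the message has actually been sent
```

Calling `publish_to_broker("localhost:9092", Reading("milan-01", "temperature", 71.5, "2018-11-29T10:00:00Z"))` would send a single JSON message, assuming a broker is listening at that address.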

Querying Service

For the querying service, ThoughtWorks went with Amazon Athena. Athena supports a variety of data formats out of the box, including CSV, Parquet and JSON. It is based on the Presto engine and does not require extra ETL steps, as it queries data stored directly in S3 buckets. As with many other serverless services, it has a low infrastructure cost, since clients pay only for the queries they run.
The application, written in Python, queries Athena through the PyAthena library.
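
A query issued through PyAthena could look roughly like the sketch below. The table `plant_readings`, its columns, the staging bucket and the region are hypothetical, since the article does not describe the actual schema. The `%(day)s` placeholder relies on PyAthena's default pyformat parameter style.

```python
from typing import Any, List, Tuple

# Hypothetical table and column names, used only for illustration.
DAILY_OUTPUT_SQL = """
SELECT plant_id, COUNT(*) AS readings, AVG(value) AS avg_value
FROM plant_readings
WHERE day = %(day)s
GROUP BY plant_id
ORDER BY plant_id
"""


def daily_output(cursor, day: str) -> List[Tuple[Any, ...]]:
    """Run the aggregation on a DB-API style cursor and return all rows."""
    cursor.execute(DAILY_OUTPUT_SQL, {"day": day})
    return cursor.fetchall()


def athena_cursor(staging_dir: str, region: str):
    """Open a PyAthena cursor.

    Needs the pyathena package and AWS credentials; the import is local
    so daily_output() can be exercised without either.
    """
    from pyathena import connect

    return connect(s3_staging_dir=staging_dir, region_name=region).cursor()
```

With credentials in place, `daily_output(athena_cursor("s3://example-bucket/athena-results/", "eu-west-1"), "2018-11-29")` would return one row per plant. Both arguments to `athena_cursor` are placeholders.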

Continuous Deployment

Adopting a continuous integration and deployment model is almost mandatory for maintaining cloud applications, as it makes the development team considerably more productive.
ThoughtWorks evaluated two on-premise solutions to implement CI/CD pipelines for their application, comparing Travis CI and CircleCI. The latter was ultimately chosen for its lower starting cost for enterprises.

Learnings

Although Athena was initially chosen to develop the proof of concept, it showed its limits once it was used as a frequently accessed service. Athena doesn’t handle highly concurrent loads well. Since it is designed to query data in place, without an ETL step, it doesn’t cache data, which can be inefficient in certain applications. ThoughtWorks ultimately decided to drop Athena and use RDS instead, developing a custom interface between Kafka and RDS and performing some pre-processing before dumping the raw data to RDS. As the reader may expect, moving from many small Parquet files to a relational database brought a great performance improvement.
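
The talk did not spell out what the pre-processing consisted of. One plausible form, sketched below purely as an assumption, is to collapse raw events into one row per plant and minute before writing them to a relational table. The event shape and the `plant_minute` table are hypothetical, and SQLite from the Python standard library stands in for RDS so the example runs anywhere. `INSERT OR REPLACE` is SQLite syntax, and a different engine would need its own upsert statement.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (plant_id, ISO 8601 timestamp, value): an illustrative raw event shape
RawEvent = Tuple[str, str, float]
MinuteRow = Tuple[str, str, int, float]


def aggregate_per_minute(events: Iterable[RawEvent]) -> List[MinuteRow]:
    """Collapse raw events into one row per plant and minute."""
    buckets: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for plant_id, timestamp, value in events:
        minute = timestamp[:16]  # e.g. "2018-11-29T10:00"
        buckets[(plant_id, minute)].append(value)
    return [
        (plant, minute, len(values), sum(values) / len(values))
        for (plant, minute), values in sorted(buckets.items())
    ]


def load(conn: sqlite3.Connection, rows: Iterable[MinuteRow]) -> None:
    """Write aggregated rows in a single transaction."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS plant_minute ("
            "plant_id TEXT, minute TEXT, readings INTEGER, avg_value REAL, "
            "PRIMARY KEY (plant_id, minute))"
        )
        conn.executemany(
            "INSERT OR REPLACE INTO plant_minute VALUES (?, ?, ?, ?)", rows
        )
```

Queries then hit a small, indexed table instead of scanning many small files, which is the kind of access pattern a relational database handles well.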
Once again, serverless services are great and powerful, but choosing the one that fits a specific use case is a matter of both experience and good testing.


Tagged as: Codemotion Milan

Sergio Monteleone
Software developer and the co-founder of Moga Software s.r.l., a software house based in Italy. I tend to write code for anything that has a C/C++ compiler, but don't mind using other technologies and languages. I love cats, dogs and, more in general, any lifeform when Lifeform.numLegs() <= 4.

© Copyright Codemotion srl Via Marsala, 29/H, 00185 Roma P.IVA 12392791005 | Privacy policy | Terms and conditions