The cloud computing landscape is a huge and messy collection of technologies. Building elastic distributed systems is fun, but finding one's way around such a vast environment can be quite challenging.
An Italian manufacturing company tasked ThoughtWorks with re-engineering a multi-factory, global-scale application on an elastic infrastructure, delivering a proof of concept as soon as possible while keeping operational and maintenance costs low. The main objective of the project was to gain better insight into the production process by collecting, streaming and aggregating data produced at each plant.
Another project requirement was to design a cloud-agnostic application, so that different cloud providers could be integrated if needed.
Kumari and Singh described their journey by talking about four main topics: the overall system infrastructure, the streaming architecture, the data retrieval system they used and the DevOps procedures they adopted.
To run and deploy their services, ThoughtWorks went with the well-known and
The infrastructure was created with Terraform on AWS and the Kubernetes cluster was provisioned using Kops.
A few YAML files later, the cluster was up and running.
Several services were evaluated for the data streaming infrastructure. In particular, the team evaluated SQS and Kinesis from Amazon before deciding to go with Apache Kafka. Using a custom-deployed streaming platform rather than a hosted one kept operational costs low without sacrificing
Kafka was deployed on the Kubernetes cluster using the official Confluent Docker images.
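As an illustration of how a plant-side service might publish data to such a Kafka deployment, here is a minimal Python sketch. It is not the code presented in the talk: the topic name, the JSON schema and the helper names (`encode_reading`, `publish_reading`, `make_producer`) are all assumptions made for the example, and `make_producer` relies on the third-party kafka-python package.

```python
import json
from datetime import datetime, timezone


def encode_reading(plant_id, sensor, value, ts=None):
    """Serialize one plant reading as UTF-8 JSON bytes (hypothetical schema)."""
    ts = ts or datetime.now(timezone.utc)
    payload = {
        "plant_id": plant_id,
        "sensor": sensor,
        "value": value,
        "ts": ts.isoformat(),
    }
    return json.dumps(payload, sort_keys=True).encode("utf-8")


def publish_reading(producer, topic, plant_id, sensor, value, ts=None):
    """Send one reading, keyed by plant ID.

    `producer` is any object exposing a Kafka-style send(topic, key=, value=).
    With Kafka's default partitioner, records sharing a key go to the same
    partition, which keeps each plant's events in order.
    """
    return producer.send(
        topic,
        key=plant_id.encode("utf-8"),
        value=encode_reading(plant_id, sensor, value, ts),
    )


def make_producer(bootstrap_servers):
    """Build a real producer (assumes the kafka-python package is installed)."""
    from kafka import KafkaProducer  # lazy import keeps the helpers above dependency-free
    return KafkaProducer(bootstrap_servers=bootstrap_servers)
```

In a real service, one would typically create the producer once with `make_producer(...)`, call `publish_reading` for each measurement, and call `producer.flush()` before shutting down so that buffered records are delivered.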
For the querying service, ThoughtWorks went with Amazon Athena. Athena has a variety of built-in importers, supporting CSV, Parquet, JSON and others. It is based on the Presto engine and does not require extra ETL steps to run, as
The application, written in Python, queries Athena through the PyAthena client library.
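To give an idea of what querying Athena through PyAthena can look like, the sketch below runs a parameterized aggregate query and collects the result. The table name `plant_readings`, its columns and the helper names are hypothetical, since the talk summary does not describe the actual schema; `make_cursor` assumes the PyAthena package and valid AWS credentials.

```python
# Hypothetical aggregate query; table and column names are assumptions.
SENSOR_TOTALS_SQL = """
SELECT sensor, SUM(value) AS total
FROM plant_readings
WHERE plant_id = %(plant_id)s
GROUP BY sensor
ORDER BY sensor
"""


def fetch_sensor_totals(cursor, plant_id):
    """Run the aggregate query and return a {sensor: total} dictionary.

    `cursor` is a DB-API style cursor, such as the one PyAthena provides.
    The plant ID is passed as a query parameter rather than interpolated
    into the SQL string by hand.
    """
    cursor.execute(SENSOR_TOTALS_SQL, {"plant_id": plant_id})
    return {sensor: total for sensor, total in cursor.fetchall()}


def make_cursor(s3_staging_dir, region_name):
    """Open an Athena cursor (assumes PyAthena is installed and AWS credentials are configured)."""
    from pyathena import connect  # lazy import keeps the query helper dependency-free
    return connect(s3_staging_dir=s3_staging_dir, region_name=region_name).cursor()
```

Because `fetch_sensor_totals` only depends on the cursor interface, it can be exercised with a stub cursor in tests, without issuing a billable Athena query.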
Adopting a continuous integration and deployment model is almost mandatory for maintaining cloud applications, as it effectively improves the development team's productivity.
To implement CI/CD pipelines for their application, ThoughtWorks compared two on-premises solutions, Travis and CircleCI. The latter was ultimately chosen for its lower entry cost for enterprises.
Although Athena was initially chosen for developing the proof of concept, it showed its limits when used as a frequently accessed service. Athena doesn’t handle highly concurrent loads well. Since it is designed as a non-ETL service, it doesn’t cache
Once again, serverless services are great and powerful, but choosing the one that fits a specific use case is a matter of both experience and good testing.