Codemotion Magazine

Leo Sorge, May 28, 2021

Serverless Event Processing on AWS Platform w/ Kinesis

Cloud
Table Of Contents
  1. The project
  2. Introduction to serverless
  3. General overview of an event architecture
    • Producers
    • The Schema Registry
    • The AWS Lambda
    • The DynamoDB 
  4. Kinesis: the AWS fully-managed streaming service 
    • AWS Kinesis Vs. Kafka 
    • Kinesis data analytics
    • Glue
    • Athena
    • API Gateway
  5. Case Study: Feeding Read APIs
    • Producer: the code
    • Lambda function: the code
  6. Setting up serverless on AWS
    • Configuring AWS Kinesis
    • Configuring DynamoDB
    • Configuring the Lambda function
  7. Conclusions

The project

Openbank is the 100% digital bank of the Santander Group, currently undergoing a technological transformation and international expansion. The work is organized in a startup-like format, using agile methodologies to take client experience to the next level. 

The Netherlands, Germany, and Portugal were Openbank’s flagship markets in 2019, with Argentina as the next target and others to follow. The microservice architecture runs on AWS, and the languages and frameworks used include React, Java, Spring, Kotlin, Scala, Spark, Python, Flink, and more.


This article offers an initial description of the tools used, followed by a real use case on the AWS platform with Kinesis.

Introduction to serverless

‘Serverless’ is a cloud execution model in which applications are built and run while management of the underlying infrastructure is delegated to the cloud provider.

With this model, tasks such as provisioning, configuring, maintaining, operating, and scaling servers can be forgotten about, and billing is tied to the actual usage of individual functions or microservices.

General overview of an event architecture

Openbank has developed an event-based architecture that allows applications to be decoupled from each other. Broadly speaking, this architecture contains a group of data-producing applications and a group of data-consuming applications. An application can belong to either group, or to both.

Both producers and consumers make intensive use of serverless technologies. A simplified diagram of the proposed architecture would be as follows:

Diagram: A simplified diagram of the architecture proposed by Openbank.
Serverless technologies can be used by both data-producing and data-consuming applications.

The main players in this diagram are producers, the schema registry, and AWS Lambdas, although many other components are also necessary to the architecture.

Producers

Producers use a common client that sends messages to the Kinesis streams. The messages, schemas, and metadata are sent in the Avro serialization format.

Avro is a row-oriented remote procedure call and data serialization framework developed within Apache’s Hadoop project. It uses the JSON format for defining data types and protocols, serializing data in a compact binary format. Each message is prefixed with a small header that identifies its schema in the schema registry (in the Confluent wire format, a magic byte followed by the four-byte schema id).
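Concretely, a consumer can recover the schema id from the message header before looking the schema up in the registry. The sketch below assumes the Confluent wire format (a zero magic byte followed by a four-byte, big-endian schema id); the class and method names are illustrative, not taken from the original project:

```java
import java.nio.ByteBuffer;

// Illustrative helper: extracts the schema id from a Confluent-framed
// Avro message (1 magic byte + 4-byte big-endian schema id + payload).
public class SchemaHeader {
    private static final byte MAGIC_BYTE = 0x0;

    public static int schemaId(byte[] message) {
        ByteBuffer buffer = ByteBuffer.wrap(message);
        if (buffer.get() != MAGIC_BYTE) {
            throw new IllegalArgumentException("Unknown magic byte");
        }
        return buffer.getInt(); // bytes 1-4: the schema id in the registry
    }

    public static void main(String[] args) {
        // A fake message: header for schema id 42 followed by an empty payload.
        byte[] message = ByteBuffer.allocate(5).put(MAGIC_BYTE).putInt(42).array();
        System.out.println(schemaId(message)); // prints 42
    }
}
```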

The Schema Registry

This registry provides a metadata serving layer with a RESTful interface for storing and retrieving Avro, JSON Schema, and Protobuf schemas. The registry stores a versioned history of all schemas based on a specified subject name strategy, provides multiple compatibility settings and allows the evolution of schemas.

The schema registry provides serializers that plug into Apache Kafka clients that handle schema storage and retrieval for Kafka messages sent in any of the supported formats.

The AWS Lambda

AWS Lambda is an AWS service that allows code to be executed in various languages such as Python, Node.js, Go, Java, Ruby, or PowerShell without worrying about managing infrastructure. The service supports a multitude of triggers, ranging from API Gateway requests to S3 events and Kinesis messages.

Workloads range from simple scripts that execute in response to events to full REST APIs backed by lambdas.

The DynamoDB 

DynamoDB is a fully managed key/value NoSQL database that offers single-digit-millisecond response times at any scale. It provides a flexible pricing model, a stateless connection model that works seamlessly with serverless, and consistent response times even as the database grows to an enormous size. It is worth comparing its characteristics with those of other NoSQL databases such as MongoDB.

Kinesis: the AWS fully-managed streaming service 

Real-time data comes in an almost infinite variety of formats, and today’s information-based systems need to handle all of them uniformly.

Amazon Kinesis makes it easy to collect, process, and analyze streaming data in real time: new formats such as website clickstreams and IoT streams, together with classical application logs, text, audio, and video, all without the need to manage the related infrastructure.

AWS Kinesis Vs. Kafka 

Data streams are often managed with Kafka Streams, a Kafka-based library for building streaming applications that transform inputs into database calls, API calls, or Kafka records. The library sports a concise code structure, a distributed architecture, and a fault-tolerant approach.

AWS offers Kinesis in place of Kafka Streams. It is interesting to take a look at the main differences between Kafka Streams and Kinesis.

Kafka requires the organization to set aside DevOps time to manage clusters, while Kinesis comes in a fully managed version. This makes Kafka look more flexible, but that flexibility comes at a cost; absolute performance depends heavily on the use case.

AWS Kinesis is fully compliant with the AWS structure, allowing data to be analyzed by lambdas and processing to be paid for by use.

Kafka Streams also allows functional aggregations and mutations to be performed.

Kinesis data analytics

AWS Kinesis Data Analytics allows SQL-like queries to be performed on streaming data. This module runs Flink jobs without the need to manage a Hadoop cluster and can be used for window operations on the streams in the proposed project.

Further components are shown in the functional diagram of this project: Glue, Athena, and the API Gateway.

Glue

Glue allows Spark jobs to be run in a serverless way, without the need to manage a Hadoop cluster. Glue also has a fully managed metastore and a crawler to retrieve data.

Athena

This is the AWS serverless query service based on Apache Presto. Among other things, Athena allows queries to be launched on files in S3 buckets, and thanks to federated queries, tables from different databases can be joined.

API Gateway

The API Gateway is a frequent way to transform HTTP requests and responses into events that a lambda can handle.

Case Study: Feeding Read APIs

A good way to illustrate the event-based architecture is to focus on the part that powers the read APIs.

In the example below, the payment module sends a message after making a payment; the message is saved in DynamoDB so that the read APIs can consult it.

The process that feeds DynamoDB follows the diagram below.

Diagram detailing the API-powering part of the architecture.
DynamoDB is central to lambda usage.

The producer registers the message’s schema in the schema registry and sends the message to the Kinesis stream. The lambda then takes three steps:

  • it receives the message
  • it retrieves the schema from the schema registry and parses the message
  • it saves the result in DynamoDB in a read-optimized form.

Producer: the code

The data producer will take care of sending the messages in Avro format to Kinesis.

First, the Confluent dependency is added in order to serialize the messages and access the schemas in the schema registry, together with the AWS Kinesis SDK dependency to send the messages.

A simple POJO (Plain Old Java Object) is created to be the message that will be sent to Kinesis serialized in Avro.
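For illustration, a minimal payment-message POJO might look like the following; the class and field names are hypothetical, not taken from the original project:

```java
// Hypothetical message POJO; class and field names are illustrative.
public class PaymentEvent {
    private String paymentId;
    private String accountId;
    private double amount;

    // No-arg constructor, required by most serializers.
    public PaymentEvent() { }

    public PaymentEvent(String paymentId, String accountId, double amount) {
        this.paymentId = paymentId;
        this.accountId = accountId;
        this.amount = amount;
    }

    public String getPaymentId() { return paymentId; }
    public void setPaymentId(String paymentId) { this.paymentId = paymentId; }
    public String getAccountId() { return accountId; }
    public void setAccountId(String accountId) { this.accountId = accountId; }
    public double getAmount() { return amount; }
    public void setAmount(double amount) { this.amount = amount; }
}
```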

The producer code proceeds step-by-step.

The first thing to do is to initialize the serializer for the message in Avro format.

The producer can be configured to register the schema in the schema registry when serializing to Avro, or the schemas can be created in advance in the registry.

The message is then serialized to Avro.

Then the Kinesis client is initialized.

Finally, the Avro payload is sent to AWS Kinesis.
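The steps above can be sketched as follows. To keep the example self-contained, the Confluent serializer and the AWS SDK calls appear only as comments (they require the Confluent and AWS SDK dependencies); the framing of the payload is shown in full. Names such as payments-stream and the schema id 42 are illustrative assumptions:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class ProducerSketch {
    // Frames a serialized record with the Confluent header:
    // magic byte 0x0, then the four-byte schema id, then the Avro body.
    public static byte[] frame(int schemaId, byte[] avroBody) {
        return ByteBuffer.allocate(5 + avroBody.length)
                .put((byte) 0x0)
                .putInt(schemaId)
                .put(avroBody)
                .array();
    }

    public static void main(String[] args) {
        // 1-2. Initialize the Avro serializer and serialize the message.
        //      With the Confluent library this would be done by
        //      KafkaAvroSerializer, which also registers/fetches the schema.
        byte[] avroBody = "serialized-avro-bytes".getBytes(StandardCharsets.UTF_8);
        byte[] payload = frame(42, avroBody); // 42: illustrative schema id

        // 3. Initialize the Kinesis client (AWS SDK v2):
        //      KinesisClient kinesis = KinesisClient.create();
        // 4. Send the framed payload to the stream:
        //      kinesis.putRecord(PutRecordRequest.builder()
        //              .streamName("payments-stream")
        //              .partitionKey("payment-id")
        //              .data(SdkBytes.fromByteArray(payload))
        //              .build());
        System.out.println(payload.length);
    }
}
```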

Lambda function: the code

A lambda function will take care of receiving the messages in Avro format, converting them to JSON, and saving them into DynamoDB. First, it receives the Avro message, whose header (a magic byte followed by the four-byte schema id) identifies the schema in the schema registry.

The Confluent library will fetch the schema from the schema registry using that id and parse the message, which is then converted to JSON and saved in DynamoDB.

To be able to deserialize the data, the confluent dependencies must be added.

Next, it’s time for the lambda function, Kinesis, and DynamoDB dependencies: 

Finally, the JSON dependency.

Lambdas in AWS require an uber-jar with all the dependencies in it. Use the maven-shade-plugin for this task.
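A typical shade-plugin configuration looks like the fragment below; the version number is illustrative and should match the project’s Maven setup:

```xml
<!-- pom.xml excerpt: bundle the lambda and its dependencies into one jar -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.2.4</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```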

The lambda implements the Lambda interface for Kinesis, which means implementing the handleRequest method. This method is in charge of receiving the calls carrying the AWS Kinesis messages.

The code simply deserializes the received messages into a POJO annotated with DynamoDB annotations and saves them in a table.

The most important annotations are:

  • DynamoDBTable: contains the name of the DynamoDB table.
  • DynamoDBHashKey: the partition key.
  • DynamoDBAttribute: the attributes.
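The per-record deserialization step can be sketched as below. The Confluent deserializer and the DynamoDB mapper calls are left as comments so the core stays self-contained; all names are illustrative:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class LambdaSketch {
    // Strips the 5-byte Confluent header and returns the Avro body.
    // With the Confluent library, KafkaAvroDeserializer would instead use
    // the schema id in the header to fetch the schema and parse the record.
    public static byte[] avroBody(byte[] record) {
        ByteBuffer buffer = ByteBuffer.wrap(record);
        byte magic = buffer.get();       // must be 0x0
        int schemaId = buffer.getInt();  // id to look up in the schema registry
        byte[] body = new byte[buffer.remaining()];
        buffer.get(body);
        return body;
    }

    public static void main(String[] args) {
        byte[] framed = ByteBuffer.allocate(9)
                .put((byte) 0x0).putInt(7)
                .put("body".getBytes(StandardCharsets.UTF_8)).array();
        // In the real handler this would run once per KinesisEvent record;
        // the parsed JSON would then be saved with DynamoDBMapper.save(...).
        System.out.println(new String(avroBody(framed), StandardCharsets.UTF_8));
    }
}
```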

Setting up serverless on AWS

It’s better to define all the infrastructure as code with CloudFormation, but in this case, to keep things simple, the infrastructure is defined with the AWS console.

Configuring AWS Kinesis

After accessing the AWS account and going to the Kinesis section, click on the option ‘Create data stream’.

User interface: Amazon Kinesis data stream creation
AWS Kinesis has an easy configuration interface.

The configuration screen of the stream appears; here the name and the number of shards desired are to be specified. Then click on “Create data stream” and the stream is created!

User interface: Configuring Data Streams in Amazon Kinesis.
Specify the name and the number of shards desired for the data stream.

Configuring DynamoDB

The DynamoDB section allows for creating tables.

User interface of Amazon Kinesis: Creating a table in DynamoDB.
Creating a table is a one-click process on DynamoDB.

The table name and the primary key are required.

User interface of Amazon Kinesis: the “Create table” module.
Input the table name and the primary key.

Configuring the Lambda function

To load the code into the lambda function, go to the Lambda section of the AWS console and click the ‘Create function’ button.

Application: Configuring a lambda function in AWS Kinesis.
The AWS console offers a direct link to the lambda-function section.
User interface of Amazon Kinesis: Creating a lambda function.
Choose ‘Author from scratch’ to provide information about the lambda function.

Name the function in the ‘Author from Scratch’ option, choosing Java 11 as the language.

User interface of Amazon Kinesis: Choosing the language for the Lambda runtime.
Specify the runtime the code is written for.

The next section is the Security section, where an IAM role must be filled in, with read and write permissions on DynamoDB, permission to consume the AWS Kinesis stream, and a lambda execution role.

Then hit the create button. The main configuration screen of the lambda function shows up.

User interface of Amazon Kinesis: The main configuration screen of the lambda function.
The created lambda function page can be accessed easily.

By pressing the ‘+ Add trigger’ button, all possible triggers for a function are listed. Choose Kinesis in this case. A new form appears in which to choose the previously created stream.

The uber-jar, built with a simple command (mvn clean package), is uploaded in the code section. Set the handler in the ‘Runtime settings’ section and save: the lambda function is now configured to receive messages from the stream and write the data to DynamoDB.

Conclusions

A very simple proof of concept, illustrating part of how to create a fully serverless event-based architecture, has been demonstrated above. The creation of an event-based architecture is often highly complex in terms of both configuration and operation of the platform.

Thanks to the AWS serverless approach, these complexities are greatly reduced, and developers can spend most of their time building functionality, thus providing real business value.


Tagged as: AWS Serverless

Leo Sorge
I hold a degree in electronics. I have talked and written about science and technology, in both real and close-to-real worlds, since 1976. I frankly believe that business plans and the singularity are excellent starting points for science-fiction stories.