Codemotion Magazine

Laura Melania Rocchi
September 24, 2018

Kai Wähner: build a scalable infrastructure with Apache Kafka

Big Data

Kai Wähner works as a Technology Evangelist at Confluent, a Silicon Valley startup that works closely with the Apache community to improve Apache Kafka, a streaming platform for building highly scalable, mission-critical infrastructures.
His main areas of expertise are Big Data Analytics, Machine Learning, Integration, Microservices, Internet of Things, Stream Processing and Blockchain. He is a regular speaker at international conferences such as JavaOne, O’Reilly Software Architecture and ApacheCon, and he also writes articles for professional journals.
Kai will deliver the talk “Deep Learning at Extreme Scale in the Cloud with Apache Kafka and TensorFlow” at Codemotion Berlin 2018.

Discover more about Codemotion Berlin!

Kai, as a Tech Evangelist, how would you describe the work Confluent is doing on Kafka?

Confluent builds Kafka itself (including Kafka Connect for integration and Kafka Streams for stream processing) and adds a powerful ecosystem including open source components such as REST Proxy, Schema Registry and KSQL (the streaming SQL engine for Kafka). For a high-level introduction and overview, there is a great 40-minute video of a conference talk: “Introduction to Apache Kafka as Event-Driven Open Source Streaming Platform”.

What does your working routine look like? What is the most exciting part of being a Tech Evangelist?

As a Technology Evangelist, I have two main tasks in my daily job:
1) Work with customers to discuss architectures, projects and combinations of different (cutting-edge and legacy) technologies, and
2) give public talks and webinars and write articles. I focus on cutting-edge technologies such as Apache Kafka and its open source ecosystem, Machine Learning frameworks such as TensorFlow, Internet of Things technologies such as MQTT, container technologies such as Docker and Kubernetes, and modern architectures leveraging microservices or serverless.
As part of preparing talks and demos, I also build small side projects on GitHub, e.g. for running Deep Learning models built with TensorFlow, DeepLearning4J or H2O within a Kafka Streams application (https://github.com/kaiwaehner/kafka-streams-machine-learning-examples) or for end-to-end integration from MQTT devices to Kafka clusters in hybrid scenarios (on premise and public cloud) using KSQL and Confluent Replicator (https://github.com/kaiwaehner/ksql-udf-deep-learning-mqtt-iot).

What are the pros and cons of this technology?

Apache Kafka and its open source ecosystem are present in almost every large company. Kafka evolved from a scalable messaging layer with high-throughput capabilities into a much more powerful streaming platform. The use cases started with feeding log data into Hadoop for batch analytics, but now include mission-critical deployments for payments, real-time fraud detection, logistics and predictive maintenance. Kafka is everywhere, and its ecosystem gets stronger every month.

Who could use this technology?

Kafka is used by companies such as LinkedIn (processing over 4.5 trillion messages per day), Netflix (processing 6 petabytes of data per day at peak times), and almost every other tech giant. Most traditional companies, such as banks, telcos, retailers and automotive manufacturers, also use Kafka more and more as a central nervous system for their most critical and innovative projects.

Kafka is not just used for high throughput and scalability; it also decouples systems and applications well. This allows building microservice infrastructures without tight coupling, something that was not possible before, even with tools like an Enterprise Service Bus (ESB) or other integration frameworks (which promised similar capabilities). My blog post “Apache Kafka vs. Enterprise Service Bus (ESB)—Friends, Enemies, or Frenemies?” (https://www.confluent.io/blog/apache-kafka-vs-enterprise-service-bus-esb-friends-enemies-or-frenemies/) goes into much more detail here.
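The decoupling described here can be illustrated with a toy model. The sketch below is not Kafka's API; it is a minimal in-memory stand-in, assuming a single append-only log where each consumer group tracks its own read offset. The class name `ToyLog` and the group names `fraud-detection` and `billing` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class ToyLog:
    """In-memory stand-in for one topic partition (illustration only, not Kafka)."""

    def __init__(self):
        self._records = []                # append-only list of records
        self._offsets = defaultdict(int)  # consumer group -> next offset to read

    def append(self, record):
        """Producer side: add a record and return its offset."""
        self._records.append(record)
        return len(self._records) - 1

    def poll(self, group, max_records=10):
        """Consumer side: return unread records for this group and advance its offset."""
        start = self._offsets[group]
        batch = self._records[start:start + max_records]
        self._offsets[group] = start + len(batch)
        return batch


log = ToyLog()

# The producer writes without knowing which consumers exist.
for payment in ({"id": 1, "amount": 20}, {"id": 2, "amount": 9500}):
    log.append(payment)

# One consumer group reads early ...
fraud_batch = log.poll("fraud-detection")

# ... more data arrives, and a second group that starts later still sees everything.
log.append({"id": 3, "amount": 40})
billing_batch = log.poll("billing")
```

Because each group only moves its own offset, a slow or newly added consumer never affects the producer or the other consumers, which is the property the interview contrasts with tightly coupled integration.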

What is your and your company’s day-to-day commitment to Kafka?

At Confluent, we fix critical bugs, add new features (such as exactly-once semantics) and security standards (such as OAuth recently), and build a whole ecosystem with many new components (such as KSQL for scalable stream processing without writing source code). We also work closely with the Apache Kafka open source community on the Kafka mailing list, via our community Slack channel (https://launchpass.com/confluentcommunity), in meetups all over the world, and at conferences such as Kafka Summit, where you can listen to Kafka committers from Confluent as well as to speakers from companies such as LinkedIn, Apple, Uber, Zalando, Google, and many more.

Why should people be interested in Kafka?

As mentioned above, Kafka is cutting-edge technology, but it is also used in many critical projects today. KSQL is a game changer. It allows people with fewer programming skills to build powerful stream processing applications on top of Apache Kafka with just SQL-like code; no source code in a language like Java is needed. KSQL also offers a REST interface, so data engineers and developers can use it from non-JVM languages such as Python or Go, or from any other REST-based tooling. While KSQL is easy to use, you can build powerful streaming use cases including streaming ETL, real-time dashboards or anomaly detection. Best of all, it is based natively on Apache Kafka, with all its benefits such as high scalability, high-volume throughput and failover. You can deploy KSQL queries for continuous processing and scale them to millions of messages per second, with high availability and zero data loss.
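As a rough illustration of using the KSQL REST interface from Python, the sketch below builds a POST request for the server's `/ksql` endpoint using only the standard library. The server address, the stream name `payments`, its columns and the threshold are all hypothetical assumptions; the request is only constructed here, not sent, so no running KSQL server is needed to try it.

```python
import json
import urllib.request

# Assumption: a KSQL server reachable locally on port 8088.
KSQL_URL = "http://localhost:8088"


def build_ksql_request(statement, base_url=KSQL_URL):
    """Build (but do not send) a POST request carrying one KSQL statement."""
    payload = {"ksql": statement, "streamsProperties": {}}
    return urllib.request.Request(
        url=f"{base_url}/ksql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/vnd.ksql.v1+json; charset=utf-8"},
        method="POST",
    )


# Hypothetical continuous query: the stream and column names are made up.
STATEMENT = (
    "CREATE STREAM suspicious_payments AS "
    "SELECT card_id, amount FROM payments WHERE amount > 10000;"
)

request = build_ksql_request(STATEMENT)

# Against a real server, the statement would be submitted with:
#     urllib.request.urlopen(request)
```

The same request could be issued from Go, curl or any other HTTP client, which is the point made above about non-JVM tooling.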

What about Kafka and the hot topic “Machine Learning”?

At Codemotion Berlin, my talk will be about combining Apache Kafka and Machine Learning to build a scalable infrastructure for analytic models. This includes ingestion, preprocessing, training, deployment and monitoring of analytic models. This is a huge challenge for most companies, as you cannot simply deploy some Python code into production and expect 24/7 availability and good performance. You need the right infrastructure for the whole ML process. This is where the Kafka ecosystem shines, which makes it a perfect combination. See my blog post “How to Build and Deploy Scalable Machine Learning in Production with Apache Kafka” (https://www.confluent.io/blog/build-deploy-scalable-machine-learning-production-apache-kafka/).

Interested in Kafka, Big Data and Machine Learning? Join us at Codemotion Berlin on November 20-21 and deepen your knowledge of these topics with Kai Wähner!

Join Codemotion Berlin!


Tagged as:Codemotion Berlin
