Natalia de Pablo Garcia · December 10, 2025 · 5 min read

Build AI Apps That See, Hear, and Talk Back — In Under 30 Minutes

So, you want to build an AI that watches a video feed, listens to users, and responds with natural speech. Sounds cool, right? Now try building it.

You’ll need a speech-to-text API. A vision model. A language model. A text-to-speech service. WebRTC for real-time streaming. WebSockets for low-latency communication. Then you stitch them all together, pray the latency stays under two seconds, and debug the async chaos when audio and video fall out of sync.

We’ve been there. That’s exactly why we built Orga AI.

Unified SDKs. A seamless API flow. Vision, voice, and conversation — processed together in under 700 milliseconds. Support for 40+ languages out of the box. And yes, you can have it running in under 30 minutes.

What Is Orga AI?

Orga AI is a real-time conversational AI platform. Users turn on their camera, speak naturally, and show the AI what’s happening. The AI watches, listens, and responds with its voice — in their language.

Here’s what that looks like:
User (pointing their phone camera): “My smart hub won’t connect. The light keeps blinking orange.”
Orga AI (watching the blinking pattern): “I see that orange blink — your hub lost its network settings. Show me the back and I’ll walk you through a reset.”
No typing. No screenshots. No “please describe your issue in detail.” The AI sees the problem and talks through the solution like a colleague would.
That’s the experience you can ship with Orga.

Getting Started with the Orga SDKs

Orga provides a suite of client- and server-side SDKs that work together to connect your application to our APIs.
The Orga client SDKs integrate with any React-based framework. Pick your stack and follow along.

Next.js (Fastest Setup)

Our Next.js starter scaffolds a complete app with video, audio, and AI conversation already wired up.

npx @orga-ai/create-orga-next-app my-app
cd my-app
npm install
npm run dev

Open localhost:3000. You’ll see a working demo: camera preview, voice input, AI responses. Customize the personality, plug in your logic, and ship.

Backend Proxy (Node)

To keep your API key secure, you’ll first need to create a small backend service. This backend acts as a proxy between your app and Orga: its job is to fetch ICE servers and an ephemeral token from the Orga API. Your client SDK (React or React Native) will call this endpoint before establishing its own connection.

import 'dotenv/config';
import express from 'express';
import cors from 'cors';
import { OrgaAI } from '@orga-ai/node';

const app = express();
app.use(cors());

const orga = new OrgaAI({
  apiKey: process.env.ORGA_API_KEY!
});

app.get('/api/orga-client-secrets', async (_req, res) => {
  try {
    const { ephemeralToken, iceServers } = await orga.getSessionConfig();
    res.json({ ephemeralToken, iceServers });
  } catch (error) {
    console.error('Failed to get session config:', error);
    res.status(500).json({ error: 'Internal server error' });
  }
});

app.listen(5000, () => console.log('Proxy running on http://localhost:5000'));

Once this backend endpoint is running, your frontend SDK can call it to retrieve the session configuration. No need to expose your API key in the client.
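
Before wiring up the frontend, you can sanity-check the proxy with a quick throwaway script. This is just a sketch, assuming the endpoint and response fields from the example above and Node 18+ for the built-in fetch:

// check-proxy.mjs: quick smoke test for the session-config proxy
const res = await fetch('http://localhost:5000/api/orga-client-secrets');
if (!res.ok) throw new Error(`Proxy returned ${res.status}`);
const { ephemeralToken, iceServers } = await res.json();
console.log('Got ephemeral token:', Boolean(ephemeralToken));
console.log('ICE servers returned:', iceServers?.length ?? 0);

If it prints a token and at least one ICE server, the proxy is ready for the client SDK.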

That’s it for the backend. Next, we’ll see how to call this endpoint from the frontend and establish the connection. 

React (Vite, CRA, or Your Own Setup)

Already have a React project? Drop in the SDK.

npm install @orga-ai/react

The provider is where you tell the client SDK which backend endpoint to call. That endpoint returns the ephemeral token and the ICE servers, which the SDK then uses to establish a secure connection to Orga, without ever exposing your API key.

'use client'

import { OrgaAI, OrgaAIProvider } from '@orga-ai/react';

OrgaAI.init({
  logLevel: 'debug',
  model: 'orga-1-beta',
  voice: 'alloy',
  fetchSessionConfig: async () => {
    const res = await fetch('http://localhost:5000/api/orga-client-secrets');
    if (!res.ok) throw new Error('Failed to fetch session config');
    const { ephemeralToken, iceServers } = await res.json();
    return { ephemeralToken, iceServers };
  },
});

export function OrgaClientProvider({ children }: { children: React.ReactNode }) {
  return <OrgaAIProvider>{children}</OrgaAIProvider>;
}

Then wrap your app with the provider:

import type { ReactNode } from 'react';
import { OrgaClientProvider } from './providers/OrgaClientProvider';

export default function RootLayout({ children }: { children: ReactNode }) {
  return (
    <html lang="en">
      <body>
        <OrgaClientProvider>{children}</OrgaClientProvider>
      </body>
    </html>
  );
}

Now you’re ready to create your main file and import the useOrgaAI hook. It exposes everything you need: startSession(), endSession(), and real-time state like connectionState.

'use client'
 
import {
  useOrgaAI,
  OrgaVideo,
  OrgaAudio,
} from '@orga-ai/react';
 
export default function Home() {
  const {
    startSession,
    endSession,
    connectionState,
    toggleCamera,
    toggleMic,
    isCameraOn,
    isMicOn,
    userVideoStream,
    aiAudioStream,
  } = useOrgaAI();
 
  const isConnected = connectionState === 'connected';
  const isIdle = connectionState === 'disconnected';
 
  return (
    <main className="mx-auto flex max-w-2xl flex-col gap-6 p-8">
      <header>
        <h1 className="text-3xl font-bold">Orga React SDK Quick Start</h1>
        <p className="text-gray-600">Status: {connectionState}</p>
      </header>
 
      <section className="grid grid-cols-2 gap-4">
        <button
          className="rounded bg-blue-600 px-4 py-2 text-white disabled:opacity-50"
          disabled={!isIdle}
          onClick={() => startSession()}
        >
          Start Session
        </button>
        <button
          className="rounded bg-red-600 px-4 py-2 text-white disabled:opacity-50"
          disabled={!isConnected}
          onClick={() => endSession()}
        >
          End Session
        </button>
        <button
          className="rounded border px-4 py-2 disabled:opacity-50"
          disabled={!isConnected}
          onClick={toggleCamera}
        >
          {isCameraOn ? 'Camera On' : 'Camera Off'}
        </button>
        <button
          className="rounded border px-4 py-2 disabled:opacity-50"
          disabled={!isConnected}
          onClick={toggleMic}
        >
          {isMicOn ? 'Mic On' : 'Mic Off'}
        </button>
      </section>
 
      <OrgaVideo stream={userVideoStream} className="h-64 w-full rounded bg-black" />
      <OrgaAudio stream={aiAudioStream} />
    </main>
  );
}
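Starting a session asks the browser for camera and microphone access, so it can fail if the user denies permission or the session-config fetch errors out. A minimal sketch of guarding the start button, assuming startSession() returns a promise that rejects on failure (check the SDK docs for the exact error shape):

const handleStart = async () => {
  try {
    await startSession();
  } catch (err) {
    // Most often a denied camera/mic permission or a failed session-config fetch.
    console.error('Could not start Orga session:', err);
  }
};

Pass handleStart to the Start Session button's onClick instead of calling startSession() directly.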


React Native (Expo)

Building for mobile? Same API, native performance. 

The setup is almost identical to the React example, with just a few differences in how you import and initialize the SDK.

Install the required dependencies:

npm install @orga-ai/react-native react-native-webrtc react-native-incall-manager

With Expo, make sure you update app.json to request camera and microphone access:

{
  "expo": {
    "ios": {
      "infoPlist": {
        "NSCameraUsageDescription": "Allow $(PRODUCT_NAME) to access your camera",
        "NSMicrophoneUsageDescription": "Allow $(PRODUCT_NAME) to access your microphone"
      }
    },
    "android": {
      "permissions": [
        "android.permission.CAMERA",
        "android.permission.RECORD_AUDIO"
      ]
    }
  }
}

As in the React setup, point the provider at your backend endpoint. The SDK uses the ephemeral token and ICE servers it returns to establish a secure connection to Orga, without ever exposing your API key.

import { Stack } from 'expo-router';
import { OrgaAI, OrgaAIProvider } from '@orga-ai/react-native';
 
OrgaAI.init({
  logLevel: 'debug',
  model: 'orga-1-beta',
  voice: 'alloy',
  fetchSessionConfig: async () => {
    const response = await fetch('http://localhost:5000/api/orga-client-secrets');
    if (!response.ok) throw new Error('Failed to fetch session config');
    const { ephemeralToken, iceServers } = await response.json();
    return { ephemeralToken, iceServers };
  },
});
 
export default function RootLayout() {
  return (
    <OrgaAIProvider>
      <Stack />
    </OrgaAIProvider>
  );
}


As before, create your main screen and import the useOrgaAI hook. It exposes startSession(), endSession(), and real-time state like connectionState.

import { StyleSheet, View } from 'react-native';
import {
  OrgaAICameraView,
  OrgaAIControls,
  useOrgaAI,
} from '@orga-ai/react-native';
 
export default function HomeScreen() {
  const {
    connectionState,
    isCameraOn,
    isMicOn,
    userVideoStream,
    startSession,
    endSession,
    toggleCamera,
    toggleMic,
    flipCamera,
  } = useOrgaAI();
 
  return (
    <View style={styles.container}>
      <OrgaAICameraView
        streamURL={userVideoStream?.toURL()}
        containerStyle={styles.cameraContainer}
        style={{ width: '100%', height: '100%' }}
      >
        <OrgaAIControls
          connectionState={connectionState}
          isCameraOn={isCameraOn}
          isMicOn={isMicOn}
          onStartSession={startSession}
          onEndSession={endSession}
          onToggleCamera={toggleCamera}
          onToggleMic={toggleMic}
          onFlipCamera={flipCamera}
        />
      </OrgaAICameraView>
    </View>
  );
}
 
const styles = StyleSheet.create({
  container: { flex: 1, backgroundColor: '#0f172a' },
  cameraContainer: { width: '100%', height: '100%' },
});

Unlike React on the web, React Native and Expo introduce a few mobile-specific details. WebRTC and audio need native access to the device’s camera and microphone, which is why the app.json permissions above are required.

 Note: Because Orga depends on native modules, the SDK will not run inside Expo Go. You’ll need to use a dev client. 

When developing locally, keep in mind that your backend endpoint must be reachable from your mobile device. Use your local network IP (e.g. http://192.168.x.x:5000) or a tunneling tool like ngrok when testing. 
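
One way to avoid hard-coding localhost is to read the proxy URL from an environment variable. A sketch using Expo's EXPO_PUBLIC_ prefix (Expo SDK 49+; the variable name EXPO_PUBLIC_ORGA_PROXY_URL is just an example):

import { OrgaAI } from '@orga-ai/react-native';

// In .env (not committed): EXPO_PUBLIC_ORGA_PROXY_URL=http://192.168.1.23:5000
const PROXY_URL =
  process.env.EXPO_PUBLIC_ORGA_PROXY_URL ?? 'http://localhost:5000';

OrgaAI.init({
  logLevel: 'debug',
  model: 'orga-1-beta',
  voice: 'alloy',
  fetchSessionConfig: async () => {
    const response = await fetch(`${PROXY_URL}/api/orga-client-secrets`);
    if (!response.ok) throw new Error('Failed to fetch session config');
    const { ephemeralToken, iceServers } = await response.json();
    return { ephemeralToken, iceServers };
  },
});

In production, point the variable at your deployed proxy instead.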

For more detailed setup guides, troubleshooting steps, and in-depth examples for React, React Native, and Node, check out our full documentation. You’ll find everything you need to go from setup to production-ready.

What Can You Build?

Orga doesn’t sound like a robot. The voice is natural. The timing feels human. Users forget they’re talking to AI — so they speak freely and trust the answers. Here’s what that looks like:

1. Customer support. A user shows their router. The AI spots a red light on port 3 and says: “That port has a hardware fault. Try a different cable.” No more “describe your issue.” The AI just sees it.

2. Accessibility. A visually impaired user holds up their phone in a coffee shop. “What’s on the menu?” The AI reads the board aloud. Later: “Is my coffee ready?” It spots the cup with their name on the counter.

3. Field service. A technician points their phone at an unfamiliar control panel. “Where’s the reset switch?” The AI finds it behind a small cover on the left and walks them through the steps — hands-free.

If your users need to show something and talk about it, Orga was built for that.

Make It Yours with a System Prompt

Every Orga agent starts with a system prompt. It’s just text that tells the AI how to act. Want a friendly support rep? A strict safety checker? A patient tutor?

Just write it.

You are a technician for HomeHelp. Speak calmly. When users show a broken device, identify the model first, then guide them through the fix. If you see exposed wires or water damage, tell them to call a professional.
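
Where this prompt gets attached depends on how you configure your agent, for example in the platform dashboard or at SDK initialization. As a sketch only, using a hypothetical instructions option on OrgaAI.init (check the docs for the actual parameter name):

import { OrgaAI } from '@orga-ai/react';

const SYSTEM_PROMPT = `You are a technician for HomeHelp.
Speak calmly. When users show a broken device, identify the model first,
then guide them through the fix. If you see exposed wires or water damage,
tell them to call a professional.`;

OrgaAI.init({
  model: 'orga-1-beta',
  voice: 'alloy',
  // 'instructions' is a hypothetical option name used for illustration only.
  instructions: SYSTEM_PROMPT,
  fetchSessionConfig: async () => {
    const res = await fetch('http://localhost:5000/api/orga-client-secrets');
    if (!res.ok) throw new Error('Failed to fetch session config');
    const { ephemeralToken, iceServers } = await res.json();
    return { ephemeralToken, iceServers };
  },
});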

Start Building Today

Create a free account at platform.orga-ai.com to build and test a full prototype.

  1. Sign up at platform.orga-ai.com
  2. Grab your API key from the dashboard
  3. Run the starter command for your framework
  4. Explore the docs at docs.orga-ai.com

You’ve spent enough time wiring services together. Build the product instead.

What will you create?
