{"id":34801,"date":"2025-12-10T13:57:44","date_gmt":"2025-12-10T12:57:44","guid":{"rendered":"https:\/\/www.codemotion.com\/magazine\/?p=34801"},"modified":"2025-12-10T16:28:50","modified_gmt":"2025-12-10T15:28:50","slug":"build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes","status":"publish","type":"post","link":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/","title":{"rendered":"Build AI Apps That See, Hear, and Talk Back \u2014 In Under 30 Minutes"},"content":{"rendered":"\n<p>So, you want to build an AI that watches a video feed, listens to users, and responds with natural speech. Sounds cool, right? Now try building it.<\/p>\n\n\n\n<p>You&#8217;ll need a speech-to-text API. A vision model. A language model. A text-to-speech service. WebRTC for real-time streaming. WebSockets for low-latency communication. Then you stitch them all together, pray the latency stays under two seconds, and debug the async chaos when audio and video fall out of sync.<\/p>\n\n\n\n<p><strong>We&#8217;ve been there. That&#8217;s exactly why we built Orga AI.<\/strong><\/p>\n\n\n\n<p>Unified SDKs. A seamless API flow. Vision, voice, and conversation \u2014 processed together in under 700 milliseconds. Support for 40+ languages out of the box. And yes, you can have it running in under 30 minutes.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-what-is-orga-ai\">What Is Orga AI?<\/h2>\n\n\n\n<p>Orga AI is a real-time conversational AI platform. Users turn on their camera, speak naturally, and show the AI what&#8217;s happening. The AI watches, listens, and responds with its voice \u2014 in their language.<\/p>\n\n\n\n<p>Here&#8217;s what that looks like:<br><strong>User <\/strong>(pointing their phone camera): &#8220;My smart hub won&#8217;t connect. 
The light keeps blinking orange.&#8221;<br><strong>Orga AI <\/strong>(watching the blinking pattern): &#8220;I see that orange blink \u2014 your hub lost its network settings. Show me the back and I&#8217;ll walk you through a reset.&#8221;<br>No typing. No screenshots. No &#8220;please describe your issue in detail.&#8221; The AI sees the problem and talks through the solution like a colleague would.<br>That&#8217;s the experience you can ship with Orga.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-getting-started-with-the-orga-sdks\"><strong>Getting Started with the Orga SDKs<\/strong><\/h2>\n\n\n\n<p>Orga provides a suite of SDKs from client to server side that work together to connect your application to our APIs.<br>The Orga client SDKs integrate with any React-based framework. Pick your stack and follow along.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-next-js-fastest-setup\">Next.js (Fastest Setup)<\/h3>\n\n\n\n<p>Our Next.js starter scaffolds a complete app with video, audio, and AI conversation already wired up.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-1\" data-shcb-language-name=\"CSS\" data-shcb-language-slug=\"css\"><span><code class=\"hljs language-css\"><span class=\"hljs-selector-tag\">npx<\/span> <span class=\"hljs-keyword\">@orga-ai<\/span>\/create-orga-next-app my-app\ncd my-app\nnpm install\nnpm run dev<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-1\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">CSS<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">css<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Open<strong> localhost:3000.<\/strong> You&#8217;ll see a working demo: camera preview, voice input, AI responses. 
Customize the personality, plug in your logic, and ship.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\" id=\"h-backend-proxy-node\"><strong>Backend Proxy (Node)<\/strong><\/h3>\n\n\n\n<p>To keep your API key secure, you\u2019ll first need to create a small backend service. This backend acts as a proxy between your app and Orga. Its job is to fetch ICE servers and an ephemeral token from the Orga API. Your client SDK (React or React Native) will call this endpoint before establishing its own connection.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-2\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript\"><span class=\"hljs-keyword\">import<\/span> <span class=\"hljs-string\">'dotenv\/config'<\/span>;\n<span class=\"hljs-keyword\">import<\/span> express <span class=\"hljs-keyword\">from<\/span> <span class=\"hljs-string\">'express'<\/span>;\n<span class=\"hljs-keyword\">import<\/span> cors <span class=\"hljs-keyword\">from<\/span> <span class=\"hljs-string\">'cors'<\/span>;\n<span class=\"hljs-keyword\">import<\/span> { OrgaAI } <span class=\"hljs-keyword\">from<\/span> <span class=\"hljs-string\">'@orga-ai\/node'<\/span>;\n<span class=\"hljs-keyword\">const<\/span> app = express();\napp.use(cors());\n<span class=\"hljs-keyword\">const<\/span> orga = <span class=\"hljs-keyword\">new<\/span> OrgaAI({\n<span class=\"hljs-attr\">apiKey<\/span>: process.env.ORGA_API_KEY!\n});\napp.get(<span class=\"hljs-string\">'\/api\/orga-client-secrets'<\/span>, <span class=\"hljs-keyword\">async<\/span> (_req, res) =&gt; {\n<span class=\"hljs-keyword\">try<\/span> {\n<span class=\"hljs-keyword\">const<\/span> { ephemeralToken, iceServers } = <span class=\"hljs-keyword\">await<\/span> orga.getSessionConfig();\nres.json({ ephemeralToken, iceServers });\n} <span class=\"hljs-keyword\">catch<\/span> (error) {\n<span class=\"hljs-built_in\">console<\/span>.error(<span class=\"hljs-string\">'Failed 
to get session config:'<\/span>, error);\nres.status(<span class=\"hljs-number\">500<\/span>).json({ <span class=\"hljs-attr\">error<\/span>: <span class=\"hljs-string\">'Internal server error'<\/span> });\n}\n});\napp.listen(<span class=\"hljs-number\">5000<\/span>, () =&gt; <span class=\"hljs-built_in\">console<\/span>.log(<span class=\"hljs-string\">'Proxy running on http:\/\/localhost:5000'<\/span>));<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-2\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Once this backend endpoint is running, your frontend SDK can call it to retrieve the session configuration. No need to expose your API key in the client.<\/p>\n\n\n\n<p>That&#8217;s it for the backend. Next, we\u2019ll see how to call this endpoint from the frontend and establish the connection.&nbsp;<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>React (Vite, CRA, or Your Own Setup)<\/strong><\/h3>\n\n\n\n<p>Already have a React project? Drop in the SDK.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-3\" data-shcb-language-name=\"CSS\" data-shcb-language-slug=\"css\"><span><code class=\"hljs language-css\"><span class=\"hljs-selector-tag\">npm<\/span> <span class=\"hljs-selector-tag\">install<\/span> <span class=\"hljs-keyword\">@orga-ai<\/span>\/react<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-3\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">CSS<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">css<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>When setting up the provider, this is where you tell the client SDK which backend endpoint to call. 
That endpoint returns the ephemeral token and the ICE servers, which the SDK then uses to establish a secure connection to Orga, without ever exposing your API key.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-4\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript\"><span class=\"hljs-string\">'use client'<\/span>;\n<span class=\"hljs-keyword\">import<\/span> { OrgaAI, OrgaAIProvider } <span class=\"hljs-keyword\">from<\/span> <span class=\"hljs-string\">'@orga-ai\/react'<\/span>;\nOrgaAI.init({\n<span class=\"hljs-attr\">logLevel<\/span>: <span class=\"hljs-string\">'debug'<\/span>,\n<span class=\"hljs-attr\">model<\/span>: <span class=\"hljs-string\">'orga-1-beta'<\/span>,\n<span class=\"hljs-attr\">voice<\/span>: <span class=\"hljs-string\">'alloy'<\/span>,\n<span class=\"hljs-attr\">fetchSessionConfig<\/span>: <span class=\"hljs-keyword\">async<\/span> () =&gt; {\n<span class=\"hljs-keyword\">const<\/span> res = <span class=\"hljs-keyword\">await<\/span> fetch(<span class=\"hljs-string\">'http:\/\/localhost:5000\/api\/orga-client-secrets'<\/span>);\n<span class=\"hljs-keyword\">if<\/span> (!res.ok) <span class=\"hljs-keyword\">throw<\/span> <span class=\"hljs-keyword\">new<\/span> <span class=\"hljs-built_in\">Error<\/span>(<span class=\"hljs-string\">'Failed to fetch session config'<\/span>);\n<span class=\"hljs-keyword\">const<\/span> { ephemeralToken, iceServers } = <span class=\"hljs-keyword\">await<\/span> res.json();\n<span class=\"hljs-keyword\">return<\/span> { ephemeralToken, iceServers };\n},\n});\n<span class=\"hljs-keyword\">export<\/span> <span class=\"hljs-function\"><span class=\"hljs-keyword\">function<\/span> <span class=\"hljs-title\">OrgaClientProvider<\/span>(<span class=\"hljs-params\">{ children }: { children: React.ReactNode }<\/span>) <\/span>{\n<span class=\"hljs-keyword\">return<\/span> <span class=\"xml\"><span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">OrgaAIProvider<\/span>&gt;<\/span>{children}<span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">OrgaAIProvider<\/span>&gt;<\/span><\/span>;\n}<\/code><\/span><small 
class=\"shcb-language\" id=\"shcb-language-4\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Then wrap your app with the provider:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-5\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript\"><span class=\"hljs-keyword\">import<\/span> type { ReactNode } <span class=\"hljs-keyword\">from<\/span> <span class=\"hljs-string\">'react'<\/span>;\n<span class=\"hljs-keyword\">import<\/span> { OrgaClientProvider } <span class=\"hljs-keyword\">from<\/span> <span class=\"hljs-string\">'.\/providers\/OrgaClientProvider'<\/span>;\n<span class=\"hljs-keyword\">export<\/span> <span class=\"hljs-keyword\">default<\/span> <span class=\"hljs-function\"><span class=\"hljs-keyword\">function<\/span> <span class=\"hljs-title\">RootLayout<\/span>(<span class=\"hljs-params\">{ children }: { children: ReactNode }<\/span>) <\/span>{\n<span class=\"hljs-keyword\">return<\/span> (\n<span class=\"xml\"><span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">html<\/span> <span class=\"hljs-attr\">lang<\/span>=<span class=\"hljs-string\">\"en\"<\/span>&gt;<\/span>\n<span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">body<\/span>&gt;<\/span>\n<span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">OrgaClientProvider<\/span>&gt;<\/span>{children}<span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">OrgaClientProvider<\/span>&gt;<\/span>\n<span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">body<\/span>&gt;<\/span>\n<span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">html<\/span>&gt;<\/span><\/span>\n);\n}<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-5\"><span 
class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Now you\u2019re ready to create your main file and import the <strong>useOrgaAI<\/strong> hook. It exposes everything you need: <strong>startSession()<\/strong>, <strong>endSession()<\/strong>, and real-time state like <strong>connectionState<\/strong>.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-6\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript\"><span class=\"hljs-string\">'use client'<\/span>\n \n<span class=\"hljs-keyword\">import<\/span> {\n  useOrgaAI,\n  OrgaVideo,\n  OrgaAudio,\n} <span class=\"hljs-keyword\">from<\/span> <span class=\"hljs-string\">'@orga-ai\/react'<\/span>;\n \n<span class=\"hljs-keyword\">export<\/span> <span class=\"hljs-keyword\">default<\/span> <span class=\"hljs-function\"><span class=\"hljs-keyword\">function<\/span> <span class=\"hljs-title\">Home<\/span>(<span class=\"hljs-params\"><\/span>) <\/span>{\n  <span class=\"hljs-keyword\">const<\/span> {\n    startSession,\n    endSession,\n    connectionState,\n    toggleCamera,\n    toggleMic,\n    isCameraOn,\n    isMicOn,\n    userVideoStream,\n    aiAudioStream,\n  } = useOrgaAI();\n \n  <span class=\"hljs-keyword\">const<\/span> isConnected = connectionState === <span class=\"hljs-string\">'connected'<\/span>;\n  <span class=\"hljs-keyword\">const<\/span> isIdle = connectionState === <span class=\"hljs-string\">'disconnected'<\/span>;\n \n  <span class=\"hljs-keyword\">return<\/span> (\n    <span class=\"xml\"><span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">main<\/span> <span class=\"hljs-attr\">className<\/span>=<span class=\"hljs-string\">\"mx-auto flex max-w-2xl flex-col gap-6 
p-8\"<\/span>&gt;<\/span>\n      <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">header<\/span>&gt;<\/span>\n        <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">h1<\/span> <span class=\"hljs-attr\">className<\/span>=<span class=\"hljs-string\">\"text-3xl font-bold\"<\/span>&gt;<\/span>Orga React SDK Quick Start<span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">h1<\/span>&gt;<\/span>\n        <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">p<\/span> <span class=\"hljs-attr\">className<\/span>=<span class=\"hljs-string\">\"text-gray-600\"<\/span>&gt;<\/span>Status: {connectionState}<span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">p<\/span>&gt;<\/span>\n      <span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">header<\/span>&gt;<\/span>\n \n      <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">section<\/span> <span class=\"hljs-attr\">className<\/span>=<span class=\"hljs-string\">\"grid grid-cols-2 gap-4\"<\/span>&gt;<\/span>\n        <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">button<\/span>\n          <span class=\"hljs-attr\">className<\/span>=<span class=\"hljs-string\">\"rounded bg-blue-600 px-4 py-2 text-white disabled:opacity-50\"<\/span>\n          <span class=\"hljs-attr\">disabled<\/span>=<span class=\"hljs-string\">{!isIdle}<\/span>\n          <span class=\"hljs-attr\">onClick<\/span>=<span class=\"hljs-string\">{()<\/span> =&gt;<\/span> startSession()}\n        &gt;\n          Start Session\n        <span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">button<\/span>&gt;<\/span>\n        <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">button<\/span>\n          <span class=\"hljs-attr\">className<\/span>=<span class=\"hljs-string\">\"rounded bg-red-600 px-4 py-2 text-white disabled:opacity-50\"<\/span>\n          <span class=\"hljs-attr\">disabled<\/span>=<span class=\"hljs-string\">{!isConnected}<\/span>\n          <span class=\"hljs-attr\">onClick<\/span>=<span 
class=\"hljs-string\">{()<\/span> =&gt;<\/span> endSession()}\n        &gt;\n          End Session\n        <span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">button<\/span>&gt;<\/span>\n        <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">button<\/span>\n          <span class=\"hljs-attr\">className<\/span>=<span class=\"hljs-string\">\"rounded border px-4 py-2 disabled:opacity-50\"<\/span>\n          <span class=\"hljs-attr\">disabled<\/span>=<span class=\"hljs-string\">{!isConnected}<\/span>\n          <span class=\"hljs-attr\">onClick<\/span>=<span class=\"hljs-string\">{toggleCamera}<\/span>\n        &gt;<\/span>\n          {isCameraOn ? 'Camera On' : 'Camera Off'}\n        <span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">button<\/span>&gt;<\/span>\n        <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">button<\/span>\n          <span class=\"hljs-attr\">className<\/span>=<span class=\"hljs-string\">\"rounded border px-4 py-2 disabled:opacity-50\"<\/span>\n          <span class=\"hljs-attr\">disabled<\/span>=<span class=\"hljs-string\">{!isConnected}<\/span>\n          <span class=\"hljs-attr\">onClick<\/span>=<span class=\"hljs-string\">{toggleMic}<\/span>\n        &gt;<\/span>\n          {isMicOn ? 
'Mic On' : 'Mic Off'}\n        <span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">button<\/span>&gt;<\/span>\n      <span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">section<\/span>&gt;<\/span>\n \n      <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">OrgaVideo<\/span> <span class=\"hljs-attr\">stream<\/span>=<span class=\"hljs-string\">{userVideoStream}<\/span> <span class=\"hljs-attr\">className<\/span>=<span class=\"hljs-string\">\"h-64 w-full rounded bg-black\"<\/span> \/&gt;<\/span>\n      <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">OrgaAudio<\/span> <span class=\"hljs-attr\">stream<\/span>=<span class=\"hljs-string\">{aiAudioStream}<\/span> \/&gt;<\/span>\n    <span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">main<\/span>&gt;<\/span><\/span>\n  );\n}\n<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-6\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>React Native (Expo)<\/strong><\/h3>\n\n\n\n<p>Building for mobile? 
Same API, native performance.<\/p>\n\n\n\n<p>The setup is almost identical to the React example, with just a few differences in how you import and initialize the SDK.<\/p>\n\n\n\n<p>Install the required dependencies:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-7\" data-shcb-language-name=\"CSS\" data-shcb-language-slug=\"css\"><span><code class=\"hljs language-css\"><span class=\"hljs-selector-tag\">npm<\/span> <span class=\"hljs-selector-tag\">install<\/span> <span class=\"hljs-keyword\">@orga-ai<\/span>\/react-native react-native-webrtc react-native-incall-manager<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-7\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">CSS<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">css<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>With Expo, update <strong>app.json<\/strong> to request camera and microphone access:<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-8\" data-shcb-language-name=\"JSON \/ JSON with Comments\" data-shcb-language-slug=\"json\"><span><code class=\"hljs language-json\">{\n  <span class=\"hljs-attr\">\"expo\"<\/span>: {\n    <span class=\"hljs-attr\">\"ios\"<\/span>: {\n      <span class=\"hljs-attr\">\"infoPlist\"<\/span>: {\n        <span class=\"hljs-attr\">\"NSCameraUsageDescription\"<\/span>: <span class=\"hljs-string\">\"Allow $(PRODUCT_NAME) to access your camera\"<\/span>,\n        <span class=\"hljs-attr\">\"NSMicrophoneUsageDescription\"<\/span>: <span class=\"hljs-string\">\"Allow $(PRODUCT_NAME) to access your microphone\"<\/span>\n      }\n    },\n    <span class=\"hljs-attr\">\"android\"<\/span>: {\n      <span class=\"hljs-attr\">\"permissions\"<\/span>: &#91;\n        <span class=\"hljs-string\">\"android.permission.CAMERA\"<\/span>,\n        <span 
class=\"hljs-string\">\"android.permission.RECORD_AUDIO\"<\/span>\n      ]\n    }\n  }\n}\n<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-8\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JSON \/ JSON with Comments<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">json<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>When setting up the provider, this is where you tell the client SDK which backend endpoint to call. That endpoint returns the ephemeral token and the ICE servers, which the SDK then uses to establish a secure connection to Orga, without ever exposing your API key.&nbsp;<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-9\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript\"><span class=\"hljs-keyword\">import<\/span> { Stack } <span class=\"hljs-keyword\">from<\/span> <span class=\"hljs-string\">'expo-router'<\/span>;\n<span class=\"hljs-keyword\">import<\/span> { OrgaAI, OrgaAIProvider } <span class=\"hljs-keyword\">from<\/span> <span class=\"hljs-string\">'@orga-ai\/react-native'<\/span>;\n \nOrgaAI.init({\n  <span class=\"hljs-attr\">logLevel<\/span>: <span class=\"hljs-string\">'debug'<\/span>,\n  <span class=\"hljs-attr\">model<\/span>: <span class=\"hljs-string\">'orga-1-beta'<\/span>,\n  <span class=\"hljs-attr\">voice<\/span>: <span class=\"hljs-string\">'alloy'<\/span>,\n  <span class=\"hljs-attr\">fetchSessionConfig<\/span>: <span class=\"hljs-keyword\">async<\/span> () =&gt; {\n    <span class=\"hljs-keyword\">const<\/span> response = <span class=\"hljs-keyword\">await<\/span> fetch(<span class=\"hljs-string\">'http:\/\/localhost:5000\/api\/orga-client-secrets'<\/span>);\n    <span class=\"hljs-keyword\">if<\/span> (!response.ok) <span class=\"hljs-keyword\">throw<\/span> <span class=\"hljs-keyword\">new<\/span> 
<span class=\"hljs-built_in\">Error<\/span>(<span class=\"hljs-string\">'Failed to fetch session config'<\/span>);\n    <span class=\"hljs-keyword\">const<\/span> { ephemeralToken, iceServers } = <span class=\"hljs-keyword\">await<\/span> response.json();\n    <span class=\"hljs-keyword\">return<\/span> { ephemeralToken, iceServers };\n  },\n});\n \n<span class=\"hljs-keyword\">export<\/span> <span class=\"hljs-keyword\">default<\/span> <span class=\"hljs-function\"><span class=\"hljs-keyword\">function<\/span> <span class=\"hljs-title\">RootLayout<\/span>(<span class=\"hljs-params\"><\/span>) <\/span>{\n  <span class=\"hljs-keyword\">return<\/span> (\n    <span class=\"xml\"><span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">OrgaAIProvider<\/span>&gt;<\/span>\n      <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">Stack<\/span> \/&gt;<\/span>\n    <span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">OrgaAIProvider<\/span>&gt;<\/span><\/span>\n  );\n}\n<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-9\"><span class=\"shcb-language__label\">Code language:<\/span> <span class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Now you\u2019re ready to create your main file and import the <strong>useOrgaAI<\/strong> hook. 
It exposes everything you need: <strong>startSession()<\/strong>, <strong>endSession()<\/strong>, and real-time state like <strong>connectionState<\/strong>.<\/p>\n\n\n<pre class=\"wp-block-code\" aria-describedby=\"shcb-language-10\" data-shcb-language-name=\"JavaScript\" data-shcb-language-slug=\"javascript\"><span><code class=\"hljs language-javascript\"><span class=\"hljs-keyword\">import<\/span> { StyleSheet, View } <span class=\"hljs-keyword\">from<\/span> <span class=\"hljs-string\">'react-native'<\/span>;\n<span class=\"hljs-keyword\">import<\/span> {\n  OrgaAICameraView,\n  OrgaAIControls,\n  useOrgaAI,\n} <span class=\"hljs-keyword\">from<\/span> <span class=\"hljs-string\">'@orga-ai\/react-native'<\/span>;\n \n<span class=\"hljs-keyword\">export<\/span> <span class=\"hljs-keyword\">default<\/span> <span class=\"hljs-function\"><span class=\"hljs-keyword\">function<\/span> <span class=\"hljs-title\">HomeScreen<\/span>(<span class=\"hljs-params\"><\/span>) <\/span>{\n  <span class=\"hljs-keyword\">const<\/span> {\n    connectionState,\n    isCameraOn,\n    isMicOn,\n    userVideoStream,\n    startSession,\n    endSession,\n    toggleCamera,\n    toggleMic,\n    flipCamera,\n  } = useOrgaAI();\n \n  <span class=\"hljs-keyword\">return<\/span> (\n    <span class=\"xml\"><span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">View<\/span> <span class=\"hljs-attr\">style<\/span>=<span class=\"hljs-string\">{styles.container}<\/span>&gt;<\/span>\n      <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">OrgaAICameraView<\/span>\n        <span class=\"hljs-attr\">streamURL<\/span>=<span class=\"hljs-string\">{userVideoStream?.toURL()}<\/span>\n        <span class=\"hljs-attr\">containerStyle<\/span>=<span class=\"hljs-string\">{styles.cameraContainer}<\/span>\n        <span class=\"hljs-attr\">style<\/span>=<span class=\"hljs-string\">{{<\/span> <span class=\"hljs-attr\">width:<\/span> '<span class=\"hljs-attr\">100<\/span>%', <span 
class=\"hljs-attr\">height:<\/span> '<span class=\"hljs-attr\">100<\/span>%' }}\n      &gt;<\/span>\n        <span class=\"hljs-tag\">&lt;<span class=\"hljs-name\">OrgaAIControls<\/span>\n          <span class=\"hljs-attr\">connectionState<\/span>=<span class=\"hljs-string\">{connectionState}<\/span>\n          <span class=\"hljs-attr\">isCameraOn<\/span>=<span class=\"hljs-string\">{isCameraOn}<\/span>\n          <span class=\"hljs-attr\">isMicOn<\/span>=<span class=\"hljs-string\">{isMicOn}<\/span>\n          <span class=\"hljs-attr\">onStartSession<\/span>=<span class=\"hljs-string\">{startSession}<\/span>\n          <span class=\"hljs-attr\">onEndSession<\/span>=<span class=\"hljs-string\">{endSession}<\/span>\n          <span class=\"hljs-attr\">onToggleCamera<\/span>=<span class=\"hljs-string\">{toggleCamera}<\/span>\n          <span class=\"hljs-attr\">onToggleMic<\/span>=<span class=\"hljs-string\">{toggleMic}<\/span>\n          <span class=\"hljs-attr\">onFlipCamera<\/span>=<span class=\"hljs-string\">{flipCamera}<\/span>\n        \/&gt;<\/span>\n      <span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">OrgaAICameraView<\/span>&gt;<\/span>\n    <span class=\"hljs-tag\">&lt;\/<span class=\"hljs-name\">View<\/span>&gt;<\/span><\/span>\n  );\n}\n \n<span class=\"hljs-keyword\">const<\/span> styles = StyleSheet.create({\n  <span class=\"hljs-attr\">container<\/span>: { <span class=\"hljs-attr\">flex<\/span>: <span class=\"hljs-number\">1<\/span>, <span class=\"hljs-attr\">backgroundColor<\/span>: <span class=\"hljs-string\">'#0f172a'<\/span> },\n  <span class=\"hljs-attr\">cameraContainer<\/span>: { <span class=\"hljs-attr\">width<\/span>: <span class=\"hljs-string\">'100%'<\/span>, <span class=\"hljs-attr\">height<\/span>: <span class=\"hljs-string\">'100%'<\/span> },\n});\n<\/code><\/span><small class=\"shcb-language\" id=\"shcb-language-10\"><span class=\"shcb-language__label\">Code language:<\/span> <span 
class=\"shcb-language__name\">JavaScript<\/span> <span class=\"shcb-language__paren\">(<\/span><span class=\"shcb-language__slug\">javascript<\/span><span class=\"shcb-language__paren\">)<\/span><\/small><\/pre>\n\n\n<p>Unlike React for web, building with React Native and Expo introduces a few mobile-specific details. WebRTC and audio require native access to the device\u2019s camera and microphone, so you\u2019ll need to configure permissions in your <strong>app.json.<\/strong><\/p>\n\n\n\n<p><strong>&nbsp;Note<\/strong>: Because Orga depends on native modules, the SDK will not run inside Expo Go. You\u2019ll need to use a dev client.&nbsp;<\/p>\n\n\n\n<p>When developing locally, keep in mind that your backend endpoint must be reachable from your mobile device. Use your local network IP (e.g. http:\/\/192.168.x.x:5000) or a tunneling tool like ngrok when testing.&nbsp;<\/p>\n\n\n\n<p>For more detailed setup guides, troubleshooting steps, and in-depth examples for React, React Native, and Node, check out our full documentation. You\u2019ll find everything you need to go from setup to production-ready.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\"><strong>What Can You Build?<\/strong><\/h2>\n\n\n\n<p>Orga doesn&#8217;t sound like a robot. The voice is natural. The timing feels human. Users forget they&#8217;re talking to AI \u2014 so they speak freely and trust the answers. Here&#8217;s what that looks like:<\/p>\n\n\n\n<p><strong>1. Customer support.<\/strong> A user shows their router. The AI spots a red light on port 3 and says: &#8220;That port has a hardware fault. Try a different cable.&#8221; No more &#8220;describe your issue.&#8221; The AI just sees it.<\/p>\n\n\n\n<p><strong>2. Accessibility.<\/strong> A visually impaired user holds up their phone in a coffee shop. &#8220;What&#8217;s on the menu?&#8221; The AI reads the board aloud. Later: &#8220;Is my coffee ready?&#8221; It spots the cup with their name on the counter.<\/p>\n\n\n\n<p><strong>3. 
Field service.<\/strong> A technician points their phone at an unfamiliar control panel. &#8220;Where&#8217;s the reset switch?&#8221; The AI finds it behind a small cover on the left and walks them through the steps \u2014 hands-free.<\/p>\n\n\n\n<p>If your users need to show something and talk about it, Orga was built for that.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\"><strong>Make It Yours with a System Prompt<\/strong><\/h3>\n\n\n\n<p>Every Orga agent starts with a system prompt. It&#8217;s just text that tells the AI how to act. Want a friendly support rep? A strict safety checker? A patient tutor?<br><br><strong>Just write it.<\/strong><\/p>\n\n\n\n<p><em>&nbsp;You are a technician for HomeHelp.<br>Speak calmly. When users show a broken device, identify the model first, then guide them through the fix.<br>If you see exposed wires or water damage,<br>tell them to call a professional.<\/em><\/p>\n\n\n\n<h2 class=\"wp-block-heading\" id=\"h-start-building-today\"><strong>Start Building Today<\/strong><\/h2>\n\n\n\n<p>Create a free account at <strong>platform.orga-ai.com<\/strong> to build and test a full prototype.<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Sign up at <a href=\"http:\/\/platform.orga-ai.com\">platform.orga-ai.com<\/a><\/li>\n\n\n\n<li>Grab your API key from the dashboard<\/li>\n\n\n\n<li>Run the starter command for your framework<\/li>\n\n\n\n<li>Explore the docs at <a href=\"http:\/\/docs.orga-ai.com\">docs.orga-ai.com<\/a><\/li>\n<\/ol>\n\n\n\n<p>You&#8217;ve spent enough time wiring services together. Build the product instead.<\/p>\n\n\n\n<p><strong>What will you create?<\/strong><\/p>\n","protected":false},"excerpt":{"rendered":"<p>So, you want to build an AI that watches a video feed, listens to users, and responds with natural speech. Sounds cool, right? Now try building it. You&#8217;ll need a speech-to-text API. A vision model. A language model. A text-to-speech service. WebRTC for real-time streaming. 
WebSockets for low-latency communication. Then you stitch them all together,&#8230; <a class=\"more-link\" href=\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/\">Read more<\/a><\/p>\n","protected":false},"author":177,"featured_media":34839,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_editorskit_title_hidden":false,"_editorskit_reading_time":0,"_editorskit_is_block_options_detached":false,"_editorskit_block_options_position":"{}","_uag_custom_page_level_css":"","_genesis_hide_title":false,"_genesis_hide_breadcrumbs":false,"_genesis_hide_singular_image":false,"_genesis_hide_footer_widgets":false,"_genesis_custom_body_class":"","_genesis_custom_post_class":"","_genesis_layout":"","footnotes":""},"categories":[46],"tags":[10003],"collections":[11387],"class_list":{"0":"post-34801","1":"post","2":"type-post","3":"status-publish","4":"format-standard","5":"has-post-thumbnail","7":"category-ai-ml","8":"tag-ai","9":"collections-top-of-the-week","10":"entry"},"yoast_head":"<!-- This site is optimized with the Yoast SEO Premium plugin v26.9 (Yoast SEO v26.9) - https:\/\/yoast.com\/product\/yoast-seo-premium-wordpress\/ -->\n<title>Build AI Apps That See, Hear, and Talk Back \u2014 In Under 30 Minutes - Codemotion Magazine<\/title>\n<meta name=\"description\" content=\"So, you want to build an AI that watches a video feed, listens to users, and responds with natural speech. Sounds cool, right? 
Now try building it.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Build AI Apps That See, Hear, and Talk Back \u2014 In Under 30 Minutes\" \/>\n<meta property=\"og:description\" content=\"So, you want to build an AI that watches a video feed, listens to users, and responds with natural speech. Sounds cool, right? Now try building it.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/\" \/>\n<meta property=\"og:site_name\" content=\"Codemotion Magazine\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/Codemotion.Italy\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-12-10T12:57:44+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-12-10T15:28:50+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2025\/12\/orga_image_1920x1080.png\" \/>\n\t<meta property=\"og:image:width\" content=\"1920\" \/>\n\t<meta property=\"og:image:height\" content=\"1080\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/png\" \/>\n<meta name=\"author\" content=\"Natalia de Pablo Garcia\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:creator\" content=\"@CodemotionIT\" \/>\n<meta name=\"twitter:site\" content=\"@CodemotionIT\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Natalia de Pablo Garcia\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"5 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"Article\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/#article\",\"isPartOf\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/\"},\"author\":{\"name\":\"Natalia de Pablo Garcia\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/2450f8e4083152e4feaea1ada456aeee\"},\"headline\":\"Build AI Apps That See, Hear, and Talk Back \u2014 In Under 30 Minutes\",\"datePublished\":\"2025-12-10T12:57:44+00:00\",\"dateModified\":\"2025-12-10T15:28:50+00:00\",\"mainEntityOfPage\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/\"},\"wordCount\":1105,\"publisher\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#organization\"},\"image\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2025\/12\/orga_image_1920x1080.png\",\"keywords\":[\"AI\"],\"articleSection\":[\"AI\/ML\"],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/\",\"name\":\"Build AI Apps That See, Hear, and Talk Back \u2014 In Under 30 Minutes - Codemotion 
Magazine\",\"isPartOf\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#website\"},\"primaryImageOfPage\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/#primaryimage\"},\"image\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/#primaryimage\"},\"thumbnailUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2025\/12\/orga_image_1920x1080.png\",\"datePublished\":\"2025-12-10T12:57:44+00:00\",\"dateModified\":\"2025-12-10T15:28:50+00:00\",\"description\":\"So, you want to build an AI that watches a video feed, listens to users, and responds with natural speech. Sounds cool, right? Now try building it.\",\"breadcrumb\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/\"]}]},{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/#primaryimage\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2025\/12\/orga_image_1920x1080.png\",\"contentUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2025\/12\/orga_image_1920x1080.png\",\"width\":1920,\"height\":1080},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\/\/www.codemotion.com\/magazine\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"AI\/ML\",\"item\":\"https:\/\/www.codemotion.com\/magazine\/ai-ml\/\"},{\"@type\
":\"ListItem\",\"position\":3,\"name\":\"Build AI Apps That See, Hear, and Talk Back \u2014 In Under 30 Minutes\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#website\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/\",\"name\":\"Codemotion Magazine\",\"description\":\"We code the future. Together\",\"publisher\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#organization\"},\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/www.codemotion.com\/magazine\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"},{\"@type\":\"Organization\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#organization\",\"name\":\"Codemotion\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/\",\"logo\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png\",\"contentUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png\",\"width\":225,\"height\":225,\"caption\":\"Codemotion\"},\"image\":{\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/\"},\"sameAs\":[\"https:\/\/www.facebook.com\/Codemotion.Italy\/\",\"https:\/\/x.com\/CodemotionIT\"]},{\"@type\":\"Person\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/2450f8e4083152e4feaea1ada456aeee\",\"name\":\"Natalia de Pablo 
Garcia\",\"image\":{\"@type\":\"ImageObject\",\"inLanguage\":\"en-US\",\"@id\":\"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/image\/\",\"url\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/11\/Untitled-design-100x100.jpg\",\"contentUrl\":\"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/11\/Untitled-design-100x100.jpg\",\"caption\":\"Natalia de Pablo Garcia\"},\"sameAs\":[\"www.linkedin.com\/in\/nataliadepablo\"],\"url\":\"https:\/\/www.codemotion.com\/magazine\/author\/natalia-de-pablo-garcia\/\"}]}<\/script>\n<!-- \/ Yoast SEO Premium plugin. -->","yoast_head_json":{"title":"Build AI Apps That See, Hear, and Talk Back \u2014 In Under 30 Minutes - Codemotion Magazine","description":"So, you want to build an AI that watches a video feed, listens to users, and responds with natural speech. Sounds cool, right? Now try building it.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/","og_locale":"en_US","og_type":"article","og_title":"Build AI Apps That See, Hear, and Talk Back \u2014 In Under 30 Minutes","og_description":"So, you want to build an AI that watches a video feed, listens to users, and responds with natural speech. Sounds cool, right? 
Now try building it.","og_url":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/","og_site_name":"Codemotion Magazine","article_publisher":"https:\/\/www.facebook.com\/Codemotion.Italy\/","article_published_time":"2025-12-10T12:57:44+00:00","article_modified_time":"2025-12-10T15:28:50+00:00","og_image":[{"width":1920,"height":1080,"url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2025\/12\/orga_image_1920x1080.png","type":"image\/png"}],"author":"Natalia de Pablo Garcia","twitter_card":"summary_large_image","twitter_creator":"@CodemotionIT","twitter_site":"@CodemotionIT","twitter_misc":{"Written by":"Natalia de Pablo Garcia","Est. reading time":"5 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"Article","@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/#article","isPartOf":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/"},"author":{"name":"Natalia de Pablo Garcia","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/2450f8e4083152e4feaea1ada456aeee"},"headline":"Build AI Apps That See, Hear, and Talk Back \u2014 In Under 30 
Minutes","datePublished":"2025-12-10T12:57:44+00:00","dateModified":"2025-12-10T15:28:50+00:00","mainEntityOfPage":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/"},"wordCount":1105,"publisher":{"@id":"https:\/\/www.codemotion.com\/magazine\/#organization"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/#primaryimage"},"thumbnailUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2025\/12\/orga_image_1920x1080.png","keywords":["AI"],"articleSection":["AI\/ML"],"inLanguage":"en-US"},{"@type":"WebPage","@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/","url":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/","name":"Build AI Apps That See, Hear, and Talk Back \u2014 In Under 30 Minutes - Codemotion Magazine","isPartOf":{"@id":"https:\/\/www.codemotion.com\/magazine\/#website"},"primaryImageOfPage":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/#primaryimage"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/#primaryimage"},"thumbnailUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2025\/12\/orga_image_1920x1080.png","datePublished":"2025-12-10T12:57:44+00:00","dateModified":"2025-12-10T15:28:50+00:00","description":"So, you want to build an AI that watches a video feed, listens to users, and responds with natural speech. Sounds cool, right? 
Now try building it.","breadcrumb":{"@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/#primaryimage","url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2025\/12\/orga_image_1920x1080.png","contentUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2025\/12\/orga_image_1920x1080.png","width":1920,"height":1080},{"@type":"BreadcrumbList","@id":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/build-ai-apps-that-see-hear-and-talk-back-in-under-30-minutes\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/www.codemotion.com\/magazine\/"},{"@type":"ListItem","position":2,"name":"AI\/ML","item":"https:\/\/www.codemotion.com\/magazine\/ai-ml\/"},{"@type":"ListItem","position":3,"name":"Build AI Apps That See, Hear, and Talk Back \u2014 In Under 30 Minutes"}]},{"@type":"WebSite","@id":"https:\/\/www.codemotion.com\/magazine\/#website","url":"https:\/\/www.codemotion.com\/magazine\/","name":"Codemotion Magazine","description":"We code the future. 
Together","publisher":{"@id":"https:\/\/www.codemotion.com\/magazine\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/www.codemotion.com\/magazine\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/www.codemotion.com\/magazine\/#organization","name":"Codemotion","url":"https:\/\/www.codemotion.com\/magazine\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/","url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png","contentUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2019\/11\/codemotionlogo.png","width":225,"height":225,"caption":"Codemotion"},"image":{"@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/Codemotion.Italy\/","https:\/\/x.com\/CodemotionIT"]},{"@type":"Person","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/2450f8e4083152e4feaea1ada456aeee","name":"Natalia de Pablo Garcia","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/www.codemotion.com\/magazine\/#\/schema\/person\/image\/","url":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/11\/Untitled-design-100x100.jpg","contentUrl":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2023\/11\/Untitled-design-100x100.jpg","caption":"Natalia de Pablo 
Garcia"},"sameAs":["www.linkedin.com\/in\/nataliadepablo"],"url":"https:\/\/www.codemotion.com\/magazine\/author\/natalia-de-pablo-garcia\/"}]}},"featured_image_src":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2025\/12\/orga_image_1920x1080-600x400.png","featured_image_src_square":"https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2025\/12\/orga_image_1920x1080-600x600.png","author_info":{"display_name":"Natalia de Pablo Garcia","author_link":"https:\/\/www.codemotion.com\/magazine\/author\/natalia-de-pablo-garcia\/"},"uagb_featured_image_src":{"full":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2025\/12\/orga_image_1920x1080.png",1920,1080,false],"thumbnail":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2025\/12\/orga_image_1920x1080-150x150.png",150,150,true],"medium":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2025\/12\/orga_image_1920x1080-300x169.png",300,169,true],"medium_large":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2025\/12\/orga_image_1920x1080-768x432.png",768,432,true],"large":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2025\/12\/orga_image_1920x1080-1024x576.png",1024,576,true],"1536x1536":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2025\/12\/orga_image_1920x1080-1536x864.png",1536,864,true],"2048x2048":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2025\/12\/orga_image_1920x1080.png",1920,1080,false],"small-home-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2025\/12\/orga_image_1920x1080-100x100.png",100,100,true],"sidebar-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2025\/12\/orga_image_1920x1080-180x128.png",180,128,true],"genesis-singular-images":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2025\/12\/orga_image_1920x1080-896x504.png",896,504,true],"archive-featured":["https:\/\/www.codemotion.com\/magazine\/wp-content\/u
ploads\/2025\/12\/orga_image_1920x1080-400x225.png",400,225,true],"gb-block-post-grid-landscape":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2025\/12\/orga_image_1920x1080-600x400.png",600,400,true],"gb-block-post-grid-square":["https:\/\/www.codemotion.com\/magazine\/wp-content\/uploads\/2025\/12\/orga_image_1920x1080-600x600.png",600,600,true]},"uagb_author_info":{"display_name":"Natalia de Pablo Garcia","author_link":"https:\/\/www.codemotion.com\/magazine\/author\/natalia-de-pablo-garcia\/"},"uagb_comment_info":0,"uagb_excerpt":"So, you want to build an AI that watches a video feed, listens to users, and responds with natural speech. Sounds cool, right? Now try building it. You&#8217;ll need a speech-to-text API. A vision model. A language model. A text-to-speech service. WebRTC for real-time streaming. WebSockets for low-latency communication. Then you stitch them all together,&#8230;&hellip;","lang":"en","_links":{"self":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/34801","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/users\/177"}],"replies":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/comments?post=34801"}],"version-history":[{"count":2,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/34801\/revisions"}],"predecessor-version":[{"id":34826,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/posts\/34801\/revisions\/34826"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/media\/34839"}],"wp:attachment":[{"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/media?parent=34801"}],"wp:term":[{"taxonomy":"category","embe
ddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/categories?post=34801"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/tags?post=34801"},{"taxonomy":"collections","embeddable":true,"href":"https:\/\/www.codemotion.com\/magazine\/wp-json\/wp\/v2\/collections?post=34801"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}