Let’s discover how to create a genuine chat with a virtual assistant powered by an AI model similar to ChatGPT, without communicating with any server: everything runs entirely in the browser!
Is it really possible to do this at zero cost entirely client-side with JavaScript? Will we actually get to something "similar to ChatGPT"? Are we heading towards a future with offline virtual assistants and total privacy control?
We’ll answer these questions in this tutorial dedicated to Transformers.js, focusing on creating a chatbot based on a real LLM from Hugging Face.
Practical Recipe for an AI-based Chatbot Integrated into a Web Page
Requirements
This recipe, like the previous one, is designed to be clear and accessible!
For example, we won’t use any bundler: just a plain /public folder to be served with your favorite web server.
Ingredients
For our webapp, we’ll need three main ingredients corresponding to the four project files we’ll work on:
./public/
  index.html
  worker.js
  app.js
  style.css
A minimalist UI for the chat. Inside our index.html we’ll have:
- #chat-messages, where user, virtual assistant, and system messages will appear (each with a different style)
- #chat-input-container, to send a message from the keyboard or with a button
A web worker (loaded as a module): it will contain our AI model and allow us to query it without blocking the main thread’s user interface.
The application logic of a classic chat: the user sends messages, and the AI model replies with its own.
Preparation
A Simple Chat
For this experiment, we’ll limit ourselves to a container hosting the entire chat, with a #chat-header, #chat-messages, and #chat-input-container inside.
Let’s add everything we need to our index.html:
<div id="container">
<div id="chat-container">
<div id="chat-header">
<h2>My first LLM</h2>
</div>
<div id="chat-messages" class="chat-messages">
<!-- messages will appear here -->
</div>
<div id="chat-input-container">
<input type="text" id="chat-input" placeholder="Type your message...">
<button id="send-button" disabled>Send</button>
</div>
</div>
</div>
Sending and Receiving Messages
Now let’s add to our app.js the logic for sending the messages the user types into the input box. Later we’ll hook into this to forward each message to our AI… but first, let’s build a simple send-and-receive system with placeholder responses.
document.addEventListener("DOMContentLoaded", () => {
const sendButton = document.getElementById("send-button");
const chatInput = document.getElementById("chat-input");
const chatMessages = document.getElementById("chat-messages");
const disableUI = () => {
sendButton.setAttribute("disabled", true);
sendButton.innerText = "...";
};
const enableUI = () => {
sendButton.removeAttribute("disabled");
sendButton.innerText = "Send";
};
const chat = (text) => {
setTimeout(() => {
addMessage("Hello world", "assistant");
}, 1000);
};
const download = (modelURL) => {
disableUI();
setTimeout(() => {
addMessage(
'<small id="downloading-message">Downloading model...</small>',
"system"
);
}, 1000);
setTimeout(() => {
addMessage(
`<small>Model ready! More information here <a href="https://huggingface.co/${modelURL}" target="_blank">${modelURL}</a></small>`,
"system"
);
enableUI();
}, 2000);
};
const addMessage = (message, role) => {
const newMessageElement = document.createElement("div");
newMessageElement.classList.add("chat-message");
newMessageElement.classList.add(role);
newMessageElement.innerHTML = message;
chatMessages.appendChild(newMessageElement);
chatMessages.scrollTop = chatMessages.scrollHeight;
return newMessageElement;
};
const sendMessage = () => {
disableUI();
const question = chatInput.value;
addMessage(question, "user");
chat(question);
chatInput.value = "";
};
sendButton.addEventListener("click", sendMessage);
chatInput.addEventListener("keypress", (event) => {
if (event.key === "Enter") {
sendMessage();
}
});
download("HF_USER/HF_MODEL");
});
With a pinch of CSS, it will take on the appearance and behavior of a classic chat!
body {
  margin: 0;
  font-family: system-ui;
}
a, a:visited, a:focus {
  color: #ff5c00;
}
#container {
  display: flex;
  width: 100lvw;
  height: 100lvh;
  justify-content: center;
  align-items: center;
  background-color: #333;
}
#chat-container {
  display: flex;
  width: 60vw;
  flex-direction: column;
  max-width: 80%;
  max-height: 80%;
  background-color: white;
  padding: 1rem;
  border: 3px solid #666;
}
#chat-header {
  display: flex;
  justify-content: space-between;
}
#chat-header button {
  color: #ff5c00;
  background-color: transparent;
  border: 0;
  font-size: 3rem;
  padding: 0;
  margin: 0;
}
#chat-messages {
  height: 50vh;
  overflow-y: auto;
  display: flex;
  flex-direction: column;
}
#chat-input-container {
  display: flex;
}
#chat-input {
  width: 100%;
}
input[type=text], button {
  font-size: 1rem;
  border: 1px solid #ff5c00;
  padding: 1rem;
  margin: 1rem;
}
button {
  background-color: #ff5c00;
  color: white;
  cursor: pointer;
}
button:disabled {
  background-color: white;
  cursor: wait;
  color: #ff5c00;
  border: 1px dashed #ff5c00;
}
input[type=text]:disabled {
  background-color: white;
  cursor: wait;
  color: #ff5c00;
  border: 1px dashed #ff5c00;
}
div.chat-message {
  padding: 1rem;
  margin-bottom: 1rem;
  white-space: break-spaces;
  width: 80%;
}
div.chat-message.user {
  background-color: antiquewhite;
  align-self: flex-end;
}
div.chat-message.assistant {
  background-color: rgb(249, 205, 147);
  align-self: flex-start;
}
div.chat-message.system {
  margin: 0;
  color: #666;
  font-family: monospace;
  padding: 0.5rem;
}
The outside of our dish is ready! Can’t you already taste the result? 🤤
We’re ready for the actual communication system with our AI model!
Running an AI Model Inside a Web Worker 🌶️ 🌶️ 🌶️
We’ve reached the spiciest part of the recipe: creating a web worker to download and run the LLM model in the browser without blocking the main thread.
Don’t know Web Workers? It’s a great opportunity to try them!
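In case you’ve never used one: a Web Worker is a script that runs on a separate thread and talks to the page only by exchanging messages. Here’s a minimal, self-contained sketch, completely independent of our chatbot (the file names are just examples):

// echo-worker.js: replies to every message it receives
self.addEventListener("message", (event) => {
  self.postMessage(`echo: ${event.data}`);
});

// main.js: creates the worker and talks to it
const echoWorker = new Worker("echo-worker.js");
echoWorker.addEventListener("message", (event) => {
  console.log(event.data); // logs "echo: hello"
});
echoWorker.postMessage("hello");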
In the app.js file, we’ll include the worker.js file as a module:
const aiWorker = new Worker("worker.js", {
  type: "module",
});
Let’s implement the two functions that will allow us to send messages to the web worker using the postMessage() method, replacing the placeholder versions:
// To send messages to the AI
const chat = (message) => {
  aiWorker.postMessage({
    action: "chat",
    content: message,
  });
};

// To load the AI model: happens only the first time
const download = (modelURL) => {
  addMessage(
    '<small id="downloading-message">Downloading model...</small>',
    "system"
  );
  aiWorker.postMessage({
    action: "download",
    modelURL: modelURL,
  });
};
To listen to the Web Worker’s responses, we simply add an event listener that will notify us of every message. Note that the event type is always ‘message’, but the object passed to the callback contains whatever data the worker sends: in practice, you can invent your own protocol made of parameters and flags!
For this recipe, we only need to receive two types of messages:
- If the response contains the status property (with the value ‘ready’), it’s the signal that the model is ready (i.e., the response to the message with action: ‘download’ that we send as soon as app.js is loaded)
- Otherwise, it’s the text generated by the model and contained in the result property
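Expressed as plain objects, the little protocol we’re about to implement therefore handles exactly two shapes (these are the fields our worker will send, nothing more):

// model is ready (response to action: "download")
// { status: "ready", task: "text-generation", modelURL: "HF_USER/HF_MODEL" }

// text generated by the model (response to action: "chat")
// { result: "...generated text..." }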
Let’s see our complete message reception system with the corresponding reactions of our UI:
aiWorker.addEventListener("message", (event) => {
  const aiResponse = event.data;
  if (aiResponse.status == "ready") {
    addMessage(
      `<small>Model ready! More information here <a href="https://huggingface.co/${aiResponse.modelURL}" target="_blank">${aiResponse.modelURL}</a></small>`,
      "system"
    );
  } else {
    const result = aiResponse.result;
    addMessage(result, "assistant");
    enableUI();
  }
});

// everything starts from this request!
download("Felladrin/onnx-Pythia-31M-Chat-v1");
Yes, you read that right: we have a modelURL parameter! In a few steps we’ll discover what it’s for, but you can probably guess 🤓
Web Worker Stuffed with a Real Chatbot
Everything is ready to give our virtual assistant its intelligence: a text-generation model (an LLM) loaded entirely in the browser thanks to Transformers.js!
First, let’s load Transformers.js directly from a CDN service. Attention! 🔥 Don’t burn yourself: we can import this or other libraries this way only because we created the Web Worker with the type: “module” option 🙂↕️
import {
  pipeline,
  env,
} from "https://cdn.jsdelivr.net/npm/@xenova/transformers@2.17.1";

env.allowLocalModels = false; // we'll use remote models!
Now all we have to do is implement our pipeline just like in the examples from the official documentation, but inside the web worker, and only when the app asks for it!
The downloadModel function will download the model files from Hugging Face and finally create our generator, which is a text-generation pipeline.
You’ve surely noticed async and await! When we download the model, the Web Worker waits for the download to complete and then notifies our app that everything is ready via self.postMessage() with the status: “ready” property, which is exactly what our application logic is waiting for before activating the UI so the chat can be used.
let generator;

const downloadModel = async (modelURL) => {
  generator = await pipeline("text-generation", modelURL);
  self.postMessage({
    status: "ready",
    task: "text-generation",
    modelURL: modelURL,
  });
};
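By the way, if you’d like to replace the static “Downloading model...” message with a real progress indicator, the pipeline() factory also accepts an options object with a progress_callback function. Here’s a minimal sketch of the idea (the exact fields of the event object may vary between library versions, so verify them against the documentation):

const downloadModelWithProgress = async (modelURL) => {
  generator = await pipeline("text-generation", modelURL, {
    // called repeatedly while the model files are fetched;
    // we simply forward each event to the main thread
    progress_callback: (progressEvent) => {
      self.postMessage({ status: "progress", data: progressEvent });
    },
  });
  self.postMessage({ status: "ready", task: "text-generation", modelURL });
};

If you adopt this, remember that our app.js listener only distinguishes status == “ready” from everything else, so you’d need an extra branch there for the “progress” status.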
This is where the magic of text-generation models happens and their ability to seem “intelligent”: how exciting!
const generateResponse = async (content) => {
  // text-generation models for chatbots take a chat as input
  const messages = [
    {
      role: "system",
      content: "You are a highly knowledgeable and friendly assistant.",
    },
    {
      role: "user",
      content: content,
    },
  ];
  // The chat messages with their roles are passed through the model's own
  // chat template, which turns them into the single prompt string the model
  // expects (tokenize: false keeps it as text instead of token IDs)
  const textInput = generator.tokenizer.apply_chat_template(messages, {
    tokenize: false,
    add_generation_prompt: true,
  });
  // the pipeline in action! This is where we can pass many parameters
  // to change the result of text generation
  const output = await generator(textInput, {
    max_new_tokens: 64,
    do_sample: true,
  });
  // the conversation comes back in a model-specific format; the model card
  // on Hugging Face has all the details. There is still no consensus on how
  // a chat template should look, but to extract the last sentence (i.e. the
  // AI's response) we can simply cut what follows the last occurrence of
  // the string "assistant\n", for example like this:
  const conversation = output[0].generated_text;
  const start = conversation.lastIndexOf("assistant\n");
  const lastMessage = conversation.slice(start).replace("assistant\n", "");
  // all ready to send the response generated by the AI
  self.postMessage({
    result: lastMessage,
  });
};
All that’s left is to amalgamate the Web Worker with requests from our app.js.
We already prepared the app to send two kinds of messages, action: ‘download’ and action: ‘chat’; here we do nothing but receive them and react accordingly!
self.addEventListener("message", (event) => {
  const userRequest = event.data;
  if (userRequest.action == "download") {
    const modelURL = userRequest.modelURL;
    downloadModel(modelURL);
  } else if (userRequest.action == "chat") {
    const content = userRequest.content;
    generateResponse(content);
  }
});
Final Result and Observations
When you first interact with your chatbot, you might find its responses a bit limited or odd. This is normal for a small model, and it’s important to understand why:
- Model Size Matters: The model we used (Felladrin/onnx-Pythia-31M-Chat-v1) is very small, only about 31 million parameters. While this makes it quick to load and run in a browser, it significantly limits its capabilities.
- Improving Responses: You can tweak some parameters to potentially improve outputs:
const output = await generator(textInput, {
  max_new_tokens: 1024, // Allows for longer responses
  repetition_penalty: 1.2, // Reduces word repetition
  do_sample: true,
});
- Bigger Models, Better Results: For more coherent and capable responses, consider using larger models:
- Felladrin/onnx-TinyMistral-248M-Chat-v2 (248 million parameters)
- Xenova/Qwen1.5-0.5B-Chat (500 million parameters)

These larger models will take longer to load but offer significantly improved performance.
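Swapping models doesn’t require any code changes beyond the identifier we pass to download() at the bottom of app.js, for example:

// trade a longer first load for better answers
download("Felladrin/onnx-TinyMistral-248M-Chat-v2");
// or, larger still:
// download("Xenova/Qwen1.5-0.5B-Chat");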
Next Steps and Challenges
Now that you’ve built a basic AI chatbot, here are some ways to expand your project:
- Smooth Animations: Implement a typing animation for the chatbot’s responses to make the interaction feel more natural (see the sketch after this list).
- Server-Side Integration: Create a Python backend to interact with even larger language models (7-8B parameters) for more advanced capabilities.
- Specialized Assistants: Adapt the chatbot for specific purposes, like creating an NPC (Non-Player Character) for a game.
- Explore Other AI Tasks: Try implementing computer vision or speech recognition models using Transformers.js. Look for models by the library’s author on Hugging Face for compatible options.
- UI Improvements: Enhance the chat interface with features like message history, user profiles, or theme customization.
- Error Handling and Robustness: Implement better error handling for model loading failures or network issues.
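For the typing animation mentioned above, here’s a minimal sketch: it builds on the fact that our addMessage() returns the freshly created element, so we can create an empty assistant bubble and reveal the text one character at a time (the 25 ms interval is an arbitrary choice; the function would live in app.js next to addMessage):

// reveals `message` character by character inside a new chat bubble
const addMessageTyping = (message, role) => {
  const element = addMessage("", role);
  let visibleChars = 0;
  const timer = setInterval(() => {
    visibleChars += 1;
    element.innerText = message.slice(0, visibleChars);
    chatMessages.scrollTop = chatMessages.scrollHeight;
    if (visibleChars >= message.length) clearInterval(timer);
  }, 25);
};

You could then call addMessageTyping(result, "assistant") instead of addMessage() in the worker’s message listener.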
Remember, the field of AI and natural language processing is rapidly evolving. Keep experimenting, learning, and staying updated with the latest developments in transformer models and browser-based AI applications.
We hope you enjoyed this tutorial and found it valuable for understanding how to create AI-powered chatbots directly in the browser. Happy coding!