Mocktalk · Founding Member · 2024
Real-time AI Interview Coaching from Scratch
Role
Founding Software Engineer
Timeline
May 2023 - Jun 2024
Location
San Diego, CA
Stack
React, FastAPI, PostgreSQL, WebSockets, GPT-4 Turbo, Azure AI Speech, ElevenLabs, LiveKit
Overview
What is Mocktalk
Mocktalk is an AI-powered mock interview platform. You talk to an AI interviewer in real time, just like a real technical interview. It listens, responds, asks follow-ups, and evaluates your answers with contextual code awareness.
As a founding engineer on a 4-person team, I architected the entire product from zero. The platform served 300+ users across 2,500+ sessions.
Problem
No Tool Simulates Real Interview Pressure
Interview prep is broken. You either pay $100+ per hour for a human mock interviewer, or you practice alone with no feedback. Existing tools offer text-based Q&A or pre-recorded questions. None of them simulate the real pressure of a live technical conversation.
The core gap is interactivity. Real interviews are dynamic: the interviewer reacts to your answers, probes weaknesses, and adjusts difficulty on the fly.
Research
Conversations with Job Seekers
We talked to 50+ CS students and recent graduates actively preparing for technical interviews. The frustration was universal: they knew the theory, but froze when speaking under pressure. Reading LeetCode solutions and practicing alone did not prepare them for the conversational flow of a real interview.
We also looked at the competitive landscape. Pramp offered peer matching but had inconsistent quality. Interviewing.io was expensive and limited in availability. No product combined real-time voice, adaptive questioning, and instant feedback in a single session.
The insight: people fail interviews not from lack of knowledge, but from inability to communicate thinking under pressure.
Frontend
Code Editor and Shared Component Library
I owned the React frontend and built a real-time code editor with WebSocket-based context injection, so the AI can give feedback as candidates type. The interview UI, dashboard, analytics views, and session history all share a component library I built from scratch.
Component reusability cut production time by 20%. The same card, modal, and form components work across the interview flow, the post-session review, and the admin dashboard.
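The context injection above can be sketched server-side. This is a minimal illustration with hypothetical names (SessionContext, apply_edit), not the production code: the editor streams edits over the WebSocket, and the server keeps a live snapshot of the candidate's buffer that gets attached to every AI prompt.

```python
# Hypothetical sketch of server-side editor context injection.
from dataclasses import dataclass, field

@dataclass
class SessionContext:
    """Rolling snapshot of what the AI interviewer can 'see'."""
    code: str = ""
    transcript: list[str] = field(default_factory=list)

    def apply_edit(self, start: int, end: int, text: str) -> None:
        # Replace the [start, end) slice of the buffer, mirroring the
        # edit the candidate just made in the browser editor.
        self.code = self.code[:start] + text + self.code[end:]

    def prompt_fragment(self) -> str:
        # Injected into the model prompt alongside the question.
        return f"Candidate's current code:\n{self.code}"

ctx = SessionContext()
ctx.apply_edit(0, 0, "def twoSum(nums, target):")
ctx.apply_edit(25, 25, "\n    pass")
```

Keeping the snapshot server-side means the model never needs the full edit history, only the current state plus the transcript.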
AI & Real-time
Two-Second Voice Loop
The AI interviewer runs on a WebSocket connection that streams audio bidirectionally. Your speech hits Azure AI Speech for transcription, goes to GPT-4 Turbo for response generation, and comes back as natural-sounding voice synthesized by ElevenLabs and delivered over LiveKit. The full loop runs in under 2 seconds.
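The shape of that loop can be sketched as three awaited stages inside one conversational turn. The stage functions here are stand-ins for the real Azure, GPT-4, and ElevenLabs calls, so the example shows the structure, not the vendors' APIs.

```python
# Sketch of one voice-loop turn; stage bodies are placeholders.
import asyncio

async def transcribe(audio: bytes) -> str:
    # Stand-in for Azure AI Speech streaming transcription.
    return audio.decode()

async def generate_reply(transcript: str) -> str:
    # Stand-in for GPT-4 Turbo with editor + history context.
    return f"Interesting. Why did you say: {transcript!r}?"

async def synthesize(text: str) -> bytes:
    # Stand-in for ElevenLabs synthesis over LiveKit transport.
    return text.encode()

async def voice_turn(audio_in: bytes) -> bytes:
    transcript = await transcribe(audio_in)
    reply = await generate_reply(transcript)
    return await synthesize(reply)

audio_out = asyncio.run(voice_turn(b"I would use a hash map"))
```

In production each stage streams rather than returning a single value, which is what keeps the end-to-end loop under the 2-second budget.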
The model sees your code editor state, conversation history, and problem constraints. It asks follow-up questions that reference your actual code, not generic prompts. That contextual awareness drove 94% accuracy in technical interview interactions.
The speech pipelines handle interruptions, silence detection, and turn-taking. You can cut the interviewer off mid-sentence, just like a real conversation. The system detects when you are thinking versus when you are done speaking.
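The thinking-versus-done distinction comes down to an endpointing heuristic. This is an illustrative sketch, with assumed thresholds and function names rather than the shipped values: short silences or trailing filler words keep the turn open; a long silence, or a moderate silence after a sentence-final cue, closes it.

```python
# Hedged sketch of turn-taking endpointing; thresholds are assumptions.
THINKING_SILENCE_MS = 1200   # below this after a sentence: still thinking
ENDPOINT_SILENCE_MS = 2500   # above this: turn is over regardless

def turn_is_over(silence_ms: int, last_words: str) -> bool:
    if silence_ms >= ENDPOINT_SILENCE_MS:
        return True
    # Trailing fillers suggest the candidate is mid-thought.
    if last_words.rstrip().endswith(("um", "so", "like", "uh")):
        return False
    # A completed sentence plus a moderate pause ends the turn.
    return silence_ms >= THINKING_SILENCE_MS and last_words.rstrip().endswith((".", "?"))
```

The same check runs continuously against partial transcripts, so the interviewer can also be interrupted mid-sentence when new speech arrives.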
Architecture
FastAPI, PostgreSQL, and Session History
I designed the database schema and REST API layer with FastAPI and PostgreSQL. The schema supports user accounts, interview sessions with full transcript history, performance analytics, and question banks organized by topic and difficulty. Transcripts and audio references live in object storage.
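The schema described above can be illustrated with DDL. This rendering uses SQLite for a self-contained example (production ran PostgreSQL), and the table and column names are assumptions, not the real schema; note that transcripts and audio are stored as object-storage references, not blobs.

```python
# Illustrative DDL for the session schema; names are hypothetical.
import sqlite3

DDL = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT UNIQUE NOT NULL
);
CREATE TABLE questions (
    id INTEGER PRIMARY KEY,
    topic TEXT NOT NULL,
    difficulty TEXT NOT NULL CHECK (difficulty IN ('easy', 'medium', 'hard')),
    body TEXT NOT NULL
);
CREATE TABLE sessions (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    question_id INTEGER NOT NULL REFERENCES questions(id),
    transcript_url TEXT,  -- transcript lives in object storage
    audio_url TEXT,       -- audio reference, not the audio itself
    score REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
tables = {row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table'")}
```

Keeping media out of the database keeps session rows small, so the analytics queries over thousands of sessions stay cheap.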
Decisions
Why WebSockets and a Hybrid Speech Stack
We chose WebSockets over HTTP polling for the interview session. WebSockets give us a persistent, bidirectional connection that feels instant. The tradeoff is more complex connection management and reconnection logic, but for a real-time voice product, there was no alternative.
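The reconnection logic that WebSockets force on you is simple to state, though fiddly to get right in the client. Here is the standard exponential-backoff-with-cap pattern as a pure function; the constants are illustrative, not the values we shipped.

```python
# Exponential backoff with a cap, for WebSocket reconnect attempts.
def backoff_ms(attempt: int, base_ms: int = 250, cap_ms: int = 8000) -> int:
    """Delay before the Nth reconnect attempt (attempt starts at 0)."""
    return min(cap_ms, base_ms * (2 ** attempt))
```

Capping the delay matters for a live interview: after a network blip, the client should be retrying every few seconds, not minutes.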
FastAPI over Express or Django was about type safety and async performance. FastAPI validates request and response shapes at runtime, catches bugs before they hit production, and handles concurrent WebSocket connections efficiently.
For speech synthesis, we started with Azure AI Speech alone but switched to a hybrid approach. ElevenLabs produces more natural-sounding voice for the interviewer persona. Azure handles transcription, where accuracy matters more than tone. LiveKit manages the real-time audio transport layer.
Quality
Zero Critical Bugs Over Three Months
I set up Jest and Cypress test coverage for critical paths, enabling the team to ship daily with zero critical bugs over 3 months. Jest covers the business logic: scoring algorithms, session state machines, and API contract validation. Cypress E2E tests simulate full interview flows from login through session completion.
The hardest part to test was the real-time audio pipeline. We built mock WebSocket servers that replay recorded sessions, so CI can verify the full conversation loop without hitting external APIs.
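The replay harness can be sketched like this. The fixture data, class names, and handler are hypothetical stand-ins: a fake socket yields recorded candidate turns, and a canned responder replaces the model, so CI exercises the same loop shape as production without touching Azure or OpenAI.

```python
# Sketch of the CI replay harness; fixtures and names are illustrative.
import asyncio

RECORDED_SESSION = [
    ("candidate: I'd sort the array first", "interviewer: what's the cost?"),
    ("candidate: O(n log n)", "interviewer: can we do better?"),
]

class ReplayWebSocket:
    """Stands in for the real socket: replays recorded candidate turns."""
    def __init__(self, session):
        self._turns = iter(session)
        self.sent: list[str] = []

    async def receive_text(self) -> str:
        return next(self._turns)[0]

    async def send_text(self, text: str) -> None:
        self.sent.append(text)

async def conversation_loop(ws, respond) -> None:
    # Same loop shape as production; `respond` is the AI stage under test.
    for _ in range(len(RECORDED_SESSION)):
        utterance = await ws.receive_text()
        await ws.send_text(respond(utterance))

def canned_responder(utterance: str) -> str:
    # In CI the model is replaced by the recorded expected replies.
    return dict(RECORDED_SESSION)[utterance]

ws = ReplayWebSocket(RECORDED_SESSION)
asyncio.run(conversation_loop(ws, canned_responder))
```

Because the fake socket and the real one share an interface, the loop under test is the same code path users hit.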
Reflection
Latency, MVPs, and Building from Zero
Mocktalk was my first time building a product from absolute zero. Every decision, from the database schema to the deployment pipeline, was mine to make and mine to live with.
The biggest challenge was latency. Voice conversations feel wrong when there is even a 500ms delay. We spent weeks optimizing the speech pipeline: streaming partial transcriptions, pre-generating response openings, and using edge servers closer to users. Getting it under 2 seconds made the product feel real.
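The "streaming partial transcriptions" optimization above can be sketched in miniature, with illustrative function names: rather than waiting for the full transcript, the pipeline watches the growing partial and kicks off response generation as soon as it looks stable.

```python
# Sketch of acting on partial transcripts; names are hypothetical.
def stream_partials(chunks):
    """Yield a growing transcript as audio chunks are recognized."""
    words = []
    for chunk in chunks:
        words.append(chunk)
        yield " ".join(words)

def first_stable_partial(partials, min_words: int = 3):
    # Start pre-generating a response once enough words have landed.
    for text in partials:
        if len(text.split()) >= min_words:
            return text
    return text

partial = first_stable_partial(stream_partials(["I", "would", "use", "a", "heap"]))
```

Overlapping transcription with generation this way shaves time off the loop even when each individual stage is no faster.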
What I would do differently: start with a simpler MVP. We built the full real-time voice system before validating that users would pay for it. A text-based version with voice as a premium feature would have let us test the market faster.