Yes. MIT licensed open source. Groq and Gemini both have free tiers that are enough for daily use. Ollama is completely free with no limits. There is no pro plan, no premium tier, no credit card needed.

Can I use it without internet?

Yes. Install Ollama, download a Whisper model (small is 480MB), and Rota works 100% offline. No API keys, no accounts, no internet. Your voice data never leaves your machine.

Is Rota AI a free alternative to Wispr Flow?

Yes, Rota AI is a free, open source alternative to Wispr Flow. It offers similar features like AI-powered text cleanup, context-aware dictation, and works across multiple apps. Unlike Wispr Flow, Rota AI is completely free with no subscriptions, offers offline mode, has zero telemetry, and the full source code is available under MIT license.

Rota AI has zero telemetry. The desktop app never phones home. Your voice data goes only to the transcription service you choose (Groq, Gemini, or local Ollama). API keys are encrypted at rest using your OS keychain.

Does Rota AI work with VS Code?

Yes. Rota detects VS Code and preserves camelCase, snake_case, and code syntax. Your code comments come out clean without extra punctuation. It also works in terminals and IDEs.

What are the system requirements?

Windows 10/11, macOS 13+, or Linux (Ubuntu 20.04+, Fedora 36+, Arch). 4GB RAM minimum, 8GB recommended. For local GPU transcription: NVIDIA GPU with 4GB+ VRAM. CPU-only works on any modern quad-core.

Why I Built Rota AI

The 14-Day Free Trial That Started Everything

I am a student. When I first tried Wispr Flow, I was amazed.

The way it understood what I was saying, cleaned it up, and dropped it into whatever app I was using. It was a glimpse of the future. Typing felt slow by comparison. My fingers felt inefficient.

After a week of using it, going back to a keyboard was a downgrade.

Then my 14-day free trial ended. I could not go back to typing.

So I started thinking: why can't I build something like this myself? How hard could it be?

It turned out to be very hard.

What I Had to Learn

I spent months learning how voice dictation actually works under the hood. Things I never expected to need to know:

How speech recognition models process audio at the signal level
How Voice Activity Detection strips silence before transcription
How to inject text into any application on Windows without breaking anything
How Windows UI Automation works and why it exists
How to build a pipeline where every stage runs on its own thread so the UI never freezes
How to encrypt API keys with Windows DPAPI so they cannot be stolen even if someone copies your config file

I reverse engineered how Wispr Flow processes audio, how it does AI cleanup, how it detects what app you are in, how it injects text. I read source code. I read research papers. I built things, broke them, rebuilt them.

This project taught me more about real software engineering than any course ever did.

What Rota AI Does

Rota AI sits on your desktop. You hold a hotkey. You speak. Rota records your voice, transcribes it using Whisper, runs an AI cleanup pass to remove filler words and fix grammar, then types the result into whatever app your cursor is in.

No switching apps. No copy pasting. No subscription. No account. No internet required if you go fully local with Ollama.

It works in VS Code, Slack, Notion, Gmail, Word, Discord, your browser, your terminal. Any app with a text field.

The stack: Python, PySide6, Faster Whisper, Silero VAD, Groq, Gemini, SQLite. The UI is pure black, custom styled, floating pill overlay with an audio-reactive waveform. Every pixel is intentional.

The Technical Architecture

The pipeline has 7 stages, each on its own thread:

Stage 1 - Audio capture 16kHz mono PCM via PortAudio. Real-time RMS for waveform visualization.

Stage 2 - Voice activity detection Silero VAD v6 strips leading and trailing silence before anything goes to the transcription engine. 2MB model, under 1ms per chunk. This saves API credits and improves accuracy.

Stage 3 - Transcription Your choice of Groq Whisper (cloud, free tier), Gemini (cloud, free tier), or local Faster Whisper via Ollama (no internet ever needed).

Stage 4 - AI cleanup A second LLM pass removes filler words, resolves self-corrections, formats lists, adjusts tone. Skipped entirely for phrases under 5 words to keep latency low.

Stage 5 - Context detection Reads the active window title and process name. Classifies context: code editor, email, chat, notes. Adapts output accordingly. VS Code gets camelCase preserved. Slack gets casual tone. Outlook gets formal.

Stage 6 - Text injection Multiple methods: SendInput API for short text, clipboard + Ctrl+V for longer passages, fallback to pyautogui for stubborn apps.

Stage 7 - Persistence SQLite for session history, snippets, personal dictionary, analytics. DPAPI-encrypted config for API keys.

Why Free and Open Source

Wispr Flow costs $15 per month. SuperWhisper costs $8.49 per month. Both are great products. But not everyone can afford a subscription for a tool like this.

I am a student. Many people who would benefit from voice dictation are students, writers, developers, people with accessibility needs. People for whom $15 per month is not a small amount.

Rota AI is MIT licensed. Free. No account. No credit card. No subscription. Bring your own API keys (Groq and Gemini both have free tiers that handle daily use), or go fully local with Ollama.

The code is all on GitHub. Every line. You can audit exactly what it does with your voice data.

What is Next

The Windows version is solid. Linux support via AppImage is in testing. macOS is planned.

There is more to build: screen context reading (like SuperWhisper's Super Mode), file transcription, cross-device sync, mobile. But the core product works well right now and I am proud of it.

If you are a student and this inspires you to build something, do it. You do not need a big team. You do not need funding. You need curiosity and the stubbornness to keep going when things break (and they will, constantly).

Rota AI is free and open source. Star it on GitHub - it helps others find it.