Complete Implementation Guide
Build a Telugu English AI Voice Tutor App
A step-by-step developer guide to build, deploy, and manage a voice-first English learning app for Telugu speakers — running on a zero-cost infrastructure stack, monetised at ₹1 per user per day.
01 / Overview
How this app works
The app is a voice conversation loop. The AI (in your cloned voice) asks English questions in a Telugu-friendly style. The user responds by voice. The AI corrects grammar, gives the improved sentence back as audio, and continues the lesson.
Conversation loop
1 → AI asks a question (audio)
Your cloned voice speaks a question in slow, clear English. E.g. "Tell me what you did this morning."
2 → User speaks in English
User holds mic button, speaks their answer (5–15 seconds). Audio recorded on device.
3 → Whisper converts speech → text
Audio sent to your server. Whisper.cpp transcribes it. Result: raw English text from user.
4 → Gemini corrects grammar
Gemini returns: corrected sentence + short explanation + next question. All in 2–3 sentences max.
5 → ElevenLabs speaks the response
Corrected text sent to ElevenLabs using your cloned voice ID. Audio returned to app and played back.
6 → Loop continues
Session ends after 5 exchanges. Progress saved. Daily limit enforced by subscription status.
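The loop above can be sketched as a small orchestration function. This is a minimal sketch, not the guide's server code: `runSession` and the injected `steps` object (`transcribe`, `correct`, `speak`) are hypothetical names, stubbed here so the control flow runs without Whisper, Gemini, or ElevenLabs.

```javascript
// Hypothetical sketch of one tutoring session: at most 5 exchanges, then stop.
const MAX_EXCHANGES = 5; // from the guide: a session ends after 5 exchanges

async function runSession(userAudioClips, steps) {
  const log = [];
  for (const clip of userAudioClips.slice(0, MAX_EXCHANGES)) {
    const transcript = await steps.transcribe(clip); // step 3: speech -> text
    const reply = await steps.correct(transcript);   // step 4: grammar correction
    const audio = await steps.speak(reply);          // step 5: text -> speech
    log.push({ transcript, reply, audio });
  }
  return log;
}

// Stub steps (assumptions) so the sketch runs offline.
const stubSteps = {
  transcribe: async (clip) => `text of ${clip}`,
  correct: async (text) => `corrected: ${text}`,
  speak: async (text) => `audio(${text})`,
};
```

Calling `runSession(['a.m4a', 'b.m4a'], stubSteps)` resolves to one log entry per clip; swapping the stubs for real service calls keeps the same control flow.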
02 / Tech Stack
Complete zero-cost stack
Every component chosen for maximum free tier generosity and minimum maintenance burden for a solo developer.
| Layer | Service | Cost | Why this choice |
|---|---|---|---|
| STT | Whisper.cpp (self-hosted) | $0 forever | No per-minute cost. Runs on Oracle free VM. Best Telugu-accent English accuracy. |
| LLM | Google Gemini 2.0 Flash | Free ≤300 users | 1,500 req/day free. Understands Indian English context. Strong multilingual tokenizer. |
| TTS + Voice Clone | ElevenLabs | $5/mo Starter | Only paid component. Clones your voice from 1 min recording. Flash model = 0.5 credit/char. |
| Hosting | Oracle Cloud Always Free | $0 forever | 4 ARM CPUs, 24 GB RAM, 200 GB disk. Runs Node.js + Whisper simultaneously. |
| Database | SQLite (on Oracle VM) | $0 | Simple, zero-config, perfect for single-server MVP. Upgrade to Postgres at 1000+ users. |
| Mobile App | Flutter | $0 | Single codebase → Android + iOS. Dart is easy to learn. Excellent audio recording support. |
| Payments | Razorpay UPI AutoPay | 2% per txn | Only Indian payment gateway supporting ₹1/day recurring UPI. ₹0 setup fee. |
| Play Store | Google Play | ₹1,500 one-time | One-time fee. No yearly renewal for Android. Launch here before iOS. |
03 / Architecture
System architecture
The architecture is intentionally simple — a single Node.js server on Oracle Cloud handles all API orchestration. The Flutter app talks only to your server, never directly to external APIs.
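[Architecture diagram: the Flutter app (Android/iOS) calls the Node.js API, which runs Whisper.cpp as a local process, calls Gemini for grammar + response, and calls ElevenLabs for your cloned voice. Also on the Oracle VM: SQLite (users, sessions, limits, subscription status) and the audio cache (repeat-after-me).]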
API endpoints on your server
| Endpoint | Method | What it does |
|---|---|---|
| /api/transcribe | POST | Receives audio file, returns transcribed text via Whisper.cpp |
| /api/respond | POST | Receives transcribed text, returns grammar correction + audio URL |
| /api/translate | POST | Telugu → English translation via Gemini |
| /api/session | GET | Returns today's session count for a user |
| /api/repeat | POST | Generates or serves cached audio for repeat-after-me phrases |
| /webhook/razorpay | POST | Handles payment events: activates/deactivates subscriptions |
04 / Server Setup
Oracle Cloud Always Free VM
Oracle's Always Free tier gives you a powerful ARM VM at zero cost forever. This single server runs your entire backend.
VM spec to select
Initial server setup
# Update system
sudo apt update && sudo apt upgrade -y
# Install Node.js 20 LTS
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt-get install -y nodejs
# Install build tools (needed for Whisper.cpp)
sudo apt install -y build-essential cmake git ffmpeg
# Install PM2 (keeps Node.js running after SSH disconnect)
sudo npm install -g pm2
# Create app directory
mkdir ~/tutor-app && cd ~/tutor-app
npm init -y
npm install express multer axios dotenv better-sqlite3 node-fetch form-data
Open firewall ports
In Oracle Console → Networking → Security Lists → add ingress rules:
Port 22 TCP 0.0.0.0/0 # SSH (already open)
Port 80 TCP 0.0.0.0/0 # HTTP
Port 443 TCP 0.0.0.0/0 # HTTPS
Port 3000 TCP 0.0.0.0/0 # Node.js (dev only, close in prod)
Then open the same ports in the VM's own firewall (Oracle's Ubuntu images ship with restrictive iptables rules):
sudo iptables -I INPUT -p tcp --dport 80 -j ACCEPT
sudo iptables -I INPUT -p tcp --dport 443 -j ACCEPT
sudo iptables -I INPUT -p tcp --dport 3000 -j ACCEPT
sudo netfilter-persistent save
Install SSL (free via Let's Encrypt)
sudo apt install certbot nginx -y
sudo certbot --nginx -d yourdomain.com
# Certificates are valid for 90 days; certbot schedules automatic renewal before they expire
05 / Speech to Text
Whisper.cpp — self-hosted STT
Whisper.cpp is a C++ port of OpenAI Whisper. It runs efficiently on CPU, making it perfect for the Oracle ARM VM with no GPU needed.
Install Whisper.cpp
cd ~
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
# Build (ARM optimized)
make -j4
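# Download the 'small' model (244M parameters, ~466 MB on disk): best speed/accuracy tradeoff
bash ./models/download-ggml-model.sh small
# Test it works (recent whisper.cpp releases name the binary ./build/bin/whisper-cli instead of ./main)
./main -m models/ggml-small.bin -f samples/jfk.wav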
Model selection guide
| Model | Size | Speed on Oracle VM | Accuracy | Recommendation |
|---|---|---|---|---|
| tiny | 75 MB | ~0.5s per 10s audio | Good | MVP testing only |
| small | 466 MB | ~2s per 10s audio | Very good | Use this |
| medium | 1.5 GB | ~5s per 10s audio | Excellent | When users > 500 |
Node.js integration
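const { execFile } = require('child_process');
const fs = require('fs');
// Adjust WHISPER_BIN if your build places the binary elsewhere
const WHISPER_BIN = '/home/ubuntu/whisper.cpp/main';
const WHISPER_MODEL = '/home/ubuntu/whisper.cpp/models/ggml-small.bin';
async function transcribeAudio(audioFilePath) {
  // Convert to WAV 16kHz mono (Whisper requirement).
  // Multer temp files have no extension, so append a suffix rather than
  // replacing one; otherwise ffmpeg would read and write the same file.
  const wavPath = `${audioFilePath}_16k.wav`;
  try {
    await runCommand('ffmpeg', [
      '-y', '-i', audioFilePath, '-ar', '16000', '-ac', '1', '-c:a', 'pcm_s16le', wavPath
    ]);
    // -nt = no timestamps, -l en = transcribe as English
    const result = await runCommand(WHISPER_BIN, [
      '-m', WHISPER_MODEL, '-f', wavPath, '-nt', '-l', 'en'
    ]);
    return result.trim();
  } finally {
    // Cleanup temp files, even when ffmpeg or Whisper fails
    fs.rmSync(audioFilePath, { force: true });
    fs.rmSync(wavPath, { force: true });
  }
}
// execFile passes arguments as an array, so file paths never go through a shell
function runCommand(bin, args) {
  return new Promise((resolve, reject) => {
    execFile(bin, args, { maxBuffer: 10 * 1024 * 1024 }, (err, stdout) => {
      if (err) return reject(err);
      resolve(stdout); // the transcript is on stdout; stderr only carries logs
    });
  });
}
module.exports = { transcribeAudio };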
Upload endpoint
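const express = require('express');
const fs = require('fs');
const multer = require('multer');
const { transcribeAudio } = require('../services/whisper');
const { checkDailyLimit } = require('../services/limits');
const router = express.Router();
const upload = multer({ dest: '/tmp/audio/', limits: { fileSize: 5 * 1024 * 1024 } });
router.post('/transcribe', upload.single('audio'), async (req, res) => {
  try {
    if (!req.file) return res.status(400).json({ error: 'No audio file uploaded.' });
    const { userId } = req.body;
    const allowed = await checkDailyLimit(userId);
    if (!allowed) {
      fs.rmSync(req.file.path, { force: true }); // discard the unused upload
      return res.status(429).json({ error: 'Daily limit reached. Please pay ₹1 to continue.' });
    }
    const text = await transcribeAudio(req.file.path);
    res.json({ transcript: text });
  } catch (err) {
    // Without this, a failed ffmpeg or Whisper call leaves the request hanging
    console.error('Transcription failed:', err);
    res.status(500).json({ error: 'Transcription failed. Please try again.' });
  }
});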
module.exports = router;
06 / Voice Cloning
ElevenLabs — clone your voice
This is the most important step. Your cloned voice makes the AI feel personal and trustworthy to Telugu learners. Record once, use forever.
Step 1 — Record your voice sample
Environment
Sit in a quiet room. Use your phone mic close to your mouth. No background noise, no fan sounds.
What to say
Read 2–3 minutes of diverse English sentences. Include questions, statements, and some Telugu-style sentences like "I am going to market now." Speak clearly but naturally — this becomes your AI voice.
Format
Save as MP3 or WAV. File size should be 2–10 MB. Longer recordings = better voice quality.
Step 2 — Create the clone
Sign up at elevenlabs.io
Free account. Navigate to Voices → Add Voice → Instant Voice Cloning.
Upload your recording
Upload your MP3. Name it something like "Telugu Tutor - [YourName]". Click Save.
Copy your Voice ID
After saving, click the voice → you'll see a Voice ID like AbCdEfGhIjKlMnOpQrSt. Copy this — you'll need it in the server .env file.
Upgrade to Starter ($5/mo)
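Required for commercial use. Gives 30,000 credits/month (~30 min of audio). At 0.5 credits/char on the Flash model that is about 60,000 characters, or roughly 300 uncached 200-character responses per month, so treat the ~200 users at 5 sessions/day estimate as dependent on heavy audio caching and monitor credit usage closely.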
Node.js TTS service
const axios = require('axios');
const fs = require('fs');
const path = require('path');
const VOICE_ID = process.env.ELEVENLABS_VOICE_ID;
const API_KEY = process.env.ELEVENLABS_API_KEY;
async function textToSpeech(text, outputPath) {
const response = await axios.post(
`https://api.elevenlabs.io/v1/text-to-speech/${VOICE_ID}`,
{
text,
model_id: 'eleven_flash_v2_5', // Flash = 0.5 credits/char (cheapest)
voice_settings: {
stability: 0.75, // Higher = more consistent, less expressive
similarity_boost: 0.85, // How closely to match your cloned voice
speed: 0.85 // Slightly slower for learners
}
},
{
headers: { 'xi-api-key': API_KEY, 'Content-Type': 'application/json' },
responseType: 'arraybuffer'
}
);
fs.writeFileSync(outputPath, response.data);
return outputPath;
}
module.exports = { textToSpeech };
Tip: Use the eleven_flash_v2_5 model (not eleven_multilingual_v2). Flash costs 0.5 credits/char instead of 1, halving your TTS bill with little noticeable quality difference for short conversational sentences.
07 / Language Model
Gemini 2.0 Flash — grammar & tutoring
Gemini handles all the "AI thinking" — grammar correction, generating the tutor's next question, and Telugu translation. The free tier handles ~300 users before any cost.
Get your API key
Go to aistudio.google.com
Sign in with Google. Click "Get API Key" → Create API key in new project. Copy the key.
Free tier limits
1,500 requests/day, 1 million tokens/minute. This supports ~300 users doing 5 sessions each per day for free.
The tutor system prompt
This is the most important piece. Your prompt shapes the entire learning experience.
module.exports = function getTutorPrompt(level = 'beginner') {
return `You are an English tutor for Telugu-speaking beginners in India.
ROLE:
- You speak like a friendly local teacher, not a formal professor
- You understand that the student thinks in Telugu and translates mentally
- You are patient, warm, and encouraging
RULES:
- Always respond in exactly 3 parts, separated by ||| delimiter:
1. CORRECTION: Fix the student's grammar mistake in 1 sentence. If correct, say "Perfect! Your sentence is correct."
2. EXPLANATION: Explain the rule simply, like talking to a 10-year-old. Use examples.
3. NEXT_QUESTION: Ask a simple follow-up question to continue the conversation.
- Keep ALL 3 parts together under 60 words total
- Use simple Indian English examples (market, auto, tiffin, etc.)
- NEVER use complex grammar terminology
- If the student said something in Telugu, gently ask them to try in English
LEVEL: ${level} (beginner = simple present/past tense only)
Example output:
"You should say 'I went to market' not 'I am go to market.'|||
We use 'went' for past actions, like 'I ate dosa', 'I slept early.'|||
What did you eat for breakfast today?"`;
};
Grammar correction service
const { GoogleGenerativeAI } = require('@google/generative-ai');
const getTutorPrompt = require('../prompts/tutor');
const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);
async function correctAndRespond(userText, userLevel = 'beginner') {
const model = genAI.getGenerativeModel({
model: 'gemini-2.0-flash',
systemInstruction: getTutorPrompt(userLevel),
generationConfig: { maxOutputTokens: 120, temperature: 0.4 }
});
const result = await model.generateContent(userText);
const raw = result.response.text();
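// Fall back to safe defaults if the model returns fewer than three ||| parts
const [correction = raw.trim(), explanation = '', nextQuestion = ''] =
raw.split('|||').map(s => s.trim());
return { correction, explanation, nextQuestion,
fullResponse: [correction, explanation, nextQuestion].filter(Boolean).join(' ') };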
}
async function translateToEnglish(teluguText) {
const model = genAI.getGenerativeModel({ model: 'gemini-2.0-flash' });
const result = await model.generateContent(
`Translate this Telugu text to simple English. Return ONLY the translation, nothing else: ${teluguText}`
);
return result.response.text().trim();
}
module.exports = { correctAndRespond, translateToEnglish };
npm install @google/generative-ai
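Because the model is only asked, not guaranteed, to return three `|||`-separated parts, it may help to validate the reply before using it. A minimal sketch, assuming a hypothetical helper `parseTutorReply` and a generic fallback question; neither is part of the service above.

```javascript
// Hypothetical helper: split a tutor reply on the ||| delimiter and
// fall back gracefully when the model returns fewer than three parts.
function parseTutorReply(raw) {
  const parts = String(raw)
    .split('|||')
    .map((s) => s.trim())
    .filter((s) => s.length > 0);

  if (parts.length >= 3) {
    const [correction, explanation, nextQuestion] = parts;
    return { correction, explanation, nextQuestion, wellFormed: true };
  }
  // Malformed reply: treat the whole text as the correction and use an
  // assumed generic follow-up so the conversation can continue.
  return {
    correction: parts.join(' '),
    explanation: '',
    nextQuestion: 'Can you tell me more?',
    wellFormed: false,
  };
}
```

Logging the replies where `wellFormed` is false is a cheap way to see how often the prompt's format rule is being ignored.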
08 / Backend
Node.js API server
The complete Express server that ties all services together — receiving audio, orchestrating Whisper → Gemini → ElevenLabs, and returning a single audio response to the app.
Project structure
tutor-app/
├── server.js # Main entry point
├── .env # API keys (never commit this)
├── database.js # SQLite setup
├── routes/
│ ├── transcribe.js # POST /api/transcribe
│ ├── respond.js # POST /api/respond
│ ├── translate.js # POST /api/translate
│ ├── repeat.js # POST /api/repeat
│ └── webhook.js # POST /webhook/razorpay
├── services/
│ ├── whisper.js # Whisper.cpp wrapper
│ ├── gemini.js # Gemini API
│ ├── tts.js # ElevenLabs TTS
│ └── limits.js # Daily session limits
├── prompts/
│ └── tutor.js # System prompts
└── audio/
└── cache/ # Cached repeat-after-me audio files
Main server
require('dotenv').config();
const express = require('express');
const app = express();
// Mount the webhook first: it needs the raw request body for signature
// verification, so it must run before express.json() parses the payload.
app.use('/webhook', require('./routes/webhook'));
app.use(express.json());
app.use('/audio', express.static('audio')); // Serve audio files
app.use('/api', require('./routes/transcribe'));
app.use('/api', require('./routes/respond'));
app.use('/api', require('./routes/translate'));
app.use('/api', require('./routes/repeat'));
const PORT = process.env.PORT || 3000;
app.listen(PORT, () => console.log(`Tutor API running on :${PORT}`));
Main respond endpoint (full pipeline)
const router = require('express').Router();
const { correctAndRespond } = require('../services/gemini');
const { textToSpeech } = require('../services/tts');
const { logSession } = require('../services/limits');
const path = require('path');
const crypto = require('crypto');
router.post('/respond', async (req, res) => {
  try {
    const { userId, transcript, level } = req.body;
    if (!userId || !transcript) {
      return res.status(400).json({ error: 'userId and transcript are required.' });
    }
    // 1. Get grammar correction + next question from Gemini
    const { correction, explanation, nextQuestion, fullResponse }
      = await correctAndRespond(transcript, level);
    // 2. Convert response to audio using your cloned voice
    const filename = `${crypto.randomUUID()}.mp3`;
    const audioPath = path.join(__dirname, '../audio', filename);
    await textToSpeech(fullResponse, audioPath);
    // 3. Log this session exchange
    await logSession(userId);
    // 4. Return text + audio URL
    res.json({
      correction, explanation, nextQuestion,
      audioUrl: `${process.env.BASE_URL}/audio/${filename}`,
      transcript // Echo back what user said
    });
  } catch (err) {
    // A Gemini or ElevenLabs failure should return an error, not hang the request
    console.error('Respond pipeline failed:', err);
    res.status(502).json({ error: 'Could not generate a response. Please try again.' });
  }
});
module.exports = router;
Database setup
const Database = require('better-sqlite3');
const db = new Database('tutor.db');
db.exec(`
CREATE TABLE IF NOT EXISTS users (
id TEXT PRIMARY KEY,
phone TEXT UNIQUE,
level TEXT DEFAULT 'beginner',
is_active INTEGER DEFAULT 0,
subscription_id TEXT,
created_at TEXT DEFAULT (datetime('now'))
);
CREATE TABLE IF NOT EXISTS sessions (
id INTEGER PRIMARY KEY AUTOINCREMENT,
user_id TEXT,
date TEXT,
exchanges INTEGER DEFAULT 0,
created_at TEXT DEFAULT (datetime('now')),
UNIQUE (user_id, date) -- required by the ON CONFLICT(user_id, date) upsert in logSession
);
CREATE TABLE IF NOT EXISTS phrase_cache (
phrase_hash TEXT PRIMARY KEY,
audio_path TEXT,
created_at TEXT DEFAULT (datetime('now'))
);
`);
module.exports = db;
Start server with PM2
# Start server and keep it alive
pm2 start server.js --name tutor-api
# Auto-restart on server reboot
pm2 startup && pm2 save
# View logs
pm2 logs tutor-api
# Restart after code changes
pm2 restart tutor-api
09 / Mobile App
Flutter app
The Flutter app handles voice recording, playback, and the learning UI. It communicates only with your Node.js server — never directly with external APIs.
Project setup
flutter create telugu_tutor
cd telugu_tutor
# Add required packages to pubspec.yaml
flutter pub add http
flutter pub add record # Audio recording
flutter pub add audioplayers # Audio playback
flutter pub add shared_preferences # Store user ID locally
flutter pub add permission_handler # Mic permissions
flutter pub add path_provider # Writable temp directory for recordings
Android permissions
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
Core tutor screen logic
import 'package:flutter/material.dart';
import 'package:path_provider/path_provider.dart'; // requires: flutter pub add path_provider
import 'package:record/record.dart';
import 'package:audioplayers/audioplayers.dart';
import '../services/api_service.dart';
class TutorScreen extends StatefulWidget {
  @override
  _TutorScreenState createState() => _TutorScreenState();
}
class _TutorScreenState extends State<TutorScreen> {
  final _recorder = AudioRecorder();
  final _player = AudioPlayer();
  final _api = ApiService();
  bool _isRecording = false;
  bool _isProcessing = false;
  String _status = 'Hold the button to speak';
  String _transcript = '';
  String _correction = '';
  @override
  void dispose() {
    // Release the microphone and audio player when the screen closes
    _recorder.dispose();
    _player.dispose();
    super.dispose();
  }
  Future<void> _startRecording() async {
    if (!await _recorder.hasPermission()) {
      setState(() { _status = 'Microphone permission is required'; });
      return;
    }
    // /tmp is not writable on Android or iOS, so record into the app's temp directory
    final dir = await getTemporaryDirectory();
    await _recorder.start(const RecordConfig(), path: '${dir.path}/user_audio.m4a');
    setState(() { _isRecording = true; _status = 'Listening...'; });
  }
  Future<void> _stopAndProcess() async {
    final path = await _recorder.stop(); // recorded file path, or null if nothing was recorded
    if (path == null) {
      setState(() { _isRecording = false; _status = 'Hold the button to speak'; });
      return;
    }
    setState(() { _isRecording = false; _isProcessing = true; _status = 'Processing...'; });
    try {
      // Send to server: transcribe → correct → generate audio
      final result = await _api.sendAudio(path);
      setState(() {
        _transcript = result['transcript'] ?? '';
        _correction = result['correction'] ?? '';
        _isProcessing = false;
        _status = 'Hold to speak again';
      });
      // Play the AI voice response
      final audioUrl = result['audioUrl'];
      if (audioUrl != null) await _player.play(UrlSource(audioUrl));
    } catch (e) {
      // Network or server failure: reset the UI so the user can retry
      setState(() { _isProcessing = false; _status = 'Something went wrong. Hold to try again.'; });
    }
  }
  @override
  Widget build(BuildContext context) {
    return Scaffold(
      backgroundColor: Color(0xFF0D0F14),
      body: Column(children: [
        /* Status + transcript + correction UI */
        Text(_status),
        if (_transcript.isNotEmpty) Text('You said: $_transcript'),
        if (_correction.isNotEmpty) Text(_correction),
        /* Hold-to-speak button */
        GestureDetector(
          onLongPressStart: (_) => _startRecording(),
          onLongPressEnd: (_) => _stopAndProcess(),
          child: CircleAvatar(radius: 40,
            backgroundColor: _isRecording ? Colors.red : Colors.blue,
            child: Icon(Icons.mic, color: Colors.white)),
        ),
      ]),
    );
  }
}
10 / Features
Grammar correction feature
The core feature. User speaks → Whisper transcribes → Gemini corrects → ElevenLabs speaks the correction back in your voice.
Common Telugu English errors to handle
Add these as examples in your system prompt to help Gemini recognize Telugu-speaker patterns:
| Telugu English error | Correct form | Telugu thinking pattern |
|---|---|---|
| "I am go to market" | "I am going to market" | Present continuous confusion |
| "Yesterday I am eating dosa" | "Yesterday I ate dosa" | Past tense using 'am' |
| "She don't know" | "She doesn't know" | Subject-verb agreement |
| "I have went" | "I have gone" / "I went" | Perfect tense confusion |
| "He is more taller" | "He is taller" | Double comparative |
| "What is your good name?" | "What is your name?" | Direct Telugu translation |
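One way to feed the table above into the system prompt is to keep the error pairs as data and render them as example lines. A minimal sketch: `COMMON_ERRORS` and `buildErrorExamples` are hypothetical names, and the output wording is an assumption rather than a format Gemini requires.

```javascript
// Error pairs copied from the table above.
const COMMON_ERRORS = [
  { wrong: 'I am go to market', right: 'I am going to market' },
  { wrong: 'Yesterday I am eating dosa', right: 'Yesterday I ate dosa' },
  { wrong: "She don't know", right: "She doesn't know" },
  { wrong: 'I have went', right: 'I have gone' },
  { wrong: 'He is more taller', right: 'He is taller' },
  { wrong: 'What is your good name?', right: 'What is your name?' },
];

// Render the pairs as a text block that can be appended to the system prompt.
function buildErrorExamples(errors) {
  const lines = errors.map((e) => `- Student: "${e.wrong}" -> Correct: "${e.right}"`);
  return ['COMMON MISTAKES TO WATCH FOR:', ...lines].join('\n');
}
```

The result could be appended to the prompt string, for example `getTutorPrompt(level) + '\n' + buildErrorExamples(COMMON_ERRORS)`, which keeps the list editable without touching the prompt text.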
Correction response format
11 / Features
Telugu → English translation
Users can type or speak in Telugu to get the English translation. This helps when they don't know a word in English.
const router = require('express').Router();
const { translateToEnglish } = require('../services/gemini');
const { textToSpeech } = require('../services/tts');
const crypto = require('crypto');
const path = require('path');
router.post('/translate', async (req, res) => {
const { teluguText } = req.body;
const englishText = await translateToEnglish(teluguText);
const filename = `trans_${crypto.randomUUID()}.mp3`;
const audioPath = path.join(__dirname, '../audio', filename);
await textToSpeech(englishText, audioPath);
res.json({
teluguInput: teluguText,
englishTranslation: englishText,
audioUrl: `${process.env.BASE_URL}/audio/${filename}`
});
});
module.exports = router;
12 / Features
Repeat-after-me practice
The AI says a phrase slowly. User repeats it. Audio is cached — the same phrase never hits ElevenLabs API twice, saving credits.
const router = require('express').Router();
const { textToSpeech } = require('../services/tts');
const db = require('../database');
const crypto = require('crypto');
const path = require('path');
const fs = require('fs');
router.post('/repeat', async (req, res) => {
const { phrase } = req.body;
const hash = crypto.createHash('md5').update(phrase).digest('hex');
// Check if we already have audio for this phrase
const cached = db.prepare('SELECT audio_path FROM phrase_cache WHERE phrase_hash = ?').get(hash);
if (cached && fs.existsSync(cached.audio_path)) {
return res.json({ audioUrl: `${process.env.BASE_URL}/${cached.audio_path}`, cached: true });
}
// Generate new audio and cache it
const filename = `audio/cache/${hash}.mp3`;
await textToSpeech(phrase, filename);
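// OR REPLACE avoids a primary-key error when a cached row exists but its file was deleted
db.prepare('INSERT OR REPLACE INTO phrase_cache (phrase_hash, audio_path) VALUES (?, ?)').run(hash, filename);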
res.json({ audioUrl: `${process.env.BASE_URL}/${filename}`, cached: false });
});
module.exports = router;
Starter phrase list
Pre-generate audio for these at app launch to warm the cache:
const phrases = [
"Good morning, how are you?",
"My name is [name]. What is your name?",
"I am going to the market.",
"Please repeat after me.",
"Very good! That was correct.",
"Try again, you can do it.",
"What did you eat for breakfast?",
"Speak slowly and clearly.",
];
// POST each to /api/repeat to warm the cache
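The warm-up step can be sketched as a small script. This assumes Node.js 18+ (for the built-in `fetch`) and a server reachable at the base URL you pass in; `warmCache` and `defaultPost` are hypothetical helpers, and the HTTP function is injectable so the logic can be exercised without a running server.

```javascript
// Hypothetical warm-up helper: POST each phrase to /api/repeat, one at a time,
// so the server generates and caches the audio before real users arrive.
async function warmCache(phrases, baseUrl, httpPost = defaultPost) {
  const results = [];
  for (const phrase of phrases) {
    try {
      const body = await httpPost(`${baseUrl}/api/repeat`, { phrase });
      results.push({ phrase, ok: true, cached: Boolean(body.cached) });
    } catch (err) {
      // Keep going: one failed phrase should not stop the rest.
      results.push({ phrase, ok: false, error: err.message });
    }
  }
  return results;
}

// Default transport using the fetch API built into Node.js 18+.
async function defaultPost(url, payload) {
  const res = await fetch(url, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(payload),
  });
  if (!res.ok) throw new Error(`HTTP ${res.status}`);
  return res.json();
}
```

Running `warmCache(phrases, process.env.BASE_URL)` once after each deploy would populate the cache; sending the phrases sequentially avoids a burst of simultaneous ElevenLabs requests.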
13 / Features
Session tracking
const db = require('../database');
const FREE_DAILY_EXCHANGES = 3; // Free users get 3 exchanges/day
const PAID_DAILY_EXCHANGES = 50; // Paid users get 50 exchanges/day
async function checkDailyLimit(userId) {
const today = new Date().toISOString().split('T')[0];
const user = db.prepare('SELECT is_active FROM users WHERE id = ?').get(userId);
const session = db.prepare(
'SELECT exchanges FROM sessions WHERE user_id = ? AND date = ?'
).get(userId, today);
const limit = user?.is_active ? PAID_DAILY_EXCHANGES : FREE_DAILY_EXCHANGES;
const used = session?.exchanges || 0;
return used < limit;
}
async function logSession(userId) {
const today = new Date().toISOString().split('T')[0];
db.prepare(`
INSERT INTO sessions (user_id, date, exchanges) VALUES (?, ?, 1)
ON CONFLICT(user_id, date) DO UPDATE SET exchanges = exchanges + 1
`).run(userId, today);
}
module.exports = { checkDailyLimit, logSession };
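Note that `toISOString()` reports the date in UTC, so the daily counter above rolls over at 5:30 AM Indian time rather than at midnight. If the limit should reset at midnight IST (an assumption about the desired behaviour, not something the code above does), a helper along these lines could replace the `today` computation. `istDateString` is a hypothetical name.

```javascript
// India Standard Time is a fixed UTC+05:30 offset with no daylight saving,
// so shifting the timestamp and reading the UTC date gives the IST calendar date.
const IST_OFFSET_MS = (5 * 60 + 30) * 60 * 1000;

function istDateString(now = new Date()) {
  return new Date(now.getTime() + IST_OFFSET_MS).toISOString().split('T')[0];
}
```

If you adopt this, use the same helper in both `checkDailyLimit` and `logSession` so the two functions always agree on which day a request belongs to.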
14 / Payments
Razorpay ₹1/day UPI AutoPay
Razorpay is the only Indian payment gateway with UPI AutoPay support for amounts as low as ₹1. Setup takes ~2 days for KYC approval.
Setup steps
Create Razorpay account
Go to razorpay.com → Sign up as individual/business. Submit Aadhaar, PAN, bank account. KYC approval takes 1–2 business days.
Create a Subscription Plan
Dashboard → Products → Subscriptions → Plans → Create Plan. Set: Amount = 100 paise (₹1), Period = daily, Interval = 1.
Get API keys
Dashboard → Settings → API Keys → Generate Key. Copy Key ID and Key Secret to your .env file.
Set webhook URL
Dashboard → Settings → Webhooks → Add Webhook. URL: https://yourdomain.com/webhook/razorpay. Select events: subscription.activated, subscription.charged, subscription.cancelled, subscription.halted (the handler below deactivates users on both cancelled and halted).
Webhook handler
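const express = require('express');
const crypto = require('crypto');
const db = require('../database');
const router = express.Router();
// express.raw keeps the body as a Buffer so the HMAC is computed over the exact
// bytes Razorpay sent. Mount this route before any global express.json() middleware.
router.post('/razorpay', express.raw({ type: '*/*' }), (req, res) => {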
// Verify webhook signature
const signature = req.headers['x-razorpay-signature'];
const secret = process.env.RAZORPAY_WEBHOOK_SECRET;
const expected = crypto.createHmac('sha256', secret)
.update(req.body).digest('hex');
if (signature !== expected) return res.status(400).send('Invalid');
const event = JSON.parse(req.body);
const sub = event.payload.subscription.entity;
if (event.event === 'subscription.activated' || event.event === 'subscription.charged') {
db.prepare('UPDATE users SET is_active = 1 WHERE subscription_id = ?')
.run(sub.id);
}
if (event.event === 'subscription.cancelled' || event.event === 'subscription.halted') {
db.prepare('UPDATE users SET is_active = 0 WHERE subscription_id = ?')
.run(sub.id);
}
res.json({ received: true });
});
module.exports = router;
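The signature check can also be isolated into a small function that is easy to test. A minimal sketch using Node's built-in `crypto`: `verifySignature` is a hypothetical helper, and the constant-time comparison via `crypto.timingSafeEqual` is a swapped-in hardening step rather than part of the handler above.

```javascript
const crypto = require('crypto');

// rawBody must be the exact bytes Razorpay sent (Buffer or string),
// not JSON that has been parsed and re-serialised.
function verifySignature(rawBody, signature, secret) {
  if (typeof signature !== 'string' || !secret) return false;
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest('hex');
  const a = Buffer.from(expected, 'utf8');
  const b = Buffer.from(signature, 'utf8');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```

In the handler this would be called as `verifySignature(req.body, req.headers['x-razorpay-signature'], process.env.RAZORPAY_WEBHOOK_SECRET)`, returning a 400 response when it is false.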
15 / Limits
Daily usage limits
| User type | Daily exchanges | Sessions | Features |
|---|---|---|---|
| Free (unregistered) | 3 | 1 demo session | Basic conversation only |
| Paid (₹1/day active) | 50 | Unlimited | All features including translation, repeat practice |
| Lapsed (payment failed) | 3 | 1 | Prompt to renew subscription |
16 / Launch
Play Store publishing
Build release APK
# Generate signing key (do this once)
keytool -genkey -v -keystore tutor-key.jks -alias tutor \
-keyalg RSA -keysize 2048 -validity 10000
# Build release APK
flutter build apk --release
# Or build App Bundle (preferred by Play Store)
flutter build appbundle --release
Play Store checklist
- Create Google Play Developer account — $25 one-time fee
- Prepare app icon: 512×512 PNG, no alpha, no rounded corners
- Create feature graphic: 1024×500 PNG
- Take 2–8 screenshots on a Telugu phone showing the learning flow
- Write app description in Telugu + English
- Set content rating (Education → Everyone)
- Add privacy policy URL (required — host a simple HTML page on your server)
- Set app as "Education" category
- Submit for review — takes 2–7 days for new accounts
17 / Scaling
Scaling guide
At each user milestone, here's what to change and what to watch.
0–300 users — everything is free
300–1,000 users
Upgrade Gemini to pay-as-you-go
Enable billing on Google Cloud. At this scale: ~₹250/month. Still 90%+ margin.
Upgrade ElevenLabs to Creator ($22/mo)
100,000 credits/month. Covers ~600 paid users at 5 sessions/day using Flash model.
Add request queueing
Install bull npm package. Queue Whisper + TTS jobs to prevent Oracle VM overload during peak times.
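Bull needs a Redis instance. Before adding that dependency, the same idea can be prototyped in-process. The sketch below is a swapped-in technique, not Bull's API: `createLimiter` is a hypothetical helper that runs at most `maxConcurrent` async jobs at once and keeps the rest waiting in memory (queued jobs are lost if the server restarts).

```javascript
// Minimal in-memory concurrency limiter (hypothetical helper, not the Bull API).
function createLimiter(maxConcurrent) {
  let active = 0;
  const waiting = [];

  const next = () => {
    if (active >= maxConcurrent || waiting.length === 0) return;
    active += 1;
    const { job, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(job)
      .then(resolve, reject)
      .finally(() => {
        active -= 1;
        next(); // start the next queued job, if any
      });
  };

  // Returns a promise that settles with the job's result once it has run.
  return function run(job) {
    return new Promise((resolve, reject) => {
      waiting.push({ job, resolve, reject });
      next();
    });
  };
}
```

For example, `const whisperQueue = createLimiter(1);` followed by `await whisperQueue(() => transcribeAudio(filePath))` would ensure only one Whisper process runs at a time.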
1,000+ users
Move to a hosted transcription API ($0.003/min)
Self-hosted Whisper.cpp may become a CPU bottleneck. Switch to OpenAI's GPT-4o Mini Transcribe API at this scale.
Migrate SQLite → PostgreSQL
Use Supabase free tier (500 MB) or Railway's $5/mo Postgres. SQLite isn't designed for concurrent writes at this scale.
Add a CDN for audio files
Use Cloudflare (free tier) to cache and serve audio files. Reduces Oracle VM bandwidth and latency.
18 / Reference
Environment variables
Create a .env file in your project root. Never commit this to Git — add it to .gitignore.
GEMINI_API_KEY=AIzaSy...
ELEVENLABS_API_KEY=sk_abc123...
ELEVENLABS_VOICE_ID=AbCdEfGhIjKl
RAZORPAY_KEY_ID=rzp_live_...
RAZORPAY_KEY_SECRET=your_secret_here
RAZORPAY_WEBHOOK_SECRET=your_webhook_secret
RAZORPAY_PLAN_ID=plan_xxx
BASE_URL=https://tutor.yourdomain.com
PORT=3000
19 / Reference
Cost tracker
Monthly costs at different user scales. Assumes: 5 sessions/user/day, 7s audio input, 200 char AI response (Flash TTS model).
| Service | 100 users | 300 users | 500 users | 1000 users |
|---|---|---|---|---|
| Whisper.cpp (self-hosted) | ₹0 | ₹0 | ₹0 | ₹0* |
| Gemini 2.0 Flash | ₹0 (free tier) | ₹0 (free tier) | ~₹180 | ~₹360 |
| ElevenLabs (Flash model) | ₹415 ($5) | ₹415 ($5) | ~₹830 ($10) | ~₹1,660 ($20) |
| Oracle Cloud VM | ₹0 | ₹0 | ₹0 | ₹0 |
| Razorpay (2%) | ₹60 | ₹180 | ₹300 | ₹600 |
| Total monthly cost | ~₹475 | ~₹595 | ~₹1,310 | ~₹2,620 |
| Total monthly revenue (₹1/day) | ₹3,000 | ₹9,000 | ₹15,000 | ₹30,000 |
| Net profit | ₹2,525 | ₹8,405 | ₹13,690 | ₹27,380 |
* At 1000 users, consider switching Whisper to API ($0.003/min) if Oracle VM CPU exceeds 80% consistently.
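The stated assumptions can be turned into a quick calculator for checking the ElevenLabs row against your own usage. A minimal sketch: `estimateTtsCredits` is a hypothetical helper, the 0.5 credits/char Flash rate and the 200-character response come from this guide, and `exchangesPerSession` and `cacheHitRate` are assumptions to replace with measured values.

```javascript
// Estimate monthly ElevenLabs credits for a given number of active users.
function estimateTtsCredits({
  users,
  sessionsPerDay = 5,       // guide assumption
  exchangesPerSession = 1,  // assumption: TTS responses generated per session
  charsPerResponse = 200,   // guide assumption
  creditsPerChar = 0.5,     // Flash model rate quoted in this guide
  cacheHitRate = 0,         // assumption: share of responses served from cache
  daysPerMonth = 30,
}) {
  const responses = users * sessionsPerDay * exchangesPerSession * daysPerMonth;
  const uncached = responses * (1 - cacheHitRate);
  return Math.round(uncached * charsPerResponse * creditsPerChar);
}
```

For example, `estimateTtsCredits({ users: 1 })` returns 15000. Comparing the result with your plan's monthly credit allowance is a quick way to see how much of the audio must come from the cache for a given row of the table to hold.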
20 / Reference
Troubleshooting
Common issues
Whisper returning empty or garbled text
Make sure ffmpeg is converting to 16kHz mono WAV before passing to Whisper. Check with: ffprobe yourfile.wav — should show 16000 Hz, 1 channel.
ElevenLabs audio sounds robotic or rushed
Increase stability to 0.8 and reduce speed to 0.8 in the voice settings. The Flash model is optimized for speed — if quality is poor, try eleven_multilingual_v2 (costs 1 credit/char instead of 0.5).
Gemini returning empty responses
The most likely cause is the free tier limit (1,500 req/day). Either enable billing on Google Cloud or implement a fallback: cache the last 10 Gemini responses and rotate them for repeat users.
Oracle VM runs out of disk space
Audio files accumulate in /audio/. Add a cron job to delete non-cached audio files older than 1 hour:
# Delete old audio files every hour (keep cache folder)
0 * * * * find /home/ubuntu/tutor-app/audio -maxdepth 1 -name "*.mp3" -mmin +60 -delete
Flutter app crashes on audio recording
Most common cause: microphone permission not granted. Add a permission check before starting recording:
final status = await Permission.microphone.request();
if (status != PermissionStatus.granted) {
// Show dialog asking user to enable mic in settings
}
Razorpay webhook not triggering
Ensure your server's SSL certificate is valid (webhooks require HTTPS). Test the webhook manually from Razorpay Dashboard → Settings → Webhooks → Test. Check PM2 logs: pm2 logs tutor-api.
Users report slow response time (>10 seconds)
The bottleneck is usually Whisper.cpp on the Oracle ARM VM. Switch to ggml-tiny.bin model for faster transcription (~0.5s for 10s audio) or add request queuing with Bull so multiple users don't run Whisper simultaneously.
Monitoring commands
# Check server is running
pm2 status
# View last 100 log lines
pm2 logs tutor-api --lines 100
# Check disk usage
df -h
# Check Oracle VM CPU/memory
htop
# Count total users in DB
sqlite3 tutor.db "SELECT COUNT(*) FROM users;"
# Count paid users
sqlite3 tutor.db "SELECT COUNT(*) FROM users WHERE is_active = 1;"
# Today's session count
sqlite3 tutor.db "SELECT COUNT(*) FROM sessions WHERE date = date('now');"
# Check audio cache folder size
du -sh ~/tutor-app/audio/