Complete Implementation Guide

Build a Telugu English
AI Voice Tutor App

A step-by-step developer guide to build, deploy, and manage a voice-first English learning app for Telugu speakers — running on a zero-cost infrastructure stack, monetised at ₹1 per user per day.

Cost to run: ₹0 until 300+ users

Revenue model: ₹1/user/day UPI AutoPay

Voice: Your cloned voice via ElevenLabs

Target: Telugu speakers learning English

01 / Overview

How this app works

The app is a voice conversation loop. The AI (in your cloned voice) asks English questions in a Telugu-friendly style. The user responds by voice. The AI corrects grammar, gives the improved sentence back as audio, and continues the lesson.

API cost / user / day: ₹0.08–0.15 (at 5 sessions/day)

Revenue / user / day: ₹0.98 (after Razorpay 2%)

Profit margin: ~83% (at MVP scale)

Free user limit: ~300 (before any API cost)
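The per-user figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming the margin is measured against the ₹1 price; the function and parameter names are illustrative and not part of the app.

```javascript
// Per-user daily margin, using the figures quoted in this guide.
// All names here are illustrative.
function dailyMargin({ pricePerDay, gatewayFeeRate, apiCostPerDay }) {
  const revenue = pricePerDay * (1 - gatewayFeeRate);  // what reaches you after the gateway fee
  const profit = revenue - apiCostPerDay;
  return {
    revenue: Number(revenue.toFixed(2)),
    profit: Number(profit.toFixed(2)),
    marginPercent: Math.round((profit / pricePerDay) * 100)
  };
}

// Upper and lower ends of the quoted API cost range
const worstCase = dailyMargin({ pricePerDay: 1, gatewayFeeRate: 0.02, apiCostPerDay: 0.15 });
const bestCase = dailyMargin({ pricePerDay: 1, gatewayFeeRate: 0.02, apiCostPerDay: 0.08 });
console.log(worstCase, bestCase);
```

The ~83% figure corresponds to the upper end of the API cost range; at the lower end the same calculation gives about 90%.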

Conversation loop

1 → AI asks a question (audio)

Your cloned voice speaks a question in slow, clear English. E.g. "Tell me what you did this morning."

2 → User speaks in English

User holds mic button, speaks their answer (5–15 seconds). Audio recorded on device.

3 → Whisper converts speech → text

Audio sent to your server. Whisper.cpp transcribes it. Result: raw English text from user.

4 → Gemini corrects grammar

Gemini returns: corrected sentence + short explanation + next question. All in 2–3 sentences max.

5 → ElevenLabs speaks the response

Corrected text sent to ElevenLabs using your cloned voice ID. Audio returned to app and played back.

6 → Loop continues

Session ends after 5 exchanges. Progress saved. Daily limit enforced by subscription status.

02 / Tech Stack

Complete zero-cost stack

Every component chosen for maximum free tier generosity and minimum maintenance burden for a solo developer.

Layer	Service	Cost	Why this choice
STT	Whisper.cpp (self-hosted)	$0 forever	No per-minute cost. Runs on Oracle free VM. Good accuracy on Telugu-accented English.
LLM	Google Gemini 2.0 Flash	Free ≤300 users	1,500 req/day free. Understands Indian English context. Strong multilingual tokenizer.
TTS + Voice Clone	ElevenLabs	$5/mo Starter	Only paid component. Clones your voice from 1 min recording. Flash model = 0.5 credit/char.
Hosting	Oracle Cloud Always Free	$0 forever	4 ARM CPUs, 24 GB RAM, 200 GB disk. Runs Node.js + Whisper simultaneously.
Database	SQLite (on Oracle VM)	$0	Simple, zero-config, perfect for single-server MVP. Upgrade to Postgres at 1000+ users.
Mobile App	Flutter	$0	Single codebase → Android + iOS. Dart is easy to learn. Excellent audio recording support.
Payments	Razorpay UPI AutoPay	2% per txn	Supports recurring UPI AutoPay for amounts as low as ₹1. ₹0 setup fee.
Play Store	Google Play	$25 one-time (~₹2,100)	One-time fee. No yearly renewal for Android. Launch here before iOS.

Total monthly cost at launch (0–300 users): ₹415/month (ElevenLabs Starter $5). Everything else is free. Each paying user brings in about ₹29/month after Razorpay's 2% (₹0.98 × 30 days), so roughly 15 paying users cover the ElevenLabs cost.

03 / Architecture

System architecture

The architecture is intentionally simple — a single Node.js server on Oracle Cloud handles all API orchestration. The Flutter app talks only to your server, never directly to external APIs.

Flutter app (Android/iOS, user device)
  ── HTTPS ──▶ Oracle VM: Node.js API
       Node.js ──▶ Whisper.cpp (local process)        ↓ transcribed text
       Node.js ──▶ Gemini API (grammar + response)    ↓ corrected text
       Node.js ──▶ ElevenLabs API (your cloned voice) ↓ audio bytes
  audio bytes ──▶ Flutter plays audio

Also on the Oracle VM:

SQLite DB: users, sessions, limits

Razorpay webhook: subscription status

/tmp audio cache: repeat-after-me

API endpoints on your server

Endpoint	Method	What it does
`POST /api/transcribe`	POST	Receives audio file, returns transcribed text via Whisper.cpp
`POST /api/respond`	POST	Receives transcribed text, returns grammar correction + audio URL
`POST /api/translate`	POST	Telugu → English translation via Gemini
`GET /api/session`	GET	Returns today's session count for a user
`POST /api/repeat`	POST	Generates or serves cached audio for repeat-after-me phrases
`POST /webhook/razorpay`	POST	Handles payment events — activates/deactivates subscriptions

04 / Server Setup

Oracle Cloud Always Free VM

Oracle's Always Free tier gives you a powerful ARM VM at zero cost forever. This single server runs your entire backend.

Sign up at: cloud.oracle.com — Use a valid credit card (required for verification, never charged for Always Free resources). Select region closest to India: Mumbai (ap-mumbai-1).

VM spec to select

Shape: VM.Standard.A1.Flex (ARM Ampere)

OCPUs: 4 (Always Free limit)

RAM: 24 GB (Always Free limit)

Storage: 200 GB (boot + block)

Initial server setup

bash — SSH into your Oracle VM

# Update system
sudo apt update && sudo apt upgrade -y

# Install Node.js 20 LTS
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt-get install -y nodejs

# Install build tools (needed for Whisper.cpp)
sudo apt install -y build-essential cmake git ffmpeg

# Install PM2 (keeps Node.js running after SSH disconnect)
sudo npm install -g pm2

# Create app directory
mkdir ~/tutor-app && cd ~/tutor-app
npm init -y
npm install express multer axios dotenv better-sqlite3 node-fetch form-data

Open firewall ports

In Oracle Console → Networking → Security Lists → add ingress rules:

Oracle Security List rules

Port 22   TCP   0.0.0.0/0   # SSH (already open)
Port 80   TCP   0.0.0.0/0   # HTTP
Port 443  TCP   0.0.0.0/0   # HTTPS
Port 3000 TCP   0.0.0.0/0   # Node.js (dev only, close in prod)

bash — also open in Ubuntu firewall

sudo iptables -I INPUT -p tcp --dport 80 -j ACCEPT
sudo iptables -I INPUT -p tcp --dport 443 -j ACCEPT
sudo iptables -I INPUT -p tcp --dport 3000 -j ACCEPT
sudo netfilter-persistent save

Install SSL (free via Let's Encrypt)

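Important: Your Flutter app should reach the server over HTTPS; Android blocks plain HTTP traffic by default. Buy a cheap domain (~₹100/year on Namecheap) and point it to your Oracle VM's public IP. Freenom is no longer an option for free domains, since it stopped accepting new registrations.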

bash — install Certbot

sudo apt install certbot nginx -y
sudo certbot --nginx -d yourdomain.com
# Certificates are valid for 90 days; Certbot schedules automatic renewal before they expire

05 / Speech to Text

Whisper.cpp — self-hosted STT

Whisper.cpp is a C++ port of OpenAI Whisper. It runs efficiently on CPU, making it perfect for the Oracle ARM VM with no GPU needed.

Install Whisper.cpp

bash

cd ~
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp

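# Build (ARM optimized)
make -j4

# Download the 'small' model (244M parameters, ~466 MB on disk): best speed/accuracy tradeoff
bash ./models/download-ggml-model.sh small

# Test it works
# (newer whisper.cpp releases build the binary as ./build/bin/whisper-cli instead of ./main)
./main -m models/ggml-small.bin -f samples/jfk.wav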

Model selection guide

Model	Parameters	Disk size	Speed on Oracle VM	Accuracy	Recommendation
tiny	39 M	~75 MB	~0.5s per 10s audio	Good	MVP testing only
small	244 M	~466 MB	~2s per 10s audio	Very good	Use this
medium	769 M	~1.5 GB	~5s per 10s audio	Excellent	When users > 500

Node.js integration

server/services/whisper.js

const { execFile } = require('child_process');
const fs = require('fs');

const WHISPER_BIN = '/home/ubuntu/whisper.cpp/main';
const WHISPER_MODEL = '/home/ubuntu/whisper.cpp/models/ggml-small.bin';

async function transcribeAudio(audioFilePath) {
  // Convert to WAV 16kHz mono (Whisper requirement).
  // Multer uploads have no file extension, so append a suffix instead of replacing one;
  // otherwise input and output paths would be identical and ffmpeg would fail.
  const wavPath = `${audioFilePath}_16k.wav`;

  try {
    await runCommand('ffmpeg', [
      '-y', '-i', audioFilePath, '-ar', '16000', '-ac', '1', '-c:a', 'pcm_s16le', wavPath
    ]);

    // -nt = no timestamps, -l en = force English
    const result = await runCommand(WHISPER_BIN, [
      '-m', WHISPER_MODEL, '-f', wavPath, '-nt', '-l', 'en'
    ]);
    return result.trim();
  } finally {
    // Cleanup temp files even if ffmpeg or Whisper failed
    fs.rmSync(audioFilePath, { force: true });
    fs.rmSync(wavPath, { force: true });
  }
}

// execFile takes arguments as an array, so file paths are never interpreted by a shell
function runCommand(bin, args) {
  return new Promise((resolve, reject) => {
    execFile(bin, args, { maxBuffer: 10 * 1024 * 1024 }, (err, stdout) => {
      if (err) return reject(err);
      resolve(stdout);  // the transcript is on stdout; stderr only carries logs
    });
  });
}

module.exports = { transcribeAudio };
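Raw Whisper.cpp output is not always clean text: silence can come back as a bracketed marker such as [BLANK_AUDIO], and line breaks may be scattered through the result. A small post-processing step before the text goes to Gemini may help. The sketch below is an assumption about how you might do it; `cleanTranscript` and `isUsableTranscript` are hypothetical helpers, not part of Whisper.cpp, and the two-word minimum is an arbitrary choice.

```javascript
// Whisper.cpp can emit bracketed or parenthesised non-speech markers
// (for example [BLANK_AUDIO]); strip them so they never reach the grammar prompt.
function cleanTranscript(raw) {
  return raw
    .replace(/\[[^\]]*\]|\([^)]*\)/g, ' ')  // drop [..] and (..) markers
    .replace(/\s+/g, ' ')                   // collapse newlines and repeated spaces
    .trim();
}

// Hypothetical guard: treat very short results as "nothing heard"
function isUsableTranscript(text, minWords = 2) {
  return text.split(' ').filter(Boolean).length >= minWords;
}

console.log(cleanTranscript(' [BLANK_AUDIO]\n I went to market. '));
```

If `isUsableTranscript` returns false, the server could ask the user to speak again instead of spending a Gemini request and TTS credits on an empty answer.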

Upload endpoint

server/routes/transcribe.js

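const express = require('express');
const multer = require('multer');
const fs = require('fs');
const { transcribeAudio } = require('../services/whisper');
const { checkDailyLimit } = require('../services/limits');

const router = express.Router();
const upload = multer({ dest: '/tmp/audio/', limits: { fileSize: 5 * 1024 * 1024 } });

router.post('/transcribe', upload.single('audio'), async (req, res) => {
  try {
    const { userId } = req.body;
    if (!req.file) return res.status(400).json({ error: 'No audio file uploaded.' });

    const allowed = await checkDailyLimit(userId);
    if (!allowed) {
      fs.rmSync(req.file.path, { force: true });  // don't leave the rejected upload on disk
      return res.status(429).json({ error: 'Daily limit reached. Please pay ₹1 to continue.' });
    }

    const text = await transcribeAudio(req.file.path);
    res.json({ transcript: text });
  } catch (err) {
    // Express 4 does not catch errors thrown in async handlers, so handle them here
    console.error('Transcription failed:', err);
    res.status(500).json({ error: 'Could not transcribe audio.' });
  }
});

module.exports = router;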

06 / Voice Cloning

ElevenLabs — clone your voice

This is the most important step. Your cloned voice makes the AI feel personal and trustworthy to Telugu learners. Record once, use forever.

Step 1 — Record your voice sample

Environment

Sit in a quiet room. Use your phone mic close to your mouth. No background noise, no fan sounds.

What to say

Read 2–3 minutes of diverse English sentences. Include questions, statements, and some Telugu-style sentences like "I am going to market now." Speak clearly but naturally — this becomes your AI voice.

Format

Save as MP3 or WAV. File size should be 2–10 MB. Longer recordings = better voice quality.

Step 2 — Create the clone

Sign up at elevenlabs.io

Free account. Navigate to Voices → Add Voice → Instant Voice Cloning.

Upload your recording

Upload your MP3. Name it something like "Telugu Tutor - [YourName]". Click Save.

Copy your Voice ID

After saving, click the voice → you'll see a Voice ID like AbCdEfGhIjKlMnOpQrSt. Copy this — you'll need it in the server .env file.

Upgrade to Starter ($5/mo)

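Required for commercial use. Gives 30,000 credits/month (~30 min of audio at 1 credit/char). With the Flash model at 0.5 credit/char, that is about 60,000 characters, or roughly 300 responses of 200 characters each per month. Measure your real response volume before relying on this tier; uncached conversational replies use credits quickly.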

Node.js TTS service

server/services/tts.js

const axios = require('axios');
const fs = require('fs');
const path = require('path');

const VOICE_ID = process.env.ELEVENLABS_VOICE_ID;
const API_KEY = process.env.ELEVENLABS_API_KEY;

async function textToSpeech(text, outputPath) {
  const response = await axios.post(
    `https://api.elevenlabs.io/v1/text-to-speech/${VOICE_ID}`,
    {
      text,
      model_id: 'eleven_flash_v2_5',  // Flash = 0.5 credits/char (cheapest)
      voice_settings: {
        stability: 0.75,          // Higher = more consistent, less expressive
        similarity_boost: 0.85,   // How closely to match your cloned voice
        speed: 0.85               // Slightly slower for learners
      }
    },
    {
      headers: { 'xi-api-key': API_KEY, 'Content-Type': 'application/json' },
      responseType: 'arraybuffer'
    }
  );

  fs.writeFileSync(outputPath, response.data);
  return outputPath;
}

module.exports = { textToSpeech };

Cost tip: Always use eleven_flash_v2_5 model (not multilingual_v2). Flash costs 0.5 credits/char instead of 1, halving your TTS bill with little audible difference for short conversational sentences.
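To budget TTS, it helps to turn the per-character rate into credits per month. The sketch below is plain arithmetic using the rates quoted in the cost tip; the function names and the 200-character response length are assumptions for illustration.

```javascript
// Credits consumed by TTS = responses x characters per response x credits per character.
// creditsPerChar: 0.5 for Flash, 1 for multilingual_v2 (rates quoted in the tip above).
function creditsNeeded(responses, charsPerResponse, creditsPerChar) {
  return responses * charsPerResponse * creditsPerChar;
}

// How many responses of a given length fit in a monthly credit allowance.
function responsesPerPlan(planCredits, charsPerResponse, creditsPerChar) {
  return Math.floor(planCredits / (charsPerResponse * creditsPerChar));
}

console.log(creditsNeeded(100, 200, 0.5));       // credits for 100 Flash responses of 200 chars
console.log(responsesPerPlan(30000, 200, 0.5));  // responses that fit in a 30,000-credit month
```

Running the same functions with `creditsPerChar = 1` shows the effect of switching to multilingual_v2: the number of responses per plan halves.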

07 / Language Model

Gemini 2.0 Flash — grammar & tutoring

Gemini handles all the "AI thinking" — grammar correction, generating the tutor's next question, and Telugu translation. The free tier handles ~300 users before any cost.

Get your API key

Go to aistudio.google.com

Free tier limits

1,500 requests/day, 1 million tokens/minute (limits change, so confirm the current numbers in AI Studio). Every exchange is one Gemini request, so this supports ~300 users at 5 requests each per day for free.
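One way to stay inside a daily request allowance is to count requests on the server and stop calling Gemini once the budget is spent. The class below is a minimal in-memory sketch: `DailyQuota` is a hypothetical helper, it resets on the UTC date (Google's own quota window may reset at a different time), and the count is lost when the process restarts.

```javascript
// In-memory daily request counter. Resets when the UTC date changes.
// Sketch only: not persisted and not shared across processes.
class DailyQuota {
  constructor(limit, now = () => new Date()) {
    this.limit = limit;
    this.now = now;   // injectable clock, handy for tests
    this.day = null;
    this.used = 0;
  }

  // Returns true and counts the request if quota remains, false otherwise.
  tryConsume() {
    const today = this.now().toISOString().slice(0, 10);
    if (today !== this.day) {
      this.day = today;
      this.used = 0;
    }
    if (this.used >= this.limit) return false;
    this.used += 1;
    return true;
  }
}

// 1,500 is the free-tier figure quoted above
const geminiQuota = new DailyQuota(1500);
```

A route could call `geminiQuota.tryConsume()` before each Gemini request and return a friendly "try again tomorrow" message when it yields false.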

The tutor system prompt

This is the most important piece. Your prompt shapes the entire learning experience.

server/prompts/tutor.js

module.exports = function getTutorPrompt(level = 'beginner') {
  return `You are an English tutor for Telugu-speaking beginners in India.

ROLE:
- You speak like a friendly local teacher, not a formal professor
- You understand that the student thinks in Telugu and translates mentally
- You are patient, warm, and encouraging

RULES:
- Always respond in exactly 3 parts, separated by ||| delimiter:
  1. CORRECTION: Fix the student's grammar mistake in 1 sentence. If correct, say "Perfect! Your sentence is correct."
  2. EXPLANATION: Explain the rule simply, like talking to a 10-year-old. Use examples.
  3. NEXT_QUESTION: Ask a simple follow-up question to continue the conversation.
- Keep ALL 3 parts together under 60 words total
- Use simple Indian English examples (market, auto, tiffin, etc.)
- NEVER use complex grammar terminology
- If the student said something in Telugu, gently ask them to try in English

LEVEL: ${level} (beginner = simple present/past tense only)

Example output:
"You should say 'I went to market' not 'I am go to market.'|||
We use 'went' for past actions, like 'I ate dosa', 'I slept early.'|||
What did you eat for breakfast today?"`;
};

Grammar correction service

server/services/gemini.js

const { GoogleGenerativeAI } = require('@google/generative-ai');
const getTutorPrompt = require('../prompts/tutor');

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

async function correctAndRespond(userText, userLevel = 'beginner') {
  const model = genAI.getGenerativeModel({
    model: 'gemini-2.0-flash',
    systemInstruction: getTutorPrompt(userLevel),
    generationConfig: { maxOutputTokens: 120, temperature: 0.4 }
  });

  const result = await model.generateContent(userText);
  const raw = result.response.text();

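  // The model may omit a part, so default each to '' instead of speaking "undefined"
  const [correction = '', explanation = '', nextQuestion = ''] =
    raw.split('|||').map(s => s.trim());

  return { correction, explanation, nextQuestion,
    fullResponse: [correction, explanation, nextQuestion].filter(Boolean).join(' ') };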
}

async function translateToEnglish(teluguText) {
  const model = genAI.getGenerativeModel({ model: 'gemini-2.0-flash' });
  const result = await model.generateContent(
    `Translate this Telugu text to simple English. Return ONLY the translation, nothing else: ${teluguText}`
  );
  return result.response.text().trim();
}

module.exports = { correctAndRespond, translateToEnglish };

bash — install Gemini SDK

npm install @google/generative-ai

08 / Backend

Node.js API server

The complete Express server that ties all services together — receiving audio, orchestrating Whisper → Gemini → ElevenLabs, and returning a single audio response to the app.

Project structure

directory structure

tutor-app/
├── server.js              # Main entry point
├── .env                   # API keys (never commit this)
├── database.js            # SQLite setup
├── routes/
│   ├── transcribe.js      # POST /api/transcribe
│   ├── respond.js         # POST /api/respond
│   ├── translate.js       # POST /api/translate
│   ├── repeat.js          # POST /api/repeat
│   └── webhook.js         # POST /webhook/razorpay
├── services/
│   ├── whisper.js         # Whisper.cpp wrapper
│   ├── gemini.js          # Gemini API
│   ├── tts.js             # ElevenLabs TTS
│   └── limits.js          # Daily session limits
├── prompts/
│   └── tutor.js           # System prompts
└── audio/
    └── cache/             # Cached repeat-after-me audio files

Main server

server.js

require('dotenv').config();
const express = require('express');
const app = express();

// Mount the webhook before express.json(): signature verification needs the raw body
app.use('/webhook', require('./routes/webhook'));

app.use(express.json());
app.use('/audio', express.static('audio'));  // Serve audio files

app.use('/api', require('./routes/transcribe'));
app.use('/api', require('./routes/respond'));
app.use('/api', require('./routes/translate'));
app.use('/api', require('./routes/repeat'));

const PORT = process.env.PORT || 3000;
app.listen(PORT, () => console.log(`Tutor API running on :${PORT}`));

Main respond endpoint (full pipeline)

routes/respond.js

const router = require('express').Router();
const { correctAndRespond } = require('../services/gemini');
const { textToSpeech } = require('../services/tts');
const { logSession } = require('../services/limits');
const path = require('path');
const crypto = require('crypto');

router.post('/respond', async (req, res) => {
  try {
    const { userId, transcript, level } = req.body;
    if (!userId || !transcript) {
      return res.status(400).json({ error: 'userId and transcript are required.' });
    }

    // 1. Get grammar correction + next question from Gemini
    const { correction, explanation, nextQuestion, fullResponse }
      = await correctAndRespond(transcript, level);

    // 2. Convert response to audio using your cloned voice
    const filename = `${crypto.randomUUID()}.mp3`;
    const audioPath = path.join(__dirname, '../audio', filename);
    await textToSpeech(fullResponse, audioPath);

    // 3. Log this session exchange
    await logSession(userId);

    // 4. Return text + audio URL
    res.json({
      correction, explanation, nextQuestion,
      audioUrl: `${process.env.BASE_URL}/audio/${filename}`,
      transcript  // Echo back what user said
    });
  } catch (err) {
    // Gemini or ElevenLabs failed; report it instead of leaving the request hanging
    console.error('Respond pipeline failed:', err);
    res.status(502).json({ error: 'Could not generate a response. Please try again.' });
  }
});

module.exports = router;

Database setup

database.js

const Database = require('better-sqlite3');
const db = new Database('tutor.db');

db.exec(`
  CREATE TABLE IF NOT EXISTS users (
    id TEXT PRIMARY KEY,
    phone TEXT UNIQUE,
    level TEXT DEFAULT 'beginner',
    is_active INTEGER DEFAULT 0,
    subscription_id TEXT,
    created_at TEXT DEFAULT (datetime('now'))
  );

  CREATE TABLE IF NOT EXISTS sessions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id TEXT,
    date TEXT,
    exchanges INTEGER DEFAULT 0,
    created_at TEXT DEFAULT (datetime('now')),
    UNIQUE (user_id, date)  -- required by the ON CONFLICT(user_id, date) upsert in limits.js
  );

  CREATE TABLE IF NOT EXISTS phrase_cache (
    phrase_hash TEXT PRIMARY KEY,
    audio_path TEXT,
    created_at TEXT DEFAULT (datetime('now'))
  );
`);

module.exports = db;

Start server with PM2

bash

# Start server and keep it alive
pm2 start server.js --name tutor-api

# Auto-restart on server reboot: run the sudo command that `pm2 startup` prints, then save
pm2 startup
pm2 save

# View logs
pm2 logs tutor-api

# Restart after code changes
pm2 restart tutor-api

09 / Mobile App

Flutter app

The Flutter app handles voice recording, playback, and the learning UI. It communicates only with your Node.js server — never directly with external APIs.

Project setup

bash

flutter create telugu_tutor
cd telugu_tutor

# Add required packages to pubspec.yaml
flutter pub add http
flutter pub add record              # Audio recording
flutter pub add audioplayers        # Audio playback
flutter pub add shared_preferences  # Store user ID locally
flutter pub add permission_handler  # Mic permissions

Android permissions

android/app/src/main/AndroidManifest.xml

<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />

Core tutor screen logic

lib/screens/tutor_screen.dart

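import 'dart:io';

import 'package:flutter/material.dart';
import 'package:record/record.dart';
import 'package:audioplayers/audioplayers.dart';
import '../services/api_service.dart';

class TutorScreen extends StatefulWidget {
  @override
  _TutorScreenState createState() => _TutorScreenState();
}

class _TutorScreenState extends State<TutorScreen> {
  final _recorder = AudioRecorder();
  final _player = AudioPlayer();
  final _api = ApiService();

  bool _isRecording = false;
  bool _isProcessing = false;
  String _status = 'Hold the button to speak';
  String _transcript = '';
  String _correction = '';

  // '/tmp' is not writable on Android, so record into the app's own temp directory
  String get _audioPath => '${Directory.systemTemp.path}/user_audio.m4a';

  Future<void> _startRecording() async {
    if (await _recorder.hasPermission()) {
      await _recorder.start(const RecordConfig(), path: _audioPath);
      setState(() { _isRecording = true; _status = 'Listening...'; });
    }
  }

  Future<void> _stopAndProcess() async {
    // stop() returns the recorded file path, or null if nothing was recorded
    final recordedPath = await _recorder.stop();
    if (recordedPath == null) {
      setState(() { _isRecording = false; _status = 'Hold the button to speak'; });
      return;
    }
    setState(() { _isRecording = false; _isProcessing = true; _status = 'Processing...'; });

    try {
      // Send to server: transcribe → correct → generate audio
      final result = await _api.sendAudio(recordedPath);
      if (!mounted) return;

      setState(() {
        _transcript = result['transcript'] ?? '';
        _correction = result['correction'] ?? '';
        _isProcessing = false;
        _status = 'Hold to speak again';
      });

      // Play the AI voice response
      final audioUrl = result['audioUrl'];
      if (audioUrl != null) await _player.play(UrlSource(audioUrl));
    } catch (e) {
      if (!mounted) return;
      setState(() { _isProcessing = false; _status = 'Something went wrong. Hold to try again.'; });
    }
  }

  @override
  void dispose() {
    // Release the microphone and audio player when the screen closes
    _recorder.dispose();
    _player.dispose();
    super.dispose();
  }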

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      backgroundColor: Color(0xFF0D0F14),
      body: Column(children: [
        /* Status + transcript + correction UI */
        Text(_status),
        if (_transcript.isNotEmpty) Text('You said: $_transcript'),
        if (_correction.isNotEmpty) Text(_correction),
        /* Hold-to-speak button */
        GestureDetector(
          onLongPressStart: (_) => _startRecording(),
          onLongPressEnd: (_) => _stopAndProcess(),
          child: CircleAvatar(radius: 40,
            backgroundColor: _isRecording ? Colors.red : Colors.blue,
            child: Icon(Icons.mic, color: Colors.white)),
        ),
      ]),
    );
  }
}

10 / Features

Grammar correction feature

The core feature. User speaks → Whisper transcribes → Gemini corrects → ElevenLabs speaks the correction back in your voice.

Common Telugu English errors to handle

Add these as examples in your system prompt to help Gemini recognize Telugu-speaker patterns:

Telugu English error	Correct form	Telugu thinking pattern
"I am go to market"	"I am going to market"	Present continuous confusion
"Yesterday I am eating dosa"	"Yesterday I ate dosa"	Past tense using 'am'
"She don't know"	"She doesn't know"	Subject-verb agreement
"I have went"	"I have gone" / "I went"	Perfect tense confusion
"He is more taller"	"He is taller"	Double comparative
"What is your good name?"	"What is your name?"	Direct Telugu translation

Correction response format

Always keep Gemini responses under 60 words total. Long responses = more TTS cost and slower audio generation. The 3-part format (correction ||| explanation ||| next question) keeps it structured and brief.
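Because the model does not always honour the format, it can be worth validating each reply before spending TTS credits on it. The sketch below checks the two rules stated above: three parts separated by `|||`, and at most 60 words in total. `parseTutorReply` is a hypothetical helper, and what to do on failure (retry, fall back to a canned line) is left to you.

```javascript
const MAX_WORDS = 60;

// Splits a tutor reply into its three parts and enforces the word budget.
function parseTutorReply(raw) {
  const parts = raw.split('|||').map(s => s.trim());
  if (parts.length !== 3 || parts.some(p => p === '')) {
    return { ok: false, reason: 'expected 3 non-empty parts' };
  }

  const wordCount = parts.join(' ').split(/\s+/).length;
  if (wordCount > MAX_WORDS) {
    return { ok: false, reason: `too long: ${wordCount} words` };
  }

  const [correction, explanation, nextQuestion] = parts;
  return { ok: true, correction, explanation, nextQuestion, wordCount };
}

const sample = 'Say "I went."|||Use went for the past.|||What did you eat?';
console.log(parseTutorReply(sample));
```

Rejecting over-long replies before they reach ElevenLabs keeps the per-response character count, and therefore the credit cost, predictable.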

11 / Features

Telugu → English translation

Users can type or speak in Telugu to get the English translation. This helps when they don't know a word in English.

routes/translate.js

const router = require('express').Router();
const { translateToEnglish } = require('../services/gemini');
const { textToSpeech } = require('../services/tts');
const crypto = require('crypto');
const path = require('path');

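router.post('/translate', async (req, res) => {
  try {
    const { teluguText } = req.body;
    if (!teluguText || !teluguText.trim()) {
      return res.status(400).json({ error: 'teluguText is required.' });
    }

    const englishText = await translateToEnglish(teluguText);

    const filename = `trans_${crypto.randomUUID()}.mp3`;
    const audioPath = path.join(__dirname, '../audio', filename);
    await textToSpeech(englishText, audioPath);

    res.json({
      teluguInput: teluguText,
      englishTranslation: englishText,
      audioUrl: `${process.env.BASE_URL}/audio/${filename}`
    });
  } catch (err) {
    console.error('Translation failed:', err);
    res.status(502).json({ error: 'Could not translate. Please try again.' });
  }
});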

module.exports = router;

12 / Features

Repeat-after-me practice

The AI says a phrase slowly. User repeats it. Audio is cached — the same phrase never hits ElevenLabs API twice, saving credits.

routes/repeat.js

const router = require('express').Router();
const { textToSpeech } = require('../services/tts');
const db = require('../database');
const crypto = require('crypto');
const path = require('path');
const fs = require('fs');

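router.post('/repeat', async (req, res) => {
  try {
    const phrase = (req.body.phrase || '').trim();
    if (!phrase) return res.status(400).json({ error: 'phrase is required.' });
    const hash = crypto.createHash('md5').update(phrase).digest('hex');

    // Check if we already have audio for this phrase
    const cached = db.prepare('SELECT audio_path FROM phrase_cache WHERE phrase_hash = ?').get(hash);

    if (cached && fs.existsSync(cached.audio_path)) {
      return res.json({ audioUrl: `${process.env.BASE_URL}/${cached.audio_path}`, cached: true });
    }

    // Generate new audio and cache it (paths are relative to the project root)
    fs.mkdirSync('audio/cache', { recursive: true });
    const filename = `audio/cache/${hash}.mp3`;
    await textToSpeech(phrase, filename);
    // OR REPLACE: a row may already exist if its audio file was deleted from disk
    db.prepare('INSERT OR REPLACE INTO phrase_cache (phrase_hash, audio_path) VALUES (?, ?)').run(hash, filename);

    res.json({ audioUrl: `${process.env.BASE_URL}/${filename}`, cached: false });
  } catch (err) {
    console.error('Repeat audio failed:', err);
    res.status(502).json({ error: 'Could not generate audio. Please try again.' });
  }
});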

module.exports = router;

Caching saves money: 50 common phrases × 100 users = 5,000 potential TTS calls per day. With caching, it's just 50 calls once, saving ~4,950 ElevenLabs calls daily.
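The cache only helps when identical phrases map to the same key. One possible refinement, shown here as a sketch, is to normalise the phrase before hashing so that differences in case or spacing do not create duplicate audio files. `phraseCacheKey` is a hypothetical helper; the route in this guide hashes the phrase as received.

```javascript
const crypto = require('crypto');

// Normalise before hashing so trivially different inputs share one cache entry.
function phraseCacheKey(phrase) {
  const normalised = phrase.trim().toLowerCase().replace(/\s+/g, ' ');
  return crypto.createHash('md5').update(normalised).digest('hex');
}

const keyA = phraseCacheKey('Good morning, how are you?');
const keyB = phraseCacheKey('  good   morning, how are you? ');
console.log(keyA === keyB);
```

MD5 is used here only as a short file-name key, matching the route; it is not a security measure.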

Starter phrase list

Pre-generate audio for these at app launch to warm the cache:

server/scripts/warm-cache.js — run once on deploy

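const phrases = [
  "Good morning, how are you?",
  "My name is [name]. What is your name?",
  "I am going to the market.",
  "Please repeat after me.",
  "Very good! That was correct.",
  "Try again, you can do it.",
  "What did you eat for breakfast?",
  "Speak slowly and clearly.",
];
// POST each phrase to /api/repeat to warm the cache (Node 18+ ships a global fetch)
const BASE_URL = process.env.BASE_URL || 'http://localhost:3000';
(async () => {
  for (const phrase of phrases) {
    const res = await fetch(`${BASE_URL}/api/repeat`, {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ phrase })
    });
    console.log(res.status, phrase);
  }
})().catch(console.error);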

13 / Features

Session tracking

services/limits.js

const db = require('../database');

const FREE_DAILY_EXCHANGES = 3;   // Free users get 3 exchanges/day
const PAID_DAILY_EXCHANGES = 50;  // Paid users get 50 exchanges/day

async function checkDailyLimit(userId) {
  // toISOString() is UTC, so the daily counter resets at 05:30 IST, not at local midnight
  const today = new Date().toISOString().split('T')[0];
  const user = db.prepare('SELECT is_active FROM users WHERE id = ?').get(userId);

  const session = db.prepare(
    'SELECT exchanges FROM sessions WHERE user_id = ? AND date = ?'
  ).get(userId, today);

  const limit = user?.is_active ? PAID_DAILY_EXCHANGES : FREE_DAILY_EXCHANGES;
  const used = session?.exchanges || 0;

  return used < limit;
}

async function logSession(userId) {
  const today = new Date().toISOString().split('T')[0];
  db.prepare(`
    INSERT INTO sessions (user_id, date, exchanges) VALUES (?, ?, 1)
    ON CONFLICT(user_id, date) DO UPDATE SET exchanges = exchanges + 1
  `).run(userId, today);
}

module.exports = { checkDailyLimit, logSession };
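Two details of the limit logic are easy to isolate and test: which calendar day a request belongs to, and how many exchanges a user has left. The sketch below assumes you want the day to roll over at midnight India time rather than midnight UTC; `istDateKey` and `remainingExchanges` are hypothetical helpers, and the default limits mirror the constants above.

```javascript
const IST_OFFSET_MINUTES = 330;  // India Standard Time is UTC+05:30, with no daylight saving

// Returns the YYYY-MM-DD date in India for a given instant.
function istDateKey(date = new Date()) {
  const shifted = new Date(date.getTime() + IST_OFFSET_MINUTES * 60 * 1000);
  return shifted.toISOString().slice(0, 10);
}

// Exchanges left today for a user, never below zero.
function remainingExchanges({ isActive, used }, limits = { free: 3, paid: 50 }) {
  const limit = isActive ? limits.paid : limits.free;
  return Math.max(0, limit - used);
}

console.log(istDateKey(new Date('2024-01-01T19:00:00Z')));
console.log(remainingExchanges({ isActive: false, used: 1 }));
```

If you adopt an India-time date key, use it everywhere the `date` column is written or queried, so that all rows agree on what "today" means.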

14 / Payments

Razorpay ₹1/day UPI AutoPay

Razorpay supports UPI AutoPay subscriptions for amounts as low as ₹1. Setup takes ~2 days for KYC approval.

Setup steps

Create Razorpay account

Go to razorpay.com → Sign up as individual/business. Submit Aadhaar, PAN, bank account. KYC approval takes 1–2 business days.

Create a Subscription Plan

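Dashboard → Products → Subscriptions → Plans → Create Plan. Set: Amount = 100 paise (₹1), Period = daily, Interval = 1. Razorpay's Plans API documents a minimum interval of 7 for daily plans, so confirm with Razorpay that a 1-day interval is available on your account; if it is not, the closest equivalent is ₹7 charged every 7 days.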

Get API keys

Dashboard → Settings → API Keys → Generate Key. Copy Key ID and Key Secret to your .env file.

Set webhook URL

Dashboard → Settings → Webhooks → Add Webhook. URL: https://yourdomain.com/webhook/razorpay. Select events: subscription.activated, subscription.charged, subscription.cancelled, subscription.halted.

Webhook handler

routes/webhook.js

const express = require('express');
const crypto = require('crypto');
const db = require('../database');

const router = express.Router();

// Needs the raw request body: mount this router before express.json() in server.js
router.post('/razorpay', express.raw({ type: '*/*' }), (req, res) => {
  // Verify webhook signature
  const signature = Buffer.from(req.headers['x-razorpay-signature'] || '');
  const secret = process.env.RAZORPAY_WEBHOOK_SECRET;
  const expected = Buffer.from(
    crypto.createHmac('sha256', secret).update(req.body).digest('hex'));

  // Constant-time comparison; timingSafeEqual requires equal lengths
  const valid = signature.length === expected.length &&
    crypto.timingSafeEqual(signature, expected);
  if (!valid) return res.status(400).send('Invalid');

  const event = JSON.parse(req.body);
  const sub = event.payload?.subscription?.entity;
  if (!sub) return res.json({ received: true });  // not a subscription event

  if (event.event === 'subscription.activated' || event.event === 'subscription.charged') {
    db.prepare('UPDATE users SET is_active = 1 WHERE subscription_id = ?')
      .run(sub.id);
  }

  if (event.event === 'subscription.cancelled' || event.event === 'subscription.halted') {
    db.prepare('UPDATE users SET is_active = 0 WHERE subscription_id = ?')
      .run(sub.id);
  }

  res.json({ received: true });
});

module.exports = router;

Important: Razorpay UPI AutoPay requires users to set a mandate of ₹30 minimum, even though the daily charge is ₹1. This is an NPCI regulation: the mandate is just a permission ceiling, not a fixed charge. Communicate this clearly in your app's payment flow.

15 / Limits

Daily usage limits

User type	Daily exchanges	Sessions	Features
Free (unregistered)	3	1 demo session	Basic conversation only
Paid (₹1/day active)	50	Unlimited	All features including translation, repeat practice
Lapsed (payment failed)	3	1	Prompt to renew subscription

3 free exchanges is deliberate: Enough for the user to experience grammar correction and hear your voice, but not enough to avoid paying. The "hook" moment happens in exchange 1–2 when they hear their mistake corrected in your voice.

16 / Launch

Play Store publishing

Build release APK

bash

# Generate signing key (do this once)
keytool -genkey -v -keystore tutor-key.jks -alias tutor \
  -keyalg RSA -keysize 2048 -validity 10000

# Build release APK
flutter build apk --release

# Or build App Bundle (preferred by Play Store)
flutter build appbundle --release

Play Store checklist

Create Google Play Developer account — $25 one-time fee
Prepare app icon: 512×512 PNG, no alpha, no rounded corners
Create feature graphic: 1024×500 PNG
Take 2–8 phone screenshots showing the learning flow
Write app description in Telugu + English
Set content rating (Education → Everyone)
Add privacy policy URL (required — host a simple HTML page on your server)
Set app as "Education" category
Submit for review — takes 2–7 days for new accounts

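Skip Google Play Billing: Google Play's in-app billing takes a service fee of 15–30% (₹0.15–0.30 per user per day), a large share of a ₹1 price. The alternative is a Razorpay payment link opened from the app. Read Google Play's Payments policy before shipping: digital goods sold inside an app are generally required to use Play Billing or an approved alternative billing option, so confirm that this flow is permitted for your app in India.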

17 / Scaling

Scaling guide

At each user milestone, here's what to change and what to watch.

0–300 users — everything is free

Gemini free tier (1,500 req/day) + Oracle free VM + Whisper.cpp self-hosted. Only cost: ElevenLabs Starter ($5/mo). At ₹1/day (about ₹29 per user per month after Razorpay's fee), roughly 15 paying users cover this stage.

300–1,000 users

Upgrade Gemini to pay-as-you-go

Enable billing on Google Cloud. At this scale: ~₹250/month. Still 90%+ margin.

Upgrade ElevenLabs to Creator ($22/mo)

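100,000 credits/month, or about 200,000 characters with the Flash model (roughly 1,000 responses of 200 characters). Size the plan from your measured monthly character count rather than from user count alone.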

Add request queueing

Install bull npm package. Queue Whisper + TTS jobs to prevent Oracle VM overload during peak times.
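Before adding Bull (which needs Redis), the idea can be tried with a small in-process queue that caps how many jobs run at once. This is a sketch of the concept, not Bull's API: `JobQueue` is a hypothetical class, jobs are lost if the process restarts, and a concurrency of 1 for Whisper is an assumption to tune against your VM.

```javascript
// Minimal in-process job queue: at most `concurrency` tasks run at the same time.
class JobQueue {
  constructor(concurrency = 1) {
    this.concurrency = concurrency;
    this.active = 0;     // tasks currently running
    this.waiting = [];   // tasks not yet started
  }

  // Queues a function returning a value or a promise; resolves with its result.
  run(task) {
    return new Promise((resolve, reject) => {
      this.waiting.push({ task, resolve, reject });
      this._next();
    });
  }

  _next() {
    if (this.active >= this.concurrency || this.waiting.length === 0) return;
    const { task, resolve, reject } = this.waiting.shift();
    this.active += 1;
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        this.active -= 1;
        this._next();  // start the next waiting task, if any
      });
  }
}

// One Whisper job at a time; wrap calls as whisperQueue.run(() => transcribeAudio(path))
const whisperQueue = new JobQueue(1);
```

Requests that arrive while a transcription is running simply wait their turn, which trades a little latency for a VM that never runs several Whisper processes at once.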

1,000+ users

Move to a hosted transcription API ($0.003/min)

Self-hosted Whisper.cpp may become a CPU bottleneck. Switch to OpenAI's GPT-4o Mini Transcribe API at this scale.

Migrate SQLite → PostgreSQL

Use Supabase free tier (500 MB) or Railway's $5/mo Postgres. SQLite isn't designed for concurrent writes at this scale.

Add a CDN for audio files

Use Cloudflare (free tier) to cache and serve audio files. Reduces Oracle VM bandwidth and latency.

18 / Reference

Environment variables

Create a .env file in your project root. Never commit this to Git — add it to .gitignore.

GEMINI_API_KEY

From Google AI Studio (aistudio.google.com → Get API Key)

ELEVENLABS_API_KEY

From ElevenLabs dashboard → Profile → API Key

ELEVENLABS_VOICE_ID

Your cloned voice ID from ElevenLabs Voices page

RAZORPAY_KEY_ID

From Razorpay Dashboard → Settings → API Keys

RAZORPAY_KEY_SECRET

From Razorpay Dashboard → Settings → API Keys

RAZORPAY_WEBHOOK_SECRET

From Razorpay Dashboard → Settings → Webhooks → Secret

RAZORPAY_PLAN_ID

Your ₹1/day subscription plan ID from Razorpay

BASE_URL

Your server URL e.g. https://tutor.yourdomain.com

PORT

3000 (nginx listens on 80/443 and proxies requests to this port)

.env example

GEMINI_API_KEY=AIzaSy...
ELEVENLABS_API_KEY=sk_abc123...
ELEVENLABS_VOICE_ID=AbCdEfGhIjKl
RAZORPAY_KEY_ID=rzp_live_...
RAZORPAY_KEY_SECRET=your_secret_here
RAZORPAY_WEBHOOK_SECRET=your_webhook_secret
RAZORPAY_PLAN_ID=plan_xxx
BASE_URL=https://tutor.yourdomain.com
PORT=3000
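A missing key usually shows up later as a confusing API error, so it can help to check the environment once at startup. The sketch below lists the variables from this section; `missingEnv` is a hypothetical helper and where you call it (for example at the top of server.js, after dotenv loads) is up to you.

```javascript
// Variable names from the reference list above
const REQUIRED_ENV = [
  'GEMINI_API_KEY',
  'ELEVENLABS_API_KEY',
  'ELEVENLABS_VOICE_ID',
  'RAZORPAY_KEY_ID',
  'RAZORPAY_KEY_SECRET',
  'RAZORPAY_WEBHOOK_SECRET',
  'RAZORPAY_PLAN_ID',
  'BASE_URL'
];

// Returns the names that are absent or blank in the given environment object.
function missingEnv(env, required = REQUIRED_ENV) {
  return required.filter(name => !env[name] || env[name].trim() === '');
}

// Possible use at startup:
//   const missing = missingEnv(process.env);
//   if (missing.length) { console.error('Missing env vars:', missing.join(', ')); process.exit(1); }
```

PORT is left out of the required list because the server can fall back to 3000.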

19 / Reference

Cost tracker

Monthly costs at different user scales. Assumes: 5 sessions/user/day, 7s audio input, 200 char AI response (Flash TTS model).

Service	100 users	300 users	500 users	1000 users
Whisper.cpp (self-hosted)	₹0	₹0	₹0	₹0*
Gemini 2.0 Flash	₹0 (free tier)	₹0 (free tier)	~₹180	~₹360
ElevenLabs (Flash model)	₹415 ($5)	₹415 ($5)	~₹830 ($10)	~₹1,660 ($20)
Oracle Cloud VM	₹0	₹0	₹0	₹0
Razorpay (2%)	₹60	₹180	₹300	₹600
Total monthly cost	~₹475	~₹595	~₹1,310	~₹2,620
Total monthly revenue (₹1/day)	₹3,000	₹9,000	₹15,000	₹30,000
Net profit	₹2,525	₹8,405	₹13,690	₹27,380

* At 1000 users, consider switching Whisper to a hosted transcription API (~$0.003/min) if Oracle VM CPU exceeds 80% consistently.
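Each column of the table follows the same arithmetic: revenue is users × ₹1 × 30 days, the Razorpay fee is 2% of revenue, and net profit is revenue minus all costs. The sketch below reproduces that calculation so you can plug in your own numbers; the function name is illustrative, and like the table it assumes every user pays every day.

```javascript
const DAYS_PER_MONTH = 30;

// Recomputes one column of the cost tracker. fixedCosts are the monthly
// service costs in rupees (Gemini, ElevenLabs, and so on).
function monthlySummary({ users, pricePerDay, gatewayFeeRate, fixedCosts }) {
  const revenue = users * pricePerDay * DAYS_PER_MONTH;
  const gatewayFee = Math.round(revenue * gatewayFeeRate);
  const totalCost = gatewayFee + fixedCosts.reduce((sum, cost) => sum + cost, 0);
  return { revenue, gatewayFee, totalCost, netProfit: revenue - totalCost };
}

// 100-user column: ElevenLabs ₹415, everything else ₹0
console.log(monthlySummary({ users: 100, pricePerDay: 1, gatewayFeeRate: 0.02, fixedCosts: [415] }));

// 500-user column: Gemini ~₹180 + ElevenLabs ~₹830
console.log(monthlySummary({ users: 500, pricePerDay: 1, gatewayFeeRate: 0.02, fixedCosts: [180, 830] }));
```

Replacing `users` with your count of paying users gives a more cautious estimate than the table's all-users-pay assumption.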

20 / Reference

Troubleshooting

Common issues

Whisper returning empty or garbled text

Make sure ffmpeg is converting to 16kHz mono WAV before passing to Whisper. Check with: ffprobe yourfile.wav — should show 16000 Hz, 1 channel.

ElevenLabs audio sounds robotic or rushed

Increase stability to 0.8 and reduce speed to 0.8 in the voice settings. The Flash model is optimized for speed — if quality is poor, try eleven_multilingual_v2 (costs 1 credit/char instead of 0.5).
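As a rough illustration, the settings above could be sent with each text-to-speech request as shown below. This sketch assumes the ElevenLabs REST endpoint `POST /v1/text-to-speech/{voice_id}` with a `voice_settings` object that accepts `stability`, `similarity_boost`, and `speed`. The `similarity_boost` value of 0.75 and the default model ID `eleven_flash_v2_5` are assumptions, so use whatever your TTS service already sends. `buildTtsRequest` is a hypothetical helper.

```javascript
// Builds the URL and fetch options for one text-to-speech call (does not send it).
function buildTtsRequest(text, { voiceId, apiKey, modelId = 'eleven_flash_v2_5' }) {
  return {
    url: `https://api.elevenlabs.io/v1/text-to-speech/${voiceId}`,
    options: {
      method: 'POST',
      headers: {
        'xi-api-key': apiKey,
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        text,
        model_id: modelId, // swap in 'eleven_multilingual_v2' if quality is poor
        voice_settings: {
          stability: 0.8,         // higher = steadier, less expressive delivery
          similarity_boost: 0.75, // assumed value, not from this guide
          speed: 0.8,             // below 1.0 slows the speech down
        },
      }),
    },
  };
}
```

On Node 18 or later the result can be passed straight to the built-in `fetch(url, options)`. Change one setting at a time and listen to the same sentence so you can tell which change helped.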

Gemini returning empty responses

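The most likely cause is the free tier limit (1,500 req/day). Once it is exceeded, the API rejects requests with a quota error (HTTP 429), which your code may surface as an empty response. Either enable billing on Google Cloud or implement a fallback: cache the last 10 Gemini responses and rotate them for repeat users.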
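The fallback idea can be sketched as a small in-memory store of recent successful responses. Everything here is illustrative: `FallbackResponses` and `respondWithFallback` are hypothetical names, and `askGemini` stands in for whatever function your grammar service exposes.

```javascript
// Keeps the most recent successful responses and hands them out in rotation.
class FallbackResponses {
  constructor(limit = 10) {
    this.limit = limit;
    this.items = [];
    this.cursor = 0;
  }

  // Store a successful response, dropping the oldest once the limit is reached.
  remember(response) {
    this.items.push(response);
    if (this.items.length > this.limit) this.items.shift();
  }

  // Return the next stored response, or null when nothing is cached yet.
  next() {
    if (this.items.length === 0) return null;
    const item = this.items[this.cursor % this.items.length];
    this.cursor += 1;
    return item;
  }
}

// askGemini is a hypothetical async function standing in for the real Gemini call.
async function respondWithFallback(askGemini, userText, cache) {
  try {
    const response = await askGemini(userText);
    if (!response) throw new Error('Empty Gemini response');
    cache.remember(response);
    return { response, fromCache: false };
  } catch (err) {
    const cached = cache.next();
    if (cached === null) throw err; // nothing to fall back on yet
    return { response: cached, fromCache: true };
  }
}
```

A cached reply will not match what the user just said, so treat this as a stopgap that keeps the session alive. Log every `fromCache: true` so you know how often the quota is being hit.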

Oracle VM runs out of disk space

Audio files accumulate in /audio/. Add a cron job to delete non-cached audio files older than 1 hour:

bash cron (crontab -e)

# Delete old audio files every hour (keep cache folder)
0 * * * * find /home/ubuntu/tutor-app/audio -maxdepth 1 -name "*.mp3" -mmin +60 -delete

Flutter app crashes on audio recording

Most common cause: microphone permission not granted. Add a permission check before starting recording:

dart

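import 'package:permission_handler/permission_handler.dart';

// Returns true only when the microphone can be used. Call before recording.
Future<bool> ensureMicPermission() async {
  final status = await Permission.microphone.request();
  if (status == PermissionStatus.granted) return true;
  // Show dialog asking user to enable mic in settings
  return false;
}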

Razorpay webhook not triggering

Ensure your server's SSL certificate is valid (webhooks require HTTPS). Test the webhook manually from Razorpay Dashboard → Settings → Webhooks → Test. Check PM2 logs: pm2 logs tutor-api.
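If the test webhook reaches the server (you can see it in the PM2 logs) but the subscription is never activated, the signature check is a common culprit. Razorpay signs the raw request body with HMAC-SHA256 using your webhook secret and sends the hex digest in the X-Razorpay-Signature header. A minimal sketch of that check, with a hypothetical helper name:

```javascript
const crypto = require('crypto');

// rawBody must be the exact bytes Razorpay sent (string or Buffer),
// not a re-serialised JSON object.
function isValidRazorpaySignature(rawBody, signature, secret) {
  const expected = crypto
    .createHmac('sha256', secret)
    .update(rawBody)
    .digest('hex');
  const a = Buffer.from(expected, 'utf8');
  const b = Buffer.from(String(signature || ''), 'utf8');
  // timingSafeEqual throws on a length mismatch, so compare lengths first.
  return a.length === b.length && crypto.timingSafeEqual(a, b);
}
```

In Express, the check fails if a JSON body parser has already replaced the raw body, so the webhook route needs access to the unparsed payload. The secret is the RAZORPAY_WEBHOOK_SECRET value from your .env file.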

Users report slow response time (>10 seconds)

The bottleneck is usually Whisper.cpp on the Oracle ARM VM. Switch to ggml-tiny.bin model for faster transcription (~0.5s for 10s audio) or add request queuing with Bull so multiple users don't run Whisper simultaneously.
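Before adding Bull, which needs a Redis instance, you can check whether serialising Whisper calls helps at all with a tiny in-process queue. The sketch below is a swapped-in stand-in for Bull, not its API. `SerialQueue` is a hypothetical class, and `transcribe` in the usage note stands for your existing Whisper function.

```javascript
// Runs async jobs one at a time, in the order they were added.
class SerialQueue {
  constructor() {
    this.tail = Promise.resolve();
  }

  // job: () => Promise<T>. The returned promise settles with the job's own result.
  add(job) {
    const run = this.tail.then(() => job());
    // Swallow the error on the internal chain so one failed job
    // does not block the jobs queued behind it.
    this.tail = run.catch(() => {});
    return run;
  }
}
```

Usage would look like `const text = await whisperQueue.add(() => transcribe(wavPath));` with one shared `whisperQueue` for the whole server. The queue lives in memory, so pending jobs are lost on restart, which is one reason to move to Bull once this proves useful.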

Monitoring commands

bash — useful daily checks

# Check server is running
pm2 status

# View last 100 log lines
pm2 logs tutor-api --lines 100

# Check disk usage
df -h

# Check Oracle VM CPU/memory
htop

# Count total users in DB
sqlite3 tutor.db "SELECT COUNT(*) FROM users;"

# Count paid users
sqlite3 tutor.db "SELECT COUNT(*) FROM users WHERE is_active = 1;"

# Today's session count (date('now') is UTC; use date('now','localtime') if dates are stored in local time)
sqlite3 tutor.db "SELECT COUNT(*) FROM sessions WHERE date = date('now');"

# Check audio cache folder size
du -sh ~/tutor-app/audio/

Good luck with your build! Start with the Oracle VM setup (section 04), then Whisper (05), then Gemini (07), then the backend (08). Get the API working end-to-end before touching Flutter. Test with Postman first — it's much faster to debug. Your first real user paying ₹1 will feel incredible.
