Complete Implementation Guide

Build a Telugu-English AI Voice Tutor App

A step-by-step developer guide to build, deploy, and manage a voice-first English learning app for Telugu speakers — running on a zero-cost infrastructure stack, monetised at ₹1 per user per day.

Cost to run: ₹0 until 300+ users
Revenue model: ₹1/user/day UPI AutoPay
Voice: Your cloned voice via ElevenLabs
Target: Telugu speakers learning English

How this app works

The app is a voice conversation loop. The AI (in your cloned voice) asks English questions in a Telugu-friendly style. The user responds by voice. The AI corrects grammar, gives the improved sentence back as audio, and continues the lesson.

API cost / user / day: ₹0.08–0.15 (at 5 sessions/day)
Revenue / user / day: ₹0.98 (after Razorpay 2%)
Profit margin: ~83% (at MVP scale)
Free user limit: ~300 (before any API cost)

Conversation loop

1 → AI asks a question (audio)

Your cloned voice speaks a question in slow, clear English. E.g. "Tell me what you did this morning."

2 → User speaks in English

User holds mic button, speaks their answer (5–15 seconds). Audio recorded on device.

3 → Whisper converts speech → text

Audio sent to your server. Whisper.cpp transcribes it. Result: raw English text from user.

4 → Gemini corrects grammar

Gemini returns: corrected sentence + short explanation + next question. All in 2–3 sentences max.

5 → ElevenLabs speaks the response

Corrected text sent to ElevenLabs using your cloned voice ID. Audio returned to app and played back.

6 → Loop continues

Session ends after 5 exchanges. Progress saved. Daily limit enforced by subscription status.
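
The six steps above can be sketched as one function with the three services passed in. This is a minimal sketch, not the server code from later sections: runExchange and the deps.transcribe, deps.tutor and deps.speak functions are hypothetical names standing in for Whisper.cpp, Gemini and ElevenLabs.

```javascript
// Hypothetical orchestration of one exchange. The three deps are stand-ins
// for Whisper.cpp (speech to text), Gemini (tutoring) and ElevenLabs (speech).
const MAX_EXCHANGES_PER_SESSION = 5; // from step 6 of the loop

async function runExchange(audio, session, deps) {
  if (session.exchanges >= MAX_EXCHANGES_PER_SESSION) {
    return { done: true, reason: 'session complete' };
  }
  const transcript = await deps.transcribe(audio);            // step 3
  const reply = await deps.tutor(transcript);                 // step 4
  const spoken = `${reply.correction} ${reply.nextQuestion}`;
  const audioOut = await deps.speak(spoken);                  // step 5
  session.exchanges += 1;                                     // step 6
  return { done: false, transcript, reply, audioOut };
}
```

Passing the services in as plain functions lets you test the loop with stubs before any API key exists.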

Complete zero-cost stack

Every component chosen for maximum free tier generosity and minimum maintenance burden for a solo developer.

Layer | Service | Cost | Why this choice
STT | Whisper.cpp (self-hosted) | $0 forever | No per-minute cost. Runs on Oracle free VM. Good accuracy on Telugu-accented English.
LLM | Google Gemini 2.0 Flash | Free ≤300 users | 1,500 req/day free. Understands Indian English context. Strong multilingual tokenizer.
TTS + Voice Clone | ElevenLabs | $5/mo Starter | Only paid component. Clones your voice from a 1 min recording. Flash model = 0.5 credit/char.
Hosting | Oracle Cloud Always Free | $0 forever | 4 ARM CPUs, 24 GB RAM, 200 GB disk. Runs Node.js + Whisper simultaneously.
Database | SQLite (on Oracle VM) | $0 | Simple, zero-config, a good fit for a single-server MVP. Upgrade to Postgres at 1,000+ users.
Mobile App | Flutter | $0 | Single codebase → Android + iOS. Dart is easy to learn. Good audio recording support.
Payments | Razorpay UPI AutoPay | 2% per txn | Supports ₹1/day recurring UPI. ₹0 setup fee.
Play Store | Google Play | $25 one-time (~₹2,075) | One-time registration fee. No yearly renewal for Android. Launch here before iOS.
Total monthly cost at launch (0–300 users): ₹415/month (ElevenLabs Starter $5). Everything else is free. At ₹0.98 net per user per day (about ₹29 per month), roughly 15 paying users cover the ElevenLabs cost.
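
To check the break-even point yourself, divide the fixed monthly cost by the net revenue per paying user. A minimal sketch using the guide's figures (₹415/month, ₹1/day, 2% gateway fee); the function names and the 30-day month are assumptions for illustration.

```javascript
// Break-even sketch. Function names are illustrative; the inputs come from the guide.
function netRevenuePerUserPerMonth(pricePerDayInr, gatewayFeeRate, daysPerMonth) {
  return pricePerDayInr * (1 - gatewayFeeRate) * daysPerMonth;
}

// Smallest whole number of paying users whose net revenue covers the fixed cost.
function breakEvenUsers(fixedMonthlyCostInr, pricePerDayInr, gatewayFeeRate, daysPerMonth = 30) {
  const perUser = netRevenuePerUserPerMonth(pricePerDayInr, gatewayFeeRate, daysPerMonth);
  return Math.ceil(fixedMonthlyCostInr / perUser);
}

console.log(breakEvenUsers(415, 1, 0.02));
```

Re-run it whenever a fixed cost changes, for example after moving to a larger ElevenLabs plan.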

System architecture

The architecture is intentionally simple — a single Node.js server on Oracle Cloud handles all API orchestration. The Flutter app talks only to your server, never directly to external APIs.

Flutter app (Android/iOS) on the user's device
  ── HTTPS ──▶ Node.js API on the Oracle VM

Node.js ──▶ Whisper.cpp (local process)
  ↓ transcribed text
Node.js ──▶ Gemini API (grammar + response)
  ↓ corrected text
Node.js ──▶ ElevenLabs API (your cloned voice)
  ↓ audio bytes ──▶ Flutter plays audio

Also on the Oracle VM:

SQLite DB: users, sessions, limits
Razorpay webhook: subscription status
/tmp audio cache: repeat-after-me

API endpoints on your server

Endpoint | Method | What it does
/api/transcribe | POST | Receives audio file, returns transcribed text via Whisper.cpp
/api/respond | POST | Receives transcribed text, returns grammar correction + audio URL
/api/translate | POST | Telugu → English translation via Gemini
/api/session | GET | Returns today's session count for a user
/api/repeat | POST | Generates or serves cached audio for repeat-after-me phrases
/webhook/razorpay | POST | Handles payment events, activating or deactivating subscriptions

Oracle Cloud Always Free VM

Oracle's Always Free tier gives you a powerful ARM VM at zero cost forever. This single server runs your entire backend.

Sign up at: cloud.oracle.com — Use a valid credit card (required for verification, never charged for Always Free resources). Select region closest to India: Mumbai (ap-mumbai-1).

VM spec to select

Shape: VM.Standard.A1.Flex (ARM Ampere)
OCPUs: 4 (Always Free limit)
RAM: 24 GB (Always Free limit)
Storage: 200 GB (boot + block)

Initial server setup

bash — SSH into your Oracle VM
# Update system
sudo apt update && sudo apt upgrade -y

# Install Node.js 20 LTS
curl -fsSL https://deb.nodesource.com/setup_20.x | sudo -E bash -
sudo apt-get install -y nodejs

# Install build tools (needed for Whisper.cpp)
sudo apt install -y build-essential cmake git ffmpeg

# Install PM2 (keeps Node.js running after SSH disconnect)
sudo npm install -g pm2

# Create app directory
mkdir ~/tutor-app && cd ~/tutor-app
npm init -y
npm install express multer axios dotenv better-sqlite3 node-fetch form-data

Open firewall ports

In Oracle Console → Networking → Security Lists → add ingress rules:

Oracle Security List rules
Port 22   TCP   0.0.0.0/0   # SSH (already open)
Port 80   TCP   0.0.0.0/0   # HTTP
Port 443  TCP   0.0.0.0/0   # HTTPS
Port 3000 TCP   0.0.0.0/0   # Node.js (dev only, close in prod)
bash — also open in Ubuntu firewall
sudo iptables -I INPUT -p tcp --dport 80 -j ACCEPT
sudo iptables -I INPUT -p tcp --dport 443 -j ACCEPT
sudo iptables -I INPUT -p tcp --dport 3000 -j ACCEPT
sudo netfilter-persistent save

Install SSL (free via Let's Encrypt)

Important: Your Flutter app requires HTTPS. Get a free domain from freenom.com or buy a cheap one (~₹100/year on Namecheap) and point it to your Oracle VM's public IP.
bash — install Certbot
sudo apt install certbot nginx -y
sudo certbot --nginx -d yourdomain.com
# Certificates last 90 days; Certbot installs a scheduled job that renews them automatically

Whisper.cpp — self-hosted STT

Whisper.cpp is a C++ port of OpenAI Whisper. It runs efficiently on CPU, making it perfect for the Oracle ARM VM with no GPU needed.

Install Whisper.cpp

bash
cd ~
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp

# Build (ARM optimized)
make -j4

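# Download the 'small' model (244M parameters, ~466 MB on disk): best speed/accuracy tradeoff
bash ./models/download-ggml-model.sh small

# Test it works (newer whisper.cpp releases build with CMake and name this binary build/bin/whisper-cli)
./main -m models/ggml-small.bin -f samples/jfk.wav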

Model selection guide

Model | Parameters | Size on disk | Speed on Oracle VM | Accuracy | Recommendation
tiny | 39M | ~75 MB | ~0.5s per 10s audio | Good | MVP testing only
small | 244M | ~466 MB | ~2s per 10s audio | Very good | Use this
medium | 769M | ~1.5 GB | ~5s per 10s audio | Excellent | When users > 500
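
The speed column can be turned into a rough latency estimate. A minimal sketch, assuming transcription time scales linearly with audio length; the factors are read straight off the table and are not benchmarks, and both function names are hypothetical.

```javascript
// Seconds of processing per second of audio, derived from the table above
// (for example, small: ~2s per 10s of audio = 0.2). Rough figures, not benchmarks.
const REALTIME_FACTOR = { tiny: 0.05, small: 0.2, medium: 0.5 };

function estimateTranscribeSeconds(model, audioSeconds) {
  const factor = REALTIME_FACTOR[model];
  if (factor === undefined) throw new Error(`Unknown model: ${model}`);
  return factor * audioSeconds;
}

// Pick the most accurate model whose estimated time fits a latency budget,
// falling back to tiny when nothing fits.
function pickModel(audioSeconds, budgetSeconds) {
  const byAccuracy = ['medium', 'small', 'tiny'];
  const fit = byAccuracy.find(
    (m) => estimateTranscribeSeconds(m, audioSeconds) <= budgetSeconds
  );
  return fit || 'tiny';
}
```

For a 10-second answer and a 3-second transcription budget, pickModel(10, 3) selects small, which matches the recommendation in the table.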

Node.js integration

server/services/whisper.js
const { execFile } = require('child_process');
const fs = require('fs');

const WHISPER_BIN = '/home/ubuntu/whisper.cpp/main';
const WHISPER_MODEL = '/home/ubuntu/whisper.cpp/models/ggml-small.bin';

async function transcribeAudio(audioFilePath) {
  // Convert to WAV 16kHz mono (Whisper requirement).
  // Multer temp files have no extension, so append a suffix instead of
  // replacing one; otherwise ffmpeg would try to overwrite its own input.
  const wavPath = `${audioFilePath}_16k.wav`;

  try {
    await runCommand('ffmpeg', [
      '-y', '-i', audioFilePath,
      '-ar', '16000', '-ac', '1', '-c:a', 'pcm_s16le', wavPath
    ]);

    // -nt = no timestamps, -l en = treat the audio as English
    const result = await runCommand(WHISPER_BIN, [
      '-m', WHISPER_MODEL, '-f', wavPath, '-nt', '-l', 'en'
    ]);

    return result.trim();
  } finally {
    // Cleanup temp files, even when ffmpeg or Whisper failed
    fs.rmSync(audioFilePath, { force: true });
    fs.rmSync(wavPath, { force: true });
  }
}

// Runs a binary with an argument array (no shell), so a file path can
// never be interpreted as shell syntax.
function runCommand(bin, args) {
  return new Promise((resolve, reject) => {
    execFile(bin, args, { maxBuffer: 10 * 1024 * 1024 }, (err, stdout) => {
      if (err) return reject(err);
      resolve(stdout);
    });
  });
}

module.exports = { transcribeAudio };

Upload endpoint

server/routes/transcribe.js
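const express = require('express');
const fs = require('fs');
const multer = require('multer');
const { transcribeAudio } = require('../services/whisper');
const { checkDailyLimit } = require('../services/limits');

const router = express.Router();
const upload = multer({ dest: '/tmp/audio/', limits: { fileSize: 5 * 1024 * 1024 } });

router.post('/transcribe', upload.single('audio'), async (req, res) => {
  const { userId } = req.body;
  if (!req.file) return res.status(400).json({ error: 'No audio file uploaded.' });

  try {
    const allowed = await checkDailyLimit(userId);
    if (!allowed) {
      fs.rmSync(req.file.path, { force: true });  // Don't leave the upload behind
      return res.status(429).json({ error: 'Daily limit reached. Please pay ₹1 to continue.' });
    }

    const text = await transcribeAudio(req.file.path);
    res.json({ transcript: text });
  } catch (err) {
    // Express 4 does not catch errors thrown in async handlers, so respond here
    console.error('transcribe failed:', err.message);
    res.status(500).json({ error: 'Transcription failed. Please try again.' });
  }
});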

module.exports = router;

ElevenLabs — clone your voice

This is the most important step. Your cloned voice makes the AI feel personal and trustworthy to Telugu learners. Record once, use forever.

Step 1 — Record your voice sample

Environment

Sit in a quiet room. Use your phone mic close to your mouth. No background noise, no fan sounds.

What to say

Read 2–3 minutes of diverse English sentences. Include questions, statements, and some Telugu-style sentences like "I am going to market now." Speak clearly but naturally — this becomes your AI voice.

Format

Save as MP3 or WAV. File size should be 2–10 MB. Longer recordings = better voice quality.

Step 2 — Create the clone

Sign up at elevenlabs.io

Free account. Navigate to Voices → Add Voice → Instant Voice Cloning.

Upload your recording

Upload your MP3. Name it something like "Telugu Tutor - [YourName]". Click Save.

Copy your Voice ID

After saving, click the voice → you'll see a Voice ID like AbCdEfGhIjKlMnOpQrSt. Copy this — you'll need it in the server .env file.

Upgrade to Starter ($5/mo)

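Required for commercial use. Gives 30,000 credits/month (~30 min of audio). With the Flash model at 0.5 credit/char, that is about 60,000 characters, or roughly 300 responses of 200 characters each per month, so cache repeated phrases and watch the usage meter in the ElevenLabs dashboard as users grow.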
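
Before onboarding users, it helps to convert the plan's credits into a number of spoken responses. A minimal sketch, assuming the guide's figures (30,000 credits, 0.5 credit per character on the Flash model, about 200 characters per response); ttsBudget is a hypothetical helper and the 30-day month is an assumption.

```javascript
// Converts a monthly credit allowance into how many responses it can voice.
function ttsBudget({ creditsPerMonth, creditsPerChar, charsPerResponse }) {
  const charsPerMonth = creditsPerMonth / creditsPerChar;
  const responsesPerMonth = Math.floor(charsPerMonth / charsPerResponse);
  return {
    charsPerMonth,
    responsesPerMonth,
    responsesPerDay: Math.floor(responsesPerMonth / 30), // assumes a 30-day month
  };
}

console.log(ttsBudget({ creditsPerMonth: 30000, creditsPerChar: 0.5, charsPerResponse: 200 }));
```

Divide responsesPerDay by the number of exchanges you expect per user to see how many active users a plan can voice; cached phrases do not count against this budget.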

Node.js TTS service

server/services/tts.js
const axios = require('axios');
const fs = require('fs');
const path = require('path');

const VOICE_ID = process.env.ELEVENLABS_VOICE_ID;
const API_KEY = process.env.ELEVENLABS_API_KEY;

async function textToSpeech(text, outputPath) {
  const response = await axios.post(
    `https://api.elevenlabs.io/v1/text-to-speech/${VOICE_ID}`,
    {
      text,
      model_id: 'eleven_flash_v2_5',  // Flash = 0.5 credits/char (cheapest)
      voice_settings: {
        stability: 0.75,          // Higher = more consistent, less expressive
        similarity_boost: 0.85,   // How closely to match your cloned voice
        speed: 0.85               // Slightly slower for learners
      }
    },
    {
      headers: { 'xi-api-key': API_KEY, 'Content-Type': 'application/json' },
      responseType: 'arraybuffer'
    }
  );

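  // Make sure the target folder (audio/ or audio/cache/) exists before writing
  fs.mkdirSync(path.dirname(outputPath), { recursive: true });
  fs.writeFileSync(outputPath, response.data);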
  return outputPath;
}

module.exports = { textToSpeech };
Cost tip: Default to the eleven_flash_v2_5 model (not multilingual_v2). Flash costs 0.5 credits/char instead of 1, halving your TTS bill with little audible difference for short conversational sentences.

Gemini 2.0 Flash — grammar & tutoring

Gemini handles all the "AI thinking" — grammar correction, generating the tutor's next question, and Telugu translation. The free tier handles ~300 users before any cost.

Get your API key

Go to aistudio.google.com

Sign in with Google. Click "Get API Key" → Create API key in new project. Copy the key.

Free tier limits

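1,500 requests/day, 1 million tokens/minute. This supports ~300 users per day for free if each user makes about 5 Gemini requests (300 × 5 = 1,500). Every exchange is one request, so a full 5-exchange session uses 5.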
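
To avoid running into the daily cap mid-lesson, you can count requests on your own server and stop calling Gemini once the allowance is used. A minimal in-memory sketch: DailyQuota is a hypothetical class, the counter resets on the UTC date (an assumption, since Google's reset time may differ), and it is lost when the process restarts.

```javascript
// In-memory daily request counter. Resets when the UTC date changes.
class DailyQuota {
  constructor(limit, now = () => new Date()) {
    this.limit = limit; // e.g. 1500 for the free tier described above
    this.now = now;     // injectable clock, useful for tests
    this.day = null;
    this.used = 0;
  }

  // Returns true and counts the request if quota remains, false otherwise.
  tryConsume() {
    const today = this.now().toISOString().slice(0, 10);
    if (today !== this.day) {
      this.day = today;
      this.used = 0;
    }
    if (this.used >= this.limit) return false;
    this.used += 1;
    return true;
  }
}
```

Call tryConsume() before each Gemini request and return a friendly "come back tomorrow" message when it returns false.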

The tutor system prompt

This is the most important piece. Your prompt shapes the entire learning experience.

server/prompts/tutor.js
module.exports = function getTutorPrompt(level = 'beginner') {
  return `You are an English tutor for Telugu-speaking beginners in India.

ROLE:
- You speak like a friendly local teacher, not a formal professor
- You understand that the student thinks in Telugu and translates mentally
- You are patient, warm, and encouraging

RULES:
- Always respond in exactly 3 parts, separated by ||| delimiter:
  1. CORRECTION: Fix the student's grammar mistake in 1 sentence. If correct, say "Perfect! Your sentence is correct."
  2. EXPLANATION: Explain the rule simply, like talking to a 10-year-old. Use examples.
  3. NEXT_QUESTION: Ask a simple follow-up question to continue the conversation.
- Keep ALL 3 parts together under 60 words total
- Use simple Indian English examples (market, auto, tiffin, etc.)
- NEVER use complex grammar terminology
- If the student said something in Telugu, gently ask them to try in English

LEVEL: ${level} (beginner = simple present/past tense only)

Example output:
"You should say 'I went to the market' not 'I am go to market.'|||
We use 'went' for past actions, like 'I ate dosa', 'I slept early.'|||
What did you eat for breakfast today?"`;
};
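
Models do not always follow a format instruction, so it is worth validating each reply against the prompt's two rules (three parts, under 60 words) before sending it to TTS. A minimal sketch; parseTutorReply is a hypothetical helper, not part of the Gemini SDK.

```javascript
// Splits a reply on the ||| delimiter and checks it against the prompt's rules.
function parseTutorReply(raw, maxWords = 60) {
  const parts = raw.split('|||').map((s) => s.trim());
  if (parts.length !== 3 || parts.some((p) => p === '')) {
    return { ok: false, reason: 'expected 3 non-empty parts', raw };
  }
  const [correction, explanation, nextQuestion] = parts;
  const wordCount = parts.join(' ').split(/\s+/).length;
  const ok = wordCount <= maxWords;
  return {
    ok,
    reason: ok ? null : 'too long',
    wordCount,
    correction,
    explanation,
    nextQuestion,
  };
}
```

When ok is false you can retry the request once or fall back to a fixed phrase, which also keeps an over-long reply from consuming TTS credits.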

Grammar correction service

server/services/gemini.js
const { GoogleGenerativeAI } = require('@google/generative-ai');
const getTutorPrompt = require('../prompts/tutor');

const genAI = new GoogleGenerativeAI(process.env.GEMINI_API_KEY);

async function correctAndRespond(userText, userLevel = 'beginner') {
  const model = genAI.getGenerativeModel({
    model: 'gemini-2.0-flash',
    systemInstruction: getTutorPrompt(userLevel),
    generationConfig: { maxOutputTokens: 120, temperature: 0.4 }
  });

  const result = await model.generateContent(userText);
  const raw = result.response.text();

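  // If the model ignores the ||| format, fall back to speaking its raw reply
  // instead of sending the literal word "undefined" to TTS
  const parts = raw.split('|||').map(s => s.trim());
  if (parts.length !== 3) {
    const text = raw.trim();
    return { correction: text, explanation: '', nextQuestion: '', fullResponse: text };
  }
  const [correction, explanation, nextQuestion] = parts;

  return { correction, explanation, nextQuestion,
    fullResponse: `${correction} ${explanation} ${nextQuestion}` };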
}

async function translateToEnglish(teluguText) {
  const model = genAI.getGenerativeModel({ model: 'gemini-2.0-flash' });
  const result = await model.generateContent(
    `Translate this Telugu text to simple English. Return ONLY the translation, nothing else: ${teluguText}`
  );
  return result.response.text().trim();
}

module.exports = { correctAndRespond, translateToEnglish };
bash — install Gemini SDK
npm install @google/generative-ai

Node.js API server

The complete Express server that ties all services together — receiving audio, orchestrating Whisper → Gemini → ElevenLabs, and returning a single audio response to the app.

Project structure

directory structure
tutor-app/
├── server.js              # Main entry point
├── .env                   # API keys (never commit this)
├── database.js            # SQLite setup
├── routes/
│   ├── transcribe.js      # POST /api/transcribe
│   ├── respond.js         # POST /api/respond
│   ├── translate.js       # POST /api/translate
│   ├── repeat.js          # POST /api/repeat
│   └── webhook.js         # POST /webhook/razorpay
├── services/
│   ├── whisper.js         # Whisper.cpp wrapper
│   ├── gemini.js          # Gemini API
│   ├── tts.js             # ElevenLabs TTS
│   └── limits.js          # Daily session limits
├── prompts/
│   └── tutor.js           # System prompts
└── audio/
    └── cache/             # Cached repeat-after-me audio files

Main server

server.js
require('dotenv').config();
const express = require('express');
const app = express();

// Mount the Razorpay webhook before express.json(): signature checks need the raw body
app.use('/webhook', require('./routes/webhook'));

app.use(express.json());
app.use('/audio', express.static('audio'));  // Serve audio files

app.use('/api', require('./routes/transcribe'));
app.use('/api', require('./routes/respond'));
app.use('/api', require('./routes/translate'));
app.use('/api', require('./routes/repeat'));

const PORT = process.env.PORT || 3000;
app.listen(PORT, () => console.log(`Tutor API running on :${PORT}`));

Main respond endpoint (full pipeline)

routes/respond.js
const router = require('express').Router();
const { correctAndRespond } = require('../services/gemini');
const { textToSpeech } = require('../services/tts');
const { logSession } = require('../services/limits');
const path = require('path');
const crypto = require('crypto');

router.post('/respond', async (req, res) => {
  const { userId, transcript, level } = req.body;
  if (!userId || !transcript) {
    return res.status(400).json({ error: 'userId and transcript are required' });
  }

  try {
    // 1. Get grammar correction + next question from Gemini
    const { correction, explanation, nextQuestion, fullResponse }
      = await correctAndRespond(transcript, level);

    // 2. Convert response to audio using your cloned voice
    const filename = `${crypto.randomUUID()}.mp3`;
    const audioPath = path.join(__dirname, '../audio', filename);
    await textToSpeech(fullResponse, audioPath);

    // 3. Log this session exchange
    await logSession(userId);

    // 4. Return text + audio URL
    res.json({
      correction, explanation, nextQuestion,
      audioUrl: `${process.env.BASE_URL}/audio/${filename}`,
      transcript  // Echo back what user said
    });
  } catch (err) {
    // Gemini or ElevenLabs failed; Express 4 would otherwise leave the request hanging
    console.error('respond failed:', err.message);
    res.status(502).json({ error: 'Could not generate a response. Please try again.' });
  }
});

module.exports = router;

Database setup

database.js
const Database = require('better-sqlite3');
const db = new Database('tutor.db');

db.exec(`
  CREATE TABLE IF NOT EXISTS users (
    id TEXT PRIMARY KEY,
    phone TEXT UNIQUE,
    level TEXT DEFAULT 'beginner',
    is_active INTEGER DEFAULT 0,
    subscription_id TEXT,
    created_at TEXT DEFAULT (datetime('now'))
  );

  CREATE TABLE IF NOT EXISTS sessions (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    user_id TEXT,
    date TEXT,
    exchanges INTEGER DEFAULT 0,
    created_at TEXT DEFAULT (datetime('now')),
    UNIQUE (user_id, date)  -- required by the ON CONFLICT(user_id, date) upsert in services/limits.js
  );

  CREATE TABLE IF NOT EXISTS phrase_cache (
    phrase_hash TEXT PRIMARY KEY,
    audio_path TEXT,
    created_at TEXT DEFAULT (datetime('now'))
  );
`);

module.exports = db;

Start server with PM2

bash
# Start server and keep it alive
pm2 start server.js --name tutor-api

# Auto-restart on server reboot: pm2 startup prints a sudo command, run it, then save
pm2 startup
pm2 save

# View logs
pm2 logs tutor-api

# Restart after code changes
pm2 restart tutor-api

Flutter app

The Flutter app handles voice recording, playback, and the learning UI. It communicates only with your Node.js server — never directly with external APIs.

Project setup

bash
flutter create telugu_tutor
cd telugu_tutor

# Add required packages to pubspec.yaml
flutter pub add http
flutter pub add record              # Audio recording
flutter pub add audioplayers        # Audio playback
flutter pub add shared_preferences  # Store user ID locally
flutter pub add permission_handler  # Mic permissions

Android permissions

android/app/src/main/AndroidManifest.xml
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />

Core tutor screen logic

lib/screens/tutor_screen.dart
import 'dart:io';

import 'package:flutter/material.dart';
import 'package:record/record.dart';
import 'package:audioplayers/audioplayers.dart';
import '../services/api_service.dart';

class TutorScreen extends StatefulWidget {
  @override
  _TutorScreenState createState() => _TutorScreenState();
}

class _TutorScreenState extends State<TutorScreen> {
  final _recorder = AudioRecorder();
  final _player = AudioPlayer();
  final _api = ApiService();

  // /tmp is not writable on a phone, so record into the app's temp directory.
  // (path_provider's getTemporaryDirectory() is an alternative if you prefer it.)
  final String _audioPath = '${Directory.systemTemp.path}/user_audio.m4a';

  bool _isRecording = false;
  bool _isProcessing = false;
  String _status = 'Hold the button to speak';
  String _transcript = '';
  String _correction = '';

  @override
  void dispose() {
    _recorder.dispose();
    _player.dispose();
    super.dispose();
  }

  Future<void> _startRecording() async {
    if (await _recorder.hasPermission()) {
      await _recorder.start(const RecordConfig(), path: _audioPath);
      setState(() { _isRecording = true; _status = 'Listening...'; });
    }
  }

  Future<void> _stopAndProcess() async {
    await _recorder.stop();
    setState(() { _isRecording = false; _isProcessing = true; _status = 'Processing...'; });

    try {
      // Send to server: transcribe → correct → generate audio
      final result = await _api.sendAudio(_audioPath);

      setState(() {
        _transcript = result['transcript'] ?? '';
        _correction = result['correction'] ?? '';
        _isProcessing = false;
        _status = 'Hold to speak again';
      });

      // Play the AI voice response
      final audioUrl = result['audioUrl'];
      if (audioUrl != null) await _player.play(UrlSource(audioUrl));
    } catch (e) {
      // Network or server error: reset the UI so the user can retry
      setState(() { _isProcessing = false; _status = 'Something went wrong. Hold to try again.'; });
    }
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      backgroundColor: Color(0xFF0D0F14),
      body: Column(children: [
        /* Status + transcript + correction UI */
        Text(_status),
        if (_transcript.isNotEmpty) Text('You said: $_transcript'),
        if (_correction.isNotEmpty) Text(_correction),
        /* Hold-to-speak button */
        GestureDetector(
          onLongPressStart: (_) => _startRecording(),
          onLongPressEnd: (_) => _stopAndProcess(),
          child: CircleAvatar(radius: 40,
            backgroundColor: _isRecording ? Colors.red : Colors.blue,
            child: Icon(Icons.mic, color: Colors.white)),
        ),
      ]),
    );
  }
}

Grammar correction feature

The core feature. User speaks → Whisper transcribes → Gemini corrects → ElevenLabs speaks the correction back in your voice.

Common Telugu English errors to handle

Add these as examples in your system prompt to help Gemini recognize Telugu-speaker patterns:

Telugu English error | Correct form | Telugu thinking pattern
"I am go to market" | "I am going to the market" | Present continuous confusion
"Yesterday I am eating dosa" | "Yesterday I ate dosa" | Past tense using 'am'
"She don't know" | "She doesn't know" | Subject-verb agreement
"I have went" | "I have gone" / "I went" | Perfect tense confusion
"He is more taller" | "He is taller" | Double comparative
"What is your good name?" | "What is your name?" | Direct Telugu translation

Correction response format

Always keep Gemini responses under 60 words total. Long responses = more TTS cost and slower audio generation. The 3-part format (correction ||| explanation ||| next question) keeps it structured and brief.
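
The error table above can be turned into a block of examples and appended to the system prompt. A minimal sketch; COMMON_ERRORS holds three rows copied from the table, while buildErrorExamples and the heading text are assumptions you can reword.

```javascript
// Rows copied from the common-errors table.
const COMMON_ERRORS = [
  { wrong: 'Yesterday I am eating dosa', right: 'Yesterday I ate dosa' },
  { wrong: "She don't know", right: "She doesn't know" },
  { wrong: 'He is more taller', right: 'He is taller' },
];

// Formats the rows as plain-text examples for the end of the system prompt.
function buildErrorExamples(errors) {
  const lines = errors.map(
    (e) => `- Student says: "${e.wrong}" -> Correct: "${e.right}"`
  );
  return ['COMMON MISTAKES TO WATCH FOR:', ...lines].join('\n');
}

console.log(buildErrorExamples(COMMON_ERRORS));
```

Keep the list short: every example is sent as input tokens on each request.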

Telugu → English translation

Users can type or speak in Telugu to get the English translation. This helps when they don't know a word in English.

routes/translate.js
const router = require('express').Router();
const { translateToEnglish } = require('../services/gemini');
const { textToSpeech } = require('../services/tts');
const crypto = require('crypto');
const path = require('path');

router.post('/translate', async (req, res) => {
  const { teluguText } = req.body;

  const englishText = await translateToEnglish(teluguText);

  const filename = `trans_${crypto.randomUUID()}.mp3`;
  const audioPath = path.join(__dirname, '../audio', filename);
  await textToSpeech(englishText, audioPath);

  res.json({
    teluguInput: teluguText,
    englishTranslation: englishText,
    audioUrl: `${process.env.BASE_URL}/audio/${filename}`
  });
});

module.exports = router;

Repeat-after-me practice

The AI says a phrase slowly. User repeats it. Audio is cached — the same phrase never hits ElevenLabs API twice, saving credits.

routes/repeat.js
const router = require('express').Router();
const { textToSpeech } = require('../services/tts');
const db = require('../database');
const crypto = require('crypto');
const path = require('path');
const fs = require('fs');

router.post('/repeat', async (req, res) => {
  const { phrase } = req.body;
  const hash = crypto.createHash('md5').update(phrase).digest('hex');

  // Check if we already have audio for this phrase
  const cached = db.prepare('SELECT audio_path FROM phrase_cache WHERE phrase_hash = ?').get(hash);

  if (cached && fs.existsSync(cached.audio_path)) {
    return res.json({ audioUrl: `${process.env.BASE_URL}/${cached.audio_path}`, cached: true });
  }

  // Generate new audio and cache it
  const filename = `audio/cache/${hash}.mp3`;
  await textToSpeech(phrase, filename);
  // OR REPLACE: a row can already exist if its audio file was deleted from disk
  db.prepare('INSERT OR REPLACE INTO phrase_cache (phrase_hash, audio_path) VALUES (?, ?)').run(hash, filename);

  res.json({ audioUrl: `${process.env.BASE_URL}/${filename}`, cached: false });
});

module.exports = router;
Caching saves money: 50 common phrases × 100 users = 5,000 potential TTS calls per day. With caching, it's just 50 calls once, saving ~4,950 ElevenLabs calls daily.
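
The saving comes from keying audio by a hash of the phrase, which is easy to see in isolation. A minimal in-memory sketch with the TTS call injected, so it runs without ElevenLabs; makeCachedTts is a hypothetical helper, and normalising the phrase (trim + lowercase) before hashing is an assumption that the route above does not make.

```javascript
const crypto = require('crypto');

// Wraps any synthesize(phrase) function so each distinct phrase is generated once.
function makeCachedTts(synthesize) {
  const cache = new Map(); // phrase hash -> Promise of audio

  return function cachedTts(phrase) {
    const key = crypto
      .createHash('md5')
      .update(phrase.trim().toLowerCase())
      .digest('hex');

    if (!cache.has(key)) {
      // Store the promise so concurrent requests share one TTS call;
      // drop it again on failure so the next request can retry.
      const pending = Promise.resolve()
        .then(() => synthesize(phrase))
        .catch((err) => {
          cache.delete(key);
          throw err;
        });
      cache.set(key, pending);
    }
    return cache.get(key);
  };
}
```

Storing the promise rather than the finished audio means two users asking for the same new phrase at the same moment still trigger a single TTS request.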

Starter phrase list

Pre-generate audio for these at app launch to warm the cache:

server/scripts/warm-cache.js — run once on deploy
const phrases = [
  "Good morning, how are you?",
  "My name is [name]. What is your name?",
  "I am going to the market.",
  "Please repeat after me.",
  "Very good! That was correct.",
  "Try again, you can do it.",
  "What did you eat for breakfast?",
  "Speak slowly and clearly.",
];
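// POST each phrase to /api/repeat to warm the cache (Node 18+ has a global fetch)
const BASE_URL = process.env.BASE_URL || 'http://localhost:3000';
(async () => {
  for (const phrase of phrases) {
    const res = await fetch(`${BASE_URL}/api/repeat`, {
      method: 'POST', headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ phrase })
    });
    console.log(res.status, phrase);
  }
})();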

Session tracking

services/limits.js
const db = require('../database');

const FREE_DAILY_EXCHANGES = 3;   // Free users get 3 exchanges/day
const PAID_DAILY_EXCHANGES = 50;  // Paid users get 50 exchanges/day

async function checkDailyLimit(userId) {
  const today = new Date().toISOString().split('T')[0];
  const user = db.prepare('SELECT is_active FROM users WHERE id = ?').get(userId);

  const session = db.prepare(
    'SELECT exchanges FROM sessions WHERE user_id = ? AND date = ?'
  ).get(userId, today);

  const limit = user?.is_active ? PAID_DAILY_EXCHANGES : FREE_DAILY_EXCHANGES;
  const used = session?.exchanges || 0;

  return used < limit;
}

async function logSession(userId) {
  const today = new Date().toISOString().split('T')[0];
  db.prepare(`
    INSERT INTO sessions (user_id, date, exchanges) VALUES (?, ?, 1)
    ON CONFLICT(user_id, date) DO UPDATE SET exchanges = exchanges + 1
  `).run(userId, today);
}

module.exports = { checkDailyLimit, logSession };
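
One detail to be aware of: toISOString() returns the UTC date, so the daily counter above rolls over at 5:30 AM in India rather than at midnight. If you want limits to reset at midnight IST, a date key can be computed like this. A minimal sketch; istDateKey is a hypothetical helper, not part of the service above.

```javascript
// India Standard Time is UTC+5:30 all year (no daylight saving).
const IST_OFFSET_MS = (5 * 60 + 30) * 60 * 1000;

// Returns the calendar date in IST as YYYY-MM-DD for the given instant.
function istDateKey(date = new Date()) {
  return new Date(date.getTime() + IST_OFFSET_MS).toISOString().slice(0, 10);
}
```

Using istDateKey() in place of new Date().toISOString().split('T')[0] in both checkDailyLimit and logSession keeps the two functions on the same day boundary.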

Razorpay ₹1/day UPI AutoPay

Razorpay supports UPI AutoPay for recurring amounts as low as ₹1. Allow 1–2 business days for KYC approval before you can accept payments.

Setup steps

Create Razorpay account

Go to razorpay.com → Sign up as individual/business. Submit Aadhaar, PAN, bank account. KYC approval takes 1–2 business days.

Create a Subscription Plan

Dashboard → Products → Subscriptions → Plans → Create Plan. Set: Amount = 100 paise (₹1), Period = daily, Interval = 1.

Get API keys

Dashboard → Settings → API Keys → Generate Key. Copy Key ID and Key Secret to your .env file.

Set webhook URL

Dashboard → Settings → Webhooks → Add Webhook. URL: https://yourdomain.com/webhook/razorpay. Select events: subscription.activated, subscription.charged, subscription.cancelled, subscription.halted.

Webhook handler

routes/webhook.js
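const express = require('express');
const crypto = require('crypto');
const db = require('../database');

const router = express.Router();

// express.raw keeps the body as a Buffer: the signature must be computed over
// the exact bytes Razorpay sent. Mount this route before any express.json()
// middleware, or req.body arrives already parsed.
router.post('/razorpay', express.raw({ type: '*/*' }), (req, res) => {
  // Verify webhook signature
  const signature = req.headers['x-razorpay-signature'];
  const secret = process.env.RAZORPAY_WEBHOOK_SECRET;
  const expected = crypto.createHmac('sha256', secret)
    .update(req.body).digest('hex');

  if (signature !== expected) return res.status(400).send('Invalid');

  const event = JSON.parse(req.body);
  const sub = event.payload?.subscription?.entity;
  if (!sub) return res.json({ received: true });  // Not a subscription event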

  if (event.event === 'subscription.activated' || event.event === 'subscription.charged') {
    db.prepare('UPDATE users SET is_active = 1 WHERE subscription_id = ?')
      .run(sub.id);
  }

  if (event.event === 'subscription.cancelled' || event.event === 'subscription.halted') {
    db.prepare('UPDATE users SET is_active = 0 WHERE subscription_id = ?')
      .run(sub.id);
  }

  res.json({ received: true });
});

module.exports = router;
Important: Razorpay UPI AutoPay requires users to set a mandate of ₹30 minimum, even though the daily charge is ₹1. This is an NPCI regulation. The mandate is a permission ceiling, not a fixed charge, so communicate this clearly in your app's payment flow.
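
The signature check in the handler can be pulled into a small function and hardened with a constant-time comparison. A minimal sketch using only Node's crypto module; verifyWebhookSignature is a hypothetical helper, and it assumes the signature arrives as a hex string in the x-razorpay-signature header, as in the handler above.

```javascript
const crypto = require('crypto');

// Returns true only if signatureHex is the HMAC-SHA256 of rawBody under secret.
function verifyWebhookSignature(rawBody, signatureHex, secret) {
  if (typeof signatureHex !== 'string') return false;
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws on length mismatch, so compare lengths first
  return received.length === expected.length
    && crypto.timingSafeEqual(received, expected);
}
```

Pass the raw request Buffer, not a re-serialised JSON object: any change in whitespace or key order produces a different signature.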

Daily usage limits

User type | Daily exchanges | Sessions | Features
Free (unregistered) | 3 | 1 demo session | Basic conversation only
Paid (₹1/day active) | 50 | Unlimited | All features including translation, repeat practice
Lapsed (payment failed) | 3 | 1 | Prompt to renew subscription
3 free exchanges is deliberate: Enough for the user to experience grammar correction and hear your voice, but not enough to avoid paying. The "hook" moment happens in exchange 1–2 when they hear their mistake corrected in your voice.

Play Store publishing

Build release APK

bash
# Generate signing key (do this once)
keytool -genkey -v -keystore tutor-key.jks -alias tutor \
  -keyalg RSA -keysize 2048 -validity 10000

# Build release APK
flutter build apk --release

# Or build App Bundle (preferred by Play Store)
flutter build appbundle --release

Play Store checklist

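Skip Google Play Billing: Do NOT use Google Play's in-app billing for ₹1/day, because Google's service fee (15–30%, i.e. ₹0.15–0.30 per user per day) would consume most of the margin. Use a Razorpay payment link via WebView inside the app instead. Google Play's Payments policy restricts third-party payment for digital goods, so confirm that your flow qualifies under the current rules for India before you submit.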

Scaling guide

At each user milestone, here's what to change and what to watch.

0–300 users — everything is free

Gemini free tier (1,500 req/day) + Oracle free VM + Whisper.cpp self-hosted. Only cost: ElevenLabs Starter ($5/mo, ~₹415), which about 15 paying users at ₹1/day cover.

300–1,000 users

Upgrade Gemini to pay-as-you-go

Enable billing on Google Cloud. At this scale: ~₹250/month. Still 90%+ margin.

Upgrade ElevenLabs to Creator ($22/mo)

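100,000 credits/month. With the Flash model that is about 200,000 characters, or roughly 1,000 responses of 200 characters each, so estimate your real per-user character usage (after caching) before choosing a plan.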

Add request queueing

Install bull npm package. Queue Whisper + TTS jobs to prevent Oracle VM overload during peak times.
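
Bull needs a Redis server. While you are still on one VM, the same protection can be had from a small in-process limiter that caps how many Whisper or TTS jobs run at once. A minimal sketch and a stand-in for Bull, not its API; createLimiter is a hypothetical helper, and queued jobs are lost if the process restarts.

```javascript
// Returns run(task): task is an async function, started only when fewer
// than maxConcurrent tasks are already running. Others wait in FIFO order.
function createLimiter(maxConcurrent) {
  let active = 0;
  const waiting = [];

  const next = () => {
    if (active >= maxConcurrent || waiting.length === 0) return;
    active += 1;
    const { task, resolve, reject } = waiting.shift();
    Promise.resolve()
      .then(task)
      .then(resolve, reject)
      .finally(() => {
        active -= 1;
        next(); // start the next queued task, if any
      });
  };

  return function run(task) {
    return new Promise((resolve, reject) => {
      waiting.push({ task, resolve, reject });
      next();
    });
  };
}
```

For example, create one limiter with const runWhisper = createLimiter(2) and wrap each transcription as runWhisper(() => transcribeAudio(filePath)), where transcribeAudio is the function from the Whisper section.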

1,000+ users

Move to a hosted transcription API ($0.003/min)

Self-hosted Whisper.cpp may become a CPU bottleneck. Switch to OpenAI's GPT-4o Mini Transcribe API at this scale.

Migrate SQLite → PostgreSQL

Use Supabase free tier (500 MB) or Railway's $5/mo Postgres. SQLite isn't designed for concurrent writes at this scale.

Add a CDN for audio files

Use Cloudflare (free tier) to cache and serve audio files. Reduces Oracle VM bandwidth and latency.

Environment variables

Create a .env file in your project root. Never commit this to Git — add it to .gitignore.

GEMINI_API_KEY: From Google AI Studio (aistudio.google.com → Get API Key)
ELEVENLABS_API_KEY: From ElevenLabs dashboard → Profile → API Key
ELEVENLABS_VOICE_ID: Your cloned voice ID from ElevenLabs Voices page
RAZORPAY_KEY_ID: From Razorpay Dashboard → Settings → API Keys
RAZORPAY_KEY_SECRET: From Razorpay Dashboard → Settings → API Keys
RAZORPAY_WEBHOOK_SECRET: From Razorpay Dashboard → Settings → Webhooks → Secret
RAZORPAY_PLAN_ID: Your ₹1/day subscription plan ID from Razorpay
BASE_URL: Your server URL e.g. https://tutor.yourdomain.com
PORT: 3000 (keep this when nginx sits in front on ports 80/443 and proxies to Node.js)
.env example
GEMINI_API_KEY=AIzaSy...
ELEVENLABS_API_KEY=sk_abc123...
ELEVENLABS_VOICE_ID=AbCdEfGhIjKl
RAZORPAY_KEY_ID=rzp_live_...
RAZORPAY_KEY_SECRET=your_secret_here
RAZORPAY_WEBHOOK_SECRET=your_webhook_secret
RAZORPAY_PLAN_ID=plan_xxx
BASE_URL=https://tutor.yourdomain.com
PORT=3000

Cost tracker

Monthly costs at different user scales. Assumes: 5 sessions/user/day, 7s audio input, 200 char AI response (Flash TTS model).

Service | 100 users | 300 users | 500 users | 1,000 users
Whisper.cpp (self-hosted) | ₹0 | ₹0 | ₹0 | ₹0*
Gemini 2.0 Flash | ₹0 (free tier) | ₹0 (free tier) | ~₹180 | ~₹360
ElevenLabs (Flash model) | ₹415 ($5) | ₹415 ($5) | ~₹830 ($10) | ~₹1,660 ($20)
Oracle Cloud VM | ₹0 | ₹0 | ₹0 | ₹0
Razorpay (2%) | ₹60 | ₹180 | ₹300 | ₹600
Total monthly cost | ~₹475 | ~₹595 | ~₹1,310 | ~₹2,620
Total monthly revenue (₹1/day) | ₹3,000 | ₹9,000 | ₹15,000 | ₹30,000
Net profit | ₹2,525 | ₹8,405 | ₹13,690 | ₹27,380

* At 1000 users, consider switching Whisper to API ($0.003/min) if Oracle VM CPU exceeds 80% consistently.
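
Each column of the table follows the same arithmetic: revenue is users × ₹1 × 30 days, Razorpay takes 2% of it, and the API costs are added on top. A minimal sketch for trying your own numbers; monthlyProfit is a hypothetical helper, and the Gemini and ElevenLabs amounts are inputs you supply, not values it computes.

```javascript
// Reproduces one column of the cost tracker from its inputs (all amounts in ₹).
function monthlyProfit({
  users,
  geminiInr,
  elevenLabsInr,
  pricePerDayInr = 1,
  days = 30,
  gatewayFeeRate = 0.02,
}) {
  const revenue = users * pricePerDayInr * days;
  const gatewayFee = Math.round(revenue * gatewayFeeRate);
  const cost = geminiInr + elevenLabsInr + gatewayFee;
  return { revenue, gatewayFee, cost, profit: revenue - cost };
}

console.log(monthlyProfit({ users: 100, geminiInr: 0, elevenLabsInr: 415 }));
```

Note that the table treats every user as paying; pass only your paying-user count if you want a conservative figure.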

Troubleshooting

Common issues

Whisper returning empty or garbled text

Make sure ffmpeg is converting to 16kHz mono WAV before passing to Whisper. Check with: ffprobe yourfile.wav — should show 16000 Hz, 1 channel.

ElevenLabs audio sounds robotic or rushed

Increase stability to 0.8 and reduce speed to 0.8 in the voice settings. The Flash model is optimized for speed — if quality is poor, try eleven_multilingual_v2 (costs 1 credit/char instead of 0.5).

Gemini returning empty responses

Most likely you've hit the free tier limit (1,500 req/day); the API then returns a 429 error instead of text. Either enable billing on Google Cloud or implement a fallback: cache the last 10 Gemini responses and rotate them for repeat users.

Oracle VM runs out of disk space

Audio files accumulate in /audio/. Add a cron job to delete non-cached audio files older than 1 hour:

bash cron (crontab -e)
# Delete old audio files every hour (keep cache folder)
0 * * * * find /home/ubuntu/tutor-app/audio -maxdepth 1 -name "*.mp3" -mmin +60 -delete
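
If you would rather keep the cleanup inside the Node.js process than in cron, the same rule (delete top-level .mp3 files older than an hour, leave the cache folder alone) can be written with the fs module. A minimal sketch; deleteOldAudio is a hypothetical helper and the directory path in the usage note is the one used in the cron line above.

```javascript
const fs = require('fs');
const path = require('path');

// Deletes .mp3 files directly inside dir whose last modification is older
// than maxAgeMs. Subfolders such as cache/ are skipped. Returns deleted names.
function deleteOldAudio(dir, maxAgeMs, now = Date.now()) {
  const deleted = [];
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    if (!entry.isFile() || !entry.name.endsWith('.mp3')) continue;
    const file = path.join(dir, entry.name);
    if (now - fs.statSync(file).mtimeMs > maxAgeMs) {
      fs.unlinkSync(file);
      deleted.push(entry.name);
    }
  }
  return deleted;
}
```

A call such as setInterval(() => deleteOldAudio('/home/ubuntu/tutor-app/audio', 60 * 60 * 1000), 60 * 60 * 1000) in server.js would run it hourly.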

Flutter app crashes on audio recording

Most common cause: microphone permission not granted. Add a permission check before starting recording:

dart
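import 'package:permission_handler/permission_handler.dart';

// Run this inside an async method, before starting the recorder
final status = await Permission.microphone.request();
if (status != PermissionStatus.granted) {
  // Show dialog asking user to enable mic in settings
}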

Razorpay webhook not triggering

Ensure your server's SSL certificate is valid (webhooks require HTTPS). Test the webhook manually from Razorpay Dashboard → Settings → Webhooks → Test. Check PM2 logs: pm2 logs tutor-api.

Users report slow response time (>10 seconds)

The bottleneck is usually Whisper.cpp on the Oracle ARM VM. Switch to ggml-tiny.bin model for faster transcription (~0.5s for 10s audio) or add request queuing with Bull so multiple users don't run Whisper simultaneously.


Monitoring commands

bash — useful daily checks
# Check server is running
pm2 status

# View last 100 log lines
pm2 logs tutor-api --lines 100

# Check disk usage
df -h

# Check Oracle VM CPU/memory
htop

# Count total users in DB
sqlite3 tutor.db "SELECT COUNT(*) FROM users;"

# Count paid users
sqlite3 tutor.db "SELECT COUNT(*) FROM users WHERE is_active = 1;"

# Today's session count
sqlite3 tutor.db "SELECT COUNT(*) FROM sessions WHERE date = date('now');"

# Check audio cache folder size
du -sh ~/tutor-app/audio/
Good luck with your build! Start with the Oracle VM setup, then Whisper, then Gemini, then the backend. Get the API working end-to-end before touching Flutter. Test with Postman first, since it's much faster to debug. Your first real user paying ₹1 will feel incredible.