The Vision
The oral examination — the viva — is the most human assessment in academia. It is also the most broken. A student who studies for three years walks into a room for ten minutes. What happens in those ten minutes depends not on what they know but on who is sitting across from them.
TWELVE ends that.
TWELVE is a kiosk-based AI examination system installed on college-owned computers. It is not a replacement for education. It is a replacement for the single most biased moment in education — the oral viva conducted by a human examiner who has opinions, moods, favourites, and blind spots.
We are not anti-teacher. We are anti-bias.
What TWELVE Does
- Identifies the student before the session begins
- Retrieves their submission history
- Conducts a structured oral examination through voice and text
- Evaluates responses against the college's own rubric
- Produces a final score with a complete, auditable transcript
Every student. Same quality of questioning. Same standard of evaluation. The score reflects only what they know.
The Name
Twelve is a number associated with completeness. Twelve months in a year. Twelve hours on a clock face. Twelve members on a jury. TWELVE the system is exactly that: a complete, full-cycle assessment that misses nothing, favours no one, and produces a judgment that stands.
The Core Problem
The oral examination system in Indian colleges is broken at three levels simultaneously: the biased examiner, the dishonest student, and the institution that runs vivas without a system. Each villain hurts a different group. All three are enabled by the same root cause: there is no system.
What they do: Assign marks based on favouritism, personal relationships, attendance records, perceived attitude, or simple dislike.
Who gets hurt: The sincere student — the one who studied, understood the material, but was never given space to properly demonstrate knowledge.
Why devastating: Unlike written exams, a viva leaves no record. No transcript. No evidence. The student has no way to prove what they said or challenge what they were marked.
What they do: Submit AI-generated or copy-pasted work without understanding. Prepare scripted answers for likely questions. Walk out with full marks because the examiner didn't drill deep enough.
Why solvable: A dishonest student cannot answer a question they didn't prepare for. TWELVE asks questions they cannot prepare for — because they are generated from the student's own submission in real time.
What they do: Run oral exams with no consistent rubric, no audit trail, no way to compare results across sections or years, and no appeals process.
Who gets hurt: Everyone. Parents who cannot understand marks. Students who cannot appeal without evidence. Regulators with no visibility.
The root cause: No one built a system to replace human subjectivity in oral examination with something better. TWELVE is that system.
The Solution
TWELVE replaces the biased human examiner in the oral viva with a structured, AI-driven interview system. The system:
- Authenticates the student before the session begins
- Reads their actual submitted work before asking a single question
- Generates personalised, case-based questions from that work
- Drills down on every answer with cross-questions
- Detects gaps between claimed understanding and actual understanding
- Tests core subject knowledge based on curriculum
- Evaluates every response against the college's own rubric
- Produces a final score using a panel of AI models
- Stores a complete transcript for appeal and audit
No favouritism. No mood. No personal history. No recognition of face or name. Only what the student knows.
Core Beliefs
Bias is not personal. It is structural.
Individual professors are not all bad people. The system gives them no constraints, no rubric, no record, and no accountability. The bias is the inevitable output of a broken structure. Fix the structure. Fix the bias.
A viva should test knowledge, not confidence.
The student who is nervous and knows everything should not be penalised for their nerves. The student who is confident and knows nothing should not be rewarded for their confidence. TWELVE separates the two completely.
Cross-questioning is the only real test.
Any student can memorise an answer. A student who actually understands a concept can answer the same concept asked six different ways. TWELVE asks the same concept six different ways. Memorisation fails. Understanding passes.
The appeal process is not a weakness.
The AI decides. The professor reviews on appeal. This is not a compromise. This is the correct structure. The AI is the examiner. The professor is the appellate authority. The transcript is the evidence.
Transparency is the product.
Every question TWELVE asked. Every answer the student gave. Every score with reasoning. All of it stored. All of it accessible. Nothing hidden. Transparency is not a feature. It is the foundation.
Start narrow. Prove the core. Then scale.
TWELVE starts with BTech CSE viva examinations. One subject domain where answers are objective, verifiable, and defensible. Prove it works there first. Then expand.
Session Flow
This is what happens when a student sits down for their TWELVE viva. Every step is deliberate. Every step serves a purpose.
Kiosk Activation
The college computer boots into TWELVE kiosk mode. No other application is accessible — no browser, no file explorer, no task manager. The student sees only the TWELVE interface.
Student Identification
The camera activates. TWELVE prompts the student for their name and roll number, which the student types or speaks. TWELVE cross-verifies both against the college's student database. At the POC stage, identification is by name and roll number only. The camera is active exclusively for eye-contact monitoring, not identity verification.
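The cross-verification step can be sketched as follows. This is a minimal illustration, assuming the college database can be exposed as a mapping from roll number to registered name. `STUDENT_DB`, `normalise`, and `verify_identity` are hypothetical names, and the records are invented sample data.

```python
# Hypothetical in-memory stand-in for the college's student database.
STUDENT_DB = {
    "21CSE042": "Ananya Sharma",
    "21CSE043": "Rahul Verma",
}


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so typed and transcribed input compare equally."""
    return " ".join(text.lower().split())


def verify_identity(name: str, roll_number: str, db: dict[str, str] = STUDENT_DB) -> bool:
    """Return True only when the roll number exists and the name matches its record."""
    registered = db.get(roll_number.strip().upper())
    return registered is not None and normalise(registered) == normalise(name)
```

Matching on the roll number first and the name second means a spoken name with odd spacing or casing still verifies, while a correct name paired with someone else's roll number does not.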
Submission Retrieval
Once identity is confirmed, TWELVE silently retrieves the student's submission data: the project submission, the assigned Problem Statement (PS), the assignment history for the current semester, and any prior assessment records.
Pre-Session Analysis
TWELVE reads the student's submission and builds a question tree silently while a welcome screen is shown. The AI identifies key technical terms used, the PS and approach taken, technologies mentioned, and any claims that can be probed.
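One possible shape for the question tree is sketched below. It assumes the probe targets (terms, technologies, claims) have already been extracted from the submission. `QuestionNode`, `build_question_tree`, and the templated prompts are hypothetical placeholders; in TWELVE the question text would come from the question generator, not from string templates.

```python
from dataclasses import dataclass, field


@dataclass
class QuestionNode:
    """One question plus the follow-ups that may be asked beneath it."""
    prompt: str
    source: str  # the term or claim in the submission that motivated the question
    children: list["QuestionNode"] = field(default_factory=list)

    def add_follow_up(self, prompt: str) -> "QuestionNode":
        """Attach a deeper question under this one and return it."""
        child = QuestionNode(prompt=prompt, source=self.source)
        self.children.append(child)
        return child


def build_question_tree(problem_statement: str, probe_targets: list[str]) -> QuestionNode:
    """Root the tree at the Problem Statement, with one branch per probe target."""
    root = QuestionNode(
        prompt=f"Explain your Problem Statement: {problem_statement}",
        source="problem statement",
    )
    for target in probe_targets:
        root.children.append(
            QuestionNode(prompt=f"Why did you choose {target}?", source=target)
        )
    return root


def count_questions(node: QuestionNode) -> int:
    """Count every question in the tree, including the root."""
    return 1 + sum(count_questions(child) for child in node.children)
```

Keeping the `source` on every node lets the transcript show which part of the submission each question came from.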
PS Verification and Project Deep Dive
TWELVE opens with the student's own Problem Statement, then drills into the submission using the student's own words against them. Questions cannot be prepared for because they are built from the student's submission in real time.
Cross-Questioning
The same concept is asked again from a different angle, in a different scenario, with different framing. A student who memorised an answer will give the same scripted response to all variants. A student who understood will adapt. TWELVE detects the difference.
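The document does not specify how the difference is detected. As a rough sketch only, one crude signal is lexical similarity between the answers given to the variants: near-identical answers to differently framed questions suggest a script. The function names and the 0.9 threshold below are assumptions, and a real system would need a semantic comparison, not just a character-level one.

```python
from difflib import SequenceMatcher
from itertools import combinations


def pairwise_similarity(answers: list[str]) -> float:
    """Mean character-level similarity (0.0 to 1.0) across all pairs of answers."""
    pairs = list(combinations(answers, 2))
    if not pairs:
        return 0.0
    ratios = [SequenceMatcher(None, a.lower(), b.lower()).ratio() for a, b in pairs]
    return sum(ratios) / len(ratios)


def looks_scripted(answers: list[str], threshold: float = 0.9) -> bool:
    """Flag a set of variant answers that are nearly identical to one another."""
    return pairwise_similarity(answers) >= threshold
```

A flag from this check would be an input to the panel's evaluation, not a verdict on its own.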
Core Subject Knowledge
After project-specific questioning, TWELVE shifts to curriculum-based knowledge for the student's current semester. Questions are randomised from the curated question bank.
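Randomised selection from the question bank might look like the sketch below. The bank contents, `draw_questions`, and the idea of seeding the draw per session are assumptions for illustration; the seed is used here so that a draw could be reproduced later from the session log.

```python
import random

# Hypothetical question bank entries: (semester, question text).
QUESTION_BANK = [
    (3, "What is the difference between a process and a thread?"),
    (3, "Explain how a hash table resolves collisions."),
    (3, "What does normalisation achieve in a relational schema?"),
    (4, "Describe the TCP three-way handshake."),
    (4, "What is a deadlock and how can it be prevented?"),
]


def draw_questions(semester: int, count: int, session_seed: int) -> list[str]:
    """Pick up to `count` distinct questions for one semester, reproducibly per session."""
    pool = [text for sem, text in QUESTION_BANK if sem == semester]
    rng = random.Random(session_seed)  # seeded generator: same seed, same draw
    return rng.sample(pool, k=min(count, len(pool)))
```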
Session Close and Feedback
TWELVE closes with brief spoken feedback: what areas were strong, what needs development, one concrete suggestion for improvement. This feedback is developmental only — not part of the score.
Score Generation and Logging
After the student leaves: final score computed by AI panel, full transcript saved, session log stored for audit and appeal, next candidate called.
System Architecture
The AI Panel
TWELVE does not use a single AI model to evaluate answers. It uses a panel of multiple models — each trained on a different dataset with a different reasoning approach — to evaluate every student response independently.
Model A (Curriculum): Evaluates factual and technical correctness against academic standards. Trained on CSE textbooks, university syllabi, and exam papers.
Model B (Application): Evaluates whether the student can apply the concept. Trained on industry problem-solving patterns, technical interviews, and case studies.
Model C (Coherence): Evaluates internal consistency across the session. Trained on cross-examination transcripts and Q&A sessions.
The Aggregator
After all three models score independently, the Aggregator collects the scores, applies the college's pre-defined rubric weightage per section, computes a weighted average, and flags any answer where panel disagreement exceeds a defined threshold for potential human review.
Key principle: Three models trained on different data are not an echo chamber. They will produce genuinely different evaluations. The aggregator is not averaging sameness — it is reconciling difference.
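The Aggregator's two jobs, combining panel scores and flagging disagreement, can be sketched as below. This is an illustration under assumptions: disagreement is measured as the gap between the highest and lowest panel score, the per-answer score is a plain mean, and the threshold of 2.0 is invented. The document defines none of these.

```python
from statistics import mean


def aggregate_answer(panel_scores: dict[str, float], disagreement_threshold: float = 2.0) -> dict:
    """Combine per-model scores for one answer and flag large disagreement."""
    values = list(panel_scores.values())
    spread = max(values) - min(values)
    return {
        "score": mean(values),
        "spread": spread,
        "needs_review": spread > disagreement_threshold,
    }


def weighted_total(section_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of section scores using the college's rubric weightage."""
    total_weight = sum(weights.values())
    return sum(section_scores[name] * w for name, w in weights.items()) / total_weight
```

For example, panel scores of 9, 4, and 8 average to 7 but have a spread of 5, so that answer would be flagged for potential human review instead of being silently averaged.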
Questioning Strategy
Layer 1 — Submission-Specific Questions
Generated from reading the student's own submitted work. No two students receive the same questions because no two students submitted the same work. Questions never ask what the student already wrote — they ask what follows from it.
Layer 2 — Drill-Down Questions
For every answer given, TWELVE generates a follow-up that goes one level deeper.
Layer 3 — Case-Based Variant Questions
Same concept. Different scenario. Different framing. Asked to verify understanding is transferable. A student who memorised "merge sort is O(n log n)" will give the same answer to all three variants. A student who understands sorting will give three different answers.
Layer 4 — Core Subject Questions
From the curated question bank. Semester and year specific. Randomised per session. Cannot be predicted or prepared for as a specific set.
Marking and Evaluation Framework
Who Defines the Rubric
The college defines the marking rubric before the examination begins. Professors set the criteria beforehand: what sections are assessed, what weightage each section carries, what constitutes a passing answer, and what the total marks are. TWELVE applies the institution's standards with perfect consistency — something a human examiner cannot guarantee across 60 students over two hours.
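A rubric fixed before the examination could be represented as a small validated structure. The sketch below is an assumption about shape only: `Rubric`, its field names, and the example weights are hypothetical, and it covers section weightage and total marks but not the criteria for a passing answer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rubric:
    """Marking rubric set by the college before the examination begins."""
    total_marks: int
    section_weights: dict[str, float]  # share of the total carried by each section

    def validate(self) -> None:
        """Reject a rubric whose marks or weights are not internally consistent."""
        if self.total_marks <= 0:
            raise ValueError("total_marks must be positive")
        if abs(sum(self.section_weights.values()) - 1.0) > 1e-9:
            raise ValueError("section weights must sum to 1.0")

    def final_marks(self, section_fractions: dict[str, float]) -> float:
        """Convert per-section performance (0.0 to 1.0) into marks out of total_marks."""
        weighted = sum(
            self.section_weights[name] * section_fractions[name]
            for name in self.section_weights
        )
        return round(weighted * self.total_marks, 2)
```

Validating the rubric once, before the first student sits down, is what allows the same weights to be applied unchanged to every session in the batch.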
What Gets Scored
- Project understanding — PS explanation, approach, and implementation reasoning
- Cross-question performance — ability to handle variant questions
- Core subject knowledge — curriculum-based questions
What Is Never Scored
- Eye contact — anti-cheat flag only, never a mark
- Vocal confidence — feedback only, never marks
- Speed of answering — time taken is logged but not scored
- Appearance or presentation
The Appeal Process
The AI decision is the default final mark. On appeal, initiated by the student within a defined window, the professor reviews:
- The full session transcript — every question TWELVE asked, word for word
- Every answer the student gave, transcribed and timestamped
- The AI's score for each answer with the reasoning provided
- The marking rubric used for evaluation
- Any eye-contact flags logged during the session
The professor may override the AI's mark if they find a clear and articulable error. The override must be documented — the professor must state what was wrong with the AI's scoring for the override to be recorded.
This is not a compromise. This is the correct structure.
The AI is the examiner. The professor is the appellate authority. The transcript is the evidence. This mirrors how functional judicial systems work — the trial court decides; appeals go to a higher authority; the record is the evidence.
If a student disputes their mark, the professor can point to the exact transcript. There is no "the professor didn't like me" argument available because the professor did not conduct the examination. The override itself is also logged and auditable.
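The rule that an override must be documented can be enforced at the point where the override is recorded. The sketch below is one way to do that; `OverrideRecord`, `record_override`, and all field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class OverrideRecord:
    """An auditable record of a professor changing an AI-assigned mark."""
    session_id: str
    professor_id: str
    original_mark: float
    revised_mark: float
    reason: str
    recorded_at: str


def record_override(
    session_id: str,
    professor_id: str,
    original_mark: float,
    revised_mark: float,
    reason: str,
) -> OverrideRecord:
    """Create an override record, refusing any override without a stated reason."""
    if not reason or not reason.strip():
        raise ValueError("an override must state what was wrong with the AI's scoring")
    return OverrideRecord(
        session_id=session_id,
        professor_id=professor_id,
        original_mark=original_mark,
        revised_mark=revised_mark,
        reason=reason.strip(),
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )
```

Because the record is frozen and keeps both the original and revised marks, the override can be audited later without consulting any other table.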
Technology Stack
Frontend — Kiosk Interface
- Kiosk shell: Electron.js — cross-platform desktop kiosk, OS-level lockdown
- UI framework: React with Tailwind CSS
- Speech-to-text: OpenAI Whisper (local)
- Text-to-speech: ElevenLabs API with pyttsx3 local fallback
- Eye-contact analysis: MediaPipe Face Mesh via OpenCV
Backend — Session Engine
- API server: FastAPI (Python) — high performance, async
- Session management: Redis
- LLM integration: Anthropic Claude API (primary), OpenAI GPT-4o (fallback)
- Fine-tuned models: Hugging Face Transformers
- Document parsing: PyMuPDF, python-docx, pypdf
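The primary and fallback arrangement for LLM integration can be sketched without reference to any vendor SDK. In the illustration below each provider is just a callable that takes a prompt and returns text; `generate_with_fallback` is a hypothetical helper, and the real client calls for either API are deliberately left out.

```python
from typing import Callable, Sequence


def generate_with_fallback(
    prompt: str,
    providers: Sequence[tuple[str, Callable[[str], str]]],
) -> tuple[str, str]:
    """Try each provider in order; return (provider_name, response) from the first success."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except Exception as exc:  # any provider failure moves on to the next one
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Returning the provider name alongside the response means the transcript can record which model produced each question.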
AI Panel Models
- Model A (Curriculum): Mistral 7B fine-tuned on academic CSE textbooks
- Model B (Application): LLaMA 3 fine-tuned on technical interview datasets
- Model C (Coherence): Mistral 7B fine-tuned on Q&A transcripts
- Question Generator: Claude API (prompt-engineered)
Database
- Primary DB: PostgreSQL with JSONB support
- Session cache: Redis
- File storage: Local filesystem (POC) then AWS S3 at scale
Kiosk OS Lockdown
- OS: Windows 10/11 LTSC — most common in Indian college labs
- Kiosk mode: Windows Assigned Access with Electron
- USB blocking: Windows Group Policy
- Network: Local network only during session
Roadmap
2 weeks
Prove the Core
Submission parser. Question generator via Claude API. Student answer interface. Basic scoring. Transcript output. Local prototype only.
4–8 weeks
First Real Batch
Kiosk mode. Full session flow. Identity verification. Multi-model panel. Eye-contact monitoring. Admin portal. Run one real batch at own college in parallel with traditional viva.
First External Sale
Comparative pilot data ready. Approach two to three nearby engineering colleges. Multi-college isolation. Formal admin portal. Begin question bank expansion.
Scale
LMS integration. Multi-department expansion. Self-service onboarding. Biometric verification post legal clearance. 20+ college target.
The final test for every feature:
Does this make the viva fairer — or does it just make it faster? If it just makes it faster, rebuild it until it makes it fairer. A faster unfair examination is not a product. It is a more efficient injustice.
The One Thing That Cannot Be Compromised
The transcript is sacred. Every question TWELVE asked. Every answer the student gave. Every score with its reasoning. Stored. Immutable. Accessible.
If TWELVE ever deletes a transcript, edits a transcript, or hides a transcript — it has become the same biased examiner it was built to replace. The transcript is the difference between TWELVE and everything that came before it.
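One common way to make edits to a stored transcript detectable is a hash chain, in which every entry includes the hash of the entry before it. The sketch below is an assumption about how immutability could be checked, not a description of TWELVE's storage. `TranscriptLog` is a hypothetical class, and detecting truncation from the end would additionally require keeping the latest hash somewhere outside the log.

```python
import hashlib
import json


class TranscriptLog:
    """Append-only log in which each entry commits to the hash of the previous one."""

    def __init__(self) -> None:
        self._entries: list[dict] = []

    @staticmethod
    def _digest(body: dict) -> str:
        """SHA-256 over a canonical JSON encoding of the entry body."""
        payload = json.dumps(body, sort_keys=True, ensure_ascii=False).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()

    def append(self, speaker: str, text: str) -> dict:
        """Add one utterance, linking it to the previous entry's hash."""
        prev_hash = self._entries[-1]["hash"] if self._entries else "0" * 64
        body = {
            "index": len(self._entries),
            "speaker": speaker,
            "text": text,
            "prev_hash": prev_hash,
        }
        entry = {**body, "hash": self._digest(body)}
        self._entries.append(entry)
        return entry

    def verify(self) -> bool:
        """Recompute every hash; an edited, reordered, or mid-log deleted entry breaks the chain."""
        prev_hash = "0" * 64
        for index, entry in enumerate(self._entries):
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["index"] != index or entry["prev_hash"] != prev_hash:
                return False
            if self._digest(body) != entry["hash"]:
                return False
            prev_hash = entry["hash"]
        return True
```

With this structure, changing a single word in an early answer invalidates that entry's hash and every check that follows it.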