AI-Powered Viva Assessment

Fairness Through AI

No fear. No favour. Only knowledge.

Version 1.0  —  May 2026  —  Stage: Pre-Build / Ideation Complete

10 min Per Student
0% Examiner Bias
100% Transparency
3x Faster Throughput

The Vision

The oral examination — the viva — is the most human assessment in academia. It is also the most broken. A student who studies for three years walks into a room for ten minutes. What happens in those ten minutes depends not on what they know but on who is sitting across from them.

TWELVE ends that.

TWELVE is a kiosk-based AI examination system installed on college-owned computers. It is not a replacement for education. It is a replacement for the single most biased moment in education — the oral viva conducted by a human examiner who has opinions, moods, favourites, and blind spots.

We are not anti-teacher. We are anti-bias.

What TWELVE Does

  • Identifies the student before the session begins
  • Retrieves their submission history
  • Conducts a structured oral examination through voice and text
  • Evaluates responses against the college's own rubric
  • Produces a final score with a complete, auditable transcript

Every student. Same quality of questioning. Same standard of evaluation. Score reflects only what they know.

The Name

Twelve is a number associated with completeness. Twelve months in a year. Twelve hours on a clock face. Twelve members on a jury. TWELVE the system is exactly that: a complete, full-cycle assessment that misses nothing, favours no one, and produces a judgment that stands.

The Core Problem

The oral examination system in Indian colleges is broken at three levels simultaneously. Each villain hurts a different group. All three are enabled by the same root cause: there is no system.

Villain 1 — The Biased Professor

What they do: Assign marks based on favouritism, personal relationships, attendance records, perceived attitude, or simple dislike.

Who gets hurt: The sincere student — the one who studied, understood the material, but was never given space to properly demonstrate knowledge.

Why devastating: Unlike written exams, a viva leaves no record. No transcript. No evidence. The student has no way to prove what they said or challenge what they were marked.

Villain 2 — The Dishonest Student

What they do: Submit AI-generated or copy-pasted work without understanding. Prepare scripted answers for likely questions. Walk out with full marks because the examiner didn't drill deep enough.

Why solvable: A dishonest student cannot answer a question they didn't prepare for. TWELVE asks questions they cannot prepare for — because they are generated from the student's own submission in real time.

Villain 3 — The Unstandardized Institution

What they do: Run oral exams with no consistent rubric, no audit trail, no way to compare results across sections or years, and no appeals process.

Who gets hurt: Everyone. Parents who cannot understand marks. Students who cannot appeal without evidence. Regulators with no visibility.

The root cause: No one built a system to replace human subjectivity in oral examination with something better. TWELVE is that system.

The Solution

TWELVE replaces the biased human examiner in the oral viva with a structured, AI-driven interview system. The system:

  • Authenticates the student before the session begins
  • Reads their actual submitted work before asking a single question
  • Generates personalised, case-based questions from that work
  • Drills down on every answer with cross-questions
  • Detects gaps between claimed understanding and actual understanding
  • Tests core subject knowledge based on curriculum
  • Evaluates every response against the college's own rubric
  • Produces a final score using a panel of AI models
  • Stores a complete transcript for appeal and audit

No favouritism. No mood. No personal history. No recognition of face or name. Only what the student knows.

Core Beliefs

01
Bias is not personal. It is structural.

Individual professors are not all bad people. The system gives them no constraints, no rubric, no record, and no accountability. The bias is the inevitable output of a broken structure. Fix the structure. Fix the bias.

02
A viva should test knowledge, not confidence.

The student who is nervous and knows everything should not be penalised for their nerves. The student who is confident and knows nothing should not be rewarded for their confidence. TWELVE separates the two completely.

03
Cross-questioning is the only real test.

Any student can memorise an answer. A student who actually understands a concept can answer the same concept asked six different ways. TWELVE asks the same concept six different ways. Memorisation fails. Understanding passes.

04
The appeal process is not a weakness.

The AI decides. The professor reviews on appeal. This is not a compromise. This is the correct structure. The AI is the examiner. The professor is the appellate authority. The transcript is the evidence.

05
Transparency is the product.

Every question TWELVE asked. Every answer the student gave. Every score with reasoning. All of it stored. All of it accessible. Nothing hidden. Transparency is not a feature. It is the foundation.

06
Start narrow. Prove the core. Then scale.

TWELVE starts with BTech CSE viva examinations. One subject domain where answers are objective, verifiable, and defensible. Prove it works there first. Then expand.

Session Flow

This is what happens when a student sits down for their TWELVE viva. Every step is deliberate. Every step serves a purpose.

0:00
Kiosk Activation

The college computer boots into TWELVE kiosk mode. No other application is accessible — no browser, no file explorer, no task manager. The student sees only the TWELVE interface.

0:30
Student Identification

Camera activates. TWELVE prompts the student to state their Name and Roll Number. The student types or speaks their credentials. TWELVE cross-verifies against the college's student database. At POC stage, identification is by Name and Roll Number only. Camera is active exclusively for eye-contact monitoring, not identity verification.
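The POC identification step can be sketched as a simple lookup. This is a minimal illustration, not the actual module: `StudentRecord`, `verify_identity`, the roster dictionary, and the sample roll number are all hypothetical names, and a real deployment would query the college's student database instead of an in-memory dict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StudentRecord:
    roll_number: str
    name: str


def normalise(text: str) -> str:
    """Lower-case and collapse whitespace so ' Asha  Rao' matches 'asha rao'."""
    return " ".join(text.lower().split())


def verify_identity(name: str, roll_number: str,
                    roster: dict[str, StudentRecord]) -> bool:
    """Return True only when the roll number exists and the stated name matches it."""
    record = roster.get(roll_number.strip().upper())
    if record is None:
        return False
    return normalise(record.name) == normalise(name)


# Hypothetical roster keyed by upper-cased roll number (assumption for illustration).
roster = {"21CSE042": StudentRecord("21CSE042", "Asha Rao")}
print(verify_identity(" asha  rao", "21cse042", roster))
```

Spoken credentials would first pass through speech-to-text, so some tolerance for transcription errors would likely be needed on top of this exact match.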

1:00
Submission Retrieval

Once identity is confirmed, TWELVE silently retrieves the student's submission data: project submission, Problem Statement assigned, assignment history for the current semester, and any prior assessment records.

1:30
Pre-Session Analysis

TWELVE reads the student's submission and silently builds a question tree while a welcome screen is shown. The AI identifies the key technical terms used, the Problem Statement (PS) and the approach taken, the technologies mentioned, and any claims that can be probed.
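One way to picture the question tree is as nodes that each trace back to a claim in the submission. The sketch below is an assumption about the data structure, not a description of the built system: `QuestionNode`, `source_claim`, and `add_follow_up` are hypothetical names, and the question text is written by hand here where the real system would generate it.

```python
from dataclasses import dataclass, field


@dataclass
class QuestionNode:
    text: str            # the question as it would be asked
    source_claim: str    # the claim in the submission this question probes
    follow_ups: list["QuestionNode"] = field(default_factory=list)

    def add_follow_up(self, text: str) -> "QuestionNode":
        """Attach a deeper question that probes the same claim."""
        child = QuestionNode(text=text, source_claim=self.source_claim)
        self.follow_ups.append(child)
        return child

    def depth(self) -> int:
        """Number of levels from this node down to its deepest follow-up."""
        return 1 + max((c.depth() for c in self.follow_ups), default=0)


root = QuestionNode(
    text="Why did you choose a hash table for lookups?",
    source_claim="Used a hash table for O(1) lookup",
)
level2 = root.add_follow_up("What happens to lookup time when collisions occur?")
level2.add_follow_up("How would you keep collisions rare as the data grows?")
print(root.depth())
```

Keeping `source_claim` on every node means each question in the transcript can be traced back to the part of the submission that prompted it.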

3:00
PS Verification and Project Deep Dive

TWELVE opens with the student's own Problem Statement, then drills into the submission using the student's own words against them. Questions cannot be prepared for because they are built from the student's submission in real time.

6:00
Cross-Questioning

The same concept is asked again from a different angle, in a different scenario, with different framing. A student who memorised an answer will give the same scripted response to all variants. A student who understood will adapt. TWELVE detects the difference.
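As a rough illustration of how a scripted response might be flagged, the sketch below compares the answers given to variant questions and reports when they are nearly identical. It uses lexical similarity from Python's standard `difflib`, which is only a crude proxy for the detection described above. The function names and the 0.9 threshold are assumptions, and a flag like this would be evidence for the evaluator, not a verdict.

```python
from difflib import SequenceMatcher
from itertools import combinations


def mean_pairwise_similarity(answers: list[str]) -> float:
    """Average character-level similarity (0..1) over every pair of answers."""
    pairs = list(combinations(answers, 2))
    if not pairs:
        return 0.0
    ratios = [SequenceMatcher(None, a.lower(), b.lower()).ratio() for a, b in pairs]
    return sum(ratios) / len(ratios)


def looks_scripted(answers: list[str], threshold: float = 0.9) -> bool:
    """Flag a set of variant answers that are almost word-for-word the same."""
    return mean_pairwise_similarity(answers) >= threshold


scripted = ["Merge sort is O(n log n)."] * 3
adapted = [
    "Merge sort is O(n log n) in every case.",
    "On nearly sorted input I would consider insertion sort instead.",
    "For data that does not fit in memory, I would merge sorted chunks from disk.",
]
print(looks_scripted(scripted), looks_scripted(adapted))
```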

7:30
Core Subject Knowledge

After project-specific questioning, TWELVE shifts to curriculum-based knowledge for the student's current semester. Questions are randomised from the curated question bank.

8:30
Session Close and Feedback

TWELVE closes with brief spoken feedback: what areas were strong, what needs development, one concrete suggestion for improvement. This feedback is developmental only — not part of the score.

10:00
Score Generation and Logging

After the student leaves: final score computed by AI panel, full transcript saved, session log stored for audit and appeal, next candidate called.

System Architecture

[College Admin Interface]
        |
        |  uploads student data, curriculum, rubric
        v
[TWELVE Data Layer]
    - Student records
    - Submission files
    - Curriculum question bank
    - Marking rubric
        |
        v
[Kiosk Interface: College Computer]
    - Full OS lockdown
    - Camera feed (eye-contact monitoring)
    - Audio input (student voice)
    - Display (questions + AI avatar)
        |
        v
[Session Engine]
    - Identity verification module
    - Submission retrieval
    - Question tree generator
    - Real-time response evaluator
    - Cross-question logic engine
    - Session transcript logger
        |
        v
[AI Panel: Multi-Model Scoring]
    - Model A: Factual correctness (40%)
    - Model B: Application depth (40%)
    - Model C: Coherence consistency (20%)
    - Aggregator: Weighted average
        |
        v
[Output Layer]
    - Final score
    - Full session transcript
    - Audit log
    - Feedback report
    - Appeal-ready package

The AI Panel

TWELVE does not use a single AI model to evaluate answers. It uses a panel of multiple models — each trained on a different dataset with a different reasoning approach — to evaluate every student response independently.

Model A: Curriculum Accuracy (40%)

Evaluates factual and technical correctness against academic standards. Trained on CSE textbooks, university syllabi, and exam papers.

Model B: Application Depth (40%)

Evaluates whether the student can apply the concept. Trained on industry problem-solving patterns, technical interviews, and case studies.

Model C: Coherence Consistency (20%)

Evaluates internal consistency across the session. Trained on cross-examination transcripts and Q&A sessions.

The Aggregator

After all three models score independently, the Aggregator collects the scores, applies the college's pre-defined rubric weightage per section, computes a weighted average, and flags any answer where panel disagreement exceeds a defined threshold for potential human review.
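The panel arithmetic for a single answer can be sketched as follows. The 40/40/20 weights come from the panel description. The 0 to 10 score scale, the disagreement threshold of 3 points, and all names in the code are assumptions made for illustration. The per-section rubric weightage is left out to keep the example short.

```python
from dataclasses import dataclass

# Panel weights as stated for Models A, B and C.
PANEL_WEIGHTS = {"model_a": 0.40, "model_b": 0.40, "model_c": 0.20}


@dataclass
class AggregatedAnswer:
    score: float              # weighted average on the same scale as the inputs
    needs_human_review: bool  # True when the panel disagrees too much


def aggregate(scores: dict[str, float],
              disagreement_threshold: float = 3.0) -> AggregatedAnswer:
    """Combine one answer's three model scores (assumed 0-10 scale)."""
    if set(scores) != set(PANEL_WEIGHTS):
        raise ValueError("Expected exactly one score from each panel model.")
    weighted = sum(PANEL_WEIGHTS[model] * s for model, s in scores.items())
    spread = max(scores.values()) - min(scores.values())
    return AggregatedAnswer(round(weighted, 2), spread > disagreement_threshold)


agreeing = aggregate({"model_a": 8, "model_b": 6, "model_c": 7})
split = aggregate({"model_a": 9, "model_b": 4, "model_c": 8})
print(agreeing, split)
```

With these inputs the first answer works out to 0.4 × 8 + 0.4 × 6 + 0.2 × 7 = 7.0 and is not flagged. The second works out to 6.8 and is flagged, because the highest and lowest model scores differ by 5 points.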

Key principle: Three models trained on different data are not an echo chamber. They will produce genuinely different evaluations. The aggregator is not averaging sameness — it is reconciling difference.

Questioning Strategy

Layer 1 — Submission-Specific Questions

Generated from reading the student's own submitted work. No two students receive the same questions because no two students submitted the same work. Questions never ask what the student already wrote — they ask what follows from it.

Layer 2 — Drill-Down Questions

For every answer given, TWELVE generates a follow-up that goes one level deeper.

  • "You said you used a hash table for O(1) lookup. What happens to your lookup time when hash collisions occur?"
  • "You said the API returned JSON. What would you do if the API returned malformed JSON?"
  • "You said you tested this on 100 records. What would break if it ran on 10 million records?"

Layer 3 — Case-Based Variant Questions

Same concept. Different scenario. Different framing. Asked to verify that understanding is transferable. A student who memorised "merge sort is O(n log n)" will give the same answer to every variant. A student who understands sorting will adapt the answer to each scenario.

Layer 4 — Core Subject Questions

From the curated question bank. Semester and year specific. Randomised per session. Cannot be predicted or prepared for as a specific set.
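A per-session draw from the question bank might look like the sketch below. Seeding the random generator with the session identifier is a design assumption, not something the document specifies. It keeps the draw unpredictable to the student but reproducible for audit. `BankQuestion`, `draw_questions`, and the sample bank are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class BankQuestion:
    question_id: str
    semester: int
    text: str


def draw_questions(bank: list[BankQuestion], semester: int,
                   count: int, session_id: str) -> list[BankQuestion]:
    """Pick `count` distinct questions for the semester, seeded by session."""
    eligible = [q for q in bank if q.semester == semester]
    if count > len(eligible):
        raise ValueError("Not enough questions in the bank for this semester.")
    rng = random.Random(session_id)  # same session_id reproduces the same draw
    return rng.sample(eligible, count)


# Hypothetical bank: odd-numbered questions belong to semester 5, even to 6.
bank = [BankQuestion(f"Q{i}", 5 if i % 2 else 6, f"Question text {i}")
        for i in range(10)]
picked = draw_questions(bank, semester=5, count=3, session_id="session-001")
print([q.question_id for q in picked])
```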

Marking and Evaluation Framework

Who Defines the Rubric

The college defines the marking rubric before the examination begins. Professors set the criteria beforehand: what sections are assessed, what weightage each section carries, what constitutes a passing answer, and what the total marks are. TWELVE applies the institution's standards with perfect consistency — something a human examiner cannot guarantee across 60 students over two hours.

What Gets Scored

  • Project understanding — PS explanation, approach, and implementation reasoning
  • Cross-question performance — ability to handle variant questions
  • Core subject knowledge — curriculum-based questions

What Is Never Scored

  • Eye contact — anti-cheat flag only, never a mark
  • Vocal confidence — feedback only, never marks
  • Speed of answering — time taken is logged but not scored
  • Appearance or presentation

The Appeal Process

The AI decision is the default final mark. On appeal, initiated by the student within a defined window, the professor reviews:

  1. The full session transcript — every question TWELVE asked, word for word
  2. Every answer the student gave, transcribed and timestamped
  3. The AI's score for each answer with the reasoning provided
  4. The marking rubric used for evaluation
  5. Any eye-contact flags logged during the session

The professor may override the AI's mark if they find a clear and articulable error. The override must be documented — the professor must state what was wrong with the AI's scoring for the override to be recorded.
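The documentation requirement can be enforced at the point where an override is recorded. The sketch below assumes a simple immutable record. `OverrideRecord`, `record_override`, and the field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class OverrideRecord:
    session_id: str
    professor_id: str
    original_mark: float
    revised_mark: float
    stated_error: str
    recorded_at: datetime


def record_override(session_id: str, professor_id: str, original_mark: float,
                    revised_mark: float, stated_error: str) -> OverrideRecord:
    """Create an override only when the professor states what the AI got wrong."""
    if not stated_error.strip():
        raise ValueError("An override must state what was wrong with the AI's scoring.")
    return OverrideRecord(session_id, professor_id, original_mark, revised_mark,
                          stated_error.strip(), datetime.now(timezone.utc))


override = record_override(
    "session-001", "prof-17", 30.0, 34.0,
    "Answer 4 matches the rubric's accepted definition but was scored as incorrect.",
)
print(override.revised_mark)
```

An override with a blank reason raises an error instead of being stored, which keeps the override log as auditable as the transcript itself.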

This is not a compromise. This is the correct structure.

The AI is the examiner. The professor is the appellate authority. The transcript is the evidence. This mirrors how functional judicial systems work — the trial court decides; appeals go to a higher authority; the record is the evidence.

If a student disputes their mark, the professor can point to the exact transcript. There is no "the professor didn't like me" argument available because the professor did not conduct the examination. The override itself is also logged and auditable.

Technology Stack

Frontend — Kiosk Interface

  • Kiosk shell: Electron.js — cross-platform desktop kiosk, OS-level lockdown
  • UI framework: React with Tailwind CSS
  • Speech-to-text: OpenAI Whisper (local)
  • Text-to-speech: ElevenLabs API with pyttsx3 local fallback
  • Eye-contact analysis: MediaPipe Face Mesh, with OpenCV for camera capture

Backend — Session Engine

  • API server: FastAPI (Python) — high performance, async
  • Session management: Redis
  • LLM integration: Anthropic Claude API (primary), OpenAI GPT-4o (fallback)
  • Fine-tuned models: Hugging Face Transformers
  • Document parsing: PyMuPDF, python-docx, pypdf

AI Panel Models

  • Model A (Curriculum): Mistral 7B fine-tuned on academic CSE textbooks
  • Model B (Application): LLaMA 3 fine-tuned on technical interview datasets
  • Model C (Coherence): Mistral 7B fine-tuned on Q&A transcripts
  • Question Generator: Claude API (prompt-engineered)

Database

  • Primary DB: PostgreSQL with JSONB support
  • Session cache: Redis
  • File storage: Local filesystem (POC), then AWS S3 at scale

Kiosk OS Lockdown

  • OS: Windows 10/11 LTSC — most common in Indian college labs
  • Kiosk mode: Windows Assigned Access or Shell Launcher, hosting the Electron app
  • USB blocking: Windows Group Policy
  • Network: Local network only during session

Roadmap

Now — POC
2 weeks
Prove the Core

Submission parser. Question generator via Claude API. Student answer interface. Basic scoring. Transcript output. Local prototype only.

Next — Pilot
4–8 weeks
First Real Batch

Kiosk mode. Full session flow. Identity verification. Multi-model panel. Eye-contact monitoring. Admin portal. Run one real batch at own college in parallel with traditional viva.

Month 3–4
First External Sale

Comparative pilot data ready. Approach two to three nearby engineering colleges. Multi-college isolation. Formal admin portal. Begin question bank expansion.

Month 6+
Scale

LMS integration. Multi-department expansion. Self-service onboarding. Biometric verification post legal clearance. 20+ college target.

The final test for every feature:

Does this make the viva fairer — or does it just make it faster? If it just makes it faster, rebuild it until it makes it fairer. A faster unfair examination is not a product. It is a more efficient injustice.

The One Thing That Cannot Be Compromised

The transcript is sacred. Every question TWELVE asked. Every answer the student gave. Every score with its reasoning. Stored. Immutable. Accessible.

If TWELVE ever deletes a transcript, edits a transcript, or hides a transcript — it has become the same biased examiner it was built to replace. The transcript is the difference between TWELVE and everything that came before it.
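One common way to make edits to a stored transcript detectable is a hash chain, where each entry's hash covers the previous entry's hash. The sketch below is an assumption about how this could be done, not the system's design. It detects tampering but does not prevent it, so write-once storage and access controls would still be needed. All names are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload: dict) -> str:
    """SHA-256 over the previous hash plus a canonical JSON form of the payload."""
    body = json.dumps(payload, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(log: list[dict], payload: dict) -> None:
    """Append a question, answer, or score entry linked to the one before it."""
    prev = log[-1]["hash"] if log else GENESIS
    log.append({"payload": payload, "prev_hash": prev,
                "hash": entry_hash(prev, payload)})


def verify_log(log: list[dict]) -> bool:
    """Recompute every hash; any edited, removed, or reordered entry breaks the chain."""
    prev = GENESIS
    for entry in log:
        if entry["prev_hash"] != prev or entry["hash"] != entry_hash(prev, entry["payload"]):
            return False
        prev = entry["hash"]
    return True


transcript: list[dict] = []
append_entry(transcript, {"role": "twelve", "text": "Explain your Problem Statement."})
append_entry(transcript, {"role": "student", "text": "It schedules lab slots."})
print(verify_log(transcript))
```

Publishing the final hash to the student at the end of the session would let either side later prove that the stored transcript is the one that was produced.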