AI That Evaluates Work, Not Answers
ProveIQ's evaluation engine scores real work samples against employer rubrics — delivering consistent, explainable, bias-resistant results at scale.
Quiz Platforms vs. ProveIQ AI
The difference isn't just technical — it's philosophical. We evaluate capability, not test-taking skill.
Quiz Platforms
- Recall of memorised answers
- Binary right/wrong matching
- No feedback, or only a final percentage
- Question phrasing bias, test-taking tricks
- Low job relevance: tests knowledge, not performance
ProveIQ AI
- Applied reasoning on real tasks
- Multi-dimensional rubric scoring
- Detailed, dimension-level guidance
- Blind, criteria-anchored evaluation
- High job relevance: mirrors actual job tasks
The 5-Step AI Workflow
From rubric ingestion to score aggregation — every step is transparent, auditable, and employer-configurable.
Rubric Ingestion
ProveIQ ingests the employer-defined rubric, extracting scoring dimensions, weights, and quality benchmarks. The model is anchored to these criteria before a single submission is read.
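As a rough illustration of rubric ingestion, the sketch below validates a rubric before any scoring happens. The dictionary layout, the `Dimension` class, and the `ingest_rubric` function are all hypothetical names chosen for this example, not ProveIQ's actual rubric format or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dimension:
    name: str
    weight: float   # fraction of the overall score
    benchmark: str  # what strong work looks like for this dimension


def ingest_rubric(raw: dict) -> list[Dimension]:
    """Turn a rubric dict (hypothetical layout) into validated dimensions."""
    dims = [
        Dimension(d["name"], float(d["weight"]), d.get("benchmark", ""))
        for d in raw["dimensions"]
    ]
    if not dims:
        raise ValueError("rubric has no dimensions")
    if any(d.weight < 0 for d in dims):
        raise ValueError("weights must be non-negative")
    total = sum(d.weight for d in dims)
    if abs(total - 1.0) > 1e-9:
        raise ValueError(f"weights must sum to 1.0, got {total}")
    return dims


# Example rubric with made-up weights and benchmarks.
example_rubric = {
    "dimensions": [
        {"name": "Accuracy", "weight": 0.4, "benchmark": "No factual errors"},
        {"name": "Clarity", "weight": 0.3, "benchmark": "Readable on first pass"},
        {"name": "Completeness", "weight": 0.3, "benchmark": "Every requirement addressed"},
    ]
}
dimensions = ingest_rubric(example_rubric)
```

Rejecting a malformed rubric at this point means a weighting mistake surfaces before a submission is read, which is the intent of anchoring the model to the criteria first.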
Submission Analysis
The candidate's work is parsed end-to-end — structure, argumentation, evidence, and methodology — without any reference to the candidate's identity or background.
Multi-Dimensional Scoring
Each dimension (Accuracy, Clarity, Completeness, Creativity, Process) is scored independently on a 0–5 scale, then combined using the rubric weights into a final overall score.
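One plausible way to combine the dimension scores is a weighted average, sketched below. The weights are illustrative only, since the default weights are not listed in this text, and `overall_score` is a name invented for the example.

```python
def overall_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of per-dimension scores on the 0-5 scale."""
    if scores.keys() != weights.keys():
        raise ValueError("scores and weights must cover the same dimensions")
    for name, value in scores.items():
        if not 0 <= value <= 5:
            raise ValueError(f"{name} score {value} is outside 0-5")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    # Dividing by the total keeps the result on the 0-5 scale even if
    # the weights do not sum exactly to 1.
    return sum(scores[n] * weights[n] for n in scores) / total_weight


# Illustrative weights and scores (assumptions, not ProveIQ defaults).
weights = {"Accuracy": 0.30, "Clarity": 0.20, "Completeness": 0.20,
           "Creativity": 0.15, "Process": 0.15}
scores = {"Accuracy": 4.0, "Clarity": 3.0, "Completeness": 5.0,
          "Creativity": 2.0, "Process": 4.0}
print(round(overall_score(scores, weights), 2))  # prints 3.7
```

Because each dimension is scored independently, changing a weight only changes this final combination step; the per-dimension scores themselves stay the same.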
Feedback Generation
For every dimension, the AI writes specific, evidence-backed feedback citing passages from the submission. Candidates receive actionable guidance, not generic commentary.
Score Aggregation
Final scores are stored, auditable, and exportable. Employers can view cohort distributions, flag outliers, and apply overrides — all with a full audit trail.
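Outlier flagging across a cohort could work in many ways. The sketch below uses a simple z-score rule from Python's standard `statistics` module; the rule, the threshold of 2.0, and the candidate IDs are assumptions for illustration, not a description of ProveIQ's method.

```python
from statistics import mean, pstdev


def flag_outliers(cohort: dict[str, float], z_threshold: float = 2.0) -> list[str]:
    """Return candidate IDs whose overall score sits far from the cohort mean."""
    values = list(cohort.values())
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation of the cohort
    if sigma == 0:
        return []  # every score is identical, so nothing stands out
    return sorted(
        cid for cid, score in cohort.items()
        if abs(score - mu) / sigma > z_threshold
    )


# Made-up cohort: six similar scores and one very low one.
cohort = {"c1": 3.0, "c2": 3.1, "c3": 2.9, "c4": 3.0,
          "c5": 3.2, "c6": 3.0, "c7": 0.5}
outliers = flag_outliers(cohort)
```

A flag here is only a prompt for a reviewer to look closer; it says nothing on its own about whether the score is wrong.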
Score Dimensions & Default Weights
Our default rubric balances five dimensions. Employers can customise every weight to fit their role requirements.
Score Interpretation Scale
All scores are on a 0–5 scale. Here's what each band means in practice.
Sample Evaluation Result
This is what a candidate and employer both see after an assessment is processed — transparent, detailed, and actionable.
ProveIQ Score
Jordan Kim
Employer Override Controls
AI does the heavy lifting — but humans stay in control. Every output is adjustable before it reaches candidates.
Adjust Weights
Shift dimension weights to align with your specific role requirements. Emphasise Process for junior hires or Creativity for innovation roles — the AI re-scores accordingly.
Edit Feedback
Reviewers can annotate or rewrite AI-generated feedback before it reaches candidates, ensuring tone and content match your employer brand.
Override Scores
Every AI score can be reviewed and overridden by a qualified human reviewer. Overrides are logged to the audit trail with a mandatory reason field.
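The override rule (no change without a reason, and every change logged) can be sketched as follows. `ScoreRecord` and its fields are hypothetical names for this example; the real audit-trail schema is not described in this text.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ScoreRecord:
    candidate_id: str
    dimension: str
    score: float
    audit_trail: list[dict] = field(default_factory=list)

    def override(self, new_score: float, reviewer: str, reason: str) -> None:
        """Replace the AI score, refusing the change if no reason is given."""
        if not reason.strip():
            raise ValueError("an override reason is required")
        if not 0 <= new_score <= 5:
            raise ValueError("score must be on the 0-5 scale")
        # Log the change before applying it, so the old value is preserved.
        self.audit_trail.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "reviewer": reviewer,
            "from": self.score,
            "to": new_score,
            "reason": reason.strip(),
        })
        self.score = new_score


record = ScoreRecord("cand-001", "Clarity", 3.0)
record.override(3.5, reviewer="reviewer-7", reason="Structure was clearer than scored")
```

Validating the reason before touching the score means a rejected override leaves both the score and the audit trail unchanged.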
Built for Fairness
Six structural commitments that ensure every candidate is evaluated on merit alone.
Rubric-Anchored
Every score is derived directly from employer-defined criteria — not from the AI's priors or generalised quality heuristics.
Blind Evaluation
Candidate names, photos, institutions, and demographic signals are stripped before evaluation begins, eliminating identity-based bias.
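At its simplest, stripping identity signals means removing identifying fields from the record before the evaluator sees it. The field names below are assumptions made for this sketch, and it only covers structured metadata; removing a name that appears inside the submission text itself would need additional handling.

```python
# Hypothetical metadata keys treated as identity signals.
IDENTITY_FIELDS = {"name", "photo_url", "institution", "email", "date_of_birth"}


def redact(submission: dict) -> dict:
    """Return a copy of the submission without identity fields."""
    return {k: v for k, v in submission.items() if k not in IDENTITY_FIELDS}


submission = {
    "submission_id": "sub-001",
    "name": "Jordan Kim",
    "institution": "Example University",
    "body": "Quarterly churn analysis with recommendations.",
}
blind_copy = redact(submission)
```

Returning a copy rather than editing in place keeps the original record available for delivering results to the right candidate after scoring.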
Consistency Audits
Identical submissions produce identical scores. We run weekly consistency checks and surface any score drift above a tolerance threshold.
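A consistency check of this kind can be pictured as re-scoring a fixed set of submissions and comparing against the stored baseline. The function name, the submission IDs, and the tolerance of 0.1 are illustrative assumptions; the actual threshold is not stated here.

```python
def drift_report(
    baseline: dict[str, float],
    rerun: dict[str, float],
    tolerance: float = 0.1,
) -> dict[str, float]:
    """Map submission ID to score change, for changes larger than `tolerance`."""
    return {
        sid: round(rerun[sid] - baseline[sid], 3)
        for sid in baseline
        if sid in rerun and abs(rerun[sid] - baseline[sid]) > tolerance
    }


# Made-up scores from an earlier run and a re-run of the same submissions.
baseline = {"s1": 3.0, "s2": 4.2, "s3": 1.5}
rerun = {"s1": 3.0, "s2": 4.5, "s3": 1.55}
drifted = drift_report(baseline, rerun)
```

In this example only `s2` is reported, because the 0.05 movement on `s3` falls inside the tolerance.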
Explainability
Every score includes a traceable explanation referencing specific sections of the candidate's work — no opaque model outputs.
Human-in-the-Loop
Borderline scores (within 0.3 of a band boundary) are automatically flagged for optional human review before being finalised.
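The borderline rule reduces to a distance check against each band boundary. The 0.3 margin comes from the text above, but the boundary values used here are hypothetical, since the band cut-offs are not listed in this text.

```python
BAND_BOUNDARIES = (1.0, 2.0, 3.0, 4.0)  # hypothetical cut-offs between bands
MARGIN = 0.3                            # "within 0.3 of a band boundary"


def is_borderline(
    score: float,
    boundaries: tuple[float, ...] = BAND_BOUNDARIES,
    margin: float = MARGIN,
) -> bool:
    """True when the score is close enough to a boundary to warrant review."""
    return any(abs(score - b) <= margin for b in boundaries)


# Scores that would be flagged for optional human review.
flagged = [s for s in (0.5, 2.8, 3.2, 3.5, 4.6) if is_borderline(s)]
```

With these assumed boundaries, 2.8 and 3.2 are flagged because both sit within 0.3 of 3.0, while 3.5 sits clear of any boundary.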
Continuous Improvement
Scoring models are retrained quarterly on verified human-reviewed samples, with regression tests run before every production deployment.
Common Questions
Straight answers to the questions we hear most from employers and candidates.
The model is constrained to score only what is present in the submission, using the rubric as the sole evaluation frame. It cannot infer intent or award credit for absent content. All outputs are validated against a scoring schema before being stored.
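Validating an output against a scoring schema might look like the sketch below, which checks that every rubric dimension has exactly one in-range score. The result layout and the `validate_result` function are assumptions for illustration, not the platform's actual schema.

```python
def validate_result(result: dict, rubric_dimensions: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the result may be stored."""
    scores = result.get("scores")
    if not isinstance(scores, dict):
        return ["'scores' must map each dimension to a number"]
    errors = []
    for name in sorted(rubric_dimensions - set(scores)):
        errors.append(f"missing score for {name}")
    for name in sorted(set(scores) - rubric_dimensions):
        errors.append(f"{name} is not a rubric dimension")
    for name, value in scores.items():
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0 <= value <= 5:
            errors.append(f"{name} score must be a number from 0 to 5")
    return errors


rubric = {"Accuracy", "Clarity"}
ok = validate_result({"scores": {"Accuracy": 4, "Clarity": 3.5}}, rubric)
bad = validate_result({"scores": {"Accuracy": 7, "Style": 2}}, rubric)
```

Rejecting scores for dimensions that are not in the rubric is what keeps the rubric as the sole evaluation frame: the model cannot add a criterion of its own.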
Yes. ProveIQ accepts text documents, spreadsheets, code files, slide decks, and structured data. The parser normalises each format before evaluation, ensuring consistent scoring regardless of how the candidate chose to present their work.
Low-confidence scores (where the model's internal confidence falls below a threshold) are automatically escalated for human review. The candidate's score is held until a reviewer confirms or overrides the AI's assessment.
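The hold-until-reviewed behaviour can be sketched as a small state holder. The threshold of 0.8 and the `PendingScore` class are hypothetical; the real threshold and data model are not given in this text.

```python
from dataclasses import dataclass
from typing import Optional

CONFIDENCE_THRESHOLD = 0.8  # hypothetical value


@dataclass
class PendingScore:
    ai_score: float
    confidence: float
    final_score: Optional[float] = None

    def __post_init__(self) -> None:
        # Confident scores are released immediately; the rest are held.
        if self.confidence >= CONFIDENCE_THRESHOLD:
            self.final_score = self.ai_score

    @property
    def held(self) -> bool:
        return self.final_score is None

    def review(self, reviewer_score: Optional[float] = None) -> None:
        """Confirm the AI score (no argument) or override it with a reviewer's score."""
        self.final_score = self.ai_score if reviewer_score is None else reviewer_score


confident = PendingScore(ai_score=3.8, confidence=0.95)
uncertain = PendingScore(ai_score=2.4, confidence=0.55)
```

Here `confident` is final straight away, while `uncertain` stays held until a reviewer calls `review()` to confirm it or `review(3.0)` to override it.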
When an employer creates a new rubric, our platform runs a calibration check using synthetic submissions across the score spectrum. This surfaces logical inconsistencies in the weighting before any real candidate work is evaluated.
Candidate submission data is never used for model training without explicit, opt-in consent from both the employer and the candidate. Our default training pipeline uses only synthetic and consent-confirmed human-reviewed samples.
ProveIQ supports evaluation in English, Spanish, French, German, and Portuguese. Submissions in other languages are accepted but routed to human reviewers. We are expanding AI language coverage quarterly.
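Routing by language can be as simple as checking the submission's language code against the supported set. The five languages come from the answer above; the use of two-letter codes, the function name, and the route labels are assumptions for this sketch, and detecting the language in the first place is out of scope here.

```python
# English, Spanish, French, German, and Portuguese as two-letter codes.
AI_SUPPORTED = {"en", "es", "fr", "de", "pt"}


def route_by_language(language_code: str) -> str:
    """Return "ai" for supported languages, otherwise "human_review"."""
    # Reduce regional variants such as "pt-BR" or "en_GB" to the base language.
    primary = language_code.strip().lower().replace("_", "-").split("-")[0]
    return "ai" if primary in AI_SUPPORTED else "human_review"


routes = {code: route_by_language(code) for code in ("en", "pt-BR", "ja")}
```

Keeping the supported set in one place means expanding coverage only requires adding a code to `AI_SUPPORTED`.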
See the AI in action
Try a free sample assessment and receive your own AI-generated score card in under five minutes.