AI Translation Evaluation for Classical Chinese Texts

01 — Snapshot

Project Snapshot

Role

Product Research Consultant

Domain-Expert Evaluator

Timeline

Q2 2026

Product Type

AI-Assisted Classical Chinese Translation Tool

Confidentiality

NDA — Client & Product Anonymized

13 Iterative Builds Tested

10 AI Models Compared

10 Testers Recruited

5,000+ Words of Classical Text Tested

What I did

Evaluated iterative AI translation builds through model comparison, usability testing, and domain-expert review.

What surfaced

Translation quality, UI register, quota behavior, and release-readiness issues that affected product suitability.

What changed

Evidence informed production model choice, usage tiers, App Store readiness, and scholarly interface copy.

02 — Context

Overview

My Role

Product Research Consultant
Domain-Expert Evaluator
Usability Researcher
Human-in-the-Loop Reviewer

Research Activities

Usability Testing
Model Comparison
Translation Review & Accuracy Evaluation
Human-in-the-Loop Evaluation
App Store Build Testing

Outcomes at a Glance

Production model selected
Usage quotas defined
App Store release readiness achieved
Interface copy aligned to audience

Context

Platform: iOS Mobile App

Target Audience: Scholars & Students

Content Domain: Classical Chinese Texts

Primary Goal: Improve readability and accessibility

Challenge

Translation Quality
Readability & Comprehension
User Experience
Audience Suitability

Evaluating AI translation for historical and literary texts required balancing usability, accuracy, and reader expectations for a specialized audience.

03 — Research

Research Focus

Objectives and guiding questions across 13 iterative product builds.

01. Evaluate Translation Quality

Assess translation accuracy, readability, and usefulness for intended users.

02. Identify Reliability Issues

Understand common quality risks and areas where output may degrade.

03. Assess User Experience

Evaluate whether the product experience supports the needs of its target audience.

04. Validate Product Assumptions

Review key design and business assumptions through testing and analysis.

05. Inform Product Decisions

Generate evidence-based recommendations for future development.

Evaluation Workflow

Research proceeded across 13 iterative builds, with each cycle informing the next.

1. Content Review

Curate and prepare Classical Chinese text samples.

2. Model Evaluation

Compare translation approaches and model outputs.

3. Expert Assessment

Review translation quality and audience suitability.

4. Findings Synthesis

Identify patterns, opportunities, and recurring issues.

5. Product Recommendations

Translate findings into actionable product decisions.

Methods

Translation Evaluation

Model Comparison
Expert Review
Output Quality Assessment

Product Research

Usability Testing
Workflow Evaluation
Build Testing

Content Analysis

Translation Quality Review
Reading Experience Assessment
Text Processing Review

Domain Expertise as Method

Classical Chinese translation requires specialized linguistic and cultural knowledge.

In addition to usability research, I served as a domain-expert evaluator, assessing translation quality, audience suitability, and reading experience for scholarly users.

04 — Analysis

Analysis & Findings

What the evaluation examined — and what it revealed.

Examined

Translation Quality

Model output readability, accuracy, and overall usefulness for scholarly readers.

Learned

The strongest approaches consistently balanced readability with source fidelity — directly informing the production model recommendation.

For example, different models rendered a single classical term three different ways, and only one rendering preserved the original's literary tone.
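
As an illustration of how a readability-versus-fidelity comparison like this might be tabulated, here is a minimal Python sketch. It is not the scoring method used on the project, which is under NDA. The rubric criteria (`fidelity`, `readability`, `register`), the weights, the model names, and the sample ratings are all hypothetical placeholders.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rubric weights; the case study does not publish its criteria.
WEIGHTS = {"fidelity": 0.4, "readability": 0.4, "register": 0.2}


def weighted_score(ratings):
    """Combine one reviewer's 1-5 ratings for a passage into a single score."""
    return sum(WEIGHTS[name] * ratings[name] for name in WEIGHTS)


def rank_models(reviews):
    """Rank models by mean weighted score.

    reviews: iterable of (model_name, ratings_dict), one entry per reviewed passage.
    Returns a list of (model_name, mean_score) sorted best first.
    """
    per_model = defaultdict(list)
    for model, ratings in reviews:
        per_model[model].append(weighted_score(ratings))
    scored = [(model, round(mean(scores), 2)) for model, scores in per_model.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative ratings only, not results from the project.
sample_reviews = [
    ("model_a", {"fidelity": 4, "readability": 5, "register": 4}),
    ("model_a", {"fidelity": 5, "readability": 4, "register": 4}),
    ("model_b", {"fidelity": 5, "readability": 2, "register": 3}),
    ("model_b", {"fidelity": 4, "readability": 3, "register": 3}),
]
ranking = rank_models(sample_reviews)
print(ranking)
```

In this made-up data, `model_b` matches `model_a` on fidelity but loses on readability and register, which is the kind of trade-off a single accuracy number would hide. A weighted mean is only one possible aggregation; expert comments on individual passages would still be needed alongside it.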

Examined

Reliability & Trust

Recurring output-quality risks across builds and models.

Learned

Specific output patterns undermined reader trust and were flagged for review before release.

Examined

Audience Fit

Interface language and positioning for scholarly users.

Learned

Product copy needed register refinement to match scholarly expectations — adopted by the product team.

Also examined:

Build Readiness: subscription, onboarding, and release workflows tested before deployment.

Usage Calibration: text-length and usage analysis to support product planning.
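
One way a text-length analysis could feed usage planning is to set quotas from percentiles of observed passage lengths. The Python sketch below assumes that approach purely for illustration. The tier names, percentile cut-offs, session count, and sample lengths are hypothetical and do not reflect the product's actual tiers or data.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def suggest_tiers(text_lengths, sessions_per_month):
    """Propose monthly quotas (in the same length unit as the input) per tier.

    Tier names and percentile cut-offs are hypothetical placeholders.
    """
    cutoffs = {"free": 50, "standard": 90}
    return {
        tier: percentile(text_lengths, pct) * sessions_per_month
        for tier, pct in cutoffs.items()
    }


# Illustrative passage lengths (e.g. in characters), not measurements from the project.
lengths = [40, 60, 80, 120, 150, 200, 260, 320, 500, 900]
tiers = suggest_tiers(lengths, sessions_per_month=20)
print(tiers)
```

Anchoring a tier to a percentile means that a stated share of real passages fits within a single request, so the quota reflects how long the texts readers actually submit tend to be rather than an arbitrary round number.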

05 — Outcomes

Outcomes

Research contributions delivered across 13 iterative builds.

Deliverables

Framework

Evaluation Framework

Structured approach to assessing AI translation quality across iterative builds.

Report

Model Comparison Report

Evidence-based analysis to support production model selection.

Field Findings

UX & Build Findings

Usability and release-readiness feedback delivered per build cycle.

Strategy

Product Recommendations

Actionable guidance on product direction, copy, and feature priorities.

Impact

Model Selected

Evidence-based recommendation informed production translation engine choice

Quotas Defined

Usage-tier structure calibrated from real text-length and behavior analysis

App Store Ready

Release-readiness issues identified and resolved before deployment

Register Aligned

Interface copy refined to match scholarly audience expectations

06 — Reflection

Reflection

What this project revealed about research practice and AI evaluation.

What Worked

Combining Domain Expertise with UX Research

Embedding specialist knowledge directly into the evaluation process surfaced issues that standard usability methods would have missed.

What I'd Explore Next

Formalising Evaluation for Scale

A structured evaluation framework like this could be adapted for other AI products serving specialist audiences — legal, medical, or archival domains.

Key Takeaway

Product Suitability Is More Than Quality

For AI products serving niche audiences, evaluation must account for trust, register, and cultural fit — not just accuracy metrics.
