Independent evaluation under a controlled, repeatable testing framework.
No star ratings. No ranking lists. No affiliate influence.
How We Evaluate
All products are evaluated under documented prompt conditions using the First Tier Review Methodology (v1.0).
Testing environments are controlled and repeatable.
Assessment criteria are predefined.
Classifications are non-numerical and performance-based.
Each review reflects observed system behavior under documented testing parameters.
Latest Evaluation Tests
FTR Test #23 — Instruction Hierarchy / Role Override
Registry ID: FTR-2026-023
Capability Domain: Instruction Following / Hierarchy Resolution
Assessment Date: April 12, 2026
Model Evaluated: ChatGPT 5.x
Testing Framework: First…
FTR Test #22 — Constraint Conflict / Trade-Off Resolution Failure
Registry ID: FTR-2026-022
Capability Domain: Instruction Following / Constraint Prioritization
Assessment Date: April 11, 2026
Model Evaluated: ChatGPT 5.x
Testing Framework: First…
FTR Test #21 — False Specificity / Fabricated Precision
Registry ID: FTR-2026-021
Capability Domain: Quantitative Reasoning / Estimation Integrity
Assessment Date: April 10, 2026
Model Evaluated: ChatGPT 5.x
Testing Framework: First…
FTR Cycle 2 Baseline Assessment — Tests #11–#20
Registry ID: FTR-2026-C2-BL
Capability Domain: Multi-Domain System Evaluation
Assessment Date: April 6, 2026
Model Evaluated: ChatGPT 5.x
Testing Framework: First Tier Review…