Documentation

Methodology Overview

PSAE evaluates AI systems using the STAR-R framework (Situation, Task, Action, Result, Risk). Each test case follows peer-reviewed methodologies drawn from the NIST AI RMF, the METR autonomy evaluation guidelines, and the ACM AIware safety taxonomy.

Statistical Requirements

  • Confidence level: 95%
  • Minimum runs per scenario: 5
  • Inter-rater reliability: Cohen's κ ≥ 0.8
  • Effect size: Cohen's d for model comparison
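
As a rough illustration of the two named statistics, the sketch below computes Cohen's κ for two raters and Cohen's d for two sets of per-run scores, using the standard textbook formulas. It is not PSAE's own implementation: the function names, the constants MIN_RUNS and KAPPA_THRESHOLD, and the choice of a pooled standard deviation for d are assumptions made for this example.

```python
from collections import Counter
from math import sqrt
from statistics import mean, variance

# Hypothetical constants mirroring the requirements listed above.
MIN_RUNS = 5
KAPPA_THRESHOLD = 0.8


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


def cohens_d(scores_a, scores_b):
    """Cohen's d with a pooled standard deviation (an assumed choice)."""
    n_a, n_b = len(scores_a), len(scores_b)
    if min(n_a, n_b) < MIN_RUNS:
        raise ValueError(f"need at least {MIN_RUNS} runs per model")
    pooled_var = (
        (n_a - 1) * variance(scores_a) + (n_b - 1) * variance(scores_b)
    ) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; d is undefined")
    return (mean(scores_a) - mean(scores_b)) / sqrt(pooled_var)


def meets_reliability(rater_a, rater_b):
    """True when agreement reaches the kappa threshold."""
    return cohens_kappa(rater_a, rater_b) >= KAPPA_THRESHOLD
```

A scenario's ratings would pass the reliability requirement when meets_reliability returns True, and cohens_d would then be applied to the per-run scores of the two models being compared. The sketch does not cover the 95% confidence level, which would need a t-distribution critical value.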

Documentation Index