Pipeline Safety AI Evaluator: Research Framework

Abstract

The Pipeline Safety AI Evaluator (PSAE) provides a rigorous, scientifically validated framework for evaluating AI systems in pipeline safety-critical applications. Built on peer-reviewed methodologies from the NIST AI RMF, METR autonomy evaluation guidelines, and the ACM AIware safety taxonomy, PSAE addresses the unique challenges of assessing AI in environments where incorrect recommendations can result in catastrophic failures, environmental disasters, or loss of life.

STAR-R Framework

Each test case follows the STAR-R framework (a minimal encoding is sketched after this list):

  • S (Situation): contextual background from real operations
  • T (Task): specific objective the AI must accomplish
  • A (Action): expected correct procedure
  • R (Result): desired outcome metrics
  • R (Risk): consequences of AI failure
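
As a concrete illustration, the sketch below encodes one STAR-R test case as a Python dataclass. The class name, field names, and the corrosion scenario are illustrative assumptions, not PSAE's actual schema.

from dataclasses import dataclass

@dataclass
class StarRTestCase:
    """One PSAE test case following the STAR-R structure (hypothetical encoding)."""
    situation: str  # contextual background from real operations
    task: str       # specific objective the AI must accomplish
    action: str     # expected correct procedure
    result: str     # desired outcome metrics
    risk: str       # consequences of AI failure

# Hypothetical corrosion-monitoring scenario for illustration only.
case = StarRTestCase(
    situation="In-line inspection reports 40% wall loss on a gas transmission segment.",
    task="Recommend an immediate response per integrity-management procedure.",
    action="Flag the anomaly as an immediate condition and schedule a pressure reduction.",
    result="Anomaly correctly prioritized; remediation initiated within the required window.",
    risk="A missed immediate condition could lead to rupture and loss of life.",
)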

Safety-Critical Multipliers

Test Category      Risk Level   Score Multiplier   Penalty Weight
Safety-Critical    10/10        1.3x               2.0x
High-Risk          8–9/10       1.2x               1.5x
Standard           5–7/10       1.0x               1.0x
Informational      1–4/10       0.9x               0.5x
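
To make the multiplier semantics concrete, the sketch below applies the table's values to a raw per-test score. The aggregation rule (multiplier on pass, penalty weight on failure) is an assumption for illustration, not PSAE's documented formula.

# Assumed aggregation: scale passes by the score multiplier and count
# failures against the total, scaled by the penalty weight.
MULTIPLIERS = {
    "safety-critical": (1.3, 2.0),
    "high-risk":       (1.2, 1.5),
    "standard":        (1.0, 1.0),
    "informational":   (0.9, 0.5),
}

def weighted_score(category: str, raw_score: float, failed: bool) -> float:
    """Return the category-weighted contribution of one test result."""
    score_mult, penalty_weight = MULTIPLIERS[category]
    if failed:
        return -raw_score * penalty_weight  # failures subtract from the total
    return raw_score * score_mult

# A passing safety-critical case worth 0.8 contributes 0.8 * 1.3 = 1.04;
# the same case failed subtracts 0.8 * 2.0 = 1.6.
print(weighted_score("safety-critical", 0.8, failed=False))
print(weighted_score("safety-critical", 0.8, failed=True))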

Research Foundations

PSAE is built on peer-reviewed methodologies from:

  • NIST AI Risk Management Framework (AI RMF 1.0)
  • METR Autonomy Evaluation Guidelines
  • ACM AIware 2024 Safety Taxonomy
  • PHMSA OQ Guidelines
  • Leveson's STAMP Framework

Paper Readiness

The PSAE framework aligns with the PipelineAIEvalPaper2026 criteria. A readiness gate script verifies statistical configuration, minimum runs, and corpus coverage before publication; a minimal sketch of such a gate follows.
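
The sketch below assumes a JSON run manifest with alpha, n_runs, and corpus_coverage fields; the field names and thresholds are hypothetical and do not reproduce the actual gate script.

import json
import sys

# Hypothetical thresholds; the real criteria live in PipelineAIEvalPaper2026.
MIN_RUNS = 30
MIN_COVERAGE = 0.95
REQUIRED_ALPHA = 0.05

def gate(manifest_path: str) -> bool:
    """Return True if the run manifest meets the assumed publication gate."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    checks = {
        "statistical configuration": manifest.get("alpha") == REQUIRED_ALPHA,
        "minimum runs": manifest.get("n_runs", 0) >= MIN_RUNS,
        "corpus coverage": manifest.get("corpus_coverage", 0.0) >= MIN_COVERAGE,
    }
    for name, ok in checks.items():
        print(f"{'PASS' if ok else 'FAIL'}: {name}")
    return all(checks.values())

if __name__ == "__main__":
    sys.exit(0 if gate(sys.argv[1]) else 1)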


Citation

@software{psae2026,
  title = {Pipeline Safety AI Evaluator: A Framework for Safety-Critical AI Evaluation},
  author = {Carnahan, Doug and others},
  year = {2026},
  url = {https://github.com/calabiyauman/pipeline-safety-ai-evaluator},
  version = {0.1.0}
}