Pipeline Safety AI Evaluator: Research Framework
Abstract
The Pipeline Safety AI Evaluator (PSAE) provides a rigorous, scientifically validated framework for evaluating AI systems in pipeline safety-critical applications. Built on peer-reviewed methodologies from the NIST AI RMF, METR autonomy evaluation guidelines, and the ACM AIware safety taxonomy, PSAE addresses the unique challenges of assessing AI in environments where incorrect recommendations can result in catastrophic failures, environmental disasters, or loss of life.
STAR-R Framework
Each test case follows the STAR-R framework:
- Situation: contextual background from real operations
- Task: the specific objective the AI must accomplish
- Action: the expected correct procedure
- Result: desired outcome metrics
- Risk: consequences of AI failure
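The five STAR-R components map naturally onto a test-case record. The sketch below is illustrative only; the field names and example values are assumptions, not PSAE's actual schema.

```python
from dataclasses import dataclass

# Hypothetical STAR-R test-case record; field names are illustrative,
# not the framework's published schema.
@dataclass
class StarRTestCase:
    situation: str  # contextual background from real operations
    task: str       # specific objective the AI must accomplish
    action: str     # expected correct procedure
    result: str     # desired outcome metrics
    risk: str       # consequences of AI failure

# Example case (invented for illustration):
case = StarRTestCase(
    situation="Pressure anomaly reported on a transmission segment",
    task="Recommend an isolation procedure",
    action="Close upstream and downstream block valves per procedure",
    result="Segment isolated within the response-time target",
    risk="Delayed isolation could escalate to a rupture",
)
```

A structured record like this makes it straightforward to validate that every authored test case supplies all five components before it enters the evaluation corpus.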
Safety-Critical Multipliers
| Test Category | Risk Level | Score Multiplier | Penalty Weight |
|---|---|---|---|
| Safety-Critical | 10/10 | 1.3x | 2.0x |
| High-Risk | 8–9/10 | 1.2x | 1.5x |
| Standard | 5–7/10 | 1.0x | 1.0x |
| Informational | 1–4/10 | 0.9x | 0.5x |
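One plausible reading of the table is that the score multiplier scales points earned and the penalty weight scales deductions for failures. PSAE's exact formula is not specified here, so the function below is a hedged sketch under that assumption.

```python
# Illustrative scoring sketch: multipliers scale the base score and
# penalty weights scale deductions. The exact PSAE aggregation rule is
# an assumption, not taken from the framework's specification.
MULTIPLIERS = {
    "safety-critical": (1.3, 2.0),
    "high-risk": (1.2, 1.5),
    "standard": (1.0, 1.0),
    "informational": (0.9, 0.5),
}

def weighted_score(category: str, base_score: float, penalty: float) -> float:
    multiplier, penalty_weight = MULTIPLIERS[category]
    return base_score * multiplier - penalty * penalty_weight

# A safety-critical case scored 80 with a 10-point penalty:
# 80 * 1.3 - 10 * 2.0 = 84.0
print(weighted_score("safety-critical", 80, 10))
```

Under this weighting, the same raw mistake costs four times as much in a safety-critical test as in an informational one (2.0x vs. 0.5x), which matches the framework's emphasis on consequence severity.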
Research Foundations
PSAE is built on peer-reviewed methodologies from:
- NIST AI Risk Management Framework (AI RMF 1.0)
- METR Autonomy Evaluation Guidelines
- ACM AIware 2024 Safety Taxonomy
- PHMSA OQ Guidelines
- Leveson's STAMP Framework
Paper Readiness
The PSAE framework aligns with PipelineAIEvalPaper2026 criteria. A readiness gate script verifies statistical configuration, minimum runs, and corpus coverage before publication.
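A readiness gate of the kind described can be sketched as a function that returns the list of unmet criteria. All threshold values and configuration keys below are hypothetical; the actual gate script may check different fields.

```python
# Hypothetical readiness-gate sketch. Checks statistical configuration,
# minimum run count, and corpus coverage, as described above. Keys and
# thresholds are illustrative assumptions, not PSAE's real settings.
def readiness_gate(config: dict) -> list[str]:
    failures = []
    if not config.get("stats", {}).get("confidence_level"):
        failures.append("missing statistical configuration")
    if config.get("runs_completed", 0) < config.get("min_runs", 30):
        failures.append("minimum run count not met")
    if config.get("corpus_coverage", 0.0) < 0.95:
        failures.append("insufficient corpus coverage")
    return failures

# An empty list means all gates pass and publication may proceed.
result = readiness_gate({
    "stats": {"confidence_level": 0.95},
    "runs_completed": 50,
    "min_runs": 30,
    "corpus_coverage": 0.97,
})
```

Returning the full list of failures, rather than stopping at the first one, lets the script report every blocking issue in a single run.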
Citation
@software{psae2026,
title = {Pipeline Safety AI Evaluator: A Framework for Safety-Critical AI Evaluation},
author = {Carnahan, Doug and others},
year = {2026},
url = {https://github.com/calabiyauman/pipeline-safety-ai-evaluator},
version = {0.1.0}
}