Dataset

Benchmark Corpus

The PSAE benchmark uses the four canonical JSON files in data/test_cases/, one per category. The evaluator and benchmark script load test cases from these files exclusively.

To expand the dataset, add test questions to the JSON files below.
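The total scenario count is the sum of entries across the category files. Below is a minimal sketch of that count. It assumes each file in data/test_cases/ holds a top-level JSON array of test-case objects. The page does not document the real schema, and the helper names load_test_cases and total_scenarios are hypothetical.

```python
import json
from pathlib import Path

# Assumption: each category file in data/test_cases/ is a top-level JSON array
# of test-case objects. The real schema is not documented here and may differ.
TEST_CASE_DIR = Path("data/test_cases")


def load_test_cases(directory: Path = TEST_CASE_DIR) -> dict[str, list]:
    """Return {file stem: list of test cases} for every JSON file in directory."""
    cases: dict[str, list] = {}
    for path in sorted(directory.glob("*.json")):
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Reject files that do not match the assumed array-of-objects layout.
        if not isinstance(data, list):
            raise ValueError(f"{path.name}: expected a JSON array of test cases")
        cases[path.stem] = data
    return cases


def total_scenarios(directory: Path = TEST_CASE_DIR) -> int:
    """Count test cases across all category files."""
    return sum(len(items) for items in load_test_cases(directory).values())
```

After adding questions to a category file, call total_scenarios() to confirm the new entries are counted. The array check makes a malformed file fail loudly instead of being skipped without notice.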


Benchmark Manifest

Dataset integrity is verified with SHA-256 checksums. The manifest records a version and a hash for each data file, so a benchmark run can be reproduced against exactly the same data.
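Checksum verification works by re-hashing each data file and comparing the result with the manifest entry. The sketch below shows one way to do this. The manifest layout is an assumption, since this page does not specify it: a JSON object with a "files" map from file name to hex digest, resolved relative to the manifest's directory. The helpers sha256_of and verify_manifest are hypothetical names.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical manifest layout (not confirmed by this page):
# {"version": "...", "files": {"<name>.json": "<sha256 hex>", ...}}
# File names are resolved relative to the manifest's own directory.


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Stream a file through SHA-256 and return the lowercase hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest_path: Path) -> list[str]:
    """Return names of listed files that are missing or whose hash differs."""
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    base = manifest_path.parent
    mismatches = []
    for name, expected in manifest["files"].items():
        target = base / name
        if not target.is_file() or sha256_of(target) != expected.lower():
            mismatches.append(name)
    return mismatches
```

An empty list means every listed file matches its recorded hash. Streaming in chunks keeps memory use flat regardless of file size.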


Raw JSON Dataset

Browse the raw test case JSON for each category.


Test Categories

Category | Risk Level | Description
Safety-Critical | 10/10 | Hot tapping, emergency response, confined space, H2S
Engineering | 8–9/10 | Valve sizing, hydrotest, corrosion, CP, stress analysis
Inspection | 7–9/10 | ILI, corrosion growth, dig prioritization, repair verification
Regulatory | 5–8/10 | PHMSA incident reporting, compliance, standards selection