Dataset
Benchmark Corpus
The PSAE benchmark uses the four canonical JSON files in data/test_cases/, one per category. The evaluator and benchmark script load from these files exclusively.
To expand the dataset, add test questions to the JSON files below.
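After adding questions, it can help to confirm that every file still parses and to see how many cases each one holds. The following is a minimal sketch, not the project's own loader: the function name `count_test_cases` is hypothetical, and it assumes each file in data/test_cases/ holds a top-level JSON array of test cases, which the actual schema may not match.

```python
import json
from pathlib import Path


def count_test_cases(test_dir):
    """Return {file name: number of top-level entries} for each JSON file."""
    counts = {}
    for path in sorted(Path(test_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)  # raises json.JSONDecodeError on malformed JSON
        # Assumption: each file holds a top-level JSON array of test cases.
        if not isinstance(data, list):
            raise ValueError(f"{path.name}: expected a top-level JSON array")
        counts[path.name] = len(data)
    return counts


if __name__ == "__main__":
    per_file = count_test_cases("data/test_cases")
    for name, n in per_file.items():
        print(f"{name}: {n}")
    print(f"Total: {sum(per_file.values())} test scenarios")
```

If the files wrap their cases in an object rather than a bare array, the `isinstance` check would need to be adapted to the real layout.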
Benchmark Manifest
Dataset integrity is verified with SHA-256 checksums. The manifest records a version and the file hashes, so a benchmark run can be reproduced against the same data.
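As an illustration of how such a check can work, here is a minimal sketch using Python's standard `hashlib`. The manifest layout (a dict with a `"files"` mapping of file name to hex digest) and the names `sha256_of` and `verify_manifest` are assumptions made for this example; the real manifest format is not shown on this page.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_manifest(manifest, base_dir):
    """Return the file names that are missing or whose hash does not match.

    Assumed (hypothetical) manifest shape:
        {"version": "...", "files": {"name.json": "<sha256 hex>", ...}}
    """
    failures = []
    for name, expected in manifest["files"].items():
        path = Path(base_dir) / name
        if not path.is_file() or sha256_of(path) != expected.lower():
            failures.append(name)
    return failures
```

An empty list from `verify_manifest` means every listed file matched its recorded hash. Files present on disk but absent from the manifest are not flagged in this sketch.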
Raw JSON Dataset
Browse the raw test case JSON for each category.
Test Categories
| Category | Risk Level | Description |
|---|---|---|
| Safety-Critical | 10/10 | Hot tapping, emergency response, confined space, H2S |
| Engineering | 8–9/10 | Valve sizing, hydrotest, corrosion, CP, stress analysis |
| Inspection | 7–9/10 | ILI, corrosion growth, dig prioritization, repair verification |
| Regulatory | 5–8/10 | PHMSA incident reporting, compliance, standards selection |