# Benchmarks

DynaSchedBench benchmark data is organized around generated instances and agent
trajectories. A generated instance is reusable: many agents can run on the same
instance directory, producing separate trajectories and metrics.

## Generated Instance Directory

A generated instance typically contains:

- `input_model.json`: the normalized generation configuration.
- `events.jsonl`: the dynamic event stream consumed by the simulator.
- `static_jobs.json`: job metadata extracted from the event stream.
- `static_machines.json`: machine and machine-group metadata.
- `final_metrics.json`: generator-side metrics and calibration diagnostics.
- `pareto_info.json`: Pareto-front metadata for multi-objective optimization (MOO) or hybrid calibration.
- `report.md`: a human-readable generation report.

These files are produced by `dsbx-gen gen` or `dsbx-gen gen-batch`.
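
A quick completeness check before running agents can catch truncated or partially copied instance directories. The following is a minimal sketch, not part of the `dsbx` package: `missing_instance_files` is a hypothetical helper, and its file list simply mirrors the names above. Because an instance only "typically" contains these files, a missing entry is better treated as a warning than as a hard error.

```python
from pathlib import Path

# File names copied from the list above; the helper itself is hypothetical.
EXPECTED_INSTANCE_FILES = (
    "input_model.json",
    "events.jsonl",
    "static_jobs.json",
    "static_machines.json",
    "final_metrics.json",
    "pareto_info.json",
    "report.md",
)


def missing_instance_files(instance_dir):
    """Return the expected file names that are absent from instance_dir."""
    root = Path(instance_dir)
    return [name for name in EXPECTED_INSTANCE_FILES if not (root / name).is_file()]
```

An empty list means every expected file is present.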

## Agent Run Directory

An agent run typically contains:

- `trajectory.json` or `trajectory_light.jsonl`
- `metrics.json`
- generated plots such as `gantt.pdf`
- agent-specific logs

These files are produced by `dsbx-agent run`.
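
Since a run may hold either trajectory format, downstream scripts need a way to find whichever one exists. The sketch below is illustrative only: `find_trajectory` is a hypothetical helper, and checking the light JSONL file first is an assumption, not documented `dsbx` behavior.

```python
from pathlib import Path

# Preference order is an assumption: the light JSONL file is checked first.
TRAJECTORY_NAMES = ("trajectory_light.jsonl", "trajectory.json")


def find_trajectory(run_dir):
    """Return the first trajectory file found in run_dir, or None."""
    root = Path(run_dir)
    for name in TRAJECTORY_NAMES:
        candidate = root / name
        if candidate.is_file():
            return candidate
    return None
```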

## Repository Data Layout

The repository may include benchmark collections under `data/`, such as
generated grids, GEN-Bench-style files, JMS-Bench-style files, or
MK-Bench-style files. Treat each collection as immutable experiment input once
published.

Recommended practice:

- Keep input models and generated instance artifacts together.
- Write each agent's output under a separate subdirectory.
- Store metrics and plots next to the trajectory that produced them.
- Record package version, Git commit, seed, and command line in experiment logs.
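
The last recommendation can be automated by writing a small provenance record next to each run. This is one possible sketch, not a `dsbx` feature: the `write_provenance` helper and the `provenance.json` file name are hypothetical, and the distribution name `dsbx` is assumed from the `pip install` command in this document.

```python
import json
import subprocess
import sys
from importlib import metadata
from pathlib import Path


def _git_commit():
    """Return the current Git commit hash, or None if it cannot be determined."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.stdout.strip()


def _package_version(dist_name):
    """Return the installed version of dist_name, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def write_provenance(run_dir, seed, dist_name="dsbx", argv=None):
    """Write provenance.json (a hypothetical file name) into run_dir."""
    record = {
        "package": dist_name,
        "package_version": _package_version(dist_name),
        "git_commit": _git_commit(),
        "seed": seed,
        "command_line": list(sys.argv if argv is None else argv),
    }
    path = Path(run_dir) / "provenance.json"
    path.write_text(json.dumps(record, indent=2) + "\n", encoding="utf-8")
    return record
```

Fields that cannot be determined, such as the Git commit outside a repository, are stored as `null` rather than omitted, so missing provenance stays visible.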

## Reproducibility Checklist

For a benchmark result to be reproducible, keep:

- the original input-model JSON;
- the generated instance directory;
- the exact agent command and random seed;
- the trajectory file;
- `metrics.json` or recomputation instructions;
- any LLM provider/model settings for LLM agents;
- the DynaSchedBench package version.

For published comparisons, prefer recomputing metrics from the stored trajectories:

```bash
dsbx-eval from-trajectory \
  -t path/to/agent_run/trajectory_light.jsonl \
  -o path/to/agent_run/metrics_recomputed.json \
  --fail-on-violation
```
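
For many runs, the same command can be applied to every run directory. The sketch below only builds and optionally executes the command shown above, using no flags beyond `-t`, `-o`, and `--fail-on-violation`. The helpers `recompute_command` and `recompute_all` are hypothetical, and the search assumes each run directory holds a `trajectory_light.jsonl`.

```python
import subprocess
from pathlib import Path


def recompute_command(run_dir, trajectory_name="trajectory_light.jsonl"):
    """Build the dsbx-eval argument list for one agent run directory."""
    run_dir = Path(run_dir)
    return [
        "dsbx-eval", "from-trajectory",
        "-t", str(run_dir / trajectory_name),
        "-o", str(run_dir / "metrics_recomputed.json"),
        "--fail-on-violation",
    ]


def recompute_all(runs_root, dry_run=True):
    """Build one command per run directory found under runs_root.

    Returns the list of commands; executes them only when dry_run is False.
    """
    commands = [
        recompute_command(path.parent)
        for path in sorted(Path(runs_root).rglob("trajectory_light.jsonl"))
    ]
    if not dry_run:
        for argv in commands:
            # check=True stops the sweep at the first failing evaluation.
            subprocess.run(argv, check=True)
    return commands
```

Calling `recompute_all("runs")` with the default `dry_run=True` returns the commands without running them, which makes it easy to review a sweep before executing it.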

## Scaling Experiments

For large sweeps:

1. Generate and validate all instances.
2. Run agents with `trajectory_light.jsonl` enabled.
3. Recompute metrics in a separate evaluation pass.
4. Generate only the plots needed for the paper or artifact review.
5. Keep `runs/` or experiment output outside the package wheel.

Heavy research agents, third-party training code, pretrained policies, and
generated training artifacts are kept outside the PyPI wheel so that
`pip install dsbx` remains lightweight.