Benchmarks

DynaSchedBench benchmark data is organized around generated instances and agent trajectories. A generated instance is reusable: many agents can run on the same instance directory, producing separate trajectories and metrics.

Generated Instance Directory

A generated instance typically contains:

input_model.json: the normalized generation configuration.
events.jsonl: the dynamic event stream consumed by the simulator.
static_jobs.json: job metadata extracted from the event stream.
static_machines.json: machine and machine-group metadata.
final_metrics.json: generator-side metrics and calibration diagnostics.
pareto_info.json: Pareto-front metadata for multi-objective optimization (MOO) or hybrid calibration.
report.md: a human-readable generation report.

These files are produced by dsbx-gen gen or dsbx-gen gen-batch.
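
Before running agents on an instance, it can help to confirm that the directory is complete. The following is a minimal sketch that assumes the file names listed above. The helper name missing_instance_files is hypothetical and is not part of the dsbx package. Because an instance only typically contains these files, a missing entry is better treated as a warning than as a hard error.

```python
from pathlib import Path

# File names taken from the list above. Whether every generation mode
# writes all of them is an assumption of this sketch.
EXPECTED_INSTANCE_FILES = (
    "input_model.json",
    "events.jsonl",
    "static_jobs.json",
    "static_machines.json",
    "final_metrics.json",
    "pareto_info.json",
    "report.md",
)


def missing_instance_files(instance_dir):
    """Return the expected file names that are absent from instance_dir."""
    root = Path(instance_dir)
    return [name for name in EXPECTED_INSTANCE_FILES if not (root / name).is_file()]
```

An empty list means every expected file is present; otherwise the returned names can be logged before the instance is used.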

Agent Run Directory

An agent run typically contains:

trajectory.json or trajectory_light.jsonl
metrics.json
generated plots such as gantt.pdf
agent-specific logs

These files are produced by dsbx-agent run.
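
Since a run may hold either trajectory format, downstream scripts need a way to find whichever one exists. The sketch below is one possible approach; the helper name find_trajectory and the preference for the light format are assumptions, not documented dsbx behavior.

```python
from pathlib import Path

# Candidate names from the list above. Preferring the light format first
# is a choice made by this sketch, not a rule of the tool.
TRAJECTORY_NAMES = ("trajectory_light.jsonl", "trajectory.json")


def find_trajectory(run_dir):
    """Return the first trajectory file found in run_dir, or None."""
    root = Path(run_dir)
    for name in TRAJECTORY_NAMES:
        candidate = root / name
        if candidate.is_file():
            return candidate
    return None
```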

Repository Data Layout

The repository may include benchmark collections under data/, such as generated grids, GEN-Bench style files, JMS-Bench style files, or MK-Bench style files. Treat each collection as immutable experiment input once published.

Recommended practice:

Keep input models and generated instance artifacts together.
Write each agent’s output under a separate subdirectory.
Store metrics and plots next to the trajectory that produced them.
Record package version, Git commit, seed, and command line in experiment logs.
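
The last recommendation can be automated with a small provenance record written next to each run. This is a sketch only: the file name run_manifest.json and the function write_run_manifest are hypothetical, and the distribution name dsbx is taken from the pip install command mentioned in this document. Fields that cannot be determined are stored as null.

```python
import json
import subprocess
import sys
from importlib import metadata
from pathlib import Path


def _git_commit():
    """Return the current commit hash, or None if Git is unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip()


def _package_version(dist_name):
    """Return the installed version of dist_name, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def write_run_manifest(run_dir, seed, argv=None, dist_name="dsbx"):
    """Write a JSON provenance record (hypothetical format) into run_dir."""
    record = {
        "package": dist_name,
        "package_version": _package_version(dist_name),
        "git_commit": _git_commit(),
        "seed": seed,
        "command_line": list(sys.argv if argv is None else argv),
    }
    path = Path(run_dir) / "run_manifest.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(record, indent=2) + "\n", encoding="utf-8")
    return path
```

Writing the record into the run directory keeps it next to the trajectory and metrics it describes.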

Reproducibility Checklist

For a benchmark result to be reproducible, keep:

the original input-model JSON;
the generated instance directory;
the exact agent command and random seed;
the trajectory file;
metrics.json or recomputation instructions;
any LLM provider/model settings for LLM agents;
the DynaSchedBench package version.

For published comparisons, prefer recomputing metrics from the stored trajectories rather than reusing the metrics.json written at run time:

dsbx-eval from-trajectory \
  -t path/to/agent_run/trajectory_light.jsonl \
  -o path/to/agent_run/metrics_recomputed.json \
  --fail-on-violation
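
When many runs need the same treatment, the command above can be generated per trajectory. The sketch below uses only the subcommand and flags shown above. The function names, the recursive search under a runs root, and the dry-run default are assumptions. It also assumes that a failed evaluation exits with a non-zero status, which is what makes check=True stop the loop.

```python
import subprocess
from pathlib import Path


def build_eval_command(trajectory, output):
    """Build the dsbx-eval invocation shown above for one trajectory."""
    return [
        "dsbx-eval", "from-trajectory",
        "-t", str(trajectory),
        "-o", str(output),
        "--fail-on-violation",
    ]


def recompute_all(runs_root, dry_run=True):
    """Build (and optionally run) one command per trajectory_light.jsonl."""
    commands = []
    for trajectory in sorted(Path(runs_root).rglob("trajectory_light.jsonl")):
        # Write the recomputed metrics next to the trajectory they came from.
        output = trajectory.with_name("metrics_recomputed.json")
        command = build_eval_command(trajectory, output)
        commands.append(command)
        if not dry_run:
            # Raises CalledProcessError if the command exits non-zero.
            subprocess.run(command, check=True)
    return commands
```

Calling recompute_all with dry_run=True returns the commands without executing them, which makes it easy to review a sweep before starting it.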

Scaling Experiments

For large sweeps:

Generate and validate all instances.
Run agents with trajectory_light.jsonl enabled.
Recompute metrics in a separate evaluation pass.
Generate only the plots needed for the paper or artifact review.
Keep runs/ and any other experiment output outside the package wheel.

Heavy research agents, third-party training code, pretrained policies, and generated training artifacts are kept outside the PyPI wheel so pip install dsbx remains lightweight.