Benchmarks

DynaSchedBench benchmark data is organized around generated instances and agent trajectories. A generated instance is reusable: many agents can run on the same instance directory, producing separate trajectories and metrics.

Generated Instance Directory

A generated instance typically contains:

  • input_model.json: the normalized generation configuration.

  • events.jsonl: the dynamic event stream consumed by the simulator.

  • static_jobs.json: job metadata extracted from the event stream.

  • static_machines.json: machine and machine-group metadata.

  • final_metrics.json: generator-side metrics and calibration diagnostics.

  • pareto_info.json: Pareto-front metadata for multi-objective optimization (MOO) or hybrid calibration.

  • report.md: a human-readable generation report.

These files are produced by dsbx-gen gen or dsbx-gen gen-batch.
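
Before reusing an instance across agents, it can help to confirm that the directory is complete. The following is a minimal sketch, not part of the dsbx tooling: the helper name missing_instance_files is hypothetical, and treating every listed file as required is an assumption (pareto_info.json, for example, may only appear for MOO or hybrid calibration).

```python
from pathlib import Path

# File names copied from the list above. Treating all of them as
# required is an assumption; some may be optional for a given run.
EXPECTED_INSTANCE_FILES = (
    "input_model.json",
    "events.jsonl",
    "static_jobs.json",
    "static_machines.json",
    "final_metrics.json",
    "pareto_info.json",
    "report.md",
)


def missing_instance_files(instance_dir):
    """Return the expected file names that are absent from instance_dir."""
    root = Path(instance_dir)
    return [name for name in EXPECTED_INSTANCE_FILES
            if not (root / name).is_file()]
```

An empty return value means every listed file is present; otherwise the list names what to regenerate or investigate.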

Agent Run Directory

An agent run typically contains:

  • trajectory.json or trajectory_light.jsonl

  • metrics.json

  • generated plots such as gantt.pdf

  • agent-specific logs

These files are produced by dsbx-agent run.
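
Because a run may hold either trajectory file, downstream scripts need to handle both. The sketch below is illustrative only: load_trajectory is a hypothetical helper, it assumes trajectory_light.jsonl is in JSON Lines format (one JSON value per line), and it makes no assumption about the fields inside each record.

```python
import json
from pathlib import Path


def load_trajectory(run_dir):
    """Load whichever trajectory file exists in run_dir.

    Assumption: trajectory_light.jsonl holds one JSON value per line.
    The light file is preferred when both files are present.
    """
    root = Path(run_dir)
    light = root / "trajectory_light.jsonl"
    full = root / "trajectory.json"
    if light.is_file():
        with light.open(encoding="utf-8") as fh:
            # Skip blank lines so a trailing newline does not break parsing.
            return [json.loads(line) for line in fh if line.strip()]
    if full.is_file():
        with full.open(encoding="utf-8") as fh:
            return json.load(fh)
    raise FileNotFoundError(f"no trajectory file found in {root}")
```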

Repository Data Layout

The repository may include benchmark collections under data/, such as generated grids, GEN-Bench style files, JMS-Bench style files, or MK-Bench style files. Treat each collection as immutable experiment input once published.

Recommended practice:

  • Keep input models and generated instance artifacts together.

  • Write each agent’s output under a separate subdirectory.

  • Store metrics and plots next to the trajectory that produced them.

  • Record package version, Git commit, seed, and command line in experiment logs.
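
The last recommendation can be automated with a small manifest written next to each run. This is a sketch under stated assumptions: the function names and the run_manifest.json file name are hypothetical, the distribution name dsbx is taken from the pip install dsbx command, and the Git commit is read from whatever repository the current working directory belongs to.

```python
import json
import subprocess
import sys
from importlib import metadata
from pathlib import Path


def build_run_manifest(seed, argv=None, package="dsbx"):
    """Collect package version, Git commit, seed, and command line."""
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # package is not installed in this environment
    try:
        commit = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        ).stdout.strip()
    except (OSError, subprocess.CalledProcessError):
        commit = None  # git is missing or this is not a repository
    return {
        "package": package,
        "package_version": version,
        "git_commit": commit,
        "seed": seed,
        "command_line": list(sys.argv if argv is None else argv),
    }


def write_run_manifest(run_dir, manifest, filename="run_manifest.json"):
    """Write the manifest as JSON inside run_dir and return its path."""
    path = Path(run_dir) / filename
    path.write_text(json.dumps(manifest, indent=2) + "\n", encoding="utf-8")
    return path
```

Calling write_run_manifest(run_dir, build_run_manifest(seed=42)) at the start of a run keeps the provenance record beside the trajectory it describes.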

Reproducibility Checklist

For a benchmark result to be reproducible, keep:

  • the original input-model JSON;

  • the generated instance directory;

  • the exact agent command and random seed;

  • the trajectory file;

  • metrics.json or recomputation instructions;

  • any LLM provider/model settings for LLM agents;

  • the DynaSchedBench package version.

For published comparisons, prefer recomputing metrics from the stored trajectories rather than reusing the metrics.json written at run time:

dsbx-eval from-trajectory \
  -t path/to/agent_run/trajectory_light.jsonl \
  -o path/to/agent_run/metrics_recomputed.json \
  --fail-on-violation
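
For many runs, the same command can be generated per trajectory. The sketch below only reuses the subcommand and flags shown above; the helper names, the idea of one runs root directory containing all agent runs, and the expectation that dsbx-eval exits with a non-zero status when a violation is found are assumptions.

```python
import subprocess
from pathlib import Path


def recompute_commands(runs_root):
    """Build one dsbx-eval command per trajectory_light.jsonl under runs_root."""
    commands = []
    for traj in sorted(Path(runs_root).rglob("trajectory_light.jsonl")):
        # Write the recomputed metrics next to the trajectory they come from.
        out = traj.with_name("metrics_recomputed.json")
        commands.append([
            "dsbx-eval", "from-trajectory",
            "-t", str(traj),
            "-o", str(out),
            "--fail-on-violation",
        ])
    return commands


def run_all(commands):
    """Run each command and return those that exited with a non-zero status."""
    failed = []
    for cmd in commands:
        if subprocess.run(cmd).returncode != 0:
            failed.append(cmd)
    return failed
```

Building the command list separately from executing it makes it easy to inspect or log the exact invocations before starting a long evaluation pass.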

Scaling Experiments

For large sweeps:

  1. Generate and validate all instances.

  2. Run agents with trajectory_light.jsonl output enabled.

  3. Recompute metrics in a separate evaluation pass.

  4. Generate only the plots needed for the paper or artifact review.

  5. Keep runs/ and any other experiment output outside the package wheel.

Heavy research agents, third-party training code, pretrained policies, and generated training artifacts are kept outside the PyPI wheel so pip install dsbx remains lightweight.