Benchmarks
DynaSchedBench benchmark data is organized around generated instances and agent trajectories. A generated instance is reusable: many agents can run on the same instance directory, producing separate trajectories and metrics.
Generated Instance Directory
A generated instance typically contains:
input_model.json: the normalized generation configuration.
events.jsonl: the dynamic event stream consumed by the simulator.
static_jobs.json: job metadata extracted from the event stream.
static_machines.json: machine and machine-group metadata.
final_metrics.json: generator-side metrics and calibration diagnostics.
pareto_info.json: Pareto-front metadata for MOO or hybrid calibration.
report.md: a human-readable generation report.
These files are produced by dsbx-gen gen or dsbx-gen gen-batch.
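As a quick sanity check before running agents, the file list above can be turned into a small validation helper. This is only a sketch: the function name missing_instance_files is hypothetical and not part of DynaSchedBench, and treating every listed file as required is an assumption, since an instance only "typically" contains them.

```python
from pathlib import Path

# File names copied from the instance layout described above. Treating all of
# them as required is an assumption; some may be optional in practice.
EXPECTED_INSTANCE_FILES = (
    "input_model.json",
    "events.jsonl",
    "static_jobs.json",
    "static_machines.json",
    "final_metrics.json",
    "pareto_info.json",
    "report.md",
)


def missing_instance_files(instance_dir):
    """Return the expected file names that are absent from instance_dir.

    Hypothetical helper for illustration; not part of the dsbx package.
    """
    root = Path(instance_dir)
    return [name for name in EXPECTED_INSTANCE_FILES if not (root / name).is_file()]
```

An empty list means every expected file is present; anything else names the files to regenerate or to confirm as intentionally absent.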
Agent Run Directory
An agent run typically contains:
trajectory.json or trajectory_light.jsonl
metrics.json
generated plots such as gantt.pdf
agent-specific logs
These files are produced by dsbx-agent run.
Repository Data Layout
The repository may include benchmark collections under data/, such as
generated grids, GEN-Bench style files, JMS-Bench style files, or MK-Bench style
files. Treat each collection as immutable experiment input once published.
Recommended practice:
Keep input models and generated instance artifacts together.
Write each agent’s output under a separate subdirectory.
Store metrics and plots next to the trajectory that produced them.
Record package version, Git commit, seed, and command line in experiment logs.
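The last recommendation can be automated with a small provenance record written next to each run. The sketch below is one possible shape, not a DynaSchedBench feature: provenance_record and write_provenance are hypothetical names, the distribution name dsbx is taken from the pip install command in this document, and the Git commit is read from whatever repository the current working directory belongs to.

```python
import json
import subprocess
import sys
from importlib import metadata


def _package_version(name):
    """Installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def _git_commit():
    """Current Git commit hash, or None outside a repository or without git."""
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return out.stdout.strip()


def provenance_record(seed, argv=None, package="dsbx"):
    """Collect package version, Git commit, seed, and command line.

    Hypothetical helper; the field names are illustrative assumptions.
    """
    return {
        "package": package,
        "package_version": _package_version(package),
        "git_commit": _git_commit(),
        "seed": seed,
        "command_line": list(sys.argv if argv is None else argv),
    }


def write_provenance(path, record):
    """Write the record as pretty-printed JSON."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2, sort_keys=True)
        fh.write("\n")
```

Fields that cannot be determined are stored as null rather than omitted, so a missing version or commit is visible when the log is reviewed later.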
Reproducibility Checklist
For a benchmark result to be reproducible, keep:
the original input-model JSON;
the generated instance directory;
the exact agent command and random seed;
the trajectory file;
metrics.json or recomputation instructions;
any LLM provider/model settings for LLM agents;
the DynaSchedBench package version.
For published comparisons, prefer recomputing metrics from trajectories:
dsbx-eval from-trajectory \
-t path/to/agent_run/trajectory_light.jsonl \
-o path/to/agent_run/metrics_recomputed.json \
--fail-on-violation
Scaling Experiments
For large sweeps:
Generate and validate all instances.
Run agents with trajectory_light.jsonl enabled.
Recompute metrics in a separate evaluation pass.
Generate only the plots needed for the paper or artifact review.
Keep runs/ or experiment output outside the package wheel.
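The separate evaluation pass could be scripted roughly as follows. This sketch only builds the command lines; it reuses the dsbx-eval from-trajectory invocation and flags shown earlier in this document, while the function name recompute_commands, the metrics_recomputed.json output name per run, and the assumption that every run directory sits somewhere under one root are illustrative choices.

```python
from pathlib import Path


def recompute_commands(runs_root):
    """Build one dsbx-eval command per trajectory_light.jsonl under runs_root.

    Hypothetical helper: it returns argument lists and does not execute them.
    """
    commands = []
    for traj in sorted(Path(runs_root).rglob("trajectory_light.jsonl")):
        # Write the recomputed metrics next to the trajectory they come from.
        out = traj.with_name("metrics_recomputed.json")
        commands.append([
            "dsbx-eval", "from-trajectory",
            "-t", str(traj),
            "-o", str(out),
            "--fail-on-violation",
        ])
    return commands
```

Each returned list can be passed to subprocess.run(cmd, check=True), so a failing evaluation stops the sweep instead of being silently skipped.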
Heavy research agents, third-party training code, pretrained policies, and
generated training artifacts are kept outside the PyPI wheel so pip install dsbx remains lightweight.