Agents

Agent interfaces and lightweight baselines for dsbx.

class dsbx.Agents.BaseAgent(*args, **kwargs)

Bases: Protocol

Minimal agent interface for dsbx.

The design is intentionally lightweight: agents receive the current observation, the list of legal actions, and a reference to the environment. They return either a selected action or None to signal that no decision is taken (e.g. to let the environment advance time).

reset(scenario_info=None)

Reset any internal state for a new episode/scenario.

Return type:

None

Parameters:

scenario_info (Dict[str, Any] | None)

act(obs, legal_actions, env)

Choose an action from the legal action set.

Parameters:
  • obs (Dict[str, Any]) – The observation dictionary returned by the environment.

  • legal_actions (List[Dict[str, Any]]) – A list of legal actions; each action is expected to have at least job_id and machine_group fields (and optionally machine_candidates).

  • env (Any) – The environment instance; agents may use helper methods such as estimate_action_score or quick_rollout_score.

Return type:

Optional[Dict[str, Any]]
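As an illustration of the interface above, the following is a minimal sketch of a custom agent. It does not import dsbx: AgentLike is a local stand-in that mirrors the documented reset/act signatures, FirstLegalAgent is a hypothetical agent, and the job_id and machine_group values are made up for the example.

```python
from typing import Any, Dict, List, Optional, Protocol


class AgentLike(Protocol):
    """Local stand-in mirroring the documented reset/act interface."""

    def reset(self, scenario_info: Optional[Dict[str, Any]] = None) -> None: ...

    def act(
        self,
        obs: Dict[str, Any],
        legal_actions: List[Dict[str, Any]],
        env: Any,
    ) -> Optional[Dict[str, Any]]: ...


class FirstLegalAgent:
    """Hypothetical agent: always picks the first legal action."""

    def __init__(self) -> None:
        self.decisions = 0  # per-episode counter, cleared in reset()

    def reset(self, scenario_info: Optional[Dict[str, Any]] = None) -> None:
        self.decisions = 0

    def act(
        self,
        obs: Dict[str, Any],
        legal_actions: List[Dict[str, Any]],
        env: Any,
    ) -> Optional[Dict[str, Any]]:
        if not legal_actions:
            return None  # no decision: leave it to the environment
        self.decisions += 1
        return legal_actions[0]


# Structural typing: FirstLegalAgent satisfies AgentLike without inheriting from it.
agent: AgentLike = FirstLegalAgent()
agent.reset()

actions = [
    {"job_id": 3, "machine_group": "M1"},  # illustrative values only
    {"job_id": 7, "machine_group": "M2"},
]
chosen = agent.act({}, actions, env=None)
idle = agent.act({}, [], env=None)
```

Because the interface is a Protocol, an agent only needs matching method signatures; whether a given dsbx runner additionally requires subclassing BaseAgent is not stated here.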

class dsbx.Agents.RandomAgent(random_seed=None)

Bases: BaseAgent

Parameters:

random_seed (Optional[int])

reset(scenario_info=None)

Reset any internal state for a new episode/scenario.

Return type:

None

Parameters:

scenario_info (Dict[str, Any] | None)

act(obs, legal_actions, env)

Choose an action from the legal action set.

Parameters:
  • obs (Dict[str, Any]) – The observation dictionary returned by the environment.

  • legal_actions (List[Dict[str, Any]]) – A list of legal actions; each action is expected to have at least job_id and machine_group fields (and optionally machine_candidates).

  • env (Any) – The environment instance; agents may use helper methods such as estimate_action_score or quick_rollout_score.

Return type:

Optional[Dict[str, Any]]
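The selection rule of RandomAgent is not described above. The sketch below shows one plausible reading, a uniform draw from legal_actions using a generator seeded with random_seed, so that two agents built with the same seed make the same choices. SeededChooser is a hypothetical class, not the dsbx implementation.

```python
import random
from typing import Any, Dict, List, Optional


class SeededChooser:
    """Hypothetical random baseline; assumes a uniform draw over legal actions."""

    def __init__(self, random_seed: Optional[int] = None) -> None:
        # A private generator avoids disturbing the global random state.
        self._rng = random.Random(random_seed)

    def act(
        self,
        obs: Dict[str, Any],
        legal_actions: List[Dict[str, Any]],
        env: Any,
    ) -> Optional[Dict[str, Any]]:
        if not legal_actions:
            return None
        return self._rng.choice(legal_actions)


actions = [{"job_id": j, "machine_group": "G"} for j in range(5)]

# Same seed, same sequence of picks.
a = SeededChooser(random_seed=42)
b = SeededChooser(random_seed=42)
picks_a = [a.act({}, actions, None) for _ in range(10)]
picks_b = [b.act({}, actions, None) for _ in range(10)]
```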

class dsbx.Agents.SPTAgent

Bases: BaseAgent

A simple Shortest Processing Time (SPT) heuristic agent.

At each decision point, the agent reads ready_ops from the observation, ranks legal actions by the processing time of the corresponding job’s next operation, and chooses the shortest one. The observation already carries the process_time field for each ready operation, so the rule does not depend on additional environment helper methods.
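The rule described above can be sketched as follows. The layout of ready_ops is an assumption made for this example (a list of dicts, each carrying job_id and process_time); the actual observation schema may differ, and pick_shortest is a hypothetical helper rather than part of dsbx.

```python
from typing import Any, Dict, List, Optional


def pick_shortest(
    obs: Dict[str, Any],
    legal_actions: List[Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Return the legal action whose job has the shortest ready operation."""
    # Assumed layout: obs["ready_ops"] is a list of {"job_id", "process_time"} dicts.
    times = {op["job_id"]: op["process_time"] for op in obs.get("ready_ops", [])}

    # Keep only actions whose job has a known processing time.
    candidates = [a for a in legal_actions if a["job_id"] in times]
    if not candidates:
        return None

    # min() keeps the first candidate on ties, so the order of legal_actions breaks ties.
    return min(candidates, key=lambda a: times[a["job_id"]])


obs = {
    "ready_ops": [
        {"job_id": 1, "process_time": 5.0},
        {"job_id": 2, "process_time": 2.0},
        {"job_id": 3, "process_time": 2.0},
    ]
}
legal = [
    {"job_id": 1, "machine_group": "A"},
    {"job_id": 2, "machine_group": "B"},
    {"job_id": 3, "machine_group": "A"},
]
best = pick_shortest(obs, legal)
```

In this example jobs 2 and 3 tie at 2.0, and job 2 is returned because it appears first in legal.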

reset(scenario_info=None)

Reset the heuristic (no internal state to clear).

Return type:

None

Parameters:

scenario_info (Dict[str, Any] | None)

act(obs, legal_actions, env)

Choose an action from the legal action set.

Parameters:
  • obs (Dict[str, Any]) – The observation dictionary returned by the environment.

  • legal_actions (List[Dict[str, Any]]) – A list of legal actions; each action is expected to have at least job_id and machine_group fields (and optionally machine_candidates).

  • env (Any) – The environment instance; agents may use helper methods such as estimate_action_score or quick_rollout_score.

Return type:

Optional[Dict[str, Any]]

class dsbx.Agents.AsyncDualStreamAgent(llm_client=None, cfg=None)

Bases: BaseAgent

Parameters:
  • llm_client (Optional[LLMClient])

  • cfg (Optional[LLMCoderConfig])

reset(scenario_info=None)

Reset any internal state for a new episode/scenario.

Return type:

None

Parameters:

scenario_info (Dict[str, Any] | None)

act(obs, legal_actions, env)

Choose an action from the legal action set.

Parameters:
  • obs (Dict[str, Any]) – The observation dictionary returned by the environment.

  • legal_actions (List[Dict[str, Any]]) – A list of legal actions; each action is expected to have at least job_id and machine_group fields (and optionally machine_candidates).

  • env (Any) – The environment instance; agents may use helper methods such as estimate_action_score or quick_rollout_score.

Return type:

Optional[Dict[str, Any]]

get_stats()

Return type:

Dict[str, Any]

class dsbx.Agents.LlmPolicyAgent(policy)

Bases: BaseAgent

Thin adapter that lets an existing LLMPolicy act as a BaseAgent.

The underlying LLMPolicy is expected to follow the API defined in algorithms.llm_scheduler.policy.LLMPolicy and will be used as-is.

Parameters:

policy (LLMPolicy)

reset(scenario_info=None)

Reset the internal stats of the wrapped LLMPolicy.

Return type:

None

Parameters:

scenario_info (Dict[str, Any] | None)

act(obs, legal_actions, env)

Choose an action from the legal action set.

Parameters:
  • obs (Dict[str, Any]) – The observation dictionary returned by the environment.

  • legal_actions (List[Dict[str, Any]]) – A list of legal actions; each action is expected to have at least job_id and machine_group fields (and optionally machine_candidates).

  • env (Any) – The environment instance; agents may use helper methods such as estimate_action_score or quick_rollout_score.

Return type:

Optional[Dict[str, Any]]

get_stats()

Return a shallow copy of the wrapped policy’s stats dictionary.

Return type:

Dict[str, Any]
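To illustrate what a shallow copy of the stats dictionary means for callers, here is a small sketch. StubPolicy and PolicyAdapter are hypothetical stand-ins; the real LLMPolicy API and the name of its stats attribute are assumptions, not taken from the library.

```python
from typing import Any, Dict


class StubPolicy:
    """Hypothetical stand-in for an LLMPolicy exposing a stats dict."""

    def __init__(self) -> None:
        self.stats: Dict[str, Any] = {"calls": 0, "errors": []}


class PolicyAdapter:
    """Hypothetical adapter returning a shallow copy of the policy's stats."""

    def __init__(self, policy: StubPolicy) -> None:
        self.policy = policy

    def get_stats(self) -> Dict[str, Any]:
        # dict(...) copies the top-level mapping only; nested objects are shared.
        return dict(self.policy.stats)


policy = StubPolicy()
adapter = PolicyAdapter(policy)
snapshot = adapter.get_stats()

snapshot["calls"] = 99          # rebinding a key does not affect policy.stats
snapshot["errors"].append("x")  # mutating a nested list does affect it
```

With a shallow copy, callers can safely add or overwrite top-level keys, but in-place changes to nested containers remain visible to the wrapped policy; copy.deepcopy would be needed for full isolation.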