Agents
Agent interfaces and lightweight baselines for dsbx.
- class dsbx.Agents.BaseAgent(*args, **kwargs)
Bases: Protocol
Minimal agent interface for dsbx.
The design is intentionally light-weight: agents receive the current observation, the list of legal actions and a reference to the environment, and return either a selected action or None to indicate that no decision is taken (e.g. let the environment advance time).
- reset(scenario_info=None)
Reset any internal state for a new episode/scenario.
- Return type:
None
- Parameters:
scenario_info (Dict[str, Any] | None)
- act(obs, legal_actions, env)
Choose an action from the legal action set.
- Parameters:
obs (Dict[str,Any]) – The observation dictionary returned by the environment.
legal_actions (List[Dict[str,Any]]) – A list of legal actions; each action is expected to have at least job_id and machine_group fields (and optionally machine_candidates).
env (Any) – The environment instance; agents may use helper methods such as estimate_action_score or quick_rollout_score.
- Return type:
Optional[Dict[str,Any]]
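The interface described above can be illustrated with a small, self-contained sketch. It does not import dsbx: AgentProtocol is a local stand-in that mirrors the documented reset and act signatures, and FirstLegalAgent is a hypothetical agent written only for this example.

```python
from typing import Any, Dict, List, Optional, Protocol, runtime_checkable


@runtime_checkable
class AgentProtocol(Protocol):
    """Local stand-in mirroring the documented BaseAgent methods (assumption)."""

    def reset(self, scenario_info: Optional[Dict[str, Any]] = None) -> None: ...

    def act(
        self,
        obs: Dict[str, Any],
        legal_actions: List[Dict[str, Any]],
        env: Any,
    ) -> Optional[Dict[str, Any]]: ...


class FirstLegalAgent:
    """Hypothetical agent: picks the first legal action, or None if there is none."""

    def reset(self, scenario_info: Optional[Dict[str, Any]] = None) -> None:
        # No internal state to clear in this sketch.
        return None

    def act(
        self,
        obs: Dict[str, Any],
        legal_actions: List[Dict[str, Any]],
        env: Any,
    ) -> Optional[Dict[str, Any]]:
        # Returning None signals that no decision is taken at this step.
        if not legal_actions:
            return None
        return legal_actions[0]


agent = FirstLegalAgent()
agent.reset()
# The action fields follow the documented minimum: job_id and machine_group.
action = agent.act({}, [{"job_id": 1, "machine_group": "M1"}], env=None)
```

Because the interface is a Protocol, an agent only needs to provide matching methods; explicit inheritance is optional under structural typing.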
- class dsbx.Agents.RandomAgent(random_seed=None)
Bases: BaseAgent
- Parameters:
random_seed (Optional[int])
- reset(scenario_info=None)
Reset any internal state for a new episode/scenario.
- Return type:
None
- Parameters:
scenario_info (Dict[str, Any] | None)
- act(obs, legal_actions, env)
Choose an action from the legal action set.
- Parameters:
obs (Dict[str,Any]) – The observation dictionary returned by the environment.
legal_actions (List[Dict[str,Any]]) – A list of legal actions; each action is expected to have at least job_id and machine_group fields (and optionally machine_candidates).
env (Any) – The environment instance; agents may use helper methods such as estimate_action_score or quick_rollout_score.
- Return type:
Optional[Dict[str,Any]]
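A seeded random baseline of this kind might look roughly like the sketch below. SeededRandomAgent is a hypothetical stand-in, not the dsbx implementation; in particular, uniform sampling and leaving the generator untouched in reset are assumptions made for the example.

```python
import random
from typing import Any, Dict, List, Optional


class SeededRandomAgent:
    """Hypothetical stand-in; not the dsbx RandomAgent implementation."""

    def __init__(self, random_seed: Optional[int] = None) -> None:
        # A private generator avoids touching the global random state.
        self._rng = random.Random(random_seed)

    def reset(self, scenario_info: Optional[Dict[str, Any]] = None) -> None:
        # Assumption: the generator is not re-seeded between episodes.
        return None

    def act(
        self,
        obs: Dict[str, Any],
        legal_actions: List[Dict[str, Any]],
        env: Any,
    ) -> Optional[Dict[str, Any]]:
        if not legal_actions:
            return None
        # Assumption: each legal action is equally likely.
        return self._rng.choice(legal_actions)


actions = [
    {"job_id": 1, "machine_group": "M1"},
    {"job_id": 2, "machine_group": "M1"},
    {"job_id": 3, "machine_group": "M2"},
]
first = SeededRandomAgent(random_seed=0)
second = SeededRandomAgent(random_seed=0)
picks_first = [first.act({}, actions, None) for _ in range(5)]
picks_second = [second.act({}, actions, None) for _ in range(5)]
```

With the same seed, the two agents draw the same sequence of actions, which is what makes a random baseline reproducible across runs.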
- class dsbx.Agents.SPTAgent
Bases: BaseAgent
A simple Shortest Processing Time (SPT) heuristic agent.
At each decision point, the agent reads ready_ops from the observation, ranks legal actions by the processing time of the corresponding job’s next operation, and chooses the shortest one. The observation already carries the process_time field for each ready operation, so the rule does not depend on additional environment helper methods.
- reset(scenario_info=None)
Reset the heuristic (no internal state to clear).
- Return type:
None
- Parameters:
scenario_info (Dict[str, Any] | None)
- act(obs, legal_actions, env)
Choose an action from the legal action set.
- Parameters:
obs (Dict[str,Any]) – The observation dictionary returned by the environment.
legal_actions (List[Dict[str,Any]]) – A list of legal actions; each action is expected to have at least job_id and machine_group fields (and optionally machine_candidates).
env (Any) – The environment instance; agents may use helper methods such as estimate_action_score or quick_rollout_score.
- Return type:
Optional[Dict[str,Any]]
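The SPT rule described for this class can be sketched as a standalone function. The exact layout of ready_ops is not documented here, so the sketch assumes it is a list of dictionaries that each carry job_id and process_time; spt_choice is a hypothetical helper, not part of dsbx.

```python
from typing import Any, Dict, List, Optional


def spt_choice(
    obs: Dict[str, Any],
    legal_actions: List[Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Pick the legal action whose job has the shortest ready operation."""
    # Assumed layout: obs["ready_ops"] is a list of dicts with
    # "job_id" and "process_time" keys.
    process_time = {
        op["job_id"]: op["process_time"] for op in obs.get("ready_ops", [])
    }
    # Only actions whose job has a known processing time can be ranked.
    candidates = [a for a in legal_actions if a["job_id"] in process_time]
    if not candidates:
        return None
    # min() keeps the first action on ties, so the result is deterministic.
    return min(candidates, key=lambda a: process_time[a["job_id"]])


obs = {
    "ready_ops": [
        {"job_id": "j1", "process_time": 5.0},
        {"job_id": "j2", "process_time": 2.0},
        {"job_id": "j3", "process_time": 9.0},
    ]
}
legal = [
    {"job_id": "j1", "machine_group": "M1"},
    {"job_id": "j2", "machine_group": "M1"},
]
chosen = spt_choice(obs, legal)
```

The function never touches the environment, which matches the note above that the rule needs only the observation.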
- class dsbx.Agents.AsyncDualStreamAgent(llm_client=None, cfg=None)
Bases: BaseAgent
- Parameters:
llm_client (Optional[LLMClient])
cfg (Optional[LLMCoderConfig])
- reset(scenario_info=None)
Reset any internal state for a new episode/scenario.
- Return type:
None
- Parameters:
scenario_info (Dict[str, Any] | None)
- act(obs, legal_actions, env)
Choose an action from the legal action set.
- Parameters:
obs (Dict[str,Any]) – The observation dictionary returned by the environment.
legal_actions (List[Dict[str,Any]]) – A list of legal actions; each action is expected to have at least job_id and machine_group fields (and optionally machine_candidates).
env (Any) – The environment instance; agents may use helper methods such as estimate_action_score or quick_rollout_score.
- Return type:
Optional[Dict[str,Any]]
- get_stats()
- Return type:
Dict[str,Any]
- class dsbx.Agents.LlmPolicyAgent(policy)
Bases: BaseAgent
Thin adapter that lets an existing LLMPolicy act as a BaseAgent.
The underlying LLMPolicy is expected to follow the API defined in algorithms.llm_scheduler.policy.LLMPolicy and will be used as-is.
- Parameters:
policy (LLMPolicy)
- reset(scenario_info=None)
Reset the internal stats of the wrapped LLMPolicy.
- Return type:
None
- Parameters:
scenario_info (Dict[str, Any] | None)
- act(obs, legal_actions, env)
Choose an action from the legal action set.
- Parameters:
obs (Dict[str,Any]) – The observation dictionary returned by the environment.
legal_actions (List[Dict[str,Any]]) – A list of legal actions; each action is expected to have at least job_id and machine_group fields (and optionally machine_candidates).
env (Any) – The environment instance; agents may use helper methods such as estimate_action_score or quick_rollout_score.
- Return type:
Optional[Dict[str,Any]]
- get_stats()
Return a shallow copy of the wrapped policy’s stats dictionary.
- Return type:
Dict[str,Any]
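The adapter pattern and the shallow-copy behaviour of get_stats can be illustrated with a toy example. The real LLMPolicy API is not shown in this reference, so everything below is hypothetical: StubPolicy, its choose method, its stats attribute, and the way PolicyAdapter resets counters are assumptions made only for the sketch.

```python
from typing import Any, Dict, List, Optional


class StubPolicy:
    """Hypothetical policy; the real LLMPolicy API is not shown here."""

    def __init__(self) -> None:
        self.stats: Dict[str, Any] = {"calls": 0}

    def choose(
        self,
        obs: Dict[str, Any],
        legal_actions: List[Dict[str, Any]],
        env: Any,
    ) -> Optional[Dict[str, Any]]:
        self.stats["calls"] += 1
        return legal_actions[0] if legal_actions else None


class PolicyAdapter:
    """Hypothetical adapter exposing reset/act/get_stats around a policy."""

    def __init__(self, policy: StubPolicy) -> None:
        self.policy = policy

    def reset(self, scenario_info: Optional[Dict[str, Any]] = None) -> None:
        # Assumption: resetting stats means zeroing the counters in place.
        for key in self.policy.stats:
            self.policy.stats[key] = 0

    def act(
        self,
        obs: Dict[str, Any],
        legal_actions: List[Dict[str, Any]],
        env: Any,
    ) -> Optional[Dict[str, Any]]:
        # Delegate the decision to the wrapped policy unchanged.
        return self.policy.choose(obs, legal_actions, env)

    def get_stats(self) -> Dict[str, Any]:
        # dict(...) is a shallow copy: reassigning keys on the result
        # does not alter the policy's own dictionary.
        return dict(self.policy.stats)


adapter = PolicyAdapter(StubPolicy())
adapter.act({}, [{"job_id": 1, "machine_group": "M1"}], None)
stats = adapter.get_stats()
stats["calls"] = 99  # only the copy changes
```

A shallow copy protects top-level keys only; nested mutable values, if any, would still be shared with the wrapped policy.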