
praxis-eval

Standalone robot-policy evaluation

Keep benchmark setup, rollout execution, metrics, artifacts, and contracts inside the evaluator. Keep policy code in your own adapter.

Quick Example

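```python
import numpy as np

from praxis_eval import EvalConfig, LocalPolicy, evaluate


class ZeroPolicy:
    """Baseline policy that always outputs all-zero actions."""

    def reset(self, episode_ids=None) -> None:
        # Stateless: nothing to clear between episodes.
        pass

    def act(self, observations, *, action_spec=None, policy_kwargs=None, episode_ids=None):
        # action_spec describes the expected action; this policy needs a fixed shape.
        if action_spec is None or action_spec.shape is None:
            raise ValueError("Expected a fixed-shape ActionSpec.")
        # One zero action per observation in the batch.
        return np.zeros((len(observations), *action_spec.shape), dtype=action_spec.dtype)


# Evaluate task 0 of the libero_10 suite, 5 runs per task, writing output to eval/libero.
result = evaluate(
    "libero",
    policy=LocalPolicy(ZeroPolicy()),
    config=EvalConfig(
        task="libero_10",
        task_ids=(0,),
        num_eval_per_task=5,
        output_dir="eval/libero",
    ),
)

print(result.overall)
print(result.artifacts)
```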
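Policies that keep per-episode state can follow the same `reset`/`act` shape as `ZeroPolicy`. The sketch below is a hypothetical adapter that counts steps per episode. It assumes that `episode_ids[i]` labels the episode that produced `observations[i]` and that `reset(episode_ids=None)` means "reset everything"; neither behavior is confirmed here. `FakeActionSpec` is a local stand-in so the snippet runs without praxis-eval installed, not the library's `ActionSpec`.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence

import numpy as np


# Hypothetical stand-in for praxis-eval's ActionSpec; the real class may differ.
@dataclass(frozen=True)
class FakeActionSpec:
    shape: tuple
    dtype: type = np.float32


class CountingZeroPolicy:
    """Zero-action policy that also tracks how many steps each episode has taken."""

    def __init__(self) -> None:
        self.steps: Dict[object, int] = {}

    def reset(self, episode_ids: Optional[Sequence] = None) -> None:
        # Assumption: None resets every episode; otherwise reset only the given ids.
        if episode_ids is None:
            self.steps.clear()
        else:
            for eid in episode_ids:
                self.steps[eid] = 0

    def act(self, observations, *, action_spec=None, policy_kwargs=None, episode_ids=None):
        if action_spec is None or action_spec.shape is None:
            raise ValueError("Expected a fixed-shape ActionSpec.")
        # Assumption: episode_ids[i] identifies the episode behind observations[i].
        ids = episode_ids if episode_ids is not None else range(len(observations))
        for eid in ids:
            self.steps[eid] = self.steps.get(eid, 0) + 1
        return np.zeros((len(observations), *action_spec.shape), dtype=action_spec.dtype)


# Usage with the stand-in spec: a batch of two observations, 7-D actions.
policy = CountingZeroPolicy()
policy.reset()
actions = policy.act([{}, {}], action_spec=FakeActionSpec(shape=(7,)), episode_ids=["ep0", "ep1"])
print(actions.shape, policy.steps)
```

Keeping the counter keyed by episode id rather than by batch index means the adapter stays correct even if the evaluator reorders or refills the batch between steps.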

Benchmark Coverage

  • LIBERO: evaluation in the current Python environment with LIBERO suites and normalized 7-D actions.
  • RoboCasa: RoboCasa365 tasks, asset setup, 16-D state, and 12-D mobile manipulation actions.
  • RoboMimic: robosuite-backed RoboMimic tasks with task aliases, known horizons, and 7-D actions.
  • MetaWorld: MT50 selectors, difficulty groups, pixel/state observations, and 4-D actions.
  • SimplerEnv: Bridge tasks executed through a dedicated SimplerEnv runtime and remote policy transport.
  • MS-HAB: set-table subtasks, RGB policy observations, and dedicated MS-HAB runtime execution.
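Four of the entries above state a fixed action width. When developing an adapter, a quick pre-flight check on the action batch can catch shape mistakes before a rollout starts. This is a hypothetical helper, not part of praxis-eval: the lowercase benchmark keys other than `"libero"` are illustrative names, and inside a real evaluation the authoritative shape comes from the `ActionSpec` passed to `act`.

```python
import numpy as np

# Hypothetical table built from the benchmark list; praxis-eval does not expose this.
EXPECTED_ACTION_DIM = {
    "libero": 7,
    "robocasa": 12,
    "robomimic": 7,
    "metaworld": 4,
}


def check_action_batch(benchmark: str, actions: np.ndarray) -> None:
    """Raise ValueError unless actions has shape (batch, expected_dim)."""
    expected = EXPECTED_ACTION_DIM[benchmark]
    if actions.ndim != 2 or actions.shape[1] != expected:
        raise ValueError(
            f"{benchmark}: expected actions of shape (batch, {expected}), got {actions.shape}"
        )


# Usage: a batch of three 7-D LIBERO actions passes; a 4-D batch would raise.
check_action_batch("libero", np.zeros((3, 7), dtype=np.float32))
```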

Released under the Apache-2.0 License.