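A self-experimenting platform for a competition agent
This deployment shows the current end-to-end loop of the ARC3 project: the world-model agent imagines the effects of future actions before acting, and the platform agent gathers material, calls DeepSeek through a LiteLLM instance running on this machine to propose experiments, then trains, evaluates, and writes a report.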
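Evaluation score from one short training round on the local mock ARC environment.
Platform decision: increase exploration
The training report, metrics, and platform decisions were all generated as local artifacts.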
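Because each round reads the previous round's decision, those local artifacts need a format that is easy to append to and re-read. The sketch below assumes a JSON Lines file (one JSON object per line); the file name `platform_decisions.jsonl` and the helper names are hypothetical, not the project's actual layout.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional

def append_decision(log_path: Path, record: dict) -> None:
    """Append one platform decision to the log as a single JSON line."""
    log_path.parent.mkdir(parents=True, exist_ok=True)
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

def last_decision(log_path: Path) -> Optional[dict]:
    """Return the most recent decision, or None if there is no log yet."""
    if not log_path.exists():
        return None
    last = None
    with log_path.open(encoding="utf-8") as f:
        for line in f:
            if line.strip():
                last = json.loads(line)
    return last

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file name; the real artifact layout may differ.
        log = Path(tmp) / "platform_decisions.jsonl"
        append_decision(log, {"cycle": 0, "rationale": "increase exploration",
                              "changes": {"train.epsilon": 0.22}})
        print(last_decision(log)["changes"])  # {'train.epsilon': 0.22}
```

Appending instead of rewriting keeps the full decision history on disk while the next round only needs the last line.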
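How the platform agent works
Gather evidence
Reads the README, the algorithm notes, the most recent training metrics, and the previous round's platform decision.
Propose an experiment
Calls DeepSeek through the local LiteLLM instance; the proposal may only change training parameters within a safe range.
Run training
Runs the world-model agent and records replays, metrics, checkpoints, and imagined frames.
Evaluate and report
Generates a training report and writes the results to a structured log that the next round can read.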
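The "safe range" restriction in the proposal step can be enforced by checking the model's suggested changes against an allowlist before touching the config. The following is a minimal sketch under assumptions: `SAFE_PARAMS`, its bounds, the `train.learning_rate` entry, and both helper functions are hypothetical; only the `train.epsilon` key comes from this round's suggestion.

```python
import copy

# Hypothetical allowlist: parameter name -> (min, max) inclusive bounds.
SAFE_PARAMS = {
    "train.epsilon": (0.0, 0.5),
    "train.learning_rate": (1e-5, 1e-2),
}

def validate_changes(changes: dict) -> dict:
    """Reject the proposal if any change is unknown, non-numeric, or out of range."""
    accepted = {}
    for key, value in changes.items():
        if key not in SAFE_PARAMS:
            raise ValueError(f"parameter not allowed: {key}")
        if not isinstance(value, (int, float)) or isinstance(value, bool):
            raise ValueError(f"{key} must be numeric, got {value!r}")
        lo, hi = SAFE_PARAMS[key]
        if not lo <= value <= hi:
            raise ValueError(f"{key}={value} outside safe range [{lo}, {hi}]")
        accepted[key] = float(value)
    return accepted

def apply_changes(config: dict, changes: dict) -> dict:
    """Return a copy of config with dotted keys such as 'train.epsilon' set."""
    new_config = copy.deepcopy(config)
    for dotted, value in changes.items():
        *parents, leaf = dotted.split(".")
        node = new_config
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return new_config

if __name__ == "__main__":
    proposal = {"rationale": "increase exploration", "changes": {"train.epsilon": 0.22}}
    config = {"train": {"epsilon": 0.1}}  # hypothetical starting config
    print(apply_changes(config, validate_changes(proposal["changes"])))
    # {'train': {'epsilon': 0.22}}
```

Rejecting the whole proposal on the first bad key, rather than silently dropping it, makes a misbehaving model visible in the log instead of quietly running a different experiment than the one it described.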
DeepSeek's suggestion this round
{
"rationale": "increase exploration",
"changes": {
"train.epsilon": 0.22
},
"expected_effect": "collect more diverse transitions",
"risk": "short smoke run may be noisy"
}
Command to reproduce the experiment
python3 -m arcagent.platform.agent_cli \
--cycles 1 \
--steps-per-cycle 80 \
--model opencode-go/deepseek-v4-flash \
--run-dir runs/platform_agent_deepseek_smoke2
Embedded training report snapshot
ARC3 Training Report
This report is generated from local logs only. It explains whether the agent is learning, how its world model is improving, and how the self-tuning controller changed training.
At a glance
episodes: 1
metric rows: 5
PBT events: 0
Episode return: latest -1.800, best -1.800, mean -1.800
World-model loss: latest 0.0814, best 0.0814
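The "latest / best / mean" figures above depend on which direction counts as better: the best return is the maximum, while the best world-model loss is the minimum. A small sketch of that summary, assuming each metric is a list of values in logging order (the function name and the loss values in the demo are illustrative, not taken from the run):

```python
from statistics import mean

def summarize(values, higher_is_better=True):
    """Latest, best and mean of a metric series given in logging order."""
    if not values:
        raise ValueError("empty metric series")
    best = max(values) if higher_is_better else min(values)
    return {"latest": values[-1], "best": best, "mean": mean(values)}

if __name__ == "__main__":
    returns = summarize([-1.8])  # the single episode return in this snapshot
    print(f"latest {returns['latest']:.3f}, best {returns['best']:.3f}, "
          f"mean {returns['mean']:.3f}")
    # Illustrative loss values only, not from the run: lower is better.
    loss = summarize([0.30, 0.15, 0.08], higher_is_better=False)
    print(f"latest {loss['latest']:.4f}, best {loss['best']:.4f}")
```

With a single episode, latest, best, and mean necessarily coincide, which is why all three read -1.800 above.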
Beginner explanation
The player does not just react to the current grid. It first compresses the grid into a small latent state, predicts what each possible action would do next, and scores short imagined futures. The self-tuning platform then compares several training runs and keeps the settings that worked best.
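The encode, predict, and score loop in that explanation can be sketched with a toy linear world model. Everything here is an assumption for illustration: `ToyWorldModel` uses random weights instead of learned ones, and `plan` does an exhaustive search over all action sequences of a short horizon, which may differ from the planner the agent actually uses.

```python
import itertools
import numpy as np

class ToyWorldModel:
    """Hypothetical linear world model: encoder, per-action dynamics, reward head.
    Weights are random here; a real agent would learn them from replay data."""

    def __init__(self, grid_cells, latent_dim, n_actions, seed=0):
        rng = np.random.default_rng(seed)
        self.enc = rng.normal(size=(latent_dim, grid_cells)) / np.sqrt(grid_cells)
        self.dyn = rng.normal(size=(n_actions, latent_dim, latent_dim)) * 0.3
        self.bias = rng.normal(size=(n_actions, latent_dim)) * 0.1
        self.reward_w = rng.normal(size=latent_dim)
        self.n_actions = n_actions

    def encode(self, grid):
        # Compress the whole grid into a small latent vector.
        return np.tanh(self.enc @ grid.reshape(-1).astype(float))

    def predict(self, z, action):
        # Predict the next latent state if `action` were taken.
        return np.tanh(self.dyn[action] @ z + self.bias[action])

    def reward(self, z):
        return float(self.reward_w @ z)

def plan(model, grid, horizon=3, gamma=0.95):
    """Score every action sequence of length `horizon` in imagination and
    return the first action of the best one, plus its imagined return."""
    z0 = model.encode(grid)
    best_score, best_first = -np.inf, 0
    for seq in itertools.product(range(model.n_actions), repeat=horizon):
        z, score = z0, 0.0
        for t, a in enumerate(seq):
            z = model.predict(z, a)
            score += (gamma ** t) * model.reward(z)
        if score > best_score:
            best_score, best_first = score, seq[0]
    return best_first, best_score

if __name__ == "__main__":
    grid = np.random.default_rng(1).integers(0, 10, size=(8, 8))  # colours 0-9
    model = ToyWorldModel(grid_cells=grid.size, latent_dim=16, n_actions=4)
    action, score = plan(model, grid)
    print(f"chosen action {action}, imagined return {score:.3f}")
```

Exhaustive search costs `n_actions ** horizon` rollouts, so it only stays cheap for the short horizons described here; longer horizons usually call for sampling-based planners.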
Recorded frames
Frame arrays are saved as compressed NumPy files for notebooks to animate. Latest files:
- frames/episode_0000.npz
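A file such as the listed frames/episode_0000.npz can be opened with `numpy.load`. The array names stored inside the archive are not documented here, so this sketch simply lists whatever keys it finds; the assumption that axis 0 is time, and the helper names, are hypothetical. The demo writes its own throwaway file instead of reading a real episode.

```python
import tempfile
from pathlib import Path

import numpy as np

def load_frames(path):
    """Load every array stored in a compressed .npz episode file."""
    with np.load(path) as data:
        # Index inside the `with` so each array is read before the file closes.
        return {name: data[name] for name in data.files}

def iter_frames(array):
    # Assumes axis 0 is time, i.e. shape (T, H, W) for a sequence of grids.
    for t in range(array.shape[0]):
        yield t, array[t]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "episode_demo.npz"  # hypothetical demo file
        np.savez_compressed(path, frames=np.zeros((3, 8, 8), dtype=np.int8))
        for name, arr in load_frames(path).items():
            print(name, arr.shape)  # frames (3, 8, 8)
```

In a notebook, the per-step grids yielded by `iter_frames` could then be fed to an animation helper such as matplotlib's `FuncAnimation`.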