Open Source Project of the Day (Part 29): Open-AutoGLM - A Phone Agent Framework for Controlling Phones with Natural Language

Source: DEV Community
Introduction

"'Open Meituan and search for nearby hot pot restaurants.' 'Send a message to File Transfer Assistant: deployment successful.' — Spoken, and the phone does it automatically."

This is Part 29 of the "Open Source Project of the Day" series. Today we explore Open-AutoGLM (GitHub), open-sourced by zai-org (the Zhipu AI ecosystem). The goal: control your phone with natural language — open apps, search, tap, type text — without performing each step yourself.

Open-AutoGLM delivers two things:

- A Phone Agent framework: Python code running on your computer that controls devices via ADB (Android) or HDC (HarmonyOS) in a loop of "screenshot → visual model understands the interface → outputs an action (launch app, tap coordinates, type, etc.) → execute".
- The AutoGLM-Phone series of vision-language models (9B parameters), optimized for mobile interfaces and callable via the Zhipu BigModel API, ModelScope, or your own vLLM/SGLang service.

Users simply say something like "open Xiaohongshu and search…" and the agent carries out the steps on the device.
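To make the "screenshot → understand → action → execute" loop concrete, here is a minimal sketch of its device side using standard adb shell commands (`screencap`, `input tap`, `input text`, `monkey`). The action schema, function names, and package name are illustrative assumptions, not Open-AutoGLM's actual API:

```python
import subprocess

# Sketch of the ADB half of the agent loop. The vision-language model
# (not shown) would consume the screenshot and emit an action dict.

def screenshot_cmd() -> list[str]:
    # Streams the current screen as a PNG to stdout.
    return ["adb", "exec-out", "screencap", "-p"]

def launch_cmd(package: str) -> list[str]:
    # Starts the package's launcher activity via the monkey tool.
    return ["adb", "shell", "monkey", "-p", package,
            "-c", "android.intent.category.LAUNCHER", "1"]

def tap_cmd(x: int, y: int) -> list[str]:
    return ["adb", "shell", "input", "tap", str(x), str(y)]

def type_cmd(text: str) -> list[str]:
    # 'input text' does not accept literal spaces; adb expects %s instead.
    return ["adb", "shell", "input", "text", text.replace(" ", "%s")]

def dispatch(action: dict) -> list[str]:
    # Maps a model-emitted action (hypothetical schema) to an adb command.
    kind = action["type"]
    if kind == "launch":
        return launch_cmd(action["package"])
    if kind == "tap":
        return tap_cmd(action["x"], action["y"])
    if kind == "type":
        return type_cmd(action["text"])
    raise ValueError(f"unknown action: {kind}")

def run(cmd: list[str]) -> bytes:
    # Executes one step against a connected device; raises on failure.
    return subprocess.run(cmd, capture_output=True, check=True).stdout
```

In the real loop, `run(screenshot_cmd())` would feed the model, whose reply is turned into the next `dispatch(...)` call until the task completes.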