If your target apps are in English, the AutoGLM‑Phone‑9B‑Multilingual model is typically the safer default. This tutorial explains when to choose it, how to download it from official sources, and how to validate English UI behavior.
Even for English‑only apps, multilingual models often have broader UI text coverage and more robust handling of mixed‑language labels, localization menus, and embedded foreign strings. In practice, that reduces the chance of mis‑reading buttons or mis‑interpreting settings screens.
This is not a claim of superior performance for every use case. It is a practical default for evaluation.
Download the model from:
Avoid re‑hosted weights. Official sources ensure consistent licensing and checksums.
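If the official source publishes SHA-256 checksums for the weight files, one way to confirm a download is intact is to hash each file locally and compare. The sketch below is a minimal illustration using only the Python standard library; the function names are hypothetical, and the expected digest must come from the official source, not from this example.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so multi-gigabyte weight files fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published checksum, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Call `matches_published_checksum` once per downloaded file and stop if any file fails the comparison.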
You can run the model in two main ways:
Option A (recommended): Use a deployed model service endpoint and pass --base-url.
Option B (local): Use a local GPU (24GB+ VRAM recommended) with vLLM or SGLang.
Option A is the fastest way to test whether the multilingual model behaves well on your UI flows.
Before you run the agent:
These notes help you reproduce results later.
The official README is the source of truth. The sequence below is a template to show the structure, not exact commands.
# TODO: replace with official run command for the multilingual model
python -m openautoglm.run \
  --model /path/to/AutoGLM-Phone-9B-Multilingual \
  --device adb
If you are using an endpoint:
# TODO: replace with official endpoint flags
python -m openautoglm.run \
  --base-url https://your-endpoint.example.com
Use concise, deterministic prompts:
These reduce ambiguous actions and keep the agent in safe territory.
Start with low‑risk tasks:
These tasks help you evaluate perception accuracy and basic navigation without risk.
To reduce ambiguity:
These small prompt changes can reduce mis‑clicks.
Even English apps can contain mixed UI strings (language selectors, locale switches, legal links). If the agent hesitates:
Do not allow automated purchase or password‑change flows in early tests. Always set up a manual confirmation step for destructive actions.
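A manual confirmation step can be as simple as a gate that runs before a task is handed to the agent. The sketch below is an assumption about how you might wire this into your own test harness; it is not part of the agent's API. The keyword list and the function names are hypothetical and should be tuned for the apps under test.

```python
from typing import Callable

# Hypothetical keyword list. Substring matching over-matches on purpose,
# so a borderline task asks for confirmation instead of running silently.
RISKY_KEYWORDS = ("purchase", "buy", "pay", "password", "delete", "reset")


def is_risky(task: str) -> bool:
    """Return True if the task text mentions any risky keyword."""
    lowered = task.lower()
    return any(keyword in lowered for keyword in RISKY_KEYWORDS)


def approve_task(task: str, confirm: Callable[[str], str] = input) -> bool:
    """Allow safe tasks; require an explicit 'yes' for risky ones."""
    if not is_risky(task):
        return True
    answer = confirm(f"Task looks destructive: {task!r}. Type 'yes' to allow: ")
    return answer.strip().lower() == "yes"
```

Passing `confirm` as a parameter keeps the gate testable without a terminal; in interactive use the default `input` prompts the operator.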
Score each task:
This keeps evaluations consistent across runs.
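To keep scores comparable, it can help to record each task in a fixed structure and compute the same summary after every run. The sketch below assumes three illustrative criteria (completion, mis-click count, and whether manual help was needed); these fields are hypothetical, so substitute the criteria you actually score.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class TaskScore:
    """One scored task. All criteria here are illustrative assumptions."""
    task: str
    completed: bool
    misclicks: int
    needed_manual_help: bool


def summarize(scores: Sequence[TaskScore]) -> Dict[str, float]:
    """Aggregate per-task scores into run-level rates."""
    total = len(scores)
    if total == 0:
        # Avoid dividing by zero when a run recorded no tasks.
        return {"tasks": 0, "completion_rate": 0.0, "misclicks_per_task": 0.0}
    completed = sum(1 for score in scores if score.completed)
    misclicks = sum(score.misclicks for score in scores)
    return {
        "tasks": total,
        "completion_rate": completed / total,
        "misclicks_per_task": misclicks / total,
    }
```

Storing the summary next to your run notes makes it easier to compare runs of the same task list.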