MemGUI-Agent

MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management

Guangyi Liu1,2, Gao Wu1,2, Congxiao Liu1,2, Pengxiang Zhao1,2, Liang Liu1, Mading Li2 (project lead), Zhang Qi2, Mengyan Wang2, Liang Guo2, Yong Liu1 (corresponding author)

1Zhejiang University    2Kuaishou Technology

Contact: guangyiliu@zju.edu.cn / yongliu@iipc.zju.edu.cn

MemGUI-Agent uses ConAct, a Context-as-Action interface that folds action history, updates UI memory, and emits the next GUI action in one structured response.

Project Video

Case studies from the paper

Each case pairs the baseline's failed rollout with our successful one and lists the benchmark source, task query, and rollout length.

Featured Comparison

Four side-by-side comparisons covering MemGUI-Bench and MobileWorld with 8B and 235B backbones.

MemGUI-Bench · 8B · AP News Digest

Long-horizon reading and note creation

Task query: In the AP News app, find the three most recent articles from the Technology section and the three most recent from the Business section. Read each full article, remember its title and main content, then create a Joplin note titled Tech & Business Digest with sectioned summaries.

Qwen3-VL-8B-Instruct

Failed · 191 steps

ReAct-style baseline opens articles but loses the details before writing the Joplin note.

MemGUI-8B-SFT

Successful · 140 steps

Our agent stores article evidence as structured memory and retrieves it when composing the final note.

MemGUI-Bench · 235B · Compare Product Specs

Collecting product specs and writing a note

Task query: Open Amazon, collect screen size, battery, and storage for iPhone 15 Pro, Galaxy S24 Ultra, and Pixel 8 Pro, then write all nine facts into a Joplin note named Phone Spec Matrix.

Qwen3-VL-235B-Thinking

Failed · 69 steps

Baseline remains inside Amazon after repeated scrolling and never writes the final Joplin note.

MemGUI-Agent-235B

Successful · 53 steps

Our agent folds search spans, stores the nine facts, and completes the Joplin note.

MobileWorld · 8B · Alibaba Headquarters Contact

Finding a business phone number and creating a contact

Task query: Find the phone number of Alibaba's Hangzhou headquarters on Google Maps, and based on that, create a new contact named Kevin Zhang with the company.

Qwen3-VL-8B-Instruct

Failed · 20 steps

Baseline leaves Google Maps early and later hallucinates a phone number while editing Contacts.

MemGUI-8B-SFT

Successful · 26 steps

Our agent extracts and persists the real number before creating the contact.

MobileWorld · 235B · Mastodon Contact Update

Updating contact details from a social post

Task query: Olivia left new phone and email information in her latest Mastodon post. Update Olivia in Contacts, then send her the text message: Hello, how are you. Set the email label to internet.

Qwen3-VL-235B-Thinking

Failed · 21 steps

Baseline exits Mastodon before Olivia's post loads and later reuses stale contact information.

MemGUI-Agent-235B

Successful · 36 steps

Our agent stores Olivia's new phone and email, then uses them for Contacts and SMS.

01 / Main Results

Context efficiency and task success

MemGUI-Agent improves both zero-shot 235B and trained 8B settings on long-horizon mobile GUI benchmarks.

Context efficiency and benchmark performance of MemGUI-Agent.

Benchmark Standings


02 / Method

ConAct: Context-as-Action

Instead of carrying an ever-growing raw transcript, MemGUI-Agent makes context updates explicit and executable inside each model response.

ConAct framework
ConAct treats context management as part of the action space: the policy updates folded history, folded UI state, and the recent step record while producing the next GUI action.

Folded Action History

Compresses completed interaction spans while preserving task-relevant progress.

Folded UI State

Stores persistent UI facts such as names, prices, values, and form state.

Recent Step Record

Keeps local interaction continuity for the next GUI action.
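
The three context components above can be pictured as one small state object that each structured response updates while also emitting the next GUI action. The following is a minimal sketch under stated assumptions, not the paper's implementation: the class names, field names, the `apply_response` helper, and the `max_recent` window are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentContext:
    """Compact context carried between steps (hypothetical field names)."""
    folded_history: list = field(default_factory=list)  # summaries of completed spans
    ui_memory: dict = field(default_factory=dict)       # persistent UI facts
    recent_steps: list = field(default_factory=list)    # short local step record


@dataclass
class ConActResponse:
    """One structured response: context updates plus the next GUI action."""
    fold_summary: Optional[str]  # if set, replaces the recent steps with one summary
    memory_updates: dict         # UI facts to persist (names, prices, values, ...)
    step_record: str             # short description of the current step
    action: dict                 # next GUI action, in an assumed dict format


def apply_response(ctx: AgentContext, resp: ConActResponse, max_recent: int = 3) -> dict:
    """Apply the context updates in place and return the GUI action to execute."""
    if resp.fold_summary is not None:
        # Fold the finished span: keep one summary, drop its step-level detail.
        ctx.folded_history.append(resp.fold_summary)
        ctx.recent_steps.clear()
    ctx.ui_memory.update(resp.memory_updates)
    ctx.recent_steps.append(resp.step_record)
    # Keep only the last few steps (assumes max_recent >= 1).
    ctx.recent_steps = ctx.recent_steps[-max_recent:]
    return resp.action


ctx = AgentContext()
apply_response(ctx, ConActResponse(None, {}, "Opened the search page", {"type": "tap"}))
action = apply_response(ctx, ConActResponse(
    fold_summary="Searched for the product and opened its page",
    memory_updates={"product.storage": "<value read from screen>"},
    step_record="Scrolled to the specification table",
    action={"type": "scroll", "direction": "down"},
))
```

In this sketch the prompt for the next step would be built from `ctx` alone, so its size depends on the number of folded summaries and stored facts rather than on the length of the raw transcript.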

03 / Dataset and Case Study

MemGUI-3K and case study

MemGUI-3K supports SFT and offline analysis. The case study follows one long mobile workflow end to end.

MemGUI-3K dataset statistics
MemGUI-3K contains 2,956 trajectories and 64,430 reasonable-step SFT samples.

MemGUI-Agent case study
Paper case study showing compact context over a long mobile interaction.

04 / BibTeX

Reference

The paper is available on arXiv:2606.19926.

@misc{liu2026memguiagentendtoendlonghorizonmobile,
  title={MemGUI-Agent: An End-to-End Long-Horizon Mobile GUI Agent with Proactive Context Management},
  author={Guangyi Liu and Gao Wu and Congxiao Liu and Pengxiang Zhao and Liang Liu and Mading Li and Qi Zhang and Mengyan Wang and Liang Guo and Yong Liu},
  year={2026},
  eprint={2606.19926},
  archivePrefix={arXiv},
  primaryClass={cs.HC},
  url={https://arxiv.org/abs/2606.19926}
}