AUITestAgent: Natural Language-Driven GUI Functional Bug Tester

Yongxiang Hu¹, Xuan Wang¹, Yingchuan Wang¹, Yu Zhang², Shiyu Guo², Chaoyi Chen², Xin Wang¹,³ and Yangfan Zhou¹,³

¹School of Computer Science, Fudan University
²Meituan, China
³Shanghai Key Laboratory of Intelligent Information Processing, Shanghai, China

English | 简体中文 | 日本語

🌟 Introduction

AUITestAgent is the first automatic, natural language-driven GUI testing tool for mobile apps, capable of fully automating the entire process of GUI interaction and function verification. It takes test requirements written in natural language as input, generates and conducts UI interactions, and verifies whether the UI response aligns with the expectations outlined in the requirements.

To enhance the performance of LLM-based agents in the domain-specific area of UI testing, AUITestAgent decouples GUI interaction and function verification into two separate modules, performing verification after the interaction.

In terms of implementation, AUITestAgent extracts GUI interactions from test requirements using dynamically organized agents to tackle the diversity of requirement expressions. Then, a multi-dimensional data extraction strategy is employed to retrieve data relevant to the test requirements from the interaction trace and perform verification.
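The decoupling described above can be pictured with a minimal Python sketch. It is not AUITestAgent's code: the names `Step`, `Trace`, `run_interaction`, and `verify` are hypothetical, the action list is passed in by hand rather than extracted from the requirement by LLM agents, and a dictionary lookup stands in for a real device. It only illustrates the ordering: the interaction stage records a trace without judging it, and verification reads that trace afterwards.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Step:
    """One recorded UI action and the screen text seen after it (hypothetical shape)."""
    action: str
    screen_text: str


@dataclass
class Trace:
    """The interaction trace handed from the interaction stage to the verifier."""
    requirement: str
    steps: List[Step] = field(default_factory=list)


def run_interaction(requirement: str, actions: List[str],
                    execute: Callable[[str], Optional[str]]) -> Trace:
    """Stage 1: perform every action and record what the UI showed, without judging it."""
    trace = Trace(requirement)
    for action in actions:
        trace.steps.append(Step(action, execute(action) or ""))
    return trace


def verify(trace: Trace, expected_text: str) -> bool:
    """Stage 2: runs only after the interaction ends; checks the last observed screen."""
    return bool(trace.steps) and expected_text in trace.steps[-1].screen_text


# Stand-in for a device driver (assumption): maps an action to the resulting screen text.
fake_screens = {"open scenic view": "Scenic spots", "tap first scenic spot": "Rating 4.8"}
trace = run_interaction("check the rating of the first scenic spot",
                        ["open scenic view", "tap first scenic spot"],
                        fake_screens.get)
print(verify(trace, "4.8"))
```

Keeping the verifier as a pure function of the trace means a recorded interaction can be re-checked against a different expectation without touching the device again.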

📺 Demo

Using AUITestAgent in Meituan

Task: View the rating of the first scenic spot in the scenic view, check whether its rating is consistent

demo1.mp4

Using AUITestAgent in Facebook

Task: Send a post with content 'Hello everyone' and like it, check whether it is correctly displayed, and whether the like button turns blue

demo2.mp4

📝 Evaluation

We evaluate AUITestAgent’s performance on two custom benchmarks, an interaction benchmark and a verification benchmark, covering 8 widely used commercial apps (i.e., Meituan, Little Red Book, Douban, Facebook, Gmail, LinkedIn, Google Play, and YouTube Music). To provide a comprehensive assessment, we categorized the difficulty of interaction tasks into three levels: easy (L1), moderate (L2), and difficult (L3). For each level, we constructed ten interaction tasks, with descriptions evenly split between English and Chinese.

Our experiments show that AUITestAgent accurately completes 100% of Level 1 tasks, 80% of Level 2 tasks, and 50% of Level 3 tasks. Additionally, 94% of the interactions generated by AUITestAgent align with the ground truth obtained through manual interaction. These metrics demonstrate that AUITestAgent significantly outperforms existing methods in translating natural language commands into GUI interactions. Moreover, AUITestAgent achieves a recall of 90% for injected GUI functional bugs while maintaining a low false positive rate of just 4.5%. Furthermore, its success in detecting previously unseen bugs in Meituan underscores the practical advantages of using AUITestAgent for GUI testing in complex commercial apps.
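For readers less familiar with the two verification metrics, the sketch below computes them from confusion counts, assuming the standard definitions (recall = TP / (TP + FN), false positive rate = FP / (FP + TN)); the paper may define them slightly differently. The counts in the usage lines are made up for illustration and are not the benchmark's data.

```python
def recall(true_pos: int, false_neg: int) -> float:
    """Share of real (injected) bugs that the verifier reported."""
    return true_pos / (true_pos + false_neg)


def false_positive_rate(false_pos: int, true_neg: int) -> float:
    """Share of bug-free cases that the verifier wrongly flagged."""
    return false_pos / (false_pos + true_neg)


# Illustrative counts only (assumption), not taken from the evaluation results.
print(recall(true_pos=9, false_neg=1))
print(false_positive_rate(false_pos=1, true_neg=19))
```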

For detailed information, please refer to our paper and evaluation results.

GUI Interaction

For detailed results, please refer to the interaction benchmark.

Baseline:

Function Verification

For detailed results, please refer to the verification benchmark.

Since AUITestAgent is the first to focus on natural language-driven GUI function verification and there are no existing studies in this field, we constructed a verification method based on multi-turn dialogue with GPT-4o to serve as a baseline.

📚 Citation

If you find this work helpful to your research, please kindly consider citing our paper.

@misc{hu2024auitestagent,
      title={AUITestAgent: Automatic Requirements Oriented GUI Function Testing}, 
      author={Yongxiang Hu and Xuan Wang and Yingchuan Wang and Yu Zhang and Shiyu Guo and Chaoyi Chen and Xin Wang and Yangfan Zhou},
      year={2024},
      eprint={2407.09018},
      archivePrefix={arXiv},
      primaryClass={cs.SE}
}

🧑 Team introduction

AUITestAgent is joint work from Prof. Zhou’s team at Fudan University and the Meituan In-Store R&D platform. We have long been dedicated to the field of AI for full-stack front-end technology. In addition to AUITestAgent, we have developed several other technological innovations, including vision-ui, Appaction and AutoConsis.
