WebAgent-90K: A Large-Scale Dataset for Fine-Tuning Agent for Automatic Web Browsing Tasks


In this work, we aim to provide the first large-scale dataset for fine-tuning large language models on web browsing agent tasks. Previous work on agents for automatic web browsing typically relies on prompting GPT-4 to process the webpage and generate actions. This paradigm is ill-suited to tasks that involve the user's private data, and it incurs the cost of API calls. Offline, open-source models that can perform web browsing tasks would enable many more applications. We address this gap by collecting over 90k high-quality and diverse interactions between GPT-4V and a web browser. The data record the agent's interaction trajectories as it uses the browser to complete tasks ranging from booking a restaurant to checking the ticket price for a flight. We curated over 8k tasks as seeds and leveraged Evol-Instruct to further diversify the tasks during data collection. Experiments on the proposed dataset demonstrate its potential for producing open-source web browsing agents at the level of GPT-4V that can automate various tasks in web browsing scenarios.

In submission to the Thirty-eighth Conference on Neural Information Processing Systems, Datasets and Benchmarks Track
Siwei Yang
Ph.D. in Computer Science

My research interests include distributed robotics, mobile computing and programmable matter.