Publications

WebAgent-90K: A Large-Scale Dataset for Fine-Tuning Agent for Automatic Web Browsing Tasks

In this paper, we present WebAgent-90K, a web-interaction dataset of around 90K tasks collected via Evol-Instruct and an automated web agent based on GPT-4V. The dataset can be used to train an automated web agent on an open-source VLM: the LLaVA-v1.5 model we fine-tuned on WebAgent-90K performed similarly to GPT-4V on WebVoyager.

Bingchen Zhao, Siwei Yang, Cihang Xie

HQ-Edit: A High-Quality Dataset for Instruction-based Image Editing

This study introduces HQ-Edit, a high-quality instruction-based image editing dataset with around 200K edits. Unlike prior approaches relying on human feedback, we devise a scalable data collection pipeline leveraging self-instruct with advanced foundation models, namely GPT-4V and DALL-E 3.

Mude Hui, Siwei Yang, Bingchen Zhao, Yichun Shi, Heng Wang, Peng Wang, Yuyin Zhou, Cihang Xie

AQA-Bench: An Interactive Benchmark for Evaluating LLMs’ Sequential Reasoning Ability in Algorithmic Environments

This paper presents AQA-Bench, a benchmark for evaluating LLMs’ sequential reasoning abilities in interactive environments that require models to execute algorithms such as binary search, depth-first search (DFS), and breadth-first search (BFS). Our findings include (1) an inverse scaling between model size and performance, (2) the nuanced impact of naive in-context examples, caused by over-fitting in in-context learning (ICL), (3) weak models failing mainly because they cannot start well, and (4) substantial improvement when models are given a few predecessor steps that follow the optimal policy.

Siwei Yang, Bingchen Zhao, Cihang Xie
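
To make the interactive setting concrete, here is a minimal Python sketch of a binary-search environment and the optimal policy a model's moves could be compared against. The class `GuessNumberEnv`, its `step` method, and the "higher"/"lower"/"correct" feedback strings are hypothetical stand-ins chosen for illustration; they are not AQA-Bench's actual interface.

```python
from dataclasses import dataclass, field


@dataclass
class GuessNumberEnv:
    """Hypothetical interactive binary-search environment (illustration only).

    A hidden integer lies in [low, high]; each guess is answered with
    "higher", "lower", or "correct". Every guess is recorded in `history`.
    """
    low: int
    high: int
    target: int
    history: list = field(default_factory=list)

    def step(self, guess: int) -> str:
        self.history.append(guess)
        if guess < self.target:
            return "higher"
        if guess > self.target:
            return "lower"
        return "correct"


def optimal_policy(env: GuessNumberEnv) -> int:
    """Binary search: always query the midpoint of the remaining interval.

    Returns the number of queries used to find the target.
    """
    lo, hi = env.low, env.high
    while lo <= hi:
        mid = (lo + hi) // 2
        feedback = env.step(mid)
        if feedback == "correct":
            return len(env.history)
        if feedback == "higher":
            lo = mid + 1
        else:
            hi = mid - 1
    raise ValueError("target outside the search interval")


if __name__ == "__main__":
    env = GuessNumberEnv(low=0, high=1023, target=700)
    steps = optimal_policy(env)
    # For 1024 candidates, binary search never needs more than 11 queries.
    print(steps, env.history)
```

Replaying the first few entries of `env.history` as context is one way to picture the "predecessor steps that follow the optimal policy" mentioned above; how the benchmark actually presents such steps to a model is not reproduced here.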

AsyInst: Asymmetric Affinity with DepthGrad and Color for Box-Supervised Instance Segmentation

Because of an optimization problem in the previous symmetric pairwise affinity loss, that loss is compatible only with color affinity and not with affinities from other modalities. Our method alleviates this issue by introducing asymmetry, which not only makes the loss compatible with depth-gradient affinity but also improves performance with color affinity.

Siwei Yang, Longlong Jing, Junfei Xiao, Hang Zhao, Alan Yuille, Yingwei Li
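
The sketch below illustrates what a symmetric pairwise affinity loss looks like, assuming a common box-supervised formulation: neighboring pixels with high color affinity are encouraged to share a mask label, with P(same) = m_i m_j + (1 - m_i)(1 - m_j). Because this probability is unchanged when i and j are swapped, the term is symmetric. The function names, the exp(-distance / theta) affinity, and the defaults tau=0.3 and theta=2.0 are assumptions for illustration; AsyInst's asymmetric loss and its depth-gradient affinity are not reproduced here.

```python
import numpy as np


def color_affinity(image, theta=2.0):
    """Affinity between each pixel and its right / bottom neighbor.

    image: (H, W, C) float array, in whatever color space it is given.
    Returns arrays of shape (H, W-1) and (H-1, W) with values in (0, 1].
    theta is an assumed bandwidth.
    """
    right = np.exp(-np.linalg.norm(image[:, 1:] - image[:, :-1], axis=-1) / theta)
    down = np.exp(-np.linalg.norm(image[1:, :] - image[:-1, :], axis=-1) / theta)
    return right, down


def symmetric_pairwise_loss(mask, image, tau=0.3, theta=2.0, eps=1e-6):
    """Symmetric pairwise affinity loss over 4-connected neighbor pairs.

    mask: (H, W) predicted foreground probabilities in [0, 1].
    For each pair (i, j) whose color affinity is at least tau, the loss is
    -log P(same label), with P = m_i * m_j + (1 - m_i) * (1 - m_j).
    Swapping i and j leaves every term unchanged.
    """
    aff_r, aff_d = color_affinity(image, theta)
    # Probability that horizontal / vertical neighbors share a label.
    p_r = mask[:, 1:] * mask[:, :-1] + (1 - mask[:, 1:]) * (1 - mask[:, :-1])
    p_d = mask[1:, :] * mask[:-1, :] + (1 - mask[1:, :]) * (1 - mask[:-1, :])
    # Only pairs with high color affinity contribute.
    w_r = (aff_r >= tau).astype(float)
    w_d = (aff_d >= tau).astype(float)
    total = -(w_r * np.log(p_r + eps)).sum() - (w_d * np.log(p_d + eps)).sum()
    count = w_r.sum() + w_d.sum()
    return total / max(count, 1.0)
```

On a uniform image every pair is active, so a confident mask (all ones) gives a loss near zero, while an uncertain mask (all 0.5) gives roughly log 2 per pair.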

Contrastive Multi-Task Dense Prediction

We discover that in a multi-task model, task-specific features follow a cross-task contrastive distribution; for example, pixels with the same semantic label have similar features for depth estimation. We therefore devise a regularization method that improves multi-task performance by reinforcing this distribution.

Siwei Yang, Hangrong Ye, Dan Xu
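
A minimal NumPy sketch of one way to encourage such a cross-task contrastive distribution: a supervised contrastive loss applied to pixel features from one task head (say depth), with positives defined by another task's labels (semantic classes). The function name, the cosine-similarity logits, and the temperature of 0.1 are assumptions; the paper's actual regularizer may be formulated differently.

```python
import numpy as np


def cross_task_contrastive_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss on one task's features, using another
    task's labels to decide which pixels are positives.

    features: (N, D) sampled pixel features, e.g. from the depth head.
    labels:   (N,) semantic labels of the same pixels.
    Assumes at least one pair of pixels shares a label.
    """
    # Cosine similarity between all pairs of L2-normalized features.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    logits = z @ z.T / temperature
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, logits)  # never contrast a pixel with itself

    # Row-wise log-softmax over all other pixels (numerically stable).
    row_max = logits.max(axis=1, keepdims=True)
    log_prob = logits - row_max - np.log(np.exp(logits - row_max).sum(axis=1, keepdims=True))

    # Positives: other pixels with the same semantic label.
    positives = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = positives.sum(axis=1)
    valid = pos_count > 0
    mean_log_prob_pos = np.where(positives, log_prob, 0.0).sum(axis=1)[valid] / pos_count[valid]
    return -mean_log_prob_pos.mean()
```

The loss is low when depth features of same-class pixels already cluster together and high when they do not, so adding it to the multi-task objective pushes features toward the distribution described above.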

XCon: Learning with Experts for Fine-grained Category Discovery

Category discovery within a fine-grained dataset is challenging. We present a method that learns to discover categories by partitioning the dataset into k sub-groups, and it shows improved performance on several fine-grained datasets.

Yixin Fei, Zhongkai Zhao, Siwei Yang, Bingchen Zhao
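
The partitioning step can be sketched as follows. This assumes, rather than describes XCon's exact procedure, that the k sub-groups are formed by running k-means (Lloyd's algorithm) on image embeddings, and that a contrastive objective then draws its negatives from within the same sub-group so the model has to focus on fine-grained differences. Both helper functions are hypothetical.

```python
import numpy as np


def kmeans_partition(features, k, iters=50, seed=0):
    """Split a dataset into k sub-groups with plain k-means (Lloyd's algorithm).

    features: (N, D) embeddings of the images. Returns (N,) group ids.
    """
    rng = np.random.default_rng(seed)
    # Initialize centers with k distinct samples.
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        # Assign each sample to its nearest center (squared Euclidean distance).
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        assign = dists.argmin(axis=1)
        # Move each center to the mean of its members; keep it if the group is empty.
        new_centers = np.array([
            features[assign == c].mean(axis=0) if np.any(assign == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign


def same_group_negatives(group_ids, index):
    """Indices of candidate negatives for sample `index`: other members of its sub-group."""
    members = np.flatnonzero(group_ids == group_ids[index])
    return members[members != index]
```

Restricting negatives to one sub-group means they tend to be visually similar images, which is the kind of hard contrast a fine-grained setting calls for.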

Reducing the feature divergence of RGB and near-infrared images using Switchable Normalization

Instance normalization reduces the feature divergence between modalities, while batch normalization preserves the discriminative distribution. Segmentation models therefore achieve better performance by using both kinds of normalization.

Siwei Yang, Shaozuo Yu, Bingchen Zhao, Yin Wang
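
Switchable Normalization combines instance-, layer-, and batch-normalization statistics using learned softmax weights, which is how a single layer can draw on both kinds of normalization. The NumPy sketch below shows a training-mode forward pass under that general formulation; the argument names are illustrative, running statistics for inference are omitted, and it is not the paper's code.

```python
import numpy as np


def softmax(v):
    e = np.exp(v - np.max(v))
    return e / e.sum()


def switchable_norm(x, mean_logits, var_logits, gamma, beta, eps=1e-5):
    """Training-mode forward pass of Switchable Normalization.

    x: (N, C, H, W) activations.
    mean_logits, var_logits: length-3 arrays weighting the instance-, layer-
    and batch-norm statistics (learned parameters in practice).
    gamma, beta: (C,) affine parameters.
    """
    # Instance norm: per sample and channel, over spatial dims.
    mu_in = x.mean(axis=(2, 3), keepdims=True)
    var_in = x.var(axis=(2, 3), keepdims=True)
    # Layer norm: per sample, over channels and spatial dims.
    mu_ln = x.mean(axis=(1, 2, 3), keepdims=True)
    var_ln = x.var(axis=(1, 2, 3), keepdims=True)
    # Batch norm: per channel, over batch and spatial dims.
    mu_bn = x.mean(axis=(0, 2, 3), keepdims=True)
    var_bn = x.var(axis=(0, 2, 3), keepdims=True)

    # Softmax weights decide how much each normalizer contributes.
    w_mu = softmax(mean_logits)
    w_var = softmax(var_logits)
    mu = w_mu[0] * mu_in + w_mu[1] * mu_ln + w_mu[2] * mu_bn
    var = w_var[0] * var_in + w_var[1] * var_ln + w_var[2] * var_bn

    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma.reshape(1, -1, 1, 1) * x_hat + beta.reshape(1, -1, 1, 1)
```

Pushing the weights toward the instance-norm entry removes per-image, per-channel shifts, the effect the paragraph associates with reducing RGB/NIR divergence, while weight on the batch-norm entry keeps statistics shared across the batch.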
