I am a Machine Learning Research Scientist at TikTok/ByteDance, Singapore.
I received my PhD in computer science (AI/ML) from Singapore University of Technology and Design (SUTD),
advised by Prof. Ngai-Man Cheung.
My research interests lie in multimodal AI, large language models, and data- and parameter-efficient training of AI models.
During my PhD, I was fortunate to gain research experience with
Chao Du and Tianyu Pang, advised by Prof. Shuicheng Yan and Min Lin,
at Sea AI Lab.
I had the great pleasure of collaborating with Henghui Ding and Houjing Huang at TikTok/ByteDance AI Lab.
I also spent a winter at Microsoft Research Asia, exploring the capabilities of multimodal foundation models.
Large VLMs such as GPT-4 achieve unprecedented performance in response generation,
especially with visual inputs, enabling more creative and adaptable interaction than LLMs such as ChatGPT.
However, multimodal generation exacerbates safety concerns, since adversaries may successfully evade the entire system by subtly manipulating the most vulnerable modality (e.g., vision).
We evaluate the robustness of open-source large VLMs (e.g., MiniGPT-4, LLaVA, BLIP, UniDiffuser) in the most realistic and high-risk setting,
where adversaries have only black-box system access and seek to deceive the model into returning the targeted responses.
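A minimal sketch of a query-based black-box attack in this spirit, written in PyTorch. The `score_fn` callable (e.g., the similarity between the VLM's response to the perturbed image and the attacker's target text), the query budget, and the step sizes are illustrative assumptions, not the paper's released implementation.

```python
import torch

def rgf_blackbox_attack(image, score_fn, eps=8/255, steps=500,
                        queries=10, sigma=1e-3, lr=1/255):
    """Query-based black-box attack: estimate the gradient of a scalar score
    with finite differences over random directions (RGF), then take a signed
    step and project back into an L-infinity ball around the clean image."""
    adv = image.clone()
    for _ in range(steps):
        base = score_fn(adv)                      # one query on the current image
        grad_est = torch.zeros_like(adv)
        for _ in range(queries):                  # extra queries for the estimate
            u = torch.randn_like(adv)
            u = u / (u.norm() + 1e-12)
            grad_est += (score_fn(adv + sigma * u) - base) / sigma * u
        adv = adv + lr * grad_est.sign()          # ascend the estimated gradient
        adv = image + (adv - image).clamp(-eps, eps)
        adv = adv.clamp(0.0, 1.0)                 # keep a valid image
    return adv
```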
NeurIPS 2023, New Orleans, Louisiana, United States.
[Paper]
[Webpage]
[Code]
Using interpretable GAN dissection tools, we demonstrate that, for the few-shot image generation task, fine-tuning-based methods cannot effectively remove knowledge that is incompatible
with the target domain after adaptation (e.g., trees/buildings on the sea).
We propose Remove In-Compatible Knowledge (RICK), an efficient and dynamic algorithm that estimates filter importance and prunes the filters that are incompatible
with the target domain.
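As a rough illustration (not the released RICK code), a first-order importance score per convolution filter and a pruning step that zeroes the least important ones might look like the PyTorch sketch below; the probing loss, ratio, and schedule are assumptions.

```python
import torch
import torch.nn as nn

def filter_importance(conv: nn.Conv2d) -> torch.Tensor:
    """First-order (Taylor) importance per output filter, |w * dL/dw| summed
    over each filter's weights; conv.weight.grad must already be populated by
    a backward pass on the adaptation loss."""
    scores = (conv.weight * conv.weight.grad).abs()
    return scores.flatten(1).sum(dim=1)           # one score per output channel

@torch.no_grad()
def prune_incompatible_filters(conv: nn.Conv2d, prune_ratio: float = 0.1):
    """Zero out the lowest-importance filters, a stand-in for removing source
    knowledge that does not fit the target domain."""
    scores = filter_importance(conv)
    k = max(1, int(prune_ratio * scores.numel()))
    idx = scores.topk(k, largest=False).indices
    conv.weight[idx] = 0.0
```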
CVPR 2023, Vancouver, British Columbia, Canada.
[Paper]
[Webpage]
[Code]
We propose a method to improve generalizability for the cross-domain few-shot classification (FSC) problem using born-again networks.
Our algorithm requires no additional parameters or training data and can be readily applied to many existing FSC models.
The key insight is to distill the dark knowledge from a teacher model with additional multi-task objectives designed specifically for
cross-domain few-shot learning.
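A minimal sketch of the born-again idea, where teacher and student share the same architecture and the student fits both the labels and the teacher's soft predictions; the paper's additional cross-domain multi-task objectives are omitted, and the temperature and weighting are illustrative.

```python
import torch.nn.functional as F

def born_again_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """Combine the usual cross-entropy on ground-truth labels with a KL term
    toward the frozen teacher's temperature-softened predictions ("dark
    knowledge"). Teacher and student use the same architecture."""
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits.detach() / T, dim=-1),
        reduction="batchmean",
    ) * T * T
    return (1 - alpha) * ce + alpha * kd
```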
IEEE Transactions on Image Processing (TIP), 2023.
[Paper]
[Webpage]
[Code]
When fine-tuning a pretrained image generator on few-shot target samples, we show that state-of-the-art algorithms perform no better
than a simple baseline when the target domain is distant from the source domain.
We propose AdAM, a parameter-efficient and target-aware method that selects the source knowledge most important for few-shot adaptation.
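A rough sketch of the importance-probing idea, under the assumptions of a diagonal-Fisher proxy (squared gradients from a short probing phase on the few-shot target data) aggregated per kernel; the paper's actual modulation mechanism is more involved.

```python
import torch
import torch.nn as nn

def select_important_kernels(conv: nn.Conv2d, keep_ratio: float = 0.25) -> torch.Tensor:
    """Score each output kernel with a diagonal-Fisher proxy (sum of squared
    gradients from a short probing phase) and return a boolean mask of the
    kernels to keep adaptable; the rest can be frozen, keeping adaptation
    parameter-efficient."""
    fisher = conv.weight.grad.pow(2).flatten(1).sum(dim=1)   # one score per kernel
    k = max(1, int(keep_ratio * fisher.numel()))
    keep = torch.zeros_like(fisher, dtype=torch.bool)
    keep[fisher.topk(k).indices] = True
    return keep
```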
NeurIPS 2022, New Orleans, Louisiana, United States.
[Paper]
[Webpage]
[Code]
We investigate the compatibility between label smoothing (LS) and knowledge distillation (KD), i.e., to smooth or not to smooth a teacher network?
We identify, analyze, and validate systematic diffusion as the missing concept that is instrumental in understanding and resolving the contradictory findings in prior works.
This systematic diffusion essentially curtails the benefits of distilling from an LS-trained teacher, thereby rendering KD at increased temperatures ineffective.
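For context, a standard label-smoothing cross-entropy of the kind used to train the LS teacher (a textbook formulation, not code from the paper); distillation from such a teacher then uses the usual temperature-softened KL term, and the paper's finding concerns how the LS-induced diffusion interacts with higher temperatures.

```python
import torch
import torch.nn.functional as F

def label_smoothing_ce(logits, targets, eps=0.1):
    """Cross-entropy against a smoothed target distribution: the true class
    gets probability 1 - eps and the remaining eps is spread uniformly over
    the other classes (this is how the LS teacher is trained)."""
    num_classes = logits.size(-1)
    log_probs = F.log_softmax(logits, dim=-1)
    smooth = torch.full_like(log_probs, eps / (num_classes - 1))
    smooth.scatter_(-1, targets.unsqueeze(-1), 1.0 - eps)
    return -(smooth * log_probs).sum(dim=-1).mean()
```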
ICML 2022, Baltimore, Maryland, United States.
[Paper]
[Webpage]
[Code]
We analyze existing few-shot image generation algorithms in a unified testbed and find that
diversity degradation is the major issue during few-shot target adaptation.
Our proposed mutual-information-based algorithm alleviates this issue and achieves state-of-the-art performance
on few-shot image generation tasks.
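A minimal InfoNCE-style sketch of a mutual-information objective in this spirit, assuming paired features from the source and adapted generators for the same latent codes; the names and temperature are illustrative rather than the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def infonce_mi_loss(adapted_feats, source_feats, tau=0.07):
    """InfoNCE lower bound on the mutual information between source- and
    adapted-generator features computed from the same latent codes; matching
    pairs are positives, other samples in the batch are negatives. Minimizing
    this loss encourages the adapted generator to preserve source diversity."""
    a = F.normalize(adapted_feats.flatten(1), dim=1)
    s = F.normalize(source_feats.flatten(1), dim=1)
    logits = a @ s.t() / tau                                  # (B, B) similarities
    labels = torch.arange(a.size(0), device=a.device)
    return F.cross_entropy(logits, labels)
```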
CVPR 2022, New Orleans, Louisiana, United States.
[Paper]
[Webpage]
[Code]
Workshop & Challenge
Explanation-guided Training for Cross-domain Few-shot Classification
Jiamei Sun, Sebastian Lapuschkin, Wojciech Samek, Yunqing Zhao,
Ngai-Man Cheung,
Alexander Binder†