Name: | Dave Zhenyu Chen |
---|---|
Position: | Ph.D. Candidate |
E-Mail: | zhenyu.chen@tum.de |
Phone: | +49-89-289-17501 |
Room No: | 02.07.037 |
I'm a Ph.D. candidate at the TUM Visual Computing Group, advised by Prof. Dr. Matthias Nießner and Prof. Dr. Angel X. Chang. Previously, I received my Master's degree in Informatics from Ludwig Maximilian University of Munich (LMU). Before that, I received my Bachelor's degree in Computer Science from the University of Electronic Science and Technology of China (UESTC).
Homepage
UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding |
---|
Dave Zhenyu Chen, Ronghang Hu, Xinlei Chen, Matthias Nießner, Angel X. Chang |
arXiv |
We propose UniT3D, a simple yet effective, fully unified transformer-based architecture for jointly solving 3D visual grounding and dense captioning. UniT3D learns a strong multimodal representation across the two tasks through a supervised joint pre-training scheme with bidirectional and sequence-to-sequence objectives. |
[bibtex][project page] |
D3Net: A Speaker-Listener Architecture for Semi-supervised Dense Captioning and Visual Grounding in RGB-D Scans |
---|
Dave Zhenyu Chen, Qirui Wu, Matthias Nießner, Angel X. Chang |
ECCV 2022 |
We present D3Net, an end-to-end neural speaker-listener architecture that can detect, describe, and discriminate. D3Net unifies dense captioning and visual grounding in 3D in a self-critical manner. This self-critical property also introduces discriminability into object caption generation and enables semi-supervised training on ScanNet data with partially annotated descriptions. |
[video][bibtex][project page] |
Scan2Cap: Context-aware Dense Captioning in RGB-D Scans |
---|
Dave Zhenyu Chen, Ali Gholami, Matthias Nießner, Angel X. Chang |
CVPR 2021 |
We introduce the new task of dense captioning in RGB-D scans with a model that can densely localize objects in a 3D scene and describe them using natural language in a single forward pass. |
[video][code][bibtex][project page] |
ScanRefer: 3D Object Localization in RGB-D Scans using Natural Language |
---|
Dave Zhenyu Chen, Angel X. Chang, Matthias Nießner |
ECCV 2020 |
We propose ScanRefer, a method that learns a fused descriptor from 3D object proposals and encoded sentence embeddings, to address the newly introduced task of 3D object localization in RGB-D scans using natural language descriptions. Along with the method we release a large-scale dataset of 51,583 descriptions of 11,046 objects from 800 ScanNet scenes. |
[video][code][bibtex][project page] |