The ACM Web Conference (WWW) in Singapore (2024)
Singapore
Published in Proceedings of the 18th European Conference on Computer Vision (ECCV), 2024
This work introduces a new training method that enhances general-purpose vision-language understanding and image-oriented question answering through visual self-questioning.
Recommended citation: Sun, G., Qin, C., Wang, J., Chen, Z., Xu, R., & Tao, Z. (2024). SQ-LLaVA: Self-Questioning for Large Vision-Language Assistant. ECCV
Published in Empirical Methods in Natural Language Processing (EMNLP), 2024
This work introduces a novel self-training approach that improves the data efficiency of training large vision-language models (LVLMs) for medical tasks.
Recommended citation: Sun, G., Qin, C., Fu, H., Wang, L., & Tao, Z. (2024). STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical. EMNLP
Published in Proceedings of the ACM on Web Conference (WWW), 2024
This work efficiently improves the robustness and performance of pre-trained vision-language networks on in-distribution (ID) and out-of-distribution (OOD) cases in image-text retrieval tasks via evidential learning.
Recommended citation: Guohao Sun, Yue Bai, Xueying Yang, Yi Fang, Yun Fu, and Zhiqiang Tao. 2024. Aligning Out-of-Distribution Web Images and Caption Semantics via Evidential Learning. WWW.
Published in Proceedings of the 41st International Conference on Machine Learning (ICML), 2024
This work refines feature representations through prototype-feature association.
Recommended citation: Han, C., Lu, Y., Sun, G., Liang, J., Cao, Z., Wang, Q., Guan, Q., Dianat, S.A., Rao, R.M., Geng, T., Tao, Z., & Liu, D. (2024). Prototypical Transformer as Unified Motion Learners. ICML
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
We introduced T-MASS, where text is modeled as a stochastic embedding, facilitating joint learning of the text mass and video points.
Recommended citation: Wang, J., Sun, G., Wang, P., Liu, D., Dianat, S.A., Rabbani, M., Rao, R.M., & Tao, Z. (2024). Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval. CVPR