Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval
Published in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024
We introduce T-MASS, which models text as a stochastic embedding (a "text mass") rather than a single point, facilitating joint learning of the text mass and video points.
Recommended citation: Wang, J., Sun, G., Wang, P., Liu, D., Dianat, S.A., Rabbani, M., Rao, R.M., & Tao, Z. (2024). Text Is MASS: Modeling as Stochastic Embedding for Text-Video Retrieval. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).