Multi-modal learning

Much of the interesting data we encounter in real applications is multi-modal: there exist multiple types of data that reflect the same concept. How can we learn from them?

References

2024

On the opportunities and challenges of foundation models for geospatial artificial intelligence

Gengchen Mai, Weiming Huang, Jin Sun, Suhang Song, Deepak Mishra, Ninghao Liu, Song Gao, Tianming Liu, Gao Cong, Yingjie Hu, and 1 more author

ACM Transactions on Spatial Algorithms and Systems, Mar 2024

2021

Towers of Babel: Combining images, language, and 3D geometry for learning multimodal vision

Xiaoshi Wu, Hadar Averbuch-Elor, Jin Sun, and Noah Snavely

In Proceedings of the IEEE/CVF International Conference on Computer Vision, Mar 2021

2017

Generating holistic 3D scene abstractions for text-based image retrieval

Ang Li, Jin Sun, Joe Yue-Hei Ng, Ruichi Yu, Vlad I Morariu, and Larry S Davis

In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Mar 2017