Multi-modal learning

Much of the interesting data we encounter in real applications is multi-modal: multiple types of data reflect the same concept. How can we learn from them?

References

2024

  1. On the opportunities and challenges of foundation models for geospatial artificial intelligence
    Gengchen Mai, Weiming Huang, Jin Sun, Suhang Song, Deepak Mishra, Ninghao Liu, Song Gao, Tianming Liu, Gao Cong, Yingjie Hu, and 1 more author
    ACM Transactions on Spatial Algorithms and Systems, Mar 2024

2021

  1. Towers of Babel: Combining images, language, and 3D geometry for learning multimodal vision
    Xiaoshi Wu, Hadar Averbuch-Elor, Jin Sun, and Noah Snavely
    In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021

2017

  1. Generating holistic 3D scene abstractions for text-based image retrieval
    Ang Li, Jin Sun, Joe Yue-Hei Ng, Ruichi Yu, Vlad I Morariu, and Larry S Davis
    In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017