The Deep Learning Approach
This course focuses on modern computer vision as representation learning, emphasizing foundation models and unified architectures that work across diverse visual tasks. We prioritize understanding how visual representations are learned, why certain architectures work, and how to effectively use and adapt state-of-the-art pretrained models.
Class introduction, overview
Math, coding, ML, and DL basics
CNNs, U-Net, Vision Transformers, etc.
How to learn representations with visual signals alone?
How to learn representations with semantic labels?
Neural network approach for learning geometric representations
Diffusion models, token-based models, video generation
VQA, GUI agents
How to design one model that rules them all?
Metrics and benchmarks: how do we tell whether a model is good?
How to make sure our models are fair and ethical?
Basic knowledge of linear algebra, calculus, Python
Individual Exam (40%), Group Project (60%)
Jin Sun, PhD
School of Computing, UGA
Spring 2026
We view neural networks as the native solution to vision problems and fully embrace that approach, taking a fundamental view of how representations are learned in computer vision.