Computer Vision

The Deep Learning Approach

Course Overview

This course focuses on modern computer vision as representation learning, emphasizing foundation models and unified architectures that work across diverse visual tasks. We prioritize understanding how visual representations are learned, why certain architectures work, and how to effectively use and adapt state-of-the-art pretrained models.

Learning Goals

✓ Understand how modern vision representations are learned
✓ Use and adapt state-of-the-art pretrained models
✓ Understand vision across 2D, 3D, time, and generation

Course Themes

• Foundations of Neural Visual Representation
• Self-Supervised & Foundation Vision Models
• Vision + Language, 3D & Geometry, Generation

Course Schedule

0: Introduction

Class introduction, overview

1: Foundations and basics

Math, coding, ML, and DL basics

2: Neural nets

CNNs, U-Net, Vision Transformers, etc.

3: Unsupervised & Self-supervised learning

How can representations be learned from visual signals alone?

4: Supervised learning with semantics

How can representations be learned from semantic labels?

5: Supervised learning with geometry

Neural network approaches to learning geometric representations

6: Generation

Diffusion models, token-based generation, video generation

7: Visual reasoning

Visual question answering (VQA), GUI agents

8: Vision foundation models

How do we design one model to rule them all?

9: Evaluation and benchmarks

Metrics, benchmarks, and how to tell whether a model is good

10: Bias and ethics

How do we make sure our models are fair and ethical?

Course Information

📚

Prerequisites

Basic knowledge of linear algebra, calculus, and Python

📝

Grading

Individual Exam (40%), Group Project (60%)

👨‍🏫

Instructor

Jin Sun, PhD
School of Computing, UGA

📅

Semester

Spring 2026

Course Philosophy

We view neural networks as the native solution to vision problems. The course fully embraces this approach and treats computer vision fundamentally as a problem of learning representations.