CSCI 8000: New and Hot Topics in Computer Vision and Deep Learning

Spring 2023

Instructor: Prof. Jin Sun

4 Credit Hours

Catalog Description: Students will learn about the newest developments and understanding of algorithms, systems, and best practices in computer vision and deep learning research and engineering. The class will focus on new and hot topics, including the family of vision and language transformers, diffusion models, neural scene representation and rendering, massive-scale supervised and unsupervised learning, and neural network foundations. Students will read research papers and code implementations as exercises, present and lead discussions, and work on projects closely related to those topics. The course is designed to give students a state-of-the-art perspective on current computer vision and deep learning research, with the goal of inspiring future impactful research on these hot topics.

Prerequisites: Students should have knowledge of computer vision and deep learning basics.

Class Location and Times:
Tue & Thu 9:35 am - 10:50 am, Miller Plant Science Building, Room 2102
Wed 10:20 am - 11:10 am, Miller Plant Science Building, Room 1102

Reading Materials:

Student Outcomes:
  1. Demonstrate understanding of computer vision and deep neural network fundamentals.
  2. Gain experience deploying deep learning models on computer vision and NLP problems.
  3. Enhance research skills, including reading papers, performing literature reviews, and analyzing cutting-edge research.

Instructor Contact:

Prof. Jin Sun
Office Hours: Thursdays 11 am - 12 pm, or by appointment
Office: 804 Boyd

Evaluation and Grading: The final course grade will be weighted as follows:
Paper presentations 40%
Paper readings 20%
Course project 40%

Paper presentations: For each required reading, around two students will present the paper (about 45 minutes) and lead the class discussion (about 20 minutes). Since we will cover a wide range of topics and problems, good coverage of each paper's background and context is very useful. Essential components of a high-quality presentation are:
  1. Background: What problem does this paper address? Why is it important?
  2. Related work: Before this work, how did others approach the problem?
  3. Motivation: What led the author(s) to propose this work?
  4. Method: Describe the proposed algorithm and/or workflow.
  5. Evaluation: How is the method evaluated? What are the results?
  6. Summary and future directions: What is the main takeaway message? What follow-up work could be done?

Each student can expect to present 1-2 times over the semester.

Paper readings: Every week, you will pick one of the required papers and write a short reading summary. This exercise builds your skills in reading the literature and thinking critically. The summary should include:
  1. Main message: What does the paper propose? Describe the main points in two or three sentences.
  2. Pros: What are the strengths of this paper? 1-3 bullet points are fine.
  3. Cons: What are the weaknesses of this paper? 1-3 bullet points are fine.
  4. Future directions: Discuss possible follow-up work from this paper. Two or three sentences are fine.
You don't need to write a summary for the week you are presenting a paper.

Team Project: You will work in a team of 2-3 members on a course project. You are encouraged to design a project that solves a real-world application using deep learning and computer vision. Feel free to use any programming language or software packages of your choice. The project schedule is as follows:
  1. Project Proposal: The proposal should clearly state what your team plans to do. It should be four pages long (not including references) and contain a timeline. List the questions the project will address and that the report will discuss, as well as the software you will use or build upon. Describe the datasets you will use and how you will know whether the project is successful. Describe the hypotheses you will test and the related work. You should be able to reuse much of this text in the final report.

  2. Project Milestone: You can reuse the project proposal for this report, but expand it with additional content. Discuss preliminary results and/or other measurable items listed in the proposal.

  3. Project Report and Presentation: The final report is a complete description of the project: what you have done and what the results look like. It should be six to eight pages long (not including references). You are encouraged to format it in CVPR format. We will have a presentation session for all projects on the last day of class. Make sure every member of your team participates in the presentation.

Class Schedule

Date Topic Required Readings Presenter(s) Background and Additional Readings
Jan 10 (Tue) Introduction and Background Prof. Sun
Jan 11 (Wed) Deep Learning Review Prof. Sun
Jan 12 (Thu) Discussion
Jan 17 (Tue) Attention and Transformers Attention Is All You Need (Transformer) Pradeep Kumar Ragu Chanthar, Srinivasa Sai Deepak Varukol, [slides] Illustrated Transformer
Attention? Attention!
Why multi-head self attention works?
Jan 18 (Wed) Discussion Deep learning development basics, Attention and transformer playground
Jan 19 (Thu) Vision Transformer An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT) Sixiang Zhang, Spencer King, [slides] Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Jan 24 (Tue) Transformers and Foundation Models Language Models are Few-Shot Learners (GPT-3) Vaishnavi Thesma, Akhila Devabhaktuni, Zihao Wu, [slides]
Jan 25 (Wed) Discussion Vision Transformer playground d2l, uva
Jan 26 (Thu) Transformers and Foundation Models Finetuned language models are zero-shot learners Hemanth Reddy Jakkannapally, Wen Zhang, [slides] PaLM: Scaling Language Modeling with Pathways
Transformers learn in-context by gradient descent
Jan 31 (Tue) Transformers and Foundation Models Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks Yuchen Zhang, Kriti Ghosh, [slides]
Feb 1 (Wed) Discussion Data processing
Feb 2 (Thu) Transformers and Foundation Models A ConvNet for the 2020s Jashwanthreddy Katamreddy, Chenqian Xu, [slides]
Feb 7 (Tue) Image Generation Denoising Diffusion Probabilistic Models Xuansheng Wu, Daniel Redder, [slides]
Feb 8 (Wed) Discussion Diffusion model playground
Feb 9 (Thu) Image Generation High-Resolution Image Synthesis with Latent Diffusion Models (Stable Diffusion) Jacobi Coleman, Dongliang Guo, [slides]
Feb 14 (Tue) Image Generation and Edits DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation Ehsan Latif, Chetan Dhamane, [slides]
Feb 15 (Wed) Discussion Stable diffusion and inversion playground
Feb 16 (Thu) Image Generation and Edits An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion Venkatesh Morpoju, Padmaja Saraf, [slides] Prompt-to-Prompt Image Editing with Cross Attention Control
Feb 21 (Tue) Understanding Neural Networks Emergent Abilities of Large Language Models Mohammed Aldosari, Rutuja Talekar, [slides]
Feb 22 (Wed) Discussion Understanding backpropagation, gradient flows, and the optimization process
Feb 23 (Thu) Understanding Neural Networks What do Vision Transformers Learn? A Visual Exploration Krishna Paladugu, Keerthana Garimella, [slides]
Feb 28 (Tue) Understanding Neural Networks Can Neural Nets Learn the Same Model Twice? Investigating Reproducibility and Double Descent from the Decision Boundary Perspective Maansi Reddy Jakkidi, Nasid Habib Barna, [slides]
Mar 1 (Wed) Discussion Neural network training dynamics
Mar 2 (Thu) - Midterm Understanding Neural Networks Understanding deep learning (still) requires rethinking generalization Afsaneh Shams, Subas Rana, [slides] Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation
Mar 7-9 Spring Break
Mar 14-16 Project Milestone Presentation
Mar 21 (Tue) Neural Scene Representation and Reconstruction NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis Vatsal Thakkar, Sheung Hang Sean Kan, [slides]
Mar 22 (Wed) Discussion Neural reconstruction playground
Mar 23 (Thu) Neural Scene Representation and Reconstruction Plenoxels: Radiance Fields without Neural Networks Likitha Karnati, Sakshi Seth, Ratish Jha, [slides]
Mar 28 (Tue) Training Large-Scale Neural Networks Training Compute-Optimal Large Language Models (Chinchilla) Krushi Karukala, Rezwan Mahmud, [slides]
Mar 29 (Wed) Discussion Working with GPUs
Mar 30 (Thu) Training Large-Scale Neural Networks Reinforcement Learning from Human Feedback (RLHF) Swarali Gujarathi, Shivam Yadav, [slides] Training language models to follow instructions with human feedback
Apr 4 (Tue) Training Large-Scale Neural Networks LAION-5B: An open large-scale dataset for training next generation image-text models
Scaling Language-Image Pre-training via Masking
Noyon Dey, Vijay Iyengar Scaling Vision Transformers
Apr 5 (Wed) Discussion Mixed precision, parallelism, hardware
Apr 6 (Thu) Training Large-Scale Neural Networks How to train really large models on many GPUs? Rajat Rajesh Mhetre, Yucheng Shi
Apr 11 (Tue) Self-supervised Learning Masked Autoencoders Are Scalable Vision Learners Pranavpalreddy Pingili, Zhengliang Liu
Apr 12 (Wed) Discussion Visualizing neural networks
Apr 13 (Thu) Self-supervised Learning Barlow Twins: Self-Supervised Learning via Redundancy Reduction Aishwary Nigam, Pranathi Vankineni
Apr 18 (Tue) Self-supervised Learning Emerging Properties in Self-Supervised Vision Transformers Yousef Fekri, Vaibhav Goyal
Apr 19 (Wed) Discussion Explore self-supervision signals
Apr 20 (Thu) Self-supervised Learning Bootstrap your own latent: A new approach to self-supervised Learning Hao Zhen, Shanmukha Sai Jasti
Apr 25 (Tue) Project Presentation
Apr 26 (Wed) Project Presentation
Apr 27 (Thu) Project Presentation

School of Computing | University of Georgia | 2023