CSCI 8000: New and Hot Topics in Computer Vision and Deep Learning

Spring 2023

Instructor: Prof. Jin Sun

4 Credit Hours

Catalog Description: Students will learn about the newest developments and understanding of algorithms, systems, and best practices in computer vision and deep learning research and engineering. The class will focus on new and hot topics, including the family of vision and language transformers, diffusion models, neural scene representation and rendering, massive-scale supervised and unsupervised learning, and neural network foundations. Students will read research papers and code implementations as exercises, present and lead discussions, and work on projects closely related to those topics. The course is designed to give students a state-of-the-art perspective on current computer vision and deep learning research, with the goal of inspiring future impactful research on these hot topics.

Prerequisites: Students should have knowledge of computer vision and deep learning basics.

Class Location and Times:
Tue & Thu 9:35 am - 10:50 am, Miller Plant Science Building, Room 2102
Wed 10:20 am - 11:10 am, Miller Plant Science Building, Room 1102

Reading Materials:

Student Outcomes:
  1. Demonstrate understanding of computer vision and deep neural network fundamentals.
  2. Gain experience deploying deep learning models on computer vision and NLP problems.
  3. Enhance research skills, including reading papers, performing literature reviews, and analyzing cutting-edge research.

Instructor Contact:

Prof. Jin Sun
Office Hours: Thursdays 11 am - 12 pm, or by appointment
Office: 804 Boyd

Evaluation and Grading: The final course grade will be weighted as follows:
Paper presentations 40%
Paper readings 20%
Course project 40%

Paper presentations: For each required reading, around two students will present the paper (about 45 minutes) and lead the class discussion (about 20 minutes). Since we will cover a wide range of topics and problems, good coverage of each paper's background and context is very useful. Essential components of a high-quality presentation are:
  1. Background: What problem does this paper address? Why is it important?
  2. Related work: Before this work, how did others approach the problem?
  3. Motivation: What led the author(s) to propose this work?
  4. Method: Describe the proposed algorithm and/or workflow.
  5. Evaluation: How is the method evaluated? What are the results?
  6. Summary and future directions: What is the main takeaway message? What follow-up work could be done?

Each student can expect to present 1-2 times over the semester.

Paper readings: Every week, you will pick one of the required papers and write a short reading summary. This exercise builds your skills in reading the literature and thinking critically. The summary should include:
  1. Main message: What does the paper propose? Describe the main points in two or three sentences.
  2. Pros: What are the strengths of this paper? 1-3 bullet points are fine.
  3. Cons: What are the weaknesses of this paper? 1-3 bullet points are fine.
  4. Future directions: Discuss possible follow-up work from this paper. Two or three sentences are fine.
You don't need to write a summary for the week you are presenting a paper.

Team Project: You will work in a team of 2-3 members on a course project. You are encouraged to design a project that solves a real-world application using deep learning and computer vision. Feel free to use any programming language or software packages of your choice. The project schedule is as follows:
  1. Project Proposal: The proposal should clearly state what your team plans to do. It should be four pages long (not including references) and contain a timeline. List the questions the project will address and that the report will discuss, as well as the software you will use or build upon. Describe the datasets you will use and how you will know whether the project is successful. Describe the hypotheses you will test and the related work. You should be able to reuse much of this text in the final report.

  2. Project Milestone: You can reuse the project proposal for this report, but expand it with additional content. Discuss preliminary results and/or other measurable items listed in the proposal.

  3. Project Report and Presentation: The final report is a complete description of the project: what you have done and what the results look like. It should be six to eight pages long (not including references). You are encouraged to format it in CVPR format. We will have a presentation session for all projects on the last day of class. Make sure every member of your team participates in the presentation.

Class Schedule

Date Topic Required Readings Presenter(s) Background and Additional Readings
Jan 10 (Tue) Introduction and Background Prof. Sun
Jan 11 (Wed) Deep Learning Review Prof. Sun
Jan 12 (Thu) Discussion
Jan 17 (Tue) Attention and Transformers Attention Is All You Need (Transformer) Pradeep Kumar Ragu Chanthar, Srinivasa Sai Deepak Varukol, [slides] Illustrated Transformer
Attention? Attention!
Why multi-head self attention works?
Jan 18 (Wed) Discussion Deep learning development basics, Attention and transformer playground
Jan 19 (Thu) Vision Transformer An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT) Sixiang Zhang, Spencer King, [slides] Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Jan 24 (Tue) Transformers and Foundation Models Language Models are Few-Shot Learners (GPT-3) Vaishnavi Thesma, Akhila Devabhaktuni, Zihao Wu, [slides]
Jan 25 (Wed) Discussion Vision Transformer playground d2l, uva
Jan 26 (Thu) Transformers and Foundation Models Finetuned language models are zero-shot learners Hemanth Reddy Jakkannapally, Wen Zhang, [slides] PaLM: Scaling Language Modeling with Pathways
Transformers learn in-context by gradient descent
Jan 31 (Tue) Transformers and Foundation Models Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks Yuchen Zhang, Kriti Ghosh, [slides]
Feb 1 (Wed) Discussion Data processing
Feb 2 (Thu) Transformers and Foundation Models A ConvNet for the 2020s Jashwanthreddy Katamreddy, Chenqian Xu, [slides]
Feb 7 (Tue) Image Generation Denoising Diffusion Probabilistic Models Xuansheng Wu, Daniel Redder, [slides]
Feb 8 (Wed) Discussion Diffusion model playground
Feb 9 (Thu) Image Generation High-Resolution Image Synthesis with Latent Diffusion Models (Stable Diffusion) Jacobi Coleman, Dongliang Guo, [slides]
Feb 14 (Tue) Image Generation and Edits DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation Ehsan Latif, Chetan Dhamane, [slides]
Feb 15 (Wed) Discussion Stable diffusion and inversion playground
Feb 16 (Thu) Image Generation and Edits An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion Venkatesh Morpoju, Padmaja Saraf, [slides] Prompt-to-Prompt Image Editing with Cross Attention Control
Feb 21 (Tue) Understanding Neural Networks Emergent Abilities of Large Language Models Mohammed Aldosari, Rutuja Talekar, [slides]
Feb 22 (Wed) Discussion Understanding backpropagation, gradient flows, and the optimization process
Feb 23 (Thu) Understanding Neural Networks What do Vision Transformers Learn? A Visual Exploration Krishna Paladugu, Keerthana Garimella, [slides]
Feb 28 (Tue) Understanding Neural Networks Can Neural Nets Learn the Same Model Twice? Investigating Reproducibility and Double Descent from the Decision Boundary Perspective Maansi Reddy Jakkidi, Nasid Habib Barna, [slides]
Mar 1 (Wed) Discussion Neural network training dynamics
Mar 2 (Thu) - Midterm Understanding Neural Networks Understanding deep learning (still) requires rethinking generalization Afsaneh Shams, Subas Rana, [slides] Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation
Mar 7-9 Spring Break
Mar 14-16 Project Milestone Presentation
Mar 21 (Tue) Neural Scene Representation and Reconstruction NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis Vatsal Thakkar, Sheung Hang Sean Kan, [slides]
Mar 22 (Wed) Discussion Neural reconstruction playground
Mar 23 (Thu) Neural Scene Representation and Reconstruction Plenoxels: Radiance Fields without Neural Networks Likitha Karnati, Sakshi Seth, Ratish Jha, [slides]
Mar 28 (Tue) Training Large-Scale Neural Networks Training Compute-Optimal Large Language Models (Chinchilla) Krushi Karukala, Rezwan Mahmud, [slides]
Mar 29 (Wed) Discussion Working with GPUs
Mar 30 (Thu) Training Large-Scale Neural Networks Reinforcement Learning from Human Feedback (RLHF) Swarali Gujarathi, Shivam Yadav, [slides] Training language models to follow instructions with human feedback
Apr 4 (Tue) Training Large-Scale Neural Networks LAION-5B: An open large-scale dataset for training next generation image-text models
Scaling Language-Image Pre-training via Masking
Noyon Dey, Vijay Iyengar Scaling Vision Transformers
Apr 5 (Wed) Discussion Mixed precision, parallelism, hardware
Apr 6 (Thu) Training Large-Scale Neural Networks How to train really large models on many GPUs? Rajat Rajesh Mhetre, Yucheng Shi
Apr 11 (Tue) Self-supervised Learning Masked Autoencoders Are Scalable Vision Learners Pranavpalreddy Pingili, Zhengliang Liu
Apr 12 (Wed) Discussion Visualizing neural networks
Apr 13 (Thu) Self-supervised Learning Barlow Twins: Self-Supervised Learning via Redundancy Reduction Aishwary Nigam, Pranathi Vankineni
Apr 18 (Tue) Self-supervised Learning Emerging Properties in Self-Supervised Vision Transformers Yousef Fekri, Vaibhav Goyal
Apr 19 (Wed) Discussion Explore self-supervision signals
Apr 20 (Thu) Self-supervised Learning Bootstrap your own latent: A new approach to self-supervised Learning Hao Zhen, Shanmukha Sai Jasti
Apr 25 (Tue) Project Presentation
Apr 26 (Wed) Project Presentation
Apr 27 (Thu) Project Presentation

School of Computing | University of Georgia | 2023