Project Summary
This project offers a deep dive into training, fine-tuning, and evaluating large language models (LLMs) of various sizes, focusing on their capabilities, limitations, and the emergent behaviors predicted by scaling laws. Students will learn parallelism techniques for efficient multi-GPU training, develop distributed training code in Python with PyTorch, and gain hands-on experience profiling and optimizing performance. The project also explores evaluation tools and methods for understanding LLM behavior, and it includes applications such as protein design using genomic data with models like MProt-DPO and GenSLMs. Additionally, students will get an overview of vision transformers (ViTs) and examine how they are applied to real-world challenges such as short- and medium-term weather forecasting.
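As one illustration of the distributed training component, the sketch below shows a minimal data-parallel training loop using PyTorch's DistributedDataParallel (DDP). The toy model, synthetic dataset, hyperparameters, and the `ddp_example.py` launch command are illustrative assumptions, not part of the project materials.

```python
# Minimal DDP sketch, assuming a single node launched with:
#   torchrun --nproc_per_node=<num_gpus> ddp_example.py
# torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy data and model; a real project would use a transformer and a tokenized corpus.
    dataset = TensorDataset(torch.randn(1024, 32), torch.randint(0, 2, (1024,)))
    sampler = DistributedSampler(dataset)  # shards the data across ranks
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2)).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # gradients are all-reduced across ranks

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()  # DDP synchronizes gradients here
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

The same loop structure carries over to other parallelism strategies (tensor, pipeline, or fully sharded data parallelism); what changes is how the model and optimizer state are partitioned across devices.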
Learning Objectives
- Understand foundational concepts of neural network architectures and training
- Describe the architecture and core components of transformers
- Compare different distributed training strategies
- Apply LLMs to real-world scientific problems
