Project Summary
This project offers a deep dive into training, fine-tuning, and evaluating large language models (LLMs) of various sizes, focusing on their capabilities, limitations, and the emergent behaviors predicted by scaling laws. Students will learn parallelism techniques for efficient multi-GPU training, develop distributed training code in Python with PyTorch, and gain hands-on experience profiling and optimizing performance. The project also explores evaluation tools and methods for understanding LLM behavior, and it includes applications such as protein design using genomic data with models like MProt-DPO and GenSLMs. Additionally, students will get an overview of vision transformers (ViTs) and examine how they are applied to real-world challenges such as short- and medium-term weather forecasting.
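As one illustration of the distributed training component, the sketch below shows a minimal data-parallel training loop using PyTorch's DistributedDataParallel (DDP). The toy model, synthetic dataset, hyperparameters, and the `ddp_example.py` launch command are illustrative assumptions, not part of the project materials.

```python
# Minimal DDP sketch, assuming a single node launched with:
#   torchrun --nproc_per_node=<num_gpus> ddp_example.py
# torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
import os

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, DistributedSampler, TensorDataset


def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # Toy data and model; a real project would use a transformer and a tokenized corpus.
    dataset = TensorDataset(torch.randn(1024, 32), torch.randint(0, 2, (1024,)))
    sampler = DistributedSampler(dataset)  # shards the data across ranks
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2)).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # gradients are all-reduced across ranks

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            x, y = x.cuda(local_rank), y.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()  # DDP synchronizes gradients here
            optimizer.step()

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```

The same loop structure carries over to other parallelism strategies (tensor, pipeline, or fully sharded data parallelism); what changes is how the model and optimizer state are partitioned across devices.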
Learning Objectives
- Understand foundational concepts of neural network architectures and training
- Describe the architecture and core components of transformers
- Compare different distributed training strategies
- Apply LLMs to real-world scientific problems
