Research

I am currently looking for internships in summer 2025!

I research deep learning optimization and neural network training dynamics, with a focus on better characterizing how and what AI models learn. The broader goal of this work is to find new ways to train models more efficiently based on provable guarantees, or even to preemptively identify training roadblocks before any actual model has been trained.

Modern deep learning models with billions or trillions of learnable parameters have expanded the range of complex tasks that can be learned. However, as models become more heavily overparameterized, training them becomes a herculean task, requiring immense compute resources, months of training time, and potentially hundreds of millions of dollars. My work aims to alleviate these burdens so that models can be trained cheaply without jeopardizing downstream performance.