Talk Tuesday -- ML Reproducibility: Sources of Algorithmic, Implementation, and Observational Variability

iHARP Talk Tuesday -- October 29, 2024
Talk Title
ML Reproducibility: Sources of Algorithmic, Implementation, and Observational Variability
Speaker
Kevin Coakley, Computational and Data Science Research Specialist at the San Diego Supercomputer Center and UC San Diego
Abstract
Reproducibility is fundamental to scientific research, as it underpins trust, progress, and credibility. In machine learning (ML), achieving reproducibility is difficult due to variability in algorithms, implementations, and observational factors. This presentation explores key contributors to irreproducibility in ML, including algorithmic factors like hyperparameter tuning and random weight initialization, implementation differences in software and hardware, and observational factors such as dataset bias and data preprocessing. It emphasizes the need to view ML model performance as a distribution, not a single metric or average of results, and clarifies the difference between reproducibility and portability. The goal is to guide researchers on improving ML reproducibility and identifying the critical information necessary for replicating experimental outcomes.
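As a rough illustration of the abstract's point about treating model performance as a distribution, the sketch below trains the same tiny model under several random seeds and reports the spread of test accuracy rather than a single number. It is not taken from the talk: the toy dataset, the perceptron, and the names `make_dataset`, `train_and_evaluate`, and `summarize` are all hypothetical, chosen only to keep the example self-contained.

```python
import random
import statistics


def make_dataset(n=200, seed=0):
    """Build a fixed synthetic 2-D dataset (hypothetical toy task).

    The label is 1 when x + y > 1, otherwise 0.
    """
    rng = random.Random(seed)
    points = [(rng.random(), rng.random()) for _ in range(n)]
    return [((x, y), 1 if x + y > 1.0 else 0) for x, y in points]


def predict(w, b, x, y):
    """Threshold a linear score to get a 0/1 prediction."""
    return 1 if w[0] * x + w[1] * y + b > 0 else 0


def train_and_evaluate(seed, data, epochs=5, lr=0.1):
    """Train a tiny perceptron; the seed is the only source of variation."""
    rng = random.Random(seed)
    # Random weight initialization, one of the algorithmic factors
    # the abstract mentions.
    w = [rng.uniform(-1, 1), rng.uniform(-1, 1)]
    b = rng.uniform(-1, 1)
    train, test = data[:150], data[150:]
    for _ in range(epochs):
        order = train[:]
        rng.shuffle(order)  # example ordering also depends on the seed
        for (x, y), label in order:
            err = label - predict(w, b, x, y)
            w[0] += lr * err * x
            w[1] += lr * err * y
            b += lr * err
    correct = sum(predict(w, b, x, y) == label for (x, y), label in test)
    return correct / len(test)


def summarize(scores):
    """Describe the scores as a distribution instead of a single value."""
    return {
        "n": len(scores),
        "min": min(scores),
        "max": max(scores),
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),
    }


data = make_dataset()
scores = [train_and_evaluate(seed, data) for seed in range(20)]
summary = summarize(scores)
print(summary)
```

Reporting the minimum, maximum, and standard deviation alongside the mean shows how much of a claimed result could be explained by the seed alone. Rerunning with the same seed on the same software and hardware should return the same score, which is the narrower sense of reproducibility this sketch exercises.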
Speaker Bio
Kevin Coakley is a Computational and Data Science Research Specialist at the San Diego Supercomputer Center and UC San Diego, focusing on AI reproducibility. Kevin holds an MAS in Architecture-based Enterprise Systems Engineering and Leadership from UC San Diego and is pursuing a PhD in Computer Science at the Norwegian University of Science and Technology. Kevin specializes in training and evaluating machine learning models for accuracy and reproducibility in applications like image recognition, time series prediction, and natural language processing.