Deepak Narayanan
Merge branch 'check_nan_in_grad' into 'main'
Ongoing research on training transformer models at scale
Python
main