researcher at zyphra
currently working on novel hardware-aware pretraining architectures for low-latency inference at scale, diffusion language models, sample-efficient context extension, and spectral clipping optimizers
-Rishi
Feel free to email me; I am almost always interested in meeting new people.
recent work
Training Foundation Models on a Full-Stack AMD Platform
Compressed Convolutional Attention