        in
       /  \
    W(x)  V(x)
     |      |
     |    ReLU²
      \    /
        ⊗
        |
       out
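A minimal sketch of the gated unit in the diagram above, using NumPy: the input is projected through two parallel matrices, the gate branch is squashed by a squared ReLU, and the branches are combined elementwise. The names `gated_relu2_unit`, `W`, and `V`, and the dimensions, are illustrative assumptions, not a specific published implementation.

```python
import numpy as np

def gated_relu2_unit(x, W, V):
    # out = W(x) ⊗ ReLU²(V(x)): the W branch passes through
    # unchanged, the V branch is gated by a squared ReLU,
    # and the two are multiplied elementwise.
    gate = np.maximum(x @ V, 0.0) ** 2
    return (x @ W) * gate

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))    # batch of 4, model dim 8 (illustrative)
W = rng.standard_normal((8, 16))   # value projection
V = rng.standard_normal((8, 16))   # gate projection
y = gated_relu2_unit(x, W, V)      # shape (4, 16)
```

Wherever the gate branch is non-positive, the squared ReLU zeroes the output entirely, which is what makes the ⊗ node act as a gate.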


researcher at zyphra
currently working on novel hardware-aware pretraining architectures for low-latency inference at scale, diffusion language models, sample-efficient context extension, and spectral-clipping optimizers
-Rishi

Feel free to email me, I am almost always interested in meeting new people.

recent work

Training Foundation Models on a Full-Stack AMD Platform


Compressed Convolutional Attention
'_'