Commits on master
- 2d0bfdf ptx cleanup (#3893) 2 years ago
- 2e39f57 move lines around in ops_python wmma (#3911) 2 years ago
- e27129a Fix linearizer failure 26 test (#3906) 2 years ago
- 10673d1 tiny search cleanup (#3910) 2 years ago
- 9a9cac5 add lars to nn (#3750) 2 years ago
- 8c8b57f cleanup ops python (#3908) 2 years ago
- 2c69888 include negative float in test_dtype (#3884) 2 years ago
- e22d78b training cifar with BF16 on CUDA (#3905) 2 years ago
- 0145366 wmma: fix the AMD TC threads to split the first 16 threads (#3904) 2 years ago
- 7c3632f add --minimal flag to nvrtc (#3899) 2 years ago
- a2b2597 replace dtype.name str with render_dtype (#3903) 2 years ago
- 24d004a hotfix check ckpts before writing achieved model (#3901) 2 years ago
- 4d566f1 touchup einsum (#3900) 2 years ago
- 556dcfb Fix the result permutation in einsum (#3895) 2 years ago
- 4e18dd7 faster program start in llvm (#3897) 2 years ago
- 46a3501 nv ioctl sniffer (#3892) 2 years ago
- 18e0cef cheap less lines in ptx (#3890) 2 years ago
- f0c4e06 fix cuda sync (#3888) 2 years ago
- 2d3ce53 touchup test_dtype.test_gradient_dtype (#3887) 2 years ago
- fc11808 initialize Tensor grad same type as self (#3613) 2 years ago
- 8db7a6b debug: add optional detailed BEAM_LOG logging (#3883) 2 years ago
- f7f67e0 simple fix llama shard with quantize (#3882) 2 years ago
- ee502c8 fixup to_movement_ops and add back to CI (#3881) 2 years ago
- 16e31f7 init multidevice cuda graph (#3858) 2 years ago
- 0c197b9 hotfix: hip bfloat formatting 2 years ago
- 54dc48a fix assign (#3878) 2 years ago
- 5587594 fuzz_linearizer: add --ast and --file params to read kernels (#3877) 2 years ago
- c5467e5 diverse test value in test_dtype DATA based on dtype (#3864) 2 years ago
- 86ee36e preschedule all (#3875) 2 years ago
- d8c3f18 Use UOpGraph in test (#3876) 2 years ago