Commits on master
- 002a140 Ptx store gate cast to bool (#4284) 1 year ago
- dbe3e1d or true fixes ci (#4283) 1 year ago
- 53853e6 save the schedule graph in SAVE_SCHEDULE (#4248) 1 year ago
- acb32e1 hotfix: PM4 supports timing 1 year ago
- ad28fde si.inputs+outputs -> bufs (#4279) 1 year ago
- 8401de9 resnet benchmark return early in eval (#4278) 1 year ago
- 38f97aa rename rawbufs to bufs in ExecItem (#4274) 1 year ago
- 60e3aa5 more docs (#4271) 1 year ago
- 6637ecc use IGNORE_JIT_FIRST_BEAM to not BEAM in jit cnt=0 (#4269) 1 year ago
- f3b4dff KFDProgram -> AMDProgram (#4268) 1 year ago
- 17328de setitem no return value (#4266) 1 year ago
- 3a48773 BERT dataloader (#4252) 1 year ago
- 6934114 Wikipedia preprocessing script (#4229) 1 year ago
- 759b4f4 few more KFD -> AMD (#4262) 1 year ago
- 6c25f1a Optimize ptx loops (#4263) 1 year ago
- 967638f update docs, remove corealize (#4264) 1 year ago
- 9b7efa7 hotfix: skip 0 line count files in sz.py 1 year ago
- acf4ba5 method cache respects beam option (#4261) 1 year ago
- 9a95781 renamed (#4260) 1 year ago
- 2ae4f45 WIP PM4 Support (#4110) 1 year ago
- 3f6c7ca test: fix test_tensor_core_padded on CUDA and add to benchmarks (#4258) 1 year ago
- a90de3b search: add additional 7 factors to the action space (#4256) 1 year ago
- de2b1fb update adding_new_accelerators doc (#4255) 1 year ago
- bbb0ad4 wmma: widen TC usage in search by using PADTO on TC axes when possible (#4216) 1 year ago
- 9e53d6c hotfix: 8000 lines 1 year ago
- e6227bd nv driver (#4044) 1 year ago
- 77a3780 assert reduce recompute (#4250) 1 year ago
- a9bc7c1 unify assign tests (#4247) 1 year ago
- 37f8be6 resnet print epoch ops and mem in benchmark (#4244) 1 year ago
- 7bc8627 Improves error message when CUDA module fails to load. (#4243) 1 year ago