Commits on master
- 1c51d58 replace raise Exception with specific errors (#3874) 2 years ago
- 8ef5490 cuda transfer + async copyin (#3873) 2 years ago
- 624bc89 PTX - implement float 4, ptr arithmetics and other speed improvements (#3775) 2 years ago
- f405543 don't include hip common (#3851) 2 years ago
- 4a27ce6 tiny version of amd_hip_bfloat16 (#3868) 2 years ago
- 82ce60e use JIT_BATCH_SIZE=4 for GPT2 3090 benchmark (#3870) 2 years ago
- fe6ceff proposal: multioutput JIT spec (#3856) 2 years ago
- a26090d search: change to use "spawn" and limit the number of tasks per child (#3862) 2 years ago
- dca69df hot fix use DEBUG >= 3 for allreduce message (#3869) 2 years ago
- 6729f20 Ring allreduce try 2 (#3852) 2 years ago
- 3c0478b fuzz_linearizer: add additional DEBUG info for comparison errors (#3866) 2 years ago
- bc48272 lower hlb_cifar acc to 93.3 (#3865) 2 years ago
- e50b7ab diversed buf inputs based on dtype in fuzz_linearizer (#3863) 2 years ago
- c40f784 reuse fuzz_linearizer.compare_linearizer in test_linearizer_failures (#3861) 2 years ago
- 30fa032 reuse fuzz_linearizer.compare_linearizer in test_linearizer_failures (#3861) 2 years ago
- 33dd99a remove helper_add_store from test_linearizer_failures (#3860) 2 years ago
- 6bf0b82 alloc new output in fuzz_linearizer between baseline and real one (#3859) 2 years ago
- b78352b do not create structs every call in CUDAProgram (#3855) 2 years ago
- e5745c1 fix nan on multigpus cuda (#3854) 2 years ago
- 4e0819e fixing the benchmark not printing in handcode resnet50 opt example (#3850) 2 years ago
- 85691c8 fix hsa sync issue (#3847) 2 years ago
- f271cd6 use _resolve_dim in argmax (#3846) 2 years ago
- 5c4cf62 fix View.pad arg type (#3845) 2 years ago
- 6d5dec2 log optimized kernels and a script to compare with non-optimized ones (#3829) 2 years ago
- 9d1d08f show llama bandwidth with timing (#3844) 2 years ago
- 7ff47e4 cifar TARGET_EVAL_ACC_PCT=93.5 (#3843) 2 years ago
- 92c5067 conceptual small refactor (#3842) 2 years ago
- 519336c factor out partial in SumNode div int (#3841) 2 years ago
- 8cb5215 Revert "Ring allreduce in multitensor (#3000)" (#3840) 2 years ago
- c5bf9e4 Ring allreduce in multitensor (#3000) 2 years ago