Commits on master
- 82b7b96 test for dtype set (#4069) 2 years ago
- 1a1dd1c add and enable tests for indexing const folding (#4068) 2 years ago
- ba118ab improved caching for pointer arithmetics in ptx (#3922) 2 years ago
- 68fe352 Tensor core ptx (#3894) 2 years ago
- 92378fb Ptx mulacc (#3937) 2 years ago
- 3e72d74 hotfix: make KFD timings right 2 years ago
- 58d1623 Continuing KFD work (#4065) 2 years ago
- d219aba prepend CLANG_PROGRAM_HEADER in ClangCompiler.render instead of compile (#4063) 2 years ago
- 7181ffd HWCopyQueue in KFD (#4042) 2 years ago
- e3c0ac9 remove old envvar "OPT" (#4060) 2 years ago
- 406cb5f const fold ReduceOps (#4059) 2 years ago
- fe03725 const fold cast unrealized_unpadded_const (#4047) 2 years ago
- e5a9bff Add pattern matcher tests, move uop transforms from assembly to pattern (#4056) 2 years ago
- 1ea8fcb graph schedule items (#4054) 2 years ago
- 52ee5b7 update logo (#4055) 2 years ago
- f61ed86 Use exec_alu for lazy const folding (#4039) 2 years ago
- 88dcdae search: fix counting of upcasts to ignore TC upcasts (#4045) 2 years ago
- ccf3c16 Refactor the use of pattern matcher in ptx (#4043) 2 years ago
- 85edc49 uops const fold rules to prevent tautological compare warnings (#4041) 2 years ago
- e879e16 docs: add warning message for conda users when using METAL (#3917) 2 years ago
- 0147174 Embedding in one kernel (#4036) 2 years ago
- 506b1c5 multigpu works (#4040) 2 years ago
- 05e7f93 use clang as default instead of llvm for cpu (#4035) 2 years ago
- 5311b45 re-enable has_local check for linearizer test (#4034) 2 years ago
- bec2aaf add beautiful_mnist_multigpu example 2 years ago
- 7425a0c CommandQueue is the future (#3950) 2 years ago
- 0a34d60 move exec_alu from uops to ops (#4033) 2 years ago
- 82440d3 don't call contiguous for unpadded const into multi tensor (#4032) 2 years ago
- d6ba44b kfd free buffers (#4027) 2 years ago
- 77a68fc test examples for multi tensor const folding (#4031) 2 years ago