Commits on master
- 76d191a move consts to end of add (#5783) 1 year ago
- 5b84a7d hotfix: ptx threads match cuda threads 1 year ago
- 460b120 apply more .alu syntactic sugar [run_process_replay] (#5782) 1 year ago
- 0392123 TC=2 still sets tensor cores (and TC=3 support for locals) (#5780) 1 year ago
- 71a64d8 UOps.MUL bound when one is negative (#5781) 1 year ago
- b775db6 high-level benchmark timing diff (#5776) 1 year ago
- 600a397 fix Tensor.arange if (stop-start) and step have different signs (#5775) 1 year ago
- d0fd84e feat: allow passing gradient to .backward() to compute vjp (#5771) 1 year ago
- e0e7293 make process replay unique in retries [run_process_replay] (#5773) 1 year ago
- ea27ec4 nv switch classlist_v2 to classlist (#5763) 1 year ago
- 73fda02 amd better comments for ENABLE_SGPR_DISPATCH_PTR (#5768) 1 year ago
- 95dda8d more unmatching vectorize/gep asserts [run_process_replay] (#5760) 1 year ago
- bfbd7c5 more generic UOp mul mod folding (#5765) 1 year ago
- 80c6475 update test_uop_symbolic to test UOp min and max (#5764) 1 year ago
- 1903542 nv/cuda compilers touchup (#5759) 1 year ago
- 3c79faa remove redundant UOps max folding [run_process_replay] (#5762) 1 year ago
- 05748e5 fix vmax of Uop.RANGE off by 1 (#5750) 1 year ago
- fff19b9 docs: user runtime docs (#5756) 1 year ago
- 5d53fa4 amd autogened kfd ioctls (#5757) 1 year ago
- ed1d784 test profiler timer sync across devs (#5751) 1 year ago
- e5fb08a simpler expand UOps acc [run_process_replay] (#5754) 1 year ago
- de66d93 PTX render vec CONST (#5729) 1 year ago
- 890e11c fix UOps.STORE folding returning NOp [run_process_replay] (#5753) 1 year ago
- 3e49d86 process replay diffs 3 things now (#5731) 1 year ago
- 57b4a8e assert process replay asserts (#5737) 1 year ago
- f8972ac test flops (and allow wide ALU in UOps) [run_process_replay] (#5749) 1 year ago
- 2fde2d2 hotfix: external_test_speed_theoretical works on 24GB 1 year ago
- b75d1e8 UOp._min_max for IDIV (#5748) 1 year ago
- 829262a add external_test_speed_theoretical 1 year ago
- 5f168e7 remove the optimization in AndNode.substitute (#5747) 1 year ago