Commits on master
- 5a57b48 cuda p2p enable when available (#4153) 1 year ago
- 380f27d move sum acc_dtype into lazy so it applies to backward (#4149) 1 year ago
- bbda20c CompiledASTRunner -> CompiledRunner (#4148) 1 year ago
- 0f16709 hotfix: remove test speed vs torch 1 year ago
- c079637 refactor membufs (#4147) 1 year ago
- b7e281c JitItem -> ExecItem (#4146) 1 year ago
- e79a11b hotfix: revert llama change 1 year ago
- 2e6c39b Do less realizes (#4141) 1 year ago
- 06bcae1 PADTO SUM if parents of sum are all zero-preserving (#4140) 1 year ago
- 081dd15 hotfix: keep CUDA D2D copy behind the CUDA_P2P flag 1 year ago
- af5984d cudagraph memcpy through host (#4137) 1 year ago
- 5e6d215 Add driving monitoring model to benchmarks (#4134) 1 year ago
- bf3583f use Buffer.ensure_allocated in search _ensure_buffer_alloc (#4132) 1 year ago
- a35375d run_schedule is so simple now (#4130) 1 year ago
- 86bd2eb hotfix: update copy_from_fd for new DiskBuffer 1 year ago
- ee457a4 no more underlying diskbuffer, that's just the device (#4129) 1 year ago
- fe88591 update onnx to 1.16.0 (#4127) 1 year ago
- 6bbbeb9 skip a few clang test that took > 30 seconds in CI (#4126) 1 year ago
- 08ddeb5 create schedule has global vars (#4125) 1 year ago
- 216eb23 hotfix: cast mnist to float 1 year ago
- fea774f spend 5 lines to bring mnist into the repo (#4122) 1 year ago
- 42edae8 pickle schedules (#4114) 1 year ago
- 38ae419 Fixes for ops_kfd (#4105) 1 year ago
- 10dbf90 hotfix: test speed 1 year ago
- ae849d1 numpy device + pickle it (#4120) 1 year ago
- 1ef9c50 Update ssa input order and annotate types in cstyle and assembly (#4117) 1 year ago
- 15f2f39 conceptually simpler fancy index (#3335) 1 year ago
- 980124a add lerp operation to tensor (#4102) 1 year ago
- 46850a0 search: add a BEAM_COMPARE env to optionally not compare to hc/tc (#4107) 1 year ago
- c390828 refactor outbufs (#4112) 1 year ago