Commits on master
- 09b6e25 hip compile speed (#2606) 2 years ago
- 19a0a83 fix used resources in metal graph (#2604) 2 years ago
- fde44ae update hip_matmul with new abstraction (#2605) 2 years ago
- 5540f6e hotfix: make_half4 2 years ago
- e8d6a6e view.reshape without symbolic (#2218) 2 years ago
- 664475f vals is an argument (#2599) 2 years ago
- fcd0b2e fix multigpu on tinybox (#2595) 2 years ago
- 61c0113 test external_multi_gpu.py (and works in CUDA) 2 years ago
- bbeba8e use default dict for external_model_benchmark (#2592) 2 years ago
- 5508173 enable test_sample for all backend (#2593) 2 years ago
- a58736f print DEBUG=2 stats after stats update (#2590) 2 years ago
- bc012f2 hotfix, disable model inference benchmark on NVIDIA 2 years ago
- 4380ccb Non fp32 math (#2264) 2 years ago
- 1ac958a update pytest marks and CI test filters (#2587) 2 years ago
- 88a5c36 fix metal graph with var_vals (#2583) 2 years ago
- f180cac wgsl renderer cleanup: use the same const render, reuse cast render logic (#2579) 2 years ago
- ab2d4d8 Fix cl import in the copy_speed test and cifar example (#2586) 2 years ago
- 3226b3d enable the jit random test (#2580) 2 years ago
- 09c9794 clean external_test_opt.py (#2578) 2 years ago
- 171543f cleanups to save lines and files (#2577) 2 years ago
- a9a7663 that's not needed (#2574) 2 years ago
- 875c34b minor lazy tweak before rewrite (#2573) 2 years ago
- fa1d4dd implement MAX in other dtypes (#2572) 2 years ago
- 065495e save a few lines in ops_gpu (#2564) 2 years ago
- 5e87083 Whisper + LLAMA + VITS (#2332) 2 years ago
- 47cec4c int operations shouldn't have a fast math flag (#2571) 2 years ago
- d6b404a No dtype alloc (#2570) 2 years ago
- c877471 lazy cleanup (#2567) 2 years ago
- 5068e99 refactor to remove extra kernel params (#2563) 2 years ago
- 27481b9 Switch ops_gpu -> gpuctypes (#2532) 2 years ago