{"author":"Szymon Ożóg","author_email":"58388001+SzymonOzog@users.noreply.github.com","author_time":1712241151,"commit_time":1712241151,"committer":"GitHub","committer_email":"noreply@github.com","hash":"68fe3527f1d6b87a056d3d3309075b6466c9ec32","message":"Tensor core ptx (#3894)\n\n* tensor cores\r\n\r\n* Merge from master\r\n\r\n* faster program start in llvm (#3897)\r\n\r\n* Fix the result permutation in einsum (#3895)\r\n\r\n* Fix permutation of result indices in einsum.\r\n\r\n* Delete stray line used for breaking tests\r\n\r\n* Fix linter error by renaming twice-used variable\r\n\r\n---------\r\n\r\nCo-authored-by: chenyu <chenyu@fastmail.com>\r\n\r\n* touchup einsum (#3900)\r\n\r\ndon't need rhs_letters\r\n\r\n* hotfix: check ckpts before writing achieved model (#3901)\r\n\r\nthis killed the tinybox green run\r\n\r\n* replace dtype.name str with render_dtype (#3903)\r\n\r\nfixes some bf16 cast issues since bf16 does not have `.name`.\r\nalso more robust if there are lang-specific type overrides\r\n\r\n* add --minimal flag to nvrtc (#3899)\r\n\r\n* wmma: fix the AMD TC threads to split the first 16 threads (#3904)\r\n\r\npreviously it incorrectly aliased 16 into the size-8 upcast\r\non the store alias. now it splits them properly into 8, with the\r\nremaining 2 going into the correct local stride\r\n\r\n* training cifar with BF16 on CUDA (#3905)\r\n\r\n* training cifar with BF16 on CUDA\r\n\r\nmemory usage sits between float and half due to numpy calls in dataset preprocessing, which convert to float.\r\n\r\n* simpler bf16 functions\r\n\r\n* bf16 cifar works for HSA too, just very slowly\r\n\r\n* simpler bf16 functions, we love cuda\r\n\r\n* include negative floats in test_dtype (#3884)\r\n\r\n* include negative floats in test_dtype\r\n\r\n* that is UB\r\n\r\n* too annoying\r\n\r\n* pack can overflow\r\n\r\n* add to benchmark\r\n\r\n* change var name to satisfy mypy\r\n\r\n* spacing\r\n\r\n* Update to new TensorCore format\r\n\r\n* Spacing\r\n\r\n---------\r\n\r\nCo-authored-by: nimlgen <138685161+nimlgen@users.noreply.github.com>\r\nCo-authored-by: Alejandro F Queiruga <33233447+afqueiruga@users.noreply.github.com>\r\nCo-authored-by: chenyu <chenyu@fastmail.com>\r\nCo-authored-by: sekstini <127142660+sekstini@users.noreply.github.com>\r\nCo-authored-by: Francis Lam <flam@alum.mit.edu>\r\nCo-authored-by: George Hotz <72895+geohot@users.noreply.github.com>","parents":["92378fb5b6010d5daa02119e8b72c34a51a67f10"],"tree_hash":"28b231774ca127eae5c04122204d964fb8c46485"}