{"author":"qazal","author_email":"77887910+Qazalin@users.noreply.github.com","author_time":1701879346,"commit_time":1701879346,"committer":"GitHub","committer_email":"noreply@github.com","hash":"c704a77ca0cede1ba1c589ebf84198e4ae35c1c5","message":"green dtypes ALU tests (#2617)\n\n* dtypes ALU test\r\n\r\n* those types don't exist in torch\r\n\r\n* floats\r\n\r\n* more tests\r\n\r\n* disable those\r\n\r\n* a couple of unary tests\r\n\r\n* skip float16 tests in CI for GPU\r\n\r\n* fix LLVM bool add: True+True=1+1=2, which truncates to False in native LLVM\r\n\r\n* remove hardcoded float for LLVM ALU fns\r\n\r\n* use a less sensitive atol for fp32; 1e-10 is flaky and sometimes failed even after reverting the merge commit for non-fp32 math, since nothing has changed in our kernels for fp32\r\n\r\n* return on overflows\r\n\r\n* fix CUDA exp2\r\n\r\n* compute results of the op regardless of bounds in a Python backend\r\n\r\n* skip fp16 in GPU and CUDACPU\r\n\r\n* fuzz a smaller range in the float_midcast_int32 test\r\n\r\nI sampled this and we overflow ~70% of the time.\r\nBecause numpy behaves differently on different devices for overflows, and Metal seems to do the same, I'm opting to eliminate the non-determinism here.\r\n\r\n* remove the CUDA exp2 overload; it's already there now\r\n\r\n---------\r\n\r\nCo-authored-by: George Hotz <geohot@gmail.com>","parents":["71d989b476b592e5e9989341159441a1940432a8"],"tree_hash":"7ef2fb2b0e50ef2debb4987c4e79df520e730d02"}