{"author":"chenyu","author_email":"chenyu@fastmail.com","author_time":1711258667,"commit_time":1711258667,"committer":"GitHub","committer_email":"noreply@github.com","hash":"e22d78b3d24126b70ad19c3b098f65000fc3d150","message":"training cifar with BF16 on CUDA (#3905)\n\n* training cifar with BF16 on CUDA\r\n\r\nmemory usage is between float and half due to numpy calls on dataset preprocessing, which converts into float.\r\n\r\n* simpler bf16 functions\r\n\r\n* bf16 cifar works for HSA too just very slow\r\n\r\n* simpler bf16 functions, we love cuda","parents":["0145366323116568d0b1a02d3df2e14a307f00c7"],"tree_hash":"d3f4148e3909487b109205be07269f3db57d89b1"}
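The commit message notes that memory usage lands between float32 and float16 because numpy calls in the dataset preprocessing convert data to float — numpy has no native bfloat16 dtype. The sketch below is not tinygrad's implementation; it is a minimal illustration of the bfloat16/float32 relationship that makes such "simpler bf16 functions" possible: bfloat16 is simply the top 16 bits of a float32, so conversion is bit truncation and widening is zero-fill (round-to-nearest is omitted here; plain truncation only).

```python
import numpy as np

def float32_to_bf16_bits(x: np.ndarray) -> np.ndarray:
    # Reinterpret float32 values as uint32 and keep only the high 16 bits
    # (sign + 8 exponent bits + 7 mantissa bits) — truncation, no rounding.
    bits = x.astype(np.float32).view(np.uint32)
    return (bits >> np.uint32(16)).astype(np.uint16)

def bf16_bits_to_float32(b: np.ndarray) -> np.ndarray:
    # Widen bfloat16 bit patterns back to float32 by zero-filling
    # the low 16 mantissa bits.
    return (b.astype(np.uint32) << np.uint32(16)).view(np.float32)

x = np.array([1.0, 3.14159, -2.5], dtype=np.float32)
roundtrip = bf16_bits_to_float32(float32_to_bf16_bits(x))
# Values exactly representable in bfloat16 (1.0, -2.5) survive unchanged;
# 3.14159 loses low mantissa bits and comes back slightly truncated.
```

Because bfloat16 keeps float32's full 8-bit exponent range, this truncation never overflows or underflows where float32 would not — which is why bf16 training tends to be numerically safer than float16 despite the coarser mantissa.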