{"author":"chenyu","author_email":"chenyu@fastmail.com","author_time":1714103128,"commit_time":1714103128,"committer":"GitHub","committer_email":"noreply@github.com","hash":"1891ebb655829d4b984c2bcdde105aa799a57313","message":"make ring allreduce chunks a multiple of 2^n if possible (#4302)\n\nin resnet, instead of chunking as [43691, 43691, 43691, 43691, 43690, 43690], chunk as [43712, 43712, 43680, 43680, 43680, 43680], so each chunk is a multiple of 32 and those kernels can have 32 local.\n\nmore than 2X faster for the applicable kernels and about 1% faster overall for resnet","parents":["1e37c4a7a1a0173a952ae3f9566b8c967c9c52c9"],"tree_hash":"3c670083511040cb9a01e7b0b914099dc6bebb04"}