{"author":"Elias Wahl","author_email":"82230675+Eliulm@users.noreply.github.com","author_time":1714415727,"commit_time":1714415727,"committer":"GitHub","committer_email":"noreply@github.com","hash":"27613dd881f4868580fc4bf851a811896d538bdd","message":"MLPerf BERT: Main training loop (#4288)\n\n* BERT language modeling head + trunc normal initializers\r\n\r\n* add train loop + helpers\r\n\r\n* shuffle in dataloaders + slight changes in main loop\r\n\r\n* beam change\r\n\r\n* Minor changes\r\n\r\n* random.shuffle\r\n\r\n* HParam update\r\n\r\n* Use deque for dataloader\r\n\r\n* wandb bert project name\r\n\r\n* half fixes\r\n\r\n* BENCHMARK + remove epoch\r\n\r\n* cast + print()\r\n\r\n---------\r\n\r\nCo-authored-by: chenyu <chenyu@fastmail.com>","parents":["61c97d530550de032bd88d4afece5467bdec9ee6"],"tree_hash":"1f86ab4f9f09147275d4c5cf6aea6145dba2b68c"}