{"author":"reddyn12","author_email":"72528507+reddyn12@users.noreply.github.com","author_time":1711673352,"commit_time":1711673352,"committer":"GitHub","committer_email":"noreply@github.com","hash":"9b5e15db6ec76868a6e179e3e68494a4d44d5191","message":"Mamba Implementation (#3456)\n\n* first commit\r\n\r\n* state back to orig\r\n\r\n* mamba comparisons\r\n\r\n* rm file\r\n\r\n* rename file\r\n\r\n* use Tensor.einsum and make default model 370M\r\n\r\n* Cleaned code and made a comparison test\r\n\r\n* Simplify pull request. Only has 1 mamba implementation now.\r\n\r\n* Update prompt\r\n\r\n* rm whitespaces\r\n\r\n* last space\r\n\r\n* remove Einops dependency\r\n\r\n* rm unused code\r\n\r\n* add tests\r\n\r\n* rm print statement\r\n\r\n* rm imports\r\n\r\n* skip CLANG\r\n\r\n* Update skipIf description\r\n\r\n* skip model test in CI and add CLANG fix\r\n\r\n* rm Device import\r\n\r\n* don't be stupid\r\n\r\n* Fix conv assign\r\n\r\nWhen the prompt is too short, the logic for conv_state assign messes up. This can be fixed when padding the tokenized array to min length of 4. I padded using the empty string token, but idk if proper practice is to use the PAD token\r\n\r\n* fix p1\r\n\r\n* temp\r\n\r\n* fix jit import\r\n\r\n---------\r\n\r\nCo-authored-by: schlimeszn <schlimeszn@gmail.com>\r\nCo-authored-by: reddyn <nikidsniper@gmail.com>\r\nCo-authored-by: George Hotz <72895+geohot@users.noreply.github.com>","parents":["d085837179789a9a2bedf3f1a9990eb7bb690e4a"],"tree_hash":"8ff9e8f07a451fb6ed06877e01b2eadf71b48c12"}