{"author":"Szymon Ożóg","author_email":"58388001+SzymonOzog@users.noreply.github.com","author_time":1711122842,"commit_time":1711122842,"committer":"GitHub","committer_email":"noreply@github.com","hash":"624bc899102a58df15557659a45f0cab5f745fee","message":"PTX - implement float 4, ptr arithmetics and other speed improvements (#3775)\n\n* ptx float4 implementation\r\n\r\n* remove from cache when trimming uops\r\n\r\n* Gate for float4\r\n\r\n* Linting fix\r\n\r\n* disable test reasonable time for ptx\r\n\r\n* import getenv\r\n\r\n* Update uops.py\r\n\r\n* linter\r\n\r\n* Add div test for half\r\n\r\n* upcast if op does not support operation\r\n\r\n* fix offset\r\n\r\n* Run only if dtype supported\r\n\r\n* zero out registers when accessing by pred + cleanup\r\n\r\n* Remove trailing whitespace\r\n\r\n* revert\r\n\r\n* spacing fix\r\n\r\n* move cache clearing outside loop\r\n\r\n* did this suddenly start working?\r\n\r\n* unused import removed\r\n\r\n* Remove cast\r\n\r\n* Use pattern matching\r\n\r\n* linting\r\n\r\n---------\r\n\r\nCo-authored-by: George Hotz <72895+geohot@users.noreply.github.com>","parents":["f4055439dcab0bd8b8dcceb8f44ab4a2bf1db1ed"],"tree_hash":"473eee8966c7cac48c2601e1e41870279a428503"}