{"author":"Patrick Tsai","author_email":"5304405+patosai@users.noreply.github.com","author_time":1710198560,"commit_time":1710198560,"committer":"GitHub","committer_email":"noreply@github.com","hash":"971d7f5d7c86d3ef427549fe637c168925a3fb73","message":"O(n) arange attempt (#3530)\n\n* It works?\r\n\r\n* Clamp correctly\r\n\r\n* Refactor\r\n\r\n* Make code better\r\n\r\n* Undo some stuff\r\n\r\n* First step to trying to make floats work\r\n\r\n* Floats work in Python op but not metal because int div is different\r\n\r\nPython integerdivision was implemented as // which rounds towards\r\nnegative infinity, but C integer division rounds towards 0 so there\r\nis an off-by-1 division error\r\n\r\n* arange does cumsum with ints and then multiplies by step\r\n\r\nThis is so loop optimization can remain int only\r\n\r\n* Undo a lot of symbolic changes\r\n\r\n* Final check\r\n\r\n* Cleanup\r\n\r\n* There can be multiple phis\r\n\r\n* Fix multiple phi op removal\r\n\r\n* const sets dtype correctly\r\n\r\n* Fix bugs\r\n\r\n* Fix a couple bugs and add loop vars to resolve\r\n\r\n* missed one\r\n\r\n* Don't trim too many ops\r\n\r\n* Fix symbolic test\r\n\r\n* Use ones instead of full\r\n\r\n* Delete test\r\n\r\n* Lint passes\r\n\r\n* max node error\r\n\r\n* Small updates to loop logic\r\n\r\n* Remove unnecessary changes\r\n\r\n* We are getting somewhere\r\n\r\n* Simple case\r\n\r\n* Fix\r\n\r\n* rm, prn\r\n\r\n* Better\r\n\r\n* If NumNode doesn't work then continue\r\n\r\n* clamp is needed for arange(256)\r\n\r\n* Move everything into the optim fn\r\n\r\n* Replace correctly\r\n\r\n* Order optimizations better\r\n\r\n* Delete\r\n\r\n* mypy\r\n\r\n* Test for simplification\r\n\r\n* Rename\r\n\r\n* Fix test\r\n\r\n* update test description\r\n\r\n* Undo more\r\n\r\n* Cleanup\r\n\r\n* No replaced_ops map\r\n\r\n* Fix lint\r\n\r\n* AssertionError\r\n\r\n* back again\r\n\r\n* Reinstate assertion\r\n\r\n* Return true and make diff not as big\r\n\r\n* Bigger range for test\r\n\r\n* Change cumsum impl\r\n\r\n* fix bug\r\n\r\n* make big cumsum work\r\n\r\n* lint\r\n\r\n* Undo cumsum 2-stage removal\r\n\r\n* No while helper\r\n\r\n* optional min/max clamping\r\n\r\n* floats work\r\n\r\n* rm giant arange test\r\n\r\n* fix python cast None\r\n\r\n* Check phi parents\r\n\r\n* one phi allowed per where\r\n\r\n* Fix one phi per where\r\n\r\n* Rework iteration\r\n\r\n* Delete assertions\r\n\r\n* convert to int\r\n\r\n* Try mul -1 instead of neg for hip..?\r\n\r\n* Remove one phi per where requirements\r\n\r\n* one accum only\r\n\r\n* Lint\r\n\r\n* should simplify a loop at a time\r\n\r\n* Don't get rid of loop explcitly\r\n\r\n* Need to iterate backwards\r\n\r\n* lint\r\n\r\n* unary neg\r\n\r\n* Make optim work for onnx and sum_pad_collapse\r\n\r\n* Better message\r\n\r\n* filter alu ops correctly\r\n\r\n* Fix the limiter\r\n\r\n* lint and simplify\r\n\r\n* Add it back\r\n\r\n* off by one error\r\n\r\n* test wheres and phis\r\n\r\n* test max ops and non-if stuff\r\n\r\n* <=\r\n\r\n* cast_scalar\r\n\r\n* Oops\r\n\r\n* Change test\r\n\r\n* Pass loop uops instead of a modified map\r\n\r\n* Cut param transfer between linearizer and uops\r\n\r\n* Fix issues\r\n\r\n* Fix lint\r\n\r\n* fix efficientnet python 3.8 invalid syntax\r\n\r\n* distinct vars in seen_vars\r\n\r\n* accurate var names\r\n\r\n---------\r\n\r\nCo-authored-by: Patrick Tsai <patosai@users.noreply.github.com>\r\nCo-authored-by: George Hotz <72895+geohot@users.noreply.github.com>","parents":["a5d023dff861dd534b5f1f2680c8836490dd7d45"],"tree_hash":"67ef1b26168ff2125d1ac0d2160f45a35e024a45"}