{"author":"Oleg Rybalko","author_email":"rybalko.oleg.123@mail.ru","author_time":1701558226,"commit_time":1701558226,"committer":"GitHub","committer_email":"noreply@github.com","hash":"5e870837839916c821a17aeca8068d4c9aa8aaa9","message":"Whisper + LLAMA + VITS (#2332)\n\n* feat: working voice 2 text using whisper\r\n\r\n* feat: added llama generation\r\n\r\n* feat: vits init\r\n\r\n* feat: more accurate voice conversion\r\n\r\n* feat: support for tts and working pipeline for the first pass\r\n\r\n* fix: linter checks\r\n\r\n* refactored vits initialization and inference, added mmts-tts support\r\n\r\n* fixed process sync and now we can have an infinite conversation\r\n\r\n* reuse output stream to remove overhead of creating a new one each time\r\n\r\n* added pre-prompt configuration with yaml files\r\n\r\n* adjusted code to merge PR which changed whisper\r\n\r\n* optimized whisper, now it's blazing fast and also reduced number of lines\r\n\r\n* added better debug printing\r\n\r\n* use jitted encode function for whisper, added timings and removed response delim to save speed on generating those tokens\r\n\r\n* fixed hf convert and now it's working with tinyllama\r\n\r\n* added tinyllama config\r\n\r\n* refactored code and made it work with all llama models\r\n\r\n* prettier order\r\n\r\n* prettier order\r\n\r\n* fixed suffix for tinyllama and refactored convert_from_hf\r\n\r\n* added missing parameters\r\n\r\n* fixed stream release and added missing params\r\n\r\n* jitted dp and encoder\r\n\r\n* jitted flow forward\r\n\r\n* removed re-init of espeak on each call to save up time\r\n\r\n* jitted generator forward for blazing fast tts\r\n\r\n* added contextmanager for displaying a chat log\r\n\r\n* removed whitespace for pylint\r\n\r\n* updated code to support latest fetch func\r\n\r\n* wait for llama eos token and pass params from cli to llama\r\n\r\n* listen for not fixed amount of time\r\n\r\n* refactored code a bit\r\n\r\n* removed thresholding and now the output streams directly to whisper\r\n\r\n* tokenize llama output for vits batch size to work and stream each sentence to a speaker\r\n\r\n* changed speaker\r\n\r\n* whisper is now printing on the same line\r\n\r\n* don't trigger llama on whisper output in parens\r\n\r\n* added tinyllama chat model\r\n\r\n* adjusted code to work with tinyllama chat model\r\n\r\n* removed unused cli arg\r\n\r\n* autofetch tokenizer and tinyllama model. add 3 chat tokens to the tokenizer\r\n\r\n* fixed issue with long sentences by chunking them\r\n\r\n* support for multiline llama output\r\n\r\n* prettified log output\r\n\r\n* adjusted sentence length\r\n\r\n* remove quote from response to avoid funny tts\r\n\r\n* fixed prompts\r\n\r\n* added missing parameter","parents":["47cec4caf31e479dce3b1320fcde07314e0405a3"],"tree_hash":"cc2bf6e3c149036af6a4c92e76d242311236625e"}