Hey, interesting repo. Unfortunately it looks like OpenAI's gpt-3.5-turbo-1106 model limits output to 4096 tokens, as does the gpt-4-turbo model.
Claude has also capped every model's output at 4096 tokens.
OpenAI's 0613 models will be deprecated in July this year, on Azure as well. Soon no major provider (other than OSS LLMs) will offer >4096 output tokens. IMO this is going to hit a lot of use cases hard.
Have you had any thoughts on this?
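One common workaround for the 4096-token output cap is to re-prompt the model to continue whenever it stops with `finish_reason == "length"` and stitch the chunks together. Here's a minimal sketch; `call_model` is a hypothetical stand-in for any chat-completion call (e.g. OpenAI's `client.chat.completions.create`), and the function name and continuation prompt are illustrative, not part of this repo:

```python
# Sketch: stitch a long generation together by re-prompting whenever the
# model hits its output cap (signaled by finish_reason == "length").
# `call_model` is a hypothetical callable: (messages) -> (text, finish_reason).

def generate_long(call_model, messages, max_rounds=8):
    """Keep asking the model to continue until it stops naturally."""
    parts = []
    for _ in range(max_rounds):
        text, finish_reason = call_model(messages)
        parts.append(text)
        if finish_reason != "length":
            break
        # Feed the truncated output back and ask for the continuation.
        messages = messages + [
            {"role": "assistant", "content": text},
            {"role": "user", "content": "Continue exactly where you left off."},
        ]
    return "".join(parts)
```

This doesn't fully solve it (the model can drift or repeat at the seam, and it multiplies input-token cost since the prefix is resent each round), but it's the usual stopgap when a single completion can't exceed 4096 tokens.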