State of LLMs

Aug 8, 2024

Large Language Models (LLMs) are still very limited. I work with these models every day while building Jamie, and I repeatedly run into limitations that constrain the next generation of possibilities this technology could unlock.

  1. Short inference context window: Currently, LLMs can only generate up to ~4k tokens of output (with some early betas reaching ~8k tokens). In contrast, common input context windows are well above 100k tokens. Working with larger input contexts eventually requires correspondingly larger outputs. This is a significant limiting factor at the moment, and it seems harder to overcome than expanding the input context window.
  2. Slow inference: Despite models getting smarter, inference is still too slow for many advanced applications. For complex tasks, you typically rely on a pipeline of chained LLM inferences. Because most of these steps depend on the previous one (and thus cannot run in parallel), the slow inference times add up quickly; see the sketch after this list. For many interesting use cases, we need inference times of under one second. More intelligent models remove the need for pipelines, and that is the direction the field seems to be evolving in.
  3. Inconsistencies in instruction following: Although models are getting better at following complex instructions, there is always a significant probability that they won't do what you want. Refining the prompts can help mitigate that risk, but you hit a local maximum for certain types of complex prompts. These edge cases are costly when building great products with LLMs; a common workaround is to validate outputs and retry, as in the second sketch below.
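
To make the latency point concrete, here is a minimal sketch of a chained pipeline. The `call_llm` helper, the meeting-summary steps, and the ~1.5 s per-call latency are illustrative assumptions, not real numbers or code from Jamie; the point is only that sequential, dependent calls make latencies add up instead of overlapping.

```python
import time

def call_llm(prompt: str) -> str:
    """Placeholder for a single LLM API call; assume ~1.5 s of latency per call."""
    time.sleep(1.5)  # stand-in for real network + inference time
    return f"response to: {prompt[:30]}..."

def summarize_meeting(transcript: str) -> str:
    """A typical chained pipeline: each step needs the previous step's output,
    so per-call latencies add linearly instead of running in parallel."""
    topics = call_llm(f"List the main topics discussed:\n{transcript}")
    decisions = call_llm(f"Given these topics, extract the decisions made:\n{topics}")
    summary = call_llm(f"Write a concise summary based on:\n{decisions}")
    return summary

start = time.time()
summarize_meeting("...transcript...")
print(f"total latency: {time.time() - start:.1f}s")  # 3 sequential calls ≈ 4.5 s
```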
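And to illustrate the instruction-following problem, here is a sketch of the validate-and-retry loop you end up writing when the model sometimes ignores a format instruction. The prompt, the `call_llm` placeholder, and the JSON schema are assumptions for illustration; the cost argument is simply that every retry is another full inference.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for a single LLM API call (hypothetical helper, not a real API)."""
    return '[{"owner": "Alex", "task": "Send the revised proposal by Friday"}]'

def extract_action_items(transcript: str, max_attempts: int = 3) -> list[dict]:
    """Ask for a strict JSON format and retry when the model ignores the instruction.
    Each retry costs a full extra inference, which is why these edge cases get expensive."""
    prompt = (
        "Extract action items from the transcript below. "
        'Respond with ONLY a JSON array of objects with "owner" and "task" keys.\n\n'
        + transcript
    )
    for _ in range(max_attempts):
        raw = call_llm(prompt)
        try:
            items = json.loads(raw)
        except json.JSONDecodeError:
            continue  # model ignored the format instruction; try again
        if isinstance(items, list) and all(
            isinstance(i, dict) and "owner" in i and "task" in i for i in items
        ):
            return items
    raise ValueError("model never followed the format instruction")

print(extract_action_items("Alex: I'll send the revised proposal by Friday."))
```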

I genuinely believe LLMs are one of the most exciting technological platforms at the moment. It's fun to work with them and to watch the pace at which research evolves in this space. But today, there are still hard, unsolved problems that research doesn't have good answers to. Once we overcome these limitations, we will be on our way to unlocking even more potential.