AI in 2026

I love making predictions. I’ve made a lot of them on X throughout the year, and I want to lock in some for 2026 before it begins. Some of these have already been posted in some form; others are new. I’ll probably go through all my past predictions on X and build a structured page to track them, but that’s for later.

This post focuses on AI and related topics, since that’s what I’ve mostly been focused on throughout 2025. Everything here is based solely on my own biased opinion.

Progress

AI in the form of LLMs will continue to improve rapidly. I believe the rate of improvement will be even higher than in 2025, because companies are acquiring more compute for training the next generations of models.

The test-time scaling paradigm added an extra scaling axis in 2024 when OpenAI released o1-preview. This axis is nowhere near saturated yet, even though many people think it is. GPT-5.2 Pro can think for an hour straight, and agents like Claude Code and Codex can work for 24+ hours executing on a given plan. But humans work for weeks, months, and years on the hardest problems.

Even though thinking time won’t increase further for general chatting, tasks that require the highest possible level of intelligence and precision will keep taking longer. I expect that by the end of 2026 we’ll see GPT-5.2 Pro alternatives working for 6-8 hours on the hardest problems, and Codex-like agents working for days to complete huge projects end-to-end.

The intelligence of today’s models amazes me. LLMs of the latest generation can solve extremely difficult problems and have become very useful in real work, especially for programmers. I really don’t understand why some people don’t believe in the improvements of recent months. Maybe the things they’ve tried using these models for are too easy.

If you take a model from a year ago, like o1 or Sonnet 3.6, and compare it head-to-head with GPT-5.2 or Opus 4.5 on some difficult task, you’ll notice an insane difference in what they can do. It’s easy to get used to new levels of capability when releases arrive every couple of months with “minor” improvements over the last version. But this gradual progress on a small scale becomes extreme progress in the long term.

Think of the models a year from now, like GPT-6 and Opus 5.5, or whatever they end up being called. Imagine they improve over today’s models at least as much as today’s models improved over the 2024 models. That’s almost artificial superintelligence, in my opinion.

This doesn't really work this way, because the current pace of improvements is so fast due to a lot of research and scaling. And it's not always right to just assume that "researchers will come up with something new that will make it better". But I'm going to assume it, since I don't see any reasons for the current pace to slow down.

More specifically, what I expect the most from in 2026 and 2027 is memory. Solving efficient long-term memory for agents more or less solves continual learning, which is one of the last things people want AI to be capable of before calling it “AGI”.

I can easily imagine a system in late 2026 that is powered by some strong agentic model, wired with a long-term memory module (it could be just some text files or some fancy RAG), and that runs short LoRA training sessions from time to time on the things it did recently. This process will probably be individual per user, and companies will then use insights from these long-running agents to train the next iterations of their models.

There is literally nothing that prevents us from implementing something like this; it’s just a matter of time and taste. And solving it will unlock another axis of scaling that makes the exponential even steeper. This will become a kind of test-time training.
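
To make this more concrete, here is a minimal sketch in Python of the kind of loop I have in mind. It assumes a plain file-based memory; call_agent and run_lora_update are hypothetical placeholders for the actual model API and fine-tuning job, not functions from any real library.

    # Sketch of a long-running agent with file-based long-term memory and
    # periodic LoRA updates. call_agent() and run_lora_update() are
    # hypothetical placeholders, not a real API.
    import json
    import time
    from pathlib import Path

    MEMORY_FILE = Path("memory.jsonl")   # long-term memory: an append-only log
    UPDATE_EVERY = 50                    # interactions between LoRA sessions

    def call_agent(task: str, memory: list[dict]) -> str:
        """Placeholder for the underlying agentic model call."""
        return f"result for: {task}"

    def run_lora_update(recent: list[dict]) -> None:
        """Placeholder for a short LoRA fine-tune on the agent's recent work."""
        print(f"fine-tuning adapter on {len(recent)} recent interactions")

    def load_memory() -> list[dict]:
        if not MEMORY_FILE.exists():
            return []
        return [json.loads(line) for line in MEMORY_FILE.read_text().splitlines()]

    def remember(entry: dict) -> None:
        with MEMORY_FILE.open("a") as f:
            f.write(json.dumps(entry) + "\n")

    def agent_loop(tasks: list[str]) -> None:
        for i, task in enumerate(tasks, start=1):
            memory = load_memory()        # retrieve (fancy RAG would filter this)
            result = call_agent(task, memory)
            remember({"time": time.time(), "task": task, "result": result})
            if i % UPDATE_EVERY == 0:     # the "test-time training" part
                run_lora_update(load_memory()[-UPDATE_EVERY:])

    if __name__ == "__main__":
        agent_loop(["summarize recent work", "plan the next experiment"])

In a real system the retrieval step would be smarter than “load everything” and the LoRA job would run on proper training infrastructure, but the shape of the loop is the same.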

It's hard to predict where exactly models will be in terms of intelligence and capabilities with all that, but I'm sure that we'll see a huge explosion of novel science discoveries made by such systems, as they will be able to run for weeks, making progress and reflecting on the past. One of the most promising areas for that is formalization agents tackling math theorems.

I even think there’s a very small chance of AI solving one of the Millennium Prize Problems in 2026, but that’s more of a 2027 thing.

Capabilities

This is a mostly speculative section where I try to predict specific score ranges on different evals and benchmarks. I’m not providing anything to back these predictions up, and they’re mostly vibe-based. But they represent my opinion on these evals and on AI progress in 2026.

Math

FrontierMath tiers 1-3 get nearly saturated at around 85% of problems solved, while tier 4 stays somewhere around 50-60%. LLMs will generate many novel math proofs, contributing to and solving real problems. A lot of Erdős problems will get solved by AI, and the hardest ones will see some kind of AI-driven advances. A lot of progress will be seen in auto-formalization with Lean, from companies like Math Inc and Harmonic.
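
For a sense of what formalization means here, this is my own toy illustration (not an example from these companies): the statement “the sum of two even integers is even”, written and proved in Lean 4, assuming Mathlib is available.

    -- Toy illustration of a formalized statement and proof (Lean 4 + Mathlib).
    import Mathlib

    theorem even_add_even {a b : ℤ} (ha : Even a) (hb : Even b) :
        Even (a + b) := by
      obtain ⟨x, hx⟩ := ha   -- ha unfolds to: a = x + x
      obtain ⟨y, hy⟩ := hb   -- hb unfolds to: b = y + y
      exact ⟨x + y, by rw [hx, hy]; ring⟩

Auto-formalization is about getting agents to produce statements and proofs like this automatically from informal mathematics, just at a far harder level than this toy case.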

Coding

There are no good evals for software engineering at the moment, but we might get one in 2026. Codex can already implement complex things precisely, but it still lacks the taste and long-term vision of a senior engineer.

I expect Codex-like agents to improve a lot in this direction, so that by the end of the year we’ll see many more senior engineers using AI to write code and giving positive feedback about it.

For non-coder users, the limit of what's possible will move much further. Someone without any programming skills will be able to create a playable game of good quality that would be worth publishing.

Science

The research subset of FrontierScience will be at about 70% by the end of the year. AI will contribute significantly to physics, chemistry, and biology, and many new discoveries will be made with strong help from AI, or even entirely by AI.

Closer to the end of the year, we’ll see news about new AI advances in science very often. But the overall impact won’t be that significant yet, as it will all still be in the “adoption” phase.

AI will not cure cancer or do whatever else people expect from ASI. That’s more of a 2027-2028 thing.

Vision

Vision will improve significantly with new training techniques, and computer-use agents will become nearly flawless. They will be adopted for automated QA testing of software.

AI will get much better at games. Not necessarily LLMs: maybe some new system from Google will break the world record in Minecraft random-seed speedrunning while playing under the same conditions as humans (no slowing down time or anything like that).

Instruction following

We've seen major leaps in that aspect with jumps from o3 to GPT-5 and from GPT-5 to GPT-5.2. And I expect to see at least one more jump of a similar significance.

GPT-5.3 or GPT-5.4 might get nearly perfect at instruction following for anything you would think of asking it to do. GPT-5.2 is already almost there, but the next iteration of these improvements will make it practically unbreakable, so we’ll hit some kind of wall here.

AI will be able to execute with superhuman precision on very long and detailed plans and specifications. Developers will embrace this heavily.

Companies

I'm only covering a small set of companies here, as I've been watching them more recently. And others aren't that significant anyway at the moment.

OpenAI

We'll see GPT-5.3 in Q1, GPT-5.4 in Q2, and probably GPT-5.5 in Q3. They might not be called exactly like this, but the general trend is clear. I'm really not sure what they will do about GPT-6. I'm expecting GPT-6 to be released once they train a long-running model with memory integration that I've described earlier. So we might get it in late 2026, or perhaps in early 2027.

It's hard to say which of these will be "major leaps" and which will be minor improvements (like 5.1 was), but I think GPT-5.3 might be a strong step up. Overall, the best available OpenAI model by the end of 2026 will probably be something I'd call ASI already, but the definitions of all these terms are a topic for another post.

We’ll see some more previews of their experimental internal models, like we saw this summer, probably in the form of published scientific discoveries announced as having been made by a new AI system.

The "IMO model" from this summer that I've mentioned might be some early experiment on continual learning already, and in that case we'll see a full system like one I described a few months earlier. But I think there's a higher chance that it's just a new Pro system that will be an upgrade to GPT-5.2 Pro or GPT-5.3 Pro.

We'll see solid improvements in voice mode. It'll feel much more human and be much smarter. The user experience of using it will also improve, so I might start using that myself.

It's hard to say something specific about image generation, but I think the main improvement areas will be detail quality and instruction following. We'll see GPT Image 2 and perhaps GPT Image 2.5.

A new version of Sora will be released, with improvements in realism, detail, and instruction following. But there’s a lot more to work on in this area, and I generally think Google will be better at it.

Anthropic

Anthropic’s models are really good in some ways. Their raw intelligence and reasoning power aren’t close to those of the GPT and Gemini models, but they have certain traits that people love about them.

Their new models won’t lose this, and might even get better at it. But I don’t think they’ll overtake OpenAI in reasoning, and they’ll still be loved by programmers for their speed and taste rather than raw intelligence. Either way, their models will remain close to SoTA for development and will keep being used and loved.

I don't think we'll see any image, video, or audio models from Anthropic anytime soon, but it might be an interesting surprise.

My best guess is that we’ll get Claude 4.6 in Q1 and maybe Claude 4.7 in Q2, followed by Claude 5 in the second half of 2026. Opus will be the main driver, while Sonnet and Haiku will get a speed-up and a price drop, acting as the “mini” and “nano” models of the lineup.

Google

Google has a lot of data, and it shows in the knowledge of their base models. With Gemini 3 they caught up on RLVR and got close to SoTA in terms of intelligence and reasoning. But their post-training still lacks some of the sauce that OpenAI and Anthropic have.

Gemini models are very bad at instruction following right now, which makes them unusable for many real tasks. I expect Google to fix this and catch up to SoTA here too, but OpenAI will still stay ahead.

We’ll see a few checkpoints and then general-availability versions of all the Gemini 3 family models around the middle of the year, with some chance of a Gemini 3.5 preview in late 2026.

What's more interesting about Google is their world models like Genie. I'm expecting to see another breakthrough there from them.

xAI

Grok mostly lacks the same qualities that Gemini lacks, and I also expect xAI to catch up on those by the end of the year. They seem to care about cheap and fast coding models, so they’ll probably keep working on that too, but I personally don’t see anything useful in it for what I’m doing.

We'll see Grok 4.20 in January, then Grok 5 in Q2, and something like Grok 5.1 by the end of the year. These models will be similar to Gemini models in many ways, as Elon also has a lot of data and compute. But I don't see Grok being one of the leading models in 2026 at any point in time.

Image and video models from xAI probably won’t be at Google’s level, but they might be competitive with OpenAI’s.

Open-source

Open-source models will still be about 6-12 months behind the frontier. There’s nothing wrong with that, and it’s great that we can run local models at all. They aren’t really that useful in practice, though; I mostly see them as something that helps research rather than something I’d use on a daily basis.

I won't give any specific model timelines here, but I'm just expecting to see even bigger and even smaller models. A 0.1B model will exceed today's 1.5B models in intelligence. I'm personally much more interested in those very tiny models, like what Liquid AI is working on, as they're something we don't see from big companies.

We might see a nice model from Thinking Machines trained with LoRA.

Race

I used to think the AI race was a big deal, with the winner taking it all by reaching ASI first and not letting anyone else compete. But since my “There is no singularity” post, I don’t think it works like that at all.

Achieving some kind of ASI will not make competitors obsolete. It’s all one smooth curve. There will be moments when some companies are ahead, but overall they will all improve at a very similar pace. Open-source research also contributes to this, keeping open models from falling too far behind and letting lagging competitors get closer to the top.

I expect OpenAI and Anthropic to still lead in LLMs during the first half of 2026, but xAI and Google might catch up on post-training in the second half, so by the end of 2026 we might see all four giants sitting very close together.

I'm expecting that in 2026, OpenAI will remain the leader of LLMs (even if by a small margin) and Google will remain the leader of multimodality and world models (likely by a large margin).

Adoption

At my university nearly everyone is using AI, but how they use it varies a lot. Some send a few ChatGPT messages from time to time; some, like me, use it much, much more. Across the whole world population the spread is even wider: there are many people who aren’t using AI at all, and few who are very deep into it. And this won’t change.

It's like this with many things, there are always casual users and power users. But with AI this will have more impact on humanity. Those who aren't using AI right now will be behind and might need to catch up in urgency at some point, while those who are using it daily now will feel great and unlock even more new possibilities for themselves.

AI will get integrated into many more things in general. People are already used to their smart speakers being powered by LLMs and to Tesla cars having Grok as an assistant. Since the integration process across all industries is slow and gradual, it’s hard to notice day to day. But looking back at the end of 2026, we’ll see how much more AI is embedded in everything we do every day.

Google, Apple, Microsoft, and others will integrate it more deeply into their software and hardware ecosystems. Attempts to do this in past years went poorly simply because the models lacked the needed intelligence. Now we’re at the point where AI can already be used for very hard things, and again it’s all a matter of time and taste.

Criticism

Critics and skeptics won’t go away, and that’s normal. There are plenty of people hating on AI across all groups, but I think most of the hate right now comes from artists of various kinds, since image and music models can now generate high-quality pieces of “art” that would take humans hours. I don’t think this hate makes sense, as art itself is more abstract in nature than that, but that’s a topic for a separate post.

There are also some haters among programmers and scientists. From what I see, the usual complaint is that AI-generated output is flooding into things meant for human review: low-quality PRs in popular repositories, meaningless papers submitted to conferences.

But all of this is the result of some humans using these tools badly, and I don’t think AI itself is really responsible for that. There has always been low-quality work, and I actually think AI will raise this “low-quality” bar rather than make the situation worse. We’ll have to get used to it: just add more filtering and more review (including AI-based review).

In general, the criticism will probably stay at about the same level as today. Some people will have a bad experience and start hating AI, while others will find it useful and start loving it.

Risks

The biggest risk I’m expecting from AI in 2026 is in cybersecurity. Models like GPT-5.2 can already find new bugs and vulnerabilities in huge codebases. I’ve seen this personally in what I’ve been working on recently, and there is outside confirmation as well.

As models get even smarter, they will find more vulnerabilities, faster. This is a double-edged sword: having a very intelligent security auditor review all the code you ship will improve the quality and security of software everywhere, but it will also let malicious hackers exploit much more.

As usual, adoption for malicious use will probably be quicker than adoption for defense. So we’ll see even more major software exploits. But in the long term it will all be fine, as adoption on the security side will improve too.

Economic impact

Even though GDPval will be nearly saturated, we won’t see any economic impact from AI adoption in the charts yet. It’s a very slow process, and it might get lost in the noise anyway. Since AI is on the same curve as humanity’s overall progress, we might not see any extreme impact at all, just regular economic growth.

But what will be noticeable is the redistribution of power towards companies that are better at using AI in ways that help them progress faster.

The "AI bubble" will not burst in the way most people expect it to burst. Mostly due to the pace of real improvements in AI. OpenAI will not collapse, and progress will not get hurt by slower funding after the "burst". It will all keep progressing as it does now.