Biggest AI Prediction & Why I'm Allocating $175,000 to it
In today's article, I cover:
- Current state of AI explained
- Biggest prediction: what kinds of companies profit most from AI
- A $175,000 AI thesis
- Next portfolio moves aligned with my AI thesis and valuation buy levels (🔒 Unlock with premium tier)
Current State of AI Explained
In Lex Fridman's recent podcast on the state of AI, Sebastian Raschka, a leading expert in the field, pointed out that much of today’s AI is still built on the same core idea: the Auto-Regressive Transformer model.
Here is a simple way to understand how this works.
When you type a question into ChatGPT, for example:
“What is the national language of Singapore?”
The model does not understand the question in the way a human does. Instead, it treats your input as a sequence of units called tokens (roughly, words or word fragments).
The model then asks a very specific question:
“Given all the tokens I have seen so far, what is the most probable next token?”
Step by step, the process looks like this:
- You provide the initial sentence:
- “What is the national language of Singapore?”
- Drawing on everything it has learnt during training, the model predicts the most probable next token. That might be:
- “It”
- The sentence now becomes:
- “What is the national language of Singapore? It”
- The model repeats the same process, now considering the new sentence "What is the national language of Singapore? It" and predicts the next most likely token:
- “is”
- The sentence now becomes:
- “What is the national language of Singapore? It is”
- Based on this new sentence, the model repeats the process and predicts the next token again:
- “Malay”
At this point, the final output looks complete and sensible:
“What is the national language of Singapore? It is Malay.”
What you see as a smooth, coherent answer is actually the result of many tiny steps, where the model adds one token at a time, each chosen because it has the highest probability of fitting the context.
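To make the loop concrete, here is a minimal sketch in Python. The `next_token_probs` function and its hard-coded probability table are invented purely for illustration; a real model computes these probabilities from billions of learned parameters:

```python
import random

def next_token_probs(context: str) -> dict[str, float]:
    """Toy stand-in for a real language model.

    A real transformer computes this distribution with billions of
    learned parameters; here we hard-code it for illustration.
    """
    table = {
        "What is the national language of Singapore?": {"It": 0.85, "The": 0.15},
        "What is the national language of Singapore? It": {"is": 0.95, "was": 0.05},
        "What is the national language of Singapore? It is": {"Malay": 0.90, "English": 0.10},
    }
    return table.get(context, {".": 1.0})

def generate(prompt: str, max_tokens: int = 3) -> str:
    context = prompt
    for _ in range(max_tokens):
        probs = next_token_probs(context)
        # Sample the next token according to its probability --
        # this is why the same prompt can yield different outputs.
        token = random.choices(list(probs), weights=list(probs.values()))[0]
        context += " " + token
    return context

print(generate("What is the national language of Singapore?"))
# Most likely: "What is the national language of Singapore? It is Malay"
```

Always taking the single top token (greedy decoding) would make this deterministic; production chatbots typically sample instead, which is why the same prompt can produce different answers, a point we return to below.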
The same token-prediction concept underlies AI image and video generation. For example, when you prompt Gemini's Nano Banana with the words:
"Create an image of Donald Trump"
The text prompt is converted into tokens, just like in a language model. Then the model starts with a noisy, incomplete image representation. It repeatedly asks:
“Given what I have generated so far, what is the most probable next visual detail?”
These “visual tokens” are not words, but small patches, colours, shapes, textures, or pixel-level patterns.
Step by step, the model refines the image:
- Adding facial structure
- Refining skin tone
- Adjusting lighting
- Filling in background details
Just like with text, the final image you see is the result of many incremental predictions, each one based on probability and prior context.
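A heavily simplified sketch of that refinement loop, in the same spirit as the text example, might look like this. The `add_detail` function and the single `TARGET` value are stand-ins; real image models predict structured visual detail, not one number:

```python
import random

TARGET = 0.5  # stand-in for the "true" image content the model steers toward

def add_detail(image: list[float]) -> list[float]:
    """Toy stand-in for predicting the 'most probable next visual detail'.

    Each pass nudges every pixel a little closer to the target,
    conditioned on what has been generated so far.
    """
    return [px + 0.2 * (TARGET - px) for px in image]

# Start from noise, as described above: an 8-"pixel" toy image.
image = [random.random() for _ in range(8)]

for _ in range(20):  # many small refinement steps, not one big leap
    image = add_detail(image)

print([round(px, 3) for px in image])  # all values have converged near TARGET
```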
Now that you understand how ChatGPT works and the Auto-Regressive Transformer architecture behind it, you may be disappointed to realize that when you turn to ChatGPT for emotional therapy, the AI is neither 'sentient' nor understanding of you; it is merely predicting the next most probable token given the sequence of word tokens you have fed it.
In simple terms, today’s AI revolution is fundamentally about predicting the next token with the highest probability, given some input.
In the above examples, each step of generation is probabilistic. This matters because it explains why AI systems can behave inconsistently.
For example, if I ask ChatGPT:
“What is Singapore’s national dish?”
I might get Chicken Rice.
If you ask the exact same question, you might get Nasi Lemak.
Both answers are defensible, but the key point is this: the model is not retrieving a fixed truth. It is sampling probabilistically from a distribution learned from its vast training data (the internet, books, etc.). The output is therefore not guaranteed to be identical every time.
Occasionally, the model may produce an answer that most humans would consider wrong, for example:
Mala Xiang Guo as Singapore’s national dish.
This behaviour is known as hallucination. It refers to cases where a model confidently produces incorrect or fabricated information.
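One way to see how a low-probability answer like that slips out: models sample from a probability distribution whose sharpness is controlled by a temperature parameter. The sketch below uses made-up logit scores purely for illustration:

```python
import math

def softmax_with_temperature(logits: dict[str, float], temp: float) -> dict[str, float]:
    """Turn raw model scores (logits) into sampling probabilities.

    Lower temperature sharpens the distribution; higher temperature
    flattens it, giving unlikely tokens more chances to be sampled.
    """
    exps = {tok: math.exp(score / temp) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical scores for the "national dish" question.
logits = {"Chicken Rice": 2.0, "Nasi Lemak": 1.6, "Mala Xiang Guo": 0.2}

print(softmax_with_temperature(logits, temp=0.5))
# -> roughly 68% / 30% / 2%: Mala Xiang Guo is rare but still possible
print(softmax_with_temperature(logits, temp=2.0))
# -> roughly 45% / 37% / 18%: the "wrong" answer gets sampled far more often
```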
Hallucination is not a bug that can be fully patched away. It is a direct consequence of how transformer models work.
Because these models are probabilistic by design:
- Errors can be reduced, but never fully eliminated
- More compute improves performance, but does not change the underlying mechanism
- As long as outputs are generated probabilistically, there is always a non-zero risk of hallucination
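A back-of-envelope calculation makes that non-zero risk concrete. Assuming, purely for illustration, that each token is wrong independently with probability p:

```latex
% Assume each token is wrong independently with probability p (illustrative).
P(\text{at least one error in an } n\text{-token output}) = 1 - (1 - p)^n
% Example: p = 0.001, n = 1000  =>  1 - 0.999^{1000} \approx 63\%
```

Real token errors are not independent, so treat this as intuition rather than a precise model, but the direction holds: the longer the output, the greater the exposure to hallucination.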
This is why, more than three years after the release of ChatGPT (then powered by GPT-3.5) in November 2022, hallucinations are still present despite models becoming far larger than when the AI hype cycle first started.
The core limitation is that today’s models are static:
- They do not truly learn from their mistakes
- They do not update their internal understanding after being corrected
- They do not self-reflect in the way humans do
Adding tools, such as calculators or search engines, does not fundamentally solve this issue. If the model reasons incorrectly about when or how to use a tool, the result is still garbage in, garbage out.
As a tech engineer working daily with AI systems at one of the world’s largest tech companies, I believe AGI cannot be achieved by relying purely on next-token prediction or the current Transformer architecture.
This view is shared by several key figures in the AI field.
Jerry Tworek, a former OpenAI architect behind the o1/o3 reasoning models and Codex, stated in this video:
“I don’t think a static model can ever be AGI. Continual learning is necessary.”
Andrej Karpathy, a founding member of OpenAI and former head of Tesla AI, has similarly mentioned in this video:
“They (Transformer models) don’t have continual learning. You can’t just tell them something and they’ll remember it. They’re just cognitively lacking, and it’s just not working. I just think that it will take about a decade to work through all of those issues.”
Both arrived at the same conclusion: Static transformer models are not sufficient, and continual learning is essential.
In terms of timelines, Demis Hassabis suggested in a recent interview that we are at least 5 to 10 years away from AGI, explicitly noting that today’s transformer-based LLMs have clear limits.
My current view of today’s AI models can be summarised as follows:
- They are fundamentally based on probabilistic next-token prediction, not true reasoning
- Hallucinations will persist even as more compute is applied
- AGI cannot be achieved using transformer-based models alone; continual learning is required for a breakthrough
- We are likely still 5 to 10 years away from AGI
These constraints define the ceiling of what today’s AI architectures can realistically achieve, and they are central to why I am positioning my bets in a very specific direction as I shall describe in the next few sections.
What kinds of companies profit most from today's AI?
(This section only makes sense if you understand the architectural limits discussed earlier!)
I categorize the AI stack into six levels:
- Level Zero: Energy (GE Vernova, Cameco Corp, Constellation Energy, etc.)
- Level One: Chips (TSMC, Nvidia, AMD, ASML, Broadcom, etc.)
- Level Two: Infrastructure & Data Centre (Equinix, Arista Networks, Vertiv, Amazon, Google, Microsoft, etc.)
- Level Three: AI Foundation Model Companies (OpenAI, Anthropic, Google DeepMind, Mistral, etc.)
- Level Four: AI Software Infrastructure (Amazon Web Services, Google Cloud Services, Microsoft Azure, Palantir, Snowflake, Databricks, etc.) - Enterprise platforms enabling AI deployment, orchestration, and data pipelines
- Level Five: AI Applications, Apps and Services (Meta, Google, Microsoft, Amazon, ServiceNow, Shopify, Axon, Netflix, etc.) - Companies delivering end user value and capturing economic surplus from AI optimisation
I will be focusing on Level Five in this article because this is where economic validation happens.
You can have:
- The most advanced GPUs
- The cheapest energy
- The largest data centres
- The most powerful foundation models
None of it matters if end users do not generate ROI that justifies capex deployed upstream.
Level Five determines whether the entire AI stack earns an adequate return on capital.
Over the long term, the bulk of economic surplus accrues to the layer closest to the customer. Historically in technology cycles, infrastructure enables value creation, but applications capture pricing power.
This layer is still early.
If today’s AI is fundamentally probabilistic next-token prediction, and not true reasoning or AGI, then the key investment question becomes:
Where does today's AI architecture already create measurable economic value?
The best use cases share two characteristics:
- The cost of hallucination is low relative to ROI
- The output is verifiable
From internal observations within my own company, and conversations with peers in other application-layer firms, the most common enterprise use cases today are:
- Coding
- Marketing asset creation: copywriting, images, videos
- Internal semantic search
- Drafting reports and strategic insights
- Summarising documents and meetings
These mostly generate indirect ROI.
They increase productivity.
They keep headcount flat.
They may expand operating margins over time.
However, it is still unclear how much of the recent tech layoffs reflects AI-driven productivity gains versus normalization of pandemic over-hiring. That data will take years to confirm.
But there is one use case where AI ROI is already direct, measurable, and immediate: advertising.
Let me explain.
Ads share two structural traits with coding (a use case that has shown the most promise in enterprise):
- Low cost of failure from hallucination
- Built-in verification mechanisms
In coding, hallucinated outputs are caught through testing frameworks. Unit tests, integration tests, and runtime checks validate whether the generated code works. If it fails, it does not ship.
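As a toy illustration (both the function and the test below are hypothetical), this is the kind of automatic check that stops hallucinated code from shipping:

```python
# A hypothetical AI-generated function with a subtle hallucinated bug:
# we asked for truncation, but it rounds instead.
def ai_generated_truncate(price: float) -> float:
    """Truncate a price to 2 decimal places (what we asked the AI for)."""
    return round(price, 2)  # BUG: rounds to nearest, does not truncate

# The unit test acts as the verification mechanism.
def test_truncate():
    assert ai_generated_truncate(9.999) == 9.99  # fails: round() gives 10.0

if __name__ == "__main__":
    try:
        test_truncate()
        print("ship it")
    except AssertionError:
        print("caught by tests: do not ship")  # this is what gets printed
```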
Advertising works similarly.
An advertiser can generate five variations of an AI-created image, headline, or video and deploy them simultaneously. Performance is verified empirically through A/B testing across metrics such as:
- Click-through rate
- Conversion rate
- Return on ad spend
Poor-performing creatives are automatically filtered out by the market. Strong performers scale.
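As a sketch of how that market filter works, here is a minimal A/B-style selection loop in Python. The variant names and click numbers are invented; real ad platforms run this continuously, with proper statistical significance testing:

```python
# Minimal sketch of empirical creative selection via A/B testing.
# Variant names and numbers are hypothetical, for illustration only.

variants = {
    "ai_creative_1": {"impressions": 10_000, "clicks": 180},
    "ai_creative_2": {"impressions": 10_000, "clicks": 240},
    "ai_creative_3": {"impressions": 10_000, "clicks": 95},
    "human_creative": {"impressions": 10_000, "clicks": 205},
}

# Click-through rate: the empirical verification signal.
ctr = {name: v["clicks"] / v["impressions"] for name, v in variants.items()}

winner = max(ctr, key=ctr.get)
print(f"Scale up: {winner} (CTR {ctr[winner]:.2%})")

# Poor performers are cut; hallucinated or off-brand creatives simply
# never win the auction for more budget.
for name, rate in sorted(ctr.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rate:.2%}")
```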
Advertising is therefore a near-perfect commercial application of probabilistic AI.
And there is one company that benefits disproportionately from this architecture:
Meta.
A $175,000 AI thesis
We first need to understand why ads are such a natural fit for today's AI, and why they already demonstrate direct, not just indirect, ROI.
When a company needs to run ads, it first needs to create marketing collateral like ad copy, images, or videos. Traditionally, the cost and time investment required is high, because companies need to hire graphic artists, video editors, and copywriters to do these jobs.
Now, using AI tools like Meta's Advantage+ (think of it as Nano Banana, but integrated within Meta's ad system), advertisers can easily create large volumes of media assets at low cost and at speeds far beyond what was historically possible. Hallucination is not an issue here because:
- You can generate 10 variations and select the best
- Creative subjectivity tolerates novelty
- A human remains the final tastemaker
This significantly lowers barriers to entry for advertisers and is the first revenue-generating impact transformer-based AI models bring to ad companies like Meta or Google. In fact, Meta provided actual numbers on this revenue impact in its Q4 2025 earnings call:
Another area we’re deploying AI to improve performance is ad creative. The combined revenue run-rate of video generation tools hit $10 billion in Q4, with quarter-over-quarter growth outpacing the increase in overall ads revenue by nearly 3x
MBI also wrote an article covering an experiment in which researchers compared the effectiveness of human-created vs AI-created ad creatives. The result? GenAI-created ads (made from scratch) achieved a 19% higher CTR than human-expert ads.
Where transformer-based models really shine in ads, though, is AI recommendation systems. For example, Meta has a transformer-based AI model called GEM that is trained on thousands of GPUs. This model predicts the most relevant next ad for the right person at the right time, with low latency. When launched on Reels last year, GEM immediately increased Meta's ad conversions by 5%. More recently, from the Q4 earnings call:
In Q4, we doubled the number of GPUs we used to train our GEM model for ads ranking. We also adopted a new sequence learning model architecture, which is capable of using longer sequences of user behavior and processing much richer information about each piece of content. The GEM and sequence learning improvements together drove a 3.5% lift in ad clicks on Facebook and a more than 1% gain in conversions on Instagram in Q4. This new sequence learning architecture is significantly more efficient than our prior architectures, which should enable us to further scale up the data, complexity and compute we use in our future ranking models to deliver performance gains. As we scale up our foundational ads models like GEM, we are also developing more advanced models to use downstream of them at run-time for ads inference. In Q4, we launched a new runtime model across Instagram Feed, Stories, and Reels, resulting in a 3% increase in conversion rates in Q4.
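Meta has not published GEM's internals, but the general shape of sequence-based ad ranking can be sketched generically: score each candidate ad against the user's recent behaviour sequence, then serve the highest scorer. Everything below is illustrative, not Meta's actual system:

```python
# Generic sketch of sequence-based ad ranking (illustrative only --
# not Meta's actual GEM architecture, which is unpublished).

user_history = ["viewed:running_shoes", "liked:marathon_reel", "clicked:fitness_ad"]

candidate_ads = {
    "trail_shoes_ad":  ["running_shoes", "outdoor", "fitness"],
    "luxury_watch_ad": ["luxury", "watches"],
    "protein_bar_ad":  ["fitness", "nutrition"],
}

def score(history: list[str], ad_topics: list[str]) -> float:
    """Toy relevance score: overlap between behaviour sequence and ad topics.

    A real ranking model learns this function from billions of
    interactions; longer behaviour sequences give richer context,
    which is what the "sequence learning" upgrade quoted above targets.
    """
    events = " ".join(history)
    return sum(topic in events for topic in ad_topics) / len(ad_topics)

ranked = sorted(candidate_ads, key=lambda ad: -score(user_history, candidate_ads[ad]))
print(ranked[0])  # -> trail_shoes_ad: the most probable "next relevant ad"
```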
All of the above are use cases already returning ROI from today's models, with no requirement for AGI to arrive.
Google also benefits from transformer-based AI models because it is largely an ad business. But between Google and Meta, I favour the latter because their capital allocation differs materially.
From Alphabet’s Q4 2025 call:
Over half of ML compute in 2026 is expected to go to Google Cloud.
This means roughly half of Google’s AI capex is supporting third-party compute consumption in Google's cloud business.
Meta, by contrast, uses its AI capex almost entirely for first-party monetisation inside its Family of Apps.
That has three implications:
- ROI visibility is clearer
- Margins are structurally higher
- Capital allocation feedback loops are tighter
Additionally:
- Meta’s ad format is predominantly display and video based, which benefits directly from generative AI creative tools
- Google’s business remains heavily text-search dependent, with a far smaller video/image component
Finally, GPU compute at Meta is fungible. If AI initiatives overbuild capacity, compute can be redirected toward monetising engagement across the Family of Apps, whereas half of Google's GPU capex is tied up in Google Cloud, where the long-term ROI on AI compute is still in doubt.
Investment Implication
If we accept that:
- Transformer models are powerful but bounded
- Hallucination persists
- AGI is at least 5 to 10 years away
Then the rational investment focus is not on speculative AGI breakthroughs.
It is on companies that can:
- Deploy transformer AI today
- Measure ROI directly
- Compound incremental performance improvements
- Reinvest into scale
Advertising is the cleanest demonstration of direct AI ROI so far.
And in my view, Meta is the clearest large-scale beneficiary of transformer-era economics, which is why I currently have $175,000 deployed into it as my largest position.
In the coming days, I intend to make some significant re-allocation decisions for my portfolio, based on a few key observations from other tech companies' recent Q4 earnings calls combined with the AI thesis set out above.
I lay out my moves in the next section.
My Next Investment Steps in the coming days & Valuation buy levels:
Note: The following does not constitute financial advice/recommendation and is merely a journal of my own portfolio investment actions.
One shocking discovery I made upon studying the recent Q4 earnings call from a particular company completely changed my mind about AI capex risk. The company is...
(🔒 Unlock below with premium tier):