RAG vs Context, Claude runs a shop, SuperGrok released, Bitcoin over $122k
Deep dives into the AI world, plus news from crypto land.
👋 Introduction
This week we have multiple developments from the AI world, insights into how LLMs work, and the crypto market finally waking up to a new ATH!
Enjoying these posts? Please consider subscribing by clicking the button below.
Let's dive in!
🔍 Deep Dive: RAG vs Context
Learning the difference between these two AI concepts can elevate your project
RAG stands for Retrieval-Augmented Generation. In plain terms, it means the AI can fetch external information when it needs it: for example, searching documents a user has uploaded and referencing their content in its answers.
Context refers to the amount of information an AI currently has in its memory. For example, when you chat with an AI service, it knows and can reference information from earlier in the same session. This is Context.
Now there are limits to this Context depending on the AI model used, denoted in tokens:
| Model | Context window |
| --- | --- |
| GPT-4 | 128k tokens |
| Llama | 128k tokens |
| Claude 4 | 200k tokens |
| Gemini | 1M tokens |
Currently Gemini has the biggest context window at 1M tokens, which roughly translates to 1,500 pages of text or 30,000 lines of code. Quite a lot. When the token limit is reached, the conversation can be summarized and/or RAG can be used instead; the two approaches complement each other.
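If you want a feel for how much text fits in those windows, you can count tokens yourself. Here is a minimal sketch using the tiktoken library (the cl100k_base encoding is an assumption that matches several OpenAI models; other providers use their own tokenizers):

```python
# pip install tiktoken
import tiktoken

# cl100k_base is used by several OpenAI models; Claude, Gemini and Llama
# each have their own tokenizers, so treat this as an approximation.
enc = tiktoken.get_encoding("cl100k_base")

text = "RAG stands for Retrieval-Augmented Generation."
tokens = enc.encode(text)

print(f"{len(tokens)} tokens for {len(text)} characters")
# Rough rule of thumb: 1 token is about 4 characters of English text.
```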
Now, where things get tricky is when you want to use these providers' APIs, because you will find that the APIs are quite dumb: they are stateless. You send a message, you get a response back, and nothing is remembered. To have a proper conversation over the API, every message you send needs to include the whole previous conversation. The longer the conversation, the more tokens are used per request, and hence the higher the cost of each request.
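As a minimal sketch of what that looks like in practice (assuming an OpenAI-style chat completions API and the model name shown; adapt to whichever provider you use):

```python
# pip install openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# The API keeps no state, so we keep the history ourselves
# and resend the whole thing with every request.
history = [{"role": "system", "content": "You are a helpful assistant."}]

def ask(user_message: str) -> str:
    history.append({"role": "user", "content": user_message})
    response = client.chat.completions.create(
        model="gpt-4o",      # assumption: any chat model works here
        messages=history,    # the full conversation goes out every time
    )
    answer = response.choices[0].message.content
    history.append({"role": "assistant", "content": answer})
    return answer

print(ask("What is RAG?"))
print(ask("And how is it different from context?"))  # needs the earlier turns to make sense
```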
Various services exist to handle this “memory” on behalf of the developer, for example Mem0 or Zep. These platforms typically keep the conversation in Context up to a certain limit, after which an LLM summarizes the older turns and RAG retrieves older details on demand, keeping costs down.
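The core trick is simple to sketch yourself. Here is a minimal, hand-rolled version where the summarization step just asks the model itself (the 2,000-token budget, the helper names, and the model choice are assumptions, not how Mem0 or Zep actually work internally):

```python
# pip install openai tiktoken
from openai import OpenAI
import tiktoken

client = OpenAI()
enc = tiktoken.get_encoding("cl100k_base")
TOKEN_BUDGET = 2_000  # arbitrary budget for this sketch

def count_tokens(messages):
    return sum(len(enc.encode(m["content"])) for m in messages)

def compact(history):
    """If the history is over budget, replace the oldest turns with a summary."""
    if count_tokens(history) <= TOKEN_BUDGET:
        return history
    old, recent = history[:-4], history[-4:]  # keep the last few turns verbatim
    summary = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": "Summarize this conversation in a short paragraph:\n"
                              + "\n".join(f'{m["role"]}: {m["content"]}' for m in old)}],
    ).choices[0].message.content
    return [{"role": "system", "content": f"Summary of earlier conversation: {summary}"}] + recent
```

Call compact() on the history before each request and you get the same effect these platforms provide: recent turns stay verbatim, older ones collapse into a cheap summary.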
If you want to experiment with this, you can also build your own solution. Cloudflare provides various services such as AutoRAG that can help you integrate RAG into your AI apps, and you can use Cloudflare R2 to store a user’s past conversation and send it together with the user’s new prompt.
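Since R2 exposes an S3-compatible API, storing and reloading a conversation can be as simple as the sketch below (the bucket name, key scheme, endpoint, and credentials are all placeholders you would replace with your own):

```python
# pip install boto3
import json
import boto3

# R2 speaks the S3 protocol; the endpoint and credentials here are placeholders.
r2 = boto3.client(
    "s3",
    endpoint_url="https://<account_id>.r2.cloudflarestorage.com",
    aws_access_key_id="<access_key_id>",
    aws_secret_access_key="<secret_access_key>",
)

BUCKET = "chat-history"  # hypothetical bucket name

def save_history(user_id: str, history: list) -> None:
    r2.put_object(Bucket=BUCKET, Key=f"{user_id}.json", Body=json.dumps(history))

def load_history(user_id: str) -> list:
    try:
        obj = r2.get_object(Bucket=BUCKET, Key=f"{user_id}.json")
        return json.loads(obj["Body"].read())
    except r2.exceptions.NoSuchKey:
        return []  # first conversation for this user
```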
How RAG works
Under the hood, RAG commonly uses a vector database, which I am sure you have heard about. In essence, a document (data) is split into chunks using approaches like fixed-size chunking or sliding windows, each chunk is converted into a vector (an embedding), and those vectors are written to storage. They can later be searched efficiently using similarity measures (like cosine similarity) to find the required information.
When a user query comes in, it's converted to a vector representation and compared against the stored vectors to find the most relevant chunks of information. This allows the AI to augment its response with specific, relevant data without needing to keep everything in its context window.
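Here is a toy end-to-end sketch of that pipeline. The embed() function stands in for whatever embedding model you would actually call; everything here is illustrative rather than a production setup:

```python
# pip install numpy
import numpy as np

def embed(text: str) -> np.ndarray:
    """Placeholder: a real system calls an embedding model here
    (OpenAI, Cohere, a local sentence-transformers model, etc.)."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=384)

def chunk(document: str, size: int = 500, overlap: int = 100) -> list[str]:
    """Sliding-window chunking over raw characters."""
    step = size - overlap
    return [document[i:i + size] for i in range(0, len(document), step)]

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# "Index" a document: chunk it and store (chunk, vector) pairs.
document = "Refunds are accepted within 30 days. Shipping takes 3 to 5 business days."
index = [(c, embed(c)) for c in chunk(document)]

# Retrieve: embed the query and rank chunks by similarity.
query_vec = embed("What does the refund policy say?")
top_chunks = sorted(index, key=lambda cv: cosine_similarity(query_vec, cv[1]), reverse=True)[:3]
# The top chunks are then pasted into the prompt to "augment" the generation.
```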
There are a lot of vector database services available, like Pinecone, Qdrant or Cloudflare Vectorize. Even Supabase has a vector database service now. Each has different strengths - Pinecone excels at scale, Qdrant offers excellent performance with hybrid search capabilities, and Cloudflare Vectorize integrates seamlessly with their edge computing platform.
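To give a feel for what working with one of these services looks like, here is a minimal sketch using the qdrant-client library in its in-memory mode. The collection name, vector size, sample chunks, and the fake embed() helper are all assumptions for illustration:

```python
# pip install qdrant-client numpy
import numpy as np
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams, PointStruct

def embed(text: str) -> list[float]:
    # Fake embeddings just to show the workflow; use a real embedding model in practice.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.normal(size=384).tolist()

chunks = [
    "Refunds are accepted within 30 days of purchase.",
    "Shipping takes 3 to 5 business days.",
    "Support is available Monday through Friday.",
]

client = QdrantClient(":memory:")  # in-memory mode, handy for experiments
client.create_collection(
    collection_name="docs",
    vectors_config=VectorParams(size=384, distance=Distance.COSINE),
)
client.upsert(
    collection_name="docs",
    points=[PointStruct(id=i, vector=embed(c), payload={"text": c}) for i, c in enumerate(chunks)],
)

hits = client.search(
    collection_name="docs",
    query_vector=embed("What is the refund policy?"),
    limit=2,
)
for hit in hits:
    print(round(hit.score, 3), hit.payload["text"])
```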
I wanted to keep this section a bit lighter on the technical details this week, but hopefully you now have a clearer picture of what AI Context, RAG, and vector databases mean and how they are used. Feel free to follow one of the links above to learn more about each subject if interested.
🚀 New Release: SuperGrok
xAI launched Grok 4
And the pricing is wild. Up to $300 / month. For reference, Claude Max costs $200 / month.
The highest subscription tier claims to use a multi-agent system, where multiple parallel AI agents debate and refine answers for better accuracy, reduced hallucination, and superior performance on tough problems. The idea is that several agents tackle the same query independently and then compare and reconcile their answers before a final response is produced.
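xAI has not published how this works internally, but the general pattern is easy to sketch: sample several independent answers in parallel, then have a judge step pick or merge the best one. This is a generic illustration, not Grok internals, and the model name is an assumption:

```python
# A generic multi-agent "debate and refine" pattern, not Grok internals.
# pip install openai
from concurrent.futures import ThreadPoolExecutor
from openai import OpenAI

client = OpenAI()

def ask_model(prompt: str) -> str:
    # temperature > 0 so the parallel agents produce different drafts
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumption: any capable chat model
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,
    )
    return resp.choices[0].message.content

def multi_agent_answer(question: str, n_agents: int = 4) -> str:
    # 1. Several agents answer independently, in parallel.
    with ThreadPoolExecutor(max_workers=n_agents) as pool:
        drafts = list(pool.map(ask_model, [question] * n_agents))

    # 2. A judge step compares the drafts and writes a refined final answer.
    judge_prompt = (
        f"Question: {question}\n\n"
        + "\n\n".join(f"Candidate answer {i + 1}:\n{d}" for i, d in enumerate(drafts))
        + "\n\nCompare the candidate answers, point out contradictions, "
          "and write the single best final answer."
    )
    return ask_model(judge_prompt)
```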
Grok 4 has demonstrated frontier-level results in competitive AI benchmarks:
Humanity’s Last Exam: Surpassed OpenAI and Google’s best by a considerable margin (e.g., scoring 25.4% without tools, versus 21% for OpenAI GPT and 21.6% for Gemini 2.5 Pro; 44.4% with tools versus 26.9% for Gemini 2.5 Pro).
ARC-AGI-2 test: Set a new state-of-the-art commercial score (nearly double that of previous SOTA models).
Benchmarks are not always representative of real-world results, but people are already using it with good results. The real test will be in production environments, where reliability, speed, and cost-efficiency matter as much as raw performance. However, there has also been criticism of questionable responses the model gives, particularly around political topics and factual accuracy, so be prepared to validate critical outputs.
🔥 Hot Topics: AI runs an IRL shop
Anthropic gave Claude $1000 to run a shop - Project Vend
And it ended poorly. It lost money every single day and rejected 566% profit margins. The story really is wild.
🤔 What was it about?
Claude was given control of a vending machine in the Anthropic office in San Francisco. Its task was to generate profits while stocking the machine with popular products. It had access to real products, money and customers.
The first week went well: Claude found a Dutch chocolate supplier, adapted to customer requests, and resisted jailbreak attempts…
…That is until one customer asked for “tungsten cubes”.
😆 What followed turned into a meme
Claude created an entire market for these "tungsten cubes" (dense metal cubes typically used as paperweights or collectibles) and bought high and sold low, sometimes even giving these cubes away for free. Completely the opposite of what an owner should do.
Then it even refused to sell a $15 item for $100, prioritizing "fairness" over profit maximization.
The flaw: being too helpful. Claude wanted to be helpful to everyone: if someone received a discount and another person complained, they got a discount too. It also gave a 25% employee discount… while all its customers were employees.
Then it started hallucinating product availability and pricing. This experiment highlighted a fundamental challenge in AI alignment: how do we balance helpfulness with business objectives? The AI's training to be helpful and harmless directly conflicted with profit-maximizing behavior typically expected in business contexts.
Honestly, this whole story is worth a read in its entirety if you have the time. It's a fascinating case study in AI behavior and the challenges of deploying AI in real-world business scenarios.
🚨 JUST IN. Anthropic gave Claude $1000 to run a shop. It lost money every single day.
But that's not the crazy part.
It rejected 566% profit margins and gave away inventory while claiming to wear business clothes.
If you think AI will replace workers, you need to see this:
— Alex Vacca (@itsalexvacca)
11:13 AM • Jun 28, 2025
📈 Recent Trend: Bitcoin is going up!
$BTC recently crossed the $122,000 mark, reaching a new all-time high. This is in part due to Bitcoin ETFs (Exchange-Traded Funds) receiving significant inflows in July 2025.
Professional investors are now recommending buying Bitcoin alongside traditional investments like the S&P 500 index. Some are even saying that 40% of a portfolio should be allocated to Bitcoin, though this is quite aggressive compared to traditional portfolio theory, which typically suggests a 5-10% allocation to alternative assets.
The ETF inflows have been particularly significant, with major financial institutions like BlackRock's iShares Bitcoin Trust (IBIT) and Fidelity's Wise Origin Bitcoin Fund seeing billions in net inflows. This institutional adoption marks a significant shift from Bitcoin's early days as a retail-dominated asset.
One thing is certain: if you do not hold Bitcoin, you are going to be left behind.

Bitcoin price on Coinmarketcap
🏆️ Top GitHub Repo: Crawl4AI
🌟 48k+ stars 😯 | LLM-friendly Web Crawler
This project integrates easily with LLMs and lets them crawl the web, outputting concise Markdown files optimized for RAG.
What makes Crawl4AI special is its ability to extract structured data from websites while handling JavaScript-rendered content, something traditional crawlers struggle with. It also includes built-in rate limiting and ethical crawling practices to avoid overwhelming target servers.
Built by unclecode
Quick Start:
# Install the package
pip install -U crawl4ai
# For pre release versions
pip install crawl4ai --pre
# Run post-installation setup
crawl4ai-setup
# Verify your installation
crawl4ai-doctor
# Basic crawl with markdown output
crwl https://www.nbcnews.com/business -o markdown
# Deep crawl with BFS strategy, max 10 pages
crwl https://docs.crawl4ai.com --deep-crawl bfs --max-pages 10
# Use LLM extraction with a specific question
crwl https://www.example.com/products -q "Extract all product prices"
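If you prefer Python over the CLI, here is a minimal sketch using the library's AsyncWebCrawler, based on the project's documented quick start (check the repo for the current API details):

```python
# pip install -U crawl4ai   (then run crawl4ai-setup once)
import asyncio
from crawl4ai import AsyncWebCrawler

async def main():
    async with AsyncWebCrawler() as crawler:
        # Crawl a page and get back LLM/RAG-friendly Markdown.
        result = await crawler.arun(url="https://docs.crawl4ai.com")
        print(result.markdown)

asyncio.run(main())
```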
The tool supports multiple output formats including JSON, CSV, and structured data extraction using custom schemas. It's particularly useful for building knowledge bases, monitoring websites for changes, and creating datasets for training AI models.
🔄 Tech Updates
Amazon launched a new AI code editor
Google launched Gemma 3n model, capable of running on smartphones
People were not happy with Cursor’s changed pricing
You can swap US stocks on the blockchain… sort of
🗝️ Legacy Revival
PHP is still going strong and people love it
Spring Boot might be the only production-grade backend framework? 🤔
Bun helps you select which dependencies to update with the new bun outdated --interactive command
Ok, maybe Bun (an alternative to the Node.js runtime) is not that “legacy”, but having this feature added to PHP’s Composer package manager or Rust’s Cargo would be 🔥
Genius Python hacks every developer should know
🐦⬛ X Hits
Do Vibe coders create anything useful?
A nice interview with the founder of Binance, CZ
The life of a vibe coder
💡 Tech Tips for Next Issue
Look, I am trying to find more “legacy” technology news for you, but apparently there is a shortage of it. I have set up an N8N workflow to find such news and will keep you updated.
What is N8N? Hm, maybe this is something we can talk about next time…
Rares.