
I used to waste half my time asking the same question to ChatGPT, then Claude, then Gemini, switching tabs to compare, then second-guessing whichever answer sounded most confident.
So I built Dalal — one prompt, six open-weight LLMs answering in parallel on Cloudflare's edge, with a synthesizer pointing out where they agreed and where one probably hallucinated. The cross-check that should have always existed.
— Marcos · Recife, Brasil · @marquinhos1904
The big labs are burning $500M+/quarter on training. They subsidize current pricing to grab market share. Once they consolidate, the bill comes due.
What I do: I'm pricing my SaaS now assuming inference costs 3x today's. If you build assuming today's prices, you're cooked.
Throwing your docs into a vector DB and calling it "grounded" is how 80% of AI products break in production. The retrieval layer matters more than the model.
What I do: Layered context — structured prompt skeleton + curated examples + verified primary sources, then the LLM. Not the other way around.
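A minimal sketch of that ordering. All names here are illustrative, not the actual pipeline: skeleton first, then curated examples, then verified sources, and only then the model sees it.

```typescript
// Sketch: assemble context in layers before any LLM call.
// Every name below is a stand-in, not a real API.
type Source = { title: string; excerpt: string };

function buildPrompt(
  skeleton: string,   // structured prompt skeleton
  examples: string[], // curated few-shot examples
  sources: Source[],  // verified primary sources
  question: string,
): string {
  const exampleBlock = examples
    .map((e, i) => `Example ${i + 1}:\n${e}`)
    .join("\n\n");
  const sourceBlock = sources
    .map((s) => `[${s.title}]\n${s.excerpt}`)
    .join("\n\n");
  // The model reads top to bottom: rules, then examples,
  // then evidence, then the question. Never the reverse.
  return [
    skeleton,
    "## Examples",
    exampleBlock,
    "## Verified sources (answer ONLY from these)",
    sourceBlock,
    "## Question",
    question,
  ].join("\n\n");
}
```

The point of the ordering: if the sources come last-before-the-question, the model can't bury them under its own boilerplate.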
Workers, D1, R2, Durable Objects, Pages, Queues, Workers AI — you can ship a real SaaS for $0/mo idle. I run 60 sites + 2 SaaS products on it. Zero servers.
What I do: Default to CF for everything. Only leave when there's a real reason. Most "we need AWS" decisions don't survive 5 minutes of scrutiny.
Anyone can list features and pricing. The valuable signal is: which tool did the operator actually keep paying for after 6 months?
What I do: I publish the live list of what I pay for monthly. When I cancel something, I write why.
Real agentic workflows need structured handoffs, retry logic, observability, and a way to not nuke your budget when one step loops. Most "agent frameworks" skip 3 of those 4.
What I do: Boring deterministic glue first. AI calls only where they earn their keep. Loud failure better than silent hallucination.
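The glue I mean is nothing exotic. A sketch with illustrative names: bounded retries with deterministic backoff, a hard call budget shared across the workflow, and a loud throw instead of a silent fallback.

```typescript
// Sketch of "boring deterministic glue" around an AI step.
// Names are stand-ins, not a real framework.
class BudgetExceededError extends Error {}

async function withRetry<T>(
  step: () => Promise<T>,
  opts: { maxAttempts: number; budget: { callsLeft: number } },
): Promise<T> {
  let lastErr: unknown;
  for (let attempt = 1; attempt <= opts.maxAttempts; attempt++) {
    if (opts.budget.callsLeft <= 0) {
      // Hard stop: one looping step never drains the whole budget.
      throw new BudgetExceededError("workflow budget exhausted");
    }
    opts.budget.callsLeft--;
    try {
      return await step();
    } catch (err) {
      lastErr = err;
      // Deterministic backoff: 100ms, 200ms, 400ms...
      await new Promise((r) => setTimeout(r, 100 * 2 ** (attempt - 1)));
    }
  }
  // Loud failure: surface the last error, never invent a result.
  throw lastErr;
}
```

The budget object is shared by reference across every step in the workflow, which is the whole trick: the cap is global, not per-step.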
Generating 1000 articles with GPT and praying Google ranks them is a 2023 strategy that died with Google's March 2024 core update, which folded the Helpful Content system into core ranking.
What I do: Process matters more than volume. Information gain per article. Specificity that doesn't generalize. Editorial voice the model can't fake.
Notion + Linear + Slack + Figma + 8 others = $400/mo and still confusion. Pick the 3 that match your actual workflow, kill the rest.
What I do: Cursor (or Claude Code) + Linear + 1 voice channel. Everything else is a distraction tax.
Best-in-benchmark tools die in production because the friction doesn't fit the workflow. The boring 80% solution that fits your habits beats the 100% solution you avoid.
What I do: Test for 1 week minimum, real work, real stakes. If I'm avoiding it by day 3, it's out.
Ask one question. 6 open-weight LLMs answer in parallel on Cloudflare's edge — Llama 70B, DeepSeek R1, Qwen 2.5, Gemma 3, Mistral, Llama 8B. A judge model summarizes the consensus and scores agreement. The cross-check before any decision you can't undo.
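Under the hood it's a plain fan-out. A sketch of the pattern, where the model names and callModel are stand-ins, not the actual Workers AI bindings: query everything in parallel, tolerate failures, then score agreement naively.

```typescript
// Sketch: N models in parallel, failures tolerated, toy agreement score.
// callModel is a stand-in for a real inference call.
type Answer = { model: string; text: string };

async function fanOut(
  question: string,
  models: string[],
  callModel: (model: string, q: string) => Promise<string>,
): Promise<Answer[]> {
  // allSettled: one slow or failing model never blocks the rest.
  const settled = await Promise.allSettled(
    models.map(async (m) => ({ model: m, text: await callModel(m, question) })),
  );
  return settled
    .filter((s): s is PromiseFulfilledResult<Answer> => s.status === "fulfilled")
    .map((s) => s.value);
}

// Toy agreement score: share of answers matching the most common one.
function agreement(answers: Answer[]): number {
  const counts = new Map<string, number>();
  for (const a of answers) counts.set(a.text, (counts.get(a.text) ?? 0) + 1);
  const top = Math.max(0, ...counts.values());
  return answers.length ? top / answers.length : 0;
}
```

A real judge model replaces the toy string-match score, but the shape is the same: fan out, collect survivors, measure consensus.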
For decisions one snapshot can't crack. 4 agents with opposing roles — Pessimist, Optimist, Engineer, Strategist — debate your dilemma across 3 rounds, rebut each other live, and a synthesizer closes the verdict. You watch it stream. The boardroom you don't have.
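The round structure is simple to sketch. Here speak is a stand-in for a real model call; the key detail is that every agent gets the full transcript so far, which is what makes the rebuttals live rather than four parallel monologues.

```typescript
// Sketch: role-based debate over fixed rounds. speak() is a stand-in.
type Turn = { role: string; text: string };

async function debate(
  dilemma: string,
  roles: string[],
  rounds: number,
  speak: (role: string, transcript: Turn[]) => Promise<string>,
): Promise<Turn[]> {
  const transcript: Turn[] = [{ role: "moderator", text: dilemma }];
  for (let round = 1; round <= rounds; round++) {
    for (const role of roles) {
      // Each agent sees everything said before its turn.
      const text = await speak(role, transcript);
      transcript.push({ role, text });
    }
  }
  return transcript; // a synthesizer would read this and close the verdict
}
```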
2-3x per week, my operator-voice take on what shifted in AI infra and tooling. No "10 best tools" listicles. Specific incidents, specific numbers, what to do about each. The narration alongside the machine.