Software Is Eating the World (But Actually This Time)
Coding was just the first workload, and almost everyone is underestimating how much inference demand will grow.
Topic Guide · 10 essays
Inference, agents, routing, evaluation, and model tooling.
These essays are about the machinery underneath modern AI products: inference constraints, orchestration layers, model tooling, and the developer workflows that emerge around them.
If you care more about how systems actually behave than about benchmark theater, start here. The through line is that once models are useful, the hard problems move into routing, evaluation, interfaces, and operations.
A short note on distilling Qwen2.5-Coder-7B for numeric vulnerability triage, where a narrow specialist slightly outperformed GPT-5.2 on a real benchmark.
A technical deep dive into inference optimizations for background coding agents, from semantic caching to speculative execution and smarter scheduling.
AI code generation is flooding repos with changes. The next bottleneck in software is clearing, sequencing, testing, and deploying them safely.
AI models, reasoning, tool use, and real-world data.
Building basketball shot-classification models taught me a classic ML lesson: sometimes human operations beat a more complicated AI system.
An exploration of the role vector databases play in managing multimodal data, including video and voice content.
How Attentive leveraged LLMs to rapidly prototype a product-recommendations model, cutting development time from months to a week.
How Attentive integrated LLMs into traditional ML workflows, focusing on data cleaning and organization.
This post explores an ML recommendation engine built around a self-serve portal, where companies match their data with suitable ML…