Siddharth Ramakrishnan

Teaching an LLM to Read a Profiler

October 26, 2024

What happens when you hand an AI real performance numbers and ask it to do something useful with them?

Didn't I have something better to do?

LLMs already reason through chain of thought using text. Models produce intermediate "thoughts" and reuse those as inputs to a better final output. But text‑only loops miss what actually counts: measurements that come from outside the language model. When the model can act (e.g. compile code, run a benchmark, read a sensor) and then fold those results back into its next thought, “chain of thought” turns into chain of action → measurement → reflection.

That opens the door to longer arcs: ship a patch, wait for production metrics, come back with a smarter patch; spin up a lab experiment, watch the overnight results, redesign the protocol. My 1BRC sandbox is deliberately small, but the mechanic is the same: a feedback loop tight enough for the model to see the world push back and adjust accordingly.

Tooling reality check

I think everyone's least favorite part about writing code is environments and tooling... I started in C++ because the goal was to be fast. Then I remembered that macOS likes Clang and considers GCC optional. Instruments shows pretty graphs but won’t cough up plain text. I switched to Python + Scalene for one reason: the profiler prints something an LLM can read.
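For concreteness, the capture step is just a subprocess call; Scalene's `--cli` flag asks for plain-text output instead of the browser view. The script name below is a placeholder, and the exact flags are worth checking against `scalene --help` for your installed version.

```python
import subprocess

# Capture Scalene's plain-text report so it can be pasted straight into a prompt.
# "onebrc.py" is a placeholder script name; --cli forces text output.
report = subprocess.run(
    ["scalene", "--cli", "onebrc.py"],
    capture_output=True,
    text=True,
).stdout
print(report)
```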

The control loop

  1. Start with seed code
  2. Run the code
  3. Check if it passes tests:
    1. If no, collect traceback
    2. If yes, collect the Scalene report
  4. Send the collected information to Claude to suggest a minimal edit
  5. Use the suggested edit to update the seed code, and repeat the process

Each cycle:

  1. Run on a 1 M‑row sample (baseline 0.72 s).
  2. Save either the error trace or the profiler table.
  3. Ask the model for the smallest possible change.
  4. Keep the new champion if it’s faster and still correct (the whole loop is sketched in code below).
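Here’s a minimal sketch of that loop, under a few assumptions: the sample file, script names, and model id are placeholders; correctness is approximated by exit code (a real harness would also diff the output against a reference); and the Scalene call uses the `--cli` text-output flag.

```python
import subprocess
import time
from pathlib import Path

import anthropic  # pip install anthropic; reads ANTHROPIC_API_KEY from the env

SAMPLE = "sample_1m.csv"            # placeholder: the 1 M-row sample
MODEL = "claude-3-5-sonnet-latest"  # placeholder model id
client = anthropic.Anthropic()

def run_and_measure(path: str) -> tuple[bool, float, str]:
    """Run one candidate. Returns (passed, seconds, feedback).

    Feedback is the traceback on failure, or Scalene's plain-text report
    on success. "Passed" here is just exit code 0; a real harness would
    also compare the program's output against a known-good reference.
    """
    start = time.perf_counter()
    run = subprocess.run(["python", path, SAMPLE], capture_output=True, text=True)
    elapsed = time.perf_counter() - start
    if run.returncode != 0:
        return False, elapsed, run.stderr
    prof = subprocess.run(["scalene", "--cli", path, SAMPLE],
                          capture_output=True, text=True)
    return True, elapsed, prof.stdout

def smallest_edit(source: str, feedback: str) -> str:
    """Ask Claude for the smallest useful change and return the full new script."""
    msg = client.messages.create(
        model=MODEL,
        max_tokens=4000,
        messages=[{
            "role": "user",
            "content": (
                "Suggest the smallest edit to this Python script that makes it "
                "faster (or fixes the error), then return the complete updated "
                f"script.\n\n--- script ---\n{source}\n\n--- feedback ---\n{feedback}"
            ),
        }],
    )
    return msg.content[0].text

champion = Path("seed.py").read_text()
_, best, feedback = run_and_measure("seed.py")

for i in range(10):
    candidate = smallest_edit(champion, feedback)
    Path("candidate.py").write_text(candidate)
    passed, elapsed, feedback = run_and_measure("candidate.py")
    # Keep the new champion only if it still passes and is actually faster.
    if passed and elapsed < best:
        champion, best = candidate, elapsed
        Path("champion.py").write_text(champion)
        print(f"iter {i}: new champion at {elapsed:.2f}s")
```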

Progress (minus the confetti)

| Iter | Time (s) | Model’s headline tweak |
| --- | --- | --- |
| 0 | 0.72 | Baseline |
| 2 | 0.42 | Use mmap instead of readline() |
| 5 | 0.32 | Slice bytes directly; skip split(',') |
| 7 | 0.22 | Reuse a single buffer |
| 9 | 0.13 | multiprocessing (2 cores) |
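To give a flavor of what those tweaks look like, here is the general shape of the mmap + byte-slicing idea from iterations 2 and 5. This is my reconstruction, not the model’s actual edit, and it assumes one comma-separated station,temperature pair per line in the sample file.

```python
import mmap
from collections import defaultdict

def mean_per_station(path="sample_1m.csv"):
    """Scan an mmap'd file and average temperatures per station.

    Sketch of the mmap + byte-slicing style, assuming one
    "station,temperature" pair per line; not the model's exact edit.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        start, size = 0, len(mm)
        while start < size:
            end = mm.find(b"\n", start)
            if end == -1:
                end = size
            comma = mm.find(b",", start, end)
            station = mm[start:comma]            # slice bytes; no split(',')
            totals[station] += float(mm[comma + 1:end])
            counts[station] += 1
            start = end + 1
    return {s: totals[s] / counts[s] for s in totals}
```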

Things that needed guardrails

Reinforcement learning?

Right now it’s more like evolutionary search: generate variant, test, keep the fittest. A proper RL setup would treat runtime and memory as a reward, and adjust the policy weights. The ingredients are here; the recipe isn’t finished.
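If I do get around to the RL version, the reward would be something like a negative cost over runtime and memory; the weights below are entirely made up.

```python
def reward(runtime_s: float, peak_mem_mb: float, passed: bool) -> float:
    """Toy reward: lower runtime and memory are better; broken code is punished.

    The -10.0 penalty and the 0.01 memory weight are placeholders; a real
    setup would normalize both terms against the current champion before
    letting them drive any policy update.
    """
    if not passed:
        return -10.0
    return -(runtime_s + 0.01 * peak_mem_mb)
```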

Where to push next

Takeaway

Give a model real numbers and it stops hallucinating improvements and starts earning them. Today that saved a few hundred milliseconds. Tomorrow it could shave weeks off a simulation schedule. Not bad for a few lines of glue code and a profiler.

Aspirationally, I hope we see LLMs get real feedback as much as possible. Executing code and connecting to real-world sensors are the holy grail: grounding that can anchor new models and lead to much better chains of thought.