Reach for a purpose-built model before a general LLM
Building OCR tools for developers
The reflex in 2026 is to throw every task at the biggest available model, but for a narrow, repetitive job a specialized model often wins on both cost and accuracy. OCRskill turns documents into structured data more cheaply and, in head-to-head testing, more accurately than a frontier LLM, because reading a scanned form is exactly the bounded problem a dedicated OCR pipeline was built for. Before you assume the largest model is the answer, benchmark a purpose-built tool on your actual inputs: with a general model you pay, on every call, for capabilities your task never uses. The lesson generalizes well beyond OCR. When the job is narrow and high-volume, the specialist usually beats the generalist on the only two numbers that matter: price per call and error rate.
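A benchmark on your own inputs can be very small. The sketch below is one way to do it, assuming you have a handful of documents with known-correct answers. The names `benchmark`, `tool_a`, and `tool_b`, the sample data, and the per-call prices are all hypothetical placeholders, not OCRskill's code or measured figures.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class BenchmarkResult:
    name: str
    error_rate: float     # fraction of samples extracted incorrectly
    cost_per_call: float  # what you pay per request, in your own currency


def benchmark(
    name: str,
    extract: Callable[[str], str],
    cost_per_call: float,
    samples: Sequence[Tuple[str, str]],
) -> BenchmarkResult:
    """Run one extractor over labeled (document, expected) pairs."""
    if not samples:
        raise ValueError("need at least one labeled sample")
    errors = sum(1 for document, expected in samples if extract(document) != expected)
    return BenchmarkResult(name, errors / len(samples), cost_per_call)


# Contrived stand-ins for two real extraction calls (assumption: in practice
# each would wrap an API request to the tool under test).
def tool_a(document: str) -> str:
    return document.split(":", 1)[1].strip()


def tool_b(document: str) -> str:
    return document.split(":", 1)[1]  # deliberately forgets to trim whitespace


# Replace with your actual inputs and hand-checked answers.
samples = [("total: 10", "10"), ("total:20", "20"), ("total: 30", "30"), ("total:40", "40")]

results = [
    benchmark("tool_a", tool_a, 0.001, samples),  # placeholder price
    benchmark("tool_b", tool_b, 0.01, samples),   # placeholder price
]
for r in results:
    print(f"{r.name}: error rate {r.error_rate:.0%}, cost per call {r.cost_per_call}")
```

The harness matters more than the stubs: swap in the real calls and real prices, and both numbers come from your own documents and not from anyone's marketing page.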
Related advice
For a developer tool, the integration is the product
OCRskill's customers are developers, and what they buy is not the OCR model but never having to write brittle string parsing again. The API takes a schema and returns a typed object that matches it. For a developer tool, the quality of the integration (how few steps it takes to go from request to usable data) is the actual product, and the underlying technology is just how you deliver it. Spend disproportionate effort on the API surface, the docs, and the first five minutes. That is the part your buyer experiences and the part they tell other developers about.
Let the schema drive extraction, not the prompt
When you extract structured data with a model, do not prompt-and-pray and then parse the prose. Define the output schema first and let a framework like PydanticAI run the model in a loop, feeding validation errors back in until the result conforms. OCRskill's API takes the fields you ask for and returns a typed object that already matches your schema, so the caller never writes brittle string parsing. The pattern turns an unreliable text generator into a dependable function: the schema is the contract, the loop is the enforcement, and the caller gets data instead of a paragraph. Any time you are tempted to regex an LLM's answer, reach for structured output with a validation loop instead.
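The validation loop itself fits in a few lines. What follows is a minimal, standard-library-only sketch of the pattern that a framework like PydanticAI automates. It is not PydanticAI's API and not OCRskill's implementation. The `Invoice` schema, the `extract` and `validate` helpers, and `fake_model` (a stand-in for a real model call) are all hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Invoice:
    # Hypothetical schema, for illustration only.
    vendor: str
    total: float


def validate(raw: str) -> Invoice:
    """Parse the model's raw reply; raise ValueError if it breaks the schema."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("vendor"), str):
        raise ValueError("field 'vendor' must be a string")
    total = data.get("total")
    if isinstance(total, bool) or not isinstance(total, (int, float)):
        raise ValueError("field 'total' must be a number")
    return Invoice(vendor=data["vendor"], total=float(total))


def extract(document: str, model: Callable[[str], str], max_attempts: int = 3) -> Invoice:
    """Call the model until its reply conforms, feeding each error back in."""
    prompt = f"Return JSON with keys vendor (string) and total (number).\n\n{document}"
    for _ in range(max_attempts):
        reply = model(prompt)
        try:
            return validate(reply)
        except ValueError as err:
            # The validation error becomes part of the next prompt.
            prompt += f"\n\nYour last reply was rejected: {err}. Reply with corrected JSON only."
    raise RuntimeError(f"no schema-conforming reply after {max_attempts} attempts")


def fake_model(prompt: str) -> str:
    # Stand-in for a real model: wrong type first, correct once it sees feedback.
    if "rejected" in prompt:
        return '{"vendor": "Acme", "total": 120.5}'
    return '{"vendor": "Acme", "total": "120.50 EUR"}'


invoice = extract("ACME invoice, total 120.50 EUR", fake_model)
print(invoice)
```

The caller only ever sees an `Invoice` or an exception, never a paragraph of prose. The retry cap is a design choice: without it, a model that never conforms would loop, and pay, indefinitely.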
The boring, narrow niche is the moat
OCRskill extracts structured data from procurement documents. It is not glamorous, and that is exactly why it is defensible. A narrow, unsexy, high-volume job is one most founders skip while looking for something more exciting, which leaves the field open to whoever is willing to go deep on it. Mihai out-specializes general LLMs on document extraction precisely because he only does that one thing. Boring problems with real, repeated demand are where a solo builder can build something a frontier lab will never bother to beat.
Own your compute to keep a high-volume API profitable
OCRskill runs on owned bare-metal hardware instead of rented per-call GPUs, and that is the reason the margins hold at volume. Per-call cloud inference is convenient at the start and brutal at scale: your cost grows in lockstep with every request, forever. For a product whose whole job is to be cheap per call and run constantly, owning the compute converts an unbounded variable cost into a fixed one. If your product's economics depend on doing one expensive operation millions of times, model the bare-metal version before you assume the cloud is cheaper.
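Modeling the bare-metal version can start as simple break-even arithmetic. The sketch below assumes a flat per-call cloud price and a fixed monthly cost for owned hardware, plus an optional small marginal cost per call for power and bandwidth. All figures are placeholders chosen for illustration, not OCRskill's numbers, and the function names are hypothetical.

```python
import math
from typing import Optional


def cloud_monthly_cost(calls: int, cost_per_call: float) -> float:
    """Rented inference: cost scales linearly with every request."""
    return calls * cost_per_call


def owned_monthly_cost(calls: int, fixed_monthly: float, marginal_per_call: float = 0.0) -> float:
    """Owned hardware: a fixed monthly cost plus a (usually small) per-call cost."""
    return fixed_monthly + calls * marginal_per_call


def break_even_calls(
    cost_per_call: float, fixed_monthly: float, marginal_per_call: float = 0.0
) -> Optional[int]:
    """Smallest monthly call count at which owning is cheaper than renting.

    Returns None if owning never wins (per-call saving is zero or negative).
    """
    saving = cost_per_call - marginal_per_call
    if saving <= 0:
        return None
    return math.ceil(fixed_monthly / saving)


# Placeholder inputs: substitute your own quotes and amortized hardware cost.
CLOUD_COST_PER_CALL = 0.003
OWNED_FIXED_MONTHLY = 1000.0

threshold = break_even_calls(CLOUD_COST_PER_CALL, OWNED_FIXED_MONTHLY)
print(f"Owning wins above {threshold} calls per month")
for calls in (100_000, 1_000_000, 10_000_000):
    cloud = cloud_monthly_cost(calls, CLOUD_COST_PER_CALL)
    owned = owned_monthly_cost(calls, OWNED_FIXED_MONTHLY)
    print(f"{calls:>10} calls: cloud {cloud:.2f}, owned {owned:.2f}")
```

Below the threshold the cloud is cheaper. Above it, every extra call widens the gap in favor of owned hardware. A fuller model would also count your own operations time and the cost of adding machines as volume grows.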
Extracted from
Indie TM #10: Four Builds and the Reach Problem