In Gemini at least, if you look at how they process PDFs, they run OCR and then feed the text + image to the model, without charging you for the text tokens (I believe).
So my guess is that Claude’s backend is doing the same. If so, this hack is probably more of a loophole in token accounting, and it might get closed.
I tried the same thing last year (with OpenAI models). Back then it did reduce prompt tokens, but you needed way more completion tokens, so it ended up more expensive (and slower):
https://pagewatch.ai/blog/post/llm-text-as-image-tokens/
Seems really dumb, and like it would need to violate basic information theory to work?
Input tokens are cheaper than output tokens. Seems like it would maybe reduce input tokens at the expense of many more output tokens, if you're actually triggering OCR via thinking?
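To put rough numbers on that tradeoff, here's a quick sketch. The prices and token counts are made up for illustration (not any provider's real rates), and `request_cost` and `break_even_extra_output` are just hypothetical helper names:

```python
# All numbers below are hypothetical, chosen only to illustrate the tradeoff.
INPUT_PRICE_PER_MTOK = 3.0    # assumed $ per million input tokens
OUTPUT_PRICE_PER_MTOK = 15.0  # assumed $ per million output tokens


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of one request under the assumed prices."""
    return (
        input_tokens * INPUT_PRICE_PER_MTOK
        + output_tokens * OUTPUT_PRICE_PER_MTOK
    ) / 1_000_000


def break_even_extra_output(saved_input_tokens: int) -> float:
    """Extra output tokens that exactly cancel a given input-token saving."""
    return saved_input_tokens * INPUT_PRICE_PER_MTOK / OUTPUT_PRICE_PER_MTOK


# Assumed scenario: sending the text as an image cuts input from 10k to 4k
# tokens, but the model spends 2.5k extra output tokens reading it back.
text_cost = request_cost(input_tokens=10_000, output_tokens=500)
image_cost = request_cost(input_tokens=4_000, output_tokens=3_000)

print(f"plain text:    ${text_cost:.4f}")
print(f"text as image: ${image_cost:.4f}")
print(f"break-even extra output tokens: {break_even_extra_output(6_000):.0f}")
```

With a 5x output/input price ratio like the one assumed here, every 5 input tokens saved are wiped out by 1 extra output token, so the trick only pays off if the model doesn't have to "think" its way through the image.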
Ahhh my eyes the vibe coded readme
This seems like a pricing hack that burns resources. When the loophole gets closed, won't the price of OCR have to rise?
Related: https://blog.can.ac/2026/06/10/snapcompact/
there's also a DeepSeek whitepaper on this technique https://www.seangoedecke.com/text-tokens-as-image-tokens
I want to see more text-free foundation models
That is hilarious and an amazing find.