The post title about it being "pixel-faithful" is a bit strange. I don't see that claim in the repo, and they don't seem to even claim full feature support at the moment. And of the features marked as supported for .pptx files, at least slide image backgrounds and bullet point images don't seem to actually work, and some text objects have inverted text colors. It seems quite far from pixel-faithful, in fact.
Very nice, the rendered demos for all the file types appear to render flawlessly and load instantly on page load, and looking in DevTools, the parsers are split into separate Wasm bundles for each file type (xlsx, docx and pptx):
docx: 458 kB raw, 217 kB gzipped
pptx: 574 kB raw, 253 kB gzipped
xlsx: 601 kB raw, 269 kB gzipped
I expected the Wasm bundles to be a lot bigger than that for some reason.
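For a rough sense of scale, here is a small sketch that only does arithmetic on the sizes quoted above (no other data is assumed). It shows each gzipped bundle at a bit under half of its raw size, and all three together at 1633 kB raw versus 739 kB gzipped.

```python
# Bundle sizes in kB as quoted in the comment above: (raw, gzipped).
bundles = {
    "docx": (458, 217),
    "pptx": (574, 253),
    "xlsx": (601, 269),
}

# Per-format compression ratio.
for name, (raw, gz) in bundles.items():
    print(f"{name}: gzipped is {gz / raw:.0%} of raw")

# Totals if a page ever needed all three parsers at once.
total_raw = sum(raw for raw, _ in bundles.values())
total_gz = sum(gz for _, gz in bundles.values())
print(f"all three: {total_raw} kB raw, {total_gz} kB gzipped")
```

Since the bundles are split per format, a page that only previews one file type would presumably fetch only one of them.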
ChatGPT.com could benefit from using this library (or one like it) to render a preview of the file in a side panel on the right, instead of just giving me a download link to the generated/transformed docx/pptx/xlsx file.
I tried a few pptx from consulting firms available online. The rendering does not seem to actually be pixel-perfect, but all were quite readable with a good layout, which is already an impressive feat.
Interesting, because I'm building ooxml-cli right now, for editing pptx, docx, xlsx. At work I had to adapt a pptx to a corporate template and tried via an agent. It kept failing, so I started building the tool, and then the agent was able to do what I needed relatively quickly and accurately. Then I needed it to make tables and add pictures. Recently I wanted to get data from an xlsx and replace text in a presentation, etc.
So the tool is growing, and maybe this would be interesting to have as the non-LibreOffice-dependent viewer...
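For readers wondering what "replace text in a presentation" involves at the file level, here is a minimal sketch using only the Python standard library. It is not how ooxml-cli works (its internals are not described here); it only relies on the fact that a .pptx is a zip package whose slide parts live under `ppt/slides/` and keep their text in DrawingML `<a:t>` elements. The helper names, the `__REVENUE__` placeholder, and the stripped-down demo package are all made up for illustration, and the demo package is not a complete, openable presentation.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Real OOXML namespaces for DrawingML (text runs) and PresentationML (slides).
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
# Keep the conventional prefixes when ElementTree serializes the XML again.
ET.register_namespace("a", A_NS)
ET.register_namespace("p", P_NS)


def is_slide_part(name: str) -> bool:
    return name.startswith("ppt/slides/slide") and name.endswith(".xml")


def replace_slide_text(pptx_bytes: bytes, old: str, new: str) -> bytes:
    """Return a copy of the package with `old` replaced in every slide text run."""
    out_buf = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as src, \
            zipfile.ZipFile(out_buf, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if is_slide_part(item.filename):
                root = ET.fromstring(data)
                for run_text in root.iter(f"{{{A_NS}}}t"):
                    if run_text.text and old in run_text.text:
                        run_text.text = run_text.text.replace(old, new)
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            dst.writestr(item.filename, data)  # all other parts are copied unchanged
    return out_buf.getvalue()


def slide_texts(pptx_bytes: bytes) -> list:
    """Collect the text of every <a:t> run across all slide parts."""
    texts = []
    with zipfile.ZipFile(io.BytesIO(pptx_bytes)) as package:
        for name in sorted(package.namelist()):
            if is_slide_part(name):
                root = ET.fromstring(package.read(name))
                texts.extend(t.text or "" for t in root.iter(f"{{{A_NS}}}t"))
    return texts


def make_demo_package() -> bytes:
    """Hypothetical one-part package, just enough to exercise the functions above."""
    slide_xml = (
        '<p:sld xmlns:p="' + P_NS + '" xmlns:a="' + A_NS + '">'
        "<p:cSld><p:spTree><p:sp><p:txBody>"
        "<a:p><a:r><a:t>Revenue: __REVENUE__</a:t></a:r></a:p>"
        "</p:txBody></p:sp></p:spTree></p:cSld></p:sld>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as package:
        package.writestr("ppt/slides/slide1.xml", slide_xml)
    return buf.getvalue()


demo = make_demo_package()
updated = replace_slide_text(demo, "__REVENUE__", "4.2M")
print(slide_texts(updated))
```

Two caveats on this sketch: PowerPoint often splits one visible sentence across several runs, so a plain substring replace per run can miss matches, and real slides declare more namespaces than the two registered here, which ElementTree would rewrite to generic prefixes on output.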
I have a bunch of goodies at https://rcarmo.github.io/projects/go-ooxml and https://rcarmo.github.io/projects/python-office-mcp-server you might enjoy then.
Pretty cool, rendering PowerPoint files to an image is probably the only way for LLMs to make sense of them.
Does this work in Cloudflare’s workerd environment? Would be nice to have a cheap serverless render -> LLM (GLM-OCR / PaddleOCR) -> Markdown pipeline for the various MS Office formats.
This code creates a JSON intermediate representation that LLMs could probably consume. You might want to simplify it to focus on content and reduce token usage.
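As a rough illustration of that simplification, here is a sketch that walks a JSON tree and keeps only the text. The shape of `sample_ir` below is entirely hypothetical (I have not checked the viewer's actual intermediate representation); the only assumption is that visible text sits under keys named `"text"` somewhere in nested objects and arrays.

```python
import json


def extract_text(node):
    """Recursively collect non-empty strings stored under a "text" key, in document order."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "text" and isinstance(value, str):
                if value.strip():
                    found.append(value.strip())
            else:
                found.extend(extract_text(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(extract_text(item))
    return found


# Hypothetical IR: layout and style fields an LLM rarely needs, plus the text it does.
sample_ir = {
    "slides": [
        {
            "index": 1,
            "shapes": [
                {
                    "type": "title",
                    "bounds": {"x": 914400, "y": 457200, "w": 7315200, "h": 914400},
                    "runs": [{"text": "Q3 results", "font": "Calibri", "size": 44}],
                },
                {
                    "type": "body",
                    "bounds": {"x": 914400, "y": 1600200, "w": 7315200, "h": 3200400},
                    "runs": [{"text": "Revenue grew 12%", "font": "Calibri", "size": 24}],
                },
            ],
        }
    ]
}

slim = extract_text(sample_ir)
full_len = len(json.dumps(sample_ir))
slim_len = len(json.dumps(slim))
print(slim)
print(f"{full_len} characters of JSON reduced to {slim_len}")
```

Character count is only a stand-in for token count here, and flattening to a list throws away slide and shape boundaries, so a real pipeline would probably keep at least a per-slide grouping.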
Misread that as OpenOffice XML, not Office Open XML. I wish the standards had more distinct names. They are too easy to confuse.
Microsoft did that deliberately.
If someone actually got "pixel-faithful" Office documents rendering correctly, MS would be screwed. That's actually really important for a lot of companies that carry around decades-old templates that never look exactly right in LibreOffice or any other software that attempted to replicate it.
The slightest misalignment of a paragraph means a line on page 27 of 120 now moved down by 2 pixels, screwing everything else out of alignment. Yes, plenty of companies pay Microsoft 365 subscriptions because of exactly this reason; it sounds ludicrous when you think they could just pay someone to replicate the formatting in a different suite a lot less than the subscription costs, but that's not how it works...
Sadly, Microsoft 365 is not “pixel perfect” compared to Word. I often run into headaches where line numbers are different between the two and content ends up on different pages.
If Microsoft can’t get consistent rendering of Word docs between Word for Windows, Word for macOS and Office 365, I don’t like anyone else’s chances.
> office-open-xml-viewer
It's kind of sad that the first thing in the repo is a mention that no human was involved in the programming.
I'm fine with that, even as someone who hates AI.
Would the author have been able to do it otherwise? Is the particular tool choice making the result worse?
Bit-identical/pixel-faithful reproductions are easy to verify…
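The comparison step itself can be sketched in a few lines. This assumes you already have a reference render (say, exported from Office) and the viewer's render as equally sized grids of RGB tuples; how you obtain those grids is outside this sketch, and the function name and `tolerance` parameter are made up for illustration.

```python
def mismatched_pixels(reference, candidate, tolerance=0):
    """Count pixels where any channel differs by more than `tolerance`.

    Both images are lists of rows, each row a list of (r, g, b) tuples.
    """
    same_shape = len(reference) == len(candidate) and all(
        len(ref_row) == len(cand_row)
        for ref_row, cand_row in zip(reference, candidate)
    )
    if not same_shape:
        raise ValueError("images must have identical dimensions")

    count = 0
    for ref_row, cand_row in zip(reference, candidate):
        for ref_px, cand_px in zip(ref_row, cand_row):
            if any(abs(a - b) > tolerance for a, b in zip(ref_px, cand_px)):
                count += 1
    return count


# Two tiny 2x2 "renders" that differ slightly in one pixel.
white, black = (255, 255, 255), (0, 0, 0)
reference = [[white, black], [black, white]]
candidate = [[white, black], [black, (252, 255, 255)]]

exact = mismatched_pixels(reference, candidate)
lenient = mismatched_pixels(reference, candidate, tolerance=3)
print(exact, lenient)
```

With `tolerance=0` this is a bit-identical check; a small tolerance is the usual escape hatch for anti-aliasing and font rasterization differences, which is where "easy to verify" tends to get harder in practice.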
"LLMs are amazing, I'm so much more productive now"
"oh yeah? Show me what you made, you can't, nobody can, it's all just AI psychosis"
"I made a pixel perfect Office document viewer"
"well... I wish you hadn't"
“If you use LLMs, you’re not a real developer, you’re lazy.”
The best developers are lazy.
Obligatory: https://xkcd.com/378/
Would this project exist otherwise? I doubt it.
Which means it probably gets all the hallucinated assets right and any real-world documents wrong.
Still, looks pretty; if it actually has proper testing, could close the gap. Code not being the hard part is a major impediment to good software coming out of these things.
"Built entirely by Claude through iterative prompting."
Holy cow!!