If you haven't tried AI modeling pipelines in the last year, you'll be surprised.
The star of the show here is https://platform.worldlabs.ai/ (author works there, I don't) which is really good. There's also Meshy.ai (which this repo doesn't seem to use?) for non-scene stuff that's right up there in quality.
The latest VLMs have true pixel-level image grounding, which means you can ask your AI for the pixel coordinates of things, so your AI can have true (virtual) 3D perception for edits.
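As a rough sketch of what that looks like in practice: you prompt the model to answer with pixel coordinates as JSON, then parse them on your side. The reply string and labels below are made-up examples, not output from any specific model, and the prompt/response format is an assumption for illustration.

```python
import json

# Hypothetical VLM reply after prompting something like:
# "Return the pixel bounding box of the red mug as JSON:
#  {\"label\": ..., \"box\": [x1, y1, x2, y2]}"
# (the label and coordinates here are invented for illustration)
reply = '{"label": "red mug", "box": [312, 148, 410, 260]}'

def parse_box(reply_text):
    """Parse a {label, box: [x1, y1, x2, y2]} reply into a label and center point."""
    obj = json.loads(reply_text)
    x1, y1, x2, y2 = obj["box"]
    # Center of the box is a handy anchor for downstream 3D edits
    return obj["label"], ((x1 + x2) // 2, (y1 + y2) // 2)

label, center = parse_box(reply)
print(label, center)  # red mug (361, 204)
```

Once you have grounded pixel coordinates like these, you can unproject them through your camera to get 3D anchor points for scene edits.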
I'm actually surprised I don't see this stuff being used more; I think it's because most pipelines are hard-baked with the assumption that your 3D assets are files you get from an artist, not something you can imagine up in minutes in a script. The technology is moving faster than the industry can keep up with.
I remember like seventeen years ago, Microsoft had "PhotoSynth", which would make 3D environments based on a bunch of images, and seventeen-year-old-tombert thought it was one of the most amazing things to ever be done on a computer.
Doing this with just one image makes this at least an order of magnitude cooler. I will be playing with this over the weekend.
This is cool as hell.
So Blade Runner's Esper photo analysis went from ruining the suspension of disbelief to reality quicker than most magic.