MeshFormer vs MeshAnything

1,893 views

hu-po

a day ago

Comments: 8
@wolpumba4099 3 months ago
*Summary*

* *Three papers discussed:*
  * [03:36] *MeshFormer:* Generates high-quality 3D textured meshes from multiple images of an object. It uses normal maps (images showing surface direction) derived from diffusion models to recover finer detail. Requires known camera positions.
  * [04:15] *MeshAnything:* Generates artist-quality 3D meshes using a large language model (LLM) and a specialized vocabulary of "shape tokens" learned with a Vector Quantized Variational Autoencoder (VQ-VAE). Focuses on good topology: a clean arrangement of vertices and faces that makes the mesh easier to animate and texture.
  * [04:50] *JPEG-LM:* Generates images and videos by having a language model directly output the compressed file bytes (e.g., JPEG or H.264). Shows a novel way of thinking about image generation.
* *Key Concepts Explained:*
  * [07:33] *Normal Map:* An image where each pixel encodes the direction of the surface (its normal) at that point, used for realistic lighting calculations.
  * [23:56] *Signed Distance Function (SDF):* A function that gives, for any point, its distance to a surface, with the sign indicating whether the point lies inside or outside.
  * [25:21] *Marching Cubes Algorithm:* A method for extracting a mesh from an SDF (from the surface where the SDF equals zero).
  * [37:37] *Topology (Mesh):* The arrangement of vertices and faces in a mesh, crucial for quality and animation.
  * [44:38] *VQ-VAE (Vector Quantized Variational Autoencoder):* A method for learning a compressed, discrete "vocabulary" of elements. MeshAnything uses it to tokenize meshes; JPEG-LM is contrasted with this approach, since it models standard codec bytes instead of a learned vocabulary.
  * [1:00:46] *Canonical Codec Representation:* The standard compressed form of a file, such as JPEG for images, which JPEG-LM models directly.
* *Connecting the Papers:*
  * [05:17] All three explore generative AI for visual content (images, videos, 3D models).
  * [05:23] MeshFormer and MeshAnything focus on generating better 3D meshes.
  * [05:36] JPEG-LM suggests a new way of generating any kind of data: directly outputting its compressed representation.
* *Author's Opinions:*
  * [1:17:20] *MeshFormer:* Impressive, but its reliance on known camera positions limits real-world application.
  * [1:17:31] *MeshAnything:* Particularly innovative due to its focus on mesh topology and its clever combination of VQ-VAE and LLMs. Could revitalize the use of meshes in 3D pipelines.
  * [1:17:45] *JPEG-LM:* A groundbreaking proof of concept with huge potential, though current results are limited by the small model size and dataset.
* *Future Implications:*
  * [1:12:49] The presenter speculates on using LLMs to directly output standard 3D file formats like STL, much as JPEG-LM does for JPEGs.
  * [1:16:56] The presenter questions the dominance of implicit 3D representations like NeRFs, favoring improved mesh-generation techniques.

Summarized by AI model: gemini-1.5-pro-exp-0801
Cost (if I didn't use the free tier): $0.1831
Input tokens: 48033
Output tokens: 1426
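The SDF, normal-map and marching-cubes entries above fit together in a simple way: the surface is where the SDF is zero, the surface normal is the normalized gradient of the SDF, and marching cubes places mesh vertices where the SDF changes sign along grid edges. The pure-Python sketch below illustrates those three ideas on an assumed example shape (a unit sphere); the function names are hypothetical, it shows only the vertex-placement step of marching cubes rather than the full algorithm, and it is not code from MeshFormer or any other paper discussed.

```python
import math

def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance to a sphere: negative inside, zero on the surface, positive outside."""
    return math.dist(tuple(p), center) - radius

def sdf_normal(sdf, p, eps=1e-4):
    """Approximate the surface normal as the normalized SDF gradient (central differences)."""
    grad = []
    for axis in range(3):
        hi, lo = list(p), list(p)
        hi[axis] += eps
        lo[axis] -= eps
        grad.append((sdf(tuple(hi)) - sdf(tuple(lo))) / (2 * eps))
    length = math.sqrt(sum(g * g for g in grad))
    return tuple(g / length for g in grad)

def edge_crossing(sdf, a, b):
    """Marching-cubes-style vertex placement on one grid edge.

    If the SDF changes sign between corners a and b, linearly interpolate
    the point where it crosses zero; otherwise the edge has no vertex.
    """
    da, db = sdf(a), sdf(b)
    if (da < 0) == (db < 0):
        return None  # no sign change, so the surface does not cross this edge
    t = da / (da - db)
    return tuple(ai + t * (bi - ai) for ai, bi in zip(a, b))

# Usage: an edge from the sphere's center to a point outside it.
vertex = edge_crossing(sphere_sdf, (0.0, 0.0, 0.0), (2.0, 0.0, 0.0))
normal = sdf_normal(sphere_sdf, vertex)
```

For a sphere the linear interpolation lands exactly on the surface along a radial edge; for a general SDF sampled on a grid it is an approximation, which is one reason grid resolution matters for mesh quality.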
@andrzejreinke 2 months ago
Your videos are amazing, thanks for doing that.
@omidbonakdar4838 a month ago
Thank you ❤, your awesome videos help me a lot.
@johanneszellinger232 3 months ago
I have to disagree with the statement at 28:10. Sensor fusion on smartphones is already more than good enough to get the same data from real-world imagery. I used Google's ARCore in my master's thesis to gather RGB-D images with a corresponding camera extrinsic matrix as input for my models, and I was surprised by how precise the gathered data was.
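The camera extrinsic matrix mentioned in this comment is the kind of "known camera position" that MeshFormer requires: it maps world coordinates into each camera's frame so every image can be related to the same 3D space. A minimal sketch, assuming a 4x4 world-to-camera matrix [R | t] and a pinhole camera looking along +Z; the function names and numbers are hypothetical, and ARCore's own pose API and axis conventions (world-to-camera vs camera-to-world, axis directions) may differ.

```python
def world_to_camera(extrinsic, point):
    """Apply an assumed 4x4 world-to-camera extrinsic [R | t; 0 0 0 1] to a 3D world point."""
    homogeneous = (point[0], point[1], point[2], 1.0)
    return tuple(
        sum(extrinsic[row][col] * homogeneous[col] for col in range(4))
        for row in range(3)
    )

def project(cam_point, fx, fy, cx, cy):
    """Pinhole projection of a camera-frame point to pixel coordinates (u, v)."""
    x, y, z = cam_point
    if z <= 0:
        raise ValueError("point is behind the camera")
    return (fx * x / z + cx, fy * y / z + cy)

# Usage: identity rotation, world origin 5 units in front of the camera.
extrinsic = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 5.0],
    [0.0, 0.0, 0.0, 1.0],
]
cam = world_to_camera(extrinsic, (1.0, 0.0, 0.0))
pixel = project(cam, fx=500.0, fy=500.0, cx=320.0, cy=240.0)
```

Keeping the extrinsic (pose) separate from the intrinsic parameters (fx, fy, cx, cy) mirrors how the two are usually reported: the pose changes every frame as the phone moves, while the intrinsics stay fixed for a given camera.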
@francescofisica4691 2 months ago
Where can I find the code for MeshFormer? I'm asking for a university project.
@user-wr4yl7tx3w 3 months ago
Is it possible to save the video as a file rather than as a stream?
@wolpumba4099 3 months ago
Summary starts at 1:23:11