[CVPR'23 WAD] Keynote - Ashok Elluswamy, Tesla

37,066 views

WAD at CVPR

1 day ago

Comments: 31

@berlusmafia 1 year ago

Thanks to people like you and the engineers who work at Tesla, there is hope for humanity for a decent future.

@pranjal86able 1 year ago

Here are the key points from the video titled "[CVPR'23 WAD] Keynote - Ashok Elluswamy, Tesla":
- The speaker, Ashok Elluswamy, is a member of the Autopilot team at Tesla.
- He presents their work on what they believe will be the foundation model for autonomy and robotics.
- Tesla has shipped the Full Self-Driving (FSD) Beta software to all purchasers in the United States and Canada, with roughly 400,000 vehicles having driven up to 250 million miles on the FSD Beta program.
- The self-driving stack is scalable and can navigate to any destination within the US, handling intersections, stopping at traffic lights, and interacting with other objects.
- The system is driven primarily by eight cameras on the car that provide full 360-degree coverage.
- The self-driving stack is based on modern machine learning, with many components folded into neural networks. This differs from the traditional approach to self-driving, which relies on localization maps and various sensors.
- The system works primarily with cameras, and it performs quite well.
- The speaker discusses the importance of occupancy networks in their stack, which predict whether each voxel in 3D space is occupied or not. This task is general and robust to ontology errors, since it does not depend on assigning every object to a predefined class.
- The occupancy networks also predict the flow of voxels into the future, capturing arbitrary motion. Everything runs in real time.
- The architecture may look complicated, but it's quite straightforward: videos from multiple cameras stream in, and a large Transformer block builds up features and performs temporal attention, with some geometry thrown in.
- The same architecture can be used for other driving tasks, such as predicting lanes and roads.
- Lanes are crucial for driving but challenging to predict because of their high dimensionality, graph structure, and large uncertainty. They can span the entire road, fork, and merge, and sometimes even humans cannot agree on their structure.
- The team uses state-of-the-art generative modeling techniques, such as autoregressive Transformers, to predict lanes. Similar to GPT, this approach predicts lanes one token at a time while accounting for the full graph structure.
- Moving objects such as vehicles, trucks, and pedestrians must be detected along with their full kinematic state. The models are multi-modal, taking in not only camera video streams but also other inputs such as the vehicle's own kinematics and navigation instructions.
- Motion planning can also be done with a network, making the system a modern machine learning stack that works end-to-end.
- The system's success is attributed to a sophisticated auto-labeling pipeline that draws on data from the entire fleet. This enables multi-trip reconstruction: when multiple Tesla vehicles drive through the same location, their video clips and kinematic data are combined to construct the entire 3D scene.
- Because multi-trip reconstruction uses data from the entire fleet, the team can reconstruct lanes, road lines, and other elements from anywhere on Earth.
- They use a hybrid of Neural Radiance Fields (NeRF) and general 3D reconstruction, which produces accurate, clear reconstructions of the scene, including vehicles, barriers, and trucks.
- Additional neural networks run offline to produce labels for lanes, roads, and traffic lights, creating a vector representation that serves as labels for the online stack.
- The system can auto-label traffic lights, predicting their shape, color, and relevancy, and these predictions are multi-view consistent.
- These predictions provide a superhuman understanding of the world from cameras, creating a foundation model that can be used in many places.
- The system helps with both autonomous and manual driving, for example by providing emergency braking for crossing vehicles. This is a new feature, as crossing objects are harder to predict than vehicles in your own lane.
- The team is working on learning a more general world model that can represent arbitrary things, using recent advances in generative models such as Transformers and diffusion.
- The neural network can predict future video sequences given past video. It predicts jointly for all eight cameras around the car, learning depth and motion on its own without any 3D priors.
- The model can be action-conditioned. For example, given the same past context, when asked for different futures (such as keep driving straight or change lanes), the model produces different outcomes.
- This creates a neural network simulator that can simulate different futures based on different actions, representing things that are hard to describe in an explicit system.
- Future prediction can also be done in semantic segmentation space or reprojected into 3D, predicting future 3D scenes from the past context and an action prompt.
- The team is working through the many nuances of driving to build a general driving stack that can drive anywhere in the world and be human-like, fast, efficient, and safe.
- Training these models requires a lot of compute. Tesla aims to become a world leader in compute with its custom-built training hardware, Dojo, which is starting production soon.
- The models are being built not just for the car but also for the robot, with several networks shared between the two.
- The foundation models for vision that the team is building are designed to understand everything and generalize across cars and robots. They can be trained on diverse data from the fleet and require a lot of compute.
- The team is excited about the progress they expect to make in the next 12 to 18 months.
- In the Q&A session, the speaker explains that they can track moving objects in the 3D reconstruction with their hybrid NeRF approach, using various cues and signals in the data.
- The world model for future prediction is a work in progress, but it's starting to work now, providing a simulator in which they can roll out different outcomes and learn representations.
- Autoregressive models are used for lanes because of the graph structure of lanes and the need to model a distribution in a high-dimensional space. This approach yields clear, non-blurry predictions that are useful downstream.
- The voxel size of the occupancy network output is a trade-off between memory and compute and can be configured based on the needs of the application.
- The same principles of the world model should apply to humanoid robots: the model should be able to imagine what actions such as picking up a cup or walking to a door would look like.
- The occupancy network is used for collision avoidance in the Full Self-Driving (FSD) system. It's particularly useful for unusual vehicles or objects that are hard to model with other methods.
- The general world model is still being optimized and hasn't been shipped to customers yet. It might be ready later in the year.
- The system doesn't use high-definition maps, so alignment isn't super critical. The maps used are low-definition, providing just enough information to guide the network on which roads and lanes to take.

This concludes the summary of the video "[CVPR'23 WAD] Keynote - Ashok Elluswamy, Tesla". The speaker, Ashok Elluswamy, discusses the development of Tesla's self-driving technology, focusing on the use of machine learning and neural networks. He also answers questions about the technical details of the system.
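The Q&A point that voxel size is a trade-off between memory and compute can be made concrete with a little arithmetic. This is a minimal sketch, not Tesla's configuration: the 80 m × 80 m × 8 m region, the candidate voxel sizes, the 4 bytes per voxel (one float32 occupancy value), and the function name `voxel_grid_cost` are all hypothetical choices for illustration.

```python
def voxel_grid_cost(extent_m, voxel_m, bytes_per_voxel=4):
    """Return (voxel count, megabytes) for a dense cubic-voxel grid.

    extent_m:        (x, y, z) size of the region in metres (hypothetical values).
    voxel_m:         edge length of one cubic voxel in metres.
    bytes_per_voxel: storage per voxel, e.g. 4 for a single float32 occupancy value.
    """
    nx, ny, nz = (round(e / voxel_m) for e in extent_m)
    count = nx * ny * nz
    return count, count * bytes_per_voxel / 1e6


if __name__ == "__main__":
    extent = (80.0, 80.0, 8.0)  # hypothetical region around the car
    for size in (0.8, 0.4, 0.2):
        n, mb = voxel_grid_cost(extent, size)
        print(f"voxel {size} m -> {n:,} voxels, {mb:.1f} MB")
```

With these assumed numbers the script reports 100,000, 800,000, and 6,400,000 voxels (0.4, 3.2, and 25.6 MB). Halving the voxel edge multiplies both the memory and the number of voxels to process by 8, and any extra per-voxel outputs such as flow scale the same way, which is why a configurable size is a sensible design choice.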
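The summary says the occupancy network predicts both whether space is occupied and how occupied voxels will flow into the future. As a rough illustration of what that pair of outputs represents, the sketch below takes a hypothetical 2D bird's-eye grid of occupancy probabilities plus a per-cell flow vector, thresholds it, and shifts each occupied cell along its flow to get a one-step-ahead occupancy guess. The grid shape, the 0.5 threshold, the whole-cell rounding, and the name `advect_occupancy` are assumptions. The system described in the talk works in 3D and predicts these quantities with a network. This is not how that network works internally.

```python
import numpy as np


def advect_occupancy(prob, flow, threshold=0.5):
    """Shift occupied cells one time step along their flow vectors.

    prob:      (H, W) array of occupancy probabilities in [0, 1].
    flow:      (H, W, 2) array of (d_row, d_col) displacements per step,
               rounded to whole cells here for simplicity.
    threshold: probability at or above which a cell counts as occupied
               (0.5 is an arbitrary illustrative choice).
    Returns a boolean (H, W) grid of cells expected to be occupied next step;
    cells that move off the grid are dropped.
    """
    h, w = prob.shape
    step = np.rint(flow).astype(int)
    rows, cols = np.nonzero(prob >= threshold)  # currently occupied cells
    new_rows = rows + step[rows, cols, 0]
    new_cols = cols + step[rows, cols, 1]
    inside = (new_rows >= 0) & (new_rows < h) & (new_cols >= 0) & (new_cols < w)
    nxt = np.zeros((h, w), dtype=bool)
    nxt[new_rows[inside], new_cols[inside]] = True
    return nxt


if __name__ == "__main__":
    prob = np.zeros((5, 5))
    prob[2, 1] = 0.9                  # one occupied cell
    flow = np.zeros((5, 5, 2))
    flow[2, 1] = (0, 2)               # moving two cells to the right per step
    print(np.argwhere(advect_occupancy(prob, flow)))  # [[2 3]]
```

Because occupancy and flow are attached to space rather than to object classes, the same rollout works for an unusual vehicle or any other obstacle, which matches the summary's point about robustness to ontology errors.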

@pascalg.8772 1 year ago

Thanks for your work. Ashok is actually the head of the Tesla Autopilot team; he took over when Andrej Karpathy left.

@jacolantern1 1 year ago

If you’re going to use GPT to summarize the transcript, then at least say that’s what you’re doing. Neglecting to mention that is dishonest and misleading.

@carvalhoribeiro 1 month ago

Great presentation. Thanks for sharing this

@seojimjames 1 year ago

Brilliant all the way, thanks for the great, informative video. Tesla investors appreciate knowing just how great your work is. Also, consider how many drivers are visually challenged, impaired, or distracted and still driving: FSD is consistent and a much better option than borderline bad drivers.

@jaysrinivasan8205 10 months ago

Ashok is amazing

@MrMolledm 1 year ago

Really enjoyed it. Can’t wait for autonomous mass-transit buses.

@jascfdrac 1 year ago

Nice work!

@ThomasButryn 1 year ago

I think the most interesting thing is that Tesla's approach is really based on experimenting with ideas.

@TheFutureThoughtExchange 1 year ago

Keynote speech presented by Ashok Elluswamy at CVPR'23 WAD (Workshop on Autonomous Driving), where he discusses Tesla's self-driving technology, including advances in machine learning techniques, robotics, real-world implementations, and the underlying concepts of Tesla's technology. Let's break down the themes and deeper insights, including the esoteric aspects of this conversation.

### 1. **The Foundation of Autonomy and Robotics**:
- **Machine Learning at the Core**: Elluswamy emphasizes the use of machine learning in building the foundation for autonomy in vehicles. The explanations of neural networks, occupancy and voxel prediction, Transformer models, and 3D scene construction reflect cutting-edge AI and machine learning practice.
- **Generalization and Robustness**: He emphasizes the stack's robustness to errors and its ability to adapt to different situations and environments. This ties into the broader theme of machines being adaptable, like humans, and able to operate in the real world.
- **Integration with Robotics**: The technology is not limited to cars but extends to other robotic platforms, highlighting the unifying concepts within robotics and machine autonomy.

### 2. **Full Self-Driving (FSD) Technology**:
- **Real-World Implementation**: Elluswamy discusses the actual deployment of Tesla's FSD system in the U.S. and Canada. Rather than focusing only on theoretical concepts, he gives insights into real-world challenges and accomplishments.
- **Camera and Sensing Technology**: The 360-degree camera coverage and the way the cameras are used to interpret the world resemble how human senses work, reflecting an attempt to create machines that perceive the world much as humans do.
- **Lane and Object Prediction**: Real-time prediction and analysis of lanes, objects, and traffic signals is a sophisticated task that mimics human cognition. The comparison of modeling lanes to modeling language (mentioning GPT) connects two complex realms of machine learning: natural language processing and computer vision.

### 3. **Simulation and Prediction of Different Futures**:
- **Dynamic World Modeling**: There's a profound concept in creating a "neural network simulator" that can simulate different futures based on different actions. It implies a move from static to dynamic models, mirroring the human ability to predict potential outcomes.
- **Generative Models and Action-Conditioning**: The ability to condition generative models on the past and on an action to predict the future parallels human imagination and intentionality, extending AI into realms previously exclusive to human consciousness.

### 4. **Hybrid Approaches and Configurability**:
- **Hybrid Neural Radiance Field (NeRF) Approach**: The discussion of 3D reconstruction with a hybrid NeRF approach represents the synthesis of different AI techniques to create something novel and effective.
- **Customizable System**: The configurable nature of the models, allowing for different applications and needs, reflects a move toward personalized and adaptable technology.

### 5. **Tesla as a Leader in Compute**:
- **Dojo and Scalability**: The mention of Dojo, Tesla's training hardware, and the company's aspiration to become a world leader in compute emphasize the role of technology not only in driving but in broader societal change.

### 6. **Questions and Further Insights**:
- **Engaging with Complexity**: The subsequent questions and answers delve into complexities such as autoregressive models, voxel sizes, occupancy, inference times, and map components. These details allow a more nuanced understanding of the technology.

### Conclusion
This transcript is more than just a description of Tesla's autonomous driving technology. It's a glimpse into the current and future state of AI and machine learning, offering insights into how these technologies are shaping our interaction with the world. It reflects the ongoing effort to bridge the gap between machines and human-like understanding, adaptability, and intuition. Moreover, the real-world application of these complex technologies is a crucial step in the transition from theoretical research to tangible, everyday experience, contributing to the evolution of our transportation systems and potentially our broader interaction with machines.

@lala-ru1jj 2 months ago

Most of the questions are from BYD and Xpeng employees, I guess 😬

@karunald 1 year ago

I will never understand why Tesla continues to put the intersection cameras 9 feet behind the nose of the car, behind our backs. When there's no room to creep forward to see, it doesn't work! Such an obvious, massive flaw. Maybe if I lived in CA with bike lanes & road buffers it would work. FIX IT

@joeysipos 1 year ago

12:05 bro that was totally the Tesla’s fault. It’s the one that blew through the stop sign…

@galileo3431 1 year ago

That's exactly what Ashok also says. In this case, FSD wasn't enabled; the human was driving and made the error. The vision-based AEB system was still able to perform emergency braking.

@SkradaczTENZNANY 1 year ago

Yes, that's what he said. "The ego driver" means the system controlling the Tesla.

@joeysipos 1 year ago

@@galileo3431 Ah ok, yeah, now that I've rewatched it. I thought he said the red car blew through the stop sign...

@meamzcs 1 year ago

@@SkradaczTENZNANY LOL... The title of the slide literally says MANUAL DRIVING...

@gregchristie2763 1 year ago

Imagine this applied to military robots with guns, or worse. Elon now has both vehicle robots and humanoid robots, and if this AI is applied to them, they can learn by themselves with no restraints whatsoever. It really is quite scary, and it needs stopping now. Even Elon himself has said it needs legislating, and this is just the beginning. People joke about Skynet and the Terminator, but this is a real threat. Tesla also has the comms via Starlink. Very scary. 😢😢 Elon himself, in the Tucker Carlson interview available on KZbin about the dangers of hyper-intelligent AI, said this sort of AI left unchecked could result in the annihilation of the human race, or "civilisational destruction" as he called it. Just think if a madman like Putin got hold of this tech. Very, very scary.

@meamzcs 1 year ago

Lol... Elon has access to literal ICBMs...

@falconxlc 1 year ago

12:10 The Tesla missed the stop sign, but Ashok says the red car blew a stop sign??

@f2yd 1 year ago

"In this case, on the left side the *ego* driver for some reason blew past the stop sign." The ego driver is the one driving the Tesla.

@falconxlc 1 year ago

@@f2yd I stand corrected, he did say ego.

@FinanceNinja 1 year ago

@@f2yd It was a human driving the Tesla who blew the stop sign, not FSD. He was explaining how FSD saw the path of the perpendicular car and intervened to stop the Tesla from hitting it.

@f2yd 1 year ago

@@FinanceNinja I agree, that's what I was saying too. Ego driver = the human driving the car from which we see the video

@nioncao 1 year ago

Too little progress compared to AI Day.

@Jsmith32t 1 year ago

It’s pretty great progress, actually. With the rate of change in the ML world, you have to re-evaluate your approach every 3-6 months now. What they showed at AI Day is what they are shipping now, but they have already hit a wall with the rare corner cases. The world model will take advantage of their auto-labeling system and will provide clean data for the new approach. Probably 2 years from now the world model will reach a stable release, and metrics for regulators will start accumulating towards proper Level 5.

@SyntheticSpy 1 year ago

@@Jsmith32t If their amount of compute scales like they are planning, it will likely be sooner than 2 years.