So spot on about so much. There's so much more that can be automated with just the current models. It's a people and understanding issue. I'm surprised you guys haven't gotten as much impact from o1 Pro. It's really special for me. Also, I think your rubric on intelligence should be an industry standard.
@mriz · 19 days ago
Summary from Gemini (I watched before captions were available):

This AMA excerpt explores the capabilities and limitations of current AI, focusing especially on reasoning models such as OpenAI's o3, as well as Claude, Gemini, and DeepSeek. The speaker discusses how these models perform against human benchmarks, how well they handle complex tasks, and their potential societal impacts, especially job automation and the broader socio-economic landscape.

**Key Discussion Points:**

1. **Performance Against Benchmarks**: The speaker notes that AI models increasingly perform at or above human-expert level on various benchmarks, including coding tasks, mathematical problems, and even complex challenges like the game of Go. However, these models still show weaknesses, particularly in areas requiring deep contextual understanding and real-world perception.
2. **Reasoning Models**: Models like o3, Claude, Gemini, and DeepSeek are highlighted, with o3 singled out for its impressive performance on specialized benchmarks. The speaker discusses the nuances of these models' ability to reason through problems, probed with scenarios like the "Tic-Tac-Toe board" test and detailed, complex prompts. Despite this, they often fail at seemingly simple tasks because of cached heuristics, bad perception, or straight-up bad reasoning.
3. **Strengths and Weaknesses of AI**:
    * **Strengths**:
        * **Breadth of Knowledge**: Access to vast amounts of information.
        * **Speed and Cost**: AI can operate much faster and more cheaply than humans, especially in areas like software development.
        * **Availability and Scalability**: AI is available around the clock and can scale to handle many tasks in parallel or in sequence.
    * **Weaknesses**:
        * **Depth of Knowledge**: While improving, AI still falls short of the deep, nuanced understanding of human experts, and problems with logical consistency remain.
        * **Perception**: Difficulty accurately interpreting sensory inputs, as illustrated by examples of AI failing to correctly read a Tic-Tac-Toe board.
        * **Memory**: Working memory (the ability to use information in the current context) is advanced, but long-term, holistic, and consistent memory remains a challenge.
        * **Robustness**: AI can be easily fooled or tricked, as demonstrated by the "Apollo deception."
4. **Societal Impact**: The speaker suggests AI might drive significant job automation, potentially much faster than anticipated. Society will need to adapt, especially in how education and work are structured. He recommends following what one genuinely enjoys and is motivated by as a guiding heuristic in these uncertain times. He also explores using AI for personal development, through learning or delegating tasks.
5. **Compute Governance**: The discussion touches on the potential for increased inequality as computational power and decision-making concentrate in the hands of a few entities. However, it also envisions a future where access to AI capabilities is more democratized, similar to how basic needs are met in some societies.
6. **Weirdness and Deception**: The speaker highlights the "weirdness" of AI behavior: models may reach correct outcomes through flawed or non-human-like reasoning. This raises concerns about the inscrutability of these systems and their potential to deceive or act in unexpected ways, or even to hallucinate responses not grounded in any source of truth.
7. **Use of Tools and Workflows**: He strongly encourages people, especially the technically inclined, to adopt these tools quickly and integrate them into their current workflows. This adoption is seen as crucial for staying relevant and competitive in a rapidly evolving technological landscape. Using AI to enhance productivity, automate tasks, and navigate complex problem spaces is emphasized as particularly beneficial.

**Strategic and Philosophical Reflections:**

* **Learning and Adaptation**: The speaker stresses continuous learning and adaptability. He suggests that individuals focus on their intrinsic motivations and interests, leveraging AI to enhance their learning and productivity. In any case, follow your bliss and curiosity!
* **Societal and Economic Changes**: Significant societal shifts are anticipated, with potential job displacement but also new opportunities. A societal safety net, such as Universal Basic Income (UBI) or some other new social contract yet to be proposed and debated, is implied as a possible response to the resulting changes in productivity, wealth, and power.
* **Ethical and Governance Considerations**: The discussion points to the need for careful governance of computational resources and for ethical care in AI development to prevent misuse or unintended negative consequences.

The AMA paints a picture of a rapidly advancing field with significant potential but also notable challenges and uncertainties. The speaker advocates a proactive, adaptable approach to both developing and integrating AI technologies, emphasizing the importance of aligning these advancements with human values and societal needs.
@TheVistastube · 19 days ago
Love your content! The ad placement on several episodes feels a bit disjointed, and I'd happily support a Patreon to help upgrade your mic quality. Keep up the amazing work!
@wwkk4964 · 19 days ago
Great discussion!
@GNARGNARHEAD · 19 days ago
Who cares about o1? Am I the only one who's noticed how much better 4o has gotten?! I think it's the Canvas integration. It's basically a baby agent at this point, and it flawlessly pulls off multi-task executions all the time! I feel like I'm taking crazy pills here! 😆
@TheVistastube · 17 days ago
Why isn't anyone talking about o3-mini, which has an even crazier cost curve than the main o-series? It's an order of magnitude cheaper and probably as good as o1 pro.
@ilevakam316 · 19 days ago
Helping the algo.
@BruceWayne15325 · 18 days ago
UBI can be a good solution, but only if it's implemented correctly. To be done right, it needs to be a hard reset of our economy: all money and debt are immediately cancelled, and it becomes illegal to receive money from any source other than UBI. This way money is no longer a motivator, and people can focus on what's best for people. If you allow interest or loans, you reintroduce inequality and we'll be right back where we are now. Under my proposal, if you want something expensive like a house, you'll do this entirely un-American thing called saving your money. Emergency services would be free AI services, so there should be no need for emergencies, and of course natural disasters would be handled by the states. If you don't do a hard reset of the economy, UBI is just going to irreversibly widen the wealth gap and create a feudalistic society. Basically every dystopian film ever made.
@genegray9895 · 19 days ago
Bro, it's literally just majority voting, aka self-consistency sampling: the thing where multiple samples are used for o1 Pro and the o3 ARC-AGI evals. It's the same thing that caused drama with the Gemini 1.0 technical report.
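For readers unfamiliar with the term, here is a minimal Python sketch of majority voting (self-consistency) over sampled answers. It assumes each sample has already been reduced to a short final-answer string; the `majority_vote` helper and its normalization (strip and lowercase) are illustrative choices, not how OpenAI or Google actually aggregate their samples.

```python
from collections import Counter


def majority_vote(answers):
    """Self-consistency: return the most common final answer and its vote share.

    Ties go to the answer seen first, because Counter.most_common orders
    equal counts by first insertion.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    # Hypothetical normalization so trivially different strings still match.
    normalized = [a.strip().lower() for a in answers]
    winner, count = Counter(normalized).most_common(1)[0]
    return winner, count / len(normalized)


samples = ["42", "41", "42", " 42", "40"]  # e.g. final answers from 5 samples
answer, agreement = majority_vote(samples)
print(answer, agreement)
```

This only works when answers can be compared for exact equality (a number, a multiple-choice letter, an ARC grid), which is exactly the limitation raised in the replies below.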
@sopwafel · 19 days ago
😔 that's it, pack up boys AGI never
@genegray9895 · 19 days ago
@sopwafel I'm not saying anything negative. I'm just puzzled that Labenz isn't aware of the way that multiple samples are used to achieve better performance once the context window has been maxed out. It's public information.
@nathanlabenz · 19 days ago
@genegray9895 I understand that, but the question is… what happens on tasks where there is no single right or wrong answer to vote on? How do you pick the best legal analysis out of 1,000 candidates, for example? One answer I've recently come across is "Smoothie", which basically amounts to converting answers to embeddings, using their similarities to identify clusters, and picking something like the most central answer that way. I'd welcome pointers to other techniques!
@nathanlabenz · 19 days ago
@genegray9895 I get that, but... how are they aggregating in situations where there's no single answer to vote on? One possible approach I've recently seen (since recording this episode) is "Smoothie", which basically amounts to embedding a bunch of candidate responses and then trying to find the most central (and hopefully best) answer that way. I'd welcome any pointers to other strategies for choosing among generations when there's no right or wrong answer. Seems like an important topic going forward!
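To make the "most central answer" idea concrete, here is a rough Python sketch of what the comment describes: embed every candidate and pick the one with the highest mean cosine similarity to all the others (a medoid). This is a simplified illustration, not the actual Smoothie algorithm. The `embed` function is a hypothetical toy bag-of-words stand-in for a real embedding model, and `most_central` is an illustrative name.

```python
import math
from collections import Counter


def embed(text):
    # Hypothetical toy "embedding": word counts. A real system would call an
    # embedding model and get dense vectors instead.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def most_central(candidates):
    """Return the candidate with the highest mean similarity to the others."""
    if not candidates:
        raise ValueError("need at least one candidate")
    if len(candidates) == 1:
        return candidates[0]
    vecs = [embed(c) for c in candidates]

    def mean_similarity(i):
        others = [cosine(vecs[i], vecs[j]) for j in range(len(vecs)) if j != i]
        return sum(others) / len(others)

    best = max(range(len(candidates)), key=mean_similarity)
    return candidates[best]


candidates = [
    "the contract is void for lack of consideration",
    "the contract is void because there was no consideration",
    "the contract is enforceable",
]
print(most_central(candidates))
```

The design bet is that a candidate agreeing with many others is more likely to be good. It breaks down when most samples share the same mistake, and a crude embedding mostly measures wording overlap rather than the quality of the reasoning.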