Rannes, great to see this demonstration 😅😅! By the way, have you tried Grok2?
@rannesman 1 day ago
Yes, I have. I introduced it in my previous video 😎
@wowojune9184 2 days ago
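Suggested questions for <a href="#" class="seekto" data-time="1492">24:52</a> ("I'd really like to ask some questions to test its chain of thought"): a. Explain Gödel's first incompleteness theorem to a secondary-school graduate. b. Set a multiple-choice question for a PhD student to check whether they understand Gödel's first incompleteness theorem correctly.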
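@wowojune9184 Whether it "explains things clearly" tests its ability to express itself. Personally, I think testing "logical ability" means checking whether what it says is correct and logically sound. Suppose all four models explain things well enough for you to follow (their expressive ability has always been strong, I've never doubted that), but their reasoning differs, or even their conclusions differ. If you have no knowledge of the subject yourself, how do you know which one answered well? Take the elevator example: one says to lean against the wall so you're less likely to be thrown around; another says to lie flat on the floor so you're less likely to get hurt. We have basic common sense and know the basic laws of physics, so we can tell the latter makes more sense. If the topic is something we have no basic understanding of at all, we really wouldn't know which answer is reasonable.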
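This o1-preview release is a big step forward, but it is still far from undergraduate level, never mind PhD level. The update mainly strengthens the model's ability to think deeply. My view is that o1-preview deploys a new system at inference time that combines search with reinforcement learning (RL). Rather than using a greedy algorithm to output a single answer directly, o1-preview can progressively sample multiple answers or intermediate steps and use an RL evaluator to pick the best answer and path, which steers the model toward deep thinking. But it is still a long way from fast thinking.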
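To make the contrast above concrete, here is a minimal sketch of greedy decoding versus "sample several candidates and let an evaluator pick". It is only a toy illustration of the commenter's guess, not OpenAI's actual system, which has not been published. `TOY_MODEL`, `toy_evaluator`, and the answer labels are all hypothetical stand-ins.

```python
import random
from typing import Callable, Dict, List

# Hypothetical toy "model": a fixed probability table over complete answers.
# Here the most probable answer ("answer A") is assumed to be the wrong one.
TOY_MODEL: Dict[str, float] = {"answer A": 0.6, "answer B": 0.4}


def greedy_decode(model: Dict[str, float]) -> str:
    """Return the single most probable answer, with no search at all."""
    return max(model, key=model.get)


def sample_answers(model: Dict[str, float], n: int, rng: random.Random) -> List[str]:
    """Draw n candidate answers according to the model's probabilities."""
    answers = list(model)
    weights = [model[a] for a in answers]
    return rng.choices(answers, weights=weights, k=n)


def toy_evaluator(answer: str) -> float:
    """Hypothetical stand-in for a learned (RL-trained) evaluator."""
    return 1.0 if answer == "answer B" else 0.0


def best_of_n(candidates: List[str], evaluator: Callable[[str], float]) -> str:
    """Return the candidate the evaluator scores highest (first one wins ties)."""
    return max(candidates, key=evaluator)


if __name__ == "__main__":
    rng = random.Random(0)
    candidates = sample_answers(TOY_MODEL, 8, rng)
    print("greedy:   ", greedy_decode(TOY_MODEL))
    print("best-of-8:", best_of_n(candidates, toy_evaluator))
```

Greedy decoding always returns the likeliest answer, while best-of-n can surface a less likely answer as long as it gets sampled and the evaluator prefers it. The quality of the result therefore depends entirely on the evaluator, which is assumed to be reliable here.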
@ai7689 2 days ago
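Actually, asking in English and asking in Chinese give different results. How you phrase questions and instructions, and which follow-up questions you ask, also matter a lot. From my own testing, it definitely has the ability of graduate to MPhil students; as for PhD level, that depends on which university.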
@ai7689 2 days ago
Could everyone share what questions you asked and what answers it generated?
@aahh-q2r 2 days ago
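@ai7689 "How many positive integer Coxeter-Conway friezes of type G2 are there?" It's a simple university maths question. The correct answer is 9, but o1-preview answered 5. The key point is that the answer to this question can be found online and it still couldn't work it out, so I think it is far from university level. OpenAI is overhyping it a bit.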
@ai7689 2 days ago
@aahh-q2r
(1) Who were the "Parsee merchants" (Parsees)? Briefly discuss the historical background of this group and analyze their role in modern British-Chinese trade. (Mention their presence in opium trade or tea trade during the Qing Dynasty.) Sources: 郭德炎:《清代廣州的巴斯商人》 (Beijing: Zhonghua Book Company, 2005); 《廣州番鬼錄》 (Taipei: Taiwan Ancient Book Publishing Co., 2006); 張曉寧:《天子南庫:清前期廣州制度下的中西貿易》
==> (2) Please add more details on their roles, support your answer with concrete historical facts, and with minimum 2000 words
==> (3) Parsees often employed local agents and leveraged their understanding of Chinese customs to circumvent restrictions. How?
(4) b. Knowing Which Officials Could Be Persuaded: Mapping the Bureaucracy: Compradors identified officials who were open to negotiation or susceptible to bribery. They maintained dossiers on officials' backgrounds, preferences, and vulnerabilities. who?
(5) the name of the officials who were corrupted
(6) When Qing China banned opium before the opium war, how did it affect the above stakeholders? How did these stakeholders response. Please support your answer with concrete historical facts.
(7) fates of Parsee merchants
@ai7689 2 days ago
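@aahh-q2r On the other hand, it draws on existing big data (academic books, journals), thinks step by step and from different angles, argues both sides and fact-checks itself, and also considers whether it is over-generalizing or being discriminatory. When it has no material to support a point it tells you so and doesn't make things up. Much of what it writes is reasonable and insightful, although I can't tell whether it actually follows the book list I specified when responding. It has chain-of-thought capability. What it writes is better than what people with bachelor's or master's degrees write.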
@rannesman I asked ChatGPT o1: "If I use English and Chinese to ask you (ChatGPT o1) questions, what are the differences in terms of quality, comprehensiveness, quantity and credibility of the response?"
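The test went off the rails at the very first step... You shouldn't use Snake as a logic test. Programming is a sequence of language, and although code is produced by logic, AI is basically reciting from memory, so being able to write Snake only shows that it is very good at memorizing, not that it has a brain. Using generative AI for a logic test is bound to fail anyway, but the simplest check is the clothes-drying question: one piece of clothing takes 2 hours to dry, so how long do 10 pieces take? Answering 20 hours is basically a fail. Another is to take existing card-game endgames, write them up as descriptions, and ask the AI to find the best strategy. One try and you can tell the real thing from the fake: ten tests, ten failures XD. So generative AI basically has no brain and can't think, and using chain of thought makes no difference.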
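The arithmetic behind the clothes-drying question can be sketched as below. The "20 hours" answer comes from treating drying as a serial job, while clothes hung together dry at the same time. The function names and the optional `capacity` parameter (how many items fit on the line at once) are assumptions added for illustration; the original question does not mention any space limit.

```python
import math
from typing import Optional


def serial_time(hours_per_item: float, items: int) -> float:
    """The mistaken model: items dry one after another."""
    return hours_per_item * items


def parallel_time(hours_per_item: float, items: int,
                  capacity: Optional[int] = None) -> float:
    """Items hung together dry simultaneously.

    capacity is a hypothetical limit on how many items fit on the line;
    None means everything fits at once, as the question implies.
    """
    if items == 0:
        return 0.0
    if capacity is None:
        return hours_per_item
    batches = math.ceil(items / capacity)  # number of rounds of hanging
    return batches * hours_per_item


if __name__ == "__main__":
    print(serial_time(2, 10))    # the "fail" answer
    print(parallel_time(2, 10))  # the intended answer
```

With unlimited space, 10 items take the same 2 hours as one item. Only when space is limited does the time grow, and then by the number of batches rather than the number of items.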