How Well Does Chat GPT Know Commander Cards?

Рет қаралды 5,918

Күн бұрын

Пікірлер: 63

@admiralatom5990 5 ай бұрын

The biggest take away is that CHATGPT doesn't refer to itself in its answers. Some of the people used "I" when they answered.

@thetrinketmage 5 ай бұрын

Didn’t notice that!

@TeaAddict1 5 ай бұрын

I also saw chatgpt using I.

@devan9197 5 ай бұрын

Honestly for me it was pretty obvious and that gave it away a lot

@natelagrassa9337 5 ай бұрын

Yeah AI don’t refer to themselves very often… in formal writing to drive a point home you don’t use the word “I.” I picked up on that too, lol.

@mr.whistler6114 5 ай бұрын

Remember : ChatGPT will never think outside of the box. ChatGPT is the box. Edit : In the AI response for Forcefield, ChatGPT talks about it not being ''essential for most decks withing its colors''. Forcefield is colorless, so only an AI would think of it the same way as Black/Blue/Green/Red or White because those are the six options plausible for MTG deckbuilding. Furthermore, we can deduce that ChatGPT didn't understood a ''colorless card'' as the idea of a card devoid of colors, but as a sixth color that, in the MTG rules, can be blend in every color types of decks, thus why it speaks of ''its colors'' in plural.

@Jerma985_fan 5 ай бұрын

woah demonic tutor trip me up I'm surprised someone gave that a B.

@thetrinketmage 5 ай бұрын

That threw me for a loop as well!

@Mwarrior1991 5 ай бұрын

without fail, chat gpt would repeat itself "demonic tutor is an incredibly powerful card allowing you to search your library for any card"... "its ability to fetch any card greatly increases consistency..." redundant information each time.

@XiaosChannel 5 ай бұрын

16:09 that's why you either use the API or always restart a new conversation per case

@thetrinketmage 5 ай бұрын

Yea I didn’t know it was gonna do that. Made for a funny bit though

@solarupdraft 5 ай бұрын

The Assassin's Trophy one was interesting, because it makes you consider who would be more likely to make that mistake in their writeup. It's also inconsistent, saying "any nonland permanent" in one line and "any permanent" in a later one. For me the Mechanized Production one came down to "which author is likely to go off on a non-magic tangent?" Also, the final sentence of the right hand text seems to contradict the entire message preceeding it, depending on how you define something "being a riot."

@thetrinketmage 5 ай бұрын

Yea those are the AI hallucinations which causes it to be wrong

@DMZZ_DZDM 5 ай бұрын

ChatGPT will use "creative" language to fill up space and will always expand on surface level issues while only brushing on more nuanced details that affect the broader game. Also, its trained mostly on business emails, pamphlets and guidebooks so it has an inherently sanitized vibe to its responses (unless asked to use a different tone)

@atticussalmon9064 5 ай бұрын

The AI says "decks within it's colors" or some variation of that A LOT, kind of a giveaway

@trevordumais2117 5 ай бұрын

Bros rating Mechanized Production as a D forget that treasures exist. It all goes back to Smothering Tithe.

@thetrinketmage 5 ай бұрын

Smothering tithe really was the hero of this story

@Xhosant 4 ай бұрын

The ikea giveaway was that it started its (surprisingly poetic) metaphor along the lines of 'it's a doomed project', and then twists to 'and when it works it's neat'. That sudden context switch was suspicious. Speaking of context, overall ChatGPT will provide too much of it explicitly, compared to human answers using it implicitly and with less regard about you having it. From needless clarifications to tying back to the assignment's phrasing, that was a pattern for ChatGPT, feeling like a grade-school essay - answering as it expected you wanted it to answer. Contrasting, the humans would often use subtler slang or context cues.

@KirioGameNote 5 ай бұрын

I really need to hear more on the patron’s thoughts on giving demonic tutor a b

@thetrinketmage 5 ай бұрын

I made it anonymous so unless they tell me, I also won’t know more

@DMZZ_DZDM 5 ай бұрын

I would have given it an A, but yeah, it isn't an S imo

@TeaAddict1 5 ай бұрын

I noticed thay chatgpt has a habit of reiterating the prompt. It always talks like its checking off items on a checkbox.

@thetrinketmage 5 ай бұрын

I feel like that’s how a lot of AI work looks.

@cinderheart2720 5 ай бұрын

I swear they didn't used to and now they always do it, in any context. Its very frustrating.

@violetto3219 4 ай бұрын

it's got the vibe of trying to fill space in a high school writing assignment you reeeeally don't want to do

@aleksihakli1125 5 ай бұрын

"ChatGPT doesn't care about budget" You're telling me. I asked some recommendations to my food token life drain deck. It recommended such affordable cards like anointed precession (60€) doubling season (36-40ish €) teferi's protection (48€) parallel lives (33€) exquisite blood (23€) and many, many more cards well over my budget. I think the cheapest card it recommended was beast whisperer that I already HAVE IN MY DECK.

@Ent229 5 ай бұрын

Commenting before watching: I predict the LLM will have good syntax in its responses but will fail some of the semantics. Likewise I expect its fake "reasoning" to be heavily biased towards generalities and other common responses. I predict patrons of MtG to understand the semantics. I also expect those patrons to be capable of novel reasoning, but likely to give general answers. (common responses are common for a reason). As for the ranking, I would expect the LLM would have a higher mode in their answers and the Patrons answers would have a broader spread.

@thetrinketmage 5 ай бұрын

Novel reasoning ends up being a huge giveaway! I think you are spot on

@Ent229 5 ай бұрын

While watching (my guesses of the identities and tracking the rating scores). My guess for the AI in brackets. Actual AI in parentheses. 1 [(A)] or C. Initially guessed based on accuracy. Doubled down based on generic AI answer vs novel Patron answer. 2 [(C)] or C. Again, novel responses help guess the Patron. 3 [(A)] or S. One answer repeated itself in a redundantly redundant explanation. Huh, the AI downgraded it to nonland permanent. Was that due to generalizing answers or due to not understanding the semantics of the card? Both are factors but I wonder which had a bigger cause. 4 D or [(C)]. Initially guessed based on accuracy. Further confirmed by the novel Patron answer (silver bullet draft design). Even further confirmed by the LLM having no context for the lack of horsemanship. 5 [(S)] or B. Initially guess based on accuracy (unless the patron is trolling, or arguing that it is too powerful to fit in many commander decks without moving the deck away from the desired power level). Wow the reasoning is making me reconsider. The S ranking said "any deck within it's (Demonic Tutor's) colors (plural)". Why the implication of plural? There is also more redundancy in the S's reasoning. I am changing my mind. 6 [(A)] or B. The LLM likes listing literally the same logic repeatedly. The Patron response was more novel. 7 D or [(D)]. This one is tough. The left was more novel. Wow. I expected something like 55/45 odds there. Let's Go! 8 [(S)] or B. I initially guessed based on accuracy, but the B has the novel response, so it must be the Patron. LLM wouldn't do that. And once again the LLM uses "decks within it's colors" when talking about a mono white card. Why the plural? Also the card needs to fit within the deck's colors not the deck fit within the card's colors. 9 [(A)] or A. "Decks focused on defending against large attacks"? Also the Patron is once again the more novel answer. 10 S or [(S)]. Redundant LLM response is redundant.

@Ent229 5 ай бұрын

After the 10 scores: Patron scores: SSABBBCCDD (5 different ranks. Somewhat biased towards B but really spread out otherwise) LLM's scores: SSSAAAACCD (4 different ranks. High bias towards S or A) Since my 10/10 accuracy was based on my reasoning of the LLM's limitations, I think it is soft evidence that my predictions about its limitations might be accurate.

@Ent229 5 ай бұрын

Bonus Round? 1. [(B)] or C. The C had a novel response. Final thoughts: We already know ChatGPT does not try to evaluate cards, so it is not suited to evaluating cards. (Don't use a saw for a hammer's job). Beyond its lack of motivation to judge cards, it does not understand the card or their context enough to judge them. Additionally we see it's general answers as a clear marker of the LLM answer. It is trained to give a "reply-like" response that was a likely reply rather than a reply that was likely to be correct. Specificity and nuance are things it is trained to avoid.

@Ent229 5 ай бұрын

Your patron's evaluation seems within the norm for commander players. They can mostly evaluate cards, and there is some subjectivity that make the "surprising" evaluations still have merit.

@drunkcapybara7004 5 ай бұрын

Dang, i actually got the Mechanized Production wrong as well, what threw me off was the mention of wasting 2-3 slots and getting "the combo", since there was no prior mention of what other slots are wasted for what combo, and these inconsistencies are a big problem of AI. Should have focused more on the same problem in the other text, the card being able to be "a riot" contradicting the D rating.

@thetrinketmage 5 ай бұрын

Yea it was such a weird response for that card

@FranciscoJG 5 ай бұрын

Oooohh, surprise Snail participation :D

@SwedeRacerDC 5 ай бұрын

Lord of Extinction: I was right from the grade alone Lightning Bolt: They had the same grade, so I guessed correct based on the description Assassin's Trophy: I wasn't sure on the grade, because I don't use it in 5C decks typically, but the description was obvious to me. Taoist Mystic: Obvious from the grading. Demonic Tutor: I honestly don't love using tutors that much, but I was wrong on this one. I think it's an A, right in the middle. Panharmonicon: I'm correct...Chat GPT is just stupid at this point. Lol Mechanized Production: Same grade, so had to guess based on description. Both descriptions were wild... But I was right. I think it's a C. It's fun and can win on the spot, especially now that we have Obeka, but even with extra turns. Smothering Tithe: I needed the description on this one, but got it right. I still think its a better grade than the human gave it. Ink Shield: I lost to this card. It's great. You will likely win if everyone else has been eliminated. I was right from the description. Tropical Island: The description helped. Right again. Forcefield: I was right and that's an interesting card. Of course it's on the reserved list. Chat GPT is fairly easy to sus out. But it's still interesting to see.

@leax1337 5 ай бұрын

I recently build a Deck with ChatGPT aswell, the cards were so random i had to put it into a power level calculator, because i didn’t understand the deck myself, which put out a 10 for some reason. ChatGPT always tried to put Rhystic Studys in the Deck xD (It was green black)

@thetrinketmage 5 ай бұрын

That’s funny maybe I’ll need to try that too

@nahboh1897 5 ай бұрын

I agree with the demonic tutor Rating , but not its waste of space but because it is a tutor it makes the deck to consistent so the deck does the same thing every time and make it a less fun deck to play against.

@drunkcapybara7004 5 ай бұрын

Valid point, especially in casual settings and for decks with a very clear and not super varied gameplan. For example, my Kathril deck only really wants to fill the graveyard with keywords, and i took Entomb out of it because i would always tutor up Zetalpa which made the deck play very monotonous (amplified by how terrible the precon is at filling its graveyard so i took a lot of mulligans, but Entomb of course was always keepable) and now that i'm replacing a ton of cards soon, i think i might also cut Vile Entomber and Buried Alive, and exclusively rely on what i happen to mill/sacrifice.

@ShotgunLover2367 5 ай бұрын

10:40 interesting option

@v3rsatile_V3 5 ай бұрын

tbh instead of running demonic tutor you should run it until you play it, then whatever you search for just get another version of that effect, if you search for a boardwipe, put another in the deck. simple really

@thetrinketmage 5 ай бұрын

I do like this idea, though the flexibility of a tutor I think makes it worth it!

@Jacob-km4yb 5 ай бұрын

What he said the flexibility to get let's say a board wipe OR a single target removal spell because you have a big board presence makes it way better imo

@v3rsatile_V3 5 ай бұрын

@@Jacob-km4yb . . . I know

@AutumnReel4444 5 ай бұрын

Yeahhh very not hard to guess. AI ain't killin us yet

@Demoncoregobrrr 5 ай бұрын

rad, got recommended your work early

@BS-bv5sh 5 ай бұрын

I enjoy your content.

@thetrinketmage 5 ай бұрын

I’m glad! I know this one is a bit different so I’m happy you like it

@anabsurdlylongnameme8948 5 ай бұрын

What version of chatgpt did yall use? 3.5 is terrible, 4 is great but behind a paywall. If yall used 4, did u put any additional reference info in?

@AlluMan96 2 ай бұрын

God damn, Mechanized Production just gettin' smoked out of nowhere here! It's gimmicky, sure, but as far as artifact payoffs are concerned, you can do way, way worse. Both the AI and the person remark on the resources you're pouring into trying to resolve it, when it really just doesn't ask for much of anything outside of it's 4 mana investment and works regardless of whether or not you're going for the win con or not. As a wincon, Mechanized asks "Can your deck make treasures, clues or foods?" If yes, then congratulations, your deck can run Mechanized Production for basically free. As just a random artifact goodstuff card, it still makes you free copies of good cards. Put it on a Solemn Simulacrum and you're getting ramp every turn and can churn them into card draw and that's low-balling it. Imagine every turn Sol Ring or Jeweled Lotus. I don't mean to overhype Mechanized Production or shit on the guy that gave the evaluation. I'm just here to defend my pet card. It's fitting, that it was listed just after Panharnomicon, because I think both kinda occupy a similar space of "4 mana do-nothing that usually gets blasted before it gets to do anything." and as kindred spirits, would've both gotten a C from me as fine cards that are probably not gonna get to go off, but are worth it in casuals for the off-chance they do get off the ground.

@JustinNovack 3 ай бұрын

If both answers were then (re-)summarized by ChatGPT, it may have removed the obvious bias that is inherent in the verbiage and language used of ChatGPT. Clear prompt reiteration from ChatGPT and "I built a deck..." phrasing from the humans made this not much of a game.

@epi1763 5 ай бұрын

Next time ask chat gpt to write like a normal personnor dumb it down and feed it other peoples reviews so ot wrotes on a similiar context

@CD-sl7ld 5 ай бұрын

I love you

@kormit-b9x 5 ай бұрын

Personally, I think Demonic Tutor is a B or even C in most casual metas. If I were to pull out a Demonic Tutor, I'd probably get focused on because my playgroup doesn't run $50 cards unless we're proxying high power or cEDH. In a lot of games, Demonic Tutor is just too focused/good to be worth slotting in, since it gets people to target you.

@thetrinketmage 5 ай бұрын

Interesting I often think of it as a charm effect. I don’t know if I’ve ever been explicitly targeted because of it

@robertomacetti7069 5 ай бұрын

to be fair to chat gpt it never played commander, freacking out over lord of extinction is a classic noob mistake

@hoffedemann5370 5 ай бұрын

"highly desirable" "in its colors" "extremely valuable" "versatility" "particularly those in XYZ strategies" are dead giveaways. Also Ai do be yappin' with way too eloquent words all the time