Nicer Trees Spend Fewer Bytes: compressing 12947 Wordle words into 12155 bytes

  Рет қаралды 86,488

David Renshaw

David Renshaw

Күн бұрын

What's the smallest javascript program you can write whose output is the Wordle wordlist? A lively "code golf" competition to answer that question is currently underway at the website golf.horse/. This video describes how one particular entry achieved an impressive amount of compression by using binary trees to divide the space of possible words.
Made in collaboration with ‪@jedgrabman‬ .
Submitted to Summer of Mathematical Exposition 2023: some.3b1b.co/
#SoME3
Interactive visualization: dwrensha.github.io/nicer-trees/
My code: github.com/dwrensha/nicer-trees
Jed's code: github.com/JedGrabman/WordleG...
Our current best solution.js: github.com/dwrensha/golf-hors...
00:00 - intro
01:04 - encoder/decoder
02:44 - Noiseless Coding Theorem
03:32 - a simpler two-letter version
04:32 - the binary entropy function is concave
05:35 - binary tree representation
06:39 - visualizing simple greedy search
07:39 - a more optimized tree
08:15 - some interesting clusters
09:02 - call to action

Пікірлер: 135
@MT-nu8ox
@MT-nu8ox 11 ай бұрын
Very interesting! Only downside is that I am not smart enough to understand 90% of this haha
@Fritz0id
@Fritz0id 11 ай бұрын
yet so interesting!
@leosmi1
@leosmi1 11 ай бұрын
you need to understand data structure first
@MT-nu8ox
@MT-nu8ox 11 ай бұрын
@@leosmi1 Thanks, I'll try to learn that
@parlor3115
@parlor3115 11 ай бұрын
@@leosmi1 Half of the video is information theory. He's referenced entropy FFS.
@leosmi1
@leosmi1 11 ай бұрын
thank you @@parlor3115
@brady6968
@brady6968 11 ай бұрын
This is surprisingly relevant for the Sebastian Lague chess competition where I've been trying to compress the weights of a neural network instead of wordle words.
@pinruihuang8463
@pinruihuang8463 11 ай бұрын
You're writing an NNUE in the competition? God help us all...
@gwentarinokripperinolkjdsf683
@gwentarinokripperinolkjdsf683 11 ай бұрын
@@pinruihuang8463 NNs compete separately no?
@pinruihuang8463
@pinruihuang8463 11 ай бұрын
@@gwentarinokripperinolkjdsf683 tbh i have no idea man
@macchiato_1881
@macchiato_1881 11 ай бұрын
​@@pinruihuang8463NNs don't perform all that well in chess unless it has a shit ton of parameters. More traditional engines will outperform low-parameter models by a huge margin.
@pinruihuang8463
@pinruihuang8463 11 ай бұрын
@@macchiato_1881 I was expecting an NNUE on top of a traditional engine, which would most likely require this kind of compression.
@jedgrabman
@jedgrabman 11 ай бұрын
Great working with you David! This turned out to be a surprisingly deep problem combining math, stats, CS and linguistics!
@proterotype
@proterotype 11 ай бұрын
I always love stumbling across a video where you know the channel is going to grow a ton, and you found it early on.
@GigsVT
@GigsVT 11 ай бұрын
I have to echo the other comments. I think something like this will be useful to me in an entirely different problem space. A truly useful idea that I'm surprised i never heard about before quite like this. It's like huffman rainbow tables.
@nif4345
@nif4345 11 ай бұрын
I feel like it could be more efficient to sometimes hardcode words in very sparse areas to lower the cost further
@dwrensha
@dwrensha 11 ай бұрын
Quite possibly!
@unflexian
@unflexian 11 ай бұрын
username w
@SpencerTwiddy
@SpencerTwiddy 10 ай бұрын
@@unflexianyo I can’t believe 3 seximal enthusiasts stumbled into the same comment section
@SpencerTwiddy
@SpencerTwiddy 10 ай бұрын
I was just musing that in a twelve team fantasy football league, you want to have RBs and WRs in the top nif for fantasy points each week
@unflexian
@unflexian 10 ай бұрын
@@SpencerTwiddy im not american enough to understand that but it sounds really reasonable:) also seximal rocks
@ArtificialDjDAGX
@ArtificialDjDAGX 11 ай бұрын
hmm, could try using the rubik's cube short-path finding approach, where you start both top-down and bottom-up and try to find a convergence point where both methods meet, vastly reducing the scale of the problem.
@undefined06855
@undefined06855 11 ай бұрын
Is the word on the thumbnail "First"?
@dwrensha
@dwrensha 11 ай бұрын
Looks quite right. Bravo. Great solve.
@alexismandelias
@alexismandelias 11 ай бұрын
This is beautifully obscure, unique, intriguing and satisfying at the same time
@taranknutson175
@taranknutson175 11 ай бұрын
Awesome job!! I love how simple problems can have such deep answers in CS/cryptography
@JediJess1
@JediJess1 11 ай бұрын
Fascinating video. I love seeing mathematics becoming a part of the discussion with programming and Computer Science
@commandblockguy
@commandblockguy 11 ай бұрын
Sweet video! I first tried this problem back in like February 2022 when I was writing a Wordle clone for a graphing calculator and needed to compress the word list to get it to fit in memory. The best I could do was around 14 KB for just the word list (never got around to writing a golfed decoder for that compression format, since handling bit streams is a pain in the assembly language the calculator uses). I effectively just applied a Huffman coding to convert each word into a bitstring, sorted the list of bitstrings, then encoded only the number of bits that changed and the bits after the first bit that changed between each pair of words. I was excited to see this video in my recommendations - I'm glad that there's still progress being made on it, and the algorithm is very impressive.
@unflexian
@unflexian 11 ай бұрын
wait what I wrote Wordle for desmos too didn't need any compression, one base-26 number per word fit just fine
@commandblockguy
@commandblockguy 11 ай бұрын
@@unflexian This was for a physical graphing calculator (the TI-84 Plus CE), not Desmos. Those are more restrictive (150 KB total RAM + 4 MB flash, shared with OS, with 64 KB per program max)
@unflexian
@unflexian 11 ай бұрын
@@commandblockguy that's great damn
@campbellhutcheson5162
@campbellhutcheson5162 11 ай бұрын
Incredible video!
@Marcox385
@Marcox385 11 ай бұрын
Lately some real underrated gems such as this has been emerging in youtube. Truly a blessed time. I had never seen any of your videos but this is amazing David. Thank you for sharing.
@TheDestroyer55e
@TheDestroyer55e 11 ай бұрын
i think it has something to do with #SoME3
@EchoErik
@EchoErik 11 ай бұрын
Great video, look forward to seeing more videos from you!
@khoda81
@khoda81 11 ай бұрын
Oh my god, this was such a fascinating video, really hope you make more of these :)
@dwrensha
@dwrensha 11 ай бұрын
Thank you for the words of encouragement!
@iogamesplayer
@iogamesplayer 11 ай бұрын
This is criminally underrated!
@kitizz548
@kitizz548 11 ай бұрын
Nice! I hadn’t considered the intersection between codegolf and maths before this 😄
@michaelbauer8766
@michaelbauer8766 11 ай бұрын
Amazing to see KZbin suggesting this to me! Excellent, excellent video. Hope the view count keeps growing and growing!
@claytonharting9899
@claytonharting9899 11 ай бұрын
I was looking for an example to show how math and computer science can be applied in an interesting way to a magic system (on the author’s request). This is a perfect video to share. It’s short, well explained, and very interesting
@kgleba4442
@kgleba4442 11 ай бұрын
Great job! Definitely worth watching and digging deeper Probably great choice of topic for the next video would be explaining the algorithm behind the visualization (clusterization particularly) from the computer science perspective
@yaybaby7984
@yaybaby7984 11 ай бұрын
Cool video!
@smileyp4535
@smileyp4535 11 ай бұрын
I have no idea what like any of this is but it has very pretty pictures and your voice and enthusiasm is very nice ☺️
@blank4305
@blank4305 11 ай бұрын
Thanks for the vid, it was super interesting!
@kalei49
@kalei49 8 ай бұрын
i have never watched this video, but from the thumbnail in my recommended what i took away was that 'nicer' is a good word to open with in wordle. turns out that is not at all what the video is about but /is/ what i've been playing with for weeks now and having pretty decent success, so i just figured that the video was right. turns out i literally just made up that fact myself.
@dwrensha
@dwrensha 8 ай бұрын
I don't think "nicer" is a particularly bad word to start with! But yeah that's not what this video is about at all. I hope you found it interesting, even if it was not what you expected!
@Belissimo-T
@Belissimo-T 11 ай бұрын
The video was so good, I wish it would have been longer.
@Tobiky
@Tobiky 11 ай бұрын
Very cool video, thank you
@Asterism_Desmos
@Asterism_Desmos 11 ай бұрын
This is so cool!
@ckq
@ckq 11 ай бұрын
5:00 I've found this result in the context of win probability models. The entropy is initially maximal then after time passes, the expected entropy decreases until it finally reaches 0 when the game is over. The change in entropy between two points in time corresponds to the information/importance.
@pseudolemon8272
@pseudolemon8272 11 ай бұрын
criminally underrated channel, glad to be in the first thousand
@Vfulncchl
@Vfulncchl 11 ай бұрын
Honest to god, one of the very best #SoME3 entries
@willowytale
@willowytale 11 ай бұрын
Fantasic video. Completely nerdsniped me. I know what I’m doing this weekend
@dwrensha
@dwrensha 11 ай бұрын
Eagerly awaiting your submissions on the leaderboard. :)
@sheepcommander_
@sheepcommander_ 11 ай бұрын
I love this!!
@Dorumin
@Dorumin 11 ай бұрын
Very fascinating. But it would've been nice to see more on the decoder front code too, ha
@sdjhgfkshfswdfhskljh3360
@sdjhgfkshfswdfhskljh3360 11 ай бұрын
Wow, did not expected to find such video. I think it's a great idea to learn about programming in general and compression in particular by trying to make more and more optimized solutions. I created similar challenge with the goal of compressing specific image. I learned a lot while making solutions to it, but sadly, failed to find anyone else who want to participate in it.
@claudiacarrasco908
@claudiacarrasco908 11 ай бұрын
Fun fact: The wordle answer in the video's thumbnail is "first"
@Finnnicus
@Finnnicus 11 ай бұрын
Great channel and presentation. Tom7 vibes to me
@bihazards
@bihazards 11 ай бұрын
Bonus points for the wordle word in the thumbnail
@DemonixTB
@DemonixTB 11 ай бұрын
interesting, I am currently solving a similar problem, regarding a specific 1bit/pixel video (yes, you guessed it, it's Bad apple) that I am trying to fit into the lowest space possible (at a significantly reduced resolution and fps, all to fit the code and data under 60kb, so far i have not found a technique that could do it, but this seems like a promising avenue to explore, though I doubt it will help since this "only" cuts about 90% where as I need to compress ~10M bytes, = 99.3% savings
@ahumanintheory4114
@ahumanintheory4114 11 ай бұрын
Have you considered only encoding the changed pixels? A significant portion of bad apple is either a black screen that stays black or a white screen that stays white. The GIF specification includes an optional system like this that allows for updating any sub-rectangle of the larger image, which with some optimization could potentially yield high compression ratio given the nature of bad apple.
@DemonixTB
@DemonixTB 11 ай бұрын
​ @ahumanintheory4114 of course I have, gifs are notoriously large and inefficient, it only got to around 2MB, far far too much. I've also tried RLE, I've tried bzip2, several voxel compression algos, contour mapping, cubemarching, k-trees, octrees, huffmann coding, I now have several files that technically fit, though at reduced visual fidelity, or reduced resolution and framerate, with, surprisingly the now old-ish h264 codec at rfc 48 being the best quality result I've found so far (still only 40x30 resolution at 8fps) although h265 got a far far better quality result (my target 64*48 at 15fps), but at 90kb, sadly I also need a method that has a small and fast enough decoder. to fit within the previously mentioned 64K so h265 can't be used. currently I think my best bet is a 3D representation of the contour edges with some good enough 3D object simplification , together with some custom compression algorithm, since most stock 3D asset compression focuses on things that are not the verticies themselves.
@FinetalPies
@FinetalPies 11 ай бұрын
I'm trying to figure out why you would need a 60kb Bad Apple and I have no idea so I'll just assume you're trying to make Bad Apple run on Doom
@DemonixTB
@DemonixTB 11 ай бұрын
@@FinetalPies close enough look up "poom"
@disketa25
@disketa25 11 ай бұрын
@@FinetalPies Given 60k limit, I expect the port for some 8bit/64K machine.
@boysosleepy
@boysosleepy 11 ай бұрын
Go Jed!
@Nylspider
@Nylspider 6 ай бұрын
Took me a surprisingly long time to figure out that the thumbnail puzzle’s solution was “First”
@Stonehawk
@Stonehawk 11 ай бұрын
Was "FIRST' the solution to the thumbnail?
@ckq
@ckq 11 ай бұрын
9:15, yeah generally speaking bottom up is optimal for minimizing the number of bits used since you can combine the two smallest, and repeat
@technicalgaming4936
@technicalgaming4936 11 ай бұрын
I’m only half way through but I have thought of an idea if you can turn the world list of 0’s 1’s into a 2D array then send the positions of each correct word we could maybe cut but the array would be so long that we might have to cut it up and have another number but cut it up as many times to find the most efficient one then decide it by doing the opposite but I don’t know if it would work
@NonTwinBrothers
@NonTwinBrothers 11 ай бұрын
I was amazed I recognized the profile picture "Wait, is that the Acronymy guy?"
@fireyblackdragon
@fireyblackdragon 11 ай бұрын
about the thumbnail: is the solution word "first"? it was the only one I could think of given the colors. Most wordle-esque designs like that tend to be impossible, just using random coloring.
@msclrhd
@msclrhd 11 ай бұрын
I wonder if it would make sense to encode the data as a trie first. That way, you can take advantage of the sparsity of the data. Another trick could be to pack the branches of the trie into a 26bit integer, with one bit per letter -- i.e. each bit indicates whether the letter is used. Using your 7bits per byte from the base 125 encoding, that gives 4 bytes per trie node with 2 bits for additional information. Because all words are 5 letters, you don't need to store an "end of string" marker. Note: I haven't calculated how many bytes this encoding/approach will need for the complete trie data structure. You then just need to order them in a way that makes it easy to walk the trie in a compact decoder algorithm that prints out the words.
@dwrensha
@dwrensha 11 ай бұрын
some of this sounds similar to what Luke G is doing in the currently-first-place submission: www.luke-g.com/modeling-5-letter-english-words/
@yash1152
@yash1152 11 ай бұрын
3:33 3:42 to 4:10 woaohwww!! plots and plots.
@unflexian
@unflexian 11 ай бұрын
how would you go about implementing this for arbitrary length strings, or just a general byte sequence? ~it's for an advanced research project involving~ i'm trying to make bad apple in shenzhen i/o
@dwrensha
@dwrensha 11 ай бұрын
Haha, now that's a demo I want to see! I'm not sure exactly how to translate the specific ideas from this video, but the general strategy of "find some model where your data has low entropy" should be a good guiding principle.
@unflexian
@unflexian 11 ай бұрын
​@@dwrenshaI've compressed the pixel data to 19kb, and after two days of fiddling I can't get it any lower without implementing rar or some ridiculously complicated file format. Luckily, there's a mod that lets you resize the working grid. it only works for 2x which is not enough to store 19kb, but it's open source so... see you in a bit.
@trex2537
@trex2537 11 ай бұрын
I am new to Javascript and programming. I can't understand a thing can someone please explain? What kind of programming is this and Where is this used ?
@empty5013
@empty5013 11 ай бұрын
it's interesting that theres a bias towards the bottom right of each node, is that coming from the algorithm itself?
@dwrensha
@dwrensha 11 ай бұрын
That's coming from the layout algorithm used by `d3.treemap()`. Positioning and shape are chosen in way that's supposed to be balanced and nice to look at. The only geometric thing that's significant is the area, which is proportional to the number of strings represented by the node.
@Z0DI4C
@Z0DI4C 11 ай бұрын
Seems you could encode tokens in a Huffman tree taking advantage of the knowledge the output would be 5 characters, allowing you to optimally encode with the assumption that the decoder could truncate all bytes past 5. For example, a node in the tree expanding to a token of "aste" could also be used as the last token in everything ending in "ast" ("ast ", "as ", and "a "). "[ide][aste]" would truncate/decode into "ideas" while sharing tokens with both "[p][aste]" ("paste") and "[ovo][ide]" ("ovoid").
@dwrensha
@dwrensha 11 ай бұрын
Interesting! How would you choose your set of tokens?
@Z0DI4C
@Z0DI4C 11 ай бұрын
​@@dwrensha That's the hard part, I think :) -- you could brute force it, partly by searching for the most common 3- and 4-character strings (previewable with Notepad++'s Count functionality) and adding weight to them by the count of the valid strings which (in terms of regex) match "....", "... ", ".. ", and ". ". I'm not sure if I would expect the resulting tree to be mostly for 3-character-long tokens (e.g. "are", "ang", "ear", "one", "ast", "ide", "der"), resulting in an output which mostly truncates just one trailing character, or a mixture of 1-2 character plus 4-character tokens. There are a lot of possibilities there.
@DerMatheman
@DerMatheman 11 ай бұрын
Instead of a tree one can use a graph that has directional paths. And every path hast to start at root and only contains 5 letters one could easily cut those bytes to half
@Exilum
@Exilum 11 ай бұрын
Surprised no one commented First. Good thumbnail.
@k.umquat8604
@k.umquat8604 11 ай бұрын
Very interesting video. Unfortunately I can only understand the gist of it
@coffeefish4743
@coffeefish4743 11 ай бұрын
making trees sounds lkike a good task for a nureral network.
@dwrensha
@dwrensha 11 ай бұрын
Please try it and let me know how well it works!
@aze4308
@aze4308 Ай бұрын
woahhhh
@Preinstallable
@Preinstallable 11 ай бұрын
Only 500 subs? Wow
@joshuachan6317
@joshuachan6317 11 ай бұрын
So... what's the word on the thumbnail
@kinglysandus
@kinglysandus 11 ай бұрын
I just clicked on the video to say that I think the answer is "First" in the Wordle puzzle in the thumbnail.
@dwrensha
@dwrensha 11 ай бұрын
bingo
@EHMM
@EHMM 11 ай бұрын
the answer to the wordle in the thumbnail might be fires
@dillon8124
@dillon8124 11 ай бұрын
it's first S can't be the last letter
@taraskuzyk8985
@taraskuzyk8985 11 ай бұрын
how many hours did you spend on the title?
@dwrensha
@dwrensha 11 ай бұрын
I was stuck on the title for weeks! I hit upon the "use five-letter words" idea first thing in the morning the day before the SoME3 deadline. After that, it took less than an hour to select the words in a satisfactory way.
@Omena0
@Omena0 11 ай бұрын
Me when i make a request to a server to get the wordle words 💀💀💀
@dwrensha
@dwrensha 11 ай бұрын
oh no please don't die we just wanted to send you as small of a payload as possible
@helper_bot
@helper_bot 11 ай бұрын
oh no i clicked this from reccommendation
@GameJam230
@GameJam230 11 ай бұрын
The irony is, anybody who commented "first" on this video was finally right for once...
@BlastinRope
@BlastinRope 11 ай бұрын
FRITS
@stevemelons
@stevemelons 11 ай бұрын
Good video but why didn't you implement it...
@adsan7787
@adsan7787 11 ай бұрын
Why is grrrl a word?
@dwrensha
@dwrensha 11 ай бұрын
I think you're going to have to ask Josh Wardle that one.
@dwrensha
@dwrensha 11 ай бұрын
en.wiktionary.org/wiki/grrrl
@adrianover
@adrianover 11 ай бұрын
First
@zyxwv
@zyxwv 11 ай бұрын
horrible wordle start. did not get the word.
@dwrensha
@dwrensha 11 ай бұрын
Sorry about those inept ploys.
@zyxwv
@zyxwv 11 ай бұрын
@@dwrensha with that, you could figure out the word, but it'd be difficult
@PipStuart
@PipStuart 11 ай бұрын
@@dwrensha You're quite adept along these words. Bravo! ;) Great video makes Pippi happy. Super enjoy! Peace.
@dwrensha
@dwrensha 11 ай бұрын
Witty reply, there. Props!
@zyxwv
@zyxwv 11 ай бұрын
@@dwrensha didn't get today's wordle with that :(
@buttonasas
@buttonasas 11 ай бұрын
first
@TheOobo
@TheOobo 11 ай бұрын
The thumbnail beat you, sorry
@Green24152
@Green24152 11 ай бұрын
did anyone else figure out the wordle on thumbnail
@nanopi
@nanopi 11 ай бұрын
FIRST
@vepply
@vepply 11 ай бұрын
Ong you sound like shounic and yall both make coding content lol
@Cyberian_Khatru
@Cyberian_Khatru 11 ай бұрын
8:55 "maybe if we stare at these things long enough..." we will inevitably realize how dumb the english language is lol That's what I was thinking, at least.
@NiGHTcapD
@NiGHTcapD 9 ай бұрын
I think that the way the wordlist is compressed is similar to the way pokemon sprites were originally compressed? I have to assume, because you did not explain.
@myrusEW
@myrusEW 11 ай бұрын
First…
@nbecnbec
@nbecnbec 11 ай бұрын
First
@mydearaniryx7829
@mydearaniryx7829 11 ай бұрын
FIRST
@durellnelson2641
@durellnelson2641 11 ай бұрын
First
@MrArbaras
@MrArbaras 11 ай бұрын
First
@F1e308
@F1e308 11 ай бұрын
First
Tactics & Keyframes: Visualizing Lean 4 Proofs in Blender
8:47
David Renshaw
Рет қаралды 6 М.
Inside Out 2: Who is the strongest? Joy vs Envy vs Anger #shorts #animation
00:22
Mom's Unique Approach to Teaching Kids Hygiene #shorts
00:16
Fabiosa Stories
Рет қаралды 39 МЛН
Alex hid in the closet #shorts
00:14
Mihdens
Рет қаралды 19 МЛН
The Mathematics of String Art
10:36
Virtually Passed
Рет қаралды 518 М.
Oh, wait, actually the best Wordle opener is not “crane”…
10:53
How principled coders outperform the competition
11:11
Coderized
Рет қаралды 1,6 МЛН
Stop, Intel’s Already Dead!
13:47
Linus Tech Tips
Рет қаралды 833 М.
The purest coding style, where bugs are near impossible
10:25
Coderized
Рет қаралды 938 М.
Dear Game Developers, Stop Messing This Up!
22:19
Jonas Tyroller
Рет қаралды 700 М.
The moment we stopped understanding AI [AlexNet]
17:38
Welch Labs
Рет қаралды 862 М.
The Boundary of Computation
12:59
Mutual Information
Рет қаралды 994 М.
The hidden beauty of the A* algorithm
19:22
polylog
Рет қаралды 852 М.
Faster than Rust and C++: the PERFECT hash table
33:52
strager
Рет қаралды 534 М.
Samsung laughing on iPhone #techbyakram
0:12
Tech by Akram
Рет қаралды 7 МЛН
Запрещенный Гаджет для Авто с aliexpress 2
0:50
Тимур Сидельников
Рет қаралды 1,1 МЛН
My iPhone 15 pro max 😱🫣😂
0:21
Nadir Show
Рет қаралды 1,9 МЛН