Parsing strings with nom | Advent of Code 2022 Day 04

Рет қаралды 11,245

Күн бұрын

Пікірлер: 57

@donut12365 2 жыл бұрын

Really appreciate that you're solving these problems in a way that introduces the various aspects of Rust instead of simply finding the most optimal solution!

@fritzrehde9182 2 жыл бұрын

Totally agree, never heard of rust before but now I really want to use it for the next advent days!

@BrianMelancon 2 жыл бұрын

I'm still pleasantly surprised how often it is that when Rust code compiles, it works as expected. It's so nice how many mistakes are caught early by the compiler.

@bustamove361 2 жыл бұрын

I was happy with my solution using vanilla Rust, but I knew nothing about Nom and seeing you use it makes me want to go back and re-write my solution using that. I think knowing how to use it os going to come in handy for future days

@Galakyllz Жыл бұрын

I really appreciate the basic nom introduction. That's exactly what I needed to start using this library. I hope you keep making videos - you're great at it.

@ISKLEMMI 2 жыл бұрын

To find the upper and lower bounds of ranges, RangeInclusive has start() and end() methods, while Range has start and end fields. range_a.contains(range_b.start()) || range_a.contains(range_b.end()) || range_b.contains(range_a.start()) || range_b.contains(range_a.end())

@daliareds 2 жыл бұрын

For the first part you could do: range_a.contains(range_b.start()) && range_a.contains(range_b.end()) || range_b.contains(range_a.start()) && range_b.contains(range_a.end()) to see if a range fully contains the other. And do this: range_a.contains(range_b.start()) || range_a.contains(range_b.end()) for the second part. Because you only need to check if there's overlap on one side, since it'll mean there's overlap on the other side as well

@ISKLEMMI 2 жыл бұрын

@@daliareds Your part one looks good. I don't think your part two solution will catch cases where range_b.start() < range_a.start() && range_b.end() > range_a.end()

@daliareds 2 жыл бұрын

@@ISKLEMMI You're right. Then the better solution for part two would be what you proposed originally

@markday3145 2 жыл бұрын

Thanks for introducing nom. I haven't used it yet. For this problem, I started with split(), then tried Regex. My impression is that nom takes more work to set up the parsers, but makes it very easy to get the parsed value (the tuple of ranges). With Regex, it was easy to create the pattern, but then tedious to extract the capture groups (the numbers) and do the rest of the work of parsing the numbers and packaging them up into a tuple of RangeInclusive.

@chrisbiscardi 2 жыл бұрын

I think you'll find that once you develop the level of familiarity with nom that you have with regex, you'll find yourself writing the patterns just as easily.

@driedurchin 2 жыл бұрын

@@chrisbiscardi I agree, but the first couple hours with nom are hard, since the compiler errors for that many nested functions are really gnarly.

@liamkearn 2 жыл бұрын

26:55 I’d be very surprised if that didn’t already get optimised out 😂 Your videos on AOC are so awesome ❤

@anupjadhav 2 жыл бұрын

Great video. Did not know about nom, very useful crate

@cristobaljavier 2 жыл бұрын

cool to see you used range too. I didn't know about Nom, great video!

@narigoncs 2 жыл бұрын

Just so you know, the second check for `.any()` on the range B is redundant because if any of the items of Range A exist in Range B, they overlap.

@chrisbiscardi 2 жыл бұрын

true!

@charlesbcraig 2 жыл бұрын

Nom nom nom more nom! I tried thee different approaches just to get more experience with Rust. 1. Ranges and Traits. Added a "to_inclusive_range" Trait to &str: "2-4".to_inclusive_range -> RangeInclusive::new(2,4). Then did a similar compare to yours 2. String, Ranges, Traits. Reused that trait, added a "to_string" trait, then tested using String's contains method against each string with the other (Slowest) 3. Final, and *blazingly fast*, solution was just comparing the start and end integers (I used fold the first time but changed it to how you used filter): input. lines(). filter(|line| { let (pair1, pair2) = line.split_once(",").unwrap(); let (p1s,p1e) = pair1.split_once("-").unwrap(); let (p2s,p2e) = pair2.split_once("-").unwrap(); let i1s = p1s.parse::().unwrap(); let i1e = p1e.parse::().unwrap(); let i2s = p2s.parse::().unwrap(); let i2e = p2e.parse::().unwrap(); (i2s

@dashl5069 2 жыл бұрын

Over the first four days I've written this awfully large and confusing input macro, but I think I'll switch to nom for day 5

@zeroows 2 жыл бұрын

Thank you for the great videos ❤️

@artemivasyuk1699 2 жыл бұрын

Seems like overkill solution to use ranges instead of just integers :) Thanks for introducing nom. I wonder if it’s okay to keep such a minimal and non descriptive code? I tend to separate every step into named function or trait implementation and test every one of it. Having long chains of maps, filters or untested functions alerts me. How do you work on actual production code?

@chrisbiscardi 2 жыл бұрын

I think Ranges match the semantics of the problem that we're given *and* do what we'd do anyway with two integers, so are the right choice here. Let me explain why. A Range is a struct with a start and and end field. This means they're lazy so we're not constructing the full set of numbers unless we need to. The only extra work we're doing over using integers is typing the extra two dots, but we'd have to plop the separate integers into a tuple or something to pass them around *anyway*, so we're not even saving anything. range.start() and range.end() will yield the start and end values as well, so you can use a Range in the same way you'd use two integers. Interestingly, the implementation of .contains for Range is also the same code you'd have to write for two integers, with =. You can see that here: doc.rust-lang.org/src/core/ops/range.rs.html#810-823 --- re: long chains of maps, etc I don't think this is particularly a long chain, but it's also not production code. We could take the filter function out and name it/test it if that was important, in the same way we test the process_part1 function. In this case, I the chain is fairly small. .itert().filter().count(), so testing the whole function is enough for me. We could also, of course, write another test for the parser itself as well and each of the sub-parsers. If the parser was really important I would do that as well, especially using something like quickcheck or fuzz testing to make sure the parser could handle extra input. In this case the input is guaranteed to be well-formed, so it's not as important.

@artemivasyuk1699 2 жыл бұрын

@@chrisbiscardi thanks for reply. I didn't know, range is lazy. Than it's ergonomics are way better than tuple's, because it provides extra context.

@dewaeq 2 жыл бұрын

@@chrisbiscardi Range indeed is lazy, but your implementation is not. Calling all(|num| range_b.contains(num)) is a very expensive operation which isn't required at all, instead you could just compare the start and end values of your range.

@chrisbiscardi 2 жыл бұрын

@@dewaeq yes I am aware, as you'll find I've described elsewhere in the comments already if you peruse them (someone asked about the impact of a 0..maxint range). Also try to not use "very expensive" without quantifying what you mean. In this case the difference doesn't matter at all and it's easy to mislead people if you hold back what you mean by "very expensive".

@KurtSchwind 2 жыл бұрын

If the input file was 1..MAXINT what would have been the performance impact? I also did just a check of start/end instead of using ranges because I think it said that some of the ranges were 'large' in the example. But the largest number in the range was '99' which is trivially easy for these ranges.

@chrisbiscardi 2 жыл бұрын

A Range is a struct with start and end fields, so with the current implementation you'd potentially have to loop over 1..MAXINT because we .into_iter(), but that's not actually required because we could range_a.contains(range_b.start()) && range_a.contains(range_b.end()) which wouldn't evaluate all of those numbers. .contains does exactly what you'd do if you had to check two independent numbers, using >= and

@doce3609 2 жыл бұрын

thank you very much for introducing such a great thing as nom to me. I'll be looking forward to using it more

@benibachmann9274 2 жыл бұрын

Thank you again for the video and especially for the introduction to nom! I found the crate "intervallum" to be useful for today's challenge. The functions "a.is_subset(&b)" and "a.overlap(&b)" make the code quite readable. By the way: Do we have a private leaderboard? 🙃

@aranasaurus 2 жыл бұрын

Haha I was thinking about asking this same question on the Discord 😂

@chrisbiscardi 2 жыл бұрын

I created one today. Only 200 people can join, so it's first come first serve for spots this year: code is 956267-8a6cec86

@benibachmann9274 2 жыл бұрын

@@chrisbiscardi Thanks! ❤

@dealloc 2 жыл бұрын

Range does have an intersection method that could be used, but I did the easiest way by comparing them manually without the use of Range.

@narigoncs 2 жыл бұрын

by default Range does not have an intersection method, but you can collect your range into a HashSet and do an intersection that way.

@chrisbiscardi 2 жыл бұрын

fwiw, Range::contains does the same thing you (probably) wrote by hand -- doc.rust-lang.org/src/core/ops/range.rs.html#810-823

@dealloc 2 жыл бұрын

@@chrisbiscardi Indeed, I just didn't use range in my impl. but had I done so, I had used contains as well :)

@dealloc 2 жыл бұрын

@@narigoncs That's right, sorry for the confusion!

@aranasaurus 2 жыл бұрын

Thanks again for the video! I’ll probably go back and refactor mine to use nom because this seems like it’ll definitely be useful for future days. While watching this one I had a thought, have you ever explained why you choose u32 as opposed to say usize? Or like u16 is almost assuredly big enough for this problem, but you chose u32. And so did I in my solution, I’m just curious if there’s anything more to why?

@chrisbiscardi 2 жыл бұрын

If you type let num = 5 into a rust file and see what rust analyzer infers, it's an i32. That has influenced my choice, but really I don't have a strong reason. The reason I didn't use usize is because usize is platform dependent. It can be 32 or 64, etc.

@felixst-gelais6722 2 жыл бұрын

in part 2, you dont need b_contains_a because it'll already be caught in a_contains_b (since youre testing if any at all match, makes sense for them to be in both ranges)

@bobbybobsen123 Жыл бұрын

For part 1, any reason you’re using .clone().into_iter() rather than just .iter() ?

@chrisbiscardi Жыл бұрын

RangeInclusive implements Iterator, which is where .all() comes from. .all() takes a mutable reference to the range so that it can do the iteration. While we don't strictly need the .into_iter(), we do need the clone because we have shared references to the ranges and we need something we can have mutable access to.

@YTCrazytieguy 2 жыл бұрын

number::complete is relevant when you're parsing binary numbers, not decimal representations of numbers :)

@noahperrin9905 2 жыл бұрын

what's your ide and theme pls?

@chrisbiscardi 2 жыл бұрын

vscode and night owl

@sanderbos4243 2 жыл бұрын

I fucking love your laugh, thanks for this great intro to Rust!

@Jeanpierrec19 Жыл бұрын

Better late than never, part one calls for set theory, I would likely have turned the ranges into a hashset and then just used is_subset and is_superset

@chrisbiscardi Жыл бұрын

Ah yeah that would've worked better here. And then is_disjoint would've worked nicely too

@YTCrazytieguy 2 жыл бұрын

more nom!

@YTCrazytieguy 2 жыл бұрын

I think you're not using clippy right? clippy is really useful

@chrisbiscardi 2 жыл бұрын

I am not currently showing clippy. Would you like to see it in future videos?

@YTCrazytieguy 2 жыл бұрын

@@chrisbiscardi definitely! I love clippy. I think it would have warned you that instead of doing let ... = separated_pair(...)? Ok(...) you can just do separated_pair(...)

@felixst-gelais6722 2 жыл бұрын

why does Range.any() take a FnMut predicated? why cant it just be Fn?

@marcospataro4223 2 жыл бұрын

it's because Fn is a stricter requirement than FnMut. FnMut means that the predicate can be called multiple times, and it may or may not mutate some state, whereas Fn means that the predicate can be called many times but it can't mutate any state (it's basically a pure function). every Fn is a FnMut, but not vice versa

@felixst-gelais6722 2 жыл бұрын

@@marcospataro4223 i get that, but i dont understand why any() needs to account for the state being modified. arent predicates supposed to be pure anyway?

@marcospataro4223 2 жыл бұрын

@@felixst-gelais6722 well any() doesn't really care about whether you're mutating some state. it just needs to call your predicate multiple times, hence the predicate can't be an FnOnce. you probably will be passing a pure predicate either way, but there's no reason not to allow an FnMut to be passed, as it is more general than Fn

@felixst-gelais6722 2 жыл бұрын

@@marcospataro4223 I guess so. what I hate about that though is that now the compiler can't tell it you're mutating the state or if your function is pure so it makes some function chaining impossible. it forces you to take ownership or clone at times when you wouldn't need to.