Scheduling Notebooks in Microsoft Fabric + Reading JSON from Dynamic File Paths

4,859 views

Learn Microsoft Fabric with Will

1 day ago

Comments
@pphong 1 year ago
Hey Will! I appreciate your detailed walkthrough of the code. The practical examples on notebook usage, data pipelines, and scheduling were very insightful, mirroring what we'd do as data engineers. Thanks!
@LearnMicrosoftFabric 1 year ago
Gotta keep it practical! Thanks for watching :)
@AmritaOSullivan 1 year ago
Thank you for this additional video!! It’s so super helpful and you explain concepts so simply.
@LearnMicrosoftFabric 1 year ago
I’m glad you’re finding them helpful! Thanks for asking some great questions!
@ryanmeyer506 1 year ago
Great videos. I'm glad to see someone making good Fabric content.
@LearnMicrosoftFabric 1 year ago
Thanks 🙏 Lots more to come, there’s so much to learn with Fabric!
@djdopus 7 months ago
Great vid! I'd love to see you do this with a SharePoint source. I use a lot of Power Automate flows to get my data into lists in a semi-structured way; doing this in Data Factory and pushing it out to business users, as well as serving it as a Power BI source, would be my end goal.
@robertbarkovicz800 4 months ago
Hello Will! Thank you for your effort! I would like to understand how this is handled in the real world. When aiming for robustness and "self-healing," isn't it common to process all unprocessed files, rather than just the file from the current day? For example, what happens if there was an issue over the weekend or something similar? Regarding this kind of logic: Is it typical to move processed files to a different folder structure, or is it more common to keep track of which files were successfully processed by writing to a control file? Are there any other common mechanisms for this? If you have any references or examples related to these questions, I would greatly appreciate it. Thank you so much for your response!
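For reference, one common way to get that "process everything not yet processed" behaviour is to keep a small control table of file paths already loaded and, on each run, handle only the landed files missing from it. Below is a minimal PySpark sketch of that idea, not code from the video; the folder layout and the `processed_files_control` / `sales_bronze` names are illustrative assumptions.

```python
# Hypothetical sketch: load every landed JSON file that is not yet recorded
# in a control table, so missed days are picked up automatically.
# Assumes a Fabric notebook with a default lakehouse attached; names are illustrative.
from notebookutils import mssparkutils
from pyspark.sql import functions as F

raw_files_path = "Files/raw/sales"          # landing folder (assumed layout)
control_table = "processed_files_control"   # Delta table of processed paths (assumed)
bronze_table = "sales_bronze"               # destination table (assumed)

# 1. List everything currently sitting in the landing folder
landed = [f.path for f in mssparkutils.fs.ls(raw_files_path) if f.name.endswith(".json")]

# 2. Load the set of paths already processed (empty on the first run)
if spark.catalog.tableExists(control_table):
    processed = {row.path for row in spark.table(control_table).select("path").collect()}
else:
    processed = set()

# 3. Process only the new files, then record each one in the control table
for path in [p for p in landed if p not in processed]:
    spark.read.json(path).write.mode("append").saveAsTable(bronze_table)
    (spark.createDataFrame([(path,)], ["path"])
        .withColumn("processed_at", F.current_timestamp())
        .write.mode("append").saveAsTable(control_table))
```

Moving processed files into an archive folder (for example with `mssparkutils.fs.mv`) is an equally common alternative; the control-table approach simply leaves the raw landing zone untouched.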
@evogelpohl 12 days ago
For your dynamic JSON, why not just use Spark Structured Streaming? Wouldn't it just figure out what new files are in your /files/%partition from the last checkpoint?
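For context, a minimal sketch of what that streaming approach could look like in a Fabric notebook: the file source plus a checkpoint makes Spark itself track which files have already been read. The schema, folder, and table names below are assumptions for illustration, not from the video.

```python
# Hypothetical sketch: let Structured Streaming's file source + checkpoint
# decide which JSON files are new, instead of computing today's partition path.
from pyspark.sql.types import StructType, StructField, StringType, DoubleType

# A streaming file source needs an explicit schema (fields are illustrative)
schema = StructType([
    StructField("id", StringType()),
    StructField("value", DoubleType()),
    StructField("event_time", StringType()),
])

stream = (
    spark.readStream
    .schema(schema)
    .json("Files/raw/sales/")                 # assumed landing folder, all partitions
)

query = (
    stream.writeStream
    .format("delta")
    .option("checkpointLocation", "Files/checkpoints/sales")  # remembers processed files
    .trigger(availableNow=True)               # process whatever is new, then stop
    .toTable("sales_bronze")                  # assumed destination table
)
query.awaitTermination()
```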
@misoizi 8 months ago
Thank you for the explanation using a practical example! Wouldn't it be more efficient and more maintainable to perform both steps in a single Dataflow Gen2, instead of generating the JSON files (pipeline step 1) and then reading them with the notebook and appending the data to a table (pipeline step 2)? In a Dataflow Gen2 the file handling would be unnecessary, the append functionality is also available there, and you have everything in M code in one place (maintainability). The dataflow schedule can then be orchestrated by a pipeline as well.
@LearnMicrosoftFabric 8 months ago
Yes, in Fabric there are normally two or three different ways of doing something. In this video, I wanted to show the notebook approach. It has the benefit that the raw JSON files are saved, plus it would be possible to test and validate (not really possible with a dataflow) 👍
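For readers skimming the thread, the notebook pattern under discussion boils down to something like the sketch below: build the current day's folder path, read the JSON landed there, and append it to a lakehouse table. The folder layout and table name here are assumptions, not the video's exact code.

```python
# Hypothetical sketch of the dynamic-path read: compute today's folder,
# read the JSON the pipeline just landed there, append to a lakehouse table.
from datetime import datetime, timezone

today = datetime.now(timezone.utc)
dynamic_path = f"Files/raw/api/{today:%Y/%m/%d}/"   # e.g. Files/raw/api/2023/08/17/ (layout assumed)

df = spark.read.option("multiline", "true").json(dynamic_path)
df.write.mode("append").saveAsTable("api_bronze")   # table name assumed
```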
@reedoken6143 5 months ago
Hi Will! I've got a notebook set up to collect GTFS-RT (real-time bus location and trip data) from a protobuf feed within Fabric. I had this successfully running on a schedule every couple of hours, but realized I needed to start collecting it more frequently, every couple of minutes, to do the needed level of analysis. However, it looks like the time it takes to deallocate and reallocate a Spark session for the notebook is longer than the time between my scheduled runs. The solution might just be to have the data collection portion of the notebook run on a loop throughout the day, and then have the notebook scheduled to run just once a day, but I was wondering if you had any other ideas, or if you know of another method for getting protobuf data into a Fabric lakehouse without the need for a notebook and its Spark session? Thanks!
@LearnMicrosoftFabric 5 months ago
Take a look at Eventstream.
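Eventstream is the managed route; for completeness, here is a rough sketch of the loop idea from the question, where one daily-scheduled notebook keeps a single Spark session alive and collects every few minutes. The endpoint URL, interval, and table name are placeholders, not a recommended or tested setup.

```python
# Hypothetical sketch: collect a protobuf feed every few minutes inside one
# notebook run, avoiding repeated Spark session start-up. Names are placeholders.
import time
from datetime import datetime, timezone

import requests

FEED_URL = "https://example.com/gtfs-rt/vehicle-positions"  # placeholder endpoint
INTERVAL_SECONDS = 5 * 60
RUNS_PER_DAY = (24 * 60 * 60) // INTERVAL_SECONDS

for _ in range(RUNS_PER_DAY):
    collected_at = datetime.now(timezone.utc).isoformat()
    payload = requests.get(FEED_URL, timeout=30).content     # raw protobuf bytes
    # Land the raw bytes; decoding (e.g. with gtfs-realtime-bindings) can happen downstream
    (spark.createDataFrame([(collected_at, bytearray(payload))], ["collected_at", "raw"])
        .write.mode("append").saveAsTable("gtfs_rt_raw"))     # table name assumed
    time.sleep(INTERVAL_SECONDS)
```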
@niteshmishra6932 27 days ago
Can you create a step-by-step video on the above topic?
@ismailbartolo9741 7 months ago
Hello Will, I would like to connect to Microsoft Fabric using a Copy activity to copy my collection, but I'm encountering this error. I likely have an issue with permissions, I suppose? Or perhaps I need to set up a private endpoint? I'm not sure. Thank you for your assistance.
@LearnMicrosoftFabric 7 months ago
Please join the community to ask questions like these - skool.com/microsoft-fabric
@anitatrpenoska8739 7 months ago
Great explanation! Thank you for the awesome video. ✨✅🥇
@jonskaggs2891 1 year ago
Can we take the folder created and pass it as a dynamic parameter to the notebook? E.g. the first Copy data step in the pipeline ingested the data into an ADLS folder or unmanaged file path of 2023/08/17; rather than recalculating the date folder structure in the notebook function, can we pass the folder created during the ingest to the notebook activity?
@LearnMicrosoftFabric 1 year ago
We can! See this video: kzbin.info/www/bejne/Y56agXismLZgocU
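In short, the pipeline's Notebook activity can pass the folder it just wrote (e.g. 2023/08/17) as a base parameter, and the notebook picks it up in a parameter cell. A minimal sketch follows, with the parameter name, folder layout, and table name assumed for illustration.

```python
# Parameters cell (mark it via "Toggle parameter cell" in the Fabric notebook).
# The pipeline's Notebook activity overrides ingest_folder through its base parameters,
# using the same expression the Copy activity used to build the folder.
ingest_folder = "2023/08/17"   # default for interactive runs; overridden by the pipeline

# Read whatever the pipeline just landed in that folder and append it
df = spark.read.json(f"Files/raw/api/{ingest_folder}/")   # folder layout assumed
df.write.mode("append").saveAsTable("api_bronze")          # table name assumed
```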
@raji-l6w 10 months ago
Is it possible to implement an event-based trigger in MS Fabric?
@LearnMicrosoftFabric 10 months ago
Not currently, no, but I believe it's on their roadmap: ideas.fabric.microsoft.com/ideas/idea/?ideaid=f6140f08-3152-ee11-a81c-000d3a0e5125
Data pipeline vs Dataflow vs Shortcut vs Notebook in Microsoft Fabric
31:16
Learn Microsoft Fabric with Will
26K views
Using Fabric notebooks (pySpark) to clean and transform real-world JSON data
17:42
Learn Microsoft Fabric with Will
7K views
Create your FIRST Lakehouse in Microsoft Fabric
7:39
Guy in a Cube
50K views
Extract and Load from External API to Lakehouse using Data Pipelines (Microsoft Fabric)
16:50
Learn Microsoft Fabric with Will
17K views
How to Pass Parameters from Pipelines to Notebooks in Microsoft Fabric!
6:09
Lakehouse data validation with Great Expectations in Microsoft Fabric
36:18
Learn Microsoft Fabric with Will
6K views
Microsoft Fabric Lakehouse Tutorial
25:57
Learn Microsoft Fabric with Will
8K views
Trigger Fabric Notebook or Dataflows With Power Automate
9:36
Hoosier BI
3.2K views
Microsoft Fabric: How to load data in Lakehouse using Spark; Python using the notebook
24:44
Learn Microsoft Fabric, Power BI, SQL Amit Chandak
10K views