Fashion App Series - Intro
18:05
ChatGPT4 Vision Data Augmentation
10:27
Dave Does Answers Ep1
6:10
A year ago
The Truth About Data
11:11
A year ago
GDPR and cookie dialogs
11:14
2 years ago
What is "Production Data"?
16:46
3 years ago
What is a Data Lake?
10:50
3 years ago
Long term GitHub traffic statistics
17:07
DataOps - Databricks demo fix
1:47
3 years ago
Comments
@Josemartinez-oz4es A month ago
Very helpful! Nice video.
@valmirmeneses A month ago
Great ideas seeded here. Thanks for sharing.
@Lilninj3 A month ago
Thanks Dave, very well and simply explained!
@paulvanputten2009 A month ago
Hi Dave, thanks for the great video. I am trying to conceptually understand MDM, and I have one question. In the video you mention that there is one system that is the single source of truth; in your example, this is the CRM system. Is it possible to design the ESB in such a way that if a record is created in a system that is not the 'main' system, the ESB creates a record in the main system? So, a person creates an account in the web system and this person is not yet in the CRM system. Is it possible that a record is automatically created in the CRM system? The CRM would also add the email and phone number from the web system to this record. I hope the question is clear :)
@DaveDoesDemos A month ago
Hi Paul, thanks for the question. It sounds like you've fully understood the purpose of enterprise integration and ESBs already! Yes, absolutely, that's kind of the purpose. The CRM is the system that owns the truth, but we achieve that by making sure all updates go to it regardless of where they start. If I add a customer account in the web platform, the ESB takes that data and creates (or matches!) the account in the CRM system, which will then update any other systems that may need those details. If you do this well, there is less work matching records when you ingest for analytics, since you already know the data is matched and consistent across systems. What we're avoiding here is the customer having different accounts in different systems, and the same goes for other data like sales, stock, product catalog etc. In retail, product and offer SKUs in particular need to be consistent between systems, and this can be very challenging between logistics and distribution, which deal with a pallet of X, the stock system, which may deal with a tray of X, and the store system, which deals with a single X. All the same product SKU in theory, but plenty of work to do to make the numbers match up. Long story short - your comment was spot on.
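To make the match-or-create flow above concrete, here is a minimal Python sketch of a web sign-up event travelling over a bus into a CRM system of record and then fanning out to other systems. The system names, the email-based matching rule and the ID format are illustrative assumptions, not Dave's actual implementation.

```python
# Minimal sketch of the match-or-create flow described above, using plain
# Python in place of a real ESB. Field names and the matching rule (email)
# are illustrative; real matching rules are owned by MDM/business rules.

crm = {}            # pretend CRM store, keyed by a CRM-issued customer ID
other_systems = []  # stand-ins for downstream subscribers (ERP, marketing, ...)


def handle_account_created(event: dict) -> str:
    """The web platform publishes an 'account created' event onto the bus."""
    # Try to match an existing CRM customer on email.
    match = next((cid for cid, rec in crm.items()
                  if rec["email"] == event["email"]), None)
    if match is None:
        # No existing customer: the CRM (system of record) creates the record.
        match = f"CRM-{len(crm) + 1:06d}"
        crm[match] = {"email": event["email"], "phone": event.get("phone")}
    else:
        # Existing customer: enrich the CRM record with the web details.
        crm[match].update({k: v for k, v in event.items() if v is not None})

    # Fan the mastered record out so every other system stays consistent.
    for publish in other_systems:
        publish(match, crm[match])
    return match


# Example: a sign-up on the web platform ends up mastered in the CRM.
customer_id = handle_account_created(
    {"email": "paul@example.com", "phone": "+31 6 1234 5678"})
print(customer_id, crm[customer_id])
```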
@paulvanputten2009 A month ago
@@DaveDoesDemos Hi Dave, thanks for the response. So conceptually speaking, if you update data in system x this will also get updated in system y, even though system y is the system that 'owns the truth'. Does this also work in practice? We are currently implementing MDM in my organisation, and currently the ESB is being developed in a way that only system y can update data, and these updates will be communicated to other systems. If you update a field in system x, this will be denied. I am not sure I agree with this method. Doesn't this go against the idea of MDM, or is this a viable solution?
@Hellya38 A month ago
That's a very good talk. Just a question about 7:06, where you mention that an update only happens at a single point when one of the services needs to be replaced. I think that makes sense if the event payloads stay the same, but that's probably not the case most of the time, where you still have to update the logic in other services in order to publish/consume the new event payloads. Or did I misunderstand what you were trying to describe?
@DaveDoesDemos A month ago
Thanks for the question. Usually you'd put a translation layer between the service and the ESB to make the data generic and usable. If you integrate directly then you need to build translation layers for all integrated services, but with the ESB you just write one translation layer to the bus, and the interfaces from the bus to other services remain the same. Take a point of sale system: when a basket is processed it might have several fields, one being pID (productID). We might have several systems with fields called p_ID, product_ID, product or productName, but if we translate on the way IN to the service bus to our common productID, each of them will have a standard connector to translate to its own language on the way OUT of the ESB. If we go direct, we need to rewrite them all if we replace the POS system. This is a very simple example, but the same is true of data structure/schema too, and we can translate into something generic and extensible then back again. Hope that makes sense?
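The pID/productID example above can be shown as a pair of per-system field-name maps. The sketch below is a simplified illustration of the translation-layer idea only; the system names and field names are assumptions, and a real ESB would also handle schema versions, validation and transport.

```python
# Sketch of the translation-layer idea: each system gets one connector that
# maps its local field names onto a shared canonical schema on the way IN to
# the bus, and back again on the way OUT. All names here are illustrative.

CANONICAL_KEY = "productID"

# One small mapping per connected system, instead of one per pair of systems.
FIELD_MAP_IN = {
    "pos":       {"pID": CANONICAL_KEY, "qty": "quantity"},
    "warehouse": {"product_ID": CANONICAL_KEY, "units": "quantity"},
    "ecommerce": {"product": CANONICAL_KEY, "productName": "productName"},
}


def to_canonical(system: str, record: dict) -> dict:
    """Translate a system-specific record into the bus's canonical shape."""
    mapping = FIELD_MAP_IN[system]
    return {mapping.get(k, k): v for k, v in record.items()}


def from_canonical(system: str, record: dict) -> dict:
    """Translate a canonical record back into a system's local field names."""
    reverse = {v: k for k, v in FIELD_MAP_IN[system].items()}
    return {reverse.get(k, k): v for k, v in record.items()}


# Replacing the POS system now only means rewriting the "pos" entry above;
# the warehouse and ecommerce connectors are untouched.
basket_line = {"pID": "SKU-1234", "qty": 3}
canonical = to_canonical("pos", basket_line)
print(canonical)                               # {'productID': 'SKU-1234', 'quantity': 3}
print(from_canonical("warehouse", canonical))  # {'product_ID': 'SKU-1234', 'units': 3}
```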
@lxn7404 2 months ago
Maybe my question is stupid, but wouldn't you plug your analytics into your ESB?
@DaveDoesDemos 2 months ago
Many people try and fail. ESB is for operational live information. Analytics is for historical information, and the two are very different in terms of the answers they provide. Live data shows the current state, which is often different from what has happened. While it is possible to take that live feed and process it onto the lake, this often leads to errors in data and is very expensive since you end up replicating your business rules in the analytics solution, doubling the required processing (and therefore cost). As I said though, people continuously try to make this work but I've yet to see it done successfully at scale.
@thommck 2 months ago
You can't say you're "not a fashion person" with that awesome t-shirt on ;)
@contactbryson 3 months ago
The insight the model can provide is so impressive! Great demo, thanks for putting it together.
@illiakaltovich 5 months ago
Thank you for the high-quality video; it was really interesting and insightful.
@wubinmatthew 5 months ago
Very clearly explained, thanks for the information.
@jamesholloway9332 7 months ago
Great video! 'Single source of truth' and 'too much Excel everywhere' being a related one. As you say it sounds very good and gets projects spun up but even after a new central BI platform is built I've rarely seen different departments building their own reports from a shared dataset. I've seen an IT director take away "single source of truth" as an action after boardroom arguments along the lines of "my figures are different therefore your figures are wrong"; which was more of a cultural issue in the boardroom than anything.
@amj8986 7 months ago
Well explained! Thank you so much.
@ashishsangwan5925 9 months ago
@dave - Can you share the code for a ForEach loop to deploy multiple files from a folder? It would be a great help.
@vinodnoel 9 months ago
Thank you so much.
@bunnihilator 9 months ago
How can I put all the files in a new folder every time, with the datetime as its name?
@balawalali679 9 months ago
@dave I am trying to build an analytical DB. Data will be collected from multiple sources, and each source is connected to the others with some master ID. I have just learned master data management, but I wonder how I should design my analytics system according to MDM.
@DaveDoesDemos 9 months ago
Thanks for the comment. The how is pretty easy: you just need IDs to link things together. The difficult part is the business rules that you use to master the data - deciding what to keep, what's overlap or duplication, and what format you want each column in. Start with the system of record generally and work back from there; if there's a valid record then use that and match other systems to it. If you have a record in another system that doesn't have a match, you can create a new record (so don't use the SoR ID key in analytics!). You then optionally feed back that there's a mismatch, while dealing with it gracefully in analytics. You may choose to drop such records and call them invalid, but make sure you document that this is happening so that the data is trustworthy. A lot of this won't be the data team's job to complete; you'll need to work with business owners to understand what they need to see in the end result.
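A minimal sketch of the matching step described above: anchor on the system of record, match other systems' rows to it, mint a surrogate key for analytics (never reusing the SoR's own ID), and flag anything unmatched for feedback. The lower-cased email match is a placeholder for real business rules, and all record shapes are assumptions.

```python
# Build a mastered customer dimension from a system of record plus one other
# source, assigning surrogate keys and recording mismatches for feedback.
import itertools

surrogate = itertools.count(1)

sor_customers = [{"sor_id": "C001", "email": "a@example.com"},
                 {"sor_id": "C002", "email": "b@example.com"}]
web_accounts  = [{"web_id": "W9",  "email": "A@example.com"},
                 {"web_id": "W10", "email": "c@example.com"}]   # no SoR match

# Start from the system of record.
dimension = {c["email"].lower(): {"sk": next(surrogate), **c} for c in sor_customers}
unmatched = []

for acct in web_accounts:
    key = acct["email"].lower()
    if key in dimension:
        dimension[key]["web_id"] = acct["web_id"]          # matched: enrich
    else:
        # Unmatched: create a new mastered row rather than dropping it silently,
        # and record the mismatch so it can be fed back to the source system.
        dimension[key] = {"sk": next(surrogate), "sor_id": None, **acct}
        unmatched.append(acct)

print(list(dimension.values()))
print("to feed back:", unmatched)
```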
@balawalali679 9 months ago
Hi, very informative. I love this video.
@sameerr8849 10 months ago
A simple diagram or flow chart would really help to keep things in mind for a longer time, Dave, so I was hoping that would help.
@MrFCBruges4ever 11 months ago
Great insights! Thanks a lot!
@guillaumeneyret7978 11 months ago
Hello Dave, thank you for your documentation and all your work. I'm currently working on a project where I need to send realtime data (even with a minor delay) from multiple Garmin Watch (Venu Sq 2) sensors (HR, HRV, skin temperature, stress level ...) to my PC. Indeed, my project is to collect different "wellness" data from different users during a meditation, to make a data visualisation of the meditation for each user just after the meditation. So I need to send all my watches' sensor data to my computer. As I am using multiple watches at the same time, I think that I won't be able to use a mobile phone as a proxy device linking the watches to the computer. Do you have a solution in mind? Thank you in advance
@DaveDoesDemos 11 months ago
Hi, thanks for the comment. In theory you could set up multiple watches to do this, as the phone is just there to provide them Internet access. There will be a limit on this though, and I don't know what that limit is in terms of numbers. You could use something like the NPE Wasp to connect sensors directly to the computer; @dcrainmakerblog may have some thoughts on that method. If you get watches with wifi they would all be able to connect directly using the API. If you use my method, your data will end up in a database which you can then use either with PowerBI for visualisation and dashboards, or you could connect Excel or similar to it. Given the scenario I'd use PowerBI and give each watch an ID so you can see each user in real time.
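For illustration only, here is a hypothetical receiving end for this kind of setup: each watch POSTs JSON readings tagged with its own device ID to a small HTTP endpoint, which stores them for visualisation. This is not Dave's actual pipeline (his demo lands the data in a database used with PowerBI); it is a local stand-in using only the Python standard library, and the URL path and payload fields are assumptions.

```python
# Hypothetical local endpoint that accepts sensor readings from multiple
# devices and stores them in SQLite, keyed by a per-watch device ID.
import json
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer

db = sqlite3.connect("readings.db", check_same_thread=False)
db.execute("""CREATE TABLE IF NOT EXISTS readings
              (device_id TEXT, metric TEXT, value REAL, ts TEXT)""")


class ReadingHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Expected body (assumed shape):
        # {"deviceId": "watch-1", "metric": "hr", "value": 62, "ts": "2024-01-01T10:00:00Z"}
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        reading = json.loads(body)
        db.execute("INSERT INTO readings VALUES (?, ?, ?, ?)",
                   (reading["deviceId"], reading["metric"],
                    reading["value"], reading["ts"]))
        db.commit()
        self.send_response(200)
        self.end_headers()


if __name__ == "__main__":
    # Each watch posts to http://<pc-address>:8080/ with its own deviceId.
    HTTPServer(("0.0.0.0", 8080), ReadingHandler).serve_forever()
```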
@guillaumeneyret7978 11 months ago
@@DaveDoesDemos Well, thanks a lot for your quick and clear answer! As I would be using a new Garmin watch with wifi, it is such good news to hear that I won't have to use a phone. I will check the NPE Wasp method and also try using your method. I'll let you know if I manage to make it work (or not ;))! Once again: thank you Dave!
@DaveDoesDemos 11 months ago
@@guillaumeneyret7978 Please do check the API docs, as I'm not 100% certain it works over wifi but I think it does.
@DaveDoesDemos 11 months ago
@@guillaumeneyret7978 The docs say it works on wifi but I have not tested it personally: developer.garmin.com/connect-iq/api-docs/Toybox/Communications.html#makeWebRequest-instance_function
@Taletherapper 11 months ago
Thank you for this!
@user-wn6fw9bv3q A year ago
Excellent, Dave, thanks! I love your videos. Could you help me with how I can use an open source MDM platform for my company?
@DaveDoesDemos A year ago
Hi, thanks for the feedback. Unfortunately I'm only familiar with the Microsoft tooling, so I don't really know the open source options. They all work in a similar way though, so the skills are transferable.
@user-wn6fw9bv3q A year ago
@@DaveDoesDemos Thanks for your reply. Is it possible to help me with how I can set up MDM in your manner (with an ESB)? Is the architecture you define in the video called a Data Hub?
@maskgirl7769 A year ago
Can you please do a quick video on the reverse process, i.e. ADLS to Box via ADF?
@DaveDoesDemos A year ago
Hi, thanks for the comment. Box doesn't have a connector for ADF, so you'd be left with using the API. Generally speaking I wouldn't use ADF for this activity; it's designed to orchestrate your data lake. Instead, you should have an integration layer that updates Box, for instance using a Logic App triggered by a service bus which gets a message when new data is ready (look up Enterprise Service Bus for general info on this approach). I am assuming that Box is being used to deliver data to a customer or partner organisation in this instance; if not, feel free to share more detail.
@adebolaopeyemi1039 A year ago
Well done Dave! Quite what I needed.
@muralijonna5238 A year ago
Thanks for sharing such a wonderful demo. Can you please create a demo on how to create a CI/CD pipeline for an Azure AD access token with a service principal?
@petergamma741 A year ago
The Meditation Research Institute Switzerland (MRIS) would like to thank Dave Does Demos for his great demos on his KZbin channel with the Garmin watch we offered him. He was one of the pioneers who solved the challenging problem of accessing sensor data from Garmin watches. Unfortunately we have to tell him that we have now found a solution with the Apple watch.
@DaveDoesDemos A year ago
Glad you found a solution in the end Peter, I hope the research goes well.
@AlejoBohorquez960307 A year ago
Thanks for sharing such a valuable piece of information. Quick question: what if my workspace is not accessible over the public network and my Azure DevOps is using a Microsoft-hosted pipeline? Any thoughts?
@DaveDoesDemos A year ago
In that case you'd need to set up private networking with vnets. The method would be the same; you just have a headache getting the network working. Usually there's no reason to do this though. I would recommend using cloud native networking, otherwise you're just adding operational cost for no benefit (unless you work for the NSA or a nuclear power facility...).
@AlejoBohorquez960307 A year ago
Yeah! We are facing that scenario (customer requirement). Basically, the Azure DevOps Microsoft-hosted agent (and because of that the release pipeline), wherever it gets deployed on demand, needs to be able to reach our private Databricks cluster URL passing through our Azure firewall. So far I haven't got any strategy working for this. I would appreciate it if you know of some documentation to take a glimpse at. Thanks for answering. New subscriber!
@DaveDoesDemos A year ago
@@AlejoBohorquez960307 Sorry, I missed the hosted agent part. Unfortunately I think you need to use a self hosted agent on your vnet to do this, or reconfigure Databricks to use a public endpoint. It's very normal to use public endpoints on Databricks; we didn't even support private connections until last year, and many large global businesses used it quite happily. I often argue that hooking it up to your corporate network poses more of a risk, since attacks would then be targeted rather than random (assuming you didn't make your url identifiable, of course).
@daverook3346 A year ago
It feels odd to see so much data duplicated (in the operations side). I wonder what the advantage is of having duplicated/synced data vs references to a single source of truth - it also has a familiar feeling with Domain driven design (if I've understood it right). Thank you
@DaveDoesDemos A year ago
Data is always replicated on the operational systems. If you were starting from scratch and writing your own software then maybe you'd get away with it, but in the real world that doesn't happen (Amazon might be an exception there when they originally set up the book shop). As such, your warehouse system, stock system and POS system would all have their own lists of products as an example, and they usually can't use an external source for this. The ESB then gets used to ensure they all get up to date information as it changes - update any one system and the others get the changes. Single source of truth is more of a mantra than a reality, and it often causes more work than dealing with multiple copies of information. We may sometimes keep a reference set of data which would be the source of truth, but this is usually also updated by ESB. Some people then leap to a conclusion that systems should talk directly for updates, but this would multiply out the number of touchpoints and cause more work in the long run, hence we use an ESB to abstract each connection to a middleman system (the ESB) and then create a connector to each other system. We can then change out systems easily without rewriting code all over the place. The approach is also useful in larger businesses or after merger activities where you may have several of each type of system - nobody ever tidies up an environment fully! Hope that made sense, happy to add more detail.
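The "multiplying touchpoints" point above can be put in numbers: with direct integration every pair of systems needs its own interface, whereas an ESB needs one connector per system. The system count below is just an example for illustration.

```python
# Point-to-point interfaces versus ESB connectors for an example estate.
n = 8                                  # systems in the estate (illustrative)
direct = n * (n - 1) // 2              # unique point-to-point interfaces
via_esb = n                            # one connector per system to the bus
print(direct, via_esb)                 # 28 vs 8
```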
@maheshkumarsomalinga1455 4 months ago
@@DaveDoesDemos Thanks for this fantastic video. Talking about SSOT, could you help clarify the below (quite a few queries...)?
1) How is MDM different from SSOT?
2) Is MDM focused only on master data such as Customers, Locations, Products etc., whereas an SSOT can also contain transactional data?
3) I have come across articles mentioning SSOT as an aggregated version of the data. What does that mean exactly?
4) If EDW was considered an SSOT earlier, why is it not so now?
5) It would be great if you could bring up a video on SSOT too in the future.
Thank you.
@DaveDoesDemos 4 months ago
@@maheshkumarsomalinga1455 MDM means different things, but ultimately it ends up with SSOT one way or another. Sometimes you may also see "master data" created from other sources as reference data separately to systems, and this is another valid use of the term, but generally this is used as a reference to check against, or a purer source, rather than something actively used. For instance you may have a master data list of your stores, which wouldn't include unopened new ones or ones that have permanently closed, but is a current master list of open active stores. You may choose to have multiple master data lists with different purposes too, such as a store list including those that have closed or are yet to open.
SSOT is not usually aggregated; it's just the single place you go to for the truth. That might mean aggregation sometimes, but it could mean that sales system 1 is the SSOT for subsidiary 1 and sales system 2 is the SSOT for subsidiary 2, while you may also have a data warehouse which is the SSOT for both subsidiaries for reporting purposes. In all scenarios the SSOT is the defined place which has the correct version of data for the defined use-case. As explained in my other video (The Truth About Data), the sales system might not have "the truth" that a CFO is looking for when speaking to the markets, since sales data can and does change over time with returns, refunds etc.
EDW can be a SSOT for reporting purposes, but never make the mistake of thinking it's a single SSOT. The systems of record are SSOTs for current live data; the EDW is a SSOT for historical facts. Importantly, if you have an item returned in retail a month after purchase, your EDW data will change retrospectively and therefore the truth will change. EDW may also report different truths - if you have an item sold then returned, you did still make a sale, so marketing need to know a sale was made. You also had a return, so you'd want to know there was a return so you could do analytics on that. You also didn't make money, so did you make a sale or not? There are lots of truths in data depending on your perspective, but the sales system will only care about the truth right now - you didn't make a sale. Then there's the stock system - is the returned item in stock? It was sold, so no. It was returned, so yes. It may be damaged, so... maybe? Check out my other video at kzbin.info/www/bejne/gGqplYCrhtqnhJo&t
@maheshkumarsomalinga1455 4 months ago
@@DaveDoesDemos Thanks Dave for the detailed explanation! In a way, it has made me think differently (rather more broadly) about SSOT now, leading to more doubts. Let me read the details again to digest further... Your efforts are greatly appreciated. I went through your other video (The Truth About Data) too... Found it helpful...
@Ikilledthebanks A year ago
Dev data should be representative; ours is always missing fields.
@vkincanada5781 A year ago
Can you please make a video on "Databricks Code Promotion using DevOps CI/CD" using the Pipeline Artifact YAML method?
@DaveDoesDemos A year ago
Hi, the methods would be identical using YAML so in theory you should be able to Google for the code examples. I have a strong preference against YAML for data pipelines in any team that doesn't have a dedicated pipeline engineer. Data teams simply don't need the stress of learning yet another markup language just to achieve something there's a perfectly good GUI for. Data deployment pipelines don't change often enough to make YAML worthwhile in my opinion. The time is better spent doing data transformation and modelling work.
@murataydian A year ago
Thank you, Dave!
@optimastic811 A year ago
Please cover the rest of the data management applications, like data lineage, reference data management and metadata. Thanks in advance.
@Me-op9zm A year ago
Can you teach us how to redirect the website after submitting the form? Thanks
@DaveDoesDemos A year ago
Hi, if you wanted to do this I would recommend using JavaScript to submit the data rather than HTML; there are lots of examples of JavaScript forms which don't redirect the page when submitted. The demo was showing that very basic HTML forms can submit data in a very modern way, but I wouldn't necessarily do it this way in the real world. Redirecting isn't possible here since the form fires into Logic Apps and there's no way to then respond with a redirect. I created the demo to help people understand the connection between HTTP and APIs, as many see them as very separate things but in reality they are all really basic fundamentals of the web. Thanks for the comment - I can't believe you're the first to mention this in three years, as it's a really important topic and affects usability. The original demo I knocked up while on the stand at SQL Bits to show real-time data processing and allowed the crowd to submit data; it was clunky but people loved the simplicity.
@omarrose1196 A year ago
Dave, I could kiss you. Thank you!
@nitindhingra2925 A year ago
Excellent Dave, many of my queries got resolved. Keep it up.
@runilkumar3127 A year ago
Hi Dave, thanks a lot. Can you please help me? When importing the notebook in the Databricks UAT environment, if I use the lines below I get an error. If I comment out the code below, then the notebook is created without its code. Please advise.
# Open and import the notebook
$BinaryContents = [System.IO.File]::ReadAllBytes($fileName)
$EncodedContents = [System.Convert]::ToBase64String($BinaryContents)
@shibashishvlogging A year ago
I am facing an issue while loading the data into Power BI from Cosmos Gremlin. The error I am facing is: "This query does not have any columns with the supported data types. It will be disabled from being loaded to the model." Any suggestions why it is happening? I followed the same steps which you showed.
@tarunacharya1337 A year ago
Awesome demo Dave, thanks a lot. I have replicated this and it works OK with one notebook in the same environment, but the file name is hardcoded - $fileName = "$(System.DefaultWorkingDirectory)/_Build Notebook Artifact/NotebooksArtifact/DemoNotebookSept.py". How can I generalise this for all the files and folders in the main branch, and what happens to $newNotebookName in this case?
@DaveDoesDemos A year ago
Hi, glad you enjoyed the demo. I'd recommend looking at the newer Databricks methods, which I've not had a chance to demo yet; these allow you to open a whole project at a time. For my older method you'd want to list out the contents of the folder and iterate through an array of filenames. In theory, since you'll want your deploy script to be explicit, you could even list the files in the script using copy and paste, although this may get frustrating in a busy environment.
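One way to generalise the hardcoded $fileName from the question above is to walk the artifact folder and import every notebook through the Databricks Workspace API, mirroring what the PowerShell in the video does for a single file. The sketch below is in Python rather than PowerShell; the workspace URL, token handling, folder names and target path are assumptions for illustration, and you should check it against the Workspace API docs for your workspace before relying on it.

```python
# Import every .py notebook in a build artifact folder into a Databricks
# workspace folder via the Workspace API (/api/2.0/workspace/import).
import base64
import pathlib
import requests

HOST = "https://adb-1234567890123456.7.azuredatabricks.net"   # hypothetical workspace URL
TOKEN = "dapiXXXXXXXXXXXXXXXX"                                 # supply via a pipeline secret variable
SOURCE_DIR = pathlib.Path("NotebooksArtifact")                 # the published build artifact
TARGET_DIR = "/Shared/deployed"                                # workspace folder to deploy into

for nb in sorted(SOURCE_DIR.rglob("*.py")):
    # Equivalent of the ReadAllBytes / ToBase64String lines in the PowerShell.
    content = base64.b64encode(nb.read_bytes()).decode("ascii")
    target_path = f"{TARGET_DIR}/{nb.relative_to(SOURCE_DIR).with_suffix('').as_posix()}"
    # Note: nested workspace folders must already exist (see /api/2.0/workspace/mkdirs).
    resp = requests.post(
        f"{HOST}/api/2.0/workspace/import",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"path": target_path, "format": "SOURCE", "language": "PYTHON",
              "content": content, "overwrite": True},
    )
    resp.raise_for_status()
    print(f"Imported {nb} -> {target_path}")
```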
@sudheershadows1032 A year ago
Could you please explain more about the binary contents in the PowerShell script?
@gayathrivenkata621 A year ago
I just want to know if I have to replace the SFTP with another source - is that possible? Actually the condition is that you shouldn't use blob storage. Is there any other way to replace the SFTP so that I can upload my data automatically to Azure? Could you please help me with this?
@greatladyp6632 A year ago
Do you do trainings?
@goofydude02 A year ago
18K+ views but only 900 subscribers - why? If you are watching the content, there's no harm in subscribing, right?
@JohnMusicbr A year ago
Awesome. Thanks, Dave.
@hectorvillafuerte8539 A year ago
Just a minor recommendation: talk about the step you are doing. You are talking and the mouse is doing something else. Are you doing your videos?
@DaveDoesDemos A year ago
This was a very early video, I later changed the way I record and speak so hopefully the newer ones are better :)
@misterliver 2 years ago
Thanks for the video Dave. It has been very helpful for me; there isn't much out there about Databricks CI/CD. After adapting to your stream-of-consciousness style, it seems the presentation of ideas vs the actions in the video are totally out of sync from a scripting perspective. If viewers have some experience with the CI/CD process in Azure DevOps already, this probably is not a blocker, but it could be a little difficult with no experience (the target audience?) or if English is not your first language.
@DaveDoesDemos 2 years ago
Hi Benjamin thanks for the comment and feedback. It's a difficult subject to cover well as most data people see CI/CD as scripted deployment, which is very easy. I wanted to cover it in the way it's intended which required a little more understanding of the collaborative nature of CI/CD and Git, and DevOps in general. I'm working on a bunch of new content in Microsoft UK around collaborative DevOps, testing and more agile data architectures with this stuff in mind and hopefully will translate these to some more up to date videos later in the year. This is borne out of seeing large mature customers hitting operational scale issues as data pros work in traditional ways. It's a long road though!
@arpitapramanik4679 2 years ago
Now in 2022 it is failing with an error like "resource can't be deployed in East US".
@DaveDoesDemos 2 years ago
Hi, thanks for the feedback. What's the specific error? Is it because there aren't enough resources available, or is one of the items no longer available? Does it work in other regions?
@arpitapramanik4679 2 years ago
The existing template is not allowing me to change the region, and the error says the particular resource size is not available in East US; please select a different region.
@DaveDoesDemos 2 years ago
It might be that you're deploying to a resource group in that region. Try creating a new resource group in another region to deploy to. The region isn't hard coded so you should be able to change this. You could also edit the template to change the size of the VM. Unfortunately I don't have time right now to change and test this, otherwise I'd go in and check and update it.
@rankena 2 years ago
Hi, I assume this works only if orders are not updated, and they do not create orders where [date] has already passed (today is 2022.05.17, but the order is created for 2022.05.10). In such cases you will not receive new records, nor any updates. Any suggestions on videos or links on how to manage the directory structure when data is updated and created "back in time"?
@DaveDoesDemos 2 years ago
Hi, what you're talking about is known as "restatement" in retail and yes, this is designed for that scenario. We use tumbling windows precisely because we can re-run a given day/hour/month when a restatement is issued. You simply need to make the downstream pipelines in such a way as to allow that too. Essentially your pipelines should be able to rebuild the whole dataset any time from raw data, or any part of it without affecting the whole. A lot of data folk go down a different path and use the tools as a scheduler and try to do the processing manually - this ends up massively overcomplicated and time consuming as well as harder to change. Hope that helps
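A minimal sketch of the tumbling-window idea described above: each run (or re-run) is handed a window start, derives the partition path from the window, and rebuilds just that partition from raw data, so a restatement simply means re-running that one window. The folder layout and record shape are illustrative assumptions, not the pipeline from the video.

```python
# Rebuild one daily partition from raw records for a given tumbling window.
from datetime import datetime, timedelta


def partition_path(window_start: datetime) -> str:
    """Deterministic output path for a daily tumbling window."""
    return f"curated/orders/{window_start:%Y/%m/%d}/"


def process_window(raw_records: list[dict], window_start: datetime) -> tuple[str, list[dict]]:
    window_end = window_start + timedelta(days=1)
    # Keep only the records belonging to this window (by OrderDate here;
    # a changed-data feed would filter on LastUpdated instead).
    in_window = [r for r in raw_records
                 if window_start <= r["order_date"] < window_end]
    return partition_path(window_start), in_window


raw = [{"order_id": 1, "order_date": datetime(2022, 5, 10)},
       {"order_id": 2, "order_date": datetime(2022, 5, 17)}]

# A restatement for 2022-05-10 arrives? Just re-run that window and overwrite
# its partition; the rest of the dataset is untouched.
path, rows = process_window(raw, datetime(2022, 5, 10))
print(path, rows)
```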
@rankena 2 years ago
@@DaveDoesDemos Maybe retail is a bad example, but let's say you have a table that is constantly updated and you can rely only on a "LastUpdated" column which indicates when the row was updated/created. Now if you do a tumbling trigger on the LastUpdated column, that would force you to create files and a directory structure based on that column, which leads to a bad structure, because nobody queries by "LastUpdated"; they query by "OrderDate", "PostingDate", etc. What could be done, I guess, is to reload every single day (for example by "OrderDate") which has at least one updated record. The question is: should I overwrite the existing files for those days in blob storage, or place them as new, thus keeping some kind of history, but also introducing duplicates...
@DaveDoesDemos 2 years ago
@@rankena In that case even better. The LastUpdated column allows you to only process the changes within the period of the tumbling window, so you get a changed-data feed and update your model with the new data. The analytics solution will then contain the up-to-date version of the data. This can also work with a Data Vault approach where you capture all of the new data and use SCD to enable building snapshots of any given time.
@DaveDoesDemos 2 years ago
@@rankena Also worth mentioning you don't have to use it this way, you can use the other types of trigger. I chose to show this because most traditional data people won't understand the tumbling window approach which this is designed around.
@jamespyeatt8368 2 years ago
When I try to set up the dataset for SFTP it wants me to choose options like .CSV, Binary, Excel etc. I'm getting different options than in the video (time 9:25). Can anyone help?
@DaveDoesDemos 2 years ago
Hi James, this is because the video is a bit out of date compared to the current interface. I believe they just merged a couple of things and simplified so you set up the file type on that page instead. Choose CSV for the demo, or whatever you're using in your environment. Binary will ignore the contents and just copy files. Hopefully the rest is self explanatory but let me know if you need any pointers.
@rebeccaperkins7504 2 years ago
Go Dave