Comments
@Kira-ji5pr 11 hours ago
I’m thinking of switching from full stack to data engineering. Any advice?
@William-B 23 hours ago
We’re a young data team for a large organization. Biggest roadblocks for us are issues with data governance (“you can’t have or report on our data”), budget for tooling (“prove the value of the tool, then we can purchase it”), and cloud concerns (“all my data is on-prem. You can’t just put it in the cloud”)
@smrtysam 1 day ago
This has happened to me. Now I’m leading a team of data scientists, engineers, analysts and migration specialists. I’ve had to learn so much so quickly about strategy and people management. I’ve had to coach the people on my team to feel empowered and really own their tasks. At the beginning of being head of data I was taking on way too many “low level tasks”. Now I’m delegating and empowering. I still have a lot to learn though.
@crisithink9509 1 day ago
I wonder how much Data God has in the Aether/Astral realm 🤔
@SeattleDataGuy 1 day ago
If you're looking for help setting up your data team and strategy, then feel free to set up a free consultation here - calendly.com/ben-rogojan/consultation
@Ian-vh2vv 1 day ago
Just went through this process with my company over the past year. Great video. With us it went something like:
- Where is all of our data?
- How are we doing reporting now?
- What are the shortcomings of existing reporting solutions?
- Do we need a warehouse? (Yes)
- What warehouse do we pick?
- What ETL stack makes sense for our use case?
- What do we integrate, and in what order, to maximize value and get adoption rolling?
Also, having someone at the exec level champion the BI effort and really push it forward was huge for the thing to actually materialize.
@SeattleDataGuy 1 day ago
Thanks for sharing! I really appreciate it when people add more context and their own experiences. Were there any gotchas you ran into while going through this process?
@baw5xc333 17 hours ago
How long did this rollout take?
@Ian-vh2vv 15 hours ago
@baw5xc333 About 6 months from step 1 until I started development (creating the first Snowflake table and integrating our first source system).
@sirus312 1 day ago
I keep hearing from top CEOs that with Palantir we don't need teams anymore
@SeattleDataGuy 1 day ago
I'd love to believe this! I guess the reason I have a hard time believing it is that I know there are lots of consultants who work in the space of setting up Palantir, which suggests that it still requires technical skills to set up and work with (also based on a few conversations I have had with people working with Palantir). But always happy to be wrong.
@hakeem1340 1 day ago
Thank you for sharing
@SeattleDataGuy 1 day ago
Thank you for watching!
@hantt 2 days ago
The DE role should not exist. It should just be SDEs who also own data as a product. Kind of like front end and back end, there will be a data-focused engineer that we can call a data engineer. Oh wait
@nathannguyen2041 3 days ago
Hm. Makes me think that I should DM the data engineer that I vaguely know and have communicated with once or twice on Slack about what kind of work he does and if I would be able to work on low priority projects. Any recommended ice breakers?
@crypt_hodl 3 days ago
Interested! Can you please offer special pricing for people in Africa? A 50% reduction is good, but our earnings are far lower, probably 20x less than those in the US or Europe. That makes it difficult for us to take part in good courses like this. Any help would be appreciated! Thanks.
@madihenry7861 3 days ago
Hi! Can you please share the full screen for what you have typed under the config_file?
@data-dynamo-guy 4 days ago
I also find myself building stuff rather than analyzing business problems @@
@SeattleDataGuy 3 days ago
It's always interesting how we all come to the same conclusion, thanks for watching!
@Aristocle 5 days ago
Is there a service or scripting language that allows me to write relationships between tables/databases in a modern material design style?
@serk-s 5 days ago
Man, you really need to stop pitching your voice higher at the end of your sentences :(
@SeattleDataGuy 3 days ago
Fair enough. On the flip side, I have picked up a vocal fry trying to do that lol
@richardmartin6605 6 days ago
Would love to see article reviews!
@SeattleDataGuy 3 days ago
Awesome! Any particular articles?
@initialb811 6 days ago
This is really awesome. Would love to see more of this!
@SeattleDataGuy 3 days ago
this is one of my all time faves
@TJInTech10 6 days ago
thx for breaking it down
@SeattleDataGuy 3 days ago
Glad you found it helpful!
@TJInTech10 3 days ago
@SeattleDataGuy Yes, thx. I'm trying to understand how knowledge graphs and vector DBs will integrate into this too. Is it safe to assume both will be essential pieces of the enterprise AI layer/stack now being invested in heavily, or do you see one being more relevant in the next 2-5 yrs?
@AnalyticsEngineer-hg3to 10 days ago
Don’t just be a task taker, be a strategic player.
@SeattleDataGuy 3 days ago
thanks for reading my articles and watching my videos!
@B-gaming930-fl5qr 11 days ago
E5 is where it's at 750 Million 😂
@osoucy 11 days ago
To me, one of the main benefits of Spark Structured Streaming is that you can easily switch between near real-time (micro-batches) and scheduled batch processing without having to rewrite a single line of code. This is a very effective way of scaling up and down and balancing cost vs. latency.
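A minimal sketch of the switch described in the comment above, assuming PySpark 3.3 or later (where the `availableNow` trigger exists). The helper names `trigger_options` and `start_query`, the Parquet sink, and the paths are hypothetical; only the trigger changes between the two modes, while the transformation that produced `stream_df` stays the same.

```python
def trigger_options(mode, interval="1 minute"):
    """Map a run mode to keyword arguments for DataStreamWriter.trigger()."""
    if mode == "streaming":
        # Keep the query running and fire a micro-batch every `interval`.
        return {"processingTime": interval}
    if mode == "batch":
        # Process everything available right now, then stop. Suitable for
        # a job that an external scheduler launches periodically.
        return {"availableNow": True}
    raise ValueError(f"unknown mode: {mode!r}")


def start_query(stream_df, mode, target_path, checkpoint_path):
    """Start the same streaming write in either mode.

    `stream_df` is assumed to be a streaming pyspark.sql.DataFrame; the
    checkpoint lets a scheduled run resume where the last one stopped.
    """
    return (
        stream_df.writeStream
        .format("parquet")
        .option("checkpointLocation", checkpoint_path)
        .trigger(**trigger_options(mode))
        .start(target_path)
    )
```

Under these assumptions, moving between low latency and low cost comes down to passing `"streaming"` or `"batch"` as `mode`.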
@cestlachance7575 13 days ago
Is this really a good video? I feel like he just namedrops every tech
@moussaelaqqaoui 13 days ago
Hello Ben, can we have a discussion please?
@DataPains 13 days ago
Great video! Thank you for sharing!
@SeattleDataGuy 3 days ago
Thanks for watching!
@danhorus 15 days ago
13:03 in Spark, we avoid Python UDFs like the plague because they're much slower than native Spark code. I wonder if the same is true for Flink, given that it also runs on JVMs. A quick Google search indicates that vectorized UDFs are a thing in Flink too, so I assume the same limitations apply
@SeattleDataGuy 15 days ago
Thanks for the added context! It's much appreciated. Now I'm wondering whether I have ever had a good experience with a UDF 🤣. I always remember touting them, but even in the one case where I do recall trying one out on SQL Server, we found it slow.
@danhorus 15 days ago
@SeattleDataGuy With Spark, there are several ways to write transformations. By far, the best option is to use native Spark functions, as they compile to highly optimized and parallelized Java bytecode. The second best option is to write UDFs in Scala or Java, as everything still runs in the same JVM. The third best option, in case you want/need to use Python, is to write a vectorized UDF (also known as Pandas UDF), which leverages Apache Arrow to move data between the JVM and the Python interpreter in batches. Finally, as a last resort, you can use regular Python UDFs, but they're a lot slower because they basically compute results row by row rather than in big batches. If you have slow Spark jobs using Python UDFs, refactoring them is usually a good way to gain some performance. About this blog post, I'm not sure the author is aware of this limitation, but if they need this code to run very, very fast, they should probably avoid Python UDFs too.
@danhorus 15 days ago
@SeattleDataGuy I wrote a long comment about the different types of UDFs in Spark, but apparently YouTube decided to delete it. Maybe you'll find it marked as spam, lol
@SeattleDataGuy 15 days ago
@danhorus Did you put a URL in it? That seems to be the main reason I have seen YouTube flag things as spam. I'll look.
@danhorus 14 days ago
Not really, but let's try again, haha. In Spark, there are many ways to apply data transformations. By far the best option is to use native Spark functions, as they compile to highly optimized/parallelized Java bytecode. The second best option to maximize performance is to use Scala or Java UDFs, as they run inside the JVM with a minor performance hit. The third option, if you want/need to use Python, is to write a vectorized UDF (also known as Pandas UDF), which leverages Apache Arrow to transfer big batches of records to the Python interpreter and back to the JVM after processing. Finally, the last option you should consider is the regular Python UDF, as it basically transforms row by row and has much worse performance as a result. If you have a slow Spark job, refactoring Python UDFs can make it a lot faster. I'm not sure the authors of the blog post are aware of this, but they can probably make their code faster too.
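To make the ranking above concrete, here is a rough PySpark sketch that writes one transformation (trim and upper-case a string column) three ways: native functions, a vectorized pandas UDF, and a plain Python UDF. The Scala/Java UDF option is left out because it cannot be shown in Python. The column name `name`, the helper names, and the output column names are all hypothetical, and the sketch assumes `pyspark`, `pandas`, and `pyarrow` are installed.

```python
def clean_name_py(value):
    """Row-at-a-time logic: what a plain Python UDF runs for every record."""
    return None if value is None else value.strip().upper()


def add_clean_name_columns(df, source_col="name"):
    """Add the same transformation three ways, fastest option first.

    `df` is assumed to be a pyspark.sql.DataFrame with a string column
    called `source_col` (a hypothetical name).
    """
    # Imported lazily so the pure-Python helper above works without Spark.
    import pandas as pd
    from pyspark.sql import functions as F
    from pyspark.sql.types import StringType

    # 1) Native Spark functions: the work stays inside the JVM.
    native = F.upper(F.trim(F.col(source_col)))

    # 2) Vectorized (pandas) UDF: Arrow hands Python whole batches as Series.
    @F.pandas_udf(StringType())
    def clean_name_vectorized(names: pd.Series) -> pd.Series:
        return names.str.strip().str.upper()

    # 3) Plain Python UDF: invoked once per row, the slowest of the three.
    clean_name_row = F.udf(clean_name_py, StringType())

    return (
        df.withColumn("clean_native", native)
        .withColumn("clean_vectorized", clean_name_vectorized(F.col(source_col)))
        .withColumn("clean_row_udf", clean_name_row(F.col(source_col)))
    )
```

All three columns should hold the same values; the difference the comment describes is in where and how the work runs.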
@jace743 15 days ago
I’d watch if you did live article reviews!
@SeattleDataGuy 15 days ago
Yeah! From watching other creators do it, I think I really gotta slow down to do it well
@ankittjindal 15 days ago
Please recommend some books. I only have a basic idea of Python and SQL, so which book is best for me as a beginner in the data engineering field?
@damien__j 15 days ago
Great video thanks!
@SeattleDataGuy 15 days ago
Glad you liked it!
@knkootbaoat6759 15 days ago
Gotta make things complex, otherwise we wouldn't get paid as much. I half joke. We don't make it complex, it's just that situations are inherently complex
@SeattleDataGuy 15 days ago
We do tend to do that sometimes...
@AyushMandloi 15 days ago
Sound of transition is very loud
@prico3358 16 days ago
Better crossover than a Batman & Iron Man movie.
@tommynelson4795 18 days ago
Minor tip. I’d recommend removing the very high-pitched transitions from your videos. I thought my tinnitus was acting up haha. Other than that, great vid!
@user-ux4iu7us7p 20 days ago
What are your thoughts on the new AWS Data Engineering Certification?
@elcoxeroni8273 21 days ago
Thank you for this really great content! Which book are you referring to in your video? I like the structure a lot and am considering buying it. Thanks in advance!
@mrgenetics4063 21 days ago
I want to become a data scientist or engineer… my biology degree has never brought me financial security and I hope to be rich one day
@otavioattuy5394 23 days ago
Where do I find the theory behind the "types" of dimension tables?
@glstnlev 24 days ago
Interesting use case about SCD2, but how do we create these tables in practice? I understand the importance, and how useful it is to have a new row for each change, but I can’t figure out how to model it to make it work
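One way to picture the mechanics asked about above is a small in-memory sketch of a Type 2 slowly changing dimension: every change closes the current row and appends a new one with its own validity window. The field names (`valid_from`, `valid_to`, `is_current`) and the customer data are hypothetical. In a warehouse the same idea is usually expressed as a merge/upsert against the dimension table rather than Python lists.

```python
from datetime import date


def apply_scd2_change(history, key, new_attrs, effective):
    """Close the current row for `key` (if it changed) and append a new one."""
    current = next(
        (row for row in history if row["key"] == key and row["is_current"]),
        None,
    )
    if current is not None:
        if current["attrs"] == new_attrs:
            return history  # nothing changed, so no new row is needed
        current["valid_to"] = effective
        current["is_current"] = False
    history.append({
        "key": key,
        "attrs": dict(new_attrs),
        "valid_from": effective,
        "valid_to": None,  # open-ended until the next change arrives
        "is_current": True,
    })
    return history


def as_of(history, key, when):
    """Return the attributes that were valid for `key` on the date `when`."""
    for row in history:
        started = row["valid_from"] <= when
        not_ended = row["valid_to"] is None or when < row["valid_to"]
        if row["key"] == key and started and not_ended:
            return row["attrs"]
    return None


# A customer (hypothetical key 42) moves from Seattle to Denver.
history = []
apply_scd2_change(history, 42, {"city": "Seattle"}, date(2024, 1, 1))
apply_scd2_change(history, 42, {"city": "Denver"}, date(2024, 6, 1))
```

Facts can then be joined to whichever row was valid on the fact's date, which is what `as_of` imitates.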
@abrahamgomez653 24 days ago
I love learning about data engineering and overall cloud computing. Cloud is the future.
@DerekGatlin 24 days ago
Thank you guys so much for your transparency. It is refreshing, and I am more interested in working with you in the future as a result.
@septic7 25 days ago
Are these salaries adjusted for 2024 tranches? 😅🥲
@maxonthetrack 25 days ago
awesome! I enjoy learning about these AI concepts in this hands-on practical way
@NoahPitts713 25 days ago
Josue is the man! thank you both for the great conversation
@poorbadger 25 days ago
Re: SQL Serverless…. Databricks now has job/workflow serverless which works with notebooks - a few limitations but most functionality is supported. I still use SQL all the time but that’s made the cluster start up penalty w notebooks way better
@saadoa4969 25 days ago
Disappointing to know that you don't answer your viewers' emails. Solid content though
@SeattleDataGuy 25 days ago
I do my best! I am always playing catch up, but thank you for the support!
@SreejaThumma 25 days ago
Can you also make a video on the difference between Databricks, Snowflake and Solix technologies?
@Syed-A-Rizvi 26 days ago
So how much SQL do I need? I know data science folks need expert-level SQL
@richardduncan3403 26 days ago
Real talk :)
@CalSticks 26 days ago
I really like these videos - the guests have all been fantastic and it's great to hear their views on the wider data space. Thanks for continuing to put them together. p.s. looks like you're trying to get better at not trailing off when finishing a thought - but I can tell it's hard! (I have the same problem)
@user-bc6bk7bg9i 27 days ago
Hey Ben, what are your thoughts on MS Fabric as a data engineer? Is it just another tool in the bucket, or does it actually solve the issue it claims to solve?
@norbinn 27 days ago
In terms of data consulting, do you find more clients in need of Snowflake or Databricks expertise? Is there any correlation with the size / price point of the project?