Applying Software Engineering Principles To Your Data Science Tasks In Python

8,140 views

StrataScratch

Published a day ago

Comments: 56
@dmax9324 · 2 months ago
You are very talented at teaching and explaining everything with no bloat while also not assuming anything about your audience. It was an excellent series, and I have learned a lot. Thank you very much!
@subinivi · 2 years ago
The best three videos on data migration I have ever seen. Very impressive, step-by-step explanations across all three videos. Thanks a lot.
@stratascratch · 2 years ago
Glad you enjoyed it!
@Gautam-lo5zy · 2 years ago
I really like these types of projects. Helpful for understanding how real-world projects work.
@majafuntv4538 · 4 years ago
You explain your thought process which is much more valuable. You’re a true blessing! Thank you so much!
@stratascratch · 4 years ago
Thank you! Trying my best.
@christopherwilhoite7856 · 2 years ago
Great content! Very thoroughly and clearly explained. I appreciate you taking the time to make such great content! I would love to see more of these series because they are not only educational, but implementable!
@karunakaranr2473 · a year ago
Thank you for your time and effort to make this tutorial. Really helps.
@prateek2159 · 3 years ago
Hey Nate, your videos are just too good. I love how your channel is so dedicated to real-world data science. By the way, I noticed that you started a video series, "For Your Data Science Project", and I really want you to continue making videos for it, because there's literally no one on YouTube offering this kind of guidance on DS projects. I have been looking for it for a very long time; my placements are just 12 months away and I really want to build a full-stack data science project. Thank you.
@Davidkiania · 2 years ago
Nate, this content is so good you don't even have to ask viewers to subscribe; anyone who values it as much as we do will subscribe and do whatever it takes to keep in touch. This is extremely great content. May you continue to soar in everything you do!
@stratascratch · 2 years ago
Thank you! I'm glad you're enjoying the content
@mysteriousbd3743 · 4 years ago
Thanks for part 3, I love this tutorial series.
@stratascratch · 4 years ago
Thanks for watching. If there are any requests, please let me know and I'll try to make a video about it. Also, let me know if you think the coding is too slow and should be faster. I'm not always sure if people want to see me actually type code or if they would rather just see me paste the code in to make the video go faster.
@classkori5507 · 4 years ago
Useful tutorial, thank you so much!
@stratascratch · 4 years ago
Thank you! Happy to take any ideas and feedback.
@camelrow · 4 years ago
Amazingly helpful and easy to follow. Love this series on automating common tasks with Python. Can you do a series on automating with Python on calling an API and storing the JSON output in a database? Thank you again!
@stratascratch · 4 years ago
I'm glad you like this series. I wasn't sure if it was a topic people liked or found boring, but I'll aim to do a few more. My next one can definitely be automating an API call and storing the data in a database. I have a few SQL videos in the queue right now, but I'll aim to create another Python video some time early next year. I think I'll also speed up the coding process. Correct me if I'm wrong, but you don't actually need to see me coding, so I might just show the code one line at a time and explain it. If you have a strong opinion about it one way or another, let me know.
@camelrow · 4 years ago
@stratascratch Personally, the coding part is very helpful to me, especially how you describe each step and each piece of your code. When you do it live, it is slow enough for me to process what you are doing and understand. If you jump ahead and skip over the coding, it's too fast for me and I'll have to figure out each piece of what you are doing (lots of pausing). I'm a novice, so that's my bias. Thank you!
@stratascratch · 4 years ago
@camelrow That is really insightful and helpful. I will definitely keep the coding part in. Thanks for your input!
@grzegorzzawadzki3048 · 3 years ago
@stratascratch Hey Nate, I think this is one of the best data science channels out there. I've spent the last few months learning DS from Kaggle and tutorials on Udemy/YT, but you're the first person to code the way I'd like to learn. Unfortunately, most people focus on the DS part, so they completely ignore good software development practices. I would love to see more series on model building or cleaning data.
@stratascratch · 3 years ago
@grzegorzzawadzki3048 Thanks so much! Glad you enjoy these videos. I agree with you on creating more videos on model building and cleaning data. I wish I had the time to create those videos =(. Python videos like this one take so much effort and time that I am never able to do much else. I'll think about some other DS topics to turn into videos! Thanks for the kind words and for watching my videos!
@sweety143sas · 2 years ago
Any idea what the approach should be for Oracle using cx_Oracle? For DDL statements it does not allow bind variables.
@stratascratch · 2 years ago
I use Oracle only sporadically and not very extensively. I would advise you to post your question on Stack Overflow; someone should be able to help you there. Also try Oracle Communities and the cx_Oracle community on GitHub (Issues · oracle/python-cx_Oracle).
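That limitation is common to Python DB-API drivers in general: bind variables stand in for values in DML, while identifiers in DDL must be validated and interpolated into the statement text yourself. A minimal sketch of the pattern, shown with the stdlib sqlite3 driver since the same rule applies (the table names here are made up for illustration):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# Bind variables work for *values* in DML statements...
cur.execute("CREATE TABLE customers (name TEXT)")
cur.execute("INSERT INTO customers (name) VALUES (?)", ("Acme",))

# ...but identifiers in DDL cannot be bound. The usual workaround is to
# validate the name yourself and then interpolate it into the statement.
table_name = "customer_contracts"
if not table_name.isidentifier():
    raise ValueError(f"unsafe table name: {table_name}")
cur.execute(f"CREATE TABLE {table_name} (id INTEGER)")
con.commit()
```

With cx_Oracle the same split applies: keep binds for values, and build DDL strings only from names you have whitelisted.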
@sweety143sas · 2 years ago
Hey Nate, I am getting this error while moving the files to the datasets directory:

    mv 'Customer Contracts$.csv' datasets
     ^
    SyntaxError: invalid syntax

The datasets directory is created but the files are not moved.
@sweety143sas · 2 years ago
I tried with this command:

    def move_csv_files(csv_files, origin, target):
        # move files to the target directory
        print(origin)
        print(target)
        # move csv files from the origin to the target directory
        for csv in csv_files:
            shutil.move(origin + csv, target)
        return
@Sreenu1523 · a year ago
This is one of the best tutorials I have ever seen; I have been searching for this kind of tutorial. Thanks. How do I pass a csv file and table as input parameters instead of reading all files from a folder? Please share a link or video that can help.
@andrefbillette2774 · 3 years ago
Great series!
@stratascratch · 3 years ago
Thanks for watching!
@nargisparvin4267 · 4 years ago
Thanks Sir
@stratascratch · 4 years ago
Thanks for watching!
@torontodataguy · 2 years ago
@StrataScratch I would love to see a part 4 where you pull the data from the database for a data analysis project (maybe simple data analysis).
@jordang8135 · 4 years ago
Very nice series with lots of helpful info. Any reason you use Postgres over others? Do you find it easier to work with?
@stratascratch · 4 years ago
The real answer is that I randomly chose it over 10 years ago when I was just starting and kept with it. It's also super easy to deploy on AWS, and Postgres is better than other free options like MySQL for analytics. Here's an article with a comparison (hackr.io/blog/postgresql-vs-mysql). On the job, most companies use industry-grade dbs like Hive, Greenplum, Snowflake, and MS SQL Server. There are only slight differences in syntax.
@esamelhosiny5615 · a year ago
Thank you so much, Nate, for all your help ❤ Could you please make projects from scratch about APIs & pipelines, in sales or any domain you want? I saw your video about the one and only data science project. Also, any advice for learning? I don't know what I should learn first in these fields.
@stratascratch · a year ago
Absolutely! We have many data science project videos coming out this year, so I'm sure you'll find one interesting =)
@esamelhosiny5615 · a year ago
@stratascratch Insha'Allah, thank you, bro 😍❤
@steven345lll1 · 3 years ago
Thank you for a great tutorial! How do you automate running scripts? Do you manually run the Python script every time you want to upload csv files into the AWS database, or do you use scheduling software like Airflow or Jenkins so you don't need to worry about running it manually? Are you planning to cover scheduling a script that runs periodically?

As for making an interactive dashboard that ingests real-time data, how can I refresh the database automatically so what's displayed on the dashboard is real-time? Let's say operators run a machine that outputs csv files in a specific format, which are stored on an internal server in our company. I basically want my web application to ingest the csv files in that directory, get them into the AWS database, and display the aggregated result in real time. Any ideas? Sorry for the long questions, but I would appreciate your help!
@stratascratch · 3 years ago
Automating script runs is the hardest thing about the process. It's not hard because it's technically difficult; it's hard because there are so many tools out there that can do it for you. I usually rely on whatever tools my company uses for automation, which has included Jenkins, Airflow, and Domino. For personal use, I either use Airflow (create an AWS EC2 instance and install Airflow) or just manually run it each time I need it. I wasn't going to cover scheduling scripts in my series, but you're spot on with mentioning Jenkins and Airflow.
@steven345lll1 · 3 years ago
@stratascratch In case you are running Airflow in an AWS EC2 instance, you still need to upload csv files manually from local to the EC2 instance first and then use Airflow to move them into AWS Postgres, right? Is there a way to read files on a local machine from the EC2 instance?
@stratascratch · 3 years ago
@steven345lll1 Yea, that's totally fair. For CSV files, I'm not aware of any automated way to upload them from local to EC2. I suppose you could write an Airflow py script that pings your local machine for CSVs (but it's probably best to ping a Google Drive account or something more static). Otherwise, I would build the pipeline so that it doesn't even use CSV files: all data goes from the API to the db, and all the transformations are done in Airflow. I've never had to implement this use case, but it does seem plausible.
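One way to sketch that "ping a folder for CSVs" idea with only the standard library; the function and directory names are hypothetical, and `upload` stands in for whatever load function you already have:

```python
import shutil
from pathlib import Path

def process_new_csvs(inbox: Path, done: Path, upload):
    """Scan `inbox` for CSVs, hand each one to `upload`, then archive it."""
    done.mkdir(exist_ok=True)
    handled = []
    for csv_path in sorted(inbox.glob("*.csv")):
        upload(csv_path)                               # e.g. copy into Postgres
        shutil.move(str(csv_path), str(done / csv_path.name))
        handled.append(csv_path.name)
    return handled
```

An Airflow task or a cron job would then just call `process_new_csvs` on a schedule; handled files leave the inbox, so reruns are safe.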
@steven345lll1 · 3 years ago
@stratascratch Thank you for your suggestion!
@diaconescutiberiu7535 · 2 years ago
How should we go about it if we need to do some cleaning inside the CSV files? I have multiple CSV files which I need to upload to my Postgres db; the files have different columns, so the cleaning is different for each column. If I just run this:

    dataframe["country"] = dataframe["country"].replace(
        to_replace=['Kingdom', 'States', 'Kong', 'Emirates', 'Rico'],
        value=['United Kingdom', 'USA', 'Hong Kong', 'UAE', 'Puerto Rico'])

it does the job and uploads only the file that has a country column, but it fails for the other csvs. The script gives me KeyError: 'country' (which is obvious, as the other files don't have it).
@stratascratch · 2 years ago
Could you add an if/else statement that will ignore columns that do not exist in other CSVs?
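A sketch of that suggestion: guard each cleaning rule with a check that the column exists, so files without it pass through untouched. The mapping below just reuses the replacements from the question:

```python
import pandas as pd

# Replacements taken from the question above
COUNTRY_FIXES = {
    "Kingdom": "United Kingdom",
    "States": "USA",
    "Kong": "Hong Kong",
    "Emirates": "UAE",
    "Rico": "Puerto Rico",
}

def clean_columns(df: pd.DataFrame) -> pd.DataFrame:
    # Only touch the column when it exists, so CSVs without
    # a "country" column pass through unchanged (no KeyError).
    if "country" in df.columns:
        df["country"] = df["country"].replace(COUNTRY_FIXES)
    return df
```

The same pattern extends to any column-specific rule: one `if "col" in df.columns:` guard per rule.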
@diaconescutiberiu7535 · 2 years ago
@stratascratch Worked like a charm. Thank you!
@AzureCz · 2 years ago
This video is almost two years old and I have no hope of getting an answer, but here it comes: I don't understand DBs completely, but wouldn't it be bad practice to open a connection multiple times inside the for loop? From what I know, I'd open the connection above the for loop and then proceed with the loop. What do you think? Would it work?
@stratascratch · 2 years ago
It could work. But why not just keep the connection open? =)
@AzureCz · 2 years ago
@stratascratch As far as I understood, the for loop will connect on each iteration and disconnect at the end of it. Did I get it wrong or something? Hahah. What I thought:

    connect_function()
    for_loop_function()
    disconnect_function()

The way it is in the video, the connection happens inside the for loop. As we see at 22:58, the function "upload_db" is inside the for. I thought that opening a connection multiple times in the for loop could be a bad thing, and it would be better to open it before the for, as I noted up there 🤔🤔
@stratascratch · 2 years ago
@AzureCz No, it's not bad practice to open a connection only when you need it. To be honest, it really doesn't matter unless you're keeping the connection open for a long period of time, as you might get a timeout.
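For what it's worth, the "open once, reuse, close once" shape the thread is describing looks like this; sqlite3 stands in for the Postgres driver, and the in-memory `files` dict stands in for parsed CSVs:

```python
import sqlite3

files = {"a.csv": [("x", 1)], "b.csv": [("y", 2)]}  # stand-in for parsed CSVs

# Open the connection once, reuse it for every file, close it at the end.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE data (name TEXT, value INTEGER)")
for fname, rows in files.items():
    cur.executemany("INSERT INTO data VALUES (?, ?)", rows)
    con.commit()  # commit per file so a failure doesn't lose earlier files
total_rows = cur.execute("SELECT COUNT(*) FROM data").fetchone()[0]
con.close()
```

Opening per iteration also works, as Nate says; reusing one connection mostly saves connect/teardown overhead, at the cost of a possible timeout if it sits idle too long.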
@AzureCz · 2 years ago
@stratascratch Thanks. I was thinking about paid APIs where you have a limit on the data you can pull, but now I get it :D
@diaconescutiberiu7535 · 2 years ago
Awesome video (series). You explaining the code, and even going from beginner (video 1) to advanced (video 2) to expert (video 3), makes these such a valuable asset for data analysts/scientists. I would really love for you to continue with automations like this. These beat any Udemy/Coursera videos, any day! Is it possible to get a 4th video with more SQL stuff, such as updating data in the DB with a new csv file (maybe something that gets updated daily), or appending new data to existing data (overwriting whatever gets duplicated)? I would also love some automation with openpyxl (or similar libraries you are familiar with; maybe creating some charts with the cleaned data). I appreciate the effort you've put into these videos (I've purchased lifetime access on the platform) and I will make sure others learn about your channel!
@stratascratch · 2 years ago
Thanks for your support! There are some vids here, kzbin.info/www/bejne/nJzPeXWNpNxrrKc and kzbin.info/www/bejne/bWish5lmr8ygras, that cover the topics you're talking about. The only difference is that they use an API to collect data. I would follow the same guidelines but use a CSV and a pd.DataFrame rather than make an API call to collect the data. But topics like overwriting with new data, etc., are covered.
@diaconescutiberiu7535 · 2 years ago
@stratascratch I'm having an issue with a specific csv I work with (I tested the final code with other csvs and they work just fine). I'm getting this error:

    QueryCanceled: COPY from stdin failed: error in .read() call:
    UnicodeDecodeError 'charmap' codec can't decode byte 0x81 in position 6589:
    character maps to <undefined>
    CONTEXT: COPY sfs_tool_t1, line 1

It passes these prints: "opened db succesfully", "csv was created succesfully", "file opened in memory", and then comes the error. Any thoughts? (Googling doesn't provide any helpful suggestion.)
@diaconescutiberiu7535 · 2 years ago
I've identified the columns that generate the issue (3 of them, containing text, like sentences). My csv is an export from a SharePoint list. I suspect that while downloading, some of the characters get messed up, so there is probably some issue with the encoding of that text. Perhaps I should do something with the text within those columns (some cleaning).
@stratascratch · 2 years ago
@diaconescutiberiu7535 Definitely the encoding. Try to force UTF-8 and that should fix it. I like to export from gSheets because they have the cleanest CSVs and I've never had an issue loading a csv I exported from Google. I always have problems loading csvs that are exported from Excel. Hope that helps.
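The failure and the fix can be reproduced with nothing but the standard library. The byte 0x81 is undefined in cp1252 (the 'charmap' codec), so decoding stops exactly as in the error above, while decoding as UTF-8 succeeds (the file path here is made up):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "export.csv")

# A UTF-8 file whose bytes include 0x81 (here from the character 'ā').
with open(path, "w", encoding="utf-8") as f:
    f.write("name,notes\nacme,ā grade\n")

# Reading it back as cp1252 (the 'charmap' codec) fails on byte 0x81...
charmap_failed = False
try:
    open(path, encoding="cp1252").read()
except UnicodeDecodeError:
    charmap_failed = True

# ...while forcing UTF-8 works; with pandas the equivalent is
# pd.read_csv(path, encoding="utf-8").
text = open(path, encoding="utf-8").read()
```

If the source file itself is in a legacy encoding, the reverse fix applies: read it with that encoding and rewrite it as UTF-8 before the COPY.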
@sirojbekalimboyev6730 · 2 years ago
Hi everybody! How can we count every 10,000 rows loaded from a csv file into a table, with the time interval?
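One way to sketch that with the standard library: count rows as you hand them to your insert function, and report the elapsed time at every interval (the function name and the tiny demo interval are for illustration only):

```python
import csv
import io
import time

def load_with_progress(fileobj, insert_row, every=10_000):
    """Feed csv rows to `insert_row`, reporting progress every `every` rows."""
    reader = csv.reader(fileobj)
    next(reader)                      # skip the header row
    start = time.perf_counter()
    count = 0
    for row in reader:
        insert_row(row)
        count += 1
        if count % every == 0:
            elapsed = time.perf_counter() - start
            print(f"{count:,} rows loaded in {elapsed:.1f}s")
    return count

# Tiny demo with an in-memory CSV and a report interval of 2
data = io.StringIO("x\n" + "1\n" * 5)
rows = []
total = load_with_progress(data, rows.append, every=2)
```

In the real pipeline, `insert_row` would be whatever appends to your db batch, and `every=10_000` gives the interval asked about.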