thanks for explaining in simple language. Looking for function transformer applied to each feature...
@learndataaАй бұрын
Thank you. Glad to hear it helped.
@ankitrana3781Ай бұрын
Great work sir
@learndataaАй бұрын
Thank you. Appreciate your support.
@kingstonteffyroy2 ай бұрын
What a waste of time!!
@learndataa2 ай бұрын
I understand that the video just mentions what a PermissionError is. If you have a specific question, please feel free to ask.
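If it helps, here is a tiny illustrative sketch (the file path is a placeholder, not from the video) of when Python raises a PermissionError and how to catch it:

# Illustrative only: writing to a location the operating system does not allow
# raises PermissionError; catching it lets the program respond instead of crashing.
try:
    with open("/root/protected_file.txt", "w") as f:   # placeholder path
        f.write("hello")
except PermissionError as e:
    print("Permission denied:", e)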
@RahulKumar-ez6vw2 ай бұрын
Could you please consider creating the LLM playlist, sir, since you are providing us with such valuable and helpful videos? Thanks a lot ❣❣
@learndataa2 ай бұрын
Thank you for your suggestion and interest! I'm currently working on a series called 'DL Math.' Stay tuned! I'll be covering topics from an introduction to basic algebra all the way up to LLMs. Excited to share this journey with you!
@Musicalworld-c6y2 ай бұрын
Sir, I like your content. Please make a Deep Learning series also.
@learndataa2 ай бұрын
Thank you. Your support means a lot. Sure I will.
@ashkankarimi41463 ай бұрын
Hi Nilesh, I am just starting this playlist. Many thanks for sharing all these contents on your channel.
@learndataa3 ай бұрын
You are welcome! I hope you find the series helpful. Feel free to post comments if you have any questions along the way. Thanks for watching and supporting the channel!
@saikirant46773 ай бұрын
Could you give me the script? It would be a big advantage for us.
@learndataa3 ай бұрын
Sure. The code and examples are derived from BigQuery docs: cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax Hope it helps! Thanks for watching.
@saikirant46773 ай бұрын
Thank you for the valuable information.
@learndataa3 ай бұрын
Glad it was helpful!
@nicholastey74133 ай бұрын
i usually dont comment under videos, but youre really good at teaching, helped me alot!! thank you sir :D
@learndataa3 ай бұрын
Thank you so much. I am happy to hear that the videos were helpful. Your support means a lot to me.
@deepeshdeshmukh973 ай бұрын
grt job sir
@learndataa3 ай бұрын
Thank you so much. Appreciate your support.
@skv46113 ай бұрын
Great work, Thanks
@learndataa3 ай бұрын
Thank you for your support.
@MohammadYs773 ай бұрын
how can we assign the corresponding labels in the multiple csv files section?
@learndataa3 ай бұрын
Hoping below helps. Thank you for watching!

#--------------------------------------------------------------
# Create multiple CSV files
#--------------------------------------------------------------
import os
import pandas as pd

# Create directory for CSV files
os.makedirs("csv_data", exist_ok=True)

# Create sample CSV files
for i in range(1, 4):
    df = pd.DataFrame({
        "feature1": [i*10, i*20, i*30],
        "feature2": [i*40, i*50, i*60],
    })
    df.to_csv(f"csv_data/file_{i}.csv", index=False)

#--------------------------------------------------------------
# Read multiple CSV files into a dataset and attach label
#--------------------------------------------------------------
import tensorflow as tf
import pathlib

# Path to the CSV files
csv_path = pathlib.Path("csv_data")

# List all CSV files
csv_files = list(csv_path.glob("*.csv"))

def load_and_label(file_path):
    # Read the CSV file
    file_name = tf.strings.split(file_path, os.sep)[-1]
    label = tf.strings.regex_replace(file_name, ".csv", "")

    # Load CSV content
    content = tf.data.experimental.CsvDataset(
        file_path,
        [tf.float32, tf.float32],
        header=True
    )

    # Add label to each row
    return content.map(lambda *row: (row, label))

# Create a dataset of file paths
file_dataset = tf.data.Dataset.from_tensor_slices([str(f) for f in csv_files])

# Interleave the datasets and assign labels
labeled_dataset = file_dataset.interleave(
    lambda file: load_and_label(file),
    cycle_length=len(csv_files),
    num_parallel_calls=tf.data.AUTOTUNE
)

# View the dataset content
for data, label in labeled_dataset:
    print("Data:", data)
    print("Label:", label.numpy().decode())

#--------------------------------------------------------------
# Output
#--------------------------------------------------------------
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=20.0>, <tf.Tensor: shape=(), dtype=float32, numpy=80.0>)
Label: file_2
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=30.0>, <tf.Tensor: shape=(), dtype=float32, numpy=120.0>)
Label: file_3
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=10.0>, <tf.Tensor: shape=(), dtype=float32, numpy=40.0>)
Label: file_1
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=40.0>, <tf.Tensor: shape=(), dtype=float32, numpy=100.0>)
Label: file_2
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=60.0>, <tf.Tensor: shape=(), dtype=float32, numpy=150.0>)
Label: file_3
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=20.0>, <tf.Tensor: shape=(), dtype=float32, numpy=50.0>)
Label: file_1
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=60.0>, <tf.Tensor: shape=(), dtype=float32, numpy=120.0>)
Label: file_2
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=90.0>, <tf.Tensor: shape=(), dtype=float32, numpy=180.0>)
Label: file_3
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=30.0>, <tf.Tensor: shape=(), dtype=float32, numpy=60.0>)
Label: file_1
@SidOp7863 ай бұрын
Keep going thanks for the knowledge🎉
@learndataa3 ай бұрын
Thanks for the support!
@wsasonorejo57534 ай бұрын
Thanks, very clear, and very important for a strong foundation in deep learning.
@learndataa3 ай бұрын
Appreciate your support. Thanks for watching.
@RahulKumar-ez6vw4 ай бұрын
Could you please share the code repository?
@learndataa4 ай бұрын
The code is available at the link in the description. github.com/learndataa/shared Thanks for watching!
@qamechanix4 ай бұрын
where is the table name?
@learndataa3 ай бұрын
The 'FROM' clause usually has the table name. In the example in the video, the table is created directly in the FROM clause, hence no table name is needed. Thanks for watching!
@user-ce3ip5lx9t4 ай бұрын
thanks a lot! is the jupyter notebook available?
@learndataa3 ай бұрын
All the notebooks from the scikit-learn docs are available at the link below:
github.com/learndataa/scikit-learn-docs/tree/main/notebooks/auto_examples/gaussian_process
While these notebooks may not be exactly those in the video, they are well commented. Hope it helps!
@user-ce3ip5lx9t3 ай бұрын
@@learndataa great and thanks a lot!!!
@user-ce3ip5lx9t3 ай бұрын
@@learndataa however I get an error using your link 😕
@learndataa3 ай бұрын
@@user-ce3ip5lx9t I was able to recreate the error when not logged in to Github. Could you try logging in?
@user-ce3ip5lx9t3 ай бұрын
@@learndataa in github I see your scikit learn repository but it appears empty to me
@datasciencegyan51454 ай бұрын
result is not visible
@learndataa3 ай бұрын
Apologies. Unfortunately, the video frame got clipped. Hope the code is still helpful.
@ZhalilKachkynov4 ай бұрын
If you agree, please answer me.
@learndataa3 ай бұрын
I have posted a reply to the earlier comment. Thanks for watching!
@ZhalilKachkynov4 ай бұрын
Hello, Dear. I don’t know your name. For about two weeks I have been learning basic AI. I started with Python, numpy, pandas and data analysis, but I need teaching. If you have some free time twice a week, can you help me learn AI tools? 😢
@learndataa3 ай бұрын
Sorry for the delayed reply. First, welcome to the world of AI and Data Science. To answer your question in short, I think you are already on your way and on track. Maybe all that is needed now is practice, practice and some more practice. I see that you have already self-taught Python. The same would work for AI tools as well. Below are a few personal thoughts on getting up to AI tools. Again, the route to get there may vary based on background, experience, learning style and time available for practicing code. Chances are that you may already know the steps below.

(Analysis)
Step-1: Learn Python basics
Step-2: Learn Numpy, Pandas (in detail), and Matplotlib
Step-3: Try analyzing open source datasets available online: UCI ML Repository etc.
Step-4: Practice, practice and practice
[Note: If learning without any prior coding background, this may take 6 months. The Beginner series on this channel will cover all of the topics needed.]

(Machine Learning)
Below assumes a prior basic background in Math, Algebra and Calculus.
Step-5: Learn the theory of ML (Andrew Ng's course on Machine Learning, available for free on YouTube)
Step-6: Begin learning implementations in scikit-learn [Intermediate course on the channel]
Step-7: Practice, practice and practice
Step-8: Continue learning ML fundamentals
[Note: May take about 6 to 8 months without prior ML background.]

(Deep Learning)
Step-9: Learn the theory of DL (again, Andrew Ng's course is a good start; there are others as well)
Step-10: Learn a framework of your choice. On this channel, TensorFlow and Keras are covered so far.
Step-11: Practice, practice and practice.
[Note: May take one or two semesters; 6+ months]

Overall, to answer your question, I think putting in more practice time may help. Trying to understand what each line of code does makes a huge difference. I believe learning to code and getting good at AI/ML is a marathon!!! Just keep going and do not give up!!! You will get there!!! If you have any questions or suggestions, please feel free to post them as comments on the videos; I'll try to reply as best as I can. Hope it helps.
@nayeemx115 ай бұрын
Wow, this is a very good, informative video.
@learndataa3 ай бұрын
Thanks for watching and your support. It means a lot.
@rayoh20115 ай бұрын
Thank you. The Tutorial is helpful!
@learndataa3 ай бұрын
Appreciate your support. Thanks for watching!
@et.sachin5 ай бұрын
Can we do this without using the UNPIVOT() clause?
@learndataa3 ай бұрын
Thanks for watching the video. Code may help:

WITH sales_data AS (
  SELECT 1 AS product_id, 10 AS Q1, 15 AS Q2, 20 AS Q3, 25 AS Q4
  UNION ALL
  SELECT 2, 5, 10, 15, 20
)

/*
# Option-1: UNPIVOT
SELECT product_id, quarter, sales
FROM sales_data
UNPIVOT (
  sales FOR quarter IN (Q1, Q2, Q3, Q4)
)
*/

/*
# Option-2: UNION ALL
SELECT product_id, 'Q1' AS quarter, Q1 AS sales FROM sales_data
UNION ALL
SELECT product_id, 'Q2' AS quarter, Q2 AS sales FROM sales_data
UNION ALL
SELECT product_id, 'Q3' AS quarter, Q3 AS sales FROM sales_data
UNION ALL
SELECT product_id, 'Q4' AS quarter, Q4 AS sales FROM sales_data
*/

# Option-3: UNNEST and STRUCT
SELECT product_id, quarter, sales
FROM sales_data,
UNNEST([
  STRUCT('Q1' AS quarter, Q1 AS sales),
  STRUCT('Q2' AS quarter, Q2 AS sales),
  STRUCT('Q3' AS quarter, Q3 AS sales),
  STRUCT('Q4' AS quarter, Q4 AS sales)
]) AS t
@juliaclaira89055 ай бұрын
This is incredibly helpful. Thank you!
@learndataa3 ай бұрын
Glad to hear it! Appreciate your support. Thanks for watching!
@arulprakash55895 ай бұрын
Thanks for the video. I have a question about scikit-learn GPs. I have multiple observations of heart pressure traces. Can they be fitted to a single Gaussian Process to capture the uncertainty among multiple observations? I need multiple observations to be fitted to a single GP. But when I use scikit-learn to fit, I am getting a mean and covariance matrix for each pressure trace!! Thank you :)
@learndataa3 ай бұрын
Appreciate your support. It means a lot. Thanks for watching. To answer your question, I've tried to put together a code below. Hope it helps!

import numpy as np
import matplotlib.pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

# Create data
np.random.seed(42)
X1 = np.linspace(0, 10, 100).reshape(-1, 1)  # 100 time points for observation 1
X2 = np.linspace(0, 10, 100).reshape(-1, 1)  # 100 time points for observation 2
X3 = np.linspace(0, 10, 100).reshape(-1, 1)  # 100 time points for observation 3

# Pressure waves
y1 = np.sin(X1).ravel() + np.random.normal(0, 0.1, X1.shape[0])
y2 = np.sin(X2 - 1).ravel() + np.random.normal(0, 0.1, X2.shape[0])
y3 = np.sin(X3 + 1).ravel() + np.random.normal(0, 0.1, X3.shape[0])

# Put it together
y = np.concatenate([y1, y2, y3])
X_combined = np.vstack([X1, X2, X3])

# Create a kernel: Constant kernel * RBF kernel
kernel = C(1.0, (1e-4, 1e1)) * RBF(1.0, (1e-4, 1e1))

# Initialize and fit GP
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10)
gp.fit(X_combined, y)

# Predict
mean, covariance = gp.predict(X1, return_cov=True)

# SD and COV
std_dev = np.sqrt(np.diag(covariance))

# Plot
plt.figure(figsize=(10, 6))

# Original data
plt.plot(X1, y1, 'r.', markersize=10, label='Observation 1')
plt.plot(X2, y2, 'g.', markersize=10, label='Observation 2')
plt.plot(X3, y3, 'b.', markersize=10, label='Observation 3')

# Predicted GP mean
plt.plot(X1, mean, 'k-', label='GP Mean')

# CI of GP
plt.fill_between(X1.ravel(),
                 mean - 1.96 * std_dev,
                 mean + 1.96 * std_dev,
                 color='gray', alpha=0.2, label='95% Confidence Interval')

plt.title('Gaussian Process Regression on Multiple Observations')
plt.xlabel('Time')
plt.ylabel('Pressure')
plt.legend()
plt.show()
@NamrataSingh-pf1zn5 ай бұрын
The result is showing number.x in the row and the value is 1. It's not showing the curly bracket in the result window.
@learndataa3 ай бұрын
Apologies for the delayed reply. I'm trying to understand the question. At 2:18 in the video, the output table, i.e. the result, is {"x":"1"} in the "number" column. Could you please elaborate? Thanks for watching!
@shiladityachowdhury32796 ай бұрын
I need to talk to you regarding this project. How can I connect with you?
@learndataa3 ай бұрын
Apologies for the delayed reply. Thanks for reaching out! If you have specific questions about the project, please feel free to ask here, and I'll do my best to answer.
@josafatzamora52246 ай бұрын
Help!! I replicated the code and it's not working :(
@learndataa6 ай бұрын
I would need the Python version, the line that causes the error, and the error message.
@sahilgarg93046 ай бұрын
I followed your series, it's very amazing.
@learndataa6 ай бұрын
Thank you for your support. Happy to hear that you enjoyed the series.
@singh-ml5mj6 ай бұрын
How to extract this data?
@learndataa6 ай бұрын
For the video, the data was tabulated manually, page by page. You could try their APIs, such as: developer-docs.amazon.com/amazon-business/docs/product-search-api-v1-reference
@ajijulislamhridoy92127 ай бұрын
Could i have the source code please, sir?
@learndataa7 ай бұрын
Code is available in the repository: github.com/learndataa/examples/tree/master/lite/examples Thanks for watching
@aditikar23837 ай бұрын
From line 45 it's not working. Kindly help.
@learndataa7 ай бұрын
It is difficult to answer the question without more information, such as the error message.

Line #45 @ 22:18:
data['nutrient'].head(2)

Try checking the shape of the DataFrame 'data' to make sure it is correct. Thanks for watching!
@aditikar23837 ай бұрын
Yes, I have done it exactly like this, but it is showing an error… it would be better if I could show you the error.
@learndataa7 ай бұрын
could you post the error traceback?
@vsiddu957 ай бұрын
I am not able to get the explorer page
@learndataa7 ай бұрын
The Explorer panel should be accessible after you log in to BigQuery at the link below.
BigQuery: console.cloud.google.com/bigquery
Docs: cloud.google.com/bigquery/docs/sandbox
Hope it helps!
@aguntuk108 ай бұрын
Hi, really nice video. Can you share the Jupyter notebook?
@learndataa8 ай бұрын
Thank you for your support. All the code is derived from the docs. Below are 300+ notebooks from the docs with original code and descriptions (not directly from the video). Hope it helps!
scikit-learn code notebooks: github.com/learndataa/scikit-learn-docs/tree/main/notebooks/auto_examples
@sagewagner38038 ай бұрын
Where in this code would you split the data into training, testing, and evaluation sets? Would it be after concatenating the preprocessed inputs but before the line "titanic_preprocessing = tf.keras.Model(inputs, preprocessed_inputs_cat)"? Can you give an example of how this would be done?
@learndataa8 ай бұрын
The data should be split before any preprocessing begins, to avoid information from the test set leaking into the model building process. This could lead to misleading performance estimates for the final model. Depending on the data, I would do a stratified shuffle split to get train, validation (if needed) and test sets right after the line below:

titanic = pd.read_csv("storage.googleapis.com/tf-datasets/titanic/train.csv")

CSV data --> split (train, validation, test) --> write code to preprocess train data --> put preprocessing code in a function or pipeline --> call the function to preprocess train/validation --> iterate/optimize to get the final model --> final model ready --> run the test data through the same function used to preprocess the train data --> use the test data function output as input to the final model for prediction

The split could be something like the code below using random indices, or, much more easily, using sklearn.model_selection.train_test_split().

##########
# Code
##########
# (not stratified)
# Split data into train, validation and test sets
titanic = pd.read_csv("storage.googleapis.com/tf-datasets/titanic/train.csv")
X = titanic.drop(columns=['survived'])
y = titanic['survived']

# Define the number of samples in the dataset
num_samples = len(X)

# Define the ratios for the train-validation-test split
train_ratio = 0.6
val_ratio = 0.2
test_ratio = 0.2

# Compute the number of samples for each set
train_size = int(num_samples * train_ratio)
val_size = int(num_samples * val_ratio)
test_size = num_samples - train_size - val_size

# Shuffle the indices
shuffled_indices = np.arange(0, num_samples)  # <-- not yet shuffled
np.random.shuffle(shuffled_indices)           # <-- in-place shuffle

# Split the shuffled indices into train, validation, and test sets
train_indices = shuffled_indices[:train_size]
val_indices = shuffled_indices[train_size:train_size+val_size]
test_indices = shuffled_indices[train_size+val_size:]

# Split
X_train = X.iloc[train_indices, :]
y_train = y.iloc[train_indices]

X_val = X.iloc[val_indices, :]
y_val = y.iloc[val_indices]

X_test = X.iloc[test_indices, :]
y_test = y.iloc[test_indices]

print("X_train:", X_train.shape)
print("y_train:", y_train.shape)
print("X_val:", X_val.shape)
print("y_val:", y_val.shape)
print("X_test:", X_test.shape)
print("y_test:", y_test.shape)

# Begin preprocessing ...
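If a shorter route helps, below is a sketch of the same idea using scikit-learn's train_test_split on the same Titanic CSV (the 60/20/20 ratios and random_state are illustrative assumptions); two chained calls with stratify give a stratified train/validation/test split.

import pandas as pd
from sklearn.model_selection import train_test_split

titanic = pd.read_csv("https://storage.googleapis.com/tf-datasets/titanic/train.csv")
X = titanic.drop(columns=['survived'])
y = titanic['survived']

# 60% train / 40% temporary pool, stratified on the target
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=42)

# Split the pool 50/50 -> 20% validation, 20% test
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, stratify=y_temp, random_state=42)

print(X_train.shape, X_val.shape, X_test.shape)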
@BekkariMohammed-dk9ub8 ай бұрын
Thank you so much.... Could you send the full code to me, please?
@learndataa8 ай бұрын
Thanks for watching. All the code in this series is derived from the examples in the docs. I have created a new repository at the link below that has a compilation of ~302 notebooks from scikit-learn. Although these are not directly from the videos, they are better commented and more descriptive.
Code notebooks: github.com/learndataa/scikit-learn-docs/tree/main/notebooks/auto_examples
@sagewagner38038 ай бұрын
Can you further clarify the purpose of using tf.data.Dataset.from_tensor_slices? Why is it used and how does it change the dataset into a useful format?
@learndataa8 ай бұрын
From what I understand:

"from_tensor_slices":
- especially for larger datasets
- creates a separate tensor for each row of input to make it easier to iterate and batch process
- so if the input has 3 rows and 4 columns (features), it would create 3 different tensors in the dataset, one for each row with 4 columns

"from_tensors":
- especially for smaller datasets
- creates just one tensor
- so if the input has 3 rows and 4 columns (features), it would create 1 tensor with all 3 rows and 4 columns

########
# Code
########
# Import libraries
import tensorflow as tf
import numpy as np

# --------------------------------
# from_tensor_slices: input array
# creates 3 arrays in dataset
# --------------------------------
data = np.random.randn(3, 4)
dataset = tf.data.Dataset.from_tensor_slices(data)
for element in dataset:
    print(element)

# Output
tf.Tensor([ 1.6346394   1.13362992  0.42821694 -0.15339032], shape=(4,), dtype=float64)
tf.Tensor([ 0.90122249  0.27264101  0.26286328 -1.14954752], shape=(4,), dtype=float64)
tf.Tensor([-0.27845238 -0.78464886 -0.11236994 -0.18858366], shape=(4,), dtype=float64)

# --------------------------------
# from_tensor_slices: input tensor
# creates 3 arrays in dataset
# --------------------------------
data = tf.random.uniform(shape=[3, 4])
dataset = tf.data.Dataset.from_tensor_slices(data)
for element in dataset:
    print(element)

# Output
tf.Tensor([0.38493872 0.44316375 0.14045477 0.8924254 ], shape=(4,), dtype=float32)
tf.Tensor([0.7913748  0.9827099  0.8950583  0.36067998], shape=(4,), dtype=float32)
tf.Tensor([0.65940714 0.5389466  0.7395221  0.8307824 ], shape=(4,), dtype=float32)

# --------------------------------
# from_tensors: input array
# creates 1 array in dataset
# --------------------------------
data = np.random.randn(3, 4)
dataset = tf.data.Dataset.from_tensors(data)
for element in dataset:
    print(element)

# Output
tf.Tensor(
[[-1.41672221  0.81045198 -0.3883847  -0.86726604]
 [ 0.69639162 -1.14857263  0.37013669  0.56729552]
 [-0.1541059   0.09261183 -0.00200572 -0.12433269]], shape=(3, 4), dtype=float64)

# --------------------------------
# from_tensors: input tensor
# creates 1 array in dataset
# --------------------------------
data = tf.random.uniform(shape=[3, 4])
dataset = tf.data.Dataset.from_tensors(data)
for element in dataset:
    print(element)

# Output
tf.Tensor(
[[0.30948663 0.27289176 0.6494436  0.7968806 ]
 [0.10863554 0.36693168 0.18443334 0.07225335]
 [0.2699784  0.26086116 0.88859296 0.03361833]], shape=(3, 4), dtype=float32)

Thanks for watching!
@CJP38 ай бұрын
Hi Learndataa! Quick question, why do you code in Google Colab vs Jupyter or another IDE?
@learndataa8 ай бұрын
It depends!

Google Colab: Feels easier for deep learning, to later on use the free GPU (TensorFlow, Keras, GPU) or larger datasets
Jupyter: Easiest to learn analytics on a local computer (Python, numpy, pandas, matplotlib, scikit-learn)
PyCharm: Steeper learning curve to teach/learn analytics

Thanks for watching!
@manasbagul78668 ай бұрын
Helloo! I had a doubt. Is there a way to augment data after loading it using 'image_dataset_from_directory'? I am getting input shape errors and all other stuff. Can you please help? Thank you!
@learndataa8 ай бұрын
In the code below images are retrieved from a directory to augment. The directory needs to have a structure such as:

image_data
|___ train
     |__ class_1
     |__ class_2
     |__ class_3
|___ validation
     |__ class_1
     |__ class_2
     |__ class_3
|___ test
     |__ class_1
     |__ class_2
     |__ class_3

Check the link below for further details:
www.tensorflow.org/api_docs/python/tf/keras/utils/image_dataset_from_directory

##################
# sample code
##################
import tensorflow as tf
from tensorflow.keras.layers.experimental import preprocessing
import matplotlib.pyplot as plt

# Define the directory containing your images
image_dir = "/content/data2/"

# Create an image dataset from the directory
image_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    image_dir,
    batch_size=32,          # You can adjust batch size as needed
    image_size=(224, 224),  # Adjust image size as needed
    shuffle=True,
)

# Define your data augmentation pipeline
data_augmentation = tf.keras.Sequential([
    preprocessing.Rescaling(1./255),         # Rescale pixel values to [0,1]
    preprocessing.RandomFlip("horizontal"),  # Random horizontal flip
    preprocessing.RandomRotation(0.2),       # Random rotation with 20% angle
    preprocessing.RandomZoom(0.2),           # Random zoom with 20% zoom range
    # Add more preprocessing layers as needed
])

# Apply data augmentation to the image dataset
augmented_dataset = image_dataset.map(lambda x, y: (data_augmentation(x), y))

# Define a function to visualize augmented images
def visualize_augmented_images(dataset):
    plt.figure(figsize=(10, 10))
    for images, labels in dataset.take(1):
        for i in range(9):  # Visualize the first 9 images
            ax = plt.subplot(3, 3, i + 1)
            plt.imshow(images[i].numpy().astype("float32"))
            plt.title(f"Class: {labels[i].numpy()}")
            plt.axis("off")
    plt.show()

# Visualize the augmented images
visualize_augmented_images(augmented_dataset)

# Now you can use augmented_dataset for training
# ...

Hope it helps.
@manasbagul78668 ай бұрын
@@learndataa Thank you so much! This helps a lot!❤️
@sagewagner38038 ай бұрын
How would the training loop be different if you had a large dataset and wanted to load it in small batches?
@learndataa8 ай бұрын
Does the section below help?
00:30:52 - Train and evaluate the model: set batch size
kzbin.info/www/bejne/l6e6nIaciL-qnsU
Thanks for watching.
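In short, not much changes in the loop itself if you stream the data with tf.data: the dataset is batched and model.fit (or a custom loop) just sees one small batch at a time. A minimal sketch with made-up data and an illustrative batch size of 32:

import numpy as np
import tensorflow as tf

# Made-up data: 10,000 rows, 10 features, binary target (illustrative only)
X = np.random.randn(10_000, 10).astype("float32")
y = np.random.randint(0, 2, size=(10_000,)).astype("float32")

# Build batches lazily so the full dataset never has to be fed to the model at once
dataset = (tf.data.Dataset.from_tensor_slices((X, y))
           .shuffle(1_000)
           .batch(32)                      # small batches
           .prefetch(tf.data.AUTOTUNE))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# fit() iterates the dataset one batch at a time each epoch
model.fit(dataset, epochs=2)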
@tomrhee18 ай бұрын
When triggering a rolling regression (or for that matter, a simple mean) with a window size of 10, I wonder if you could show us how I can start the rolling process at a particular date/time, let's say '2024-01-02 9:31:00'.
@learndataa8 ай бұрын
How about getting a subset from '2024-01-02 09:31:00' (code below)! Thanks for watching!

#####
# Code
#####
import pandas as pd

# Create data
date_range = pd.date_range(start='2024-01-01', end='2024-01-10', freq='H')
data = {'value': range(len(date_range))}
df = pd.DataFrame(data, index=date_range)

# Specify the start time
start_time = pd.Timestamp('2024-01-02 09:31:00')

# Select subset of data starting from start_time
subset = df.loc[start_time:]

# Calculate rolling mean or regression on the subset
window_size = 10
rolling_result = subset['value'].rolling(window=window_size, min_periods=1).mean()

print(rolling_result)
@tomrhee18 ай бұрын
Thank you guys!!!
@tomrhee18 ай бұрын
In your answer the date_range was only for 10 days. My project deals with large data, unfortunately. How would you handle the situation when the date range is longer than 1 year with periods of 1 minute? Can you help me? I look forward to getting an answer from a pro. Thanks again.
@learndataa8 ай бұрын
I'll look into it. If a window has half a million records that would be memory intensive.
@learndataa8 ай бұрын
With a window size of half a million it may need a GPU. In the code below, I have tried to use the cudf and cupy libraries. Note the code ran out of the free GPU memory of 15 GB. That would be expected because we are looking at an array of size (365*24*60)x(365*24*60), i.e. about 276,255,360,000 (276 billion) values! I may be wrong here, but we may be looking at over 2,000 GB of RAM! I am sorry I do not have an answer at this moment. But GPU would be the way to go, along with writing custom code and/or exploring pre-built libraries such as "cudf".

#############
# example code
#############

### Install cudf (if not installed)
#!pip install cudf-cu12 --extra-index-url=pypi.nvidia.com

### Import libraries
import pandas as pd
import numpy as np
import math
import cudf
import cupy as cp

### Create date range
# Large dataset
t = pd.date_range(start='2000-01-01', end='2030-12-31', freq='1 min')
# Small dataset
#t = pd.date_range(start='2030-01-01', end='2030-01-02', freq='1 min')
print(t.shape)

# Create dataframe
df = cudf.DataFrame({
    'x': np.random.randn(len(t)),
    'y1': np.random.randn(len(t)),
    'y2': np.random.randn(len(t)),
    'y3': np.random.randn(len(t)),
}, index=t)
print(df.shape)
df.head(3)

### Calculate covariance by window
window_size = 365*24*60  # 1 year rolling window size in minutes

rolling_covariance = []
for i in range(window_size, len(df)):
    window_df = df.iloc[i - window_size:i]     # Get the rolling window
    window_values = window_df.values.get()     # Convert cuDF DataFrame to NumPy array
    window_values = window_values.T            # Transpose to ensure proper shape for covariance calculation
    covariance_matrix = cp.cov(window_values)  # Calculate covariance matrix using cuPy
    rolling_covariance.append(cudf.DataFrame(covariance_matrix, index=window_df.index[-len(covariance_matrix):]))

rolling_covariance_df = cudf.concat(rolling_covariance)
print(rolling_covariance_df.shape)
@pradhumngoyal98568 ай бұрын
Thank you for the clear and crisp explanation, exactly what I was stuck on.
@learndataa8 ай бұрын
Thank you. Glad it helped.
@jeeteshb9 ай бұрын
Hi, thanks for the tutorials. If possible, can you also provide the Jupyter notebook links or a git link where we can find the notebooks for all tutorials? These are really helpful; thanks for your efforts.
@learndataa9 ай бұрын
Thank you for your support. All the code is derived from the docs. Below are 300+ notebooks from the docs with original code and descriptions. Hope it helps!
Code notebooks: github.com/learndataa/scikit-learn-docs/tree/main/notebooks/auto_examples
@rahmadinaadityana26739 ай бұрын
PLS discriminant analysis please
@learndataa9 ай бұрын
Something like this!

plsda = Pipeline([
    ('pls', PLSRegression(n_components=n_components)),
    ('classifier', LogisticRegression())
])

where,
- PLSRegression: for multicollinearity, high dimensionality
- a classifier (logistic, svc, RF): for classification

Thanks for watching!
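To make that concrete, here is a minimal runnable sketch (the synthetic data, n_components=2 and the logistic classifier are illustrative assumptions): PLS compresses the features into a few latent components and the classifier predicts the class from those scores.

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy data (assumption): 200 samples, 20 features, binary class label
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# PLS reduces the features to 2 latent components; the classifier works on those scores
plsda = Pipeline([
    ('pls', PLSRegression(n_components=2)),
    ('classifier', LogisticRegression())
])
plsda.fit(X_train, y_train)
print("Test accuracy:", plsda.score(X_test, y_test))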
@hubabokuti9 ай бұрын
Hi Nilesh, I really appreciate what you did here. It's an enormous effort to put together such vast material. It's a shame the videos have such a low number of views; your channel is hugely underrated. If by any chance you could share the scripts in any way, that would be great. I wish you all the best!
@learndataa9 ай бұрын
Thanks a bunch for your awesome feedback and support. All the code in this series is derived from the examples in the docs. I have created a new repository at the link below that has a compilation of ~302 notebooks from scikit-learn. Although these are not directly from the videos, they are better commented and more descriptive.
Code notebooks: github.com/learndataa/scikit-learn-docs/tree/main/notebooks/auto_examples
If you have any questions regarding any specific video, please feel free to post a comment. Thanks again for the support, and hope you enjoy diving into the series!
@hubabokuti9 ай бұрын
@@learndataa Thanks a lot and all the best!!!
@eq7169 ай бұрын
thanks. clear!
@learndataa9 ай бұрын
You're welcome!
@apolloandartemis54159 ай бұрын
Thank you! Your videos are very helpful!
@learndataa9 ай бұрын
You are welcome. Glad you found the videos helpful.
@undertaker75239 ай бұрын
Hello, thank you for this series. I have a question. Why do we have 3 different functions we sample our priors from? Is the idea that we're sampling each point 3 times and we're using our functions to generate the 3 values at each point? In practice, do we expect that sampling the same point multiple times will result in different values due to normally distributed noise? Are we capturing the mean of those 3 output values at the sample point and determining the mean of those points?
@learndataa9 ай бұрын
Thanks for watching!

(1) Why do we have 3 different functions we sample our priors from?
- Because we want to explore different possibilities or hypotheses about the underlying data generating process. Each sampled function represents a possible function that could describe the data.

(2) Is the idea that we're sampling each point 3 times and using our functions to generate the 3 values at each point?
- Yes, exactly. When we sample three functions from the Gaussian Process prior, we are effectively generating three sets of function values at each point in the input space. These values represent different possible outcomes or predictions for the target variable.

(3) In practice, do we expect that sampling the same point multiple times will result in different values due to normally distributed noise?
- Yes, that's correct. In Gaussian Process regression, the function values are distributed according to a multivariate Gaussian distribution. This implies that for the same input point, we would expect different function values in general due to the randomness introduced by the Gaussian noise.

(4) Are we capturing the mean of those 3 output values at the sample point and determining the mean of those points?
- Yes, in practice, we can capture the mean of the sampled function values at each point to estimate the mean function of the Gaussian Process. This mean function represents the expected value or average behavior of the target variable at each point in the input space. Additionally, we can also compute other statistics such as the variance to assess uncertainty in the predictions.

Thus, sampling multiple functions from the Gaussian Process prior allows us to explore different hypotheses about the data, and by observing the variations in function values across samples, we can estimate the mean function and assess uncertainty in our predictions. Hope it helps!
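A small scikit-learn sketch of points (1)-(4), with an assumed RBF kernel and input grid: calling sample_y on an unfitted GaussianProcessRegressor draws functions from the prior, and their pointwise mean/std can be compared with the prior mean and std returned by predict.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.linspace(0, 10, 50).reshape(-1, 1)               # input grid (illustrative)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))

# Before fit(), sample_y draws functions from the prior; here 3 of them
samples = gp.sample_y(X, n_samples=3, random_state=0)   # shape (50, 3)

# Empirical mean and spread across the 3 sampled functions at each point
print("Mean across samples (first 5 points):", samples.mean(axis=1)[:5])
print("Std across samples (first 5 points): ", samples.std(axis=1)[:5])

# The prior's own mean (zero) and std, for comparison
mean, std = gp.predict(X, return_std=True)
print("Prior mean (first 5 points):", mean[:5])
print("Prior std (first 5 points): ", std[:5])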