thanks for explaining in simple language. Looking for function transformer applied to each feature...
@learndataaАй бұрын
Thank you. Glad to hear it helped.
@ankitrana3781Ай бұрын
Great work sir
@learndataaАй бұрын
Thank you. Appreciate your support.
@kingstonteffyroy2 ай бұрын
What a waste of time!!
@learndataa2 ай бұрын
I understand that the video just mentions what a PermissionError is. If you have a specific question, please feel free to ask.
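If it helps, here is a tiny illustrative sketch (the file path is a placeholder, not from the video) of when Python raises a PermissionError and how to catch it:

# Illustrative only: writing to a location the operating system does not allow
# raises PermissionError; catching it lets the program respond instead of crashing.
try:
    with open("/root/protected_file.txt", "w") as f:   # placeholder path
        f.write("hello")
except PermissionError as e:
    print("Permission denied:", e)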
@RahulKumar-ez6vw2 ай бұрын
Could you please consider creating the LLM playlist, sir, since you are providing us with such valuable and helpful videos? Thanks a lot ❣❣
@learndataa2 ай бұрын
Thank you for your suggestion and interest! I'm currently working on a series called 'DL Math.' Stay tuned! I'll be covering topics from an introduction to basic algebra all the way up to LLMs. Excited to share this journey with you!
@Musicalworld-c6y2 ай бұрын
Sir, I like your content. Please make a Deep Learning series also.
@learndataa2 ай бұрын
Thank you. Your support means a lot. Sure I will.
@ashkankarimi41463 ай бұрын
Hi Nilesh, I am just starting this playlist. Many thanks for sharing all these contents on your channel.
@learndataa3 ай бұрын
You are welcome! I hope you find the series helpful. Feel free to post comments if you have any questions along the way. Thanks for watching and supporting the channel!
@saikirant46773 ай бұрын
Could you give me the script? It would be a big advantage for us.
@learndataa3 ай бұрын
Sure. The code and examples are derived from BigQuery docs: cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax Hope it helps! Thanks for watching.
@saikirant46773 ай бұрын
Thank you for the valuable information.
@learndataa3 ай бұрын
Glad it was helpful!
@nicholastey74133 ай бұрын
i usually dont comment under videos, but youre really good at teaching, helped me alot!! thank you sir :D
@learndataa3 ай бұрын
Thank you so much. I am happy to hear that the videos were helpful. Your support means a lot to me.
@deepeshdeshmukh973 ай бұрын
grt job sir
@learndataa3 ай бұрын
Thank you so much. Appreciate your support.
@skv46113 ай бұрын
Great work, Thanks
@learndataa3 ай бұрын
Thank you for your support.
@MohammadYs773 ай бұрын
how can we assign the corresponding labels in the multiple csv files section?
@learndataa3 ай бұрын
Hoping below helps. Thank you for watching!

#--------------------------------------------------------------
# Create multiple CSV files
#--------------------------------------------------------------
import os
import pandas as pd

# Create directory for CSV files
os.makedirs("csv_data", exist_ok=True)

# Create sample CSV files
for i in range(1, 4):
    df = pd.DataFrame({
        "feature1": [i*10, i*20, i*30],
        "feature2": [i*40, i*50, i*60],
    })
    df.to_csv(f"csv_data/file_{i}.csv", index=False)

#--------------------------------------------------------------
# Read multiple CSV files into a dataset and attach label
#--------------------------------------------------------------
import tensorflow as tf
import pathlib

# Path to the CSV files
csv_path = pathlib.Path("csv_data")

# List all CSV files
csv_files = list(csv_path.glob("*.csv"))

def load_and_label(file_path):
    # Read the CSV file
    file_name = tf.strings.split(file_path, os.sep)[-1]
    label = tf.strings.regex_replace(file_name, ".csv", "")

    # Load CSV content
    content = tf.data.experimental.CsvDataset(
        file_path,
        [tf.float32, tf.float32],
        header=True
    )

    # Add label to each row
    return content.map(lambda *row: (row, label))

# Create a dataset of file paths
file_dataset = tf.data.Dataset.from_tensor_slices([str(f) for f in csv_files])

# Interleave the datasets and assign labels
labeled_dataset = file_dataset.interleave(
    lambda file: load_and_label(file),
    cycle_length=len(csv_files),
    num_parallel_calls=tf.data.AUTOTUNE
)

# View the dataset content
for data, label in labeled_dataset:
    print("Data:", data)
    print("Label:", label.numpy().decode())

#--------------------------------------------------------------
# Output
#--------------------------------------------------------------
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=20.0>, <tf.Tensor: shape=(), dtype=float32, numpy=80.0>)
Label: file_2
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=30.0>, <tf.Tensor: shape=(), dtype=float32, numpy=120.0>)
Label: file_3
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=10.0>, <tf.Tensor: shape=(), dtype=float32, numpy=40.0>)
Label: file_1
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=40.0>, <tf.Tensor: shape=(), dtype=float32, numpy=100.0>)
Label: file_2
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=60.0>, <tf.Tensor: shape=(), dtype=float32, numpy=150.0>)
Label: file_3
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=20.0>, <tf.Tensor: shape=(), dtype=float32, numpy=50.0>)
Label: file_1
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=60.0>, <tf.Tensor: shape=(), dtype=float32, numpy=120.0>)
Label: file_2
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=90.0>, <tf.Tensor: shape=(), dtype=float32, numpy=180.0>)
Label: file_3
Data: (<tf.Tensor: shape=(), dtype=float32, numpy=30.0>, <tf.Tensor: shape=(), dtype=float32, numpy=60.0>)
Label: file_1
@SidOp7863 ай бұрын
Keep going thanks for the knowledge🎉
@learndataa3 ай бұрын
Thanks for the support!
@wsasonorejo57534 ай бұрын
Thanks, very clear, and very important for a strong foundation in deep learning.
@learndataa3 ай бұрын
Appreciate your support. Thanks for watching.
@RahulKumar-ez6vw4 ай бұрын
Could you please share the code repository?
@learndataa4 ай бұрын
The code is available at the link in the description. github.com/learndataa/shared Thanks for watching!
@qamechanix4 ай бұрын
where is the table name?
@learndataa3 ай бұрын
The 'FROM' clause usually has the table name. In the example in the video, the table is created directly in the FROM clause, hence no table name is needed. Thanks for watching!
@user-ce3ip5lx9t4 ай бұрын
thanks a lot! is the jupyter notebook available?
@learndataa3 ай бұрын
All the notebooks from the scikit-learn docs are available at the link below:
github.com/learndataa/scikit-learn-docs/tree/main/notebooks/auto_examples/gaussian_process
While these notebooks may not be exactly those in the video, they are well commented. Hope it helps!
@user-ce3ip5lx9t3 ай бұрын
@@learndataa great and thanks a lot!!!
@user-ce3ip5lx9t3 ай бұрын
@@learndataa however I get an error using your link 😕
@learndataa3 ай бұрын
@@user-ce3ip5lx9t I was able to recreate the error when not logged in to Github. Could you try logging in?
@user-ce3ip5lx9t3 ай бұрын
@@learndataa in github I see your scikit learn repository but it appears empty to me
@datasciencegyan51454 ай бұрын
result is not visible
@learndataa3 ай бұрын
Apologies. Unfortunately, the video frame got clipped. Hope the code is still helpful.
@ZhalilKachkynov4 ай бұрын
If you agree, please answer me.
@learndataa3 ай бұрын
I have posted a reply to the earlier comment. Thanks for watching!
@ZhalilKachkynov4 ай бұрын
Hello, Dear. I don’t know your name. For about two weeks I have been learning basic AI. I started with Python, numpy, pandas and data analysis, but I need teaching. If you have some free time twice a week, can you help me learn AI tools? 😢
@learndataa3 ай бұрын
Sorry for the delayed reply. First, welcome to the world of AI and Data Science. To answer your question in short, I think you are already on your way and on track. Maybe all that is needed now is practice, practice and some more practice. I see that you have already self-taught Python. The same would work for AI tools as well. Below are a few personal thoughts on getting up to AI tools. Again, the route to get there may vary based on background, experience, learning style and time available for practicing code. Chances are that you may already know the steps below.

(Analysis)
Step-1: Learn Python basics
Step-2: Learn Numpy, Pandas (in detail), and Matplotlib
Step-3: Try analyzing open source datasets available online: UCI ML Repository etc.
Step-4: Practice, practice and practice
[Note: If learning without any prior coding background, this may take 6 months. The Beginner series on this channel will cover all of the topics needed.]

(Machine Learning)
Below assumes a prior basic background in Math, Algebra and Calculus.
Step-5: Learn the theory of ML (Andrew Ng's course on Machine Learning, available for free on YouTube)
Step-6: Begin learning implementations in scikit-learn [Intermediate course on the channel]
Step-7: Practice, practice and practice
Step-8: Continue learning ML fundamentals
[Note: May take about 6 to 8 months without prior ML background.]

(Deep Learning)
Step-9: Learn the theory of DL (again, Andrew Ng's course is a good start; there are others as well)
Step-10: Learn a framework of your choice. On this channel, TensorFlow and Keras are covered so far.
Step-11: Practice, practice and practice.
[Note: May take one or two semesters; 6+ months]

Overall, to answer your question, I think putting in more practice time may help. Trying to understand what each line of code does makes a huge difference. I believe learning to code and getting good at AI/ML is a marathon!!! Just keep going and do not give up!!! You will get there!!! If you have any questions or suggestions, please feel free to post them as comments on the videos; I'll try to reply as best as I can. Hope it helps.
@nayeemx115 ай бұрын
Wow, this is a very good, informative video.
@learndataa3 ай бұрын
Thanks for watching and your support. It means a lot.
@rayoh20115 ай бұрын
Thank you. The Tutorial is helpful!
@learndataa3 ай бұрын
Appreciate your support. Thanks for watching!
@et.sachin5 ай бұрын
Can we do this without using the UNPIVOT() clause?
@learndataa3 ай бұрын
Thanks for watching the video. Code may help:

WITH sales_data AS (
  SELECT 1 AS product_id, 10 AS Q1, 15 AS Q2, 20 AS Q3, 25 AS Q4
  UNION ALL
  SELECT 2, 5, 10, 15, 20
)

/*
# Option-1: UNPIVOT
SELECT product_id, quarter, sales
FROM sales_data
UNPIVOT (
  sales FOR quarter IN (Q1, Q2, Q3, Q4)
)
*/

/*
# Option-2: UNION ALL
SELECT product_id, 'Q1' AS quarter, Q1 AS sales FROM sales_data
UNION ALL
SELECT product_id, 'Q2' AS quarter, Q2 AS sales FROM sales_data
UNION ALL
SELECT product_id, 'Q3' AS quarter, Q3 AS sales FROM sales_data
UNION ALL
SELECT product_id, 'Q4' AS quarter, Q4 AS sales FROM sales_data
*/

# Option-3: UNNEST and STRUCT
SELECT product_id, quarter, sales
FROM sales_data,
UNNEST([
  STRUCT('Q1' AS quarter, Q1 AS sales),
  STRUCT('Q2' AS quarter, Q2 AS sales),
  STRUCT('Q3' AS quarter, Q3 AS sales),
  STRUCT('Q4' AS quarter, Q4 AS sales)
]) AS t
@juliaclaira89055 ай бұрын
This is incredibly helpful. Thank you!
@learndataa3 ай бұрын
Glad to hear it! Appreciate your support. Thanks for watching!
@arulprakash55895 ай бұрын
Thanks for the video. I have a question about scikit-learn GPs. I have multiple observations of heart pressure traces. Can they be fitted to a single Gaussian Process to capture the uncertainty among multiple observations? I need multiple observations to be fitted to a single GP. But when I use scikit-learn to fit, I am getting a mean and covariance matrix for each pressure trace!! Thank you :)
@learndataa3 ай бұрын
Appreciate your support. It means a lot. Thanks for watching. To answer your question, I've tried to put together a code below. Hope it helps!

import numpy as np
import matplotlib.pyplot as plt
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel as C

# Create data
np.random.seed(42)
X1 = np.linspace(0, 10, 100).reshape(-1, 1)  # 100 time points for observation 1
X2 = np.linspace(0, 10, 100).reshape(-1, 1)  # 100 time points for observation 2
X3 = np.linspace(0, 10, 100).reshape(-1, 1)  # 100 time points for observation 3

# Pressure waves
y1 = np.sin(X1).ravel() + np.random.normal(0, 0.1, X1.shape[0])
y2 = np.sin(X2 - 1).ravel() + np.random.normal(0, 0.1, X2.shape[0])
y3 = np.sin(X3 + 1).ravel() + np.random.normal(0, 0.1, X3.shape[0])

# Put it together
y = np.concatenate([y1, y2, y3])
X_combined = np.vstack([X1, X2, X3])

# Create a kernel: Constant kernel * RBF kernel
kernel = C(1.0, (1e-4, 1e1)) * RBF(1.0, (1e-4, 1e1))

# Initialize and fit GP
gp = GaussianProcessRegressor(kernel=kernel, n_restarts_optimizer=10)
gp.fit(X_combined, y)

# Predict
mean, covariance = gp.predict(X1, return_cov=True)

# SD and COV
std_dev = np.sqrt(np.diag(covariance))

# Plot
plt.figure(figsize=(10, 6))

# Original data
plt.plot(X1, y1, 'r.', markersize=10, label='Observation 1')
plt.plot(X2, y2, 'g.', markersize=10, label='Observation 2')
plt.plot(X3, y3, 'b.', markersize=10, label='Observation 3')

# Predicted GP mean
plt.plot(X1, mean, 'k-', label='GP Mean')

# CI of GP
plt.fill_between(X1.ravel(),
                 mean - 1.96 * std_dev,
                 mean + 1.96 * std_dev,
                 color='gray', alpha=0.2, label='95% Confidence Interval')

plt.title('Gaussian Process Regression on Multiple Observations')
plt.xlabel('Time')
plt.ylabel('Pressure')
plt.legend()
plt.show()
@NamrataSingh-pf1zn5 ай бұрын
The result is showing number.x in the row and the value is 1. It's not showing the curly bracket in the result window.
@learndataa3 ай бұрын
Apologies for the delayed reply. I'm trying to understand the question. At 2:18 in the video, the output table, i.e. the result, is {"x":"1"} in the "number" column. Could you please elaborate? Thanks for watching!
@shiladityachowdhury32796 ай бұрын
I need to talk to you regarding this project. How can I connect with you?
@learndataa3 ай бұрын
Apologies for the delayed reply. Thanks for reaching out! If you have specific questions about the project, please feel free to ask here, and I'll do my best to answer.
@josafatzamora52246 ай бұрын
Help!! I replicated the code and it's not working :(
@learndataa6 ай бұрын
I would need the Python version, the line that causes the error, and the error message.
@sahilgarg93046 ай бұрын
I followed your series, it's very amazing.
@learndataa6 ай бұрын
Thank you for your support. Happy to hear that you enjoyed the series.
@singh-ml5mj6 ай бұрын
How to extract this data?
@learndataa6 ай бұрын
For the video, the data was tabulated manually, page by page. You could try their APIs, such as: developer-docs.amazon.com/amazon-business/docs/product-search-api-v1-reference
@ajijulislamhridoy92127 ай бұрын
Could i have the source code please, sir?
@learndataa7 ай бұрын
Code is available in the repository: github.com/learndataa/examples/tree/master/lite/examples Thanks for watching
@aditikar23837 ай бұрын
From line 45 it's not working. Kindly help.
@learndataa7 ай бұрын
It is difficult to answer the question without more information, such as the error message.

Line #45 @ 22:18:
data['nutrient'].head(2)

Try checking the shape of the DataFrame 'data' to make sure it is correct. Thanks for watching!
@aditikar23837 ай бұрын
Yes, I have done it exactly like this, but it is showing an error… it would be better if I could show you the error.
@learndataa7 ай бұрын
could you post the error traceback?
@vsiddu957 ай бұрын
I am not able to get the explorer page
@learndataa7 ай бұрын
The Explorer panel should be accessible after you log in to BigQuery at the link below.
BigQuery: console.cloud.google.com/bigquery
Docs: cloud.google.com/bigquery/docs/sandbox
Hope it helps!
@aguntuk108 ай бұрын
Hi, really nice video. Can you share the Jupyter notebook?
@learndataa8 ай бұрын
Thank you for your support. All the code is derived from the docs. Below are 300+ notebooks from the docs with original code and descriptions (not directly from the video). Hope it helps!
scikit-learn code notebooks: github.com/learndataa/scikit-learn-docs/tree/main/notebooks/auto_examples
@sagewagner38038 ай бұрын
Where in this code would you split the data into training, testing, and evaluation sets? Would it be after concatenating the preprocessed inputs but before the line "titanic_preprocessing = tf.keras.Model(inputs, preprocessed_inputs_cat)"? Can you give an example of how this would be done?
@learndataa8 ай бұрын
The data should be split before any preprocessing begins, to avoid information from the test set leaking into the model building process. This could lead to misleading performance estimates for the final model. Depending on the data, I would do a stratified shuffle split to get train, validation (if needed) and test sets right after the line below:

titanic = pd.read_csv("storage.googleapis.com/tf-datasets/titanic/train.csv")

CSV data --> split (train, validation, test) --> write code to preprocess train data --> put preprocessing code in a function or pipeline --> call the function to preprocess train/validation --> iterate/optimize to get the final model --> final model ready --> run the test data through the same function used to preprocess the train data --> use the test data function output as input to the final model for prediction

The split could be something like the code below using random indices, or, much more easily, using sklearn.model_selection.train_test_split().

##########
# Code
##########
# (not stratified)
# Split data into train, validation and test sets
titanic = pd.read_csv("storage.googleapis.com/tf-datasets/titanic/train.csv")
X = titanic.drop(columns=['survived'])
y = titanic['survived']

# Define the number of samples in the dataset
num_samples = len(X)

# Define the ratios for the train-validation-test split
train_ratio = 0.6
val_ratio = 0.2
test_ratio = 0.2

# Compute the number of samples for each set
train_size = int(num_samples * train_ratio)
val_size = int(num_samples * val_ratio)
test_size = num_samples - train_size - val_size

# Shuffle the indices
shuffled_indices = np.arange(0, num_samples)  # <-- not yet shuffled
np.random.shuffle(shuffled_indices)           # <-- in-place shuffle

# Split the shuffled indices into train, validation, and test sets
train_indices = shuffled_indices[:train_size]
val_indices = shuffled_indices[train_size:train_size+val_size]
test_indices = shuffled_indices[train_size+val_size:]

# Split
X_train = X.iloc[train_indices, :]
y_train = y.iloc[train_indices]

X_val = X.iloc[val_indices, :]
y_val = y.iloc[val_indices]

X_test = X.iloc[test_indices, :]
y_test = y.iloc[test_indices]

print("X_train:", X_train.shape)
print("y_train:", y_train.shape)
print("X_val:", X_val.shape)
print("y_val:", y_val.shape)
print("X_test:", X_test.shape)
print("y_test:", y_test.shape)

# Begin preprocessing ...
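If a shorter route helps, below is a sketch of the same idea using scikit-learn's train_test_split on the same Titanic CSV (the 60/20/20 ratios and random_state are illustrative assumptions); two chained calls with stratify give a stratified train/validation/test split.

import pandas as pd
from sklearn.model_selection import train_test_split

titanic = pd.read_csv("https://storage.googleapis.com/tf-datasets/titanic/train.csv")
X = titanic.drop(columns=['survived'])
y = titanic['survived']

# 60% train / 40% temporary pool, stratified on the target
X_train, X_temp, y_train, y_temp = train_test_split(
    X, y, test_size=0.4, stratify=y, random_state=42)

# Split the pool 50/50 -> 20% validation, 20% test
X_val, X_test, y_val, y_test = train_test_split(
    X_temp, y_temp, test_size=0.5, stratify=y_temp, random_state=42)

print(X_train.shape, X_val.shape, X_test.shape)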
@BekkariMohammed-dk9ub8 ай бұрын
Thank you so much.... Could you send the full code to me, please?
@learndataa8 ай бұрын
Thanks for watching. All the code in this series is derived from the examples in the docs. I have created a new repository at the link below that has a compilation of ~302 notebooks from scikit-learn. Although these are not directly from the videos, they are better commented and more descriptive.
Code notebooks: github.com/learndataa/scikit-learn-docs/tree/main/notebooks/auto_examples
@sagewagner38038 ай бұрын
Can you further clarify the purpose of using tf.data.Dataset.from_tensor_slices? Why is it used and how does it change the dataset into a useful format?
@learndataa8 ай бұрын
From what I understand:

"from_tensor_slices":
- especially for larger datasets
- creates a separate tensor for each row of input to make it easier to iterate and batch process
- so if the input has 3 rows and 4 columns (features), it would create 3 different tensors in the dataset, one for each row with 4 columns

"from_tensors":
- especially for smaller datasets
- creates just one tensor
- so if the input has 3 rows and 4 columns (features), it would create 1 tensor with all 3 rows and 4 columns

########
# Code
########
# Import libraries
import tensorflow as tf
import numpy as np

# --------------------------------
# from_tensor_slices: input array
# creates 3 arrays in dataset
# --------------------------------
data = np.random.randn(3, 4)
dataset = tf.data.Dataset.from_tensor_slices(data)
for element in dataset:
    print(element)

# Output
tf.Tensor([ 1.6346394   1.13362992  0.42821694 -0.15339032], shape=(4,), dtype=float64)
tf.Tensor([ 0.90122249  0.27264101  0.26286328 -1.14954752], shape=(4,), dtype=float64)
tf.Tensor([-0.27845238 -0.78464886 -0.11236994 -0.18858366], shape=(4,), dtype=float64)

# --------------------------------
# from_tensor_slices: input tensor
# creates 3 arrays in dataset
# --------------------------------
data = tf.random.uniform(shape=[3, 4])
dataset = tf.data.Dataset.from_tensor_slices(data)
for element in dataset:
    print(element)

# Output
tf.Tensor([0.38493872 0.44316375 0.14045477 0.8924254 ], shape=(4,), dtype=float32)
tf.Tensor([0.7913748  0.9827099  0.8950583  0.36067998], shape=(4,), dtype=float32)
tf.Tensor([0.65940714 0.5389466  0.7395221  0.8307824 ], shape=(4,), dtype=float32)

# --------------------------------
# from_tensors: input array
# creates 1 array in dataset
# --------------------------------
data = np.random.randn(3, 4)
dataset = tf.data.Dataset.from_tensors(data)
for element in dataset:
    print(element)

# Output
tf.Tensor(
[[-1.41672221  0.81045198 -0.3883847  -0.86726604]
 [ 0.69639162 -1.14857263  0.37013669  0.56729552]
 [-0.1541059   0.09261183 -0.00200572 -0.12433269]], shape=(3, 4), dtype=float64)

# --------------------------------
# from_tensors: input tensor
# creates 1 array in dataset
# --------------------------------
data = tf.random.uniform(shape=[3, 4])
dataset = tf.data.Dataset.from_tensors(data)
for element in dataset:
    print(element)

# Output
tf.Tensor(
[[0.30948663 0.27289176 0.6494436  0.7968806 ]
 [0.10863554 0.36693168 0.18443334 0.07225335]
 [0.2699784  0.26086116 0.88859296 0.03361833]], shape=(3, 4), dtype=float32)

Thanks for watching!
@CJP38 ай бұрын
Hi Learndataa! Quick question, why do you code in Google Colab vs Jupyter or another IDE?
@learndataa8 ай бұрын
It depends!

Google Colab: Feels easier for deep learning, to later on use the free GPU (TensorFlow, Keras, GPU) or larger datasets
Jupyter: Easiest to learn analytics on a local computer (Python, numpy, pandas, matplotlib, scikit-learn)
PyCharm: Steeper learning curve to teach/learn analytics

Thanks for watching!
@manasbagul78668 ай бұрын
Helloo! I had a doubt. Is there a way to augment data after loading it using 'image_dataset_from_directory'? I am getting input shape errors and all other stuff. Can you please help? Thank you!
@learndataa8 ай бұрын
In the code below images are retrieved from a directory to augment. The directory needs to have a structure such as:

image_data
|___ train
     |__ class_1
     |__ class_2
     |__ class_3
|___ validation
     |__ class_1
     |__ class_2
     |__ class_3
|___ test
     |__ class_1
     |__ class_2
     |__ class_3

Check the link below for further details:
www.tensorflow.org/api_docs/python/tf/keras/utils/image_dataset_from_directory

##################
# sample code
##################
import tensorflow as tf
from tensorflow.keras.layers.experimental import preprocessing
import matplotlib.pyplot as plt

# Define the directory containing your images
image_dir = "/content/data2/"

# Create an image dataset from the directory
image_dataset = tf.keras.preprocessing.image_dataset_from_directory(
    image_dir,
    batch_size=32,          # You can adjust batch size as needed
    image_size=(224, 224),  # Adjust image size as needed
    shuffle=True,
)

# Define your data augmentation pipeline
data_augmentation = tf.keras.Sequential([
    preprocessing.Rescaling(1./255),         # Rescale pixel values to [0,1]
    preprocessing.RandomFlip("horizontal"),  # Random horizontal flip
    preprocessing.RandomRotation(0.2),       # Random rotation with 20% angle
    preprocessing.RandomZoom(0.2),           # Random zoom with 20% zoom range
    # Add more preprocessing layers as needed
])

# Apply data augmentation to the image dataset
augmented_dataset = image_dataset.map(lambda x, y: (data_augmentation(x), y))

# Define a function to visualize augmented images
def visualize_augmented_images(dataset):
    plt.figure(figsize=(10, 10))
    for images, labels in dataset.take(1):
        for i in range(9):  # Visualize the first 9 images
            ax = plt.subplot(3, 3, i + 1)
            plt.imshow(images[i].numpy().astype("float32"))
            plt.title(f"Class: {labels[i].numpy()}")
            plt.axis("off")
    plt.show()

# Visualize the augmented images
visualize_augmented_images(augmented_dataset)

# Now you can use augmented_dataset for training
# ...

Hope it helps.
@manasbagul78668 ай бұрын
@@learndataa Thank you so much! This helps a lot!❤️
@sagewagner38038 ай бұрын
How would the training loop be different if you had a large dataset and wanted to load it in small batches?
@learndataa8 ай бұрын
Does the section below help?
00:30:52 - Train and evaluate the model: set batch size
kzbin.info/www/bejne/l6e6nIaciL-qnsU
Thanks for watching.
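In short, not much changes in the loop itself if you stream the data with tf.data: the dataset is batched and model.fit (or a custom loop) just sees one small batch at a time. A minimal sketch with made-up data and an illustrative batch size of 32:

import numpy as np
import tensorflow as tf

# Made-up data: 10,000 rows, 10 features, binary target (illustrative only)
X = np.random.randn(10_000, 10).astype("float32")
y = np.random.randint(0, 2, size=(10_000,)).astype("float32")

# Build batches lazily so the full dataset never has to be fed to the model at once
dataset = (tf.data.Dataset.from_tensor_slices((X, y))
           .shuffle(1_000)
           .batch(32)                      # small batches
           .prefetch(tf.data.AUTOTUNE))

model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# fit() iterates the dataset one batch at a time each epoch
model.fit(dataset, epochs=2)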
@tomrhee18 ай бұрын
When triggering a rolling regression (or for that matter, a simple mean) with a window size of 10, I wonder if you could show us how I can start the rolling process at a particular date/time, let's say '2024-01-02 9:31:00'.
@learndataa8 ай бұрын
How about getting a subset from '2024-01-02 09:31:00' (code below)! Thanks for watching!

#####
# Code
#####
import pandas as pd

# Create data
date_range = pd.date_range(start='2024-01-01', end='2024-01-10', freq='H')
data = {'value': range(len(date_range))}
df = pd.DataFrame(data, index=date_range)

# Specify the start time
start_time = pd.Timestamp('2024-01-02 09:31:00')

# Select subset of data starting from start_time
subset = df.loc[start_time:]

# Calculate rolling mean or regression on the subset
window_size = 10
rolling_result = subset['value'].rolling(window=window_size, min_periods=1).mean()

print(rolling_result)
@tomrhee18 ай бұрын
Thank you guys!!!
@tomrhee18 ай бұрын
In your answer the date_range was only for 10 days. My project deals with large data, unfortunately. How would you handle the situation when the date range is longer than 1 year with periods of 1 minute? Can you help me? I look forward to getting an answer from a pro. Thanks again.
@learndataa8 ай бұрын
I'll look into it. If a window has half a million records that would be memory intensive.
@learndataa8 ай бұрын
With a window size of half a million it may need a GPU. In the code below, I have tried to use the cudf and cupy libraries. Note the code ran out of the free GPU memory of 15 GB. That would be expected because we are looking at an array of size (365*24*60)x(365*24*60), i.e. about 276,255,360,000 (276 billion) values! I may be wrong here, but we may be looking at over 2,000 GB of RAM! I am sorry I do not have an answer at this moment. But GPU would be the way to go, along with writing custom code and/or exploring pre-built libraries such as "cudf".

#############
# example code
#############

### Install cudf (if not installed)
#!pip install cudf-cu12 --extra-index-url=pypi.nvidia.com

### Import libraries
import pandas as pd
import numpy as np
import math
import cudf
import cupy as cp

### Create date range
# Large dataset
t = pd.date_range(start='2000-01-01', end='2030-12-31', freq='1 min')
# Small dataset
#t = pd.date_range(start='2030-01-01', end='2030-01-02', freq='1 min')
print(t.shape)

# Create dataframe
df = cudf.DataFrame({
    'x': np.random.randn(len(t)),
    'y1': np.random.randn(len(t)),
    'y2': np.random.randn(len(t)),
    'y3': np.random.randn(len(t)),
}, index=t)
print(df.shape)
df.head(3)

### Calculate covariance by window
window_size = 365*24*60  # 1 year rolling window size in minutes

rolling_covariance = []
for i in range(window_size, len(df)):
    window_df = df.iloc[i - window_size:i]     # Get the rolling window
    window_values = window_df.values.get()     # Convert cuDF DataFrame to NumPy array
    window_values = window_values.T            # Transpose to ensure proper shape for covariance calculation
    covariance_matrix = cp.cov(window_values)  # Calculate covariance matrix using cuPy
    rolling_covariance.append(cudf.DataFrame(covariance_matrix, index=window_df.index[-len(covariance_matrix):]))

rolling_covariance_df = cudf.concat(rolling_covariance)
print(rolling_covariance_df.shape)
@pradhumngoyal98568 ай бұрын
Thank you for the clear and crisp explanation, exactly what I was stuck on.
@learndataa8 ай бұрын
Thank you. Glad it helped.
@jeeteshb9 ай бұрын
Hi, thanks for the tutorials. If possible, can you also provide the Jupyter notebook links or a git link where we can find the notebooks for all tutorials? These are really helpful; thanks for your efforts.
@learndataa9 ай бұрын
Thank you for your support. All the code is derived from the docs. Below are 300+ notebooks from the docs with original code and descriptions. Hope it helps!
Code notebooks: github.com/learndataa/scikit-learn-docs/tree/main/notebooks/auto_examples
@rahmadinaadityana26739 ай бұрын
PLS discriminant analysis please
@learndataa9 ай бұрын
Something like this!

plsda = Pipeline([
    ('pls', PLSRegression(n_components=n_components)),
    ('classifier', LogisticRegression())
])

where,
- PLSRegression: for multicollinearity, high dimensionality
- a classifier (logistic, svc, RF): for classification

Thanks for watching!
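To make that concrete, here is a minimal runnable sketch (the synthetic data, n_components=2 and the logistic classifier are illustrative assumptions): PLS compresses the features into a few latent components and the classifier predicts the class from those scores.

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Toy data (assumption): 200 samples, 20 features, binary class label
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# PLS reduces the features to 2 latent components; the classifier works on those scores
plsda = Pipeline([
    ('pls', PLSRegression(n_components=2)),
    ('classifier', LogisticRegression())
])
plsda.fit(X_train, y_train)
print("Test accuracy:", plsda.score(X_test, y_test))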
@hubabokuti9 ай бұрын
Hi Nilesh, I really appreciate what you did here. It's an enormous effort to put together such vast material. It's a shame the videos have such a low number of views; your channel is hugely underrated. If by any chance you could share the scripts in any way, that would be great. I wish you all the best!
@learndataa9 ай бұрын
Thanks a bunch for your awesome feedback and support. All the code in this series is derived from the examples in the docs. I have created a new repository at the link below that has a compilation of ~302 notebooks from scikit-learn. Although these are not directly from the videos, they are better commented and more descriptive.
Code notebooks: github.com/learndataa/scikit-learn-docs/tree/main/notebooks/auto_examples
If you have any questions regarding any specific video, please feel free to post a comment. Thanks again for the support, and hope you enjoy diving into the series!
@hubabokuti9 ай бұрын
@@learndataa Thanks a lot and all the best!!!
@eq7169 ай бұрын
thanks. clear!
@learndataa9 ай бұрын
You're welcome!
@apolloandartemis54159 ай бұрын
Thank you! Your videos are very helpful!
@learndataa9 ай бұрын
You are welcome. Glad you found the videos helpful.
@undertaker75239 ай бұрын
Hello, thank you for this series. I have a question. Why do we have 3 different functions we sample our priors from? Is the idea that we're sampling each point 3 times and we're using our functions to generate the 3 values at each point? In practice, do we expect that sampling the same point multiple times will result in different values due to normally distributed noise? Are we capturing the mean of those 3 output values at the sample point and determining the mean of those points?
@learndataa9 ай бұрын
Thanks for watching!

(1) Why do we have 3 different functions we sample our priors from?
- Because we want to explore different possibilities or hypotheses about the underlying data generating process. Each sampled function represents a possible function that could describe the data.

(2) Is the idea that we're sampling each point 3 times and using our functions to generate the 3 values at each point?
- Yes, exactly. When we sample three functions from the Gaussian Process prior, we are effectively generating three sets of function values at each point in the input space. These values represent different possible outcomes or predictions for the target variable.

(3) In practice, do we expect that sampling the same point multiple times will result in different values due to normally distributed noise?
- Yes, that's correct. In Gaussian Process regression, the function values are distributed according to a multivariate Gaussian distribution. This implies that for the same input point, we would expect different function values in general due to the randomness introduced by the Gaussian noise.

(4) Are we capturing the mean of those 3 output values at the sample point and determining the mean of those points?
- Yes, in practice, we can capture the mean of the sampled function values at each point to estimate the mean function of the Gaussian Process. This mean function represents the expected value or average behavior of the target variable at each point in the input space. Additionally, we can also compute other statistics such as the variance to assess uncertainty in the predictions.

Thus, sampling multiple functions from the Gaussian Process prior allows us to explore different hypotheses about the data, and by observing the variations in function values across samples, we can estimate the mean function and assess uncertainty in our predictions. Hope it helps!
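A small scikit-learn sketch of points (1)-(4), with an assumed RBF kernel and input grid: calling sample_y on an unfitted GaussianProcessRegressor draws functions from the prior, and their pointwise mean/std can be compared with the prior mean and std returned by predict.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

X = np.linspace(0, 10, 50).reshape(-1, 1)               # input grid (illustrative)
gp = GaussianProcessRegressor(kernel=RBF(length_scale=1.0))

# Before fit(), sample_y draws functions from the prior; here 3 of them
samples = gp.sample_y(X, n_samples=3, random_state=0)   # shape (50, 3)

# Empirical mean and spread across the 3 sampled functions at each point
print("Mean across samples (first 5 points):", samples.mean(axis=1)[:5])
print("Std across samples (first 5 points): ", samples.std(axis=1)[:5])

# The prior's own mean (zero) and std, for comparison
mean, std = gp.predict(X, return_std=True)
print("Prior mean (first 5 points):", mean[:5])
print("Prior std (first 5 points): ", std[:5])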