This is the most underrated problem in data science. I have seen so many models deployed to production that never delivered any value because of this.
@lokeshjoshi5400 12 days ago
Should feature creation, such as rolling mean and lagging indicators, also be done separately on train and test data, or can it be done on all historical data?
@prosmartanalytics 12 days ago
Good question! When we create features for the first time, we are not always sure whether they will prove useful. So, to start with, we may create features just for the train set and check whether our models assign them due importance. If so, we can replicate them for the test set separately. However, a domain expert who knows that a feature expressed in a certain way is more meaningful (e.g. ratios in finance) can create features for the entire data at once. The only thing to be careful about is that at no point does the test set influence the train set.
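For rolling means and lags specifically, here is a minimal sketch of one way to keep the test set from influencing the train set. It assumes a time-ordered pandas DataFrame and a chronological split; the column name `y`, the window size, the split point, and the helper `add_past_only_features` are all made up for illustration. The idea is to shift by one step before rolling, so that each row's features use only earlier rows.

```python
import pandas as pd


def add_past_only_features(df, col="y", window=3):
    """Add a lag and a rolling mean that look only at strictly earlier rows.

    Hypothetical helper for illustration; assumes df is sorted by time.
    """
    out = df.copy()
    # Lag feature: the previous row's value.
    out[f"{col}_lag1"] = out[col].shift(1)
    # Rolling mean over the previous `window` rows; shift(1) excludes the current row.
    out[f"{col}_rollmean{window}"] = out[col].shift(1).rolling(window).mean()
    return out


# Toy time-ordered series (made-up values).
df = pd.DataFrame({"y": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]})
split = 6  # chronological split: first 6 rows train, last 2 rows test

# Features built once on all historical data, then split by time.
full = add_past_only_features(df)
train, test = full.iloc[:split], full.iloc[split:]

# Features built on the train rows alone, for comparison.
train_only = add_past_only_features(df.iloc[:split])

# If the features are truly backward-looking, both versions of train match.
leak_free = train.equals(train_only)
print("train features unaffected by test rows:", leak_free)
```

With backward-looking features and a time-based split, building them on all historical data gives the same train rows as building them on train alone. That would not hold for a centered window, a random shuffle split, or any statistic fitted on the whole dataset (such as a global mean used for scaling).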
@lokeshjoshi5400 12 days ago
@prosmartanalytics Thanks! Your quick response is very helpful.
@younesgasmi8518 11 months ago
Can undersampling before splitting the dataset lead to data leakage or not?