I have data samples where each sample is a 200 x 6 (time steps x features) numeric time series, and the label for each sample is a 4 x 1 numeric vector. I tried training a multitask self-attention network (transformer-like), but all it predicts is (roughly) the average 4 x 1 vector over the training samples. Could you please make a video where we train a transformer (or an LSTM with attention) as a (numeric) time-series-to-vector prediction model?
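For reference, the core of such a model is attention pooling: collapse the 200 time steps into one summary vector, then map that to the 4 outputs. Here is a minimal numpy sketch of the forward pass for a single sample, assuming a hypothetical learned query `q` and linear head `W` (in a real model these would be trained parameters, not random):

```python
import numpy as np

rng = np.random.default_rng(0)

T, F, D_OUT = 200, 6, 4            # time steps, features, output size

x = rng.normal(size=(T, F))        # one sample: 200 x 6 time series
q = rng.normal(size=(F,))          # hypothetical learned attention query
W = rng.normal(size=(F, D_OUT))    # hypothetical learned linear head

scores = x @ q                     # (T,) one attention score per time step
weights = np.exp(scores - scores.max())
weights /= weights.sum()           # softmax over the time dimension

pooled = weights @ x               # (F,) attention-weighted summary of the sequence
y_hat = pooled @ W                 # (4,) predicted label vector

print(y_hat.shape)
```

If the trained network only outputs the mean label, it often means the pooled representation carries no sample-specific signal, so checking that the attention weights vary across samples (rather than staying near uniform) is a useful diagnostic.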