Hey sir, your videos are very informative, and I have learned a lot about parallel programming from them, but I have a problem I'm stuck on. I have a for loop that performs sequential OpenCV matrix operations (multiplication, transpose, inverse) on matrices of size 3x4, 4x3, 3x3, and 3x1. The loop has to run over a long range, around 4,000,000 iterations, with a changing vector value that is used in one of the matrix operations. It would be really helpful if I could get an idea of how to approach this. I tried CUDA, OpenMP, and TBB, but I keep ending up with errors. Hope I'll get a solution for this from Nick by CoffeeBeforeArch.