So in the Sun Niagara system, are we only describing a single execution unit? The MUX that feeds into the decode stage suggests so, which means you are effectively creating an in-order pipeline, just with instructions from different threads. Am I wrong? So the IPC is still at most 1? It makes sense nevertheless, since you eliminate the logic required for branch prediction and data dependency checking, but you would still have to stall the pipeline for variable-latency operations. Why not simply define 4 of everything instead?
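To make the question concrete, here is a toy sketch of what I think is going on. It is purely my own model, not Niagara's actual thread-select logic: `simulate`, the latency lists, and the round-robin policy are all assumptions. One issue slot per cycle, and the "MUX" picks the next thread that is not waiting on a long-latency instruction.

```python
from collections import deque

def simulate(threads, cycles):
    """Toy model: one issue slot per cycle, round-robin over ready threads.

    threads: list of lists of instruction latencies (1 = single-cycle op).
    Returns (issued_count, ipc). All names here are hypothetical.
    """
    queues = [deque(t) for t in threads]
    ready_at = [0] * len(queues)  # cycle at which each thread may issue again
    issued = 0
    next_thread = 0
    for cycle in range(cycles):
        # The "MUX": scan threads in round-robin order, take the first ready one.
        for offset in range(len(queues)):
            tid = (next_thread + offset) % len(queues)
            if queues[tid] and ready_at[tid] <= cycle:
                latency = queues[tid].popleft()
                ready_at[tid] = cycle + latency  # thread waits for its result
                issued += 1
                next_thread = (tid + 1) % len(queues)
                break
        # If no thread was ready, this cycle's issue slot is simply wasted.
    return issued, issued / cycles

# One thread of 4-cycle ops vs. four such threads sharing the same pipeline.
one_thread = simulate([[4] * 5], 20)
four_threads = simulate([[4] * 5 for _ in range(4)], 20)
print(one_thread, four_threads)
```

In this model a single thread of 4-cycle operations only reaches an IPC of 0.25, while four such threads keep the one issue slot busy every cycle, so the IPC is capped at 1 but the stalls are hidden rather than removed.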