Found this while playing with text-to-speech on Azure and looking for inspiration. Indeed, it's not cheap, but the company I work for has the $$$ for it. It's also worth considering that the compute time (typically a tiny fraction of a second) is significantly less than the time it takes to play the voice (typically several seconds).