I talked about this a bit in a comment on Whatever (John Scalzi's excellent blog). The upshot is that while text-to-speech sucks today, it won't always. It is not ridiculous to imagine that a computer reading a text will one day not just speak fluidly with reasonable intonation, but also use different voices for different characters, even the voices of famous actors. (On a side note, I wonder whether public figures, who do not need to be paid for their likenesses, will also not need to be paid for their voice-likenesses. Could I do an audiobook of, say, "A Confederacy of Dunces" using only voices from, say, the 110th Congress?)
For that matter, it would not be terribly difficult to do something like this already, if one were willing to put some work into it. An audiobook markup language could do the job admirably. It would need several things:
- The ability to mark particular characters, and a table of voices to match characters to voice "actors"
- Markup for pauses, emphasis, volume, and speed (like music)
- Either markup or a glossary for pronunciation
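
To make the list above a little more concrete, here is a rough sketch of what such a markup might look like and how a program could read it. Everything in it is my own invention for illustration: the tag names (`cast`, `voice`, `glossary`, `word`, `line`, `pause`, `em`), the attributes, the voice names, and the `build_cues` helper are all hypothetical, not part of any existing format. It does not produce sound; it only turns the markup into a list of "cues" that a speech engine could, in principle, be fed.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: every tag and attribute name is invented for this sketch.
SAMPLE = """
<audiobook>
  <cast>
    <voice character="narrator" actor="voice-a"/>
    <voice character="alice" actor="voice-b"/>
  </cast>
  <glossary>
    <word text="Zyx" say="ziks"/>
  </glossary>
  <line who="narrator" speed="1.0" volume="normal">Zyx walked in.</line>
  <pause seconds="0.5"/>
  <line who="alice" speed="0.9" volume="loud">This is an <em>outrage</em>!</line>
</audiobook>
"""


def build_cues(xml_text):
    """Turn the hypothetical audiobook markup into a flat list of cues."""
    root = ET.fromstring(xml_text)

    # Table matching characters to voice "actors".
    cast = {v.get("character"): v.get("actor") for v in root.iter("voice")}
    # Pronunciation glossary: written form -> how to say it.
    glossary = {w.get("text"): w.get("say") for w in root.iter("word")}

    cues = []
    for node in root:  # direct children only, in document order
        if node.tag == "pause":
            cues.append({"type": "pause",
                         "seconds": float(node.get("seconds", "0"))})
        elif node.tag == "line":
            # Flatten the line's text, including text inside <em>.
            text = "".join(node.itertext())
            # Naive substring replacement; a real tool would match whole words.
            for word, say in glossary.items():
                text = text.replace(word, say)
            cues.append({
                "type": "speech",
                "actor": cast[node.get("who")],  # KeyError if not in the cast
                "text": text,
                "emphasis": [em.text for em in node.iter("em")],
                "speed": float(node.get("speed", "1.0")),
                "volume": node.get("volume", "normal"),
            })
    return cues


if __name__ == "__main__":
    for cue in build_cues(SAMPLE):
        print(cue)
```

The one design choice worth noting is that the cast table and the glossary live apart from the text itself, so swapping one voice "actor" for another, or fixing a pronunciation, means changing one entry rather than every line.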