Personalization in human-robot interaction (HRI) has been shown to have powerful effects on both users' perception of robots and objective interaction outcomes. Calling a human user by name is an important signal that the robot understands the user and remembers information about them, yet it remains an ongoing challenge in HRI research: typical text-to-speech algorithms struggle to correctly pronounce the many names that exist even in the English language alone. This paper presents a pipeline that fuses text and audio features to extract user information such as names and re-use it with the correct pronunciation. We discuss technical guidelines for implementation and remaining challenges.