Microsoft Research Develops AI That Learns Through Positive Human Feedback

Microsoft Research wants to integrate the concept of positive feedback into machine learning models. Such a training technique would help an AI agent build confidence and work more effectively toward fulfilling a goal. According to the research team running the project, reinforcement learning usually pursues a goal by rewarding a policy for its actions. However, those rewards are often too broad, so the team argues that intrinsic rewards are more effective. Intrinsic rewards need not be tied to the task at hand, and the team argues they can more accurately highlight progress toward a goal. “We present a novel approach leveraging a task-independent intrinsic reward function trained on spontaneous smile behavior that captures positive affect.” Pursuing such an intrinsic approach, Microsoft Research built a framework that is motivated by human feedback. For example, it is motivated by expressions of positive affect such as a smile, which it detects through a computer vision platform.

Reward System

Through the reward system, the framework predicts responses such as a human smile and treats them as positive feedback when learning a general policy. “Here we were not attempting to mimic affective processes, but rather to show that functions trained on affect-like signals can lead to improved performance,” wrote the coauthors of the paper. “In summary, we argue that such an intrinsically motivated learning framework inspired by affective mechanisms can be effective in increasing the coverage during exploration, decreasing the number of catastrophic failures, and that the garnered experiences can help us learn general representations for solving tasks including depth estimation, scene segmentation, and sketch-to-image translation.”
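To make the idea of an affect-based intrinsic reward more concrete, here is a toy Python sketch. It is not the researchers' implementation: the `predicted_smile_score` function is a hypothetical stand-in for a trained vision model, the one-dimensional "observation" and the three actions are invented for illustration, and the simple epsilon-greedy rule is swapped in for whatever policy learning the team actually used.

```python
import random


def predicted_smile_score(observation):
    """Hypothetical stand-in for a learned affect model (assumption).

    Returns a value in [0, 1]; higher means a stronger predicted smile.
    A real system would run a trained vision network on camera frames.
    """
    # Toy rule: observations close to 0.5 are treated as the most "pleasant".
    return max(0.0, 1.0 - 2.0 * abs(observation - 0.5))


def choose_action(observation, actions, epsilon, rng):
    """Epsilon-greedy choice driven only by the intrinsic (affect) reward."""
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore: try a random action
    # Exploit: pick the action whose resulting observation scores highest.
    return max(actions, key=lambda a: predicted_smile_score(observation + a))


def run_episode(start, steps=20, epsilon=0.1, seed=0):
    """Run one episode and return the final observation and summed reward."""
    rng = random.Random(seed)
    actions = [-0.1, 0.0, 0.1]  # illustrative action set (assumption)
    observation, total_reward = start, 0.0
    for _ in range(steps):
        observation += choose_action(observation, actions, epsilon, rng)
        # No task-specific reward is used; only the predicted affect signal.
        total_reward += predicted_smile_score(observation)
    return observation, total_reward


if __name__ == "__main__":
    final_observation, reward = run_episode(start=0.0)
    print(f"final observation: {final_observation:.2f}, reward: {reward:.2f}")
```

The point of the sketch is only that the agent never sees a task-specific reward: its behavior is shaped entirely by a signal meant to approximate positive human affect.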
