The third Multidisciplinary Conference on Reinforcement Learning and Decision Making took place at the Rackham Graduate School between Sunday and Thursday this week.
Reinforcement learning describes how an agent, such as a person or robot, learns through interaction with an environment, such as a video game or puzzle the agent is trying to complete. The environment is in a certain state, and the agent acts on that state and receives a reward, or feedback, in response.
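The agent-environment loop described above can be sketched in a few lines of Python. This is a minimal illustration only: the `CorridorEnv` class, its reward of 1.0 at the goal, and the `always_right` policy are hypothetical names and values invented for this example, not anything presented at the conference.

```python
class CorridorEnv:
    """Hypothetical toy environment: walk from cell 0 to the last cell."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Put the agent back at the start and report the initial state.
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clamped to the corridor.
        self.state = max(0, min(self.length - 1, self.state + action))
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0  # assumed reward: 1 only on reaching the goal
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Let the agent act on each state and add up the rewards it receives."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)                  # the agent chooses an action
        state, reward, done = env.step(action)  # the environment responds
        total_reward += reward
        if done:
            break
    return total_reward


def always_right(state):
    # A fixed (non-learning) policy, used only to exercise the loop.
    return 1


episode_return = run_episode(CorridorEnv(), always_right)
```

A learning algorithm would replace `always_right` with a policy that is adjusted over many episodes according to the rewards collected.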
Among the event speakers were Ece Kamar, a researcher from Microsoft who specializes in artificial intelligence; Joelle Pineau, an assistant professor of computer science at McGill University; and Kent Berridge, James Olds Distinguished University Professor of Psychology and Neuroscience at the University of Michigan.
The days were broken into sessions of several brief presentations from speakers describing their work.
On Tuesday afternoon, Max Kleiman-Weiner, a Ph.D. student in Computational Cognitive Science at MIT, presented “Learning to Cooperate and Compete.”
Prefacing his address with a quote from psychologist Nicholas Humphrey, Kleiman-Weiner related social interactions to games — incorporating elements of game theory in his presentation.
From a research standpoint, games like tic-tac-toe and checkers have effectively been solved and can be modeled by computers, he explained. There are other games where computers can act like humans, but cannot necessarily solve the game, such as in a game of chance.
“It’s these kinds of ad hoc interactions that involve lots of complexity, where you’re cooperatively playing a game, but there might be aspects of competition in that game that can explain a lot of the richness of human cooperative behavior in a lot of that real world contexts that we might want to have machines working with us and at least understanding us, so things like how to negotiate, playground games,” he said. “We want to study cooperation.”
There are models for studying cooperation, one of which is the prisoner's dilemma, which highlights a struggle between what is good for the well-being of both players versus what is more advantageous to a single player. Mathematically, the dilemma points to tension between cooperation and competition but, experimentally, it doesn’t capture several important factors, according to Kleiman-Weiner.
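The tension in the prisoner's dilemma can be made concrete with a small payoff table. The numbers below (3 for mutual cooperation, 1 for mutual defection, 5 and 0 when one player defects on a cooperator) are commonly used textbook values assumed here for illustration; they were not given in the talk, and `best_response` is a hypothetical helper name.

```python
# Payoffs as (row player, column player); "C" = cooperate, "D" = defect.
# Illustrative textbook values, assumed for this sketch.
PAYOFFS = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def best_response(opponent_action):
    """Return the row player's payoff-maximizing action against a fixed opponent action."""
    return max("CD", key=lambda action: PAYOFFS[(action, opponent_action)][0])


def joint_payoff(row_action, column_action):
    """Total payoff to both players, a rough measure of their shared well-being."""
    return sum(PAYOFFS[(row_action, column_action)])
```

With these values, defecting is each player's best response whatever the other does, yet mutual cooperation leaves both players better off than mutual defection, which is the struggle the model is meant to highlight.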
He explained his team of researchers designed a system to overcome obstacles presented in current models of cooperation.
“What we’ve done… to study these questions both in a way where we can build more models and look at human behavior is to take an approach that’s been well studied in multi-agent systems literature,” he said.
The team developed games with naturalistic environments that people can play intuitively, like video games. The games can represent different social situations.
“If we have these programs, we can sort of reconstruct the original social dilemma and think, ‘Well, I have this choice to cooperate or compete, and that corresponds to certain rewards,’ ” he said. “In reality, it’s a lot more challenging than that.”
Different ways of implementing potential plans yield different payoffs, Kleiman-Weiner explained. The team aims to build algorithms that can play the games.
Difficulties arise when the researchers want to plan at a high level of abstraction, deciding whether to cooperate or compete, but must implement these goals through low-level actions, like moves in a game. They need to be able to determine whether certain low-level actions are consistent with the higher-level goals. They also want to coordinate cooperation across different scenarios, so as to generalize and create coherent plans to tackle certain games.
Overall, the researchers aim to make plans for using generalizations and best-response scenarios for competition and coordination using reinforcement learning.
“The way we’re formalizing this is by thinking about the joint nature that plans as if it can control both of the players,” he said. “We construct this meta-agent or group-agent, and that agent can plan out… and create a final policy over the joint action space.”
Because there is no actual “group-agent,” each individual player then derives its own policy from the joint policy by marginalizing out the other players' actions.
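One way to picture the marginalization step is with a joint policy stored as a table of probabilities over pairs of actions. The sketch below is an illustrative assumption, not the researchers' implementation: the action names, the probabilities, and the function `marginal_policy` are all hypothetical.

```python
from collections import defaultdict


def marginal_policy(joint_policy, player):
    """Sum joint-action probabilities over the other player's actions.

    joint_policy maps (action_of_player_0, action_of_player_1) to a probability;
    player is the index (0 or 1) of the player whose own policy is wanted.
    """
    marginal = defaultdict(float)
    for joint_action, probability in joint_policy.items():
        marginal[joint_action[player]] += probability
    return dict(marginal)


# Hypothetical joint policy planned by the imagined group-agent.
joint = {
    ("left", "right"): 0.5,
    ("right", "left"): 0.25,
    ("left", "left"): 0.25,
}

policy_player_0 = marginal_policy(joint, 0)
policy_player_1 = marginal_policy(joint, 1)
```

Here player 0 ends up choosing "left" with probability 0.75 and "right" with probability 0.25, without needing a real group-agent to dictate the joint action.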
The players will also be able to use the provided plans, given their own plans, to infer the abstract, high-level intention of the other players.
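Inferring another player's high-level intention from observed moves can be illustrated with a simple Bayesian update. This is a generic sketch under stated assumptions, not the method presented in the talk: the two intentions, the moves "yield" and "block," the probabilities, and the function `update_belief` are hypothetical.

```python
def update_belief(prior, likelihoods, observed_action):
    """Apply Bayes' rule: posterior is proportional to prior times likelihood.

    prior maps each intention to its current probability; likelihoods maps each
    intention to the probability of each low-level action under that intention.
    """
    unnormalized = {
        intention: prior[intention] * likelihoods[intention].get(observed_action, 0.0)
        for intention in prior
    }
    total = sum(unnormalized.values())
    if total == 0:
        # The observation is impossible under every intention; keep the prior.
        return dict(prior)
    return {intention: weight / total for intention, weight in unnormalized.items()}


# Assumed numbers: a cooperative player usually yields, a competitive one usually blocks.
prior = {"cooperate": 0.5, "compete": 0.5}
likelihoods = {
    "cooperate": {"yield": 0.8, "block": 0.2},
    "compete": {"yield": 0.2, "block": 0.8},
}

posterior = update_belief(prior, likelihoods, "yield")
```

After seeing the other player yield, the belief that they intend to cooperate rises from the prior of 0.5, and repeating the update over several moves would sharpen it further.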
Kamar came to discuss her work, in a presentation titled “Directions in Hybrid Intelligence: Complementing A.I. Systems with Human Intelligence.” She talked about humans and machines working together, as well as the role of humans in creating better A.I. systems.
She said despite advances in A.I. systems, there are still problems. She referenced an incident in which a Google system tagged two African American teenagers as gorillas in a photograph.
“Things definitely change from the lab where we spend a lot of time building these systems and testing these systems to the real world,” she said.
Kamar then provided the example of an autonomous vehicle, which many consider an A.I. system. She said, rather, this system is a hybrid system as it depends on interaction between the driver and the car, and incorporates feedback from the driver.
She said humans and machines have comparable abilities and gave an example of a chess game. Humans have not improved much in recent years, in terms of succeeding in the game, whereas machines have improved. Humans and machines working together have improved significantly, as a hybrid system.
“The focus of the research I’m going to be talking about is really about trying to bring these ideas to development in A.I. systems,” she said. “The idea here is that we can infiltrate human intelligence into the development of A.I. systems in a way that we can build reliable A.I. systems in the future, and by doing this we can overcome the shortcomings of A.I. systems today and make them better for the real world.”
Kamar believes humans are crucial not only to the training but also to the execution of A.I. systems.
In her presentation, Kamar discussed how to do better data collection for supervised learning, emphasizing labeling techniques as well as how humans can help to detect blind spots in A.I. systems.
Several presenters were also featured in very brief “Spotlights Sessions,” which took place Monday and Tuesday after the regularly scheduled sessions.
Niranjani Prasad, a Ph.D. student studying computer science at Princeton University, also presented her work, titled “A Reinforcement Learning Approach to Weaning of Mechanical Ventilation in Intensive Care Units.”
Prasad explained that mechanical ventilation, used in place of spontaneous breathing, is one of the most common medical interventions in the Intensive Care Unit, with 40 percent of patients being ventilated in any given hour.
She said the timeliness of weaning patients off ventilators can drastically improve health, but is unfortunately poorly understood and unquantified.
“What we’re trying to do here is develop a decision support tool that alerts the clinician when it seems like a patient is stabilized and ready to be taken off the ventilator and recommend interventions accordingly,” she said.
Prasad related that the agent, in this case, is the clinician who observes the patient — the ventilator settings and sedation dosages, for example — and chooses an action accordingly, about which they will receive feedback from the new tool, which is designed to reflect any adverse clinical outcomes.
Monday and Tuesday evenings saw poster presentations, in which presenters set up posters outlining their work on the fourth floor of Rackham as attendees and fellow researchers browsed the studies.
Shabnam Hakimi, a postdoctoral research associate at Duke University, discussed her work, “Neural Correlates of Cognitive Control as a Function of Learned Automaticity.”
The project centers on how people control their behavior and how behavior becomes automatic through habits. Hakimi explained there are instances when individuals become very efficient at completing certain tasks, such as driving a car. However, sometimes the unexpected occurs.
“At the same time, there is a chance you might be called on to do something that interferes with that,” she said. “We drive our cars every day and we don’t think about it too much… but you still need to be very aware of something jumping in front of your car. It happens very infrequently but (confronting it) is still an important thing to be good at.”
Hakimi’s work looks at whether people can get practice in overcoming such intervening circumstances.
“I study the brain basis of how that works,” she said. “We have a task here where people learn a set of associations between cues and responses, and they’re just told that every now and then they will have to change their response. They don’t know how frequently, they just know it’ll happen sometimes.”
The project models how individuals learn such information over time and what the brain is doing as that happens.
Though people are very different, Hakimi has found thus far that certain people become more efficient at confronting those infrequent events, while simultaneously becoming efficient at the common events.