Having AI systems try to outwit one another could help judge their intentions
Take, for instance, an AI system designed to defend against human or AI hackers. To prevent the system from doing anything harmful or unethical, it may be necessary to challenge it to explain the logic for a particular action. That logic might be too complex for a person to comprehend, so the researchers suggest having another AI debate the wisdom of the action with the first system, using natural language, while the person observes.
Further details appear in a research paper.