Intelligence is notoriously hard to define, but when most people (including computer scientists) think about it, they construe it on the model of human intelligence: an information-processing capacity that allows an autonomous agent to act upon the world.
But Michael I. Jordan, the Pehong Chen Distinguished Professor in both the computer science and statistics departments at the University of California, Berkeley, and a Distinguished Amazon Scholar, thinks that conception of intelligence is too narrow.
“Swarms of ants are intelligent, in the sense that they can build ant hills and share food, even though each individual ant is not thinking about hills or sharing,” Jordan says. “Economists have taken this perspective further, with their focus on the tasks accomplished by markets. Accomplishing those tasks is by some definition a reflection of intelligence. A market that brings food into, say, New York every day is an intelligent entity. It's akin to a brain, and it’s important to remember that a brain is a loosely coupled collection of neurons that are each performing relatively simple functions. Analogously, a bunch of loosely coupled decisions made by producers, suppliers, and consumers constitute a market that is a form of intelligence. A grand challenge is to marry this kind of intelligence with the form of intelligence that arises from learning from data.”
Jordan argues that distributed, social intelligence is better suited to meeting human needs than the type of autonomous general intelligence we associate with the Terminator movies or Marvel’s Ultron. By the same token, he says, AI’s goals should be formulated at the level of the collective, not the level of the individual agent.
“A good engineer is supposed to think about the overall goal of the system you’re building,” Jordan says. “If your overall goal is diffuse — create intelligence, and somehow it will solve problems — that's not good enough.
“What machine learning and network data do is bring people together in new ways to share data, to share services with each other, and to create new kinds of markets, new kinds of social collectives. Building systems like that is a perfectly reasonable engineering goal. Real-world examples are easy to find in domains such as transportation, commerce, health care. Those are not best analyzed as some super-intelligence coming in to help you solve problems. Rather, they're best analyzed as, Hey, we're designing a new system that has new kinds of data flows that were never present before and there’s a need to aggregate and integrate those flows in various ways, with the overall goal of serving individuals according to their utilities.”
New signals
At this year’s International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Jordan will elaborate on these ideas in a plenary talk titled “An alternative view on AI: Collaborative learning, incentives, and social welfare.” ICASSP might seem like an odd venue for so expansive a talk, but Jordan argues, again, that it seems odd only if you rely on an overly restricted definition of signal processing.
“You can make signal processing very narrow, and then it's, how do you do compression, how do you get high-fidelity recordings, and so on,” he says. “But those are all the engineering challenges of the past. In emerging domains, the notion of what constitutes a signal is broader. Signals are often coming from humans, and they often have semantic content. Moreover, when people interact with an economic relationship in mind, they signal to each other in various ways: What am I willing to pay for this? And what is someone else willing to pay? Markets are full of signals. Machine learning can create new vocabularies for signaling.
“So part of the story here is going to be to say, hey, signal-processing folks, it's not just about the data and the algorithms and the statistics. It's about a broader conception of signals. Signal processing isn’t just about the processing and streaming of bits but about what these bits are being used for and what market forces they can set in motion. I definitely would hope to convince signal-processing people to think ambitiously about what the scope of the field can be.”
Statistical contract theory
One of the tools that Jordan and his Berkeley research group are using to make markets more intelligent is what they call statistical contract theory. Classical contract theory investigates markets with information asymmetries: for instance, a seller doesn’t know how potential buyers value a particular good, but the buyers themselves do.
The goal is to devise a menu of contracts that balances out the asymmetries. An example is tiered-class seating on airplanes: some customers will contract to pay higher fares for more room and better food; some customers will contract to forgo those advantages in exchange for lower fares. The seller doesn’t have to know in advance which population is which; the populations are self-selecting.
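The self-selection in the airline example can be made concrete with a toy model. The sketch below is illustrative only: the fares, quality scores, the linear utility (valuation times quality, minus fare), and the names `Contract`, `MENU`, and `choose` are all assumptions, not anything taken from Jordan's work. It shows how a single posted menu sorts customers by a valuation the seller never observes.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Contract:
    name: str
    quality: float  # hypothetical comfort score (room, food, and so on)
    fare: float     # hypothetical price


# The same menu is shown to every customer (all numbers are made up).
MENU = (
    Contract("economy", quality=1.0, fare=100.0),
    Contract("business", quality=3.0, fare=500.0),
)


def utility(valuation: float, contract: Contract) -> float:
    """Assumed utility: how much the customer values quality, minus the fare."""
    return valuation * contract.quality - contract.fare


def choose(valuation: float, menu: Sequence[Contract] = MENU) -> Optional[Contract]:
    """Return the contract a customer with this private valuation would pick.

    Returns None when every option has negative utility (the customer opts out).
    """
    best = max(menu, key=lambda c: utility(valuation, c))
    return best if utility(valuation, best) >= 0 else None


if __name__ == "__main__":
    for v in (50.0, 150.0, 300.0):
        picked = choose(v)
        print(v, picked.name if picked else "no purchase")
```

The seller only ever sees the choice, not the valuation, yet the choice itself reveals which group a customer belongs to. Designing the menu so that each type prefers the option intended for it is the part that contract theory studies.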
In statistical contract theory, Jordan explains, the contracts have statistical analyses embedded within them. The example he likes to use is the drug approval process.
“The job of the regulatory agency is to decide which drugs go to market,” Jordan says. “And it's partially a statistical problem: You have a drug candidate, and it may or may not be effective on humans. You don't know a priori. So you do an A/B test. You bring in people, and you either give them the treatment, or you give them a control, and you see if there has been an improvement.
“The problem is that there are more players in this game. The drug candidates are not coming just from nature or from the agency itself. There are these third-party agents, which are the pharmaceutical companies, that are generating drug candidates. They can generate tens of thousands of them, which would be far too expensive to test.
“The agency has no idea whether a candidate is good or bad before they run their clinical trial. But the pharmaceutical company knows a little more. They know how they develop the candidates, and maybe they did some internal testing. So there you have your asymmetry. The agency can’t just ask the pharmaceutical company, Hey, is that candidate good or not? Because the pharmaceutical company is just hoping that it passes the screening and gets onto the market and they make some money.
“The solution is something we call statistical contract theory, and hopefully, it will begin to emerge as a new field. The mathematical ingredients are again menus of options, including license fees, durations of licenses, sizes of the trials, and so on. And every drug company gets to look at that same menu for every possible drug. They make a selection, and then nature reveals an outcome via a clinical trial.
“In the selection process, the drug company is revealing something. The drug company says, hey, on this candidate drug, I know it's really good, so I'm going to take ‘business class’. And now you kind of revealed something to the agency. But the agency doesn't use that information directly; they set up a contract a priori, and you made your selection. We have a new mathematical theory that exactly addresses that kind of design problem and, hopefully, a range of other problems.”
Prediction-powered inference
Another tool that Jordan’s group has been developing is called prediction-powered inference.
“How do I use neural nets not just to make good predictions but to make good confidence intervals?” Jordan says. “The problem is that even if these predictions are very accurate, they still make big errors in some instances, and those can conspire to yield biased confidence intervals. We have this new technique called prediction-powered inference that addresses this problem.
“Classical bias correction would be just that I estimate the bias, and I correct the original estimate for the bias to get a more unbiased estimator. What we're doing is different. We're estimating not the bias but a confidence interval on all the possible biases. And then we're using that confidence interval to do all possible adjustments of the original value to get a confidence interval on the true parameter. So we don't just get a better predictive estimate; we get a whole confidence interval that has a high probability of covering the truth. It is able to use all of these biased predictions from the neural net and nonetheless provide an interval that has a guarantee of covering the truth. It's kind of almost magical that it can be done. But it can.”
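For the simplest case, estimating a population mean, the idea can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the group's implementation: it assumes a small labeled sample with both true outcomes and model predictions, a much larger unlabeled sample with predictions only, and a normal approximation for the interval. The function name `ppi_mean_interval` and its parameters are hypothetical.

```python
import random
from statistics import NormalDist, fmean, variance


def ppi_mean_interval(y_labeled, preds_labeled, preds_unlabeled, alpha=0.1):
    """Sketch of a prediction-powered confidence interval for a mean.

    y_labeled       : true outcomes on the small labeled sample
    preds_labeled   : model predictions on that same labeled sample
    preds_unlabeled : model predictions on the large unlabeled sample
    alpha           : target miscoverage (0.1 gives a nominal 90% interval)
    """
    n = len(y_labeled)
    big_n = len(preds_unlabeled)

    # Prediction errors measured where the truth is known.
    errors = [p - y for p, y in zip(preds_labeled, y_labeled)]

    # Point estimate: mean prediction, corrected by the mean measured error.
    point = fmean(preds_unlabeled) - fmean(errors)

    # Uncertainty comes from both the predictions and the error estimate.
    std_err = (variance(preds_unlabeled) / big_n + variance(errors) / n) ** 0.5

    z = NormalDist().inv_cdf(1 - alpha / 2)
    return point - z * std_err, point + z * std_err


if __name__ == "__main__":
    # Simulated data: the "model" systematically over-predicts by about 1.
    rng = random.Random(0)
    truth = [rng.gauss(5.0, 2.0) for _ in range(10_200)]
    preds = [t + 1.0 + rng.gauss(0.0, 0.5) for t in truth]
    lo, hi = ppi_mean_interval(truth[:200], preds[:200], preds[200:])
    print(f"naive mean of predictions: {fmean(preds[200:]):.3f}")
    print(f"prediction-powered interval: ({lo:.3f}, {hi:.3f})")
```

The interval's width reflects uncertainty in the measured prediction error as well as in the predictions themselves, which is why it can remain valid even when the predictions are biased. The more accurate the model, the smaller the error variance and the tighter the interval.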