Army Robots Hunt Tanks in Project Convergence
WASHINGTON: A pair of unprepossessing robots, looking more like militarized golf carts than Terminators, trundle across the Yuma Desert, part of the Army’s Project Convergence exercise on future warfare.
Like human troops, the machines take turns covering each other as they advance. One robot finds a safe spot, stops, and launches the tethered mini-drone it carries to look over the next ridgeline while the other bot advances; then they switch off.
Their objective — a group of buildings on the Army’s Yuma Proving Ground, a simulated town for urban combat training. As one robot held back to relay communications to its distant human overseers, the other moved into the town – and spotted “enemy” forces. With human approval, the robot opened fire.
Then the robot’s onboard Aided Target Recognition (ATR) algorithms identified another enemy, a T‑72 tank. But this target was too far away for the robot’s built-in weapons to reach. So the bot uploaded the targeting data to the tactical network and – again, with human approval – called in artillery support.
“That’s a huge step, Sydney,” said Brig. Gen. Richard Ross Coffman, the Project Convergence exercise director. “That computer vision… is nascent, but it is working.”
Algorithmic target recognition and computer vision represent a critical advance over most current military robots, which aren’t truly autonomous but merely remote-controlled: The machine can’t think for itself, it just relays camera feeds back to a human operator, who tells it exactly where to go and what to do.
That approach, called teleoperation, does let you keep the human out of harm’s way, making it good for bomb squads and small-scale scouting. But it’s too slow and labor-intensive to employ on a large scale. If you want to use lots of robots without tying down a lot of people micromanaging them, you need the robots to make some decisions for themselves – although the Army emphasizes that the decision to use lethal force will always be made by a human.
So Coffman, who oversees the Robotic Combat Vehicle and Optionally Manned Fighting Vehicle programs, turned to the Army’s Artificial Intelligence Task Force at Carnegie Mellon University. “Eight months ago,” he told me, “I gave them the challenge: I want you to go out and sense targets with a robot — and you have to move without using LIDAR.”
LIDAR, which uses low-powered laser beams to detect obstacles, is a common sensor on experimental self-driving cars. But, Coffman noted, because it’s actively emitting laser energy, enemies can easily detect it.
So the robots in the Project Convergence experiment, called “Origin,” relied on passive sensors: cameras. That meant their machine vision algorithms had to be good enough to interpret the visual imagery and deduce the relative locations of potential obstacles, without being able to rely on LIDAR or radar to measure distance and direction precisely. That may seem simple enough to humans, whose eyes and brain benefit from a few hundred million years of evolution, but it’s a radical feat for robots, which still struggle to distinguish, say, a shallow puddle from a dangerously deep pit.
“Just with machine vision, they were able to move from Point A to Point B,” Coffman said. But the Army doesn’t just want robots that can find their way around: It wants them to scout for threats and targets – without a human having to constantly stare at the sensor feed.
That’s where Aided Target Recognition comes in. (ATR also stands for Automated Target Recognition, but the Army doesn’t like the implication that the software would replace human judgment, so it consistently uses Aided instead).
Recognizing targets is another big challenge. Sure, artificial intelligence has gotten scarily good at identifying individual faces in photos posted on social media. But the private sector hasn’t invested nearly as much in, say, telling the difference between an American M1 Abrams tank and a Russian-made T‑72, or between an innocent Toyota pickup and the same truck upgunned as a guerrilla “technical” with a heavy machinegun in the back. And the Army needs to be able to tell enemy from friendly from civilian in messy real-world combat zones – and not only from clear overhead surveillance shots, but from the ground, against troops trained to use camouflage and cover to break up easily recognizable silhouettes.
“Training algorithms to identify vehicles by type, it’s a huge undertaking,” Coffman told me. “We’ve collected and labeled over 3.5 million images” so far to use for training machine-learning algorithms, he said – and that labeling requires trained human analysts to look at each picture and tell the computer what it was: “That’s someone sitting there and going, ‘that’s a T‑72; that’s a BMP,’” etcetera ad nauseam, he said.
But each individual robot or drone doesn’t need to carry those millions of images in its own onboard memory: It just needs the “classifier” algorithms that result from running the images through machine-learning systems. Because those algorithms themselves don’t take up a ton of memory, it’s possible to run them on a computer that fits easily on the individual bot.
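To illustrate that point, here is a minimal sketch (in no way the Army’s actual system) of why the trained classifier can be orders of magnitude smaller than the labeled imagery it was trained on. Synthetic feature vectors stand in for the real labeled images, and scikit-learn’s logistic regression stands in for the actual recognition model:

```python
# Illustrative sketch, not the Army's ATR: a classifier distilled from a large
# labeled dataset is far smaller than the data itself, so only the trained
# model needs to ride on the robot's onboard computer.
import pickle
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_images, n_features = 10_000, 256               # stand-in for a labeled image set
X = rng.normal(size=(n_images, n_features))      # synthetic "image features"
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)    # toy two-class labels

clf = LogisticRegression(max_iter=200).fit(X, y)

model_bytes = len(pickle.dumps(clf))             # size of the deployable model
data_bytes = X.nbytes + y.nbytes                 # size of the training data
print(f"training data: {data_bytes:,} bytes; trained model: {model_bytes:,} bytes")
```

The deployable artifact here is thousands of times smaller than the training set, which is the property that lets inference run “on the edge” rather than on a distant server.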
“We’ve proven we can do that with a tethered or untethered UAV. We’ve proven we can do that with a robot. We’ve proven we can do that on a vehicle,” Coffman said. “We can identify the enemy by type and location.”
“That’s all happening on the edge,” he emphasized. “This isn’t having to go back to some mainframe [to] get processed.”
In other words, the individual robot doesn’t have to constantly transmit real-time, high-res video of everything it sees to some distant human analyst or AI master brain. Sending that much data back and forth is too big a strain on low-bandwidth tactical networks, which are often disrupted by terrain, technical glitches, and enemy jamming. Instead, the robot can identify the potential target itself, with its onboard AI, and just transmit the essential bits – things like the type of vehicles spotted, their numbers and location, and what they’re doing.
“You want to reduce the amount of information that you pass on the network to a tweet, as small as possible, so you’re not clogging the pipes,” Coffman told me.
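As a rough sketch of what a “tweet-sized” contact report might look like, consider serializing just the essential fields the article lists (type, count, location, activity). The field names and format here are hypothetical, not an actual Army message standard:

```python
# Illustrative sketch: instead of streaming high-res video, the robot sends a
# compact target report. Field names are hypothetical, not an Army format.
import json
from dataclasses import dataclass, asdict

@dataclass
class TargetReport:
    target_type: str   # classifier output, e.g. "T-72"
    count: int         # how many vehicles were spotted
    lat: float         # estimated target location
    lon: float
    activity: str      # what the targets are doing

report = TargetReport("T-72", 2, 32.8867, -114.3247, "stationary")
message = json.dumps(asdict(report)).encode("utf-8")

print(f"{len(message)} bytes: {message.decode()}")
```

A report like this is around a hundred bytes per contact, versus megabits per second for raw video, which is the difference between fitting on a jammed, low-bandwidth tactical network and clogging it.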
But before the decision is made to open fire, he emphasized, a human being has to look at the sensor feed long enough to confirm the target and give the order to engage.
“There’s always a human that is looking at the sensor image,” Coffman said. “Then the human decides, ‘yes, I want to prosecute that target.’”
“Could that be done automatically, without a human in the loop?” he said. “Yeah, I think it’s technologically feasible to do that. But the United States Army is an ethics-based organization. There will be a human in the loop.”