Smart cyber adversaries are starting to turn machine learning algorithms against the defence. But adversaries could be frustrated by deliberate cyber deception.
Machine learning is one of the biggest buzzwords in cybersecurity in 2017. But a sufficiently smart adversary can exploit what the machine learning algorithm does, and reduce the quality of decision-making.
“The concern about this is that one might find that an adversary is able to control, in a big-data environment, enough of that data that they can feed you in misdirection,” said Dr Deborah Frincke, head of the Research Directorate (RD) of the US National Security Agency/Central Security Service (NSA/CSS).
Adversarial machine learning, as Frincke called it, is “a thing that we’re starting to see emerge, a bit, in the wild”. It’s a path that we might reasonably believe will continue, she said.
As one example, an organization may decide to use machine learning to develop a so-called “sense of self” of its own networks, and build a self-healing capability on top of that. But what if an attacker gets inside the network or perhaps was even inside the network before the machine learning process started?
“Their behavior now becomes part of the norm. So in a sense, then, what I’m doing is that I’m protecting the insider. That’s a problem,” Frincke said.
“What’s also interesting in the data science, is that if you are using a data-driven algorithm, [that algorithm] is what feeds the machine learning technique that you disseminate. Unless you keep that original data, you are not going to know what biases you built into your machine learning approach.
“You would have no way of that needle in the haystack, because you threw away the haystack, and all that’s left are the weightings and the neural networks and so on.”
Machine learning has other limitations too.
In 2016, for example, Monash University professor Tom Drummond pointed out that neural networks, one of the fundamental approaches to machine learning, can be led astray unless they’re told why they’re wrong.
The classic example of this problem dates back to the 1980s. Neil Fraser tells the story in his article Neural Network Follies from 1998.
The Pentagon was trying to teach a neural network to spot possible threats, such as an enemy tank hiding behind a tree. They trained the neural network with a set of photographs of tanks hiding behind trees, and another set of photographs of trees but no tanks.
But when asked to apply this knowledge, the system failed dismally.
“Eventually someone noticed that in the original set of 200 photos, all the images with tanks had been taken on a cloudy day, while all the images without tanks had been taken on a sunny day,” Fraser wrote.
“The military was now the proud owner of a multi-million dollar mainframe computer that could tell you if it was sunny or not.”
Frincke was speaking at the Australian Cyber Security Centre (ACSC) conference in Canberra on Wednesday. While she did point out the limits of machine learning, she also outlined some defensive strategies that the NSA has found to be effective.
Organizations can tip the cybersecurity balance of power more in their favour by learning to deceive or hide from the adversary, for example.
By its very nature, network defense is asymmetric. That imbalance is usually expressed as the defender having to close off every security vulnerability, while the attacker only has to be right once.
“On the face of it there should be something we should be able to do about that. You’d think there’d be some home-court advantage,” Frincke said.
Traditionally, organizations have tried to make their data systems as efficient as possible. It makes the network more manageable. But from an attacker’s point of view, it’s easy to predict what’s going on in any given system at any given time.
Taking a defensive deception approach, however, means building an excess capacity, and then finding ways to leverage that excess capacity to design in a deceptive or a changing approach. That way, an attacker can’t really tell where the data is.
If you process data in the cloud, then one simple example might be to duplicate your data across many more nodes than you’d normally use, and switch between them.
“If you’re trying to do an integrity attack, changing that data out from under me, you don’t know which of, say, those hundred nodes I’m using. Or I might be looking at a subset of those nodes, say three, and you don’t know which ones I’m using. So you could try to change them all at once [but] that’s a lot harder,” Frincke said.
The RD’s research has shown that this approach increases the attacker’s cognitive load and plays on their cognitive biases.
“We can try to lead them into wrong conclusions. In other words, we’re frustrating them. We’re trying to make them work too hard, to gain ground that they don’t need. And that will make it easier for us to find them,” Frincke said.
“It’s a little bit like the old honeypot [or] honeynet writ large, but designed into the system as an integral part of the way that it works, and not an add-on.”
The downside to defensive deception is that it’s harder to manage.
“Now I have to do more work as a system manager, and as a designer, to be sure I know which one of those three of the hundred I should use, otherwise I could end up shooting myself in the foot, especially if I’ve [been] deploying some kind of misleading changes for the adversary,” Frincke said.