Impact Detection Engine (IDE)

We started with a Network Detection and Response (NDR) product that uses NetFlow analytics to assess the reputation of the IP addresses in all communication flows. Our approach leverages both public, community-based Threat Intelligence (TI) sources and commercial ones to raise alarms when a company asset communicates with an Internet IP address that has a poor reputation.
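A minimal Python sketch of this reputation check follows. It assumes that flow records have already been parsed into a simple `FlowRecord` type, that each TI feed arrives as a list of IP strings, and that company assets are identified by a configured list of internal networks. `FlowRecord`, `build_reputation_set`, `is_company_asset` and `raise_alarms` are hypothetical names for illustration, not part of the product.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class FlowRecord:
    src_ip: str
    dst_ip: str
    dst_port: int


def build_reputation_set(*feeds):
    """Merge several TI feeds (iterables of IP strings) into one normalized lookup set."""
    bad = set()
    for feed in feeds:
        bad.update(str(ipaddress.ip_address(ip)) for ip in feed)
    return bad


def is_company_asset(ip, company_nets):
    # Assumption: assets are defined by the customer's configured internal networks.
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in company_nets)


def raise_alarms(flows, bad_ips, company_nets):
    """Return (asset, remote_ip, dst_port) for every flow between an asset and a bad IP."""
    alarms = []
    for f in flows:
        # The asset may be either the initiator or the responder of the flow.
        for local, remote in ((f.src_ip, f.dst_ip), (f.dst_ip, f.src_ip)):
            if (is_company_asset(local, company_nets)
                    and not is_company_asset(remote, company_nets)
                    and remote in bad_ips):
                alarms.append((local, remote, f.dst_port))
                break
    return alarms
```

With `company_nets = [ipaddress.ip_network("10.0.0.0/8")]`, both outbound connections to a listed IP and inbound connections from one produce an alarm; everything else is ignored.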

However, our customers reported receiving an overwhelming number of alerts every day, sometimes more than 100K. To address this, we developed the Impact Detection Engine (IDE), our secret sauce for filtering out noise and low-severity alarms so that our clients can focus on the most severe ones.

Let’s take a look at some examples.

SSH brute force attacks

Such events are very common and are often just noise generated by various types of Internet scanners. Within a single TCP session to an SSH service, there are two possible scenarios. In the first scenario, the client attempts to log in with an incorrect password, and each attempt is followed by a period of silence, typically 60 or 120 seconds, caused by the SSH server's password lockout timeout. In the second scenario, SSH authentication succeeds and traffic flows continuously.

As a result, it is relatively easy to differentiate between the two scenarios by examining how the bytes and packets sent and received are distributed over time, paying close attention to the spacing between them. When investigating a security incident, our primary focus is on situations where a potentially malicious actor has successfully logged in; Internet scans can be disregarded.
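The interspacing check can be sketched in a few lines of Python. This is a simplified illustration, not the IDE's classifier: it looks only at the gaps between the moments traffic was observed in one session and ignores byte and packet volumes. The tolerance, the "continuous" gap threshold and the minimum number of lockout-like silences are hypothetical values chosen for the example.

```python
LOCKOUT_PERIODS = (60.0, 120.0)  # typical silences after a failed login, per the text
TOLERANCE = 10.0                 # hypothetical: allowed deviation from a lockout period (s)
CONTINUOUS_GAP = 15.0            # hypothetical: largest gap still treated as continuous (s)
MIN_LOCKOUTS = 2                 # hypothetical: lockout-like silences needed for scenario 1


def classify_ssh_session(timestamps):
    """Classify one SSH session from the times (in seconds) at which traffic was seen.

    Returns "scenario1" (repeated failed logins), "scenario2" (successful login,
    continuous traffic) or "unknown".
    """
    ts = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    if not gaps:
        return "unknown"
    # Silences whose length is close to one of the typical lockout periods.
    lockout_gaps = [
        g for g in gaps if any(abs(g - period) <= TOLERANCE for period in LOCKOUT_PERIODS)
    ]
    if len(lockout_gaps) >= MIN_LOCKOUTS:
        return "scenario1"
    if max(gaps) <= CONTINUOUS_GAP:
        return "scenario2"
    return "unknown"
```

For example, bursts at 0-2 s, 62-64 s and 124-126 s contain two 60-second silences and are labelled `scenario1`, while traffic seen every 5 seconds for five minutes is labelled `scenario2`.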

We have developed an automated solution that enables us to differentiate between both cases, significantly reducing the noise level (by over 95%) and enabling our investigators to concentrate on critical alarms.

We also look for changes over time: if a specific public IP address produced a pattern similar to the first scenario for some period and then switched to the second scenario, this means that the DoS-style password brute-force attack was successful.
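A sketch of this transition check, assuming each SSH session has already been labelled `"scenario1"` (failed logins) or `"scenario2"` (successful login) and emitted as a `(timestamp, src_ip, label)` event. The function name and the `min_attempts` threshold are hypothetical.

```python
from collections import Counter


def find_successful_bruteforce(events, min_attempts=2):
    """Find source IPs whose sessions switched from the first to the second scenario.

    events: iterable of (timestamp, src_ip, label) tuples, where label is
    "scenario1" or "scenario2".
    Returns {src_ip: timestamp of the first scenario-2 session after the attempts}.
    """
    attempts = Counter()  # failed-login sessions seen so far, per source IP
    compromised = {}
    for ts, ip, label in sorted(events):  # process events in chronological order
        if label == "scenario1":
            attempts[ip] += 1
        elif label == "scenario2" and attempts[ip] >= min_attempts and ip not in compromised:
            compromised[ip] = ts
    return compromised
```

Requiring several failed-login sessions before the switch keeps a single mistyped password followed by a normal login from being reported as a compromise.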

The vast majority of such attacks originate from various types of VPN or Tor proxy nodes, so it is very difficult to determine who is actually behind an attack. Still, we often recommend blocking (blacklisting) those IP addresses.

Web application attacks

Web application attacks are similar to the SSH case, although the difference between the patterns is less pronounced. Again, the key is to filter out IP addresses with a well-known bad reputation that are only scanning the Internet for vulnerable services. We keep track of these addresses and constantly update the IDE to recognize which communication patterns are important and which are not.
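One possible way to separate Internet-wide scanners from targeted activity is sketched below. This is an illustrative heuristic, not the IDE's logic: it assumes that a scanner touches many of our web servers but only ever receives small responses (error pages, banners), while anything else involving a bad-reputation IP is kept for investigation. `WebFlow`, `SCAN_FANOUT` and `SMALL_RESPONSE` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class WebFlow:
    remote_ip: str       # Internet IP that contacted one of our web servers
    local_ip: str        # company web server
    response_bytes: int  # bytes returned by the server in this flow


SCAN_FANOUT = 20        # hypothetical: distinct servers contacted before we call it a scan
SMALL_RESPONSE = 2_000  # hypothetical: upper bound for an error page or banner (bytes)


def split_scanners(flows, bad_ips):
    """Split bad-reputation remote IPs into (scanners, suspicious).

    Scanners touch many servers and only ever receive small responses; every
    other bad-reputation IP is kept for investigation.
    """
    targets = defaultdict(set)
    largest_response = defaultdict(int)
    for f in flows:
        if f.remote_ip not in bad_ips:
            continue
        targets[f.remote_ip].add(f.local_ip)
        largest_response[f.remote_ip] = max(largest_response[f.remote_ip], f.response_bytes)

    scanners, suspicious = set(), set()
    for ip, servers in targets.items():
        if len(servers) >= SCAN_FANOUT and largest_response[ip] < SMALL_RESPONSE:
            scanners.add(ip)
        else:
            suspicious.add(ip)
    return scanners, suspicious
```

The design choice is to err toward keeping an alarm: an IP is dismissed as a scanner only when it is both broad and shallow.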

Using ML methods to detect malicious vs. non-malicious network patterns

Many cybersecurity products are marketed with a strong emphasis on Machine Learning (ML) despite delivering poor results. We prioritize outcomes over AI/ML marketing and train our ML classifiers on both non-malicious and malicious traffic patterns. We use separate models for different types of traffic, because a single model trained to classify all network patterns did not yield satisfactory results. In addition to ML, we employ various statistical methods, including Bayesian methods and anomaly detection, to identify uncommon issues.

As a result, the IDE comprises multiple engines: some target a specific use case and have a high accuracy rate, while others have more generic capabilities and a lower accuracy rate. We continuously improve the engine using heuristics suggested by our investigators, including lessons learned from false positives. Automated feedback from our customers is therefore critical to the ongoing enhancement of the IDE.
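The multi-engine idea can be illustrated with a small Python sketch: a specific engine is selected per traffic type, generic engines always run, and their scores are combined with weights that reflect each engine's accuracy. This is an assumption-laden illustration, not the IDE's architecture. The weighted average, the `Engine` type and the z-score anomaly check on byte counts are stand-ins chosen for the example, and the Bayesian component is not shown.

```python
from dataclasses import dataclass
from statistics import mean, pstdev
from typing import Callable, Dict, List


@dataclass
class Engine:
    name: str
    score: Callable[[dict], float]  # maliciousness estimate in [0, 1]
    weight: float                   # hypothetical: higher for more accurate engines


def zscore_anomaly(history: List[float]) -> Callable[[dict], float]:
    """Generic anomaly engine: distance of a flow's byte count from the historical mean."""
    mu = mean(history)
    sigma = pstdev(history) or 1.0  # avoid division by zero for a constant history

    def score(features: dict) -> float:
        z = abs(features["bytes"] - mu) / sigma
        return min(z / 4.0, 1.0)  # hypothetical scaling: |z| >= 4 counts as fully anomalous

    return score


def combined_score(features: dict, specific: Dict[str, Engine], generic: List[Engine]) -> float:
    """Weighted average of the generic engines plus the engine for this traffic type, if any."""
    engines = list(generic)
    if features["service"] in specific:
        engines.append(specific[features["service"]])
    total = sum(e.weight for e in engines)
    return sum(e.weight * e.score(features) for e in engines) / total
```

In practice each `score` callable would wrap a trained classifier; here an SSH-specific engine could be as simple as `Engine("ssh-model", model_fn, weight=3.0)`, where `model_fn` is any function returning a probability.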

NetFlow and precision

Precise NetFlow telemetry is critical for the IDE. Its accuracy depends on the NetFlow active flow timeout configuration, which is typically set to 60 seconds. This means that the exporter reports an updated record for every long-lived active flow every 60 seconds. We therefore compute statistical 60-second profiles for various activities. Because most customers use the same active flow timeout, we can reuse these profiles across different customers/tenants.
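A rough sketch of the 60-second profile idea follows. It assumes flow records arrive as `(flow_key, export_time_s, bytes, packets)` tuples and are bucketed by export time. The mean/standard-deviation profile and the `k = 3` threshold are illustrative choices, not the IDE's actual statistics, and all function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean, pstdev

ACTIVE_TIMEOUT = 60  # seconds; the typical NetFlow active flow timeout


def to_windows(records):
    """Sum bytes and packets per flow per 60-second window.

    records: iterable of (flow_key, export_time_s, bytes, packets) tuples.
    """
    windows = defaultdict(lambda: [0, 0])
    for key, export_time, nbytes, npackets in records:
        window = windows[(key, int(export_time // ACTIVE_TIMEOUT))]
        window[0] += nbytes
        window[1] += npackets
    return {k: tuple(v) for k, v in windows.items()}


def build_profile(records):
    """Mean and population std dev of bytes and packets per window for one activity."""
    windows = list(to_windows(records).values())
    byte_counts = [w[0] for w in windows]
    packet_counts = [w[1] for w in windows]
    return {
        "bytes": (mean(byte_counts), pstdev(byte_counts)),
        "packets": (mean(packet_counts), pstdev(packet_counts)),
    }


def matches_profile(window, profile, k=3.0):
    """True if a (bytes, packets) window lies within k std devs of the profile."""
    for value, metric in zip(window, ("bytes", "packets")):
        mu, sigma = profile[metric]
        if abs(value - mu) > k * sigma:
            return False
    return True
```

Because the window width equals the active flow timeout, a profile built from one tenant's records stays comparable with another tenant's windows as long as both use the same timeout.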