Stakeholder report | Machine Learning Regression for proactive attack pattern detection in IoT networks.

Modern IoT (Internet of Things) networks connect many devices (like sensors, cameras, and appliances). These devices communicate by sending flows of data. Most flows are short and routine, but some abnormal flows can run much longer than usual. An unusually long-lasting flow might be a sign of a problem – for example, a cyber-attack or a device malfunctioning. This project uses machine learning to predict how long a network flow will last. By doing so, we can catch unusual network behaviors early and alert the team before they escalate into bigger issues. This proactive approach helps keep critical IoT systems (like smart city infrastructure or hospital sensors) secure and reliable.

By predicting flow_duration from basic network telemetry in real-time IoT traffic, we can spot unusual resource use early and surface potential attack patterns before they escalate. This enables proactive capacity planning (autoscaling, QoS tuning) and faster security response, reducing downtime and operating costs while keeping connected devices reliable.

In line with SDG 9 (Industry, Innovation & Infrastructure) and SDG 16 (Peace, Justice & Strong Institutions), this approach strengthens digital infrastructure and improves cyber-resilience for services that increasingly depend on IoT.

Impact: Securing IoT networks helps keep critical infrastructure - such as smart cities, healthcare, and energy systems - safe and reliable. Concretely, this means hospital sensor networks remain stable and smart city street lighting is protected from attack-driven disruptions.

Hypothesis. Network flows last longer when there is more forward-side activity and greater directional imbalance. Concretely, higher values of total forward packets, forward payload, forward header size, and a larger forward ↔ backward packet ratio are associated with higher flow_duration.

1. Project Goal

Predict Flow Duration: Develop a machine learning model that estimates how long a given IoT network flow will last, based on its early characteristics (such as packet counts and flags).
Detect Anomalies: Use the model’s predictions to automatically flag flows that are predicted to last far longer than normal for their type. Such flows could indicate security threats or system faults.
Improve Response: By identifying odd flows in advance, enable network operators and security teams to respond quickly (e.g. investigating a device or adjusting network resources) before an incident causes downtime or damage.
Real-World Impact: Minimize disruptions in IoT-powered services. For example, catch a misbehaving smart sensor or an attack in progress early, so that smart city lights, medical devices, or other IoT systems continue to run smoothly.
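The anomaly-detection goal above can be sketched as a simple thresholding rule: compare each predicted duration with what is normal for flows of the same type. This is a minimal sketch, not the project's actual alerting logic. The notion of "flow type" (for example, a protocol label), the 99th-percentile cut-off, and all function names are assumptions made for illustration.

```python
from collections import defaultdict
from statistics import quantiles


def typical_upper_bounds(history, pct=99):
    """Per-type upper bound on 'normal' duration.

    `history` is an iterable of (flow_type, observed_duration) pairs.
    Types with fewer than two observations are skipped because a
    percentile cannot be estimated from a single value.
    """
    by_type = defaultdict(list)
    for flow_type, duration in history:
        by_type[flow_type].append(duration)

    bounds = {}
    for flow_type, durations in by_type.items():
        if len(durations) < 2:
            continue
        # quantiles(n=100) returns 99 cut points; index pct-1 is the pct-th percentile
        bounds[flow_type] = quantiles(durations, n=100, method="inclusive")[pct - 1]
    return bounds


def flag_long_flows(predictions, bounds):
    """Return ids of flows predicted to outlast the bound for their type.

    `predictions` is an iterable of (flow_id, flow_type, predicted_duration).
    Flows of a type with no known bound are not flagged.
    """
    flagged = []
    for flow_id, flow_type, predicted in predictions:
        bound = bounds.get(flow_type)
        if bound is not None and predicted > bound:
            flagged.append(flow_id)
    return flagged
```

In practice the cut-off would be tuned with the security team, since a lower percentile raises more alerts and a higher one risks missing slow-building attacks.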

<aside> 💡

Flow duration simply means how long a connection between two devices lasts. For example, when a smart bulb talks to a mobile app, or when a hacker tries to send many requests, the flow duration tells us the time from the start until the end of that communication.

Very short or very long flow durations can hint at whether network activity is normal or suspicious.

</aside>

2. Who, what, when, where, why and how?

The predictions from our model will be used by network operators and security teams, who rely on these insights to keep IoT systems stable and safe. With these predictions, they can decide whether to adjust resources in real time (such as scaling capacity or tuning quality of service) or investigate unusual traffic patterns that might signal an attack. Because IoT networks operate continuously, predictions must be delivered in real time, directly within the monitoring systems that oversee devices ranging from smart bulbs to hospital sensors. This predictive approach is more powerful than traditional monitoring because it surfaces anomalies early, before they escalate into costly downtime or security breaches. We measure success not only by technical accuracy (e.g., explaining 83% of the variation in flow duration with low prediction error) but also by practical impact: unusual flows are flagged quickly, critical infrastructure remains stable, and security teams can respond proactively rather than reactively.
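For readers who want to see what the accuracy figures mean, the sketch below computes two common regression metrics. It assumes that "explaining 83% of variation" refers to the coefficient of determination (R²) and that "low error" refers to something like mean absolute error. The report does not name the exact metrics, so both are assumptions.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` explained by `predicted`.

    1.0 means perfect predictions; 0.0 means no better than always
    predicting the average duration.
    """
    mean_actual = sum(actual) / len(actual)
    ss_residual = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_total = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_residual / ss_total


def mean_absolute_error(actual, predicted):
    """Average size of the prediction error, in the same unit as the durations."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

Under this reading, an R² of 0.83 would mean the model accounts for 83% of the spread in observed flow durations.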

<aside>

5W+H Framework

Who: Network administrators and IoT security teams.
What: They use the model's predictions to decide whether a connection is suspicious.
When: Predictions are needed in near real-time, during live traffic.
Where: In IoT networks with many connected devices (smart homes, industrial systems).
Why: Manual monitoring is impossible at large scale; the model enables proactive and scalable detection.
How: Success is measured by timely alerts and fewer missed attack patterns (false negatives).

</aside>

3. Cleaning the data

The dataset starts with 84 columns. Our goal is to predict ‘flow_duration’ so that we can detect unusual or potentially harmful behaviour in network traffic at an early stage and signal possible attacks before they fully develop.

During the data cleaning process, we removed all time-related variables that could cause data leakage. In simple words: some features only become available after the flow has ended. If we kept those, the model would no longer be “predicting”; it would be “cheating” by using information from the future. By removing these variables, we ensure that the model learns only from information that would realistically be available in real time.
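The leakage-removal step can be sketched as follows. The column names listed here are hypothetical placeholders; the report does not state which time-related columns were actually dropped, so treat the set as an example of the pattern rather than the real list.

```python
# Hypothetical examples of columns that are only known once a flow has ended.
LEAKY_COLUMNS = {"flow_iat_mean", "flow_iat_max", "idle_mean", "active_mean"}
TARGET = "flow_duration"


def split_features_target(rows, leaky=LEAKY_COLUMNS, target=TARGET):
    """Separate the target from the features and drop leaky columns.

    `rows` is a list of dicts, one per flow. Returns (features, targets),
    where features contain neither the target nor any leaky column.
    """
    features, targets = [], []
    for row in rows:
        targets.append(row[target])
        features.append(
            {name: value for name, value in row.items()
             if name != target and name not in leaky}
        )
    return features, targets
```

Keeping the list of excluded columns in one named place makes it easier to review with domain experts and to extend if further leaky features are found.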

Fig. 1: Scatter plot of actual vs. predicted flow durations (most points near the diagonal line)

4. Feature engineering