Modern IoT (Internet of Things) networks connect many devices (like sensors, cameras, and appliances). These devices communicate by sending flows of data. Most flows are short and routine, but some abnormal flows can run much longer than usual. An unusually long-lasting flow might be a sign of a problem, for example a cyber-attack or a malfunctioning device. This project uses machine learning to predict how long a network flow will last. By doing so, we can catch unusual network behaviors early and alert the team before those behaviors escalate into bigger issues. This proactive approach helps keep critical IoT systems (like smart city infrastructure or hospital sensors) secure and reliable.
By predicting flow_duration from basic network telemetry in real-time IoT traffic, we can spot unusual resource use early and surface potential attack patterns before they escalate. This enables proactive capacity planning (autoscaling, QoS tuning) and faster security response, reducing downtime and operating costs while keeping connected devices reliable.
In line with SDG 9 (Industry, Innovation & Infrastructure) and SDG 16 (Peace, Justice & Strong Institutions), this approach strengthens digital infrastructure and improves cyber-resilience for services that increasingly depend on IoT.
Impact: Securing IoT networks helps keep critical infrastructure - such as smart cities, healthcare, and energy systems - safe and reliable. Concretely, this means hospital sensor networks remain stable and smart city street lighting is protected from attack-driven disruptions.
Hypothesis. Network flows last longer when there is more forward-side activity and greater directional imbalance. Concretely, higher values of total forward packets, forward payload, forward header size, and a larger forward-to-backward packet ratio are associated with higher flow_duration.
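As a rough illustration of how this hypothesis could be checked, the sketch below computes a forward-to-backward packet ratio and its Pearson correlation with flow_duration. The column names (`total_fwd_packets`, `total_bwd_packets`), the `+ 1` smoothing in the denominator, and the toy values are assumptions made for the example; they are not taken from the dataset.

```python
from math import sqrt


def fwd_bwd_ratio(fwd_packets: int, bwd_packets: int) -> float:
    """Forward-to-backward packet ratio.

    The +1 in the denominator is an assumed smoothing choice that avoids
    division by zero when a flow has no backward packets.
    """
    return fwd_packets / (bwd_packets + 1)


def pearson(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy flows with made-up values, for illustration only.
flows = [
    {"total_fwd_packets": 2, "total_bwd_packets": 2, "flow_duration": 100.0},
    {"total_fwd_packets": 10, "total_bwd_packets": 3, "flow_duration": 900.0},
    {"total_fwd_packets": 40, "total_bwd_packets": 4, "flow_duration": 5000.0},
    {"total_fwd_packets": 5, "total_bwd_packets": 5, "flow_duration": 300.0},
]

ratios = [fwd_bwd_ratio(f["total_fwd_packets"], f["total_bwd_packets"]) for f in flows]
durations = [f["flow_duration"] for f in flows]
r = pearson(ratios, durations)
print(f"correlation between packet ratio and flow_duration: {r:.2f}")
```

A positive correlation on the real data would be consistent with the hypothesis, although correlation alone does not show that forward-side activity causes longer flows.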
<aside> 💡
Flow duration simply means how long a connection between two devices lasts. For example, when a smart bulb talks to a mobile app, or when a hacker tries to send many requests, the flow duration tells us the time from the start until the end of that communication.
Short or very long flow durations can give hints about normal behaviour or suspicious activity in the network.
</aside>
The predictions from our model will be used by network operators and security teams, who rely on these insights to keep IoT systems stable and safe. With these predictions, they can decide whether to adjust resources in real time (such as scaling capacity or tuning quality of service) or investigate unusual traffic patterns that might signal an attack. Because IoT networks operate continuously, predictions must be delivered in real time, directly within the monitoring systems that oversee devices ranging from smart bulbs to hospital sensors. This predictive approach is more powerful than traditional monitoring because it surfaces anomalies early, before they escalate into costly downtime or security breaches. We measure success not only by technical accuracy (e.g., explaining 83% of the variation in flow duration with low error) but also by practical impact: unusual flows are flagged quickly, critical infrastructure remains stable, and security teams can respond proactively rather than reactively.
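The success criteria above can be made concrete with a small sketch. It assumes that "explaining 83% of the variation" refers to the coefficient of determination (R²) and that "low error" is measured as mean absolute error; the function names and the flagging threshold are hypothetical and chosen only for illustration.

```python
def r_squared(y_true, y_pred) -> float:
    """Share of the variance in y_true that the predictions explain."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def mean_absolute_error(y_true, y_pred) -> float:
    """Average absolute gap between actual and predicted durations."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def flag_long_flows(predicted, threshold):
    """Indices of flows whose predicted duration exceeds the threshold.

    The threshold is a hypothetical operating parameter; in practice it
    would be tuned by the team that operates the network.
    """
    return [i for i, p in enumerate(predicted) if p > threshold]


# Made-up durations for illustration only.
actual = [120.0, 90.0, 4000.0, 150.0]
predicted = [110.0, 100.0, 3600.0, 170.0]

print("R^2:", round(r_squared(actual, predicted), 3))
print("MAE:", mean_absolute_error(actual, predicted))
print("flagged flow indices:", flag_long_flows(predicted, threshold=1000.0))
```

An R² of 0.83 under this reading would mean the model accounts for 83% of the variance in flow_duration on the evaluation data.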
The dataset starts with 84 columns. Our goal is to predict flow_duration so that we can detect unusual or potentially harmful behaviour in network traffic at an early stage. This helps signal possible attacks before they fully develop.
During the data cleaning process, we removed all time-related variables that could cause data leakage. Put simply, some features only become available after the flow has ended. If we kept those, the model would no longer be “predicting”; it would be “cheating” by using information from the future. By removing these variables, we ensure that the model learns only from information that would realistically be available in real time.
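The leakage-removal step can be sketched as follows. The names in `LEAKY_COLUMNS` are hypothetical examples of features that depend on the finished flow; the actual list of removed columns is not given here, so treat this as an illustration of the idea rather than the exact cleaning code.

```python
TARGET = "flow_duration"

# Hypothetical examples of time-related columns that are only known once
# the flow has ended; the real list depends on the dataset.
LEAKY_COLUMNS = frozenset(
    {"flow_iat_mean", "flow_iat_max", "flow_bytes_per_s", "idle_mean"}
)


def split_features_target(row, leaky=LEAKY_COLUMNS, target=TARGET):
    """Return (features, target_value) with leaky columns and the target removed."""
    features = {
        name: value
        for name, value in row.items()
        if name != target and name not in leaky
    }
    return features, row[target]


# One made-up flow record for illustration only.
row = {
    "total_fwd_packets": 12,
    "fwd_header_length": 240,
    "flow_iat_mean": 1500.0,
    "flow_bytes_per_s": 820.5,
    "flow_duration": 18000.0,
}

features, y = split_features_target(row)
print(features)  # only columns that would be available in real time
print(y)
```

With a pandas DataFrame, the same effect can be achieved with `DataFrame.drop(columns=...)`, keeping the list of leaky columns in one place so it is easy to review.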