User:Graham.Fountain/Congestion prevention

Congestion Prevention

Congestion prevention is action taken proactively to stop congestion from occurring, e.g. in a packet-switched network. It may, for example, be necessary in networks supporting firm real-time computing systems, such as those embedded in avionic or industrial control systems. Its purpose is to make it possible to prove that traffic transported over the network will always meet its real-time requirements for reliable delivery within a deadline. This proof of reliable, timely delivery may be necessary where the operation of the system, and thus the data, is safety-critical or otherwise safety-related.

Requirements

Transport layer protocols, such as the Transmission Control Protocol (TCP), are reliable only in the sense that they either deliver the data or notify its source of the failure to do so. A notification of delivery failure does not meet the real-time requirements for reliable delivery, so this form of reliability is not sufficient for critical data, where it is necessary to prove that the traffic will be delivered, to some specific probability.

Delivery of data is, in part at least, dependent on the actions of the other equipment connected to the network. This equipment may, e.g. due to a fault or actions of malware, transmit more traffic than the network, or some part of it, has the capacity to transport, i.e. cause congestion. This will cause losses or excessive delays to the critical data and probably cause it to fail its requirements for reliable, timely delivery. Hence, some means is necessary to ensure that other equipment cannot transmit more than is expected of it or that, should it do so, this excess cannot adversely affect the reliability or timeliness of the critical traffic.

While self-limitation of the transmissions by the other equipment can ensure that critical transport will be reliable and timely, this only works as long as this equipment co-operates. Hence, in conditions of faults or where there may be malware in this other equipment, the means of ensuring that it cannot adversely affect the reliability or timeliness of the critical traffic has to be implemented in the network.

To ensure that faults and failures in the network cannot themselves cause loss or excess delay, it is also necessary to provide multiple paths between the source and destination of critical traffic. These paths must be fully independent, i.e. use separate network interfaces, physical media, and network switches. Hence, multiple redundant networks are required. Moreover, since reliable, timely delivery has to be maintained, the sources must transmit multiple copies of the critical data, and the destinations must consolidate these copies on arrival, e.g. accept the first and discard any subsequent copies.
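The "accept the first and discard any subsequent copies" rule can be illustrated with a minimal sketch. It assumes that each frame carries a stream identifier and a sequence number, and that remembering a bounded window of recent sequence numbers is enough to recognise duplicates; the class name `FirstCopyWins` and the `window` parameter are hypothetical and are not taken from any of the protocols discussed here.

```python
from collections import OrderedDict


class FirstCopyWins:
    """Accept the first copy of each (stream, sequence number); drop later copies."""

    def __init__(self, window=1024):
        # Assumed bound on how many recent frames are remembered.
        self.window = window
        self._seen = OrderedDict()  # (stream_id, seq) keys, oldest first

    def accept(self, stream_id, seq):
        """Return True if this is the first copy seen, False for a duplicate."""
        key = (stream_id, seq)
        if key in self._seen:
            return False  # a copy already arrived over another network
        self._seen[key] = True
        if len(self._seen) > self.window:
            self._seen.popitem(last=False)  # forget the oldest entry
        return True


rx = FirstCopyWins(window=4)
first = rx.accept("stream-1", 0)   # copy arriving over network A
second = rx.accept("stream-1", 0)  # same frame arriving over network B
```

Here `first` is accepted and `second` is discarded, regardless of which network delivered its copy earlier.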

Implementations

The Asynchronous Transfer Mode (ATM), Avionics Full-Duplex Switched Ethernet (AFDX), and TTEthernet protocols are all capable of actively limiting traffic levels within the network, as well as at the traffic sources. These limits can then be used to show that there can be no congestion in the network, and thus to ensure that traffic is transported reliably (tolerant of faults in the supported system) and in a timely manner (within defined deadlines). Since the limits are applied within the network, transport between two end systems is not significantly affected if one or more of the other connected end systems exceed their expected transmission levels, whether due to a fault or to malicious action. As a result, if the limits are set and applied correctly, it is possible to prove that the network cannot exhibit emergent behaviour, e.g. TCP global synchronization or congestion collapse, in any unexpected circumstance, such as a fault or failure condition.

ATM and AFDX limit traffic by bandwidth and jitter. In ATM, customer-premises equipment generally uses traffic shaping, while Usage Parameter Control (UPC) and Network Parameter Control (NPC) within the network employ Generic Cell Rate Algorithm (GCRA) traffic policing to limit the traffic on a specific VCC or VPC to its traffic contract. In AFDX, control of the Bandwidth Allocation Gap (BAG) is employed at the sources on a per-VLink basis, as a form of traffic shaping. Maximum jitter is calculated from the parameters of the VLinks sourced by a given end system. Token bucket traffic policing is applied by the network switches, again to limit the traffic to specified levels. Traffic shaping in the end systems then ensures that this traffic will not exceed the limits imposed within the network and so will not be discarded by it.
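As an illustration of the policing described above, the following is a minimal sketch of the GCRA in its virtual scheduling form. It assumes a single connection, arrival times supplied by the caller in arbitrary but consistent units, and the usual two parameters: an increment (the nominal spacing between cells) and a limit (the tolerated earliness). The class and method names are hypothetical, and what is done with a non-conforming cell (discarding or tagging) is left out.

```python
class GcraPolicer:
    """Generic Cell Rate Algorithm, virtual scheduling form, for one connection."""

    def __init__(self, increment, limit):
        self.increment = increment  # nominal inter-arrival time of cells
        self.limit = limit          # how early a cell may arrive and still conform
        self.tat = None             # theoretical arrival time of the next cell

    def conforms(self, arrival):
        """Return True if a cell arriving at time `arrival` meets the contract."""
        if self.tat is None:
            self.tat = arrival      # the first cell defines the time origin
        if arrival < self.tat - self.limit:
            return False            # too early: non-conforming, state unchanged
        self.tat = max(arrival, self.tat) + self.increment
        return True


policer = GcraPolicer(increment=10, limit=2)
results = [policer.conforms(t) for t in (0, 8, 15, 18, 40)]
```

With these example values the cell at time 15 is the only non-conforming one: it arrives more than the limit ahead of its theoretical arrival time of 20.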

Various methods have been proposed for using the bandwidth and jitter limits applied to the traffic to prove that the network cannot become congested, i.e. that the shared resources in the network, e.g. the buffers or queues used in switches and routers to allow multiplexing at their outputs, will not be overloaded. Such a proof may, for example, be based on the set of VLinks or VCC/VPCs routed through a network resource and their individual bandwidth limits. End-to-end delays through the network can also be bounded, based on the delays caused by the shared resources, e.g. due to multiplexing the bounded connections; these delays may likewise be predicted from the set of VLinks or VCC/VPCs and their individual bandwidth limits. There will still be losses due to, e.g., the bit error rates (BERs) of the components of the physical layer. However, these losses will generally be low and stochastic, and may be tolerable or compensated for by other means.
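The simplest check of this kind sums the individual bandwidth limits of the connections routed through one output port and compares the total with the link rate. The sketch below does this for AFDX-style connections, assuming that each one is limited to one maximum-size frame per BAG; the `VLink` record, its field names, and the example figures are hypothetical. Passing this check is a necessary condition only: it does not by itself bound queue depth or delay, for which the jitter limits also have to be taken into account.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VLink:
    name: str
    bag_s: float          # bandwidth allocation gap, in seconds
    max_frame_bytes: int  # largest frame the connection may send

    @property
    def peak_rate_bps(self):
        # At most one maximum-size frame per BAG.
        return self.max_frame_bytes * 8 / self.bag_s


def port_utilization(vlinks, link_rate_bps):
    """Fraction of the output link used if every VLink sends at its limit."""
    return sum(v.peak_rate_bps for v in vlinks) / link_rate_bps


def port_is_overbooked(vlinks, link_rate_bps):
    """True if the summed limits exceed what the output link can carry."""
    return port_utilization(vlinks, link_rate_bps) > 1.0


vlinks = [VLink(f"vl{i}", bag_s=0.002, max_frame_bytes=1518) for i in range(3)]
utilization = port_utilization(vlinks, link_rate_bps=100e6)
overbooked = port_is_overbooked(vlinks, link_rate_bps=100e6)
```

In this example each connection is limited to about 6.07 Mbit/s, so three of them use roughly 18% of a 100 Mbit/s link and the port is not overbooked.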

TTEthernet operates in the time domain for time-triggered messages (its rate-constrained messages are, essentially, the same as AFDX). Time-triggered messages are synchronously scheduled and sent over the network at predefined times. The switches then check that each message is transmitted at its expected time. It is then possible to show that congestion cannot occur, e.g. because all transfers that use a common network resource do so at different times, and so never contend. Similarly, with no delays due to contention or multiplexing, it is possible to show that network delays are bounded. Again, there will still be losses due to the BERs of physical layer components. However, for the switches to be able to determine that transmissions are correctly timed, the end systems and the switches must have a closely synchronized view of time, and the end systems must transmit to a predictable schedule, e.g. that of a cyclic executive.
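The argument that transfers sharing a resource never contend can be checked mechanically for a given schedule. The sketch below assumes a simplified model in which every message repeats once per cycle, each transmission occupies a fixed window on a named link, and no window wraps past the end of the cycle; the tuple layout and the function name `find_conflicts` are hypothetical and do not represent any TTEthernet tool or configuration format.

```python
from itertools import combinations


def find_conflicts(schedule, cycle_us):
    """Return (link, msg_a, msg_b) for every pair of windows that overlap on a link.

    schedule: iterable of (message, link, offset_us, duration_us) tuples, where
    each window repeats once per cycle and must lie wholly inside the cycle.
    """
    by_link = {}
    for message, link, offset_us, duration_us in schedule:
        if offset_us < 0 or duration_us <= 0 or offset_us + duration_us > cycle_us:
            raise ValueError(f"window for {message!r} does not fit in the cycle")
        by_link.setdefault(link, []).append(
            (message, offset_us, offset_us + duration_us)
        )

    conflicts = []
    for link in sorted(by_link):
        for (m1, s1, e1), (m2, s2, e2) in combinations(by_link[link], 2):
            if s1 < e2 and s2 < e1:  # half-open intervals [start, end) intersect
                conflicts.append((link, m1, m2))
    return conflicts


schedule = [
    ("A", "sw1-p1", 0, 100),
    ("B", "sw1-p1", 100, 100),
    ("C", "sw1-p1", 150, 50),   # overlaps B on the same link
    ("D", "sw1-p2", 0, 100),    # same time as A, but on a different link
]
conflicts = find_conflicts(schedule, cycle_us=1000)
```

An empty result means that no two scheduled transfers contend for the same link under this model; in the example, messages B and C are reported as conflicting.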

Where the limits applied do prevent congestion, the BERs are low enough that traffic is delivered with sufficient reliability, and the maximum delays are within requirements, e.g. deadlines, transport between any two items of equipment connected to the network will be tolerant of faults in any other connected equipment. In all cases, network redundancy, e.g. fully dual redundant networks, can also be used to provide tolerance of network faults. This can make the transport of data tolerant of all faults outside its source and destination. The use of fully redundant networks will also reduce, but not entirely eliminate, losses due to errors in the physical layers, since both copies of a message or frame must contain an error to prevent correct delivery. Hence, a fully dual redundant network will approximately square the loss rate, e.g. from 10⁻⁵ to 10⁻¹⁰.
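The squaring can be written out as follows, under the assumption that errors on the two networks are statistically independent and that each network loses a given frame with the same probability p; the symbols are introduced here for illustration only. Any common-cause fault that affects both networks breaks the independence assumption, which is one reason the result is only approximate.

```latex
P_{\text{loss,dual}} = P_{\text{loss,A}} \cdot P_{\text{loss,B}} = p^{2},
\qquad
p = 10^{-5} \;\Rightarrow\; p^{2} = 10^{-10}
```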

Proposals have been made for how some of these methods may be applied outside such private networks. However, such methods currently appear to require modification to the networks involved, at least at their peripheries, e.g. the equivalent of ATM's UPC/NPC. Whilst TTEthernet and AFDX equipment may be integrated with IEEE 802.3 Ethernet networks, the end systems using these protocols generally require either specific hardware supporting the protocols or additional processing to emulate the differences from Ethernet.

Implications for the security of data in transport

If the critical traffic is constrained such that it meets its real-time requirements, then its availability is also maintained in transport. This will also require that the data is constrained so that it is only transported by the network between its legitimate source and destination; otherwise data could affect parts of the network where it is not expected or allowed for. Hence, its confidentiality and integrity in transport across the network will also be preserved.

Where the data is constrained by the network, this preservation of data security will depend on the components of the network. Hence, it will be necessary to ensure that these network components themselves meet security requirements, i.e. that they contain no malware, are configured as required, and have acceptable failure rates. However, as the number of switches will normally be significantly (an order of magnitude) smaller than the number of items of equipment connected to the network, this should be a less difficult task than securing all of the connected equipment and ensuring that its combined failure rate is acceptable.