Providing resilience as the goalposts move

Author: Ian Jones
Day: Aspect Day One
Session: Systems

Since the 1850s, signalling has had the primary aim of preventing
trains from colliding. Mechanical levers have given way to
electronics, software, secure digital telecommunications,
sophisticated systems engineering and increased automation, all
intended to enhance safety and increase efficiency. That quantum leap
in technology has brought new challenges, particularly in keeping the
railway running. So what’s changed? Compared with the Victorian
railway, there are more systems that can go wrong, more to fault-find
and repair, and more for maintainers and operators to understand. Yet
system safety is higher than ever. Reliability and, more importantly,
availability are at high levels, but stakeholders’ expectations for
resilience, and with them the challenge to the industry, are also at
an all-time high.

How do we manage all this complexity? The industry is investing in
improved methods of testing. Systems engineering gives us clearly
defined requirements that can be verified and validated throughout
the product lifecycle; however, it is important to make sure that
systems not only do what they should, but also don’t do what they
shouldn’t! Following the aerospace model of testing to destruction
has delivered benefits for projects such as London Underground’s
Victoria Line, helping us to understand what will ultimately lead to
system failure. This understanding has provided significant
mitigation when systems have been pushed beyond their original design
limits: on the Victoria Line, for example, when the railway was
upgraded to support a 36-trains-per-hour timetable.

How do we prove that it’s going to work before we put it on the
railway? Schemes such as Network Rail’s Thameslink programme
benefited from extensive use of test rigs, which allowed system
components to be brought together and tested off-site. Interfaces
were defined, proven and tested using target hardware. Technologies
such as ‘digital twins’ offer a further step towards creating
resilient systems. Where one organisation delivers two large parts
of the system, such as the signalling and the trains, there are
significant opportunities to reduce the risk involved in managing,
integrating and delivering a homogeneous, integrated solution. The
delivery of the Riyadh Metro took full advantage of this, with the
integrated train and signalling systems tested at the Wildenrath test
circuit before deployment.

Designing for resilience

We can learn from the original mechanical interlockings, which had no
built-in duplication but instead had failure modes in which
functionality was degraded without stopping trains. Such ‘graceful
degradation’ should be considered, rather than blindly designing for
availability by adding redundancy. Designing highly reliable systems,
rather than duplicated ones, would take up less space and reduce
vehicle weight.
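The trade-off between duplication and reliability can be made concrete with the standard steady-state availability formula for N identical units in parallel, where the system works if any one unit works: A_sys = 1 - (1 - A)^N. The Python sketch below is illustrative only. It assumes independent failures, and the function names and availability figures (0.999 per duplicated unit, 0.999999 for a single higher-reliability unit) are hypothetical rather than taken from any project.

```python
"""Illustrative availability comparison: duplication versus one more reliable unit.

Hypothetical assumptions, for illustration only: steady-state availability
figures per unit, independent failures, and a 1-out-of-N arrangement in
which the system works as long as at least one unit works.
"""

HOURS_PER_YEAR = 8760.0


def parallel_availability(unit_availability: float, units: int) -> float:
    """Availability of `units` identical units where any one can carry the load."""
    if not 0.0 <= unit_availability <= 1.0:
        raise ValueError("availability must lie between 0 and 1")
    if units < 1:
        raise ValueError("at least one unit is required")
    # The system is down only if every unit is down at the same time,
    # which (under the independence assumption) has probability (1 - A)^N.
    return 1.0 - (1.0 - unit_availability) ** units


def downtime_hours_per_year(availability: float) -> float:
    """Expected hours per year in which the system is unavailable."""
    return (1.0 - availability) * HOURS_PER_YEAR


def compare(options: dict) -> None:
    """Print fitted hardware, availability and downtime for each option."""
    for name, (unit_a, units) in options.items():
        a = parallel_availability(unit_a, units)
        print(f"{name}: {units} unit(s) fitted, availability {a:.6f}, "
              f"~{downtime_hours_per_year(a):.3f} h/year down")


if __name__ == "__main__":
    # Hypothetical figures chosen so both options reach similar availability.
    compare({
        "duplicated standard units": (0.999, 2),
        "single high-reliability unit": (0.999999, 1),
    })
```

Under these assumptions both options reach roughly the same availability, but the duplicated option fits twice the hardware, which is where the space and weight penalty comes from. The independence assumption also flatters duplication: a common-cause failure that takes out both units at once (a shared power supply, say) removes most of the benefit, which is one reason to consider graceful degradation rather than redundancy alone.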

Continuous change

Changing technology has introduced new threats to safety and
availability, most obviously in the area of cyber-security. The
underlying technology has developed extremely quickly, and many major
railway schemes now depend on complex networks based on commercial
systems, which must be guarded against cyber threats.