Asset management review called for RailCorp

Faulty Cisco network switch culprit in delaying, cancelling hundreds of trains

A technical investigation into the signalling issues that caused delays to 40 per cent of the Sydney metro rail network in early April has recommended a complete review of RailCorp asset management protocols to more clearly define refresh cycles for critical equipment.

In a post-mortem report released Friday (PDF), the government department attributed the problems to an eight-year-old Cisco 3550XL network switch forming part of the Advanced Train Running Information Control System (ATRICS) LAN at Sydenham station.

The report indicated intermittent failures in the switch just after 7.30am on 12 April this year caused the entire network to reconfigure itself, ultimately leading to a failure that wasn’t completely solved until after 4pm that day.

The failures ultimately caused 240 trains to be cancelled and 847 trains to be delayed across most of the rail network, with an average delay of 27 minutes.

The age of the switch, which was designed to last at least another seven years, was not called into question. However, the report found the failed switch was one of a batch of Cisco switches the manufacturer had warned could fail due to a fault in the power supply capacitor.

The failures were noted up to two months before the outage, but were not acted on. In a patch bulletin released on the issue in 2003, the manufacturer suggested the switch be replaced on fail.

“Although ‘replace-on-fail’ may be appropriate for an enterprise network, processes need to be introduced to consider the risk this approach poses in a high criticality application,” the report reads.

Among the seven recommendations outlined in the report, the technical investigation indicated that a more clearly defined refresh cycle and asset management protocol were required to prevent future incidents. The team also recommended a review of the ATRICS software’s ability to manage fail-over scenarios and of the slowness of the network at Sydenham, which prevented engineers from applying fixes more quickly.

The faulty switch was quarantined and analysed as part of the investigation, while fixes were applied to issues discovered at both the Sydenham ATRICS network and Revesby.

Follow James Hutchinson on Twitter: @j_hutch

Follow Computerworld Australia on Twitter: @ComputerworldAU
