This is not accurate/correct (anymore). There was a point in time when Melbourne was not "the world's most livable city", and as such we have/had a train network from the 1950s, including the control systems, which mostly ran up until 2009 under Connex's tenure post-privatisation.
Metro Trains, which is a consortium, took over the train network at that point and was contractually obligated by the state to upgrade all core and edge technology systems.
I worked at MTM circa 2016/2017, and by then all the truly legacy operational control systems had been replaced.
There was a heap of funding allocated for fun things like maintaining OCMS/IT systems and updating customer-facing/passenger information systems.
> The next challenge was the choice of the operating system for the host platform. We choose Windows XP because the Proprietary Co-processor card at the heart of the system was especially adapted to run under Windows XP. This operating system is not the best ‘choice for real time applications’ therefore special attention was required in the method of integration. Running the Windows operating system in cut down mode by disabling unnecessary apps and keeping it free of all ‘gadgets’ (Office software etc.) increased the system stability sufficiently
> Due to the core software limitation (no source code available) we were compelled to integrate some original PDP-11 computer cards into the final product and this resulted in a hybrid PC platform. There are two very distinct hardware technologies in use in each complete system. The PC environment which is the host platform and the DEC environment which is the ‘legacy’ system. The connection is made using a proprietary Unibus adapter that links the Osprey co-processor to the short Unibus. This Unibus which is totally contained inside the special PC housing allows us to retain some key legacy DEC PDP-11 cards that in turn deceives the software ‘to think’ it is still running in a full DEC system in terms of timing. A marriage of ‘old’ and ‘new’ was borne
One of God's own prototypes. A high-powered mutant of some kind never even considered for mass production. Too weird to live, and too rare to die.
The PDP-11 bus is really simple; it's easy to whip up custom cards with a handful of TTL chips. Also, DEC used to sell all the parts separately, including backplanes which had a wirewrap pin on every contact. As a result, integrating a PDP-11 (more usually an LSI-11 on QBus) into a system was quite simple - bolt a backplane to the end of a 19" rack of DIN41612 connectors and wirewrap the two together (we did systems with 10 or more motor control cards, each with around 100 TTL chips on).
A particularly nice feature of the bus is that it was memory-mapped, and everything (memory as well as custom hardware) would send an ACK signal back to the processor when addressed; addressing a non-existent memory location would cause a bus trap.
Edit: The bus is simple enough that creating, say, an Arduino interface to drive old DEC cards would not be hard. I'll keep that thought for the next time someone needs their nuclear power station updating [0].
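To make the "everything ACKs, or you trap" behaviour concrete, here is a rough C sketch of how one might model a memory-mapped bus with that property. Purely illustrative: the names, address ranges and values are invented, and no real Unibus/QBus timing is modelled.

```c
#include <stdio.h>
#include <stdint.h>

/* Sketch of the "everything ACKs, or the master traps" behaviour of a
 * memory-mapped bus like Unibus/QBus. Names, sizes and values are invented
 * for illustration; real DEC bus timing is not modelled here. */

typedef struct {
    uint32_t base, size;                     /* address range this slave decodes */
    uint16_t (*read)(uint32_t offset);       /* memory board, custom I/O card, ... */
} bus_slave_t;

static uint16_t ram_read(uint32_t offset) { return (uint16_t)(offset ^ 0x1234); }

static bus_slave_t slaves[] = {
    { 0x0000, 0x1000, ram_read },            /* a toy 4 KB "memory board" */
};

static void raise_bus_trap(uint32_t addr)    /* stands in for the CPU's bus-error trap */
{
    printf("bus trap: nothing ACKed address %06o\n", (unsigned)addr);
}

static uint16_t bus_read(uint32_t addr)
{
    for (size_t i = 0; i < sizeof slaves / sizeof slaves[0]; i++)
        if (addr >= slaves[i].base && addr < slaves[i].base + slaves[i].size)
            return slaves[i].read(addr - slaves[i].base);   /* the slave ACKs the cycle */
    raise_bus_trap(addr);                    /* no slave answered: bus trap */
    return 0177777;
}

int main(void)
{
    printf("read 0x0100 -> %04x\n", (unsigned)bus_read(0x0100));  /* ACKed by the RAM slave */
    bus_read(0x8000);                                             /* unmapped: traps */
    return 0;
}
```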
Am I the only one whose blood chills a little at the phrases "Windows XP" and "trains" in the same sentence? Also -- what about virtualizing the OS with PCIe passthrough, and being able to regularly snapshot it / live-migrate it to other hosts (if they have enough of the odd cards, etc.)?
I know of a hospital still running a Meditech MAGIC system for historical data purposes. The system is a collection of IBM servers, all running Windows 2000 Server.
Meditech, however, uses its own network stack on the OS, replacing all Windows networking components with a Meditech MAGIC driver, and Meditech's own software then controls TCP/IP inside its emulation that runs on top of 2000 in usermode. It's amazing how well put together the stack is that this could be (fairly easily) migrated up to a Server 2012-era box* and still work the exact same way.
*(that I know of - I assume they have newer support by now)
To this day there are probably thousands to tens of thousands of SCADA and other industrial control systems around the world running on XP. It's an incredibly stable system when configured properly.
Several nuances that we replicated were probably never used by the production software. For example, the order of execution of a floating-point instruction trap followed by a stack pointer trap due to the pipelining of the FPU instructions, even though we weren't executing anything out-of-order. I also produced the correct partial result for the division algorithm in the event of an overflow condition.
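To illustrate the kind of ordering nuance being described, here is a toy model only: the trap names and the fixed priority below are invented, not the real PDP-11/FPU rules (which are exactly what the reimplementation had to reproduce). The point is simply that when two traps become pending together, they must be serviced in the same fixed order the original hardware would use.

```c
#include <stdio.h>
#include <stdbool.h>

/* Toy model of servicing simultaneously pending traps in a fixed order.
 * The trap names and ordering are invented for illustration only. */

enum trap { TRAP_FP_EXCEPTION, TRAP_STACK_LIMIT, TRAP_COUNT };

static bool pending[TRAP_COUNT];

static void raise_trap(enum trap t) { pending[t] = true; }

static void service_pending_traps(void)
{
    /* Fixed service order, regardless of which trap was raised first. */
    static const enum trap order[] = { TRAP_FP_EXCEPTION, TRAP_STACK_LIMIT };
    static const char *name[]     = { "floating-point exception", "stack-limit" };

    for (size_t i = 0; i < sizeof order / sizeof order[0]; i++) {
        if (pending[order[i]]) {
            printf("servicing %s trap\n", name[order[i]]);
            pending[order[i]] = false;
        }
    }
}

int main(void)
{
    raise_trap(TRAP_STACK_LIMIT);     /* both become pending in the "same cycle"... */
    raise_trap(TRAP_FP_EXCEPTION);
    service_pending_traps();          /* ...but are serviced in the fixed order */
    return 0;
}
```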
Test vectors are the biggest secret sauce for any CPU or microcontroller. Word on the street is that the STM32Fxxx test vectors were stolen and that’s why you have perfect replicas being churned out of China, not like the typical cases where your devices fail with the fake chip installed.
Not even close. The STM32Fxxx is a series of microcontrollers. They have tons of internal state which isn't exhaustively covered by test vectors. (For example, a basic 16-bit timer peripheral has roughly 60 bits of internal state, and there certainly aren't going to be 2^60 test vectors to cover that.)
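Back-of-envelope (my own numbers, and generously assuming you could somehow apply a billion vectors per second through real pins):

```c
#include <stdio.h>

int main(void)
{
    /* Back-of-envelope: exhaustively covering 2^60 internal states is hopeless
     * even at an absurdly optimistic one billion test vectors per second. */
    double states  = 1152921504606846976.0;          /* 2^60 */
    double rate    = 1e9;                            /* vectors per second */
    double seconds = states / rate;
    double years   = seconds / (365.25 * 24 * 3600);
    printf("%.1f years of continuous testing\n", years);   /* ~36.5 years */
    return 0;
}
```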
You can for some things, but that wouldn't be a very efficient result (it's basically going to give you a massive lookup table). It's also not that simple, since we are not talking about stateless behavior - think more along the lines of interrupts, counters, events, DMA, analog input/output, etc., none of which really maps into a straightforward input:output relationship, and most of which can be used together to some extent to form a gestalt. Even with the test vectors it is a Herculean task.
No, there's too much complexity. What you can do is reverse engineer it more manually using data sheets etc., and then use the test vectors to ensure that you got it right. They're likely to be designed to cover all the interesting corner cases that might cause problems.
I did some security work for a bank that was running a software-emulated HP3000 for their core banking platform to run on. They were so scared of us breaking something that they had us write in the contract we wouldn't touch the production unit whatsoever. In retrospect it was a good idea because we killed one of their dev boxes with a scan and it took a couple of days for them to get it back online.
You definitely can; we use cycle-accurate simulators to model logic. These models can be tied into emulators to give them cycle-accurate timing information.
In terms of expressiveness, digital logic can't "do" anything that software cannot (including driving real electrical/analog I/Os, provided the software and the digital logic have interfaces for them). Whether you implement the PDP-11 in software or logic, either way you still have to create it to take into account all the timing details of the original hardware.
It's just that some operations can be done much faster in hardware, but I doubt any part of the PDP-11 can't easily be emulated in software on modern CPUs with plenty of timing margin to achieve real-time cycle accuracy. I could be wrong because I don't know much about the PDP-11, but for an early, simple RISC pipeline of similar performance it is certainly possible.
I assume this is just a solution that works, is supported, and has been in use for a long time. It would
As soon as you try to interact with real hardware you'll hit latencies that your simulator can't deal with. Modern processors of the sort you'd use for such a system are designed around intermediary busses that don't provide the sort of timing guarantees legacy hardware needs.
Most hard real-time control systems don't use FPGAs but hard processors. I don't really believe those can't achieve as good timing control as a PDP-11.
EDIT: From the FPGA vendor:
> Double+ speed Osprey Co-Processor. Occupies one ISA/EISA PC slot. Writable control store, FPGA implementation of PDP-11 architecture with 4 MBytes of tightly coupled, zero wait-state memory. Includes FPJ11® compatible hardware floating point. Performance equivalent twice the DCJ-11® and 100% of FPJ-11. On-card x86 microprocessor rapidly processes "virtual" I/O instructions. This card is used for Osprey/PC applications where all applicable I/O devices are emulated on the PC by Osprey software. Includes standard environment support for MS-DOS® V6.2.2. Windows/NT® environment support available.
So it's highly unlikely that such a system, using standard x86 PCs with all their SMBIOS traps and other garbage to emulate I/O, could drive external I/Os with better precision and accuracy than a proper real-time SoC.
Cycle accuracy may be a requirement in the processor such that legacy software is guaranteed to run the same, but they almost certainly are not doing cycle accurate IOs if they're going across an ISA bus to a DOS or NT PC to be emulated there.
Not on the software I'm involved with, but that models a vast modern system running at GHz rather than a system with some tens of thousands of transistors running at 10-20MHz.
Modeling a system isn't the same as emulating it in real time. You have to match up the timing of your inputs and outputs with the exact right clock cycle that is coming with a hard deadline, and if your emulator does a garbage collect, or some other thing, you'll miss it.
It seems like a super fast system would be easy to do this with, but reality is different.
> Modeling a system isn't the same as emulating it in real time. You have to match up the timing of your inputs and outputs with the exact right clock cycle that is coming with a hard deadline,
Sure, my point with those anecdotes is that the hardware system can be modeled in software to cycle accuracy, and it can be emulated with software. Therefore it can be emulated with cycle accuracy (provided you have the CPU power to meet timing).
> and if your emulator does a garbage collect, or some other thing, you'll miss it.
Well if you do anything that blows your timing budget then you miss your deadline by definition on any real time system. Obviously you can use real time garbage collection (or not use it) and control other latencies.
I wasn't suggesting you could just write some code and make a real time cycle accurate emulator without paying attention to real time deadlines.
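For what it's worth, the usual shape of such a loop is roughly the following. This is a simplified sketch under my own assumptions (the batch size, clock rate and function names are invented, and this is not how the Osprey or the Melbourne system actually works): run the core for a small batch of emulated cycles, then sleep until the wall-clock deadline for those cycles, and treat falling behind as a missed deadline.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdint.h>
#include <time.h>

/* Simplified real-time emulation loop: the emulated clock is pinned to the
 * wall clock in fixed-size batches of cycles. All names and numbers are
 * invented; a production system would also need RT scheduling, locked memory,
 * and no unbounded pauses (GC or otherwise) inside the loop. */

#define EMU_HZ        1000000ULL     /* pretend the legacy CPU runs at 1 MHz */
#define BATCH_CYCLES  1000ULL        /* 1 ms of emulated time per batch */

static void emulate_cycles(uint64_t n) { (void)n; /* step the CPU core and I/O here */ }

static void timespec_add_ns(struct timespec *t, uint64_t ns)
{
    t->tv_nsec += (long)(ns % 1000000000ULL);
    t->tv_sec  += (time_t)(ns / 1000000000ULL) + t->tv_nsec / 1000000000L;
    t->tv_nsec %= 1000000000L;
}

int main(void)
{
    struct timespec deadline, now;
    clock_gettime(CLOCK_MONOTONIC, &deadline);

    for (int batch = 0; batch < 5000; batch++) {       /* ~5 s of emulated time */
        emulate_cycles(BATCH_CYCLES);

        /* Advance the deadline by the real-time duration the batch represents. */
        timespec_add_ns(&deadline, BATCH_CYCLES * 1000000000ULL / EMU_HZ);

        clock_gettime(CLOCK_MONOTONIC, &now);
        if (now.tv_sec > deadline.tv_sec ||
            (now.tv_sec == deadline.tv_sec && now.tv_nsec > deadline.tv_nsec))
            fprintf(stderr, "missed deadline at batch %d\n", batch);

        /* Sleep until emulated time and wall-clock time line up again. */
        clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &deadline, NULL);
    }
    return 0;
}
```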
> Due to the core software limitation (no source code available) we were compelled to integrate some original PDP-11 computer cards into the final product and this resulted in a hybrid PC platform.
Amazing. I've never been anywhere that lost their source code. Would love to know how it happens.
I was in two different situations many years ago, neither of which I take responsibility for, ha-ha:
1. Contractor relationship gone sour; it turned out they had withheld source code from us. This led to a dispute regarding code ownership. Meanwhile we were running the corresponding binaries in the wild. We ended up reverse engineering the missing parts.
2. Source code was centrally managed on a server; hard disk failure; restore not working, or backups wholly missing - I don't remember. This was the pre-Git era, probably Visual SourceSafe, which was entirely centralized.
For me personally, let's just say I feasted my eyes and learned a lot. But I wager that there are still a lot of companies vulnerable in very similar ways today.
SourceSafe is a '90s-era product. The PDP-11 was discontinued long before SourceSafe was even released.
There was version control software around in the '70s, but I do not think the adoption rate was very high. I think it's reasonable to assume that they didn't use it in this case.
> The Programmer's Workbench has proven to be very popular with both management and programmers, and is now used by almost all software projects at the author's installation.
> SCCS is used extensively within Bell Laboratories. For example, at one installation more than 3 million lines of source code are controlled with sccs. These 3 million lines represent 5000 files and the work of approximately 500 programmers. Additionally, sccs is used at many installations outside the Bell System.
So, SCCS runs under Unix. The software package we are talking about here is Ericsson JZA715, written mostly in Pascal with a bit of PDP-11 assembler, running under RSX-11M, it was first put into operation in Oslo in 1979. I don't know what development platform Ericsson was using, but I doubt it was Unix, seems far more likely to have been RSX-11. What version control systems would Ericsson have been using on RSX-11 in the 1970s? I doubt it was SCCS – could it have been something else? Anything? RSX-11 has a versioned file system, could they have just used that? Or did people build more advanced version control systems layered on top of that base?
I played with SCCS on RSX-11M in the 1980s. SCCS is not a client-server architecture that allows you to share file history across the internet. It allows you to keep change history locally with your local source files. If you don't keep viable backups of your local files offline and offsite, it's going to be easy to lose even if your local files contain the entire history of your changes. It was only when we moved to DOS that SCCS became useful, since you could back up and share code using floppies.
When I worked on RSX-11M, and later VMS, we did not rely on SCCS because it had no added value. We relied on the file versioning provided by the file system and rarely kept more than one old version around because disk space was always at a premium. Changes to production code required filling out a paper form and passing it to the test department, who would submit the appropriate batch job to rebuild everything, then copy your changed files into the production directories after the tests pass. It was just like modern CI systems, except for the fact that it was mostly manual and a build-and-test cycle was at least 2 days.
Quoting the 1975 paper: "There are two implementations of SCCS: one for the IBM 370 under the OS and one for the PDP 11 under UNIX."
The original version was for OS/MVT and implemented in SNOBOL4 using the SPITBOL compiler. https://apps.dtic.mil/sti/pdfs/ADA084326.pdf says that SNOBOL4 was available for the RSX-11D by 1979.
I have no clue what Ericsson was using. My inference from the literature is that, apparently, when software developers had access to a version control system they used it, even back in the 1970s.
FWIW, I regard versioned file systems as a form of version control.
> One of the more innovative of recent software engineering activities is the concept of a Programmer's Workbench (PWB)
(PWB is the term for the product which included SCCS.)
> A simple generalization of the PWB concept results in the idea of a total software engineering facility, i.e., a Designer's Workbench (DWV) ...
> Using a PDP 11/70 architecture base, and system software which includes RSX-11M and the UNIX operating systems, standard PWB tools, and INGRES data base capabilities ...
Since it looks like UNIX ran alongside RSX-11M on PDP-11s in 1978, why couldn't Ericsson have been using Unix alongside RSX-11M at the same time in Norway?
I asked Marc Rochkind; he doesn't have a copy of the OS/360 SCCS version any more. He said maybe it survives buried deep in some Bell Labs corporate archive, and made reference to the scene at the end of Raiders of the Lost Ark.
A C++ project I inherited had some binaries checked in for rarely-updated utilities - no sense in forcing everyone to build everything from scratch every time, right? Unfortunately, this wasn't done by CI, but by hand - so while those binaries could be built on the original maintainer's machine, some stale & missing source files (which said maintainer forgot to `p4 add`, `p4 reconcile`, or perhaps submit?) meant the only buildable copy of source was on their workstation, without anybody noticing.
...which was then presumably lost/reformatted some time after said dev left (quit? laid off? fired?)
Unable to build said source code, a subsequent developer added a python script to postprocess the output of said binary for their own needs. A hack, but it worked.
Then I got my hands on it. Well, I needed data that was discarded by the binary, so hacking on the python script was a non-starter. Instead, I reverse engineered what the binary was doing by stubbing out enough of the source code to get it building - then compared hex dumps of the outputs and tweaked said source code until my modified source code gave the same output as the checked-in binaries. Then I got rid of the python script by moving the logic into the binary's source code, where it really belonged. Then I made my own changes.
Didn't bother setting up CI though. Was porting a project in a heavily compressed (<1 month!) timeframe, and nobody else needed to touch said binary, and the codebase wasn't going to be used for any future projects (it was pretty terrible, and much time had been spent creating a much better codebase for future projects.) I at least made sure I checked everything in, though!
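For what it's worth, the "compare hex dumps until they match" step is essentially a byte-diff loop. A throwaway sketch of the idea (the filenames are hypothetical; in practice cmp or a hex-dump diff does the same job):

```c
#include <stdio.h>

/* Throwaway byte-diff: report the first few offsets where the reference
 * binary's output and the rebuilt binary's output disagree. */

int main(int argc, char **argv)
{
    if (argc != 3) {
        fprintf(stderr, "usage: %s reference.out rebuilt.out\n", argv[0]);
        return 2;
    }
    FILE *a = fopen(argv[1], "rb"), *b = fopen(argv[2], "rb");
    if (!a || !b) { perror("fopen"); return 2; }

    long offset = 0, mismatches = 0;
    int ca, cb;
    while ((ca = fgetc(a)) != EOF && (cb = fgetc(b)) != EOF) {
        if (ca != cb && mismatches++ < 10)            /* print only the first ten */
            printf("offset 0x%06lx: %02x != %02x\n", offset, ca, cb);
        offset++;
    }
    if (ca != EOF || fgetc(b) != EOF)
        printf("files differ in length after %ld byte(s)\n", offset);
    printf("%ld mismatching byte(s) in the common prefix\n", mismatches);
    return mismatches ? 1 : 0;
}
```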
>> the codebase wasn't going to be used for any future projects
I have heard this many times, especially for the "single-use executable for a specific use case". Many years later, it has made its way into production code and is on its 10th iteration.
The software package they were using was an Ericsson product, called "JZA 715", for PDP-11s. Did Ericsson ever give them the source code? Possibly not.
Does Ericsson still have the source code for a PDP-11 software package they sold in the 1970s? Maybe they do, maybe they don't.
Did they ever even ask Ericsson for the source? Possibly, they may have thought that modifying PDP-11 software was too hard–people with skills in doing it are hard to find these days, it is unlikely to be written in C, probably either assembly or some obscure language few have heard of–so even if Ericsson still had the source and was happy to hand it over (free of charge or for a reasonable fee), they might not have thought it worth their while to acquire it, and hence never have asked Ericsson for it.
EDIT: Actually, turns out JZA715 is mostly written in Pascal (so not that obscure a language after all) with some modules in PDP-11 assembly – see page 18 of http://srsv.org.au/wp-content/uploads/2017/02/S-38-1-Jan.pdf – the software ran under RSX-11M
Surely they could use a disassembler and from that (and from monitoring the I/O of a live system) infer a test suite from which they could rebuild the API. The new system could be tested to have the same outputs as the live system, and eventually swapped in, fixing bugs and making it more reliable in the process. But I guess that was more expensive than the FPGA emulator.
Reverse-engineering of even simple binaries tends to be difficult enough, without the stakes being as high as the program safely running the train control system of a major city. I can't pretend to know what conditions this was originally engineered under, but these kinds of systems today are subject to intensely rigorous testing, and sometimes formal analysis. In a safety-critical system it's just not going to be worth doing that. Plus, if the original source code has been misplaced, the original requirement documents are probably gone too.
I sort-of kind-of agree with this. It is certainly why most of these systems aren't worth upgrading. But a reason it might be desirable to wear this cost is if the original system is already showing signs of bugs, which may be true if it's being operated in new conditions (e.g., interop with new systems, increasing workload), or if there are safety features that could be added but not without refactoring. But, yes, if it ain't broke, surely there's no need to fix it.
Edit: Also worth pointing out that formal verification was likely not a component of building the original software, and it likely has bugs it's kept since inception. Updating the API to a language more conducive to formal verification might be an option that improves safety over the legacy system
Edit: Apparently they did replace it all eventually
In defense of government, they need to have an audit trail and (in functioning democracies) can be held accountable when things go seriously wrong, so they only have a select few orgs to choose from that have passed all the certifications/due diligence. Connex (the company running the show at the time) was a private operator who had a lease of the rail system for probably long enough to justify modernizing some of the systems, but maybe not long enough to justify a full rewrite. Rewrites of systems with big safety implications, like civil transport, would have a different risk profile to upgrades of less consequential legacy systems.
This appears to be quite an old document (2012 judging by some of the dates in the screenshots, and the network map) but fascinating nonetheless, thanks for posting!
I'd be interested in learning how the signalling tech has evolved over the years. There has always been a push by PTV (the government body responsible for public transport in Victoria) to establish a "turn up and go" timetable where trains depart often enough that you shouldn't need to plan ahead. One of the restrictions of the old signalling system was a fairly large minimum distance between trains that prevented frequent service. This caused some congestion, especially in the hub and spoke model of the Melbourne network where many lines converge to share tracks as they approach the CBD.
IIRC high speed signalling is coming with the Melbourne Metro trains (and the existing lines those trains will run through).
Replacing train management systems like this with new ones seems to be all but impossible. My brother-in-law worked on one in the UK that was designed, built, and implemented, and the controllers basically said "no thanks". It only got implemented when they rebuilt the UI and control panels to look like the old ones.
> This appears to be quite an old document (2012 judging by some of the dates in the screenshots, and the network map) but fascinating nonetheless, thanks for posting!
Yeah, is this factoid - central Melbourne train signals being controlled by an emulated PDP-11 - still true in 2021? I don't know. This is definitely known to have been true some years ago, but can anyone confirm whether it remains true today? The construction of the Melbourne Metro requires new signals, which I assume would be controlled by a brand-new control system, not the old PDP-11-based system. But if you are going to introduce a new system for one line, wouldn't it make sense to try to migrate the existing lines to it as well?
Disassemblers have been around forever... recreating a working codebase that compiles to the same executable is definitely doable; however, the fact that it runs a piece of infrastructure critical to the public makes the cost of being even one bit off unacceptable. This is the reason for going with emulation. It's a smaller change, and thus a smaller risk.
I know that a company I worked for bid on revamping the platform information system - replacing the old CRTs that would draw at 1200 bps to show next-train info across the system - and at that point the PDP-11 was still being used. I got to spend a few hours in there with it, trying to debug a serial protocol (again, poorly documented) that was being used to render that information.
Yet again I point to Vernor Vinge's A Deepness in the Sky, which describes in some detail the job position of programmer archaeologist (<https://en.wikipedia.org/wiki/Software_archaeology>) thousands of years from now.
Melbourne's new Metro tunnel (and Cranbourne, Pakenham and Sunbury lines) will use high capacity moving block signalling, with a minimum headway of 120 seconds!! [1]
I assume this will be similar to the European Train Control System Level 3 [2] (which does not exist in a cross-vendor spec yet afaik) - it's being built by Rail Systems Alliance, a consortium made up of CPB Contractors, Alstom and Metro Trains Melbourne.
That's the same as on the London Underground, in 2011 they rolled out 30 trains per hour on the Jubilee line and a few years later on the Northern line.
The Jubilee line is 36 km long, and each train is 125 m [1], so that means there is on average just over a kilometer between each train at 30 trains per hour. That might sound like a lot, but the stopping distances of trains are much greater than those of cars. First of all, the trains weigh around 200 tons, plus another 70 tons of passengers, and metal on metal doesn't have as much traction as car tires. Even if you could decelerate as fast as a car, everyone standing up is going to end up in pretty bad shape after that, so you need to do it more slowly. According to someone on StackOverflow, passenger trains are designed to decelerate at 1.2 m/s² [2], which at 100 km/h gives a stopping distance of about 350 m.
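For reference, that figure is just the constant-deceleration formula; the raw number comes out a little under the quoted 350 m, with the rest attributable to reaction/response time:

```latex
% Stopping distance under constant deceleration (reaction time ignored):
\[
  d = \frac{v^{2}}{2a}
    = \frac{(27.8\ \mathrm{m/s})^{2}}{2 \times 1.2\ \mathrm{m/s^{2}}}
    \approx 322\ \mathrm{m}
\]
% where v = 100 km/h = 27.8 m/s and a = 1.2 m/s^2.
```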
The Victoria line is up to 36 trains per hour at peak times now. (And the Moscow metro runs 60 trains per hour, though some have questioned whether their practices would meet western safety standards).
Can you point to a source for the 60 tph in Moscow? I know Moscow/Russia/ex Soviet Union is always mentioned in that context, but the figures usually given are more in the 40 – 45 tph range.
The pinch point limiting capacity on urban railways normally isn't the plain line between stations, though, but rather the stations themselves: It's the combination of dwell times (doors opening, passengers alighting and boarding, doors closing) and platform reoccupation time that determines available capacity.
My naive assumption was that, as trains operate in a relatively controlled environment, a headway much smaller than 120 s would have been easy to achieve. Cars regularly drive less than a second from each other! Is it just about minimizing the chance of catastrophe, or is there more to it?
Car stopping distances from 60mph/100kph are ~100 metres. 39,888 people died in the US in 2018 from car crashes. [1]
Emergency braking distance of a TGV from 300 km/h (186 mph) is something like 3,500 m [2], carrying ~450 people.
Moving block signalling systems take into account the current positions and velocities of both trains, and generate a maximum speed and braking curve. See [3] for the details, it's pretty fascinating systems design!
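A crude way to picture the braking-curve part (a toy calculation with invented numbers, not any real ETCS/CBTC braking model): given the free distance ahead of the train and a guaranteed braking rate, the highest speed the train is currently permitted falls out of the same v² = 2ad relation.

```c
#include <math.h>
#include <stdio.h>

/* Toy moving-block supervision: given the free distance ahead of the train
 * (up to the rear of the train in front, minus a safety margin) and a
 * guaranteed braking rate, compute the highest speed from which the train
 * could still stop within that distance. All numbers are invented; real
 * CBTC/ETCS braking models are far more elaborate. */

static double permitted_speed_mps(double distance_ahead_m, double brake_mps2)
{
    if (distance_ahead_m <= 0.0)
        return 0.0;
    return sqrt(2.0 * brake_mps2 * distance_ahead_m);   /* v = sqrt(2*a*d) */
}

int main(void)
{
    const double brake  = 1.0;    /* guaranteed service braking, m/s^2 */
    const double margin = 50.0;   /* fixed safety margin, m */

    for (double gap = 100.0; gap <= 1000.0; gap += 300.0) {
        double v = permitted_speed_mps(gap - margin, brake);
        printf("gap %4.0f m -> permitted speed %5.1f km/h\n", gap, v * 3.6);
    }
    return 0;   /* compile with -lm */
}
```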
Cars are driven based on relative braking distances, i.e. it is assumed that the vehicle in front of you will never come to a sudden stop. So you only need to keep a certain smaller margin to account for your reaction time (and some drivers ignore even that). This mostly works, but sometimes that assumption breaks down and you get something like a mass pile-up for example.
Trains are held to higher safety standards, and so even modern train control systems are usually based on absolute braking distance, i.e. there's always enough free space in front of the train to safely brake to a stop. (And even when you want to space trains closer together regardless, as soon as you need to throw a set of points between two trains at a junction you're back to absolute braking distances, because as long as the points aren't set safely towards one direction or the other, they are effectively a stationary obstacle.)
This limits how closely trains can follow each other, and on urban railways it's normally the combination of dwell times (doors opening, passengers alighting and boarding, doors closing) and platform reoccupation times (wheels starting to turn on the first train leaving the station to wheels stopping on the next train arriving at the station) that determines your minimum headway. (And for actually usable – as opposed to merely theoretical – capacity you also need to add at least a small amount of extra margin to compensate for possible dwell time variations and other day-to-day occurrences.)
While it is true that cars regularly follow less than a second apart, safety experts give 2 seconds as the minimum safe gap, and 3 is often recommended. You should be the one driver who follows their advice, no matter how many people cut in front of you. Better late than dead, and highways are very dangerous.
People say that the Melbourne city loop was the world's first computerized train line. If that's true, this hardware and software had some serious heritage value. It's a pity that Museums Victoria didn't try harder to preserve it when these replacements were made.
> People say that the Melbourne city loop was the world's first computerized train line. If that's true,
I don't believe that is true. The Melbourne city loop was using the Ericsson JZA 715 system. According to page 9 of [0], Melbourne's implementation of JZA 715 went live in 1982, but Oslo's JZA 715 implementation went live in 1979. Furthermore, JZA 715 was an evolution of earlier Ericsson computer-based rail traffic control systems, going back to the JZA 410 which went live in Stockholm in 1971 and Copenhagen in 1972. So, if these Ericsson systems were in any sense a "world-first" (don't know enough about the history of this technology to say), it looks like the "world-first" happened in Scandinavia, and Melbourne may have simply been "Australia-first" (or "Southern Hemisphere-first" or maybe even "not-Scandinavia-first")
As the old saying goes "don't fix it if it ain't broke"... but I wonder how long this new system will last. The PDP-11 ran for a few decades, but will the replacement outlast it?
They aren't even clones; the surviving models are mostly compatible but independent implementations - if your system somehow manages to gain a dependency on an undocumented quirk of a different CPU, they are unlikely to work.