Departure times in public transport seem to be less fixed than we would wish. Not only can a bus be late, it can also be early. Our impression was that some bus and tram lines in Berlin were systematically prone to one or the other, and early departures are particularly hard to accept:
Firstly, a delay makes us wait exactly as long as the delay, while an early departure makes us wait the whole time until the next bus arrives (the headway), which in many cases is considerably longer. Secondly, while we can think of many reasons for being late, the only excuse for leaving early is to anticipate and try to neutralise a future delay, which is an indication of bad scheduling.
The quality report of the Berlin transportation authority defines a departure as "on time" if it is not more than 90 seconds early and not more than 210 seconds late.
Ekki: "When I missed a 2 or 3 minutes early tram at the same stop more often than I found it to be late or on time, I started to wonder how I could investigate the systematics of early departures."
Our plan was to retrieve the actual departure times and visualise the respective statistics for each stop of a particular line (in both directions) and a given time frame: how many buses/trams left in the specified time frame, and how early or late they were.
The transportation authority offers an API for public transport data, but this interface does not provide any delay data. However, the authority does publish expected departure times on the internet in real time, with minute accuracy, for each stop of all bus and tram lines. A stop is subdivided into posts that represent the directions in which buses and trams leave the stop.
To retrieve the real departure times of all buses/trams of a line, we scraped the real-time data of all posts of that line every minute over a period of several days. Each result is a list of the expected departure times of the oncoming buses/trams at this particular post, the topmost being the most imminent departure.
We then matched the result lists in a way that we could isolate two different kinds of events:
A: single departure times that were appended at the bottom of the list from one minute to the next
B: single departure times that were removed from the top of the list from one minute to the next
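The comparison of two consecutive result lists can be sketched as follows. This is only an illustration, not the matching code used in the project: the function name `diff_snapshots`, the two-minute tolerance and the sample times are assumptions. The sketch further assumes that the entries keep their order and that an estimate shifts by no more than the tolerance between two scrapes, so it can become ambiguous on lines whose headway is as short as the tolerance.

```python
from datetime import datetime, timedelta


def diff_snapshots(prev, curr, tolerance=timedelta(minutes=2)):
    """Compare two consecutive result lists of one post.

    Returns (removed_from_top, appended_at_bottom), i.e. the candidate
    events B and A. Both lists hold expected departure times, most
    imminent departure first.
    """
    # prev can only fit into curr after dropping at least this many entries.
    k = max(0, len(prev) - len(curr))
    # Drop entries from the top of prev until the rest lines up with the
    # head of curr. With k == len(prev) nothing is left to compare, so the
    # loop always terminates.
    while not all(abs(c - p) <= tolerance for p, c in zip(prev[k:], curr)):
        k += 1
    return prev[:k], curr[len(prev) - k:]


# Made-up example: the 10:02 departure disappears, 10:32 is newly listed,
# and the estimate for 10:12 slips by one minute.
def at(hour, minute):
    return datetime(2013, 1, 1, hour, minute)


removed, appended = diff_snapshots(
    [at(10, 2), at(10, 12), at(10, 22)],
    [at(10, 13), at(10, 22), at(10, 32)],
)
```

In this example `removed` holds the 10:02 entry (an event B) and `appended` holds the 10:32 entry (an event A).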
We took the departure time from an event B as the best estimate for the real departure time. We then had to find the matching scheduled departure time.
The fragmentary nature of the provided real-time data made it difficult to match the resulting list of these B events to a complete list of scheduled departures.
We observed that the first estimate of the departure time (event A) of a vehicle's trip is actually the scheduled time, provided the trip in question has not started yet*. Therefore, we only had to trace back each estimated departure time to the event A at which it had been freshly appended at the bottom of the list.
*However, we found that for lines with long end-to-end trip times, this does not hold in a considerable number of cases.
This gave us consecutive pairs of scheduled and real departure times for every post of the bus/tram line throughout the time frame of the data scraping.
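A minimal sketch of this tracing step for a single post is given below. It is an illustration under stated assumptions rather than the project's actual code: the names `count_departed` and `pair_departures`, the dictionary layout and the two-minute tolerance are hypothetical, and entries already present in the first scrape are skipped because their event A was never observed.

```python
from datetime import datetime, timedelta

# Assumed maximum drift of an estimate between two scrapes.
TOLERANCE = timedelta(minutes=2)


def count_departed(tracked, snapshot):
    """Number of entries that vanished from the top of the list."""
    k = max(0, len(tracked) - len(snapshot))
    # Always terminates: with k == len(tracked) nothing is left to compare.
    while not all(
        abs(new - entry["estimate"]) <= TOLERANCE
        for entry, new in zip(tracked[k:], snapshot)
    ):
        k += 1
    return k


def pair_departures(snapshots):
    """Turn the minute-by-minute result lists of one post into
    (scheduled, real) pairs of departure times."""
    tracked = []  # one dict per vehicle currently listed, top first
    pairs = []
    for index, snapshot in enumerate(snapshots):
        k = count_departed(tracked, snapshot)
        # Event B: the last estimate before removal is taken as the real time.
        for entry in tracked[:k]:
            if entry["scheduled"] is not None:
                pairs.append((entry["scheduled"], entry["estimate"]))
        tracked = tracked[k:]
        # Refresh the estimates of the vehicles that are still listed.
        for entry, new in zip(tracked, snapshot):
            entry["estimate"] = new
        # Event A: the first estimate is taken as the scheduled time.
        # Entries of the very first scrape have an unknown event A.
        for new in snapshot[len(tracked):]:
            scheduled = new if index > 0 else None
            tracked.append({"scheduled": scheduled, "estimate": new})
    return pairs


# Made-up example: a departure first listed for 10:22 finally leaves at 10:24.
def at(hour, minute):
    return datetime(2013, 1, 1, hour, minute)


pairs = pair_departures([
    [at(10, 2), at(10, 12)],
    [at(10, 3), at(10, 12), at(10, 22)],
    [at(10, 12), at(10, 23)],
    [at(10, 24)],
    [],
])
```

Here `pairs` contains a single pair: scheduled 10:22, real 10:24. The two departures already listed in the first scrape produce no pair, because their first estimate was never seen.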
The data analysis is presented in an iPad app. The data is visualised in a zoomable and draggable map. One bus or tram line is presented at a time.
The standard view for all stops reveals a doughnut chart containing the fractions of on-time departures, late departures, early departures, and the late and early departures considered on-time by the BVG quality standards.
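The five segments of such a chart could be computed as in the following sketch. The 90 and 210 second thresholds come from the quality report quoted above. Everything else is an assumption made for illustration: the function names, the labels, and the choice to count only a deviation of exactly zero as strictly on time.

```python
from collections import Counter

EARLY_TOLERANCE_S = 90   # tolerated earliness according to the quality report
LATE_TOLERANCE_S = 210   # tolerated lateness according to the quality report


def classify(deviation_s):
    """Label one departure by its deviation (real minus scheduled) in seconds.

    Negative values mean the vehicle left early.
    """
    if deviation_s == 0:
        return "on time"
    if deviation_s < 0:
        return "early (tolerated)" if -deviation_s <= EARLY_TOLERANCE_S else "early"
    return "late (tolerated)" if deviation_s <= LATE_TOLERANCE_S else "late"


def fractions(deviations_s):
    """Share of each label among the departures of one stop."""
    counts = Counter(classify(d) for d in deviations_s)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Made-up deviations in seconds; with minute accuracy they are multiples of 60.
shares = fractions([0, 0, -120, 240])
```

Because the published times have minute accuracy, a departure one minute early falls within the tolerance, while two minutes early does not. On the late side, up to three minutes are tolerated.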
Various stories can be read from the data. Here are some examples:
Early departures increase towards the end of a line. Maybe the drivers want to add this time to their break instead of waiting at stops.
There are a lot of delays on Friday nights at Oranienstraße, one of Berlin's best-known party miles.
Delays accumulate in areas with frequent bus stops close together.
We are currently collecting real-time data from more bus and tram lines.
The methodological problems described above, which affect how the existing retrieval approach matches scheduled and real departure times, need to be addressed. We have to find a way to match the real departure times with certified schedule times.
The open data community is negotiating an API with the local and regional transportation authorities that may at some point contain comprehensive real-time traffic data. We will abandon the existing data retrieval approach in favour of that data as soon as it is available.
From the beginning of the project, we had plans to visualise individual vehicles, preferably in real time. Once we have the real-time data, this feature will be implementable as well. It would give BVG a more precise analysis tool. Moreover, it would broaden the project's functionality from a mere expert tool to a customer app providing more precise journey planning.
Nevertheless, this will raise some ethical questions about individual rights of BVG employees: Could the employer use that feature to surveil drivers more effectively than with the existing real-time data provided? Can drivers be held responsible for early departures? What are the reasons for early departures? We have to deal with these questions before we can implement the individual vehicles feature.