Part 1 in a (perhaps?) series of posts analyzing Capital Bikeshare usage data. This post focuses on system-level usage by a few dimensions. Check out parts 2, 3, 4, 5, 6, 7 (maps of travel patterns), 8, 9 (weather).
Capital Bikeshare just crowd-sourced a huge dataset about the usage and growth of their system. As an early Bikeshare skeptic but now full-on evangelist, member, and big nerd, I decided to mess around with the data myself. I’ve tried some basic, system-level graphs to visualize performance over time. Here goes.
Data Preparation: The dataset totals about 1.34 million anonymous trip records over 15 months. The data runs from the system’s infancy in fall 2010, through its explosion last summer, to the end of 2011. I didn’t do the cleaning that Corey H. did; he found and removed about 1.5% of the records as obviously erroneous, most notably trips that started and ended at the same station within 60 seconds, probably the result of a stalled or erroneous transaction. I jumped in anyway, ignoring the 1.5% noise. I pulled the data into Microsoft Access and began querying.
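For anyone who does want to apply that kind of cleaning, here is a minimal Python sketch of the "same station within 60 seconds" filter. It is not Corey H.'s actual method, and the record layout (start time, end time, start station, end station) and the function name are assumptions for illustration; the real export's columns may differ.

```python
from datetime import datetime, timedelta

# Hypothetical trip records: (start_time, end_time, start_station, end_station).
# These rows are made up; the real export's column names and formats may differ.
trips = [
    (datetime(2011, 7, 4, 8, 0, 0), datetime(2011, 7, 4, 8, 0, 40), "A", "A"),
    (datetime(2011, 7, 4, 8, 5, 0), datetime(2011, 7, 4, 8, 25, 0), "A", "B"),
    (datetime(2011, 7, 4, 9, 0, 0), datetime(2011, 7, 4, 9, 30, 0), "C", "C"),
]


def is_probable_false_start(start, end, start_station, end_station, max_seconds=60):
    """True for a trip that ends at its origin station within max_seconds."""
    same_station = start_station == end_station
    very_short = (end - start) <= timedelta(seconds=max_seconds)
    return same_station and very_short


# Keep everything except the probable false starts.
clean = [t for t in trips if not is_probable_false_start(*t)]
print(len(trips) - len(clean), "record(s) dropped")
```

Note that the 30-minute round trip from station C survives the filter: only trips that are both same-station and very short get dropped.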
Trips Over Time: First, a basic handle on system growth. The system began in fall 2010 and really took off last summer. The tourists (casual users) love it, it’s true, but regular users are really the core ridership – registered users make about 80% of the trips in an average month, and almost always 70% or more. Although casual users account for only about 20% of the trips, others have found they tend to pay usage fees a lot more than regular riders.
Here’s the same graph, only in percentage terms:
Trips By Day Of Week: Okay, so the system has grown over time, and although casual users like it more in the spring and summer, registered users are still about 80% of ridership. When during the week do people use it? On the weekends, or weekdays? On Tuesdays, or Fridays? Here’s all trips, by day of week (P.S. I normalized this by the number of times each day of the week occurred over the 15 months, so the units are average trips per day rather than raw totals).
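The day-of-week normalization can be sketched in a few lines of Python. This is an illustration rather than the Access query I actually ran, and it assumes the divisor is every calendar day in the period (not just days that had trips). The function names and the toy dates are made up.

```python
from collections import Counter
from datetime import date, timedelta


def weekday_occurrences(first_day, last_day):
    """Count how many Mondays, Tuesdays, ... fall in [first_day, last_day]."""
    counts = Counter()
    day = first_day
    while day <= last_day:
        counts[day.weekday()] += 1  # Monday == 0 ... Sunday == 6
        day += timedelta(days=1)
    return counts


def average_trips_per_weekday(trip_dates, first_day, last_day):
    """Total trips on each weekday divided by how often that weekday occurred."""
    trips = Counter(d.weekday() for d in trip_dates)
    occurrences = weekday_occurrences(first_day, last_day)
    return {wd: trips[wd] / occurrences[wd] for wd in sorted(occurrences)}


# Toy period of exactly two weeks (Mon 3 Jan 2011 to Sun 16 Jan 2011),
# with made-up trips: four on Mondays and one on a Saturday.
toy_trips = [date(2011, 1, 3)] * 3 + [date(2011, 1, 10), date(2011, 1, 8)]
averages = average_trips_per_weekday(toy_trips, date(2011, 1, 3), date(2011, 1, 16))
print(averages[0], averages[5])  # Monday and Saturday averages
```

Dividing by occurrences matters because a 15-month window does not contain the same number of each weekday, so raw totals would slightly favor whichever days happen to appear once more.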
Trips by Day of Week, by User Type and Season: So, maybe I’m missing something by aggregating all 15 months together. After all, the system grew significantly over that time, and usage probably varies a lot by season. So, I decided to focus just on 2011 to exclude the 2010 growing pains, and called January-March “winter,” April-June “spring,” July-September “summer,” and October-December “fall.” It ain’t perfect (June feels sorta summery, e.g.), but it’ll work as a proxy. Here’s trips by day of week, but by season, 2011 only:
A few observations:
- Winter usage is lower than spring/summer; no surprise there
- Fall usage (Oct, Nov, Dec) is strangely lower than spring, but that could be because Cherry Blossoms push up usage in the spring. Or because I arbitrarily put December in that line.
- There’s a small “Friday bump” last winter and fall – any ideas what’s going on there?
- No real huge “weekend bumps,” even in the high season. That’s surprising to me – I figured weekends would be much higher. But it seems that in terms of number of total trips, the commuting patterns during the week are strong enough to offset any out-of-towner weekend use
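The season bucketing behind these observations is easy to reproduce. Here is a rough Python sketch, reading July-September as summer and October-December as fall (as the observations above do) and restricting to 2011. The names `SEASON_BY_MONTH`, `season_of`, and `trips_by_season_and_weekday` are made up for illustration.

```python
from collections import Counter
from datetime import date

# Three-month buckets; an approximation, since June arguably belongs to summer.
SEASON_BY_MONTH = {
    1: "winter", 2: "winter", 3: "winter",
    4: "spring", 5: "spring", 6: "spring",
    7: "summer", 8: "summer", 9: "summer",
    10: "fall", 11: "fall", 12: "fall",
}

WEEKDAYS = ["Monday", "Tuesday", "Wednesday", "Thursday",
            "Friday", "Saturday", "Sunday"]


def season_of(day):
    """Season label for a 2011 date, or None for dates outside 2011."""
    if day.year != 2011:
        return None
    return SEASON_BY_MONTH[day.month]


def trips_by_season_and_weekday(trip_dates):
    """Count trips per (season, weekday name), skipping non-2011 dates."""
    counts = Counter()
    for day in trip_dates:
        season = season_of(day)
        if season is not None:
            counts[(season, WEEKDAYS[day.weekday()])] += 1
    return counts


# Made-up trip dates, including one 2010 trip that gets excluded.
toy = [date(2011, 7, 4), date(2011, 7, 4), date(2011, 12, 25), date(2010, 11, 1)]
counts = trips_by_season_and_weekday(toy)
print(counts)
```

Moving a boundary month (say, December into winter) is a one-line change to the dictionary, which makes it cheap to check how sensitive the fall line is to that arbitrary choice.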
Why doesn’t ridership from tourists move the needle of system-level ridership more on the weekends? Aren’t they strong on the weekends? (Chances are good that the station choices change geographically on the weekends, but I’m still at the system level.) Turns out, yes – the share of rides in the system from casual users nearly doubles on the weekends, even in the winter. But core users dominate ridership more in the winter than the summer. Here’s how the “casual user rate” changes by day of week, by season:
There seems to be a “Monday bump” in casual users – any ideas what’s happening here, or how I could dig deeper?
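For reference, the “casual user rate” here is just casual trips divided by all trips within each season and day-of-week group. A small Python sketch of that calculation follows; the `"Casual"` and `"Registered"` labels and the sample rows are assumptions for illustration, not values from the real dataset.

```python
from collections import defaultdict


def casual_rate(trips):
    """Share of trips made by casual users for each (season, weekday) group.

    `trips` is an iterable of (season, weekday, user_type) tuples.
    """
    totals = defaultdict(int)
    casual = defaultdict(int)
    for season, weekday, user_type in trips:
        totals[(season, weekday)] += 1
        if user_type == "Casual":
            casual[(season, weekday)] += 1
    return {key: casual[key] / totals[key] for key in totals}


# Made-up sample rows, purely to exercise the function.
sample = [
    ("summer", "Saturday", "Casual"),
    ("summer", "Saturday", "Registered"),
    ("summer", "Tuesday", "Registered"),
    ("summer", "Tuesday", "Registered"),
    ("summer", "Tuesday", "Registered"),
    ("summer", "Tuesday", "Casual"),
]
rates = casual_rate(sample)
print(rates)
```

Because this is a ratio, a “Monday bump” could come from more casual trips or from fewer registered trips on Mondays (holidays, for example), so looking at the two raw counts separately would be one way to dig deeper.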
Trips by Half-Hour Interval: Okay, when during the day do people ride? How do the patterns change in the morning, afternoon, and night? I coded each trip into a half-hour interval based on when the trip began, and plotted all days, by season:
So, it looks pretty strongly oriented towards peak commuting times, with the afternoon peak even higher than the morning. But, realizing that a Sunday at 8am probably looks a bit different from a Tuesday at 8am, I broke this up into two graphs, for weekday and weekend, and threw in user type as well:
Some observations on these:
- Ridership patterns during the weekday are strongly commuter-oriented, with major peaks in the morning and afternoon rushes
- The afternoon rush is pretty consistently stronger than the morning rush. There’s an “afternoon surge” where usage from about 5pm to 7:30pm or so is more intense, and lasts longer, than the morning peak.
- There’s a “mid-morning lull” driven by regular users, all seasons. Ridership drops between 9:30am and noon, and then recovers for lunch, and then stays strong until the afternoon rush takes over. There’s no discernible lunch rush that dies off at 1:30pm – the lunch “bounce” persists straight through til evening. Do people take Bikeshare to afternoon meetings, but not morning meetings? Or is it afternoon errands? Or does the lunch rush blend into the early-bird commute?
- Casual users are strongest between about 10am and 7pm on weekends, when they make up nearly half of all rides.
- Contrast that with weekday rush hours, where registered users are 90% of all rides.
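The half-hour coding and the weekday/weekend split behind these graphs could be sketched like this in Python. This is an illustration, not the Access expression I used, and `half_hour_bucket` and `is_weekend` are made-up names.

```python
from datetime import datetime


def half_hour_bucket(start):
    """Floor a trip's start time to the half hour it falls in.

    For example, a trip starting at 08:47 lands in the "08:30" bucket.
    """
    minute = 0 if start.minute < 30 else 30
    return f"{start.hour:02d}:{minute:02d}"


def is_weekend(start):
    """True for Saturday or Sunday (weekday() is 5 or 6)."""
    return start.weekday() >= 5


# A Monday-morning start and a Sunday-afternoon start, both made up.
commute = datetime(2011, 7, 4, 8, 47)
outing = datetime(2011, 7, 3, 14, 5)
print(half_hour_bucket(commute), is_weekend(commute))
print(half_hour_bucket(outing), is_weekend(outing))
```

Bucketing on the start time means a trip that begins at 8:59 and ends at 9:20 counts entirely toward the 8:30 interval, which is worth keeping in mind when reading the edges of the peaks.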
That’s about all I can handle at this point. What do you think? What am I missing? What other analysis would you like to see? (I’m still at the system-level stage; station- and flow-level analysis may/will follow).