Posted by: jdantos | January 17, 2012

Capital Bikeshare Data, Part 1

Part 1 in a (perhaps?) series of posts analyzing Capital Bikeshare usage data. This post focuses on system-level usage by a few dimensions. Check out parts 2, 3, 4, 5, 6, 7 (maps of travel patterns), 8, 9 (weather).

Capital Bikeshare just crowd-sourced a huge dataset about the usage and growth of their system. As an early Bikeshare skeptic but now full-on evangelist, member, and big nerd, I decided to mess around with the data myself. I’ve tried some basic, system-level graphs to visualize performance over time.  Here goes.

Data Preparation: The dataset totals about 1.34 million anonymous trip records over 15 months.  The data runs from the system’s infancy in Fall 2010, through its explosion last summer, to the end of 2011. I didn’t do the cleaning that Corey H. did, where he found and removed about 1.5% of the records as obviously erroneous, most notably trips that started and ended at the same station within 60 seconds – probably the result of a stalled or erroneous transaction.  Instead, I jumped in, ignoring the 1.5% noise. I pulled the data into Microsoft Access and began querying.
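For anyone who does want to apply that kind of cleaning, here is a minimal Python sketch of the "same station within 60 seconds" rule. It is not Corey H.'s actual method, and the record layout (start time, end time, start station, end station) and the sample values are assumptions made up for illustration.

```python
from datetime import datetime, timedelta

# Hypothetical trip records: (start_time, end_time, start_station, end_station).
# This layout is an assumption, not the actual CaBi file format.
trips = [
    (datetime(2011, 6, 1, 8, 0, 0), datetime(2011, 6, 1, 8, 0, 40), "A", "A"),  # likely a stalled transaction
    (datetime(2011, 6, 1, 8, 5, 0), datetime(2011, 6, 1, 8, 25, 0), "A", "B"),  # ordinary one-way trip
    (datetime(2011, 6, 1, 9, 0, 0), datetime(2011, 6, 1, 9, 45, 0), "C", "C"),  # genuine round trip
]

def is_suspect(start, end, start_station, end_station, max_seconds=60):
    """Flag trips that return to the same station within max_seconds."""
    same_station = start_station == end_station
    too_short = (end - start) <= timedelta(seconds=max_seconds)
    return same_station and too_short

# Keep only the records that do not match the rule.
clean = [trip for trip in trips if not is_suspect(*trip)]
print(len(trips) - len(clean), "suspect record(s) removed")
```

The 60-second threshold is a parameter so it is easy to see how sensitive the record count is to that choice.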

Trips Over Time:  First, a basic handle on system growth.  The system began in Fall 2010 and really took off last summer.  The tourists (casual users) love it, it’s true, but regular users are really the core ridership – registered users make about 80% of the trips in an average month, and almost always 70% or more.  Although casual users account for only about 20% of trips, others have found they tend to pay usage fees a lot more than regular riders.

Capital Bikeshare Trips by Month, by User Type. The system has grown substantially since launching in Fall 2010, and casual users are typically 20% of ridership

Here’s the same graph, only in percentage terms:

Capital Bikeshare Trips by Month by User Type, in Percentage Terms. Registered users are the system’s core (~80%) ridership, even in the summer.
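The percentage view is just each month's registered trips divided by that month's total. I did this in Access, but a rough Python equivalent might look like the sketch below. The "Registered" and "Casual" labels and every count are made up for illustration.

```python
from collections import Counter

# Made-up (month, user_type) pairs, one per trip. The labels and counts
# are assumptions for illustration, not values from the real dataset.
trips = (
    [("2011-01", "Registered")] * 9 + [("2011-01", "Casual")] * 1
    + [("2011-07", "Registered")] * 8 + [("2011-07", "Casual")] * 2
)

def registered_share_by_month(records):
    """Return {month: fraction of that month's trips made by registered users}."""
    totals = Counter()
    registered = Counter()
    for month, user_type in records:
        totals[month] += 1
        if user_type == "Registered":
            registered[month] += 1
    return {month: registered[month] / totals[month] for month in totals}

shares = registered_share_by_month(trips)
for month in sorted(shares):
    print(month, f"{shares[month]:.0%}")
```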

Trips By Day Of Week: Okay, so the system has grown over time, and although casual users like it more in the spring and summer, registered users are still about 80% of ridership.  When during the week do people use it? On the weekends, or weekdays? On Tuesdays, or Fridays? Here are all trips, by day of week. (P.S. I normalized this by the number of times each day of the week occurred over the 15 months, so the units are average trips per day.)

Capital Bikeshare Trips per Day, All Data. Not a whole lot of obvious differences here?
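The normalization behind this chart matters because a date range rarely contains the same number of each weekday. Here is a small Python sketch of the idea: count how often each weekday occurs in the period, then divide trips by that count. The function names and the sample trips are hypothetical, and calendar year 2011 is used only as an example period.

```python
from collections import Counter
from datetime import date, timedelta

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def weekday_occurrences(first_day, last_day):
    """Count how many times each day of the week falls in [first_day, last_day]."""
    counts = Counter()
    day = first_day
    while day <= last_day:
        counts[DAY_NAMES[day.weekday()]] += 1
        day += timedelta(days=1)
    return counts

def average_trips_per_day(trip_dates, first_day, last_day):
    """Divide total trips on each weekday by the number of such days in the period."""
    occurrences = weekday_occurrences(first_day, last_day)
    trips = Counter(DAY_NAMES[d.weekday()] for d in trip_dates)
    return {name: trips[name] / occurrences[name]
            for name in DAY_NAMES if occurrences[name]}

# Made-up example: 104 trips, all dated on a single Monday in 2011.
trip_dates = [date(2011, 1, 3)] * 104
occurrences = weekday_occurrences(date(2011, 1, 1), date(2011, 12, 31))
averages = average_trips_per_day(trip_dates, date(2011, 1, 1), date(2011, 12, 31))
print(occurrences["Saturday"], occurrences["Monday"], averages["Monday"])
```

In 2011 there were 53 Saturdays but only 52 of every other weekday, so raw totals would slightly overstate Saturday without this step.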

Trips by Day of Week, by User Type and Season: So, maybe I’m missing something by aggregating all 15 months together.  After all, the system grew significantly over that time, and usage probably varies a lot by season. So, I decided to focus just on 2011 to exclude the 2010 growing pains, and called January-March “winter,” April-June “spring,” July-September “summer,” and October-December “fall.”  It ain’t perfect (June feels sorta summery, e.g.), but it’ll work as a proxy.  Here’s trips by day of week, but by season, 2011 only:

Capital Bikeshare Trips, by Day of Week, by Season
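The season coding is nothing more than calendar quarters with friendlier names. As a sketch, using the labels as clarified in the comments below (July-September as summer, October-December as fall), it could be written in Python like this. The function name is hypothetical.

```python
SEASONS = ("winter", "spring", "summer", "fall")

def season_for_month(month):
    """Map a month number (1-12) to a quarter-based season label."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be 1-12, got {month}")
    # Integer division groups months into quarters: 1-3, 4-6, 7-9, 10-12.
    return SEASONS[(month - 1) // 3]

for month in (1, 6, 7, 12):
    print(month, season_for_month(month))
```

Shifting the boundaries (say, putting June in summer) only means replacing the arithmetic with an explicit month-to-season lookup table.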

A few observations:

  • Winter usage is lower than spring/summer; no surprise there
  • Fall usage (Oct, Nov, Dec) is strangely lower than spring, but that could be because Cherry Blossoms push up usage in the spring. Or that I arbitrarily put December in that line.
  • There’s a small “Friday bump” last winter and fall – any ideas what’s going on there?
  • No real huge “weekend bumps,” even in the high season.  That’s surprising to me – I figured weekends would be much higher.  But it seems that in terms of number of total trips, the commuting patterns during the week are strong enough to offset any out-of-towner weekend use

Why doesn’t ridership from tourists move the needle of system-level ridership more on the weekends? Aren’t they strong on the weekends?  (Chances are good that the station choices change geographically on the weekends, but I’m still at the system level.)  Turns out, yes – the share of rides in the system from casual users nearly doubles on the weekends, even in the winter.  But core users dominate ridership more in the winter than the summer.  Here’s how the “casual user rate” changes by day of week, by season:

Capital Bikeshare Trips by Day of Week by Season by User Type. The percent of rides from casual users nearly doubles on the weekends, even in winter. But this doesn’t change overall ridership levels vs. weekdays a whole lot.
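To make the "nearly doubles" comparison concrete, here is a hedged Python sketch that splits trips into weekday and weekend by date and compares the casual share of each. The records and counts are invented for illustration and do not come from the dataset.

```python
from datetime import date

def casual_share(records, weekend):
    """Fraction of trips by casual users on weekend (True) or weekday (False) dates."""
    # date.weekday() returns 5 for Saturday and 6 for Sunday.
    selected = [user for day, user in records if (day.weekday() >= 5) == weekend]
    if not selected:
        return None
    return sum(1 for user in selected if user == "Casual") / len(selected)

# Made-up records: (trip_date, user_type). July 9, 2011 was a Saturday
# and July 12, 2011 was a Tuesday.
saturday = date(2011, 7, 9)
tuesday = date(2011, 7, 12)
records = (
    [(tuesday, "Registered")] * 85 + [(tuesday, "Casual")] * 15
    + [(saturday, "Registered")] * 72 + [(saturday, "Casual")] * 28
)

weekday_rate = casual_share(records, weekend=False)
weekend_rate = casual_share(records, weekend=True)
print(f"weekday {weekday_rate:.0%}, weekend {weekend_rate:.0%}")
```

Running the same comparison separately for each season would give the breakdown shown in the chart.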

There seems to be a “Monday bump” in casual users – any ideas what’s happening here, or how I could dig deeper?

Trips by Half-Hour Interval: Okay, when during the day do people ride? How do the patterns change in the morning, afternoon, and night?  I coded each trip into a half-hour interval based on when the trip began, and plotted all days, by season:

Capital Bikeshare Trips by Time of Day, by Season (click for larger)
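The half-hour coding can be sketched in a few lines of Python: each trip's start time maps to one of 48 bins per day. The function names here are hypothetical, and my actual coding was done in Access, so treat this as an equivalent rather than the original query.

```python
from datetime import datetime

def half_hour_bin(start):
    """Return the half-hour bin index (0-47) for a trip start time."""
    return start.hour * 2 + (1 if start.minute >= 30 else 0)

def bin_label(index):
    """Turn a bin index back into a clock label such as '17:30'."""
    hour, half = divmod(index, 2)
    return f"{hour:02d}:{'30' if half else '00'}"

# Made-up start time: a trip that began at 5:45pm falls in the 17:30 bin.
start = datetime(2011, 7, 4, 17, 45)
index = half_hour_bin(start)
print(index, bin_label(index))
```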

So, it looks pretty strongly oriented towards peak commuting times, with the afternoon peak even higher than the morning.  But, realizing that a Sunday at 8am probably looks a bit different from a Tuesday at 8am, I broke this up into two graphs, for weekday and weekend, and threw in user type as well:

Capital Bikeshare Trips by Half-Hour by User Type – Weekdays

Capital Bikeshare Trips by Half-Hour by User Type – Weekends

Some observations on these:

  • Ridership patterns during the weekday are strongly commuter-oriented, with major peaks in the morning and afternoon rushes
  • The afternoon rush is pretty consistently stronger than the morning rush. There’s an “afternoon surge” where usage from about 5pm to 7:30pm or so is more intense, and lasts longer, than the morning peak.
  • There’s a “mid-morning lull” driven by regular users, all seasons.  Ridership drops between 9:30am and noon, and then recovers for lunch, and then stays strong until the afternoon rush takes over.  There’s no discernible lunch rush that dies off at 1:30pm – the lunch “bounce” persists straight through til evening.  Do people take Bikeshare to afternoon meetings, but not morning meetings? Or is it afternoon errands? Or does the lunch rush blend into the early-bird commute?
  • Casual users are strongest between about 10am and 7pm on weekends, when they make up nearly half of all rides.
  • Contrast that to weekday rush hours, when registered users are 90% of all rides.

That’s about all I can handle at this point.  What do you think?  What am I missing? What other analysis would you like to see?  (I’m still at the system-level stage; station- and flow-level analysis may/will follow).

Responses

  1. Excellent work.

  2. As to why the casual user increase on weekends doesn’t push overall ridership up, I think some residents just give up on using bikeshare on the weekends. I know most of the time if I want to use a bike during the day on a weekend, I can’t get one. It seems to me that on the weekend the bikes tend to disappear from the stations in the U Street/Columbia Heights/Adams Morgan area.

    I’ll grab a bike if the station is on the way to where I’m going, but I won’t walk out of my way to go to a station with one bike docked.

    • Very true – we could just be observing the system more or less at capacity, not a true level of demand.

      This is one reason I’m intrigued by CaBi’s idea to try a “bottomless dock” someday, to get a feel for purely unconstrained demand.

  3. I looked at one of the raw data files and couldn’t find any entries to or from the station at Rhode Island Ave & 4th St NE. Am I missing something?

  4. “There seems to be a “Monday bump” in casual users – any ideas what’s happening here, or how I could dig deeper?”

    I expect the reason for the Monday bump is due to airline pricing. Mondays are a low demand day for airlines since business travellers prefer to go on trips starting on Tuesdays (so they can be in the office on Monday to prepare). To attract the leisure crowd from flying on Sundays (which are relatively high demand), the airlines frequently offer lower prices than what is available on weekends if one is willing to travel at non-peak times.

    You will also see a small spike because of Monday holidays.

    Great job BTW!

    • Good point Kathy – there is a small uptick in the “percentage casual” on Mondays, which is helping make the “Monday bump” happen. Across all seasons too, strangely. And you’re right, a few holidays do fall on Monday…

      The next step would be to look at what stations, or station-to-station pairs, have abnormally high usage on Mondays.

      And what time of day. It could be that on 3-day weekends, CaBi use picks up late Sunday night (technically Monday morning), so those trips slide into Monday’s count. There is noticeable use in the midnight to 2-3am range on weekends.

      Want to take it on? 🙂

      • I would think the Monday bump is caused by the usual Monday holidays, and also the fact that in 2011 July 4 was on a Monday, adding another summer holiday on that day.

  5. I suspect the lower use in the fall might be caused by the fact it starts getting dark early by mid September. I and many other riders do not like to ride in the dark.

  6. Wow, this makes me sorry I’m not a math nerd. The numbers! The graphs!

    Thanks for posting this. Very interesting to see the usage patterns!

  7. […] use the Bikeshare system. So far, JDAntos has assembled two posts, one that deals with various facets of Bikeshare use, and another that discusses CaBi in terms of trip duration. Thanks for helping us make sense of […]

  8. Jdantos,
    I’d been meaning to play around with the CaBi data and seeing both of your posts got me inspired.

    I don’t want to hijack your post, but figured I’d share a heatmap I did of ridership through the year. Among other things, I wanted to see how sunrise & sunset hours affected ridership (if at all). For sunrise, you see more people wait until after the sun is up in the winter months than fall with roughly the same trend for sunset.
    I just PDFd the images for now: http://goo.gl/pQQaU

    Separately, I want to gauge how hourly temps affect number of rides started and duration of rides (and how it affects Casual v. Member riders). You can see on the heatmap that ridership drops off on rainy days (the week after Labor Day, for example) but I wonder what rider tolerance is for really hot or cold weather.
    I have the data sets, just haven’t put everything together yet – maybe this weekend.

    Great work, by the way.

    • Bilsko – WOW! That is great stuff! What software do you use to create those visuals? I am seriously impressed.

      • jdantos – sadly it’s all in Excel. I’m still way too novice in R or any of the other more advanced data manipulation software packages to do anything substantive.

        I work with hourly interval electric data frequently so I’ve got some experience in getting Excel to replicate decent images with large data sets. [Pro-tip: Use the Zoom function with reckless abandon.]

        I’ve posted the Excel source file (75Mb) with the data arranged in a matrix, with a row for each day and a column for each minute of the day, and each cell indicating how many rides started at that particular day-minute instance. Link here: http://minus.com/mduRSVBaO#
        (Just be warned that it will bog down your Excel)

        I think I saw you follow me on twitter – are you also attending the #fridaycoffeeclub meetups?

    • Bilsko, I’m struck by how the morning peaks remain relatively steady in time over the year, whereas the afternoon/evening peaks really tend to track with the sunset. E.g., even when the sun is rising super early, few are riding in the early mornings; but late evening sun keeps bikers out riding.

  9. […] how certain areas are more in balance than others. If you’re bored, nerdy, or both, read Part 1, 2, or […]

  10. Couldn’t the reason that “Fall” is lower than expected be the fact that you used the months July-September, which are generally the hottest and most humid months? People may avoid riding during these times so as to avoid arriving at their destination (e.g. work) sweaty, even if they have a short or easy ride.

    I know that you changed the seasons for simplicity’s sake but the summer season starts at the end of June and lasts until late September. Also, consider that many registered riders (locals) travel out of the city during the summer for vacation, in an effort to avoid the oppressive heat or to visit places with even more oppressive heat!

    • Hey Mike –
      Do you mean summer? I called July, August, and September “summer,” while I called October, November, and December “fall.” It’s arbitrary, for sure, and I mainly just did it because it’s an easy way to divide up the year into equal parts. But the most humid, sticky months should be under Summer.

      Just wanted to be sure we’re on the same wavelength :). Thanks for the comment!

      • “So, I decided to focus just on 2011 to exclude the 2010 growing pains, and called January-March “winter,” April-June “spring,” July-September “fall,” and October-December “winter.” It ain’t perfect (June feels sorta summery, e.g.), ”

        Oh I see that in your original post you have two periods of winter and no summer, hence the confusion.

    • Ah! I see my mistake – corrected. Oct, Nov, Dec = “fall” in the above analyses. Good catch, thanks, and sorry for the confusion.

  11. […] This post quickly looks at how much each bike gets used and churned through the system. See parts 1, 2, and […]

  12. […] Capital Bikeshare usage data. This post focuses on system-level usage by trip duration. See parts 1, 3, and […]

  13. […] call this part 5, even though it’s kind of a postscript to part 4. See parts 1, 2, and […]

  14. […] Capital Bikeshare data. This one focuses on net flows of bikes across the system. Parts 1, 2, 3, 4, and […]

  15. […] crowd-sourced Capital Bikeshare data. This one maps a bunch of data by station. See also parts 1, 2, 3, 4, 5, and […]

  16. […] a simple correlation between temperature and usage. If you’re really bored, see also parts 1, 2, 3, 4, 5, 6,7, and […]

  17. […] D.C., 88 percent of bike-share trips are 30 minutes or less, and the vast majority of trips are made by subscribers who pay $75 per year for an unlimited supply of trips under 30 minutes. They are spending pennies […]

  18. […] system is overloaded at peak times/places –The weekday work commute trips are high-demand times for CaBi, and the highest variation in peak-period usage is very spatially concentrated.  We can […]

  19. […] has continued to publish trip-level usage statistics for 2012, so I thought I’d update my earlier analyses to reflect new information up to June 2012. In the last six months, Bikeshare has continued to grow […]

