Big Data Analytics and Visualization of DIVVY Bikes

Diving into Products and Revenue

Posted by Maya Sandler on June 1, 2021

Intent

I love recreational biking, and it's a great option for short-distance commutes. While searching for data related to this, I came across Kaggle's Chicago Divvy bicycle sharing data.

DIVVY Bikes is a bike rental service in the Chicago area. It lets you pick up a bike at one of its hundreds of stations, ride to your destination, and return the bike to any of its stations. Sounds great!

However, the data on the Kaggle website was outdated and included only ride data, not prices, so I decided to get what I was looking for from the DIVVY website. It was a good call. I got a lot of data there, including background about the company, its users, the current pricing policy, bike docking station locations, and updated trips data (2004-2020_quarter-1).

I wanted to look at this business from a product and revenue point of view, as well as at user behavior in this type of business. The insights from my analysis could inform the company's decisions about its products and policies. I analyzed the last five full years of data (2015-2019).

I was interested mainly in answering these questions:

What is the revenue from each product?
What is the most popular product?
What are the 20 most popular stations?
Who are the main users in each product and what is their behavior?

I downloaded DIVVY's trip data, wrote down the pricing rules (for later calculations), read and cleaned the data with Python code, validated and analyzed the data in PostgreSQL, and visualized it using Tableau Public. You can see the code and viz work in my GitHub and Tableau. I conclude the project with my insights, conclusions and recommendations.

Description of the Data

Trips Datasets

Multiple trips datasets (.csv files) were downloaded (as .zip files) from DIVVY and contained data on Divvy trips. Each line represents a single trip with a unique trip ID. The trip datasets for 2015-2019 contain 18,028,922 records and include the following data:

trip_id: ID attached to each trip taken
start_time: day and time trip started, in CST
stop_time: day and time trip ended, in CST
bikeid: ID attached to each bike
tripduration: time of trip in seconds
from_station_name: name of station where trip originated
to_station_name: name of station where trip terminated
from_station_id: ID of station where trip originated
to_station_id: ID of station where trip terminated
usertype: users can be either:
- "Customer": a rider who purchased a 24-Hour pass
- "Subscriber": a rider who purchased an annual membership
gender: gender of rider; data only available for Subscribers
birthyear: birth year of rider; data only available for Subscribers

The data had already been processed to remove trips that lasted less than 60 seconds, as they are potentially false starts.

Station Location Datasets

Multiple station location datasets were also downloaded. These files were updated throughout the year as the locations of the docking stations changed. The datasets for 2015-2019 contain 16,598 records and include the following data:

id: ID attached to each station
name: name of the station
city: the city the station is located at
latitude: latitude coordinates of the station
longitude: longitude coordinates of the station
dpcapacity: number of total docks at each station
online_date: date the station went live in the system

* The column names changed over the years (see the Preprocessing section).

Pricing

In addition to the trip datasets, I pulled the current pricing from DIVVY's website for the three main user types:

Customer (called "day pass" on the website) - $15 per day, limited to 3-hour rides in a 24-hour period. Additional fee of $0.20/minute if the ride exceeds the time limit.
Subscriber (called "annual member" on the website) - $9 per month, limited to 45-minute rides. Additional fee of $0.15/minute if the ride exceeds the time limit.
Single ride (also called "non-member" on the website) - $3.30 per trip (up to 30 minutes). Additional fee of $0.20/minute if the ride exceeds the time limit.

Assumptions and Limitations:

User Types:

In 2015-2017 there are three usertype values: "Subscriber" (annual member), "Customer" (day pass), and "Dependent". There is no documentation for the "Dependent" user type. After exploring the data, it seems that this user type accounted for a very small portion of the total trips (2015 - 0.004%; 2016 - 0.001%; 2017 - 0.0001%). Because the share of "Dependent" trips is negligible and decreases to 0% in 2018 and 2019, and because I have no information about these users, their product or their pricing, I decided to ignore their trips.
The 2015-2019 data contains no reference to the "single ride" product that appears on the current DIVVY website. Therefore, I assume that during 2015-2019 there were only two price plans: "Customer" and "Subscriber".
Therefore, for this analysis I refer to only two user types: "Subscriber" and "Customer".

Pricing, Revenue and Profit:

The pricing data reflects what was published on the DIVVY website on April 27, 2021. As I don't have earlier pricing data, I assume the pricing was fixed throughout the analyzed 2015-2019 period.
Businesses and universities have their own pricing, but I do not know which members belong to a business or university. Therefore, the price was calculated according to the usertype field in the data and the pricing on the website.
The Chicago area map is divided into two pricing zones with a different fee in case a bike is not returned to a docking station. This would be relevant to my analysis if I had data on the expense of retrieving these bikes. However, I don't have this information, so I ignore this fee.
DIVVY does not share user_id data, so I cannot know how much each individual user uses the service. This is problematic when calculating subscriber and customer revenue according to the company's published pricing. Due to this limitation, I calculate the revenue from subscribers and customers per ride (subscribers: $9 a ride, limited to 45-minute rides; customers: $15 a ride, limited to 3 hours).
I don't have the operating cost of the service, nor do I know whether the cost differs between user types. I assume that membership costs more, as it requires more customer support hours (at least), along with other factors I am not aware of as an outsider. Because I don't have the cost of the services, I can draw conclusions only about the revenue of the products, not their profit.

Preprocessing

Preprocessing was done in Python using Pandas, NumPy, datetime, and math: reading the multiple .csv files, cleaning and transforming the data, and creating several new columns. The preprocessed data frames were then loaded into a PostgreSQL database using the Psycopg2 package. This is a big dataset (> 18M rows), and I used several methods to shorten the process - check out this post.

Validate, Clean and Transform the Data

Validation of Data

The data was validated in Python: checking that all fields had proper formats and data types, looking for NULL values, and looking for duplicates.

This process revealed that column names and data types changed over the years. No missing information was found. The only NULL values were in gender and birthyear, because this information is not recorded for users without a subscription.
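A minimal sketch of such a NULL check in pandas. The tiny data frame and its values are hypothetical stand-ins for the real trips data.

```python
import pandas as pd

# Hypothetical miniature trips frame; the real data has about 18M rows.
trips = pd.DataFrame({
    "trip_id": [1, 2, 3],
    "usertype": ["Subscriber", "Customer", "Customer"],
    "gender": ["Male", None, None],
    "birthyear": [1988.0, None, None],
})

# Number of NULL values in every column.
null_counts = trips.isna().sum()

# NULL gender values split by user type, to confirm they come from Customers.
nulls_by_usertype = trips["gender"].isna().groupby(trips["usertype"]).sum()

print(null_counts)
print(nulls_by_usertype)
```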

Reading Only Relevant Columns:

The stations data fields changed over the years. For example, the 2015 stations data contained a 'landmark' column that did not appear in other datasets, and the 2017 quarter 2-3 trips data contained an 'Unnamed: 7' column that did not appear in other datasets and held only NULL values. Therefore, I read only the relevant columns, using the usecols argument when reading each file.
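A minimal sketch of reading only selected columns with usecols. The in-memory CSV and the shortened column list are assumptions for illustration; the real files have more columns.

```python
import io
import pandas as pd

# Hypothetical sample standing in for one quarterly trips file; the
# 'Unnamed: 7' column mimics the empty column described above.
sample_csv = io.StringIO(
    "trip_id,start_time,end_time,bikeid,tripduration,usertype,Unnamed: 7\n"
    "1,2017-04-01 00:01:00,2017-04-01 00:11:00,10,600,Subscriber,\n"
    "2,2017-04-01 00:05:00,2017-04-01 00:45:00,11,2400,Customer,\n"
)

# Assumed subset of columns to keep.
relevant_columns = ["trip_id", "start_time", "end_time", "bikeid",
                    "tripduration", "usertype"]

# usecols makes pandas skip every column that is not listed while parsing.
trips = pd.read_csv(sample_csv, usecols=relevant_columns)
print(trips.columns.tolist())
```

Skipping unused columns at read time also keeps memory use down, which matters for files of this size.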

Creating New Relevant Columns:

In 2016, a new online_date column was added to the stations datasets, and in 2017 a new city column was added and existing stations data was updated. To allow merging while keeping only the updated data, online_date and city columns were created in all datasets that lacked them. Before merging the datasets, online_date values in the 2015 stations were set to 01/01/2015 and city values in the 2015-2016 stations were set to NULL.
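A minimal sketch of adding the missing columns, using hypothetical miniature station frames. The station names and dates are made up.

```python
import pandas as pd

# Hypothetical miniature station files for 2015 and 2017.
stations_2015 = pd.DataFrame({"id": [2, 3], "name": ["A St", "B St"]})
stations_2017 = pd.DataFrame({
    "id": [2, 3],
    "name": ["A St", "B St"],
    "city": ["Chicago", "Chicago"],
    "online_date": ["5/8/2015", "4/30/2015"],
})

# The 2015 file lacks both columns: use the placeholder date from the text
# and leave the city empty (NULL).
stations_2015["online_date"] = "01/01/2015"
stations_2015["city"] = None

# Align the column order with the 2017 layout so the frames can be stacked.
stations_2015 = stations_2015[stations_2017.columns]
print(stations_2015)
```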

This data was later sorted by date, and duplicated stations were deleted to keep only the most recent information for each station.
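A minimal sketch of keeping only the most recent record per station. The rows are hypothetical, and the exact sort keys used in the project may differ.

```python
import pandas as pd

# Hypothetical merged stations data: station 2 appears twice.
stations = pd.DataFrame({
    "id": [2, 2, 3],
    "name": ["A St", "A St", "B St"],
    "city": [None, "Chicago", "Chicago"],
    "online_date": pd.to_datetime(["2015-01-01", "2015-05-08", "2015-04-30"]),
})

# Newest record first, then keep the first row seen for each station id.
latest = (
    stations.sort_values("online_date", ascending=False)
            .drop_duplicates(subset="id", keep="first")
            .sort_values("id")
            .reset_index(drop=True)
)
print(latest)
```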

Rename and Reorder Columns:

Column names in the trips datasets changed in name and order over the years, and new columns were added. Therefore, columns were renamed and reordered to match across datasets and enable a union of the data.

2015-2016 column starttime was changed to start_time in trips datasets.
2015-2016 column stoptime was changed to end_time in trips datasets.
All column names in 2018-quarter1 and 2019-quarter2 trips datasets were changed.
2015-2016 columns in stations datasets were reordered to match 2017 data frame.
After merging the trips datasets, tripduration was changed to a better understood column name, tripduration_secconds.

This was done with the rename function.
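A minimal sketch of the renaming step for the 2015-2016 column names listed above. The sample row is hypothetical.

```python
import pandas as pd

# Hypothetical 2015-style trips frame with the old column names.
trips_2015 = pd.DataFrame({
    "trip_id": [1],
    "starttime": ["1/1/2015 0:01"],
    "stoptime": ["1/1/2015 0:11"],
})

# Map old names to the names used in the later files.
trips_2015 = trips_2015.rename(
    columns={"starttime": "start_time", "stoptime": "end_time"}
)
print(trips_2015.columns.tolist())
```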

Change Data Types:

Several data types also changed over the years, as did the resolution of the datetime data. Therefore, they were converted during cleaning to enable a union of the data:

start_time and end_time in the trips dataset were changed from string to datetime, creating high-resolution date and time data that matches across all datasets.
online_date in the stations dataset was changed from string to datetime and indicates the date.
The 2018-2020 tripduration column in the trips dataset was changed from string to integer. To do that, the comma was removed, and the string was converted to float and then to integer (because a string containing a decimal point cannot be converted directly to an integer).
birthyear in the trips dataset was stored as float and contained NaN values. To convert it to integer (for later calculations), NaN values were replaced with 0 via the fillna() function.
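The conversions above can be sketched as follows. The sample values, including the "1,205.0" duration format, are assumptions about what the raw strings look like.

```python
import pandas as pd

# Hypothetical raw values as they might arrive from a 2018+ CSV file.
trips = pd.DataFrame({
    "start_time": ["2018-01-01 00:12:00", "2018-01-01 00:41:35"],
    "tripduration": ["323.0", "1,205.0"],
    "birthyear": [1988.0, float("nan")],
})

# String -> datetime.
trips["start_time"] = pd.to_datetime(trips["start_time"])

# Remove the thousands comma, then string -> float -> integer.
trips["tripduration"] = (
    trips["tripduration"].str.replace(",", "", regex=False)
                         .astype(float)
                         .astype(int)
)

# Replace missing birth years with 0 so the column can be stored as integer.
trips["birthyear"] = trips["birthyear"].fillna(0).astype(int)
print(trips.dtypes)
```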

Merging Datasets:

All trips datasets were concatenated, and all stations datasets were concatenated, ignoring the original indexes. To allow concatenation, the column order and data types were aligned and checked for consistency before the union.
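A minimal sketch of the concatenation with a simple column-consistency check, on hypothetical yearly frames.

```python
import pandas as pd

# Hypothetical yearly frames that already share column names and order.
trips_2015 = pd.DataFrame({"trip_id": [1, 2], "usertype": ["Subscriber", "Customer"]})
trips_2016 = pd.DataFrame({"trip_id": [3], "usertype": ["Subscriber"]})
frames = [trips_2015, trips_2016]

# Check that every frame has the same columns in the same order.
columns_match = all(list(f.columns) == list(frames[0].columns) for f in frames)

# ignore_index=True discards the per-file indexes and renumbers from 0.
all_trips = pd.concat(frames, ignore_index=True)
print(columns_match, len(all_trips))
```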

Check and Delete Duplicates Rows:

Because the data is divided into multiple files, duplicates might occur.

The trips dataset had 61 duplicated rows. The data was ordered by trip_id, and duplicates were removed using the drop_duplicates() function. The trips data without duplicates contained 18,028,861 trips.

The stations dataset had 2752 duplicated rows. The data was ordered by city (so that the updated city value is on top), by station id (to group duplicated station ids) and by online_date (to have the most recent value on top), and then duplicates were removed. The clean stations data contained 586 unique stations.
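A minimal sketch of the duplicate check and removal for the trips data, on hypothetical rows.

```python
import pandas as pd

# Hypothetical trips frame in which trip 3 was read from two files.
trips = pd.DataFrame({"trip_id": [3, 1, 3, 2], "bikeid": [30, 10, 30, 20]})

# Count fully identical rows before removing them.
n_duplicates = int(trips.duplicated().sum())

# Order by trip_id, drop the repeated rows, and renumber the index.
trips = trips.sort_values("trip_id").drop_duplicates().reset_index(drop=True)
print(n_duplicates, len(trips))
```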

Calculated Fields:

To make later analysis easier, I added several calculated columns to the trips dataset. The trips dataset is very large, so I needed a time- and memory-efficient method to read values and calculate the new columns. For this reason, I used iloc[] to fetch the exact value from each row and stored values in temporary variables that were updated on each iteration.

age: The user's age (in years) was calculated from the year of birth and the year of the trip. After converting start_time to datetime, the year of the trip was extracted via pd.DatetimeIndex(db[column_name]).year. This is relevant only for users with birthyear information. Users without it have NaN in birthyear, which I replaced with 0 in order to change the column data type to integer and perform the calculations. These users receive age = NaN (Not a Number) so that they do not affect the analytics and insights.
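A minimal sketch of the age calculation. It is written as a vectorized alternative to a row-by-row loop, which is an assumption on my part rather than the approach described above, and the sample rows are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; birthyear 0 marks a missing birth year (see above).
trips = pd.DataFrame({
    "start_time": pd.to_datetime(["2019-06-01 08:00:00", "2019-06-01 09:00:00"]),
    "birthyear": [1989, 0],
})

# Year of each trip, extracted from the datetime column.
trip_year = pd.DatetimeIndex(trips["start_time"]).year.to_numpy()

# Age is the trip year minus the birth year; unknown birth years become NaN.
trips["age"] = np.where(
    trips["birthyear"] > 0, trip_year - trips["birthyear"], np.nan
)
print(trips)
```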

revenue: The revenue from each ride was calculated according to the user's plan. The revenue from subscribers was calculated as $9 per ride, limited to 45-minute rides, and the revenue from customers was calculated as $15 per ride, limited to 3 hours, with the per-minute fee added for time beyond the limit (see the Assumptions and Limitations section).
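A minimal sketch of the per-ride revenue rule, built from the prices in the Pricing section. Charging every started extra minute in full is my assumption; the source does not say how partial minutes are billed.

```python
import math

# (flat fee in $, included minutes, fee per extra minute in $) per user type,
# taken from the Pricing section.
PRICING = {
    "Subscriber": (9.00, 45, 0.15),
    "Customer": (15.00, 180, 0.20),
}

def trip_revenue(usertype, tripduration_seconds):
    """Return the revenue in dollars for a single ride."""
    base_fee, included_minutes, extra_per_minute = PRICING[usertype]
    # Assumption: every started minute beyond the limit is charged in full.
    extra_minutes = max(0, math.ceil(tripduration_seconds / 60 - included_minutes))
    return round(base_fee + extra_minutes * extra_per_minute, 2)

print(trip_revenue("Subscriber", 10 * 60))   # ride within the 45-minute limit
print(trip_revenue("Customer", 4 * 3600))    # ride one hour over the 3-hour limit
```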

That was a lot of cleaning of a lot of data! The data was pretty dirty and needed a lot of work, but I'm all done and can now turn to the analysis.

More Cleaning:

Following the export of the data to PostgreSQL, I discovered trips that refer to station ids that do not exist in the stations dataset. Therefore, I left these rows out of the clean database by subsetting the trips data frame, which reduced the data to 17,740,369 rows.
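A minimal sketch of this subsetting with isin(), on hypothetical station and trip ids.

```python
import pandas as pd

# Hypothetical data: station ids 98 and 99 are missing from the stations table.
stations = pd.DataFrame({"id": [2, 3]})
trips = pd.DataFrame({
    "trip_id": [1, 2, 3],
    "from_station_id": [2, 3, 99],
    "to_station_id": [3, 98, 2],
})

# Keep a trip only if both its start and end stations are known.
known_ids = stations["id"]
is_valid = trips["from_station_id"].isin(known_ids) & trips["to_station_id"].isin(known_ids)
clean_trips = trips[is_valid].reset_index(drop=True)
print(clean_trips)
```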

Transform Pandas Data Frame to PostgreSQL:

After transforming the data, I validated it by printing the head and tail of tables, counting the rows, checking data types, and making sure the cleaned data is good.

I then started transferring it to PostgreSQL database for analysis.

1. Create a Database in PostgreSQL:

To create a PostgreSQL database from Python, I used the Psycopg2 package and its ISOLATION_LEVEL_AUTOCOMMIT extension constant.

Then, to create the database, I opened a connection to PostgreSQL, created a cursor, and issued the database creation command.
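A minimal sketch of the database creation step. The function name, the database name "divvy" and the connection details are hypothetical. It sets the connection's autocommit attribute, which in psycopg2 has the same effect as applying ISOLATION_LEVEL_AUTOCOMMIT, because CREATE DATABASE cannot run inside a transaction block.

```python
import re

def create_database(connection, database_name):
    """Create a database through an already open PostgreSQL connection."""
    # Identifiers cannot be passed as query parameters, so validate the name.
    if not re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", database_name):
        raise ValueError(f"unsafe database name: {database_name!r}")
    # CREATE DATABASE must run outside a transaction, hence autocommit.
    connection.autocommit = True
    cursor = connection.cursor()
    try:
        cursor.execute(f"CREATE DATABASE {database_name}")
    finally:
        cursor.close()

# Usage (needs a running server; host, user and password are placeholders):
# import psycopg2
# conn = psycopg2.connect(host="localhost", user="postgres", password="secret")
# create_database(conn, "divvy")
```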

2. Create Tables and Insert the Data to PostgreSQL:

I created two tables, trips and stations, in PostgreSQL via Python code, and added the relationship between them: the from_station_id and to_station_id columns in the trips table refer to the station_id column in the stations table.
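A minimal sketch of the two tables and their relationship. The column types are assumptions, and only the columns described earlier are included.

```python
# Stations table; station_id is the key that the trips table refers to.
CREATE_STATIONS = """
CREATE TABLE IF NOT EXISTS stations (
    station_id  INTEGER PRIMARY KEY,
    name        TEXT,
    city        TEXT,
    latitude    DOUBLE PRECISION,
    longitude   DOUBLE PRECISION,
    dpcapacity  INTEGER,
    online_date DATE
)
"""

# Trips table; both station columns are foreign keys into stations.
CREATE_TRIPS = """
CREATE TABLE IF NOT EXISTS trips (
    trip_id          INTEGER PRIMARY KEY,
    start_time       TIMESTAMP,
    end_time         TIMESTAMP,
    bikeid           INTEGER,
    tripduration     INTEGER,
    from_station_id  INTEGER REFERENCES stations (station_id),
    to_station_id    INTEGER REFERENCES stations (station_id),
    usertype         TEXT,
    gender           TEXT,
    birthyear        INTEGER
)
"""

def create_tables(cursor):
    """Run both statements; stations first, because trips references it."""
    cursor.execute(CREATE_STATIONS)
    cursor.execute(CREATE_TRIPS)
```

With psycopg2, `create_tables(conn.cursor())` followed by `conn.commit()` would apply the statements, assuming `conn` is an open connection to the target database.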

3. Export big data to PostgreSQL:

The Divvy data is very big. The clean trips data alone is about 18 million rows. I was looking for a quick way to upload the data and found two nice blog posts by data scientists Naysan Saran and Haki Benita, who ran timed experiments comparing different methods for this problem. According to their results, the best solution is to use Python's io module (no need for pip install, as it is part of the Python standard library). I tried their method (without a buffer), but it loads all the data into memory, and my computer crashed. Unfortunately, I am short on time right now to learn the buffer method that Haki suggested, so note to self - this is a great method to learn and use, and I ought to get to it! I also tried saving the data to a file and using PostgreSQL's import GUI. It worked well for the stations dataset, but failed for the big trips data, perhaps because of the size of the data frame.

I decided to upload the data row by row to PostgreSQL overnight with a script, and it was a success.
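A minimal sketch of a row-by-row upload. The function name and the periodic commit are my assumptions; `%s` is the placeholder style psycopg2 uses for query parameters.

```python
def insert_rows(connection, table, columns, rows, commit_every=10_000):
    """Insert rows one at a time and return how many rows were sent."""
    placeholders = ", ".join(["%s"] * len(columns))
    statement = (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    )
    cursor = connection.cursor()
    sent = 0
    for row in rows:
        cursor.execute(statement, tuple(row))
        sent += 1
        # Commit periodically so an interruption does not lose all progress.
        if sent % commit_every == 0:
            connection.commit()
    connection.commit()
    cursor.close()
    return sent

# Usage (assumes `conn` is an open psycopg2 connection and `trips` a DataFrame):
# insert_rows(conn, "trips", list(trips.columns),
#             trips.itertuples(index=False, name=None))
```

Table and column names are interpolated into the statement, so they should come from trusted code rather than user input.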

Validation of the Data in PostgreSQL Database:

After transforming the data, I verified that the changes were made and that the data remained intact. I ran several queries to do it:

1. Check that all rows were transferred:

I compared the number of rows in the Python data to the number of rows in the PostgreSQL tables. Both stations datasets had 586 rows, and both trips datasets had 17,740,369 rows. Next, I printed the first 5 rows of the data and compared the values. They looked the same as in Python. Great!

2. Checking for NULL values:

I compared the number of NULL values in both tables. After cleaning the data in Python, the stations table contained only one NULL value, in the city column. I checked this in Postgres, and it was the same case. Because this is only one missing value, instead of writing complex code to fetch the data, I queried the latitude and longitude values of that point and used Google Maps to find the city :) Then I updated the value in the database and verified the update.

Now that the data is clean and transformed, I could start answering my questions.

Results:

All queries were done via PostgreSQL and visualized via Tableau.

What is the revenue from each product?

I looked at the revenue of each product over the years.

Visualizing the revenue results indicated a possible problem with the data. Between 2015 and 2017, the revenue from Annual Memberships was higher than or almost equal to the revenue from Day Passes. However, in 2018-2019 (1) the revenue from Annual Memberships doubled compared to previous years, and (2) the revenue from Day Passes increased to double or more the revenue from Annual Memberships.

This huge shift in revenue in 2018-2019 compared to previous years looked suspicious and got my curiosity going. Because revenue is a direct result of trip duration, I checked the distribution of trip durations by year. I created bins to count the trips by duration (<45 minutes, 45 minutes - 3 hours, 3-5 hours, 5-12 hours, 12-24 hours, and >24 hours), grouped by year and user type.
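The analysis itself was run in PostgreSQL. The binning logic can be sketched in plain Python as below, with hypothetical sample trips; putting a trip that falls exactly on a bin edge into the longer bin is my assumption.

```python
from bisect import bisect_right
from collections import Counter

# Upper edges of the first five bins, in seconds.
EDGES = [45 * 60, 3 * 3600, 5 * 3600, 12 * 3600, 24 * 3600]
LABELS = ["<45 min", "45 min-3 h", "3-5 h", "5-12 h", "12-24 h", ">24 h"]

def duration_bin(seconds):
    """Return the label of the duration bin a trip falls into."""
    # Assumption: a trip exactly on an edge goes into the longer bin.
    return LABELS[bisect_right(EDGES, seconds)]

# Hypothetical (year, usertype, tripduration in seconds) records.
trips = [
    (2018, "Subscriber", 600),
    (2018, "Customer", 2 * 3600),
    (2019, "Customer", 30 * 3600),
    (2019, "Customer", 900),
]

# Count trips per (year, usertype, bin), like a GROUP BY over three columns.
counts = Counter((year, usertype, duration_bin(sec)) for year, usertype, sec in trips)
for key, n in sorted(counts.items()):
    print(key, n)
```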

The results show that (1) the large majority of Annual Membership users return the bike on time (< 45 minutes), (2) the large majority of Day Pass users return the bike on time (< 3 hours), (3) only in 2018 and 2019 were bikes returned after more than 24 hours, and (4) significantly more Day Pass users than Annual Membership users returned bikes after more than 24 hours.

This distribution indicates a substantial influence of extremely long trips on the revenue in 2018-2019. To understand the significance of these very long trips, I ran a quick query on the trips data and found that the longest trip lasted 3982 hours and led to a calculated revenue of more than $2.8M.

Digging a little into the company's history (see StreetsBlog, Chicago Tribune and Curbed Chicago), I found that in 2018 Lyft bought Motivate, the company that operated Divvy bikes. Lyft took over Divvy's operation, made Divvy's bikes available to Lyft's users, and experimented with dockless options.

Therefore, these extremely long trip durations in 2018-2019 could have been the result of stolen or lost bikes, bikes that were not returned to the docking station properly, a technical problem with the bike return procedure, or a change in the company's policy toward a dockless model.

According to the company's pricing policy, if a user did not return a bike for more than 24 hours, the bike would be considered lost and the trip fee would be set at a fixed rate of $250. (This price is almost equal to the maximal Day Pass price for using a bike for 24 hours, including extra time fees.)

Therefore, I calculated a corrected revenue, capping the trip time at a maximum of 24 hours, in a new column called new_revenue.
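A minimal sketch of the 24-hour cap, together with the arithmetic behind the comparison to the $250 lost-bike fee. The function names are hypothetical, and the prices come from the Pricing section.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def capped_duration(tripduration_seconds, cap=SECONDS_PER_DAY):
    """Limit a trip duration to at most 24 hours before pricing it."""
    return min(tripduration_seconds, cap)

def max_fee(base_fee, included_minutes, extra_per_minute, cap_minutes=24 * 60):
    """Highest possible fee for one ride once the duration is capped."""
    return round(base_fee + (cap_minutes - included_minutes) * extra_per_minute, 2)

# Day Pass: $15 + (1440 - 180) extra minutes at $0.20 each.
day_pass_max = max_fee(15.00, 180, 0.20)
# Annual Membership: $9 + (1440 - 45) extra minutes at $0.15 each.
membership_max = max_fee(9.00, 45, 0.15)

print(capped_duration(3982 * 3600), day_pass_max, membership_max)
```

Under these prices the capped Day Pass fee comes to $267, which is in the same range as the $250 flat fee.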

As expected, capping the trip duration changed the calculated revenue per trip.

I was also interested in the revenue from each product on specific days of the week.

This data reveals that the mid-week revenue of the Annual Membership product is almost 2x the revenue from Day Pass. In contrast, the weekends produce higher Day Pass revenue compared to Annual Membership, especially on Saturdays and Sundays, and Day Pass revenue is also slightly higher on the long-weekend days (Fridays and Mondays) than mid-week.

Conclusions:

Following Divvy's acquisition by Lyft in 2018, which introduced a different, dockless operation model, there have been many bike rides exceeding 24 hours, which had not happened before. These unreasonably long trips are significantly more common in the Day Pass product than in the Annual Membership product. If these trips indeed ended in lost or stolen bikes, or in bikes being collected from a distant location and brought back to a docking station, they resulted in higher operating costs, which makes the Day Pass product "riskier" for the company. However, this is only a theory, as I do not have data on whether a trip had a dockless bike return, or additional information from customer support.
The revenue from the Annual Membership product is usually higher than from the Day Pass product, and in 2018 it reached 1.44x in revenue.
The Day Pass product showed decreasing revenue during 2015-2017, which turned around in 2018 following the acquisition by Lyft. This might have been the result of Lyft's policy, following the 2018 acquisition, of making Divvy bikes available to all Lyft users without the need for a Divvy-specific Annual Membership. I do not have user identification or Lyft user information, so I cannot check this theory.
Annual Membership revenue had many ups and downs, but overall it increased slowly over the years (1.13x over 5 years). One hypothesis is that there is a finite number of users who can use a membership, as it targets people who need the bikes on a daily basis (for work, education, etc.). With user identification data I could have checked this theory; it will have to remain a hypothesis.
2017 was not a good year for revenue from either product of the company.
The daily revenue data indicates a different use for each product. During the mid-week, the Annual Membership constitutes a significant portion of the company's revenue, reaching 2x the revenue from Day Pass. However, during the weekends, the Day Pass product's revenue exceeds the Annual Membership mid-week revenue.

What is the Most Popular Product?

The company had two products: Annual Membership (subscriber user type in the data frame) and Day-Pass (customer user type in the data frame). I first wanted to see which product is the most popular among users. I used two measures of popularity: (1) the number of trips taken with each product; and (2) the total usage time of each product.

Therefore, I counted the number of trips, summed the trip duration (in hours), and calculated the usage time in hours per trip, for each product in each year.
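The queries were run in PostgreSQL. An equivalent aggregation can be sketched in pandas as below; the sample trips are hypothetical.

```python
import pandas as pd

# Hypothetical trips with durations in seconds.
trips = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2018-05-01 08:00", "2018-05-02 09:00",
        "2018-05-05 12:00", "2019-05-05 12:00",
    ]),
    "usertype": ["Subscriber", "Subscriber", "Customer", "Customer"],
    "tripduration": [600, 1200, 3600, 7200],
})

# Trip count and total hours per year and product.
summary = (
    trips.assign(year=trips["start_time"].dt.year,
                 hours=trips["tripduration"] / 3600)
         .groupby(["year", "usertype"])
         .agg(n_trips=("hours", "size"), total_hours=("hours", "sum"))
)

# Average usage time per trip, in hours.
summary["hours_per_trip"] = summary["total_hours"] / summary["n_trips"]
print(summary)
```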

The analysis showed significantly more trips with the Annual Membership product than with the Day-Pass product, consistently in each year.

Does that mean that users in Annual Membership plan use the bikes more than users in the Day-Pass plan? To answer this question, I needed to calculate the average trip duration (with the maximum 24 hours policy) for both plans.

This analysis revealed that Annual Membership users use the bikes for less than 14 minutes per ride on average, while Day-Pass users use the bikes for 30 minutes or more on average, and in 2018-2019 this number stood at almost 60 minutes per ride.

This is partially a consequence of the Divvy pricing plan, which limits Annual Membership users to 45 minutes per ride, while limiting Day-Pass users to 3 hours per ride for a single day. This plan may lead Annual Membership users to avoid exceeding the much shorter time limit per ride and to use the bikes for short trips between nearby destinations (e.g. between a train or bus station and the workplace), while allowing Day-Pass users to reach farther destinations and take on more time-consuming activities (like going out to eat lunch or visiting a tourist area).

To understand the usage of the bike products across the days of the week, I grouped the number of trips by day of the week.

This analysis revealed that the Annual Membership product is used significantly more on mid-week (work) days than on weekends, whereas the Day-Pass product is used significantly more during weekends than on mid-week days.

Looking at the average trip time for each product across the days of the week, we see that the trip duration of the Day Pass product is higher on every day compared to the Annual Membership product, and that both products show a slight increase during the weekend.

Conclusions from this Question

The most popular product is the Annual Membership, as it is used for significantly more trips compared to the Day Pass product during weekdays, as well as during the weekend.
The total number of bike trips with the Annual Membership product increased during 2015-2018 and then stabilized. In contrast, the number of bike trips with the Day Pass product decreased during 2015-2018 and returned to its 2017 level only in 2019. This means that the Day Pass product was less popular throughout these years, and increased only a little after Lyft made it easier for its users to use Divvy bikes.
These results indicate that the two products appeal to two completely different kinds of users. The Annual Membership attracts people who use bikes for mobility, mostly during weekdays (for example, between work and public transportation or home, and to run errands), while the Day-Pass product attracts tourists and leisure-time users and is mostly used during weekends.
The average trip duration is about 2.8x higher for the Day Pass product than for the Annual Membership, and is quite constant across the week, with usage time for both products increasing a little during weekends. In addition, the average trip duration of the Day Pass product increased significantly in 2018-2019 compared to previous years.

What are the 20 Most Popular Stations?

The popularity of a station depends on the income of the population that can use the bikes, the need for bikes in that area, proximity to public transportation, proximity to tourism areas, and more.

Knowing the popularity of stations according to their location is important for the company to know how many bikes need to be in stock in each station, where to place more stations, what stations are not popular enough to justify the operation cost, and much more.

To look at the popularity of the bike stations, I looked at all user types and genders.

A visualization of all 585 stations of the company in Chicago and nearby areas indicates that the most used stations are in the city center and the north area.

Looking closer at the 20 most popular stations for all users, we see that they are located in the center of Chicago and along the shoreline of the city, with the most popular stations located at the pier, Maggie Daley Park and the shoreline, and Chicago's main train/bus stations.

However, because the company has two products that target two different kinds of users and different usage days, we can expect to find different popular stations for these two groups. Therefore, I created a station ranking for each product using the RANK() window function, partitioned by usertype.
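A minimal sketch of such a ranking query. It runs against an in-memory SQLite database (version 3.25 or later, for window functions) as a stand-in for PostgreSQL; the table contents and station names are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the PostgreSQL database here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trips (usertype TEXT, from_station_name TEXT)")
conn.executemany(
    "INSERT INTO trips VALUES (?, ?)",
    [("Customer", "Pier"), ("Customer", "Pier"), ("Customer", "Park"),
     ("Subscriber", "Depot"), ("Subscriber", "Depot"), ("Subscriber", "Pier")],
)

# Count trips per station and rank stations separately within each usertype.
ranking_sql = """
SELECT usertype,
       from_station_name,
       COUNT(*) AS trips,
       RANK() OVER (PARTITION BY usertype ORDER BY COUNT(*) DESC) AS station_rank
FROM trips
GROUP BY usertype, from_station_name
ORDER BY usertype, station_rank
"""
ranking = conn.execute(ranking_sql).fetchall()
for row in ranking:
    print(row)
```

Wrapping the query in a subquery and filtering on `station_rank <= 20` would keep only the top 20 stations per product.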

Conclusions from this Question

The results for the 20 most popular stations for each product are another indication that each product is used by different users for different purposes.

Day Pass users mostly take and return bikes in specific areas along the shoreline, as well as at top tourism sites (Navy Pier, Montrose Harbor, museums, Shedd Aquarium, Lincoln Park Zoo, etc.).
Annual Membership users mostly take and return bikes from and to bus and train stations, the main locations at the business area, and education facilities (colleges and universities).

Connecting this insight with the data of preferred days of use of each product (see last question), reveals that the Annual Membership product is used significantly more by workers and students who use the service on mid-week days compared to weekends, whereas the Day-Pass product is used significantly more by tourists during the weekends.

Who Are the Annual Membership Users?

My analysis indicated two different users for the Divvy products. To summarize my insights up until now, the Annual Membership is used in work/education/major public transportation areas and is mainly used during the middle of the week, whereas the Day Pass is used in tourism and leisure areas and mainly during the weekends.

I wanted a more complete understanding of the users. (1) What is their age distribution by gender? (2) On which days and at what times do Annual Membership users take their trips? (3) Are different stations favored by different users? (4) What is the revenue difference between users? All of these questions can yield highly important marketing insights, and the company could use them to suggest new features to its clients.

One must remember that the age and gender data exists only for users who have (ever) had an Annual Membership, and therefore covers only part of the trips. This is a limitation that we need to take into consideration.

Days and Times of Annual Membership Trips

In order to look into the data of the days and times according to the user gender, trip data was divided according to the product (usertype), name of trip day, trips' start hour, gender of the user and the number of the trips done. Again, because much of the gender and age data for Day Pass users was missing, this analysis was credible only for Annual Membership product.
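A minimal sketch of this grouping in pandas; the actual queries were run in PostgreSQL. The sample rows and gender values are hypothetical.

```python
import pandas as pd

# Hypothetical Annual Membership trips (2019-06-03 is a Monday).
trips = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2019-06-03 08:05", "2019-06-03 08:40",
        "2019-06-03 17:10", "2019-06-08 11:00",
    ]),
    "usertype": ["Subscriber"] * 4,
    "gender": ["Male", "Male", "Male", "Female"],
})

members = trips[trips["usertype"] == "Subscriber"]

# Number of trips per day name, start hour and gender.
hourly = (
    members.assign(day=members["start_time"].dt.day_name(),
                   hour=members["start_time"].dt.hour)
           .groupby(["day", "hour", "gender"])
           .size()
           .rename("trips")
           .reset_index()
)
print(hourly)
```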

This analysis revealed several interesting insights in the Annual Membership:

There are about 3-4x more men than women taking bike trips on weekdays, and about 2x more men than women on weekends.
During the weekdays there are two peaks in the number of rides:
- The first peak is in the morning hours midweek, and is similar for men and women. For both genders, the first peak is between 5-10 am and the highest number of rides occurs at 8 am.
- The second peak is in the afternoon hours midweek, and is slightly different between men and women. The second peak is between 3-10 pm for men, and 3-9 pm for women. For both genders the highest number of rides occurs at 5 pm. The shape of the women's chart is more right-skewed compared to men, which means that most women take bike trips back from work earlier compared to men.
During weekends, trips for both genders start at 6 am. For women the trips end at 9 pm, and for men the trips end at 2 am.

Annual Membership Users' Age and Gender

To answer these questions, I first binned the users' ages in each product into meaningful groups (Minor being less than 18, then 18-28, 29-39, 40-49, and so on). Ages above 100 were set to "Unknown", as these were probably fake ages. I ended up visualizing only the Annual Membership data, as the Day Pass users had few data points for gender and age. I then visualized only users between 0-79 years old, as they were the main users of the product.
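A minimal sketch of such age binning. The first three groups follow the text; treating the groups from 40 upward as decade-wide bands is my assumption, since the text only says "and so on".

```python
def age_group(age):
    """Map an age in years to a labeled group."""
    # Missing values (None or NaN) and implausible ages are grouped together.
    if age is None or age != age or age > 100:
        return "Unknown"
    if age < 18:
        return "Minor"
    if age <= 28:
        return "18-28"
    if age <= 39:
        return "29-39"
    # Assumption: decade-wide groups from 40 upward.
    low = int(age // 10) * 10
    return f"{low}-{low + 9}"

for sample_age in (17, 28, 35, 45, 101, float("nan")):
    print(sample_age, age_group(sample_age))
```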

Annual Membership users' distribution by gender and age indicated a difference between men and women users:

There are significantly more men users compared to women users (about 2-3.5x).
Most male users were in their 30s, followed by those aged 18-28 and then those over 40. Female users were most numerous at ages 18-28, followed by those in their 30s, with a significant drop above age 40.

These results indicate that the Annual Membership product has two completely different major user personas based on gender and age: men in their 30s, and women aged 18-28.

Favorable Stations for Different Users?

To see the distribution of the most favorable stations of the Annual Membership users, according to their gender, I visualized the same data from the previous query as a geographic map. These maps show the 15 most popular stations color-rated according to their number of trips (red being the highest number of trips from a station).

These maps reveal that women and men who use the Annual Membership product use the bikes in noticeably different patterns:

Men most frequently use stations concentrated around the major public transportation areas and, less frequently, other stations in the central business district of Chicago (the "Loop").
Women, like men, most frequently use stations around the major public transportation areas. In contrast to men, they also use stations in northern and eastern areas such as Theater on the Lake and Daley Center Plaza, as well as along the coastline. This suggests that women tend to start their trips in major cultural and scenic areas.

Annual Membership Trip Revenue According to Gender

As the Annual Membership product is a significant revenue source for the company, and the genders behave very differently as users, I wanted to further check the average trip duration and revenue for each gender.

Therefore, I grouped the trips by day of the week and gender, and aggregated the trip duration and revenue according to the maximum 24 hours policy (see "What is the revenue from each product?" in the results section).
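
The grouping and aggregation step might look like the Python sketch below. It rests on two assumptions that are not confirmed by the original analysis: the revenue of each trip is taken as already computed from the pricing rules, and the 24 hours policy is read as "skip trips longer than 24 hours" (capping them would be another reading). The name `summarize_by_day_and_gender` and the sample values are hypothetical.

```python
from collections import defaultdict
from datetime import datetime

DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
MAX_TRIP_SECONDS = 24 * 60 * 60  # assumed reading of the 24 hours policy


def summarize_by_day_and_gender(trips):
    """Aggregate trips per (day of week, gender).

    `trips` is an iterable of (start_time, gender, seconds, revenue)
    tuples, where revenue is assumed to be precomputed per trip.
    """
    groups = defaultdict(lambda: {"trips": 0, "seconds": 0, "revenue": 0.0})
    for start_time, gender, seconds, revenue in trips:
        if seconds > MAX_TRIP_SECONDS:
            continue  # assumption: trips over 24 hours are excluded
        g = groups[(DAY_NAMES[start_time.weekday()], gender)]
        g["trips"] += 1
        g["seconds"] += seconds
        g["revenue"] += revenue
    return {
        key: {
            "trips": g["trips"],
            "total_revenue": g["revenue"],
            "mean_seconds": g["seconds"] / g["trips"],
            "mean_revenue": g["revenue"] / g["trips"],
        }
        for key, g in groups.items()
    }


# Made-up sample rows; durations and revenues are not real Divvy values.
sample_trips = [
    (datetime(2019, 6, 3, 8, 0), "Male", 600, 1.5),
    (datetime(2019, 6, 3, 17, 0), "Male", 900, 2.5),
    (datetime(2019, 6, 3, 17, 30), "Female", 1200, 3.0),
    (datetime(2019, 6, 3, 9, 0), "Male", 90000, 50.0),  # over 24 hours
]
summary = summarize_by_day_and_gender(sample_trips)
```

The total revenue and the per-trip means come out of the same pass, which makes it easy to compare the two views of the data side by side.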

When I visualized the total revenue for women and men who used the Annual Membership product, I saw that men bring in the higher revenue every day:

However, when I visualized the mean revenue by gender, I found that women tend to ride longer trips on average:

(Seconds)

And therefore produce higher revenue per trip:

These results indicate several interesting insights:

As there are about 2-3.5x more men than women among Annual Membership users, men are a big source of revenue for the company, especially on mid-week days.
Women have strong potential to increase the revenue from the Annual Membership product, as they take longer trips and pay more per trip.

Conclusions for this Question

Getting to know the different users of a product reveals its pros and cons, as well as the potential to increase the company's revenue and profitability. We do not have the number of men versus women Annual Membership users, as we do not have user IDs in the data. However, looking into the genders that use the Annual Membership product revealed many interesting insights:

Quantity: 2 to 4x more men than women use the product, especially on mid-week days. This numerical advantage results in higher revenue from men's rides than from women's rides.
Days and time: The times at which each gender uses the bikes suggest that women either need to be home earlier than men or feel less safe riding bikes in the streets:
- As this product is used on mid-week days mostly to ride to and from work, we notice two peaks in rides: both genders go to work at the same time, but return from work at different times (women finish their rides earlier than men, and most of them are in the early afternoon).
- The Annual Membership product is used mostly for recreation during the weekends. Here we also see a difference between genders: women stop riding at 9 pm, while men continue riding until the early morning hours.
Age: Most women riders are 18-28 years old, followed by those in their 30s, unlike men, who are mostly in their 30s, followed by ages 18-28 and then 40 and older.
Stations: Men mostly use the stations concentrated around major public transportation and the business district. Women mostly use the stations around major public transportation as well as major cultural and scenic areas.
Revenue per trip: On average, women take longer trips and pay more per trip.

Conclusions

In this project I asked several business questions regarding the Divvy bikes revenue and users' behavior. My analysis revealed several interesting insights:

In general, the Annual Membership product is a significant portion of the company's revenue, and in 2018 it reached 1.44x in revenue compared to Day Pass product. However, the Annual Membership revenue is increasing extremely slowly (1.13x between 2015-2019), compared to Day Pass product that increased significantly between 2017-2019 (following Divvy's acquisition by Lyft in 2018), reaching an almost equal revenue as Annual Membership in 2019.
The Annual Membership is also the most popular product, used for more trips than the Day Pass product on both weekdays and weekends throughout the years analyzed (2015-2019). The Day Pass product's popularity increased only slightly in 2019, after Lyft's acquisition made it easier for Lyft users to use Divvy bikes.
The revenue depends on the day of the week: During the weekdays, the Annual Membership product reaches 2x revenue compared to the Day Pass product. However, during the weekends, the Day Pass product's revenue exceeds even the Annual Membership's mid-week revenue.
Different usage for the Annual Membership and Day Pass products:
- Day Pass product:
  - Used mainly for single-day tourist travel, mostly during weekends.
  - Popular docking stations: specific areas along the shoreline, as well as top tourism sites.
  - The average trip duration is about 34 minutes, even though each trip is limited to 3 hours.
- Annual Membership product:
  - Used mostly during the week on a daily basis to bike to and from work, educational institutions, etc.
  - Popular docking stations: bus and train stations, main locations in the business area, and educational facilities.
  - The average trip duration is 12 minutes, even though each trip is limited to 45 minutes.
  - 2-4x more men than women use the product, especially on mid-week days, so men's rides bring in higher revenue.
Dockless Bikes: Changing the company policy to allow dockless bikes (bikes that do not need to be returned to a specific docking station), following Divvy's acquisition by Lyft in 2018, led to many unreasonably long (more than 24 hours) bike trips. These trips occurred mostly with the Day Pass product rather than the Annual Membership product. If these trips ended in lost or stolen bikes, they resulted in much higher operating costs than expected.

This analysis indicated that there are three major markets with specific preferences for bike ride usage, which is important for marketing and future business opportunities:

Day Pass users:
- Single day tourism trips.
- Focused mostly on areas along the shoreline as well as top tourism sites.
- Mostly done during the weekend.
- Trips take about 34 minutes on average.
Annual Membership men users:
- Ride to/back from work/school.
- Use mainly stations located in the business district and public transportation areas.
- Take trips at later hours than women (until 10 pm on weekdays and 2 am on weekends).
- Ages are mostly in the 30s and 18-28, and to a lesser extent 40 and older.
- They make up the majority of riders and are the main source of revenue for this product.
Annual Membership women users:
- Ride to/back from work/school.
- Use mainly stations located in public transportation areas and major cultural and scenic public areas.
- During mid-week days, women tend to return from their jobs/schools earlier than men, and finish their rides earlier at night (9 pm on both weekdays and weekends).
- Ages are mostly 18-28 and the 30s.
- There are fewer women than men, and on average, women take longer trips and pay more per trip.

Recommendations

The insights in this project regarding the different products and users are important for understanding the personas that each product is aimed at. Getting to know the different users of the products reveals pros and cons for each product, as well as the potential to increase the company's revenue and profitability.

Because the analysis shows that bike usage differs sharply between the Day Pass and Annual Membership products, as well as between women and men, the company can use these insights in its marketing.

Day Pass: As we do not have the information about the gender and age of these users, we will target both men and women.
- Cancel dockless bikes, as they lead to excessively long rides (more than 24 hours), probably because bikes are stolen, lost, or not registered as returned in the company's system (perhaps the return instructions for the user are not self-explanatory). This increases operating costs, may cause understocking at the docking stations, and gives people the impression that the company is not tracking its bikes, which encourages theft.
- Offer combined tickets, discounts for tourist attractions, or tourist bike routes to encourage more tourists to use the bikes for transportation. Perhaps also promote lesser-known tourist attractions, and by that encourage the use of less-used docking stations.
- Give a discount on Day Pass use during mid-week days, perhaps specifically on Tuesday and Wednesday, when the number of trips is the lowest. This may encourage more users to choose Day Pass bike rides as their tourism commute.
- As the average trip duration is 34 minutes, I suggest creating two payment categories: Short trip and Long trip. A Short trip would allow users to bike for up to 1 hour, and a Long trip would allow them to use the bike for up to 3 hours. This may increase Day Pass use, and would also make it possible to collect a fine when the user returns the bike later than expected.
- Continue allowing Lyft's users to use the app for bike trips, as it encourages users to use the bikes.
Annual Membership - This product is the most popular product and a highly significant part of the company's revenue. However, the revenue from this product is increasing very slowly over the years. Therefore, it would be wise to open this product to more users and invest in its marketing.
- Marketing two personas: I suggest that publicity focus on the two different personas that use the Annual Membership: men (work areas) and women (work and culture areas).
- Promote women riders: 18-39 year old women may be a new focused marketing target. This can be achieved by (1) offering a lower membership price for the first year to users who finish their rides by 9 pm every day of the week, or (2) offering benefits to members who use stations located in major cultural and scenic areas and/or ride in the late afternoon hours during the week. Both aim to increase the number of women Annual Membership users and, as a result, revenue. These usage patterns are mostly characteristic of women, and so will target them without saying the offer is especially for them.
- Promote men riders: They are the "easier" market, as they already use the bikes in large numbers and are responsible for a high share of the revenue from this product, so it is wise to invest in this market and grow it. Marketing material could: (1) ask what you can do in 45 minutes per ride on an Annual Membership bike, at any age: go from the bus station to the coffee shop, go to the office, go grab lunch, go meet a client in the business district, etc., all around the stations most used by men; (2) mention that the bikes can be used late at night or even in the early morning hours, which appeals especially to men. These men-targeted points may increase the number of men who purchase an Annual Membership.