Examples of how to visualize data, explore it, draw insights, and run regression analysis using Python.

Suraj sagar chathiri
8 min read · Mar 15, 2024


Bike Share data

https://drive.google.com/file/d/1syYLHNQR2_BUULJJYhOyv2No8cBuNZI_/view?usp=drive_link (DC bike share data in Google Sheets, for an overview of the data behind the Python code and report below)

https://colab.research.google.com/drive/1n52vynK8iOpbl7lHsTA0hXo0qk4bI8_p?usp=drive_link (free access to the Python code for the raw bike share data)

Introduction:

This dataset provides an overview of bike-sharing usage patterns over two years (2011 and 2012). It covers a wide range of variables such as date, season, year, month, hour, holiday status, weekday, working-day status, and weather conditions, with a total of 17,379 data points. It can be used to develop models that predict demand, which can in turn improve the efficiency of bike-sharing services. For example, with station-level data, a similar model could predict demand at a particular station at a particular time of day; that information could then be used to adjust the number of bikes available at the station or to alert users about stations that are likely to be busy. The dataset also distinguishes between registered and casual users, which allows a deeper look at how the two groups behave. The weather data suggests that the measurements were mostly taken under average weather conditions.

Here are some of the questions this dataset can help answer:

1. How does the demand for bikes vary throughout the day, week, month, and year?

2. What is the impact of weather conditions on the demand for bikes?

3. How do holidays and special events affect demand for bikes?

4. Is there a difference in demand between registered users and casual users?

Insights Gained from descriptive statistics:

The descriptive statistics show that more users are served during the summer, which suggests that strategies such as summer promotions could help attract additional riders. In addition, average rentals on holidays are usually lower than on working days, so the company could develop promotions that encourage riders to rent on holidays. Weather is also related to the number of registered riders: temperature has a positive relationship with ridership, while humidity and strong winds are associated with fewer riders. The company can therefore use weather forecasts to predict demand and allocate staff and resources accordingly.
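A minimal pandas sketch of how the holiday and weather comparisons above could be computed. The author's Colab code is not reproduced in this post, so the function name `holiday_and_weather_summary` is hypothetical, and the column names (`holiday`, `registered`, `temp`, `hum`, `windspeed`) are an assumption based on the common UCI Bike Sharing `hour.csv` layout.

```python
import pandas as pd


def holiday_and_weather_summary(df: pd.DataFrame) -> dict:
    """Compare holiday vs non-holiday ridership and check weather correlations.

    Column names are assumed (UCI-style): 'holiday', 'registered',
    'temp', 'hum', 'windspeed'.
    """
    return {
        # Mean registered riders on non-holidays (0) and holidays (1).
        "by_holiday": df.groupby("holiday")["registered"].mean(),
        # Pearson correlation of each weather column with registered riders;
        # the sign shows whether the relationship is positive or negative.
        "weather_corr": df[["temp", "hum", "windspeed"]].corrwith(df["registered"]),
    }
```

Calling `df.describe()` first gives the overall summary (count, mean, standard deviation, quartiles) for every numeric column; the function above then isolates the specific comparisons discussed in this section.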

Average Registrations per Season:

Graph 1: Average Registrations per Season

Graph 1 shows the average registrations per season. The main observations are:

1. Seasonal Variation: The graph clearly shows a variation in the average registrations per season. This suggests that the season significantly influences the number of registrations.

2. Winter Drop: The average registrations are lowest during the winter season. This could be due to the cold weather, which might discourage people from registering.

3. Spring Surge: There is a significant increase in the average registrations in the spring season. The warmer weather could be encouraging more people to register.

4. Summer and Fall Consistency: The average registrations remain consistently high during the summer and fall seasons. This could indicate that favorable weather conditions during these seasons maintain people’s interest.

5. Potential for Strategic Planning: These insights could be used for strategic planning. For instance, special promotions or events could be planned for the winter season to increase registrations.

Similarly, more resources could be allocated during the spring. It is also important to consider other factors that could influence registration rates throughout the year. For instance, if many registered riders are students, registrations might rise in the fall around the start of the school year, and New Year's resolutions could produce a surge in registrations in January.
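The seasonal averages behind a chart like Graph 1 can be computed with a single `groupby`. This sketch assumes a numeric `season` column and a `registered` column; the mapping from season codes to names is not stated in this post, so the codes are left as-is and should be checked against the dataset's documentation before labelling the bars. Function names are illustrative.

```python
import pandas as pd


def season_means(df: pd.DataFrame, target: str = "registered") -> pd.Series:
    """Average of `target` for each season code, in season-code order."""
    return df.groupby("season")[target].mean().sort_index()


def plot_season_means(means: pd.Series):
    """Draw a bar chart similar to Graph 1; needs matplotlib at call time."""
    ax = means.plot.bar(rot=0)
    ax.set_xlabel("Season")
    ax.set_ylabel("Average registered riders")
    ax.set_title("Average Registrations per Season")
    return ax
```

Keeping the aggregation separate from the plotting makes it easy to inspect the numbers before drawing the chart.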

Unique number of registrations:

Graph 2: Unique number of registrations

Graph 2 shows the unique number of registrations by month. The number increases consistently over months 0 to 8, a positive trend suggesting that the service is gaining popularity or usage over time. There is a slight decrease at month 10, which could point to a seasonal pattern in which registrations dip at that time of year; it would be worth investigating the cause of this decrease.

After the dip, registrations pick up again at month 12, surpassing the previous peak at month 8. This suggests that the decrease at month 10 was temporary and did not affect the overall upward trend.

Potential for Forecasting: The trends in this graph could be used to forecast future registrations, and understanding the factors behind them would help make those predictions more accurate.
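The post does not define "unique number of registrations" precisely. One reading, used in this hypothetical sketch, is the number of distinct `registered` values recorded in each month (assumed column `mnth`); if the intent was the total number of registered rentals, `.sum()` can replace `.nunique()`. The month-over-month difference makes dips like the one described at month 10 easy to spot.

```python
import pandas as pd


def unique_registrations_by_month(df: pd.DataFrame) -> pd.Series:
    """Count of distinct 'registered' values recorded in each month.

    Interpreting "unique number of registrations" this way is an assumption.
    """
    return df.groupby("mnth")["registered"].nunique()


def month_over_month_change(series: pd.Series) -> pd.Series:
    """Change from the previous month; negative values mark dips."""
    return series.sort_index().diff()
```

The first month has no predecessor, so its change is `NaN`.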

Registered riders per hour on Fridays (2011–12):

Graph 3: registered riders per hour on all Fridays (2011–12)

Morning Commute Peak: There is a significant spike in registered riders at 8 AM, which suggests that many people use the service to commute to work or school.

Evening Activity Increase: Ridership rises again between 5 PM and 7 PM, likely as people return home or go out for the evening.

Low Activity Hours: The fewest registered riders appear between midnight and 5 AM, which are typically off-peak hours.

These insights are useful for understanding user behavior and planning service operations. For example, the operator can ensure that enough bikes are available during peak hours and schedule maintenance during low-activity hours.
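An hourly profile like Graph 3 can be built by filtering to Fridays and averaging by hour. This sketch assumes a `dteday` date column and an `hr` hour column; it derives the weekday from the date rather than relying on a coded weekday column, since coding conventions vary between datasets. The function names are illustrative.

```python
import pandas as pd


def friday_hourly_profile(df: pd.DataFrame) -> pd.Series:
    """Average registered riders for each hour (0-23) across all Fridays.

    Assumes a 'dteday' date column and an 'hr' hour column (UCI-style names).
    """
    dates = pd.to_datetime(df["dteday"])
    fridays = df[dates.dt.dayofweek == 4]  # pandas uses Monday=0, so Friday=4
    return fridays.groupby("hr")["registered"].mean()


def peak_hours(profile: pd.Series, n: int = 3) -> list:
    """Hours with the highest average ridership, busiest first."""
    return profile.nlargest(n).index.tolist()
```

Applied to the full dataset, `peak_hours(friday_hourly_profile(df))` would be expected, based on Graph 3, to include 8 AM and hours in the 5 PM to 7 PM window.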

Output:

Month    casual      registered
1         8.426872    85.997901
2        11.158091   101.706935
3        30.172437   125.238289
4        42.311761   144.949200
5        50.594758   172.312500
6        51.323611   189.191667
7        52.524866   179.295027
8        48.840000   189.257627
9        48.937370   191.835769
10       41.185389   180.973122
11       25.471816   151.863605
12       14.627782   127.675657

Graph 4: Casual and Registered Riders Every Month (2011–12)

Graph 4 compares casual (unregistered) and registered riders by month over 2011–12. Both casual and registered ridership are higher in the middle months (5–10), which could indicate a seasonal trend related to weather conditions or holidays. Registered users outnumber casual users in every month, suggesting that the service has a strong base of regular riders. Both types of ridership drop noticeably in the final months of the year, which could be due to holidays, colder weather, or end-of-year travel. To increase registered ridership during peak months, the company could offer promotions that convert casual riders into registered ones. These insights can support growth by guiding marketing campaigns, resource allocation, and improvements to the rider experience.
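The Output table above is consistent with averaging the `casual` and `registered` columns for each month. The following is one plausible way to produce it (column names `mnth`, `casual`, and `registered` are assumed), with an added `registered_share` column that makes the claim that registered users always outnumber casual users easy to check. Function names are illustrative.

```python
import pandas as pd


def monthly_rider_means(df: pd.DataFrame) -> pd.DataFrame:
    """Mean casual and registered riders per month, plus the registered share."""
    means = df.groupby("mnth")[["casual", "registered"]].mean()
    # Share of riders who are registered; above 0.5 means registered users
    # outnumber casual users in that month.
    means["registered_share"] = means["registered"] / (means["casual"] + means["registered"])
    return means


def plot_monthly_riders(means: pd.DataFrame):
    """Line chart of both rider types, similar to Graph 4 (needs matplotlib)."""
    ax = means[["casual", "registered"]].plot(marker="o")
    ax.set_xlabel("Month")
    ax.set_ylabel("Average riders")
    return ax
```

Using the table values directly, registered riders average about 3.4 times as many as casual riders where the two are closest (months 5 and 7) and about 10 times as many in month 1.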

Relationship of temperature and wind speed with registered riders:

Graph 5: Temp vs Registered riders
Graph 6: Windspeed vs Registered riders

Temperature Influence: The number of riders increases noticeably as the (normalized) temperature rises to around 0.8, after which it fluctuates. This suggests that warmer temperatures encourage more people to ride, but very high temperatures might deter riders. There is a notable spike in riders at a temperature of 1.00, which could be due to factors such as promotional activities or special events. These insights can help in planning marketing, scheduling maintenance, and managing resources; for instance, more bikes could be allocated during warmer weather to accommodate the increase in riders.

Wind Speed Influence: As wind speed increases beyond 0.3, the number of registered riders noticeably decreases, suggesting that strong winds deter riders. Riders are concentrated at lower wind speeds, so more bikes could be allocated when wind speeds are low.

These insights help explain the factors that influence rider counts and can guide operational and marketing decisions. For instance, marketing efforts could be concentrated in non-holiday periods and warmer weather, when rider counts are typically higher.
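Scatter plots such as Graphs 5 and 6 can be noisy, so averaging registered riders within bins of the weather variable makes trends like the drop beyond a wind speed of 0.3 easier to see. This sketch assumes `temp` and `windspeed` are normalized to the 0 to 1 range, as the axis values quoted above suggest; the function name and default bin count are illustrative.

```python
import pandas as pd


def binned_mean(df: pd.DataFrame, feature: str, target: str = "registered",
                bins: int = 10) -> pd.Series:
    """Mean of `target` within equal-width bins of a 0-1 normalized `feature`.

    Empty bins are kept and show up as NaN.
    """
    edges = [i / bins for i in range(bins + 1)]
    binned = pd.cut(df[feature], bins=edges, include_lowest=True)
    return df.groupby(binned, observed=False)[target].mean()
```

For example, `binned_mean(df, "windspeed")` gives one average per 0.1-wide wind-speed band. For the raw scatter plots themselves, pandas provides `df.plot.scatter(x="temp", y="registered")`.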

Regression Analysis:

The regression model shows that variables such as season, working days, holidays, and weather conditions have a statistically significant impact on the number of people using the bike-sharing service. The coefficient for season is positive and statistically significant, meaning that registered ridership is higher in some seasons than in others. Registered ridership is also increasing over time, which could be due to growing awareness of the bike-sharing program or an increase in the number of bike stations. On holidays, registered riders use the service less than on working days.

The coefficient for temperature is positive and statistically significant, meaning that more registered riders use bikes on warmer days; this is consistent with the idea that people are more likely to ride when the weather is nice. The coefficient for humidity is negative and statistically significant, meaning that fewer registered riders use bikes on more humid days, perhaps because riding is less comfortable in humid conditions. The R-squared value of 0.388 means the model explains about 39% of the variance in registered ridership, a moderate share. Bike-sharing companies can use this information to improve operations and marketing; for example, they could allocate more resources to periods with higher ridership or run targeted marketing campaigns when ridership is lower.
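The post does not list the exact model specification, so the following is a minimal ordinary least squares sketch using NumPy, with a hypothetical feature list chosen to match the variables discussed above. It returns the coefficients and the R-squared value; it does not compute p-values. Treating coded variables such as `season` as plain numbers is a simplifying assumption.

```python
import numpy as np
import pandas as pd


def fit_ols(df: pd.DataFrame, features: list, target: str = "registered"):
    """Ordinary least squares with an intercept; returns (coefficients, R^2).

    Coded columns (e.g. 'season', 'weathersit') are treated as plain numbers
    here, which is a simplification; one-hot encoding them with
    pd.get_dummies is a common alternative.
    """
    # Design matrix: a column of ones for the intercept, then the features.
    X = np.column_stack([np.ones(len(df)), df[features].to_numpy(dtype=float)])
    y = df[target].to_numpy(dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    # R^2 = 1 - (residual sum of squares / total sum of squares).
    residuals = y - X @ beta
    centered = y - y.mean()
    r_squared = 1.0 - (residuals @ residuals) / (centered @ centered)
    coefs = pd.Series(beta, index=["intercept"] + list(features))
    return coefs, r_squared
```

For example, `fit_ols(df, ["season", "yr", "holiday", "workingday", "weathersit", "temp", "hum"])`, where the feature list is an assumption. The significance statements above rely on p-values, which a library such as statsmodels reports via `statsmodels.formula.api.ols("registered ~ temp + hum", data=df).fit().summary()`.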

Conclusion:

After analyzing the data, I was able to answer the questions raised in the introduction. The bike share data provides valuable insights that can drive business growth and improve service offerings. Registered users outnumber casual users, indicating a strong user base, so strategies could be developed to convert casual riders into registered ones, enhancing customer loyalty and increasing revenue. Bike-sharing services are often used for commuting to work or school on weekdays, so demand is expected to remain high during the morning and evening peak hours. Ridership decreases during holidays and weekends; offering holiday-specific promotions or discounts could encourage users to ride during these times. Bike usage is heavily influenced by temperature, with more riders during warmer months, so special promotions or events could be planned to increase ridership in the other seasons. Wind speed also influences ridership, with fewer riders once wind speed rises above about 0.3.

The business could also benefit from offering different subscription plans or rental durations as a strategy. In addition, the company could forecast rush hours to ensure that suitable staff and equipment are available at specific bike-sharing stations. The company should also monitor for suspicious registration activity and put measures in place against fraudulent accounts. Finally, it is important to anticipate demand variation caused by seasonal differences or promotional activities, so that the company can adjust in time and avoid service interruptions.

The end

