RevoU SQL Project
Project Summary
Improving the Understanding of SQL function to help access, read, manipulate, and analyze the data stored in the database and generate useful insights to drive an informed decision-making process, and this 3 main things that I did in this project :
- Created SQL query based using aggregate functions, date format, joins, CTE, and window function.
- Created monthly retention cohort analysis using SQL function
- Created the insights for the project
Project Background & Dataset
Project Background
Explore and get the hands on Structured Query Language (SQL) specially using Google Big Query, As a data analyst, we are expected to use SQL to formating and cleaning the data, joining the data, using CTE and Window function also generating useful insights from the data. Therefore, we tested our level of comprehension by doing the following assignment.
Dataset 1 | San Fransisco Ford GoBike Dataset
This dataset was provided by google bigquery public data in google cloud platform. San Francisco Ford GoBike ,managed by Motivate, provides the Bay Area’s bike share system. Bike share is a convenient, healthy, affordable, and fun form of transportation. It involves a fleet of specially designed bikes that are locked into a network of docking stations. Bikes can be unlocked from one station and returned to any other station in the system. People use bike share to commute to work or school, run errands, get to appointments, and more. The dataset contains trip data from 2013-2018, including start time, end time, start station, end station, and latitude/longitude for each station.
Dataset 2 | Hacker News Dataset
This dataset was provided by google bigquery public data in google cloud platform. This dataset contains all stories and comments from Hacker News from its launch in 2006 to present. Each story contains a story ID, the author that made the post, when it was written, and the number of points the story received. This public dataset is hosted in Google BigQuery and is included in BigQuery's 1TB/mo of free tier processing. This means that each user receives 1TB of free BigQuery processing every month, which can be used to run queries on this public dataset. Watch this short video to learn how to get started quickly using BigQuery to access public datasets
Project Files
For a more comprehensive analysis and visualization, please open the project files.Dataset 1
1. Create a query to get the average amount of duration (in minutes) per month during 2014-2017
in 2014-2017 July is the month with the longest duration average of Sanfransico bike rental with 19.94 minutes, followed by August with 18.85 minutes and September with 18.12 minutes.
2. Create a query to get total trips and total number of unique bikes grouped by region name during 2014-2017
in 2014-2017 San Francisco has the most total trips and bikes, followed by Oakland, and finally San Jose.
3.Assume this year is 2022, Find the youngest and oldest age of the members, for each gender.
The oldest age of Male is 136 years and the youngest age of Male is 22 years.
The oldest age of Female is 122 years and the youngest age of Female is 22 years.
The oldest age of Others is 122 years and the youngest age of Other is 22 years.
4. Get the latest departure trip in each region.
The last recorded departure trip form all the region
5. Create a query to get Month to Date of total trips in each region, breakdown by date during November 2017 - December 2017.
Total trips cumulative each region using Window Function
6. Find monthly growth of trips in percentage, ordered by time descendingly. Only for trips from the region that has the highest total number of trips and during 2014-2017.
July 2017 was the month with the highest growth percentage of 1311% and June 2017 was the month with the lowest growth percentage of -91%.
Dataset 2
7. Given another dataset (Hacker News) please use table “stories” to create monthly retention cohorts (the groups, or cohorts, can be defined based upon the date that a user/author started a story) and then how many of them (%) coming back for the following monts in 2014. After analyzing the retention cohort, is there any interesting insight that we can get?
By the Graph above we can clearly see the average activity dropped 85,5% for 0 to 1 month users age, there is a tendency that users don't create stories for the second time. So probably there’s the problem with bad first impression user experience