RevoU Final Project
Project Summary
The purpose of this project is to build a holistic view of iflix customers, analyze the data, and generate useful insights to drive an informed decision-making process. The dataset originally came from Kaggle. The project covers five main things:
- Cleaned the raw data using various methods to make sure it is suitable for analysis.
- Conducted exploratory data analysis (EDA) to find out which variables were driving low watch percentages.
- Built a customer segmentation to help determine the quality of users.
- Built an algorithm to perform a market basket analysis.
- Created a predictive model with the highest predictive power to help increase the quality of users.
Project Files
For a more comprehensive analysis and visualization, please open the project files.
Project Background
Increasingly, businesses today are being forced to take a more customer-centric approach to their business strategy, in response to rapidly changing consumer behavior and the compelling move to digital platforms. To respond and make better, faster decisions, businesses need a clear view of their decision-making strategies and the ability to apply risk analytics, strategy improvements, automation, and advanced analytics. Before a company can personalize experiences for customers, it must first have a holistic view of the customer, which can often be a challenge. The purpose of this project is to build that holistic view of iflix customers, analyze the data, and generate useful insights to drive an informed decision-making process.
Data Scope
We used data from the iflix movie streaming datasets. They cover global iflix consumers in 2018-2019 and contain 110k unique registered users from 40 countries (55k female and 55k male), 17k unique films, and 542k recorded activities.
- Data_users_dataset
  - UserID : Unique identifier of the user
  - Country Code : Code of the country where the user registered
  - Gender : User's gender
  - Age : User's age
  - Interest : User's interest (genre)
- Plays_dataset
  - platform : Platform of consumption
  - minuted_viewed : Total number of minutes viewed
- Assets_dataset
  - AssetID : Unique identifier of video content, at the most granular level (a movie or an episode of a TV series)
  - Showtype : Type of content, whether the asset is a movie or an episode of a TV series
  - Genre : Genre of the content
  - Source_language : Original language of the content
  - Running_minutes : Runtime of the content
Data Analysis
We start by describing the cleaned dataset. Of the 542k user-watching activities, 50% cover only 0-30% of the asset's total duration, which indicates a lack of user loyalty and an inability to keep users on the platform.
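As a minimal sketch of how this share could be computed, assuming each play is reduced to a pair of minutes viewed and asset runtime (the function names and the cap at 100% are our own assumptions, not taken from the project files):

```python
def pct_watched(minutes_viewed, running_minutes):
    """Percentage of an asset's runtime that was watched, capped at 100."""
    if running_minutes <= 0:
        return None  # runtime missing or invalid, cannot compute a percentage
    return min(100.0 * minutes_viewed / running_minutes, 100.0)


def share_at_or_below(plays, threshold=30.0):
    """Fraction of plays whose watch percentage is at or below `threshold`.

    plays: iterable of (minutes_viewed, running_minutes) pairs.
    """
    pcts = [pct_watched(viewed, runtime) for viewed, runtime in plays]
    pcts = [p for p in pcts if p is not None]
    if not pcts:
        return 0.0
    return sum(1 for p in pcts if p <= threshold) / len(pcts)
```

Calling `share_at_or_below(plays)` on the full play log would give the fraction of activities in the 0-30% range described above.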
We then built a customer segmentation to determine user quality based on three features: the total number of assets (films) a user watched, the average percentage of film duration watched, and the average number of films re-watched. We found four types of users, which we named as follows:
- Low quality users (70,933 users): they watch 2 films, finish only 25% of a film's duration on average, and never re-watch the same film.
- Medium quality users (17,026 users): they watch 15 films, finish 57% of a film's duration on average, and never re-watch the same film.
- High quality users (86 users): they watch 43 films, finish 79% of a film's duration on average, and re-watch 2 films.
- Loyal quality users (1,557 users): they watch 86 films, finish 92% of a film's duration on average, and re-watch 20 films.
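The three segmentation features could be derived per user roughly as follows. This is only a sketch: the input layout (one tuple per play) and the definition of a re-watched film as an asset with more than one play by the same user are our assumptions, and the clustering step itself is not shown.

```python
from collections import defaultdict


def user_features(plays):
    """Build segmentation features per user.

    plays: iterable of (user_id, asset_id, pct_watched) tuples, one per play.
    Returns {user_id: (n_assets, avg_pct_watched, n_rewatched_assets)}.
    """
    per_user = defaultdict(lambda: defaultdict(list))
    for user_id, asset_id, pct in plays:
        per_user[user_id][asset_id].append(pct)

    features = {}
    for user_id, assets in per_user.items():
        all_pcts = [p for pcts in assets.values() for p in pcts]
        n_assets = len(assets)  # distinct assets watched
        avg_pct = sum(all_pcts) / len(all_pcts)  # mean watch percentage
        # Assumption: an asset counts as re-watched if it was played more than once.
        n_rewatched = sum(1 for pcts in assets.values() if len(pcts) > 1)
        features[user_id] = (n_assets, avg_pct, n_rewatched)
    return features
```

The resulting feature tuples are what a clustering method would take as input to produce the four segments.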
The data shows that low quality users make up 70% of our total users, so we checked the distribution of customer segments in each country and found that low quality users dominate in every country. We conclude that this is a global issue for iflix, not a local one.
We then formed hypotheses about what causes most users to have such a low watch percentage:
- The low watch percentage is caused by users watching films that do not match their interests.
- The low watch percentage is caused by an application error.
- The low watch percentage is caused by rating, meaning users dislike the films.
For the first hypothesis, we found that in every customer segment users already watch films that match their interests. Even so, iflix still experiences low user activity.
For the second hypothesis, public data limitations keep us from digging deeper, but we found signs that make it very plausible: several films have a very low watch percentage (how long users watch compared with the film's duration) despite a large number of viewers. For example, Asset 1708 and Asset 4282 were watched by more than 200 users, yet their watch percentage is below 5%. This suggests users might have had technical issues while watching, and the relevant team should investigate further.
For the third hypothesis, we initially could not find public ratings for iflix movies. The reason is shown in the snapshot from the Android version of the iflix mobile app: iflix has never given users a place to leave ratings and comments. We therefore created an implicit rating to assess the quality of films.
We clustered films by the total number of users who watched them, the average re-watch activity, and the film's genre, and came up with four clusters of films:
- 5-star rating: 5 films, watched by 6,662 users and re-watched 16 times on average, with the genres TV_Drama_Chinese and Movies_Comedy_Indonesian.
- 4-star rating: 150 films, watched by 1,292 users and re-watched 5 times on average, with the genres TV_Drama_Chinese and Tv_Kids_English.
- 3-star rating: 10,440 films, watched by 1,292 users and re-watched 5 times on average, with the genre TV_Drama_Indonesian.
- Abandoned films: 6,675 films, watched by 4 users, with an average re-watch value of -1, which means users never watched these films to the end, with the genre TV_Drama_Indonesian.
From the hypotheses we tested, we arrived at the question "How can we increase users' watch percentage?", which we approached in these steps:
- Finding factors that correlate with watch percentage
- How to decrease Not_match values
- How to increase no_of_assets values
- How to increase rating values
- Forecasting data
We discovered that No_of_assets (the average total number of assets watched by each user), not_match (the average number of assets watched by each user that do not match the user's interest), and rating (the implicit rating used to assess film quality) are the factors correlated with pct_watch (how long users watch compared with the film's duration).
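A correlation check of this kind can be sketched as below. We assume the Pearson coefficient here; the project files may use a different measure, and the helper names are illustrative only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def correlations_with(target, features):
    """Correlate every named feature column with the target column.

    features: {name: list of values}, each aligned with `target`.
    """
    return {name: pearson(values, target) for name, values in features.items()}
```

For example, `correlations_with(pct_watch, {"no_of_assets": ..., "not_match": ..., "rating": ...})` would return one coefficient per candidate factor.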
Because not_match (the average number of assets watched by each user that do not match the user's interest) is correlated with pct_watch, we examined the iflix catalog and learned that iflix assets do not accommodate users' interests. TV shows make up 80-95% of iflix assets (right graph, in pink), whereas 40-90% of iflix users are more interested in movies (left graph, in red). This could increase the risk of users choosing assets that differ from their interests.
To increase the number of assets watched, we built a hybrid-model algorithm that serves as a recommendation system for each user, based on the user's profile, the asset's profile, and the asset's rating. For example, a user who has watched the film "journey of men" will be offered other related films. A/B testing is needed before releasing the algorithm.
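To illustrate the idea of a hybrid score, here is a deliberately simplified sketch. It is not the algorithm in the project files: the equal weights, the exact genre match, and the dictionary layout are all assumptions made for illustration.

```python
def recommend(user_interest, watched, assets, top_n=3, w_profile=0.5, w_rating=0.5):
    """Rank unwatched assets by a weighted mix of profile match and rating.

    user_interest: the user's preferred genre (user profile).
    watched: set of asset ids the user has already watched.
    assets: {asset_id: {"genre": str, "rating": float between 1 and 5}}.
    The weights w_profile and w_rating are hypothetical tuning parameters.
    """
    scored = []
    for asset_id, info in assets.items():
        if asset_id in watched:
            continue  # do not recommend what the user has already seen
        profile_score = 1.0 if info["genre"] == user_interest else 0.0
        rating_score = info["rating"] / 5.0  # scale the implicit rating to 0..1
        score = w_profile * profile_score + w_rating * rating_score
        scored.append((score, asset_id))
    # Highest score first; ties are broken by asset id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [asset_id for _, asset_id in scored[:top_n]]
```

In a real system the weights would be chosen by experiment, which is where the A/B testing mentioned above comes in.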
Because we only have implicit ratings for the films, increasing the total number of users who watch a film and the number of re-watches per user will increase average rating values. For a better recommendation system algorithm, however, we suggest adding a rating space and a comment box to our platform to help us understand users' individual tastes.
We built a model to predict watch percentage using multivariate linear regression based on no_of_asset, asset_not_match, and rating.
If we increase no_of_asset (the average total number of assets watched by each user) from 6 to 9, increase the average rating from 3 to 5, and decrease not_match (the average number of assets watched by each user that do not match the user's interest) from 1 to 0, the model predicts that pct_watch (how long users watch compared with the film's duration) will increase. This would shift our customer clusters from being dominated by low quality users toward high quality users, which would help the business improve user stickiness.
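The scenario comparison above can be sketched with a small least-squares fit. The toy data below is synthetic, generated from a made-up linear rule purely so the code can run; it does not reproduce the project's coefficients or results, and `fit_ols` and `predict` are illustrative helper names.

```python
def fit_ols(rows, targets):
    """Ordinary least squares with an intercept, solved via the normal equations.

    rows: list of feature lists; targets: list of numbers.
    Returns [intercept, coef_1, ..., coef_k].
    """
    X = [[1.0] + [float(v) for v in row] for row in rows]
    k = len(X[0])
    # Augmented system (X^T X | X^T y).
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * y for x, y in zip(X, targets))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("features are collinear; cannot fit")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def predict(coefs, row):
    """Predicted target for one feature row."""
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


# Synthetic toy data: columns are no_of_asset, not_match, rating.
# Targets follow an invented rule, NOT the project's fitted model.
rows = [[2, 1, 1], [6, 1, 3], [9, 0, 5], [4, 0, 2], [15, 1, 4], [3, 2, 3]]
targets = [10 + 2 * n - 5 * m + 4 * r for n, m, r in rows]

coefs = fit_ols(rows, targets)
current = predict(coefs, [6, 1, 3])   # no_of_asset=6, not_match=1, rating=3
scenario = predict(coefs, [9, 0, 5])  # no_of_asset=9, not_match=0, rating=5
```

Comparing `scenario` with `current` shows how the predicted watch percentage moves when the three inputs change as described.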
Recommendation
- Evaluate application complaints. Check abandoned assets with a high number of viewers to make sure the app does not have errors or crashes.
- Add a rating space and comment box. This will help us understand users' individual tastes for a better recommendation system algorithm.
- Run further sentiment analysis on social media. Sentiment analysis of original content will also help show whether the market loves the original content.
- Add more variety of movies or shift market targets. To get more users to engage with our platform, iflix could add more variety of movies or shift its marketing targets to match the assets it owns.
- Use a hybrid recommendation system to increase the number of assets watched, based on the user's profile, the asset's profile, and the asset's rating.
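The first recommendation, checking assets that many users start but abandon, could be sketched as follows. The default thresholds mirror the Asset 1708 and Asset 4282 example (more than 200 users, watch percentage below 5%); the input layout and function name are our assumptions.

```python
from collections import defaultdict


def flag_suspect_assets(plays, min_users=200, max_avg_pct=5.0):
    """Find assets with many distinct viewers but a very low average watch percentage.

    plays: iterable of (user_id, asset_id, pct_watched) tuples, one per play.
    Returns a list of (asset_id, n_users, avg_pct), lowest average first.
    """
    users = defaultdict(set)
    pct_sum = defaultdict(float)
    n_plays = defaultdict(int)
    for user_id, asset_id, pct in plays:
        users[asset_id].add(user_id)
        pct_sum[asset_id] += pct
        n_plays[asset_id] += 1

    flagged = []
    for asset_id, viewers in users.items():
        avg_pct = pct_sum[asset_id] / n_plays[asset_id]
        if len(viewers) >= min_users and avg_pct < max_avg_pct:
            flagged.append((asset_id, len(viewers), avg_pct))
    return sorted(flagged, key=lambda row: row[2])
```

The flagged list is only a starting point for the relevant team; a low watch percentage can also come from the content itself rather than from a technical error.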