RevoU Data Visualization Project


Project Summary

This project improves my understanding of data visualization as a way to analyze data and generate useful insights that drive informed decision-making. I did two main things in this project:

  1. Using Tableau to connect and combine the data, filter, create hierarchies, use calculated fields, generate visual exploratory data analysis, and set up interactive dashboards
  2. Using Google Data Studio to connect and combine the data, filter, create hierarchies, use calculated fields, generate visual exploratory data analysis, and set up interactive dashboards

Project Files

For a more comprehensive analysis and visualization, please open the project files.

Project Background, Data set

Project Background

Data visualization is useful for data cleaning, exploring data structure, detecting outliers and unusual groups, identifying trends and clusters, spotting local patterns, evaluating modeling output, and presenting results. It is essential for exploratory data analysis and data mining to check data quality and to help analysts become familiar with the structure and features of the data before them. To test our capability in data visualization, we need to visualize some data in the following assignment. I use Tableau Public as the main data visualization tool, alongside Google Data Studio.

Dataset 1 | Brazilian E-Commerce Public Dataset by Olist

Provided on Kaggle, this is a Brazilian e-commerce public dataset of orders made at Olist Store. The dataset has information on 100k orders from 2016 to 2018 made at multiple marketplaces in Brazil. Its features allow viewing an order from multiple dimensions: from order status, price, payment, and freight performance to customer location, product attributes, and finally reviews written by customers.
This project uses only 6 tables of the Brazilian e-commerce dataset, including order_datasets, customers_datasets, order_items_dataset, products_dataset, and product_category_name_translation.

Dataset 2 | Singapore AirBnB listing

Provided on Kaggle. According to the website, the data was collected on 28 August 2019 (the dataset is not mine). There are 7907 samples, but some features/variables have missing data.

Data Analysis

Data set 1

From October 2016 to August 2018, the number of orders followed an upward trend, peaking in March 2018 with 347 orders.


Among the top 10 product categories by sales, Bed Bath Table ranks first, followed by Health Beauty in second place and Sports Leisure in third.


Price group by order: for the majority of orders, sales per order are worth more than $150.


In this visualization, sorting uses a custom index at the deepest level. The sort ranks the top 3 Customer City values and repeats within each Customer State. It is based on the count of unique Customer IDs.
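The per-state ranking described here can be sketched outside Tableau as well. The following is a minimal Python sketch, not the Tableau configuration itself: the function name `top_cities_per_state`, the `(customer_id, city, state)` row layout, and the sample rows are all hypothetical and only illustrate the idea of counting unique customers per city and keeping the top 3 within each state.

```python
from collections import defaultdict


def top_cities_per_state(rows, n=3):
    """Return the top-n cities per state, ranked by distinct customer count.

    rows: iterable of (customer_id, city, state) tuples (hypothetical layout).
    """
    # Collect the distinct customer IDs seen for each (state, city) pair.
    customers = defaultdict(set)
    for customer_id, city, state in rows:
        customers[(state, city)].add(customer_id)

    # Group the distinct counts by state.
    by_state = defaultdict(list)
    for (state, city), ids in customers.items():
        by_state[state].append((city, len(ids)))

    # Within each state, sort by count (descending), break ties by city name,
    # and keep only the first n entries.
    return {
        state: sorted(cities, key=lambda c: (-c[1], c[0]))[:n]
        for state, cities in by_state.items()
    }


# Made-up sample rows, for illustration only.
sample = [
    ("c1", "sao paulo", "SP"),
    ("c2", "sao paulo", "SP"),
    ("c2", "sao paulo", "SP"),  # a repeated customer is counted once
    ("c3", "campinas", "SP"),
    ("c4", "rio de janeiro", "RJ"),
]
result = top_cities_per_state(sample)
```

Because the ranking restarts for every state, each state gets its own top 3 rather than sharing one global list.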


Segmentation is carried out on the Payment Value from the Order Payments Dataset. The price groups are divided into 4: 0-50 (low), 51-100 (medium), 101-1,000 (high), and >= 1,001 (very high). The price group with the highest demand is the high group, with a price range of 101-1,000 and a total of 2,385 orders.
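The four price groups can be expressed as a small bucketing function. This is a minimal sketch in Python rather than the Tableau calculated field used in the project. The function name `price_group` and the sample values are hypothetical, and it assumes that fractional values between two listed ranges (for example 50.5) fall into the higher group.

```python
from collections import Counter


def price_group(payment_value):
    """Map a payment value to one of the four price groups."""
    if payment_value <= 50:
        return "low"
    if payment_value <= 100:
        return "medium"
    if payment_value <= 1000:
        return "high"
    return "very high"


# Made-up payment values, for illustration only.
payments = [25.0, 50.0, 51.0, 100.0, 101.0, 999.9, 1001.0]

# Count how many payments land in each group.
group_counts = Counter(price_group(value) for value in payments)
```

Counting the group labels this way mirrors the "total orders per price group" figure reported for the dashboard.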

Dashboards

Data set 2

- Tableau


This visualization shows that the data is not evenly distributed: the histogram is positively skewed, and the most frequent prices fall in the $50-100 range.


The neighborhood group visualization of total listings by price category shows that Central Region is the neighborhood group with the most listings.


The room type visualization of total listings by price category shows that Private Room is the room type with the most listings.


In this visualization, I plot the neighborhoods based on longitude and latitude data. I added color and size differences to provide more information. The darker the color, the more expensive the neighborhood; the bigger the size, the more frequently the neighborhood is listed. The most expensive neighborhood is Tuas, and overall most neighborhoods are located in Central Region.

- Google Data Studio


This visualization shows that the data is not evenly distributed: the histogram is positively skewed, and the most frequent prices fall in the $50-100 range.


The neighborhood group visualization of total listings by price category shows that Central Region is the neighborhood group with the most listings.


The room type visualization of total listings by price category shows that Private Room is the room type with the most listings.


In this visualization, I plot the listing names based on geodata (a calculated field that concatenates the latitude and longitude fields). I color each point by its neighborhood and size it by minimum price (a calculated field that multiplies price by minimum nights) to provide more information. The bigger the size, the higher the minimum price. The most expensive listing is Corner Terrace, located in Bedok, with a minimum price of $1,825,000 SGD.
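The two calculated fields mentioned here are simple enough to sketch in a few lines. The Python below is only an illustration of the logic, not the Google Data Studio formula syntax. The function names `geodata` and `min_price`, the comma separator between latitude and longitude, and the sample listing values are assumptions.

```python
def geodata(latitude, longitude):
    """Concatenate latitude and longitude into one 'lat,long' string."""
    return f"{latitude},{longitude}"


def min_price(price, minimum_nights):
    """Minimum cost of a stay: nightly price times minimum nights."""
    return price * minimum_nights


# Made-up listing, for illustration only.
listing = {"latitude": 1.3, "longitude": 103.8, "price": 100, "minimum_nights": 3}

location = geodata(listing["latitude"], listing["longitude"])
cost = min_price(listing["price"], listing["minimum_nights"])
```

Sizing the points by this product means a cheap listing with a very long minimum stay can appear larger than an expensive listing with a short one.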

Dashboards