Data Visualization: Python Seaborn part 1

  1. In analytics, visualizing the data is one of the most effective ways to gain insight. We have already used Matplotlib, a 2D plotting library for drawing a wide range of graphs and charts.
  2. A complementary package built on top of Matplotlib is Seaborn, which provides a higher-level interface for drawing statistical graphics.

Seaborn

• Is a Python data visualization library for statistical plotting

• Is based on matplotlib (built on top of matplotlib)

• Is designed to work with NumPy and pandas data structures

• Provides a high-level interface for drawing attractive and informative statistical graphics.

• Comes equipped with preset styles and color palettes so you can create complex, aesthetically pleasing charts with a few lines of code.
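
As a small illustration of the preset styles and palettes mentioned above, the sketch below inspects one built-in palette and one built-in style. The variable names `palette` and `darkgrid` are chosen here for illustration only; nothing is plotted.

```python
import seaborn as sns

# "deep" is one of seaborn's built-in qualitative palettes; color_palette
# returns a list-like object of (r, g, b) tuples with components in [0, 1].
palette = sns.color_palette("deep")
print(len(palette), palette[0])

# axes_style returns the matplotlib rc parameters that a named style would
# apply, without changing any global settings.
darkgrid = sns.axes_style("darkgrid")
print(darkgrid["axes.grid"])
```

Passing a different name, such as "whitegrid" for the style or "pastel" for the palette, returns the corresponding preset in the same way.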

Seaborn vs Matplotlib

Seaborn is built on top of Python’s core visualization library matplotlib, but it’s meant to serve as a complement, not a replacement.

• In most cases, we’ll still use matplotlib for simple plotting

• Seaborn’s official website puts it this way: “If matplotlib ‘tries to make easy things easy and hard things possible’, seaborn tries to make a well-defined set of hard things easy too.”

• Seaborn helps address two major pain points of Matplotlib:

* Default Matplotlib parameters

* Working with data frames

In [ ]:

# Let's compare the same plot drawn with plain Matplotlib and with Seaborn's theme applied

In [ ]:

# Matplotlib 
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

x = np.linspace(0, 10, 1000)
plt.plot(x, np.sin(x), x, np.cos(x));
plt.show()

In [ ]:

# Seaborn 
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

sns.set()  # apply seaborn's default theme to every matplotlib plot that follows

x = np.linspace(0, 10, 1000)
# print(x)
plt.plot(x, np.sin(x), x, np.cos(x));
plt.show()
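
The two cells above run the same plotting call; the only difference is `sns.set()`. A rough way to see what that call does is to compare one Matplotlib rc parameter before and after it. The choice of `axes.facecolor` is just an example, and `before`/`after` are names introduced here.

```python
import matplotlib as mpl
import seaborn as sns

mpl.rcdefaults()                           # start from matplotlib's built-in defaults
before = mpl.rcParams["axes.facecolor"]

sns.set()                                  # switch to seaborn's default theme
after = mpl.rcParams["axes.facecolor"]

# The theme works by overwriting matplotlib's global rc parameters,
# which is why an unchanged plt.plot(...) call looks different afterwards.
print(before, "->", after)
```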

Data visualization using Seaborn

  1. Visualizing statistical relationships
  2. Visualizing categorical data

Visualizing statistical relationships (that is, relationships between variables)

Statistical analysis is the process of understanding how the variables in a dataset relate to each other and how those relationships, in turn, depend on other variables.

relplot()

• This is a figure-level function that relies on two axes-level functions for visualizing statistical relationships:

* scatterplot()

* lineplot()

• By default (kind = 'scatter'), it draws a scatterplot()
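
A minimal sketch of the figure-level versus axes-level distinction, using a small made-up frame (`toy`, whose values are illustrative and not taken from any dataset):

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up data standing in for a real dataset.
toy = pd.DataFrame({
    "total_bill": [10.0, 15.5, 20.0, 25.5, 30.0],
    "tip": [1.5, 2.0, 3.0, 3.5, 5.0],
})

# Figure-level: relplot creates its own figure and returns a FacetGrid.
# No kind is given, so it falls back to a scatter plot.
grid = sns.relplot(x="total_bill", y="tip", data=toy)

# Axes-level: scatterplot draws onto a single matplotlib Axes and returns it.
fig, ax = plt.subplots()
returned_ax = sns.scatterplot(x="total_bill", y="tip", data=toy, ax=ax)

print(type(grid).__name__, returned_ax is ax)
```

Because relplot owns its figure, it is the one that accepts faceting arguments such as col; the axes-level functions are the ones to use when the plot has to go into an existing Matplotlib layout.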

In [ ]:

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
sns.set()

In [ ]:

df = sns.load_dataset('tips')
df.head()

Out[ ]:

   total_bill   tip     sex smoker  day    time  size
0       16.99  1.01  Female     No  Sun  Dinner     2
1       10.34  1.66    Male     No  Sun  Dinner     3
2       21.01  3.50    Male     No  Sun  Dinner     3
3       23.68  3.31    Male     No  Sun  Dinner     2
4       24.59  3.61  Female     No  Sun  Dinner     4

In [ ]:

df.tail()

Out[ ]:

     total_bill   tip     sex smoker   day    time  size
239       29.03  5.92    Male     No   Sat  Dinner     3
240       27.18  2.00  Female    Yes   Sat  Dinner     2
241       22.67  2.00    Male    Yes   Sat  Dinner     2
242       17.82  1.75    Male     No   Sat  Dinner     2
243       18.78  3.00  Female     No  Thur  Dinner     2

In [ ]:

sns.relplot(x = 'total_bill', y = 'tip', data = df, kind = 'scatter')
plt.show()  # The scatter shows a positive relationship: larger bills tend to come with larger tips.

In [ ]:

# We can also change kind to line. 
sns.relplot(x = 'total_bill', y = 'tip', data = df, kind = 'line')
plt.show()  # The same upward trend drawn as a line; the rows are sorted by total_bill before plotting.

In [ ]:

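# Parameters
# • x, y: names of the columns to plot on each axis.
# • data: the DataFrame that holds those columns.
# • hue: colors the points by the values of a column.
# • size: scales the marker size by the values of a column.
# • col: draws one subplot (facet) per value of a column, e.g. one per sex.
# • style: varies the marker style by the values of a column.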
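
The parameters listed above can be combined in a single call. The sketch below is one way to do that, on a made-up frame (`toy`) that only borrows the column names of the tips data; its values are illustrative.

```python
import pandas as pd
import seaborn as sns

# Made-up rows that reuse the tips column names; not real data.
toy = pd.DataFrame({
    "total_bill": [10.0, 15.5, 20.0, 25.5, 30.0, 12.0, 18.0, 40.0],
    "tip":        [1.5, 2.0, 3.0, 3.5, 5.0, 2.0, 2.5, 6.0],
    "sex":        ["Male", "Female", "Male", "Female",
                   "Male", "Female", "Male", "Female"],
    "time":       ["Lunch", "Lunch", "Dinner", "Dinner",
                   "Lunch", "Dinner", "Dinner", "Lunch"],
    "size":       [2, 3, 2, 4, 2, 3, 5, 6],
})

grid = sns.relplot(
    x="total_bill", y="tip", data=toy,
    hue="time",      # color encodes Lunch vs Dinner
    style="time",    # marker shape repeats the same split
    size="size",     # marker area grows with party size
    col="sex",       # one subplot per sex
)

# One row of subplots, one column per distinct value of "sex".
print(grid.axes.shape)
```

Mapping hue and style to the same column is redundant on purpose here: it keeps the groups distinguishable when the plot is printed in grayscale.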

In [ ]:

sns.relplot(x = 'total_bill', y = 'tip', data = df, hue = 'time')
plt.show()  # hue colors the points by time, separating Lunch from Dinner.

In [ ]:

sns.relplot(x = 'total_bill', y = 'tip', data = df, hue = 'time', style = 'sex')
plt.show()  # style varies the marker: circles are Male and crosses (x) are Female.

In [ ]:

sns.relplot(x = 'total_bill', y = 'tip', data = df, hue = 'time', col='sex')
plt.show()  # col draws two separate subplots, one for Male and one for Female.

Let’s do the same with lines.

In [ ]:

import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
sns.set()

In [ ]:

print(sns.get_dataset_names())
['anagrams', 'anscombe', 'attention', 'brain_networks', 'car_crashes', 'diamonds', 'dots', 'exercise', 'flights', 'fmri', 'gammas', 'geyser', 'iris', 'mpg', 'penguins', 'planets', 'tips', 'titanic']

In [ ]:

df = sns.load_dataset('flights')
df.head()

Out[ ]:

   year month  passengers
0  1949   Jan         112
1  1949   Feb         118
2  1949   Mar         132
3  1949   Apr         129
4  1949   May         121

In [ ]:

df.tail()

Out[ ]:

     year month  passengers
139  1960   Aug         606
140  1960   Sep         508
141  1960   Oct         461
142  1960   Nov         390
143  1960   Dec         432

In [ ]:

sns.relplot(x = 'year', y = 'passengers', data = df, kind = 'line')
plt.show()  # The solid line is the mean passenger count per year (averaged over the months); the shaded band is the 95% confidence interval around that mean.

In [ ]:

sns.lineplot(x = 'year', y = 'passengers', data = df)
plt.show()

In [ ]:

sns.relplot(x = 'year', y = 'passengers', data = df, kind = 'line', hue = 'month')
plt.show()

In [ ]:

sns.relplot(x = 'year', y = 'passengers', data = df, kind = 'line', 
            col = 'month')
plt.show()
