Sushant Ovhal
updated on 17 Oct 2022
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
auto= pd.read_csv("auto_clean.csv")
print(auto)
     symboling  normalized-losses         make aspiration num-of-doors  \
0            3                122  alfa-romero        std          two
1            3                122  alfa-romero        std          two
2            1                122  alfa-romero        std          two
3            2                164         audi        std         four
4            2                164         audi        std         four
..         ...                ...          ...        ...          ...
196         -1                 95        volvo        std         four
197         -1                 95        volvo      turbo         four
198         -1                 95        volvo        std         four
199         -1                 95        volvo      turbo         four
200         -1                 95        volvo      turbo         four

      body-style drive-wheels engine-location  wheel-base    length  ...  \
0    convertible          rwd           front        88.6  0.811148  ...
1    convertible          rwd           front        88.6  0.811148  ...
2      hatchback          rwd           front        94.5  0.822681  ...
3          sedan          fwd           front        99.8  0.848630  ...
4          sedan          4wd           front        99.4  0.848630  ...
..           ...          ...             ...         ...       ...  ...
196        sedan          rwd           front       109.1  0.907256  ...
197        sedan          rwd           front       109.1  0.907256  ...
198        sedan          rwd           front       109.1  0.907256  ...
199        sedan          rwd           front       109.1  0.907256  ...
200        sedan          rwd           front       109.1  0.907256  ...

     compression-ratio  horsepower  peak-rpm  city-mpg  highway-mpg    price  \
0                  9.0       111.0    5000.0        21           27  13495.0
1                  9.0       111.0    5000.0        21           27  16500.0
2                  9.0       154.0    5000.0        19           26  16500.0
3                 10.0       102.0    5500.0        24           30  13950.0
4                  8.0       115.0    5500.0        18           22  17450.0
..                 ...         ...       ...       ...          ...      ...
196                9.5       114.0    5400.0        23           28  16845.0
197                8.7       160.0    5300.0        19           25  19045.0
198                8.8       134.0    5500.0        18           23  21485.0
199               23.0       106.0    4800.0        26           27  22470.0
200                9.5       114.0    5400.0        19           25  22625.0

     city-L/100km  horsepower-binned  diesel  gas
0       11.190476             Medium       0    1
1       11.190476             Medium       0    1
2       12.368421             Medium       0    1
3        9.791667             Medium       0    1
4       13.055556             Medium       0    1
..            ...                ...     ...  ...
196     10.217391             Medium       0    1
197     12.368421               High       0    1
198     13.055556             Medium       0    1
199      9.038462             Medium       1    0
200     12.368421             Medium       0    1

[201 rows x 29 columns]
auto.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 201 entries, 0 to 200
Data columns (total 29 columns):
 #   Column             Non-Null Count  Dtype
---  ------             --------------  -----
 0   symboling          201 non-null    int64
 1   normalized-losses  201 non-null    int64
 2   make               201 non-null    object
 3   aspiration         201 non-null    object
 4   num-of-doors       201 non-null    object
 5   body-style         201 non-null    object
 6   drive-wheels       201 non-null    object
 7   engine-location    201 non-null    object
 8   wheel-base         201 non-null    float64
 9   length             201 non-null    float64
 10  width              201 non-null    float64
 11  height             201 non-null    float64
 12  curb-weight        201 non-null    int64
 13  engine-type        201 non-null    object
 14  num-of-cylinders   201 non-null    object
 15  engine-size        201 non-null    int64
 16  fuel-system        201 non-null    object
 17  bore               201 non-null    float64
 18  stroke             197 non-null    float64
 19  compression-ratio  201 non-null    float64
 20  horsepower         201 non-null    float64
 21  peak-rpm           201 non-null    float64
 22  city-mpg           201 non-null    int64
 23  highway-mpg        201 non-null    int64
 24  price              201 non-null    float64
 25  city-L/100km       201 non-null    float64
 26  horsepower-binned  200 non-null    object
 27  diesel             201 non-null    int64
 28  gas                201 non-null    int64
dtypes: float64(11), int64(8), object(10)
memory usage: 45.7+ KB
auto[auto == '?']
index | symboling | normalized-losses | make | aspiration | num-of-doors | body-style | drive-wheels | engine-location | wheel-base | length | ... | compression-ratio | horsepower | peak-rpm | city-mpg | highway-mpg | price | city-L/100km | horsepower-binned | diesel | gas |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
1 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
2 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
3 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
4 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
196 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
197 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
198 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
199 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
200 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
201 rows × 29 columns
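The mask above returns a full frame that is NaN wherever a cell is not `'?'`, which is hard to scan at 201 × 29. A minimal sketch of a more compact check follows; it counts the `'?'` cells per column instead. The frame `toy` and its values are made up for illustration and are not taken from `auto_clean.csv`.

```python
import pandas as pd

# Toy frame standing in for the auto data; values are illustrative only.
toy = pd.DataFrame({
    "make": ["audi", "bmw", "?"],
    "price": ["13950", "?", "?"],
    "doors": ["four", "two", "two"],
})

# Comparing to '?' gives a boolean frame; summing counts the True cells per column.
placeholder_counts = (toy == "?").sum()
print(placeholder_counts)
```

A column whose count is zero contains no `'?'` placeholders, so a single glance at the result replaces scrolling through the masked frame.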
autoclean = auto.replace('?', np.nan)
autoclean
index | symboling | normalized-losses | make | aspiration | num-of-doors | body-style | drive-wheels | engine-location | wheel-base | length | ... | compression-ratio | horsepower | peak-rpm | city-mpg | highway-mpg | price | city-L/100km | horsepower-binned | diesel | gas |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 3 | 122 | alfa-romero | std | two | convertible | rwd | front | 88.6 | 0.811148 | ... | 9.0 | 111.0 | 5000.0 | 21 | 27 | 13495.0 | 11.190476 | Medium | 0 | 1 |
1 | 3 | 122 | alfa-romero | std | two | convertible | rwd | front | 88.6 | 0.811148 | ... | 9.0 | 111.0 | 5000.0 | 21 | 27 | 16500.0 | 11.190476 | Medium | 0 | 1 |
2 | 1 | 122 | alfa-romero | std | two | hatchback | rwd | front | 94.5 | 0.822681 | ... | 9.0 | 154.0 | 5000.0 | 19 | 26 | 16500.0 | 12.368421 | Medium | 0 | 1 |
3 | 2 | 164 | audi | std | four | sedan | fwd | front | 99.8 | 0.848630 | ... | 10.0 | 102.0 | 5500.0 | 24 | 30 | 13950.0 | 9.791667 | Medium | 0 | 1 |
4 | 2 | 164 | audi | std | four | sedan | 4wd | front | 99.4 | 0.848630 | ... | 8.0 | 115.0 | 5500.0 | 18 | 22 | 17450.0 | 13.055556 | Medium | 0 | 1 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
196 | -1 | 95 | volvo | std | four | sedan | rwd | front | 109.1 | 0.907256 | ... | 9.5 | 114.0 | 5400.0 | 23 | 28 | 16845.0 | 10.217391 | Medium | 0 | 1 |
197 | -1 | 95 | volvo | turbo | four | sedan | rwd | front | 109.1 | 0.907256 | ... | 8.7 | 160.0 | 5300.0 | 19 | 25 | 19045.0 | 12.368421 | High | 0 | 1 |
198 | -1 | 95 | volvo | std | four | sedan | rwd | front | 109.1 | 0.907256 | ... | 8.8 | 134.0 | 5500.0 | 18 | 23 | 21485.0 | 13.055556 | Medium | 0 | 1 |
199 | -1 | 95 | volvo | turbo | four | sedan | rwd | front | 109.1 | 0.907256 | ... | 23.0 | 106.0 | 4800.0 | 26 | 27 | 22470.0 | 9.038462 | Medium | 1 | 0 |
200 | -1 | 95 | volvo | turbo | four | sedan | rwd | front | 109.1 | 0.907256 | ... | 9.5 | 114.0 | 5400.0 | 19 | 25 | 22625.0 | 12.368421 | Medium | 0 | 1 |
201 rows × 29 columns
auto.isna().sum()
null=auto.isnull().any(axis = 1)
nullvalue=null.index[null.values]
nullvalue
Int64Index([46, 52, 53, 54, 55], dtype='int64')
missingrows = auto.loc[nullvalue, :]
missingrows
index | symboling | normalized-losses | make | aspiration | num-of-doors | body-style | drive-wheels | engine-location | wheel-base | length | ... | compression-ratio | horsepower | peak-rpm | city-mpg | highway-mpg | price | city-L/100km | horsepower-binned | diesel | gas |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
46 | 0 | 122 | jaguar | std | two | sedan | rwd | front | 102.0 | 0.921192 | ... | 11.5 | 262.0 | 5000.0 | 13 | 17 | 36000.0 | 18.076923 | NaN | 0 | 1 |
52 | 3 | 150 | mazda | std | two | hatchback | rwd | front | 95.3 | 0.812110 | ... | 9.4 | 101.0 | 6000.0 | 17 | 23 | 10945.0 | 13.823529 | Low | 0 | 1 |
53 | 3 | 150 | mazda | std | two | hatchback | rwd | front | 95.3 | 0.812110 | ... | 9.4 | 101.0 | 6000.0 | 17 | 23 | 11845.0 | 13.823529 | Low | 0 | 1 |
54 | 3 | 150 | mazda | std | two | hatchback | rwd | front | 95.3 | 0.812110 | ... | 9.4 | 101.0 | 6000.0 | 17 | 23 | 13645.0 | 13.823529 | Low | 0 | 1 |
55 | 3 | 150 | mazda | std | two | hatchback | rwd | front | 95.3 | 0.812110 | ... | 9.4 | 135.0 | 6000.0 | 16 | 23 | 15645.0 | 14.687500 | Medium | 0 | 1 |
5 rows × 29 columns
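The rows with missing values can also be selected in one step with a boolean mask, without first collecting index labels. The sketch below uses a small made-up frame (`toy`, `has_missing`, and `missing_rows` are illustrative names, not from the notebook).

```python
import numpy as np
import pandas as pd

# Toy frame with gaps in two different columns; values are illustrative only.
toy = pd.DataFrame({
    "stroke": [2.68, np.nan, 3.47, np.nan],
    "horsepower-binned": ["Medium", "Low", None, "High"],
})

# Boolean Series: True for rows holding at least one missing value.
has_missing = toy.isna().any(axis=1)

# Boolean indexing selects those rows directly, with no positional lookup.
missing_rows = toy[has_missing]
print(missing_rows.index.tolist())
```

Because the mask is aligned on the index, this form keeps working when the index is not a simple 0..n-1 range.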
auto.isnull().all(axis=0).sum()
0
auto.isnull().all(axis=1).sum()
0
round(auto.isnull().sum().sort_values(ascending =False)/len(auto) *100,2)
stroke               1.99
horsepower-binned    0.50
symboling            0.00
engine-size          0.00
diesel               0.00
city-L/100km         0.00
price                0.00
highway-mpg          0.00
city-mpg             0.00
peak-rpm             0.00
horsepower           0.00
compression-ratio    0.00
bore                 0.00
fuel-system          0.00
num-of-cylinders     0.00
normalized-losses    0.00
engine-type          0.00
curb-weight          0.00
height               0.00
width                0.00
length               0.00
wheel-base           0.00
engine-location      0.00
drive-wheels         0.00
body-style           0.00
num-of-doors         0.00
aspiration           0.00
make                 0.00
gas                  0.00
dtype: float64
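The percentage of missing values per column can be written a little more compactly, since the mean of a boolean column is the fraction of True entries. A small sketch on made-up data (the frame `toy` is an assumption for illustration):

```python
import numpy as np
import pandas as pd

# Toy frame: one of four 'stroke' values is missing; values are illustrative only.
toy = pd.DataFrame({
    "stroke": [2.68, np.nan, 3.47, 3.40],
    "bore": [3.47, 3.47, 2.68, 3.19],
})

# isna() is boolean, so mean() is the missing fraction; scale to percent and sort.
missing_pct = toy.isna().mean().mul(100).round(2).sort_values(ascending=False)
print(missing_pct)
```

This is equivalent to dividing the null counts by `len(...)` and avoids repeating the frame name.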
auto[['normalized-losses','price','peak-rpm','horsepower']] = auto[['normalized-losses','price','peak-rpm','horsepower']].apply(pd.to_numeric)
auto[['normalized-losses','price','peak-rpm','horsepower']].describe()
index | normalized-losses | price | peak-rpm | horsepower |
---|---|---|---|---|
count | 201.00000 | 201.000000 | 201.000000 | 201.000000 |
mean | 122.00000 | 13207.129353 | 5117.665368 | 103.405534 |
std | 31.99625 | 7947.066342 | 478.113805 | 37.365700 |
min | 65.00000 | 5118.000000 | 4150.000000 | 48.000000 |
25% | 101.00000 | 7775.000000 | 4800.000000 | 70.000000 |
50% | 122.00000 | 10295.000000 | 5125.369458 | 95.000000 |
75% | 137.00000 | 16500.000000 | 5500.000000 | 116.000000 |
max | 256.00000 | 45400.000000 | 6600.000000 | 262.000000 |
auto['normalized-losses'] = auto['normalized-losses'].fillna(auto['normalized-losses'].mean())
round(auto.isnull().sum().sort_values(ascending = False)/len(auto) * 100,2)
stroke               1.99
horsepower-binned    0.50
symboling            0.00
engine-size          0.00
diesel               0.00
city-L/100km         0.00
price                0.00
highway-mpg          0.00
city-mpg             0.00
peak-rpm             0.00
horsepower           0.00
compression-ratio    0.00
bore                 0.00
fuel-system          0.00
num-of-cylinders     0.00
normalized-losses    0.00
engine-type          0.00
curb-weight          0.00
height               0.00
width                0.00
length               0.00
wheel-base           0.00
engine-location      0.00
drive-wheels         0.00
body-style           0.00
num-of-doors         0.00
aspiration           0.00
make                 0.00
gas                  0.00
dtype: float64
# Fill any gaps in the remaining numeric columns with each column's mean.
auto['price'] = auto['price'].fillna(auto['price'].mean())
auto['stroke'] = auto['stroke'].fillna(auto['stroke'].mean())
auto['bore'] = auto['bore'].fillna(auto['bore'].mean())
auto['peak-rpm'] = auto['peak-rpm'].fillna(auto['peak-rpm'].mean())
auto['horsepower'] = auto['horsepower'].fillna(auto['horsepower'].mean())
round(auto.isnull().sum().sort_values(ascending = False)/len(auto) * 100,2)
horsepower-binned    0.5
symboling            0.0
engine-size          0.0
diesel               0.0
city-L/100km         0.0
price                0.0
highway-mpg          0.0
city-mpg             0.0
peak-rpm             0.0
horsepower           0.0
compression-ratio    0.0
stroke               0.0
bore                 0.0
fuel-system          0.0
num-of-cylinders     0.0
normalized-losses    0.0
engine-type          0.0
curb-weight          0.0
height               0.0
width                0.0
length               0.0
wheel-base           0.0
engine-location      0.0
drive-wheels         0.0
body-style           0.0
num-of-doors         0.0
aspiration           0.0
make                 0.0
gas                  0.0
dtype: float64
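The column-by-column mean fill can also be done for every numeric column in a single assignment. The following is a sketch on a made-up frame, not the notebook's own code; `toy` and `numeric_cols` are illustrative names.

```python
import numpy as np
import pandas as pd

# Toy frame with gaps in two numeric columns and one text column (illustrative values).
toy = pd.DataFrame({
    "stroke": [2.0, np.nan, 4.0],
    "bore": [3.0, 3.5, np.nan],
    "make": ["audi", None, "bmw"],
})

numeric_cols = toy.select_dtypes(include="number").columns

# Passing a Series of column means to fillna fills each column with its own mean.
toy[numeric_cols] = toy[numeric_cols].fillna(toy[numeric_cols].mean())
print(toy)
```

Text columns are left untouched, so categorical gaps such as a missing `horsepower-binned` label still need separate handling.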
auto[['horsepower-binned']]
auto['horsepower-binned'].unique()
array(['Medium', 'Low', 'High', nan], dtype=object)
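One label in `horsepower-binned` is still missing. In the rows shown earlier, the unlabelled car (row 46) has 262 horsepower, well above the 160 horsepower car labelled `High`. A possible way to fill such a gap is sketched below on four rows copied from those outputs. The threshold rule is an assumption made for illustration; the bin edges originally used to build the column are not shown in this notebook.

```python
import numpy as np
import pandas as pd

# Four (horsepower, label) pairs taken from the outputs above; the last label is missing.
toy = pd.DataFrame({
    "horsepower": [101.0, 111.0, 160.0, 262.0],
    "horsepower-binned": ["Low", "Medium", "High", np.nan],
})

# Assumed rule: the smallest horsepower already labelled 'High' acts as the threshold.
high_floor = toy.loc[toy["horsepower-binned"] == "High", "horsepower"].min()

# Label unbinned rows at or above that threshold as 'High'.
needs_label = toy["horsepower-binned"].isna() & (toy["horsepower"] >= high_floor)
toy.loc[needs_label, "horsepower-binned"] = "High"
print(toy)
```

Rows below the threshold would stay unlabelled under this rule, which keeps the fill conservative.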
auto['horsepower'].astype('category').value_counts()
68.0                19
69.0                10
116.0                9
70.0                 9
110.0                8
95.0                 7
114.0                6
62.0                 6
101.0                6
88.0                 6
76.0                 5
82.0                 5
145.0                5
84.0                 5
97.0                 5
160.0                5
102.0                5
92.0                 4
111.0                4
86.0                 4
123.0                4
90.0                 3
85.0                 3
121.0                3
73.0                 3
182.0                3
207.0                3
152.0                3
161.0                2
156.0                2
155.0                2
162.0                2
94.0                 2
112.0                2
52.0                 2
104.256157635468     2
100.0                2
176.0                2
184.0                2
56.0                 2
175.0                1
200.0                1
154.0                1
48.0                 1
106.0                1
143.0                1
142.0                1
140.0                1
135.0                1
134.0                1
120.0                1
115.0                1
78.0                 1
72.0                 1
64.0                 1
60.0                 1
58.0                 1
55.0                 1
262.0                1
Name: horsepower, dtype: int64
auto['num-of-doors'].astype('category').value_counts()
# Fill any missing door counts with 'four'.
auto['num-of-doors'] = auto['num-of-doors'].fillna('four')
auto['num-of-doors'].astype('category').value_counts()
auto.to_csv('Cleandata_auto.csv')
auto.hist(figsize=(30,30))
plt.show()
plt.figure(figsize=(15,10))
sns.heatmap(auto.select_dtypes(include='number').corr(),annot =True,cmap='coolwarm')
plt.title("Numerical features")
plt.show()
plt.figure(figsize=(15,10))
sns.countplot(x=auto['normalized-losses'])
plt.title("Count of normalized-losses values")
plt.show()
auto['normalized-losses'].describe()
count    201.00000
mean     122.00000
std       31.99625
min       65.00000
25%      101.00000
50%      122.00000
75%      137.00000
max      256.00000
Name: normalized-losses, dtype: float64
sns.displot(auto['normalized-losses'],kde=True)
plt.title("Distribution of losses")
plt.show()
plt.figure(figsize=(10,7))
sns.heatmap(auto.select_dtypes(include='number').corr(),annot = True,cmap='coolwarm')
plt.title("Correlation of all numeric features")
plt.show()
auto.drop(['symboling','normalized-losses','compression-ratio','peak-rpm'],axis=1,inplace=True)
plt.figure(figsize=(10,7))
sns.heatmap(auto.select_dtypes(include='number').corr(),annot = True,cmap='coolwarm')
plt.title("Correlation of all numeric features")
plt.show()
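A heatmap of many columns can be hard to read when only the relationship with one target matters. As a sketch, the correlations with `price` can be pulled out of the matrix and ranked by magnitude. The frame `toy` below is made up for illustration; its numbers are not from the auto data.

```python
import pandas as pd

# Toy numeric frame; values are illustrative only.
toy = pd.DataFrame({
    "engine-size": [90, 110, 130, 150, 200],
    "highway-mpg": [40, 34, 30, 27, 20],
    "price": [6000, 9000, 12000, 16000, 30000],
})

# Correlation of every other column with price.
price_corr = toy.corr()["price"].drop("price")

# Order by absolute value so strong negative correlations rank alongside positive ones.
ranked = price_corr.reindex(price_corr.abs().sort_values(ascending=False).index)
print(ranked)
```

Sorting on the absolute value keeps the sign in the output while still placing the most informative columns first.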
auto.select_dtypes(exclude='number').head()
index | make | aspiration | num-of-doors | body-style | drive-wheels | engine-location | engine-type | num-of-cylinders | fuel-system | horsepower-binned |
---|---|---|---|---|---|---|---|---|---|---|
0 | alfa-romero | std | two | convertible | rwd | front | dohc | four | mpfi | Medium |
1 | alfa-romero | std | two | convertible | rwd | front | dohc | four | mpfi | Medium |
2 | alfa-romero | std | two | hatchback | rwd | front | ohcv | six | mpfi | Medium |
3 | audi | std | four | sedan | fwd | front | ohc | four | mpfi | Medium |
4 | audi | std | four | sedan | 4wd | front | ohc | five | mpfi | Medium |
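The non-numeric columns listed above cannot enter a correlation matrix or most models directly. One common option, in the spirit of the `diesel` and `gas` indicator columns already present in the data, is one-hot encoding with `pd.get_dummies`. The sketch below runs on a made-up frame; `toy`, `categorical_cols`, and `encoded` are illustrative names.

```python
import pandas as pd

# Toy frame with two text columns and one numeric column (illustrative values).
toy = pd.DataFrame({
    "aspiration": ["std", "turbo", "std"],
    "drive-wheels": ["rwd", "fwd", "4wd"],
    "price": [13495.0, 16500.0, 17450.0],
})

# Pick the non-numeric columns, then expand each into 0/1 indicator columns.
categorical_cols = toy.select_dtypes(exclude="number").columns
encoded = pd.get_dummies(toy, columns=list(categorical_cols), dtype=int)
print(encoded.columns.tolist())
```

Columns with many distinct values, such as `make`, would produce many indicator columns, so it may be worth encoding only a chosen subset.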