GLOBAL RESEARCH SYNDICATE
No Result
View All Result
  • Login
  • Latest News
  • Consumer Research
  • Survey Research
  • Marketing Research
  • Industry Research
  • Data Collection
  • More
    • Data Analysis
    • Market Insights
  • Latest News
  • Consumer Research
  • Survey Research
  • Marketing Research
  • Industry Research
  • Data Collection
  • More
    • Data Analysis
    • Market Insights
No Result
View All Result
globalresearchsyndicate
No Result
View All Result
Home Data Analysis

Practical Guide to Machine Learning Model Evaluation and Error Metrics

globalresearchsyndicate by globalresearchsyndicate
August 4, 2020
in Data Analysis
0
Practical Guide to Machine Learning Model Evaluation and Error Metrics
0
SHARES
12
VIEWS
Share on FacebookShare on Twitter

In machine learning, we regularly deal with mainly two types of tasks that are classification and regression. Classification is a task where the predictive models are trained in a way that they are capable of classifying data into different classes for example if we have to build a model that can classify whether a loan applicant will default or not. But regression is a process where the models are built to predict a continuous variable for example if we need to predict the house prices for the upcoming year. 

In both the tasks we do the basic data processing followed by splitting the data into training and testing sets. We use training data to train the model whereas testing data is used to compute prediction by the model. Many different algorithms can be used for classification as well as regression problems but the idea is to choose that algorithm that works effectively on our data. This can be done by doing the evaluation of the model and using error metrics. Different evaluation methods are used like confusion matrix, accuracy score, classification report, mean square error etc.

This article demonstrates a classification and regression problem where we will first build the model and then we will evaluate to check the model performance. We will be using prime diabetes data for doing the classification task where we need to classify whether a patient is diabetic or not. Also, we will explore the wine dataset to do the regression task where we need to predict the quality of the wine.



What you will learn from this article? 

  • How to build a classification model?  
  • How to build a Regression model?
  • How to check the model performance using different error metrics?

 Classification Model 

We will first import the required libraries followed by the data. We will then process the data set for basic data preprocessing. In this experiment, we have taken the Pima Indian Diabetes dataset that is publicly available on Kaggle. In this dataset, there are a total of 768 rows and 9 columns in the data with no missing value. Use the below code to do the same.

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn import svm
from sklearn.metrics import accuracy_score,precision_score,recall_score,confusion_matrix,classification_report,f1_score

pima= pd.read_csv('pima.csv')

print(pima.head(10))
print(pima.shape)
print(pima.isnull().any())

Model EvaluationAfter that, we will define the dependent and independent features X and y respectively. We then scale the data and then split it into training and testing sets. After that, we will fit the training data to the model and make predictions for the testing data. We will make use of the Random Forest classifier and Support Vector machine algorithm for building two models. Use the code below to the same. 

X = pima.drop('class', axis = 1)
y = pima['class']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

std = StandardScaler()

X_train = std.fit_transform(X_train)
X_test = std.fit_transform(X_test)

rfcl = RandomForestClassifier()
rfcl.fit(X_train,y_train)

y_pred_rfcl = rfcl.predict(X_test)

svc = svm.SVC()

y_pred_svc = svc.predict(X_test)

We have stored our prediction of testing data in y_pred_rfcl variable and y_pred_svc.  We will make you use this variable for the evaluation of the model. We will now compute different error metrics to check the model performance like accuracy_score, confusion_matrix, classification_report, f1_score. Use the below code to compute different error metrics. 

print(accuracy_score(y_pred_rfcl,y_test))
print(accuracy_score(y_pred_svc,y_test))
print(confusion_matrix(y_pred_rfcl,y_test))
print(confusion_matrix(y_pred_svc,y_test))
print(f1_score(y_pred_rfcl,y_test))
print(f1_score(y_pred_svc,y_test))
print(classification_report(y_pred_rfcl,y_test)) 
print(classification_report(y_pred_svc,y_test)) 

Model Evaluation

Regression Model 

We will first import the required libraries that are required and load the data set. We will be using the wine dataset for this problem that can be downloaded directly from Kaggle. After which we will load the data followed by pre-processing of the data. There are a total of 1599 rows and 12 columns in the data set. There were no missing values found in the data. Use the below to code to the same.

See Also




import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error,mean_absolute_error

print(data.head(10)
print(data.shape)
print(data.isnull().any())













After that, we will define the dependent and independent features X and y respectively. We then scale the data and then split it into training and testing sets. After that, we will fit the training data to the model and make predictions for the testing data. We will make use of Linear Regression and Support Vector Machine Regression for building two models. Use the code below to the same. 

X = data.drop('quality', axis =1)
y = data['quality']

std = StandardScaler()

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

X_train = std.fit_transform(X_train)
X_test = std.fit_transform(X_test)

lr = LinearRegression()
rfr = RandomForestRegressor()

lr.fit(X_train,y_train)
rfr.fit(X_train,y_train)

y_pred_lr = lr.predict(X_test)
y_pred_rfr = rfr.predict(X_test)

We have stored our prediction of testing data in y_pred_lr and y_pred_rfr. We will make use of this variable for the evaluation of the model. We will now compute different error metrics to check the model performance like mean squared error and mean absolute data. 

print("Mean Squared Error: ", mean_squared_error(y_pred_lr,y_test))
print("Mean Squared Error: ",mean_squared_error(y_pred_rfr,y_test))
print("Mean Absolute Error: ",mean_absolute_error(y_pred_lr,y_test))
print("Mean Absolute Error",mean_absolute_error(y_pred_rfr,y_test))


Model Evaluation

Conclusion

We have computed the evaluation metrics for both the classification and regression problems. We can always try improving the model performance using a good amount of feature engineering and Hyperparameter Tuning. Read more about error metrics here in this article “Top 10 Model evaluation Machine Learning Enthusiast should know”. I hope you have now understood how you can build a classification and regression model and also how to evaluate the model using different metrics discussed above. 

Provide your comments below

comments


If you loved this story, do join our Telegram Community.


Also, you can write for us and be one of the 500+ experts who have contributed stories at AIM. Share your nominations here.

Rohit Dwivedi

Rohit Dwivedi

Data Science enthusiast who is currently pursuing a Post Graduate Program in Machine learning and Artificial Intelligence. I have experience in building predictive models in Machine Learning and Artificial Intelligence and also deploying them in real time. I have done various good projects in the domain of analytics. My area of interest lies in Computer Vision. My goal is to build various use cases using the power of Artificial Intelligence and Machine Learning and solve business problems.

Related Posts

How Machine Learning has impacted Consumer Behaviour and Analysis
Consumer Research

How Machine Learning has impacted Consumer Behaviour and Analysis

January 4, 2024
Market Research The Ultimate Weapon for Business Success
Consumer Research

Market Research: The Ultimate Weapon for Business Success

June 22, 2023
Unveiling the Hidden Power of Market Research A Game Changer
Consumer Research

Unveiling the Hidden Power of Market Research: A Game Changer

June 2, 2023
7 Secrets of Market Research Gurus That Will Blow Your Mind
Consumer Research

7 Secrets of Market Research Gurus That Will Blow Your Mind

May 8, 2023
The Shocking Truth About Market Research Revealed!
Consumer Research

The Shocking Truth About Market Research: Revealed!

April 25, 2023
market research, primary research, secondary research, market research trends, market research news,
Consumer Research

Quantitative vs. Qualitative Research. How to choose the Right Research Method for Your Business Needs

March 14, 2023
Next Post
Waterproof Lamp Sales Market by Application, Type, Region for Forecast – 2020 to 2024 – 3w Market News Reports

Impact of COVID-19 Pandemic on Artificial Intelligence Platforms Market Trends, Size, Competitive Analysis and Forecast to 2024

Categories

  • Consumer Research
  • Data Analysis
  • Data Collection
  • Industry Research
  • Latest News
  • Market Insights
  • Marketing Research
  • Survey Research
  • Uncategorized

Recent Posts

  • Ipsos Revolutionizes the Global Market Research Landscape
  • How Machine Learning has impacted Consumer Behaviour and Analysis
  • Market Research: The Ultimate Weapon for Business Success
  • Privacy Policy
  • Terms of Use
  • Antispam
  • DMCA

Copyright © 2024 Globalresearchsyndicate.com

Welcome Back!

Login to your account below

Forgotten Password?

Retrieve your password

Please enter your username or email address to reset your password.

Log In
This website uses cookies to improve your experience. We'll assume you're ok with this, but you can opt-out if you wish. Cookie settingsACCEPT
Privacy & Cookies Policy

Privacy Overview

This website uses cookies to improve your experience while you navigate through the website. Out of these cookies, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may have an effect on your browsing experience.
Necessary
Always Enabled
Necessary cookies are absolutely essential for the website to function properly. This category only includes cookies that ensures basic functionalities and security features of the website. These cookies do not store any personal information.
Non-necessary
Any cookies that may not be particularly necessary for the website to function and is used specifically to collect user personal data via analytics, ads, other embedded contents are termed as non-necessary cookies. It is mandatory to procure user consent prior to running these cookies on your website.
SAVE & ACCEPT
No Result
View All Result
  • Latest News
  • Consumer Research
  • Survey Research
  • Marketing Research
  • Industry Research
  • Data Collection
  • More
    • Data Analysis
    • Market Insights

Copyright © 2024 Globalresearchsyndicate.com