Regression using Tensorflow and Gradient descent optimizer

by globalresearchsyndicate | November 24, 2019 | in Data Analysis

Gradient descent is one of the most popular optimization algorithms in machine learning and deep learning. It is an iterative optimization algorithm for finding a local minimum of a function: at each step, one moves in proportion to the negative of the gradient of the function at the current point.
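To make the update rule concrete, here is a minimal sketch of gradient descent in plain NumPy, minimizing the simple quadratic f(w) = (w - 3)^2. The function, starting point, learning rate, and step count are illustrative assumptions, not part of the tutorial's model:

import numpy as np

# f(w) = (w - 3)**2 has its minimum at w = 3; its gradient is f'(w) = 2 * (w - 3).
def gradient(w):
    return 2.0 * (w - 3.0)

w = 0.0              # arbitrary starting point (assumption)
learning_rate = 0.1  # step size (assumption)
for step in range(50):
    w -= learning_rate * gradient(w)  # step against the gradient
print(w)  # converges toward 3.0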

In this tutorial we will use the gradient descent optimization algorithm. Our example data is in CSV format with the columns "height", "weight", "age", "projects", and "salary". Assuming there is a correlation between projects and salary, we will try to predict salary from the number of projects completed. You can download the data from this link: https://drive.google.com/file/d/1Gx0riTlJHt9o_VyokrKNbj384AhwXpAW/view?usp=sharing


Initial Setup

First and foremost, we need to load the necessary libraries.

from __future__ import print_function

import math  # basic mathematical operations

from IPython import display  # rich display helpers for notebooks
from matplotlib import cm  # colormap reference
from matplotlib import gridspec  # plot layout
from matplotlib import pyplot as plt  # plotting
import numpy as np
import pandas as pd
from sklearn import metrics
import tensorflow as tf
from tensorflow.python.data import Dataset

from google.colab import drive  # load data directly from Google Drive
drive.mount('/content/gdrive')  # mount the drive

tf.logging.set_verbosity(tf.logging.ERROR)
pd.options.display.max_rows = 10
pd.options.display.float_format = '{:.1f}'.format

Loading Dataset

Load the dataset as a pandas DataFrame and check its summary statistics.

dataframe = pd.read_csv("/content/gdrive/My Drive/Colab Notebooks/TENSOR_FLOW/train_dataset.csv", sep=",")
dataframe.head()
   height  weight  age  projects  salary
0  -114.3    34.2   15      1015   66900
1  -114.5    34.4   19      1129   80100
2  -114.6    33.7   17       333   85700
3  -114.6    33.6   14       515   73400
4  -114.6    33.6   20       624   65500
dataframe.describe()

        height   weight      age  projects    salary
count  17000.0  17000.0  17000.0   17000.0   17000.0
mean    -119.6     35.6     28.6    1429.6  207300.9
std        2.0      2.1     12.6    1147.9  115983.8
min     -124.3     32.5      1.0       3.0   14999.0
25%     -121.8     33.9     18.0     790.0  119400.0
50%     -118.5     34.2     29.0    1167.0  180400.0
75%     -118.0     37.7     37.0    1721.0  265000.0
max     -114.3     42.0     52.0   35682.0  500001.0
dataframe = dataframe.reindex(np.random.permutation(dataframe.index))  # shuffle the rows
dataframe["salary"] /= 1000.0  # rescale salary to thousands
dataframe.head()
       height  weight  age  projects  salary
11381  -121.2    38.9   19      1206   192.6
4865   -118.1    34.1   50       636   500.0
3442   -117.9    33.8   35      1435   200.8
14934  -122.2    37.8   52       409   189.6
14925  -122.2    37.8   52      1659   107.9

Build our First Model

We wish to predict salary, which will be our label, and we'll use projects as our input feature. To train our model, we'll use the LinearRegressor interface provided by the TensorFlow Estimator API. This API takes care of a lot of the low-level model plumbing and exposes convenient methods for performing model training, evaluation, and inference.

Step 1: Define Features and Configure Feature Columns

In TensorFlow, we indicate a feature’s data type using a construct called a feature column. Feature columns store only a description of the feature data; they do not contain the feature data itself.

To start, we’re going to use just one numeric input feature, projects.

my_feature = dataframe[["projects"]]
feature_columns = [tf.feature_column.numeric_column("projects")]

feature_columns

[NumericColumn(key='projects', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None)]

Step 2: Define the Target

Next, we’ll define our target, which is salary. Again, we can pull it from our dataframe:

targets = dataframe["salary"]
targets
11381    192.6
4865     500.0
3442     200.8
14934    189.6
14925    107.9
         ...
7869     269.2
3770     192.9
11859    194.6
10158    167.7
14422    500.0
Name: salary, Length: 17000, dtype: float64

Step 3: Configure the LinearRegressor

Next, we’ll configure a linear regression model using LinearRegressor. We’ll train this model using the GradientDescentOptimizer, which implements Mini-Batch Stochastic Gradient Descent (SGD). The learning_rate argument controls the size of the gradient step.

# Use gradient descent as the optimizer for training the model.
my_optimizer = tf.train.GradientDescentOptimizer(learning_rate=0.0000001)

# Clip gradients to a maximum norm of 5.0, to guard against exploding gradients.
my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)

# Configure the linear regression model with our feature columns and optimizer.
linear_regressor = tf.estimator.LinearRegressor(
    feature_columns=feature_columns,
    optimizer=my_optimizer
)

Step 4: Define the Input Function

To import our salary data into our LinearRegressor, we need to define an input function, which instructs TensorFlow how to preprocess the data, as well as how to batch, shuffle, and repeat it during model training.

First, we’ll convert our pandas feature data into a dict of NumPy arrays. We can then use the TensorFlow Dataset API to construct a dataset object from our data, and then break our data into batches of batch_size, to be repeated for the specified number of epochs (num_epochs).

NOTE: When the default value of num_epochs=None is passed to repeat(), the input data will be repeated indefinitely.

Next, if shuffle is set to True, we’ll shuffle the data so that it’s passed to the model randomly during training. The buffer_size argument specifies the size of the dataset from which shuffle will randomly sample.

Finally, our input function constructs an iterator for the dataset and returns the next batch of data to the LinearRegressor.

def my_input_fn(features, targets, batch_size=1, shuffle=True, num_epochs=None):
    """Feeds data to the LinearRegressor during training or prediction.

    Args:
      features: pandas DataFrame of features
      targets: pandas Series of targets
      batch_size: size of the batches passed to the model
      shuffle: whether to shuffle the data
      num_epochs: number of epochs to repeat the data (None = indefinitely)
    Returns:
      Tuple of (features, labels) for the next data batch
    """
    # Convert pandas data into a dict of np arrays.
    features = {key: np.array(value) for key, value in dict(features).items()}

    # Construct a dataset, and configure batching/repeating.
    ds = Dataset.from_tensor_slices((features, targets))  # warning: 2GB limit
    ds = ds.batch(batch_size).repeat(num_epochs)

    # Shuffle the data, if specified.
    if shuffle:
        ds = ds.shuffle(buffer_size=10000)

    # Return the next batch of data.
    features, labels = ds.make_one_shot_iterator().get_next()
    return features, labels
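Before training, it can help to sanity-check the input function by pulling a single batch and inspecting it. Below is a minimal sketch under the TF 1.x graph-mode setup this tutorial uses; the batch size of 2 is an arbitrary choice:

# Pull one batch from the input function and inspect it (sanity check).
feature_batch, label_batch = my_input_fn(my_feature, targets, batch_size=2)
with tf.Session() as sess:
    features_out, labels_out = sess.run([feature_batch, label_batch])
print(features_out)  # e.g. {'projects': array([...])}
print(labels_out)    # the matching salary values (in thousands)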

Step 5: Train the Model

We can now call train() on our linear_regressor to train the model. We’ll wrap my_input_fn in a lambda so we can pass in my_feature and targets as arguments (see the TensorFlow input-function tutorial for more details), and to start, we’ll train for 100 steps.

_ = linear_regressor.train(
    input_fn=lambda: my_input_fn(my_feature, targets),
    steps=100
)
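To see how far those 100 steps got us, we can compute the RMSE on the training data. This sketch simply mirrors the evaluation code inside the train_model function defined below, using the same no-shuffle, single-epoch input function:

# Evaluate after the first 100 steps: one unshuffled pass over the data.
prediction_input_fn = lambda: my_input_fn(my_feature, targets, num_epochs=1, shuffle=False)
predictions = linear_regressor.predict(input_fn=prediction_input_fn)
predictions = np.array([item['predictions'][0] for item in predictions])

root_mean_squared_error = math.sqrt(metrics.mean_squared_error(predictions, targets))
print("Root Mean Squared Error (on training data): %0.3f" % root_mean_squared_error)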

Tweak the Model Hyperparameters to Optimize the Model

For this exercise, we’ve put all the above code in a single function for convenience. You can call the function with different parameters to see the effect.

In this function, we’ll proceed in 10 evenly divided periods so that we can observe the model improvement at each period.

For each period, we’ll compute and graph training loss. This may help you judge when a model has converged, or whether it needs more iterations.

We’ll also plot the feature weight and bias term values learned by the model over time. This is another way to see how things converge.

def train_model(learning_rate, steps, batch_size, input_feature="projects"):

  periods = 10
  steps_per_period = steps / periods

  my_feature = input_feature
  my_feature_data = dataframe[[my_feature]]
  my_label = "salary"
  targets = dataframe[my_label]

  # Create feature columns.
  feature_columns = [tf.feature_column.numeric_column(my_feature)]

  # Create input functions.
  training_input_fn = lambda: my_input_fn(my_feature_data, targets, batch_size=batch_size)
  prediction_input_fn = lambda: my_input_fn(my_feature_data, targets, num_epochs=1, shuffle=False)

  # Create a linear regressor object.
  my_optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
  my_optimizer = tf.contrib.estimator.clip_gradients_by_norm(my_optimizer, 5.0)
  linear_regressor = tf.estimator.LinearRegressor(
      feature_columns=feature_columns,
      optimizer=my_optimizer
  )

  # Set up to plot the state of our model's line each period.
  plt.figure(figsize=(15, 6))
  plt.subplot(1, 2, 1)
  plt.title("Learned Line by Period")
  plt.ylabel(my_label)
  plt.xlabel(my_feature)
  sample = dataframe.sample(n=300)
  plt.scatter(sample[my_feature], sample[my_label])
  colors = [cm.coolwarm(x) for x in np.linspace(-1, 1, periods)]

  # Train the model inside a loop so that we can periodically assess loss.
  print("Training model...")
  print("RMSE (on training data):")
  root_mean_squared_errors = []
  for period in range(0, periods):
    # Train the model, starting from the prior state.
    linear_regressor.train(
        input_fn=training_input_fn,
        steps=steps_per_period
    )

    # Take a break and compute predictions.
    predictions = linear_regressor.predict(input_fn=prediction_input_fn)
    predictions = np.array([item['predictions'][0] for item in predictions])

    # Compute loss.
    root_mean_squared_error = math.sqrt(
        metrics.mean_squared_error(predictions, targets))

    # Print the current loss.
    print("  period %02d : %0.2f" % (period, root_mean_squared_error))

    # Add the loss metrics from this period to our list.
    root_mean_squared_errors.append(root_mean_squared_error)

    # Finally, track the weights and biases over time.
    # Apply some math to ensure that the data and line are plotted neatly.
    y_extents = np.array([0, sample[my_label].max()])

    weight = linear_regressor.get_variable_value('linear/linear_model/%s/weights' % input_feature)[0]
    bias = linear_regressor.get_variable_value('linear/linear_model/bias_weights')

    x_extents = (y_extents - bias) / weight
    x_extents = np.maximum(np.minimum(x_extents,
                                      sample[my_feature].max()),
                           sample[my_feature].min())
    y_extents = weight * x_extents + bias
    plt.plot(x_extents, y_extents, color=colors[period])
  print("Model training finished.")

  # Output a graph of loss metrics over periods.
  plt.subplot(1, 2, 2)
  plt.ylabel('RMSE')
  plt.xlabel('Periods')
  plt.title("Root Mean Squared Error vs. Periods")
  plt.tight_layout()
  plt.plot(root_mean_squared_errors)

  # Output a table with calibration data.
  calibration_data = pd.DataFrame()
  calibration_data["predictions"] = pd.Series(predictions)
  calibration_data["targets"] = pd.Series(targets)
  display.display(calibration_data.describe())

  print("Final RMSE (on training data): %0.2f" % root_mean_squared_error)

Training: Achieve an RMSE of 180 or Below

Tweak the model hyperparameters to improve loss and better match the target distribution. If, after 5 minutes or so, you’re having trouble beating an RMSE of 180, check the solution for a possible combination.

train_model(
    learning_rate=0.00002,
    steps=500,
    batch_size=3
)
Training model...
RMSE (on training data):
period 00 : 0.27
period 01 : 0.27
period 02 : 0.27
period 03 : 0.24
period 04 : 0.27
period 05 : 0.27
period 06 : 0.27
period 07 : 0.18
period 08 : 0.18
period 09 : 0.18
Model training finished.
       predictions  targets
count      17000.0  17000.0
mean           0.1      0.2
std            0.1      0.1
min            0.0      0.0
25%            0.0      0.1
50%            0.1      0.2
75%            0.1      0.3
max            2.2      0.5
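If you would rather search systematically than tweak by hand, a simple sweep over candidate learning rates is one option. This is a sketch assuming train_model from above is in scope; the candidate values below are illustrative assumptions, not the tutorial's published solution:

# Manual sweep over learning rates (candidate values are illustrative).
for lr in [0.00002, 0.0002, 0.002]:
    print("=== learning_rate = %g ===" % lr)
    train_model(learning_rate=lr, steps=500, batch_size=5)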
