GLOBAL RESEARCH SYNDICATE
No Result
View All Result
  • Login
  • Latest News
  • Consumer Research
  • Survey Research
  • Marketing Research
  • Industry Research
  • Data Collection
  • More
    • Data Analysis
    • Market Insights
  • Latest News
  • Consumer Research
  • Survey Research
  • Marketing Research
  • Industry Research
  • Data Collection
  • More
    • Data Analysis
    • Market Insights
No Result
View All Result
globalresearchsyndicate
No Result
View All Result
Home Data Analysis

Machine learning algorithms explained | InfoWorld

globalresearchsyndicate by globalresearchsyndicate
November 23, 2019
in Data Analysis
0
Machine learning algorithms explained | InfoWorld
0
SHARES
9
VIEWS
Share on FacebookShare on Twitter

Machine learning and deep learning have been widely embraced, and even more widely misunderstood. In this article, I’d like to step back and explain both machine learning and deep learning in basic terms, discuss some of the most common machine learning algorithms, and explain how those algorithms relate to the other pieces of the puzzle of creating predictive models from historical data.

What are machine learning algorithms?

Recall that machine learning is a class of methods for automatically creating models from data. Machine learning algorithms are the engines of machine learning, meaning it is the algorithms that turn a data set into a model. Which kind of algorithm works best (supervised, unsupervised, classification, regression, etc.) depends on the kind of problem you’re solving, the computing resources available, and the nature of the data.

How machine learning works

Ordinary programming algorithms tell the computer what to do in a straightforward way. For example, sorting algorithms turn unordered data into data ordered by some criteria, often the numeric or alphabetical order of one or more fields in the data.

Linear regression algorithms fit a straight line, or another function that is linear in its parameters such as a polynomial, to numeric data, typically by performing matrix inversions to minimize the squared error between the line and the data. Squared error is used as the metric because you don’t care whether the regression line is above or below the data points; you only care about the distance between the line and the points.

Nonlinear regression algorithms, which fit curves that are not linear in their parameters to data, are a little more complicated, because, unlike linear regression problems, they can’t be solved with a deterministic method. Instead, the nonlinear regression algorithms implement some kind of iterative minimization process, often some variation on the method of steepest descent.    

Steepest descent basically computes the squared error and its gradient at the current parameter values, picks a step size (aka learning rate), follows the direction of the gradient “down the hill,” and then recomputes the squared error and its gradient at the new parameter values. Eventually, with luck, the process converges. The variants on steepest descent try to improve the convergence properties.

Machine learning algorithms are even less straightforward than nonlinear regression, partly because machine learning dispenses with the constraint of fitting to a specific mathematical function, such as a polynomial. There are two major categories of problems that are often solved by machine learning: regression and classification. Regression is for numeric data (e.g. What is the likely income for someone with a given address and profession?) and classification is for non-numeric data (e.g. Will the applicant default on this loan?).

Prediction problems (e.g. What will the opening price be for Microsoft shares tomorrow?) are a subset of regression problems for time series data. Classification problems are sometimes divided into binary (yes or no) and multi-category problems (animal, vegetable, or mineral).

Supervised learning vs. unsupervised learning

Independent of these divisions, there are another two kinds of machine learning algorithms: supervised and unsupervised. In supervised learning, you provide a training data set with answers, such as a set of pictures of animals along with the names of the animals. The goal of that training would be a model that could correctly identify a picture (of a kind of animal that was included in the training set) that it had not previously seen.

In unsupervised learning, the algorithm goes through the data itself and tries to come up with meaningful results. The result might be, for example, a set of clusters of data points that could be related within each cluster. That works better when the clusters don’t overlap.

Training and evaluation turn supervised learning algorithms into models by optimizing their parameters to find the set of values that best matches the ground truth of your data. The algorithms often rely on variants of steepest descent for their optimizers, for example stochastic gradient descent (SGD), which is essentially steepest descent performed multiple times from randomized starting points. Common refinements on SGD add factors that correct the direction of the gradient based on momentum or adjust the learning rate based on progress from one pass through the data (called an epoch) to the next.

Data cleaning for machine learning

There is no such thing as clean data in the wild. To be useful for machine learning, data must be aggressively filtered. For example, you’ll want to:

  1. Look at the data and exclude any columns that have a lot of missing data.
  2. Look at the data again and pick the columns you want to use for your prediction. (This is something you may want to vary when you iterate.)
  3. Exclude any rows that still have missing data in the remaining columns.
  4. Correct obvious typos and merge equivalent answers. For example, U.S., US, USA, and America should be merged into a single category.
  5. Exclude rows that have data that is out of range. For example, if you’re analyzing taxi trips within New York City, you’ll want to filter out rows with pick-up or drop-off latitudes and longitudes that are outside the bounding box of the metropolitan area.

There is a lot more you can do, but it will depend on the data collected. This can be tedious, but if you set up a data-cleaning step in your machine learning pipeline you can modify and repeat it at will.

Data encoding and normalization for machine learning

To use categorical data for machine classification, you need to encode the text labels into another form. There are two common encodings.

One is label encoding, which means that each text label value is replaced with a number. The other is one-hot encoding, which means that each text label value is turned into a column with a binary value (1 or 0). Most machine learning frameworks have functions that do the conversion for you. In general, one-hot encoding is preferred, as label encoding can sometimes confuse the machine learning algorithm into thinking that the encoded column is ordered.

To use numeric data for machine regression, you usually need to normalize the data. Otherwise, the numbers with larger ranges may tend to dominate the Euclidian distance between feature vectors, their effects can be magnified at the expense of the other fields, and the steepest descent optimization may have difficulty converging. There are a number of ways to normalize and standardize data for ML, including min-max normalization, mean normalization, standardization, and scaling to unit length. This process is often called feature scaling.

What are machine learning features?

Since I mentioned feature vectors in the previous section, I should explain what they are. First of all, a feature is an individual measurable property or characteristic of a phenomenon being observed. The concept of a “feature” is related to that of an explanatory variable, which is used in statistical techniques such as linear regression. Feature vectors combine all of the features for a single row into a numerical vector.

Part of the art of choosing features is to pick a minimum set of independent variables that explain the problem. If two variables are highly correlated, either they need to be combined into a single feature, or one should be dropped. Sometimes people perform principal component analysis to convert correlated variables into a set of linearly uncorrelated variables.

Some of the transformations that people use to construct new features or reduce the dimensionality of feature vectors are simple. For example, subtract Year of Birth from Year of Death and you construct Age at Death, which is a prime independent variable for lifetime and mortality analysis. In other cases, feature construction may not be so obvious.

Common machine learning algorithms

There are dozens of machine learning algorithms, ranging in complexity from linear regression and logistic regression to deep neural networks and ensembles (combinations of other models). However, some of the most common algorithms include:

  • Linear regression, aka least squares regression (for numeric data)
  • Logistic regression (for binary classification)
  • Linear discriminant analysis (for multi-category classification)
  • Decision trees (for both classification and regression)
  • Naïve Bayes (for both classification and regression)
  • K-Nearest Neighbors, aka KNN (for both classification and regression)
  • Learning Vector Quantization, aka LVQ (for both classification and regression)
  • Support Vector Machines, aka SVM (for binary classification)
  • Random Forests, a type of “bagging” ensemble algorithm (for both classification and regression)
  • Boosting methods, including AdaBoost and XGBoost, are ensemble algorithms that create a series of models where each new model tries to correct errors from the previous model (for both classification and regression)

Where are the neural networks and deep neural networks that we hear so much about? They tend to be compute-intensive to the point of needing GPUs or other specialized hardware, so you should use them only for specialized problems, such as image classification and speech recognition, that aren’t well-suited to simpler algorithms. Note that “deep” means that there are many hidden layers in the neural network.

For more on neural networks and deep learning, see “What deep learning really means.”

Hyperparameters for machine learning algorithms

Machine learning algorithms train on data to find the best set of weights for each independent variable that affects the predicted value or class. The algorithms themselves have variables, called hyperparameters. They’re called hyperparameters, as opposed to parameters, because they control the operation of the algorithm rather than the weights being determined.

The most important hyperparameter is often the learning rate, which determines the step size used when finding the next set of weights to try when optimizing. If the learning rate is too high, the gradient descent may quickly converge on a plateau or suboptimal point. If the learning rate is too low, the gradient descent may stall and never completely converge.

Many other common hyperparameters depend on the algorithms used. Most algorithms have stopping parameters, such as the maximum number of epochs, or the maximum time to run, or the minimum improvement from epoch to epoch. Specific algorithms have hyperparameters that control the shape of their search. For example, a Random Forest Classifier has hyperparameters for minimum samples per leaf, max depth, minimum samples at a split, minimum weight fraction for a leaf, and about 8 more.

Hyperparameter tuning

Several production machine-learning platforms now offer automatic hyperparameter tuning. Essentially, you tell the system what hyperparameters you want to vary, and possibly what metric you want to optimize, and the system sweeps those hyperparameters across as many runs as you allow. (Google Cloud hyperparameter tuning extracts the appropriate metric from the TensorFlow model, so you don’t have to specify it.)

There are three search algorithms for sweeping hyperparameters: Bayesian optimization, grid search, and random search. Bayesian optimization tends to be the most efficient.

You would think that tuning as many hyperparameters as possible would give you the best answer. However, unless you are running on your own personal hardware, that could be very expensive. There are diminishing returns, in any case. With experience, you’ll discover which hyperparameters matter the most for your data and choice of algorithms.

Automated machine learning

Speaking of choosing algorithms, there is only one way to know which algorithm or ensemble of algorithms will give you the best model for your data, and that’s to try them all. If you also try all the possible normalizations and choices of features, you’re facing a combinatorial explosion.

Trying everything is impractical to do manually, so of course machine learning tool providers have put a lot of effort into releasing AutoML systems. The best ones combine feature engineering with sweeps over algorithms and normalizations. Hyperparameter tuning of the best model or models is often left for later. Feature engineering is a hard problem to automate, however, and not all AutoML systems handle it.

In summary, machine learning algorithms are just one piece of the machine learning puzzle. In addition to algorithm selection (manual or automatic), you’ll need to deal with optimizers, data cleaning, feature selection, feature normalization, and (optionally) hyperparameter tuning.

When you’ve handled all of that and built a model that works for your data, it will be time to deploy the model, and then update it as conditions change. Managing machine learning models in production is, however, a whole other can of worms.

Related Posts

How Machine Learning has impacted Consumer Behaviour and Analysis
Consumer Research

How Machine Learning has impacted Consumer Behaviour and Analysis

January 4, 2024
Market Research The Ultimate Weapon for Business Success
Consumer Research

Market Research: The Ultimate Weapon for Business Success

June 22, 2023
Unveiling the Hidden Power of Market Research A Game Changer
Consumer Research

Unveiling the Hidden Power of Market Research: A Game Changer

June 2, 2023
7 Secrets of Market Research Gurus That Will Blow Your Mind
Consumer Research

7 Secrets of Market Research Gurus That Will Blow Your Mind

May 8, 2023
The Shocking Truth About Market Research Revealed!
Consumer Research

The Shocking Truth About Market Research: Revealed!

April 25, 2023
market research, primary research, secondary research, market research trends, market research news,
Consumer Research

Quantitative vs. Qualitative Research. How to choose the Right Research Method for Your Business Needs

March 14, 2023
Next Post
Interview with Prof Dr. Med Thomas Skutella on Stem Cells

Interview with Prof Dr. Med Thomas Skutella on Stem Cells

Categories

  • Consumer Research
  • Data Analysis
  • Data Collection
  • Industry Research
  • Latest News
  • Market Insights
  • Marketing Research
  • Survey Research
  • Uncategorized

Recent Posts

  • Ipsos Revolutionizes the Global Market Research Landscape
  • How Machine Learning has impacted Consumer Behaviour and Analysis
  • Market Research: The Ultimate Weapon for Business Success
  • Privacy Policy
  • Terms of Use
  • Antispam
  • DMCA

Copyright © 2024 Globalresearchsyndicate.com

Welcome Back!

Login to your account below

Forgotten Password?

Retrieve your password

Please enter your username or email address to reset your password.

Log In
This website uses cookies to improve your experience. We'll assume you're ok with this, but you can opt-out if you wish. Cookie settingsACCEPT
Privacy & Cookies Policy

Privacy Overview

This website uses cookies to improve your experience while you navigate through the website. Out of these cookies, the cookies that are categorized as necessary are stored on your browser as they are essential for the working of basic functionalities of the website. We also use third-party cookies that help us analyze and understand how you use this website. These cookies will be stored in your browser only with your consent. You also have the option to opt-out of these cookies. But opting out of some of these cookies may have an effect on your browsing experience.
Necessary
Always Enabled
Necessary cookies are absolutely essential for the website to function properly. This category only includes cookies that ensures basic functionalities and security features of the website. These cookies do not store any personal information.
Non-necessary
Any cookies that may not be particularly necessary for the website to function and is used specifically to collect user personal data via analytics, ads, other embedded contents are termed as non-necessary cookies. It is mandatory to procure user consent prior to running these cookies on your website.
SAVE & ACCEPT
No Result
View All Result
  • Latest News
  • Consumer Research
  • Survey Research
  • Marketing Research
  • Industry Research
  • Data Collection
  • More
    • Data Analysis
    • Market Insights

Copyright © 2024 Globalresearchsyndicate.com