GLOBAL RESEARCH SYNDICATE

Inference C++ Models Using SageMaker Processing : idk.dev

by globalresearchsyndicate
August 5, 2020
in Data Analysis

Machine learning has existed for decades. Before Python became the dominant language for machine learning, many models were built in other languages such as Java and C++. Refactoring legacy C++ or Java models can be prohibitively expensive and time consuming. Customers need to know how they can bring their legacy C++ models to the cloud so that they can run model inference faster and at a lower cost.

Amazon SageMaker Processing is a new capability of Amazon SageMaker for running processing and model evaluation workloads with a fully managed experience. Amazon SageMaker Processing enables customers to run analytics jobs for data engineering and model evaluation on Amazon SageMaker easily, and at scale. SageMaker Processing allows customers to enjoy the benefits of a fully managed environment with all the security and compliance built into Amazon SageMaker.

In this blog post, we demonstrate how to run inference on a C++ model using SageMaker Processing. We first explain the C++ program we use to represent a simple linear regression model, and the Python script we use to run inference. Then, we build a custom container that contains the C++ model and the Python script. Lastly, we run a SageMaker ScriptProcessor job for inference. The code from this post is available in the GitHub repo.

Prerequisites

To run this code, you need to have permissions to access Amazon S3, push a Docker image to Amazon ECR, and create SageMaker Processing jobs.

Prepare a C++ Model

We use a simple C++ test file for demonstration purposes. The C++ program accepts input data as a single string of comma-separated values. For example, "2,3" represents one row of input data with the values 2 and 3 in two separate columns.

We use a simple linear regression model, y = x1 + x2, in this blog post for demonstration purposes. Customers can modify the C++ inference code to run more realistic and sophisticated models. The C++ code performs the following steps:

  • Receives the data record for inference from the command line arguments.
  • Parses out the data columns and stores the data in a C++ vector. We use "," to separate data columns.
  • Loops through the data columns and calculates the sum of each pair of values.
  • Prints the result to the standard output stream.

We can compile the C++ program to an executable file using g++. The complete C++ script is shown in the following code:

#include <iostream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

void print(std::vector<int> const &input)
{
    for (std::size_t i = 0; i < input.size(); i++)
    {
        std::cout << input.at(i);
        if (i!=input.size()-1)
            cout<< ',';
    }
}


std::vector<std::string> split(const std::string& s, char delimiter)
{
   std::vector<std::string> tokens;
   std::string token;
   std::istringstream tokenStream(s);
   while (std::getline(tokenStream, token, delimiter))
   {
      tokens.push_back(token);
   }
   return tokens;
}


int main(int argc, char* argv[])
{
    vector<int> result;
    int counter = 0;
    int result_temp = 0;
    
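    // Expect exactly one argument: a comma-separated string of integers.
    if (argc < 2)
    {
        cerr << "usage: " << argv[0] << " <comma-separated integers>" << endl;
        return 1;
    }
    string t1(argv[1]);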
    vector<string> temp_str = split(t1, ',');
    vector<string>::iterator pos; 

    for (pos = temp_str.begin(); pos < temp_str.end(); pos++)
    {
        int temp_int;
        istringstream(*pos) >> temp_int;
        
        if (counter == 0)
        {
            result_temp += temp_int;
            counter++;
            continue;
        }
        // Second value of the pair: complete y = x1 + x2 and store it.
        if (counter == 1)
        {
            result_temp += temp_int;
            result.push_back(result_temp);
            result_temp = 0;
            counter = 0;
        }
    }    
    print(result);
    return 0;
}
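
As a cross-check for the executable's output, the same pairwise-sum logic can be sketched in Python. This is a minimal reference, assuming the input is a well-formed, comma-separated string of integers in (x1, x2) order. The names predict_pairs and format_output are hypothetical and are not part of the original code. Like the C++ loop, the sketch ignores a trailing value that has no partner.

```python
def predict_pairs(record):
    """Reference for y = x1 + x2 over a flat, comma-separated string.

    Hypothetical helper for checking the C++ executable's output.
    """
    values = [int(token) for token in record.split(",")]
    # Pair consecutive values (x1, x2). zip drops an unpaired trailing value,
    # which matches the C++ loop never pushing an incomplete sum.
    return [x1 + x2 for x1, x2 in zip(values[0::2], values[1::2])]


def format_output(predictions):
    # Same layout as the C++ print(): values joined by commas.
    return ",".join(str(p) for p in predictions)


print(format_output(predict_pairs("2,3,4,5")))  # 5,9
```

Comparing this string with the standard output of the compiled program on the same input is a quick way to confirm that a modified C++ model still behaves as expected.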

Create a SageMaker Processing script

This notebook uses the ScriptProcessor class from the Amazon SageMaker Python SDK. The ScriptProcessor class runs a Python script with your own Docker image that processes input data, and saves the processed data in Amazon S3. For more information, review Run Scripts with Your Own Processing Container.

When the processing job starts, the data files are automatically downloaded by SageMaker from S3 to the designated local directory in the processing compute instance.

Your Python script, process_script.py, first finds all data files under the /opt/ml/processing/input/ directory. By default, when you use multiple instances, the data files from S3 are duplicated to each processing compute instance, so every instance gets the full dataset. By setting s3_data_distribution_type='ShardedByS3Key', each instance gets approximately 1/n of the total number of input data files, where n is the number of compute instances. For more effective parallel processing, partition the input data into multiple files so that each node processes a different set of input data.
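
The difference between the two distribution modes can be illustrated with a small simulation. This is only a sketch: the file names are hypothetical, and the round-robin assignment is an assumption made for illustration, since SageMaker itself decides which S3 keys go to which instance.

```python
def replicate(keys, instance_count):
    # Default behavior described above: every instance receives every file.
    return [list(keys) for _ in range(instance_count)]


def shard(keys, instance_count):
    # Illustration only: round-robin over sorted keys is an assumption.
    # SageMaker decides the real assignment of S3 keys to instances.
    ordered = sorted(keys)
    return [ordered[i::instance_count] for i in range(instance_count)]


# Hypothetical file names standing in for the objects under the S3 prefix.
keys = ["data_{}.csv".format(i) for i in range(10)]

print("replicated", [len(files) for files in replicate(keys, 2)])  # replicated [10, 10]
print("sharded", [len(files) for files in shard(keys, 2)])         # sharded [5, 5]
```

With sharding, each file is processed once across the job, which is why splitting the input into many files helps: a single large file cannot be divided between instances.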

The Python script reads each data file into memory and converts it into one long string that the C++ executable can consume. The subprocess module from Python runs the C++ executable and connects to its output and error pipes. The output is saved as a CSV file to the /opt/ml/processing/output directory. Upon completion, SageMaker Processing uploads the output files in this directory from every processing instance to Amazon S3.

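import argparse
import os
import subprocess
from glob import glob

import pandas as pd


def call_one_exe(a):
    # Run the C++ executable with the data string as its single argument.
    p = subprocess.Popen(["./a.out", a], stdout=subprocess.PIPE)
    p_out, err = p.communicate()
    output = p_out.decode("utf-8")
    return output.split(',')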

if __name__=='__main__':
    parser = argparse.ArgumentParser()
    #user can pass their own argument from Processor. 
    
    args, _ = parser.parse_known_args()
    print('Received arguments {}'.format(args))
    
    files = glob('/opt/ml/processing/input/*.csv')
    for i, f in enumerate(files):
        try:
            print(f)
            data = pd.read_csv(f, header=None)
            string = str(list(data.values.flat)).replace(' ','')[1:-1]
            predictions = call_one_exe(string)
            output_path = os.path.join('/opt/ml/processing/output', str(i)+'_out.csv')
            print('Saving predictions to {}'.format(output_path))
            pd.DataFrame({'results':predictions}).to_csv(output_path, header=False, index=False)
        except Exception as e:
            print(str(e))
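
The two core steps of the script, flattening a CSV file into one argument string and calling the executable, can be tried locally without the container. The following is a minimal sketch that uses only the standard library in place of pandas. The names flatten_csv, run_model, and STAND_IN are hypothetical, and a Python one-liner stands in for ./a.out so that the example runs without the compiled binary.

```python
import csv
import io
import subprocess
import sys


def flatten_csv(text):
    # Turn rows such as "2,3\n4,5\n" into the single string "2,3,4,5".
    rows = csv.reader(io.StringIO(text))
    return ",".join(cell.strip() for row in rows for cell in row)


def run_model(command, payload):
    # check=True raises CalledProcessError on a non-zero exit code, so a
    # crashing executable is not silently recorded as an empty prediction.
    done = subprocess.run(command + [payload], capture_output=True,
                          text=True, check=True)
    return done.stdout.strip().split(",")


# Stand-in for ["./a.out"]: a Python one-liner that sums consecutive pairs.
STAND_IN = [sys.executable, "-c",
            "import sys; v = [int(t) for t in sys.argv[1].split(',')]; "
            "print(','.join(str(a + b) for a, b in zip(v[0::2], v[1::2])))"]

print(run_model(STAND_IN, flatten_csv("2,3\n4,5\n")))  # ['5', '9']
```

Replacing STAND_IN with ["./a.out"] would exercise the real executable. Because the whole file travels as one command line argument, very large files may run into the operating system's argument length limit, which is another reason to split the input into several files.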
            

Build your own SageMaker Processing container

The processing container is defined by a Dockerfile. Miniconda and pandas are installed in the container. a.out is the C++ executable that contains the model inference logic. process_script.py is the Python script we use to call the C++ executable and save the results. Both are described in the preceding sections. Now let us build the Docker container and push it to Amazon ECR. The Dockerfile looks like the following code:

FROM ubuntu:16.04

RUN apt-get update && \
    apt-get -y install build-essential libatlas-dev git wget curl

RUN curl -LO http://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && \
    bash Miniconda3-latest-Linux-x86_64.sh -bfp /miniconda3 && \
    rm Miniconda3-latest-Linux-x86_64.sh

ENV PATH=/miniconda3/bin:${PATH}

RUN conda update -y conda && \
    conda install -y -c anaconda scipy

# Python won’t try to write .pyc or .pyo files on the import of source modules
# Force stdin, stdout and stderr to be totally unbuffered. Good for logging
ENV PYTHONDONTWRITEBYTECODE=1 PYTHONUNBUFFERED=1 PYTHONIOENCODING=UTF-8 LANG=C.UTF-8 LC_ALL=C.UTF-8

RUN pip install --no-cache -I scikit-learn==0.20.0 pandas==1.0.3 boto3 sagemaker retrying
ADD process_script.py /
ADD a.out /
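
The build-and-push step can be outlined as a short list of shell commands. The sketch below only composes the commands as strings and does not run them. It assumes a recent AWS CLI that supports get-login-password, an ECR repository that already exists, and a placeholder account ID. The function names are hypothetical. The repository name cpp_processing and the region us-east-1 match the image URI used later in this post.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    # Standard ECR image URI layout: <account>.dkr.ecr.<region>.amazonaws.com/<repo>:<tag>
    return "{}.dkr.ecr.{}.amazonaws.com/{}:{}".format(
        account_id, region, repository, tag)


def build_and_push_commands(account_id, region, repository, tag="latest"):
    uri = ecr_image_uri(account_id, region, repository, tag)
    registry = uri.split("/")[0]
    return [
        # Authenticate the local Docker client against the ECR registry.
        "aws ecr get-login-password --region {} | docker login "
        "--username AWS --password-stdin {}".format(region, registry),
        # Build from the Dockerfile in the current directory.
        "docker build -t {}:{} .".format(repository, tag),
        # Tag the local image with the full ECR URI, then push it.
        "docker tag {}:{} {}".format(repository, tag, uri),
        "docker push {}".format(uri),
    ]


# "123456789012" is a placeholder account ID, not a real value.
for command in build_and_push_commands("123456789012", "us-east-1", "cpp_processing"):
    print(command)
```

The compiled a.out and process_script.py need to be in the build directory before the build command runs, because the Dockerfile adds both to the image.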

Set up the ScriptProcessor and run your script

We have 10 sample data files included in this demo. Each file contains 5000 rows of arbitrarily generated data. We first upload these files to Amazon S3. We use one ml.c5.xlarge instance for inference. You can increase the instance count for a bigger dataset. Amazon SageMaker Processing runs the script in a way similar to the following command, where EntryPoint is process_script.py and ImageUri is the Docker image we built earlier.

docker run --entrypoint [EntryPoint] [ImageUri]

The SageMaker Processing job is set up as follows:

import os

from sagemaker import get_execution_role
from sagemaker.processing import ProcessingInput, ProcessingOutput, ScriptProcessor

# Account_number, default_s3_bucket, and input_data are assumed to be
# defined earlier in the notebook.
role = get_execution_role()
script_processor = ScriptProcessor(command=['python3'],
                image_uri=Account_number + '.dkr.ecr.us-east-1.amazonaws.com/cpp_processing:latest',
                role=role,
                instance_count=1,
                base_job_name='run-exe-processing',
                instance_type='ml.c5.xlarge')
output_location = os.path.join('s3://', default_s3_bucket, 'processing_output')
script_processor.run(code='process_script.py',
                     inputs=[ProcessingInput(
                         source=input_data,
                         destination='/opt/ml/processing/input')],
                     outputs=[ProcessingOutput(source='/opt/ml/processing/output',
                                               destination=output_location)])

After the processing job starts, Amazon SageMaker displays the job progress. Information such as the job name and the input and output locations is reported. Upon completion, we can review a few rows of the output to make sure that the processing job was successful.

print('Top 5 rows from 0_out.csv')
!aws s3 cp $output_location/0_out.csv - | head -n5

Conclusion

In this post, we used Amazon SageMaker Processing to run inference on C++ models. Customers can bring legacy C++ models to SageMaker for faster inference at a lower cost. For more information, review Amazon SageMaker Processing.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.
