Beginners in machine learning face numerous challenges in keeping up with the fast pace of AI. It is especially difficult for people with no coding experience, since they have to learn both the math behind an algorithm and how to code it. To make things a little easier for them, a no-code machine learning GUI called KNIME was developed.
In this article, we will learn about KNIME and discuss how to use this tool for building a machine learning model from scratch.
What is KNIME?
KNIME is a GUI-based workflow platform that can be used to build machine learning models without writing code. You simply define the workflow between pre-defined nodes. These nodes cover tasks such as data cleaning, data visualization and model training. Once the workflow is defined, the model can be trained to get the desired output. Everything from basic input-output operations to data mining can be performed with KNIME.
To download KNIME, visit the download page on the KNIME website and select the installer that matches your operating system.
Windows users can select the first option listed, and the download will begin. Once the download is complete, follow the installation steps shown, and you will then see the KNIME dashboard.
Creating a workflow
To create the machine learning model, we first need to set up a workflow. To do this, select File -> New and choose a new workflow.
You will get a popup where you can type in the name of the project.
Click Finish to open the new workflow.
On the right-hand side, you can type in a description of the project along with any reference links. The left-hand side is where you will create the workflow.
Getting the dataset
Now that we have created our workspace, let us get the dataset. First, download the dataset that you want to use for the project. I have used the tips dataset, which is available on Kaggle. It contains columns such as smoker, time, day and total_bill, which are used to predict the tip a waiter will receive. This is a simple regression problem. After downloading the data, go to your node repository and search for 'file reader'. Drag and drop this node onto the workspace.
Then, double-click the node, browse to the dataset on your local system and select the file. Once you do that, you will get a preview of the dataset.
Here you can select options such as ignoring tab spaces and reading the column headings. After you have selected the desired options, select Apply and OK. Once done, right-click on the node and select 'Execute' so that it runs.
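For readers curious about what the file reader step amounts to in code, here is a minimal Python sketch using only the standard library. It is not how KNIME implements the node. The helper name `read_table`, the `has_header` flag and the sample rows are assumptions made up for illustration; the rows are not taken from the actual tips dataset.

```python
import csv
import io

# Made-up rows for illustration only; NOT real values from the tips dataset.
SAMPLE_CSV = """total_bill,tip,smoker,day,time
10.0,1.5,No,Sun,Dinner
20.0,3.0,Yes,Sat,Dinner
15.0,,No,Thur,Lunch
"""


def read_table(text, has_header=True):
    """Parse CSV text into a list of row dictionaries (hypothetical helper)."""
    reader = csv.reader(io.StringIO(text))
    rows = [row for row in reader if row]  # skip completely blank lines
    if has_header:
        header, data = rows[0], rows[1:]
    else:
        # No header row: generate placeholder column names.
        header = [f"col{i}" for i in range(len(rows[0]))]
        data = rows
    return [dict(zip(header, row)) for row in data]


rows = read_table(SAMPLE_CSV)
print(rows[0])
```

Every value is read as a string here, and the empty `tip` cell in the last row comes through as an empty string, which is the kind of gap the missing-values step deals with later.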
The next step is to identify the correlation that exists between the features. To do this search ‘Linear correlation’ on the node repository and drag and drop it to the workspace. Then, connect your dataset to this node.
Now, right-click on this and click on ‘execute’. After executing this, right-click again and click on ‘view correlation matrix’. Once you select this you will see the matrix.
Most columns are only weakly correlated with one another, but tip and total_bill show a very high correlation. Let us select these two columns to build the model.
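To make the idea of a correlation value concrete, the sketch below computes Pearson's correlation coefficient for two numeric columns. It assumes that this is the statistic the correlation matrix reports for numeric columns. The function name `pearson` and the sample numbers are made up for illustration and are not values from the tips dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


# Made-up values for illustration only.
total_bill = [10.0, 20.0, 15.0, 30.0, 25.0]
tip = [1.5, 3.0, 2.0, 5.0, 3.5]

r = pearson(total_bill, tip)
print(round(r, 3))
```

A value close to 1 means the two columns rise together almost linearly, which is the pattern that makes total_bill a useful predictor of tip.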
The next step in model building is to visualize the dataset. To do this, search for the type of plot you want to use. I have selected the scatter plot. Drag and drop this node onto the workspace and connect your file reader node to it. Once done, right-click and select execute.
Here you can also change the columns to see how the data is scattered. Other visualization methods, such as pie charts, are available as well, as shown below.
To handle missing values, type 'missing values' in the node repository. Drag and drop this node onto the workspace and connect it to the file reader node.
Next, double-click on the missing values node. Here you will find a dialog that lets you impute values in the dataset.
These options allow you to impute values for either number or string columns. I have chosen to impute the missing values with the mean value, but you can choose from the options below according to your requirements.
After selecting this, select Apply and OK. Finally, right-click and select the execute option to run the node, and the missing values are filled in automatically.
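Mean imputation, the option chosen above, can be sketched in a few lines of Python. This is only an illustration of the idea, not KNIME's implementation. The function name `impute_mean`, the use of `None` to mark a missing value and the sample numbers are all assumptions.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the present entries."""
    present = [v for v in values if v is not None]
    if not present:
        raise ValueError("cannot compute a mean when every value is missing")
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


# Made-up values for illustration only; None marks a missing tip.
tips = [1.5, None, 2.5, 4.0]
filled = impute_mean(tips)
print(filled)
```

The mean is computed from the present values only, so the imputed entries do not shift the column average.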
Now that we have pre-processed and visualized the data, it is time to build a model. I will use a simple linear regression model on this dataset. To do this, type 'linear regression learner' in the node repository and drag and drop the node onto the workspace. Connect the missing values node to it, since that node holds the pre-processed data.
Now double click the linear regression node. The following is displayed.
At the top, you need to set the target column. Once you set this, the target is automatically removed from the list of inputs shown below. You can also choose to exclude some of the features. I will exclude a few features, since they showed little correlation with the target. Just select the column to be excluded and click the left arrow button.
Once done, select the apply button. Next, right-click the node and select execute. Once the execution is done you can see the output on the screen.
The different error measures and the R-squared value are shown, and the results are quite good here.
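For readers who want to see what a simple linear regression and its R-squared value involve, here is a minimal Python sketch of an ordinary least-squares fit with one input column. It is an illustration under stated assumptions, not the learner node's actual implementation. The function names and the sample numbers are made up, and mean absolute error stands in as one example of an error measure.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("cannot fit a line when all x values are identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def mean_absolute_error(xs, ys, slope, intercept):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(y - (slope * x + intercept)) for x, y in zip(xs, ys)) / len(xs)


# Made-up values for illustration only; not taken from the tips dataset.
total_bill = [10.0, 20.0, 15.0, 30.0, 25.0]
tip = [1.5, 3.0, 2.0, 5.0, 3.5]

slope, intercept = fit_line(total_bill, tip)
r2 = r_squared(total_bill, tip, slope, intercept)
mae = mean_absolute_error(total_bill, tip, slope, intercept)
print(slope, intercept, r2, mae)
```

An R-squared value near 1 together with a small error indicates that the line fits the data well. These measures are computed on the same rows used for fitting, so they describe the fit rather than performance on unseen data.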
Thus we have built a machine learning model without coding.
In this article, we saw how simple it is to use the KNIME GUI to build a machine learning model. There is a lot left to explore in this tool for building better and more complex models. KNIME also supports neural networks and clustering algorithms, which helps make machine learning easy and accessible to everyone.