eldr.ai | Working with Unsupervised Data - Clustering
If you haven't already, please have a look at the Quick Start guide before going through this tutorial, as
this guide is an extension of that.
In the Quick Start and Multiple Outputs tutorials we looked at Customer Data examples where we had inputs and output(s).
The idea was that ELDR AI learnt the links between them so that we could make accurate future predictions and get recommendations.
This is the most common form of AI/Machine Learning and is called Supervised Learning.
It is Supervised because we are telling ELDR AI the outputs to learn.
However, there may be instances where we do not know what our outputs are - i.e. we have a load of data and have no idea how any of it is related (big data) - but we still
want to learn from it, make predictions, get recommendations and gain insights. Because we have no defined outputs to tell ELDR AI what to learn, we call this
Unsupervised Learning.
In cases where we don't have any defined outputs (unsupervised learning), ELDR AI is able to process the entire
dataset and group together rows of data that are linked to each other - in a process called Clustering. With ELDR AI you can create
as many clusters as you want, e.g. two types of customer, five types of patient, ten types of behaviour or fifty types
of response. ELDR AI will find distinctly related groups for you, taking into account all inputs. You can then use this clustered data as you did in
the Quick Start or Multiple Outputs tutorials for Supervised Learning - the clusters (e.g. CustomerCluster1, CustomerCluster2) become our outputs.
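To give a feel for what clustering does, here is a minimal sketch using k-means, a common clustering algorithm. This guide does not say which algorithm ELDR AI uses internally, so k-means, the function name kmeans and the toy rows below are assumptions made purely for illustration.

```python
import random

def kmeans(rows, k, iterations=50, seed=0):
    """Group numeric rows into k clusters with plain k-means (illustrative only)."""
    rng = random.Random(seed)
    # Start from k distinct rows chosen at random as the initial centres.
    centres = [list(r) for r in rng.sample(rows, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each row joins its nearest centre (squared Euclidean distance).
        new_labels = [
            min(range(k), key=lambda c: sum((x - y) ** 2 for x, y in zip(row, centres[c])))
            for row in rows
        ]
        # Update step: move each centre to the mean of the rows assigned to it.
        for c in range(k):
            members = [row for row, lab in zip(rows, new_labels) if lab == c]
            if members:  # keep the old centre if a cluster ends up empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing, so we are done
        labels = new_labels
    return new_labels, centres

# Hypothetical toy data: (age, score) for six customers forming two obvious groups.
rows = [(22, 35), (25, 40), (24, 38), (61, 80), (58, 85), (63, 78)]
labels, centres = kmeans(rows, k=2)
print(labels)
```

The first three rows end up sharing one label and the last three share the other. Which group is numbered 0 and which is numbered 1 depends on the random starting centres.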
Let's go through a full example to explain this.
Task: We have been given some Customer Data and we have no idea what it all means or how the data is related to each other -
but we want ELDR AI to define four customer types for us so that when a new customer visits our website we can:
* Predict what type of customer they are - so we can tailor their experience
* Recommend products to them based on what they are about to purchase - so we can make more sales
* Gain insights about the four types of customer - so we can inform the Marketing Strategy Team
(1) Download and have a look at the Customer Data No Output example
Click here to download a small CSV file (3Kb, 150 rows)
containing No Output Customer Data that we want ELDR AI to learn from.
You will see the data looks very similar to the customer data from the Quick Start guide, with ips and ipcs. However, we now do not have an output (op).
(2) Create ELDR AI Data
As before, go to the Create Data screen and upload the file you have just downloaded. Call the data "Customer No Output".
Navigate to View Data
You will now see your newly-created Customer Data, with some key differences from what we saw in the Quick Start and Multi Output tutorials:
* Clusters icon (red box) - this tells us that Clustering is available for this data
* Unsupervised label (blue box) - this tells us our data is only suitable for unsupervised learning
* No.Outputs is 0 (green box) - unlike previously, we don't have any outputs
(3) Cluster Data
Our data is uploaded. Now let's get ELDR AI to cluster it for us and group together related data.
To start the clustering process, click the clusters icon (shown by the red box above).
You will be taken to the View Data screen, specifically the Clusters section.
By default, the "Number of Clusters" dropdown will show roughly one tenth of the number of rows of data (15 in this case, as we have 150 rows). Here we want ELDR AI to find four clusters,
so select 4 from the dropdown and press "Cluster Data". For this small dataset clustering should take less than 20 seconds.
If clustering is successful you will see the clusters/groups displayed in chart and table form.
During the clustering process ELDR AI looks at all fields collectively and groups the rows of data. By looking at the chart you can see how ELDR AI defines each cluster.
You will have noticed that three new buttons have now appeared:
Saved Clustered Data
This will store the generated clustered data against your original dataset so you don't need to do the
clustering process again. This is important because the clustering process (like all AI/Machine Learning) is initially
random before ELDR AI starts to learn. Although ELDR AI will find the same clusters each time, they may come back in a different order,
so it's always better to save the clusters when you're happy with them.
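To picture what "the same clusters in a different order" means, here is a small sketch. The two label lists are made-up results from two hypothetical clustering runs over the same six rows, and canonical_labels is an illustrative helper, not part of ELDR AI.

```python
def canonical_labels(labels):
    """Renumber cluster labels in order of first appearance (0, 1, 2, ...)."""
    mapping = {}
    for label in labels:
        # The first time a label is seen it gets the next free number.
        mapping.setdefault(label, len(mapping))
    return [mapping[label] for label in labels]

# Two hypothetical runs: the rows are grouped identically, only the numbering differs.
run_a = [2, 2, 0, 1, 0, 1]
run_b = [1, 1, 2, 0, 2, 0]

same_grouping = canonical_labels(run_a) == canonical_labels(run_b)
print(same_grouping)  # True
```

Once the labels are renumbered in a consistent way, the two runs turn out to describe exactly the same groups. Saving the clusters avoids having to do this kind of matching at all.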
Click the "Saved Clustered Data" button and you should get a success message:
After saving, go to the View Data page to see your clusters have been recorded (red box):
Click again on the Cluster icon (green box) to take you back to the Clusters Screen and you will find your clusters are already there in chart and table
form.
Download Clustered Data as CSV
Clicking this button will download the clustered data to your machine for your own processing and analysis. For example, you might
want to look over the clusters that ELDR AI has found in more detail before deciding what to do next.
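As a rough idea of what such an analysis could look like, the sketch below counts the rows in each cluster and averages one numeric field per cluster. The column names (Age, CreditScore, Cluster) and the sample rows are assumptions for illustration; check the header of your own downloaded file, as the real layout may differ.

```python
import csv
import io
from collections import Counter

# Hypothetical excerpt standing in for the downloaded file.
sample_csv = """\
Age,CreditScore,Cluster
23,410,1
25,450,1
47,620,2
61,780,3
34,530,4
"""

def cluster_sizes(csv_text, cluster_column="Cluster"):
    """Count how many rows fall into each cluster."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row[cluster_column] for row in reader)

def mean_by_cluster(csv_text, value_column, cluster_column="Cluster"):
    """Average a numeric column separately for each cluster."""
    totals, counts = {}, Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = row[cluster_column]
        totals[key] = totals.get(key, 0.0) + float(row[value_column])
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}

print(cluster_sizes(sample_csv))
print(mean_by_cluster(sample_csv, "Age"))
```

To run this against a real download, read the file's contents into a string and pass that in place of sample_csv.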
Convert Clustered Data to Supervised Data
Our task is to get predictions, recommendations and insights from this unsupervised clustered data. To do this we need to convert the clustered data into a
format that ELDR AI can learn from, i.e. Supervised Data.
In this example we have four clusters, which means 4 output options. If you remember from the Customer Multiple Output tutorial we need to use One Hot Encoding
to achieve this. ELDR AI has an in-built multi-output converter. We can use this here by pressing the "Convert Clustered Data to Supervised Data" button.
Press the button and you will get this:
The Cluster (output) field will be called "Cluster" by default. As we have clustered Customers, change it to Customer as shown by the red box.
Right of the red box are all our input fields, e.g. PostCode, CreditScore etc., with a confirmation underneath of what type of input they are (ip/ipc).
To complete the process, press "Create Supervised Learning Data".
If you go back to the View Data screen, you will now see your newly-created converted data:
You will now see a data entry has been made for you with the name "Customer No Output converted_1" (red box).
Each time you convert the data, "_1" will increment to "_2", "_3" etc.
Also note, ELDR AI now recognises the converted data as supervised (green box) and we have the correct number of customer outputs (4, blue box).
We can inspect the converted data by clicking the "eye" icon (orange box):
Here you can see how ELDR AI has converted our 4 clusters into 4 customer output columns.
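The idea behind this conversion (One Hot Encoding) can be sketched in a few lines. This is only an illustration of the technique, not ELDR AI's converter: the function name one_hot and the output column names (Customer1 to Customer4) are assumptions, and the names ELDR AI generates may differ.

```python
def one_hot(cluster_ids, prefix="Customer"):
    """Turn a list of cluster labels into one 0/1 output column per cluster."""
    clusters = sorted(set(cluster_ids))
    # One output column per distinct cluster, e.g. Customer1 ... Customer4.
    header = [f"{prefix}{c}" for c in clusters]
    # Each row gets a 1 in the column of its own cluster and 0 everywhere else.
    rows = [[1 if cid == c else 0 for c in clusters] for cid in cluster_ids]
    return header, rows

# Hypothetical cluster labels for five customers.
header, rows = one_hot([1, 3, 2, 4, 1])
print(header)
for row in rows:
    print(row)
```

Every row contains exactly one 1, which marks the customer type that row belongs to. This is what lets the four clusters act as four outputs for Supervised Learning.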
Good stuff, we have successfully converted some unsupervised data that we had no idea about, via clustering,
into Supervised Data for ELDR AI to learn from. Let's now create a model, train it and start getting some predictions, recommendations and insights...just as
we did in the Quick Start and Multi Output Tutorials previously.
(4) Create ELDR AI Model
Again, the default settings are fine for this demo. Enter "Customer Data Converted" as the Name and make sure you have the correct Data Source selected
(CSV|Customer No Output converted_1), or whatever you called it.
Click the "Create" button and you will hopefully get a success message:
In the View Models screen you will now see your newly-created Model:
(5) Train ELDR AI Model
As before, click the red graduation cap Training icon to go to the Training Screen, and Train the model by pressing the "Train Model" button.
(6) Get Predictions and Recommendations
Our task was to get predictions and recommendations from unsupervised customer data. Let's do that now.
As before, navigate to the Ask ELDR screen by pressing the question mark icon on the View Models screen:
We can now get some predictions and recommendations:
As before, have a look around at the other methods of Asking ELDR AI if you like, e.g. API, CSV etc.
(7) Gain Insights
Another part of our task was to gain insights from unsupervised customer data. Let's go ahead.
As in the previous tutorials, click the lightbulb icon to get to the Insights Screen:
Here we can see that Age has a big influence on this Customer Data Set.
As in previous examples, product purchasing similarities are apparent among the customer groups, e.g. customer types who buy biscuits
are also likely to be interested in chocolate.
That concludes the ELDR AI guide to working with Unsupervised Data and Clusters.