# 03_MachineLearning

Thank you.
So I will talk about the new machine learning functions in Mathematica 10,
specifically the functions Predict and Classify.
So the goal of these functions is to do supervised learning,
meaning learning how to do a task from examples of this task.
So one of these tasks could be image recognition as Shadi showed as an example in a talk.
Another one would be text classification,
so for example you have documents and you want to classify them depending on the topic,
whether they are about biology, physics, whatever. In a general way, the task is to predict a variable
as a function of other variables, which we call features.
Doing that in the usual way, without Classify and Predict, needs quite a bit of expertise:
you need to know about machine learning algorithms, you need to know what kind
of preprocessing you have to do on the data, and all of these algorithms have parameters as well,
so you need to tune them; there are some methods such as [] to find the optimal parameters.
So what we did with Classify and Predict is to automate all these processes,
to have automatic preprocessing, automatic choice of methods, and automatic parameter selection,
so that even a non-expert can do machine learning.
So here is a basic example of Classify.
The goal of Classify is, from examples,
from the feature of each example, to attribute a class, meaning a categorical variable.
Here you have a list of examples,
so when the feature is “1” the class is “A”,
when the feature is “3.2” the class is “B”.
From these examples, if you use Classify on them it will output a classifier function, so it will choose a method and so on.
And this classifier function we can use on a new example to predict its class.
So here if the feature is “1.5” the class will be “A”; if I put “3.5” the class will be “B”.
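
The idea of learning a classifier from labeled examples can be sketched in a few lines of plain Python. This is only an illustration of supervised learning with a toy nearest-neighbour rule; it is not the method Classify selects, and the names `train` and `classify` are hypothetical. The pairs 1 → "A" and 3.2 → "B" come from the talk, the other two training pairs are assumed.

```python
# Toy 1-D nearest-neighbour classifier (illustration only, not what Classify does internally).
# The pairs (1.0, "A") and (3.2, "B") come from the talk; the other two are hypothetical.
training = [(1.0, "A"), (2.0, "A"), (3.2, "B"), (4.0, "B")]


def train(examples):
    """Return a function that labels a value with the class of its nearest training example."""
    def classifier(x):
        _, nearest_label = min(examples, key=lambda pair: abs(pair[0] - x))
        return nearest_label
    return classifier


classify = train(training)
print(classify(1.5))  # A
print(classify(3.5))  # B
```

As in the talk, training returns a function, and that function is then applied to new inputs.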
Here is a simple example of Predict.
It is very similar to Classify except it doesn’t predict classes but it predicts real values.
For example, here we get a predictor function that you can use on new examples.
Here is a more interesting example.
I’m loading here, from ExampleData, machine learning data about Titanic passengers.
For each passenger, we want to predict if the passenger survived
or not as a function of some of the features of the passenger,
which are what class the passenger was traveling in, their age, and their sex.
And you see, yes, there is missing data in the dataset, but Classify and Predict can handle missing data.
So again I run Classify on these.
I chose a method; you don’t have to choose a method actually,
you generally get the best performance by not choosing a method,
but if you want to, because you know the method, you can specify it as well.
So it outputs the classifier function,
and here, for a new example of a second-class passenger, a 28-year-old male, it is more likely that he died.
You can also ask for the probabilities.
Here this passenger, according to the model of the data,
had about a 93% chance of dying, which answers the question of his chance of surviving.
This is an important thing, that all the classifier functions are probabilistic;
you can always ask for the probabilities,
which is quite useful when you want to combine models or make some type of decision with them.
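
One simple way to use class probabilities for combining models is to average them and pick the most probable class. The sketch below is plain Python and only illustrates that idea; the function names are hypothetical, the second model and its numbers are assumed, and only the 93% figure comes from the talk.

```python
def average_probabilities(*models):
    """Average the class probabilities given by several models for the same example."""
    classes = models[0].keys()
    return {c: sum(m[c] for m in models) / len(models) for c in classes}


def decide(probabilities):
    """Pick the class with the highest probability."""
    return max(probabilities, key=probabilities.get)


# First model: roughly the probabilities quoted in the talk for this passenger.
model_one = {"died": 0.93, "survived": 0.07}
# Second model: hypothetical numbers, assumed here only for illustration.
model_two = {"died": 0.80, "survived": 0.20}

combined = average_probabilities(model_one, model_two)
print(decide(combined))  # died
```

Averaging is only one possible combination rule; weighting the models or applying a cost-based threshold to the probabilities would follow the same pattern.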
Here is a plot of the survival probability as a function of age,
for different combinations of sex and class.
Another example, which is similar to what Shadi showed, is on images.
I have here, from ExampleData, a set of images of handwritten digits.
I can put the dataset into Classify,
so here I just take ten thousand examples to make it a bit faster for this presentation,
and I use the performance goal “TrainingSpeed”,
so it’s one of the values of PerformanceGoal you can choose to give more information to Classify
about what type of classifier you want: whether you want the training to be fast,
the classifier to be fast at classifying, or small in memory, all of that you can specify with PerformanceGoal.
We have our classifier function; in order to test it I am going to load some test data,
so some data that were not in the training set.
Here is a random sample of 10 of these test examples,
so I can try it on an example. For example, let’s try this one.
If I put in a 6, would it recognize a 6? OK, it recognized it.
And in order to be a bit more scientific,
we can use the function ClassifierMeasurements
where you input the classifier and the test data and it will output a ClassifierMeasurements object
on which you can query various properties.
For example, I want to know the accuracy of the classifier on the dataset, so here you have 93.6% accuracy.
I can also ask it for the confusion matrix, for plotting the confusion matrix,
which tells you here, for example, that 821 examples of 5 have been correctly classified
and here 17 examples of 5 have been misclassified as 6 in the test set.
And I can actually see all these examples,
by querying for the misclassified examples, and here you have the examples that have been classified as 6.
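
To make these measurements concrete, here is a minimal sketch in plain Python of how accuracy and confusion-matrix counts are computed from actual and predicted labels. It is not the ClassifierMeasurements implementation; the function names are hypothetical and the tiny label lists are made up for illustration.

```python
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of examples whose predicted label equals the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)


def confusion_counts(actual, predicted):
    """Count how often each (actual, predicted) pair of labels occurs."""
    return Counter(zip(actual, predicted))


# Hypothetical test labels, not the data from the talk.
actual = [5, 5, 5, 6, 6, 3, 3, 3]
predicted = [5, 6, 5, 6, 6, 3, 5, 3]

counts = confusion_counts(actual, predicted)
print(accuracy(actual, predicted))  # 0.75
print(counts[(5, 5)])               # 2 examples of 5 correctly classified
print(counts[(5, 6)])               # 1 example of 5 misclassified as 6

# Positions of the examples of 5 that were misclassified as 6.
fives_as_six = [i for i, pair in enumerate(zip(actual, predicted)) if pair == (5, 6)]
print(fives_as_six)                 # [1]
```

The diagonal pairs, where actual and predicted agree, are the correct classifications; every other pair is one cell of errors in the confusion matrix.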
I’m going to show you quickly a basic text example.
This is not a real-life example, so it will just try to recognize whether the text is about cats or dogs,
and another example with images, but this time with Predict.
Here are images of gauges with a value, and we want to try to predict the value as a function of the image.
I run Predict, this is just one example.
And I can use the predictor function on new images, so here is the predicted value of this gauge.
This would be close to it.
You can also ask for the distribution: with Classify you could ask for the probabilities,
and here you can ask for the continuous distribution.
So it says that, given this image,
the belief of the predictor function is a normal distribution centered around 0.65 with this kind of standard deviation.
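
What such a predictive distribution gives you beyond a single number can be sketched with Python's standard library. The mean 0.65 is the value mentioned in the talk; the standard deviation is not stated in the transcript, so the 0.05 used here is purely an assumed value for illustration.

```python
from statistics import NormalDist

# Mean from the talk; sigma is a hypothetical value, since the transcript does not give it.
belief = NormalDist(mu=0.65, sigma=0.05)

# The point prediction is the centre of the distribution.
point_prediction = belief.mean

# A central 95% interval expresses how uncertain the prediction is.
low = belief.inv_cdf(0.025)
high = belief.inv_cdf(0.975)

# Probability that the true gauge value is above 0.7 under this belief.
prob_above = 1 - belief.cdf(0.7)

print(point_prediction, low, high, prob_above)
```

A wider standard deviation would widen the interval, which is how the predictor communicates lower confidence in its value.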
And here is just a tool to dynamically check your classifier or your predictor.
Another part of Classify and Predict is the built-in functions.
So these are functions that you don’t need a training set to use,
they’ve already been trained by us.
As the first argument, instead of putting the training set you put the name;
for example, I want to classify the language,
and for a string it will give you the language of the string.
Again, you can wrap that in Dynamic to test it,
which is quite a good way to test your function.
There are some other ones, such as determining the topic of a Facebook post,
and I think you can see the other ones.
I think that’s all I have time to show.
If you want to know more about it there is of course the documentation of Classify and Predict,
there’s a machine learning guide, and there are “New in 10” pages
with nice examples, and also a blog post about predicting who will win the World Cup,
which was done before the World Cup,
and which is an example of a full machine learning workflow
with Mathematica on a real problem.
Thank you.