How to Use Amazon Machine Learning to Build a Machine Learning Model
Earlier this month, AWS announced Amazon Machine Learning. According to AWS, the new service is based on the technology that Amazon's internal data scientists use to create machine learning models, and it helps you use all of the data you have collected to improve the quality of your decisions. You can build and fine-tune predictive models using large amounts of data, and then use Amazon Machine Learning to make predictions at scale, in batch mode or in real time. You can benefit from machine learning even if you do not have an advanced degree in statistics or are not familiar with building, running, and maintaining your own processing and storage infrastructure.
Machine Learning Basics
To benefit from machine learning, you need some existing data that you can use for training. It helps to think of the training data as rows in a database or a spreadsheet. Each row represents a single data element (one purchase, shipment, or catalog item). The columns represent the attributes of that element: customer zip code, purchase price, credit card type, item size, and so on.
The training data must contain examples of actual outcomes. For example, if your rows represent completed transactions that were either legitimate or fraudulent, each row must contain a column that records the result. This column is known as the target variable. The data is used to create a machine learning model that, when presented with new data about a proposed transaction, returns a prediction about its validity. Amazon Machine Learning supports three different types of prediction: binary classification, multiclass classification, and regression. Let's look at each one:
Binary classification is used to predict one of two possible outcomes. Is this transaction legitimate? Will the customer buy this product? Is the shipping address an apartment building?
Multiclass classification is used to predict one of three or more possible outcomes, along with the likelihood of each one. Is this product a book, a movie, or an article of clothing? Is this movie a comedy, a documentary, or a thriller? Which category of products is this customer most interested in?
Regression is used to predict a number. How many 27-inch monitors should we place in inventory? How much should we charge for them? What percentage of them are likely to be sold as gifts?
A properly trained and tuned model can be used to answer any one of the questions above. In some cases, it is appropriate to use the same training data to build two or more models.
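As a rough illustration of how the kind of target variable drives the choice among the three prediction types, here is a minimal Python sketch. The function name suggest_prediction_type, its rules, and the sample rows are assumptions made for this example; the rules are a simple heuristic, not the logic Amazon Machine Learning itself uses.

```python
def suggest_prediction_type(target_values):
    """Suggest a prediction type from example target values (heuristic only)."""
    distinct = set(target_values)
    if len(distinct) < 2:
        raise ValueError("need at least two distinct outcomes to train on")
    if len(distinct) == 2:
        return "binary classification"
    # bool is a subclass of int in Python, so exclude it explicitly.
    if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct):
        return "regression"
    return "multiclass classification"


# Hypothetical training rows: one row per purchase, one column per attribute.
rows = [
    {"zip": "98109", "price": 19.99, "is_fraud": "no", "category": "book", "units": 3},
    {"zip": "10001", "price": 249.00, "is_fraud": "yes", "category": "clothing", "units": 1},
    {"zip": "60614", "price": 12.50, "is_fraud": "no", "category": "movie", "units": 7},
    {"zip": "94105", "price": 31.00, "is_fraud": "no", "category": "book", "units": 2},
]

# The same rows can train several models, one per target column.
for target in ("is_fraud", "category", "units"):
    values = [row[target] for row in rows]
    print(target, "->", suggest_prediction_type(values))
```

The loop at the end reflects the point made above: one set of training rows can back more than one model, depending on which column is treated as the target.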
You should plan to spend some time enriching your data to make sure it is a good match for the training process. As a simple example, you might start with location data based on zip codes. After some analysis, you may find that you can improve the quality of the results by using a different location representation with greater or lesser resolution. The machine learning training process is iterative: plan to spend time understanding and evaluating your initial results, and then use them to enrich your data.
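To make the zip code example concrete, the sketch below derives a coarser location attribute by keeping only the leading digits of a US zip code. The function name coarsen_zip, the three-digit default, and the sample rows are assumptions for illustration; whether a coarser or finer representation actually helps is something only an evaluation on your own data can tell you.

```python
def coarsen_zip(zip_code, digits=3):
    """Return the first `digits` digits of a US zip code, or "unknown"."""
    if not 1 <= digits <= 5:
        raise ValueError("digits must be between 1 and 5")
    base = zip_code.strip()[:5]  # drops a ZIP+4 suffix such as "-1234"
    if len(base) != 5 or not base.isdigit():
        return "unknown"
    return base[:digits]


# Hypothetical rows; add the coarser attribute alongside the original one.
rows = [{"zip": "98109"}, {"zip": "98101-1234"}, {"zip": "n/a"}]
for row in rows:
    row["zip3"] = coarsen_zip(row["zip"])

print([row["zip3"] for row in rows])
```

Keeping the original column and adding the derived one lets you train with either representation and compare the results.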
You can measure the quality of each of your models using a set of performance metrics that are computed for you. For example, the Area Under the Curve (AUC) metric measures the performance of a binary classification model. It is a floating-point value in the range 0.0 to 1.0 that expresses how often the model predicts the correct answer on data it was not trained on. As the quality of the model rises, the value rises from 0.5 toward 1.0. A score of 0.5 is no better than random guessing, while 0.9 indicates a very good model in most cases. A score of 0.9999, however, is probably too good to be true and may point to a problem with the training data.
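One common way to read AUC is as the probability that the model gives a randomly chosen positive example a higher score than a randomly chosen negative one. The sketch below computes it that way in plain Python on a handful of made-up labels and scores. It is meant to show where the 0.5 and 1.0 endpoints come from, not to reproduce how Amazon Machine Learning calculates the metric; the function name auc is an assumption for this example.

```python
def auc(labels, scores):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [1, 1, 0, 0]  # made-up held-out examples

# Every positive outranks every negative.
print("perfect ranking:", auc(labels, [0.9, 0.8, 0.3, 0.1]))

# The model gives every example the same score, so it carries no information.
print("uninformative:  ", auc(labels, [0.5, 0.5, 0.5, 0.5]))
```

The quadratic loop is fine for a few examples; for real datasets a sort-based calculation or an existing library routine would be the more practical choice.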
As you build a binary classification model, you will need to spend some time looking at the results and adjusting a value known as the cut-off. The cut-off is the score threshold above which a prediction is treated as true. You can adjust it up or down based on the relative importance, in your situation, of false positives (cases that should be false but were predicted as true) and false negatives (cases that should be true but were predicted as false). If you are building a spam filter for email, a false negative delivers spam to your inbox, while a false positive sends legitimate mail to your junk folder. In this case, false positives are the less desirable error. The trade-off between false positives and false negatives depends on your business problem and on how you plan to use the model in production.
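The effect of moving the cut-off can be seen by counting false positives and false negatives at two different values. The sketch below does this for a few made-up spam scores, with 1 meaning spam and 0 meaning legitimate mail. The function name count_errors and the data are assumptions for illustration.

```python
def count_errors(labels, scores, cutoff):
    """Count false positives and false negatives at a given cut-off."""
    false_positives = 0
    false_negatives = 0
    for actual, score in zip(labels, scores):
        predicted_spam = score >= cutoff
        if predicted_spam and actual == 0:
            false_positives += 1  # legitimate mail sent to the junk folder
        elif not predicted_spam and actual == 1:
            false_negatives += 1  # spam delivered to the inbox
    return {"false_positives": false_positives, "false_negatives": false_negatives}


# Made-up example: 1 = spam, 0 = legitimate mail.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.95, 0.60, 0.55, 0.10, 0.40, 0.70]

for cutoff in (0.5, 0.8):
    print(cutoff, count_errors(labels, scores, cutoff))
```

With these sample scores, raising the cut-off from 0.5 to 0.8 removes both false positives at the cost of one additional false negative, which is the direction you would lean for a spam filter.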
Amazon Machine Learning in Action
Using the Amazon Machine Learning API, developers can create new models from data stored in Amazon S3, Amazon Redshift, or a MySQL database on Amazon RDS. Let's walk through the process of creating a model and generating some predictions, following the steps described in the Tutorial section of the Amazon Machine Learning Developer Guide. You can sign up for Amazon Machine Learning and follow the steps in the guide yourself if you like. The guide uses a lightly enhanced copy of the publicly available bank marketing dataset from the University of California, Irvine (UCI) Machine Learning Repository. The model we are about to build will answer the question "Will the customer subscribe to our new product?"
I downloaded a copy of banking.csv and uploaded it to Amazon Simple Storage Service (S3), and then agreed to let the console add an IAM policy so that Amazon Machine Learning could access it:
Then I created an Amazon Machine Learning Datasource object by referencing the item in the bucket and supplying a name for the object. This object holds the location of the data, the variable names and types, the name of the target variable, and descriptive statistics for each variable. Most operations in Amazon Machine Learning reference a Datasource. Here is how I set everything up:
Amazon Machine Learning can also create a Datasource from Amazon Redshift or from a MySQL database on Amazon RDS. Selecting the Amazon Redshift option shown above would have let me enter the name of my Amazon Redshift cluster, along with a database name, access credentials, and a SQL query. The Machine Learning API can be used to create a Datasource from a MySQL database on Amazon RDS.
Amazon Machine Learning opened and scanned the file, made a guess at the type of each variable, and then proposed the following schema:
In this case, all of the guesses were correct. If they had not been, I could have selected one or more rows and clicked Change Type to fix them.
Because I will use the Datasource to create and evaluate an ML model, I need to select the target variable. In this dataset, the target variable (y) has the Binary data type, so the models generated from it will use binary classification.
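The schema step can be approximated offline. The sketch below reads a few CSV rows, guesses a type for each column, and assembles a schema dictionary that names the target variable. The guessing rules, the sample rows, and the helper names guess_type and build_schema are all assumptions for illustration. The key names are modeled loosely on the JSON data schema that Amazon Machine Learning accepts and should be checked against the Developer Guide before use.

```python
import csv
import io
import json

# Made-up sample rows; the real banking.csv has more columns and rows.
SAMPLE_CSV = """age,job,balance,y
44,technician,29,0
33,entrepreneur,2,0
47,blue-collar,1506,1
35,management,231,0
"""


def guess_type(values):
    """Guess a variable type from string values (illustrative heuristic)."""
    if set(values) <= {"0", "1"}:
        return "BINARY"
    try:
        for value in values:
            float(value)
        return "NUMERIC"
    except ValueError:
        return "CATEGORICAL"


def build_schema(csv_text, target):
    """Build a schema dictionary with one guessed type per column."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows or target not in rows[0]:
        raise ValueError("target column not found in the data")
    attributes = [
        {"attributeName": name, "attributeType": guess_type([r[name] for r in rows])}
        for name in rows[0]
    ]
    return {
        "targetAttributeName": target,
        "dataFormat": "CSV",
        "dataFileContainsHeader": True,
        "attributes": attributes,
    }


schema = build_schema(SAMPLE_CSV, target="y")
print(json.dumps(schema, indent=2))
```

As in the console, a guess like this is only a starting point: a numeric-looking column such as a zip code would be guessed as NUMERIC here and would need to be changed by hand.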
After a few more clicks, I was ready to create my Datasource:
A minute or two later, my Datasource was ready:
As I hinted earlier, you can improve your models by learning more about your data. The Amazon Machine Learning console provides several tools that you can use to do so. For example, you can look at the distribution of values for any variable in a Datasource. I did this for the age variable in my own Datasource.
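A rough text version of such a distribution view can be produced with a few lines of Python. The sketch below groups made-up age values into ten-year buckets. The bucket width, the sample values, and the function name bucket_counts are assumptions; the console's own charts are computed by the service, not by code like this.

```python
from collections import Counter


def bucket_counts(values, width=10):
    """Count values per bucket; each key is the bucket's lower bound."""
    if width <= 0:
        raise ValueError("width must be positive")
    return Counter((value // width) * width for value in values)


# Made-up ages for illustration only.
ages = [23, 31, 35, 38, 41, 44, 47, 52, 58, 63]
width = 10
counts = bucket_counts(ages, width)

# Print one text bar per bucket, in ascending order.
for lower in sorted(counts):
    upper = lower + width - 1
    print(f"{lower:>3}-{upper:<3} {'#' * counts[lower]}")
```

A skewed or oddly shaped distribution in a view like this is often the first hint that a variable needs to be cleaned up or represented differently before training.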
The next step is to create my model: