
Sunday, January 26, 2014

Multiple ROC Curves for Multiple Classifiers

In my previous post I explained how to generate a single ROC Curve for a single classifier, but in practice you will often need to plot ROC Curves for several classifiers on the same chart. This makes it easy to compare the performance of the classifiers. The following steps show how to draw multiple ROC Curves for multiple classifiers.
Open the Weka tool and select the knowledge flow tab.
Figure 1
Figure 2
  • When it is loaded, the first thing is to select ArffLoader from the DataSources menu, which is used to input the data. Drag ArffLoader into the knowledge flow layout as shown in figure 3.
  • From the evaluation tab select the ClassAssigner and put it into the knowledge flow layout.
  • From the evaluation tab select the ClassValuePicker and put it into the knowledge flow layout.
  • From the evaluation tab select the CrossValidationFoldMaker (we are using 10 fold cross validation) and put it into the knowledge flow layout.
  • The next step is to choose the classifiers from the classifiers tab. In this tutorial I am using Random Forest (RF) and Naïve Bayes as classifiers. Select RF from the trees tab and Naïve Bayes from the bayes tab.
  • To evaluate the performance of the classifiers, select ClassifierPerformanceEvaluator from the evaluation tab and put it into the knowledge flow layout. We need two performance evaluators, one for each classifier.
  • Finally we have to plot the ROC Curve. For that, select ModelPerformanceChart from the visualization tab and put it into the knowledge flow layout.

Figure 3
Now all the components needed to draw the ROC Curve are on the layout. The next thing is to make the connections.
  • To connect ArffLoader with the ClassAssigner, right click on the ArffLoader, select the dataset, and connect it to the ClassAssigner.

Figure 4
  • Right click on the ClassAssigner and select the dataset, connect it to ClassValuePicker.
  • Right click on the ClassValuePicker and select the dataset, connect it to CrossValidationFoldMaker.
  • Next we have to assign the training and test set data to the classifier algorithms.
  • Right click on the CrossValidationFoldMaker, select the training data, and connect it to the RF classifier. Right click on the CrossValidationFoldMaker again, select the testing data, and connect it to the RF classifier. Do the same for the Naïve Bayes classifier.
  • Right click on the RF Classifier, select the batchClassifier, and connect it to a ClassifierPerformanceEvaluator. Do the same for the Naïve Bayes classifier, using the second evaluator.
  • Right click on each ClassifierPerformanceEvaluator, select the thresholdData, and connect it to ModelPerformanceChart. The complete arrangement now looks like figure 5.
Figure 5

  • Next we are going to input the data. For that, right click on the ArffLoader and select configure. Browse for the ARFF file on your system and click the ok button.
  • Right click on the ClassValuePicker and select the class value for which the ROC Curve will be drawn.
  • Right click on the CrossValidationFoldMaker and select how many folds to use for splitting the data into training and testing sets (the default is 10). Ten fold cross validation splits the input into ten parts; in each of the ten rounds, nine parts (90% of the data) are used for training and the remaining part (10%) is used for testing, so every instance is tested exactly once.
  • Next we have to run the model. For that, right click on the ArffLoader and select start loading.
Figure 6
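The fold arithmetic described above can be sketched in plain Java. This is only an illustration, not Weka's own CrossValidationFoldMaker: the class name `FoldSketch`, the method `makeFolds`, and the dataset size of 150 are assumptions made for the example, and the sketch neither shuffles nor stratifies the data.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; this is not Weka's CrossValidationFoldMaker.
class FoldSketch {

    // Deal the indices 0..n-1 into k folds round-robin.
    // Fold f serves as the test set in round f; all other folds form the training set.
    static List<List<Integer>> makeFolds(int n, int k) {
        List<List<Integer>> folds = new ArrayList<>();
        for (int f = 0; f < k; f++) {
            folds.add(new ArrayList<>());
        }
        for (int i = 0; i < n; i++) {
            folds.get(i % k).add(i);
        }
        return folds;
    }

    public static void main(String[] args) {
        int n = 150; // assumed dataset size, chosen only for this example
        int k = 10;  // ten fold cross validation
        List<List<Integer>> folds = makeFolds(n, k);
        for (int f = 0; f < k; f++) {
            int testSize = folds.get(f).size();
            int trainSize = n - testSize;
            System.out.println("Round " + (f + 1) + ": train=" + trainSize + ", test=" + testSize);
        }
    }
}
```

With 150 instances and 10 folds, each round trains on 135 instances and tests on 15, which is the 90%/10% split mentioned in the steps.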





To see the ROC Curves, right click on the ModelPerformanceChart and select show chart. The result will look like figure 7.
Figure 7
Figure 8
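Besides comparing the curves by eye, a common numeric summary is the area under each ROC Curve (AUC); a larger area generally indicates a better classifier. The following is a minimal sketch of the trapezoidal rule in plain Java. The class name `AucSketch` is hypothetical and the two curves are made-up points for two imaginary classifiers, not results from this tutorial.

```java
// Hypothetical helper for illustration; the curve data below is invented.
class AucSketch {

    // Trapezoidal area under a curve given as {FPR, TPR} points
    // sorted by increasing false positive rate.
    static double auc(double[][] points) {
        double area = 0.0;
        for (int i = 1; i < points.length; i++) {
            double width = points[i][0] - points[i - 1][0];
            double meanHeight = (points[i][1] + points[i - 1][1]) / 2.0;
            area += width * meanHeight;
        }
        return area;
    }

    public static void main(String[] args) {
        // Made-up ROC points for two hypothetical classifiers.
        double[][] curveA = {{0.0, 0.0}, {0.1, 0.7}, {0.3, 0.9}, {1.0, 1.0}};
        double[][] curveB = {{0.0, 0.0}, {0.2, 0.5}, {0.5, 0.8}, {1.0, 1.0}};
        System.out.printf("AUC A = %.3f%n", auc(curveA));
        System.out.printf("AUC B = %.3f%n", auc(curveB));
    }
}
```

For these invented points, curve A lies above curve B over the whole range, so it also has the larger area.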


Friday, January 24, 2014

How to plot ROC Curves in Weka?

A Receiver Operating Characteristic (ROC) curve shows the trade-off between true positives and false positives as the decision threshold changes. It is created by plotting the fraction of true positives out of the total actual positives (True Positive Rate) vs. the fraction of false positives out of the total actual negatives (False Positive Rate), at various threshold settings.
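To make the definition concrete, here is a minimal sketch in plain Java of how the points of such a curve can be computed from classifier scores. It is not Weka's implementation: the class name `RocSketch`, the method `rocPoints`, and the six scores and labels are assumptions made for the example, and the sketch assumes the data contains at least one positive and one negative instance.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; this is not Weka's ThresholdCurve.
class RocSketch {

    // Returns {FPR, TPR} pairs, one per distinct score threshold, starting at (0, 0).
    // scores[i] is the predicted score for the positive class, positive[i] the true label.
    static List<double[]> rocPoints(double[] scores, boolean[] positive) {
        int totalPos = 0;
        int totalNeg = 0;
        for (boolean p : positive) {
            if (p) totalPos++; else totalNeg++;
        }
        // Sort the instance indices by descending score.
        Integer[] order = new Integer[scores.length];
        for (int i = 0; i < order.length; i++) order[i] = i;
        Arrays.sort(order, (a, b) -> Double.compare(scores[b], scores[a]));

        List<double[]> points = new ArrayList<>();
        points.add(new double[] {0.0, 0.0});
        int tp = 0;
        int fp = 0;
        for (int i = 0; i < order.length; i++) {
            if (positive[order[i]]) tp++; else fp++;
            // Emit a point only after the last instance of a group of tied scores.
            boolean lastOfGroup = (i == order.length - 1)
                    || scores[order[i]] != scores[order[i + 1]];
            if (lastOfGroup) {
                points.add(new double[] {(double) fp / totalNeg, (double) tp / totalPos});
            }
        }
        return points;
    }

    public static void main(String[] args) {
        // Made-up scores and labels, chosen only for this example.
        double[] scores = {0.9, 0.8, 0.7, 0.6, 0.55, 0.4};
        boolean[] labels = {true, true, false, true, false, false};
        for (double[] p : rocPoints(scores, labels)) {
            System.out.printf("FPR=%.2f TPR=%.2f%n", p[0], p[1]);
        }
    }
}
```

Each threshold classifies every instance with a score at or above it as positive; lowering the threshold step by step moves the point from (0, 0) towards (1, 1).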
 
First run a classification as described in the classification tutorial, then right click on the entry in the result list.

 


Select Visualize threshold curve
 

The X axis will be the False Positive Rate and the Y axis will be the True Positive Rate.
To save the image, hold Alt + Shift and click the left mouse button.

Thursday, January 23, 2014

Classification using Weka - Weka Tutorial 1

This tutorial gives a step-by-step idea of how classification in data mining is done using the Weka tool.
  1. Click the Explorer button in the Weka GUI Chooser (Figure 1)
    Figure 1
  2. Under the Preprocess tab click open file (Figure 2)
    Figure 2
  3. Open an .arff file to do classification (Figure 3)
    Figure 3
  4. Click the classify tab and choose the classifier
    Figure 4
  5. Click the start button
    Figure 5
  6. The result will look like the following.
    Figure 6

Java Code to Copy eMail from One Folder to Another in Gmail

The following Java code copies emails from one folder to another folder in Gmail.

Steps

  1. In Gmail, click the manage labels link
  2. Create a new label; in the code below this label is the destination folder for the copied mail
  3. Put the name of the new label (e.g. PHISH) in the code, where the destination folder is opened
  4. Run the code











Libraries required: JavaMail

Source Code
import java.util.Properties;
import javax.mail.Folder;
import javax.mail.Message;
import javax.mail.Session;
import javax.mail.Store;

/**
 * Copies messages from the Gmail INBOX to the PHISH label.
 *
 * @author Sarju
 */
public class MoveMailToFolder {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("mail.store.protocol", "imaps");
        try {
            Session session = Session.getInstance(props, null);
            Store store = session.getStore();
            //eMail Authentication
            store.connect("imap.gmail.com", "username@gmail.com", "password");
            Folder inbox = store.getFolder("INBOX");//Source folder
            inbox.open(Folder.READ_WRITE);
            System.out.println("Opened source...");
            Folder spam = store.getFolder("PHISH"); // Destination folder
            spam.open(Folder.READ_WRITE);
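            // Copy the most recent messages: the range runs from
            // (total - unread) up to the newest message. Message numbers
            // are 1-based, so the start is clamped to 1.
            int total = inbox.getMessageCount();
            if (total > 0) {
                int start = Math.max(1, total - inbox.getUnreadMessageCount());
                Message[] msgs = inbox.getMessages(start, total);
                inbox.copyMessages(msgs, spam);
                System.out.println("Copied messages...");
            }
            spam.close(false);
            inbox.close(false);
            store.close();
        } catch (Exception mex) {
            // Report the failure instead of silently swallowing it
            mex.printStackTrace();
        }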
    }
}