Added section on training and testing sets

abatula · abatula · commit e0111dda5460 · 2015-02-16T18:32:32.000-05:00
diff --git a/Iris_DataSet.ipynb b/Iris_DataSet.ipynb
@@ -1,7 +1,7 @@
 {
  "metadata": {
   "name": "",
-  "signature": "sha256:72abac1d31697537195855ce452891fee38f41e1df32d4a77d25305918028631"
+  "signature": "sha256:dc45d7ea9684401e6148d4fba511d2fdc4a35075dcf957f971f8c4b9e3ca5c78"
  },
  "nbformat": 3,
  "nbformat_minor": 0,
@@ -210,6 +210,59 @@
      "metadata": {},
      "outputs": []
     },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "Training and Testing Sets\n",
+      "===\n",
+      "\n",
+      "To evaluate a classifier properly, we need to divide our dataset into training and testing sets, so that accuracy is measured on examples the classifier did not see during training.\n",
+      "\n",
+      "Training Set\n",
+      "---\n",
+      "A portion of the data, usually the majority, used to train a machine learning classifier. These are the examples the computer learns from in order to predict data labels.\n",
+      "\n",
+      "Testing Set\n",
+      "---\n",
+      "A portion of the data, usually smaller than the training set, used to test the accuracy of the machine learning classifier. The computer does not \"see\" this data while learning; once trained, it tries to predict the labels of these examples. We can then measure the accuracy of our method by counting how many examples it labeled correctly.\n",
+      "\n",
+      "Creating training and testing sets\n",
+      "---\n",
+      "Below, we create training and testing sets from the iris dataset using the [train_test_split()](http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html) function. Setting `test_size=0.4` holds out 40% of the examples for testing."
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [
+      "from sklearn import model_selection\n",
+      "\n",
+      "X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.4)\n",
+      "\n",
+      "print(X.shape)\n",
+      "print(X_train.shape)\n",
+      "print(X_test.shape)\n"
+     ],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    },
+    {
+     "cell_type": "markdown",
+     "metadata": {},
+     "source": [
+      "More information on different methods for creating training and testing sets is available on scikit-learn's [cross-validation](http://scikit-learn.org/stable/modules/cross_validation.html) page."
+     ]
+    },
+    {
+     "cell_type": "code",
+     "collapsed": false,
+     "input": [],
+     "language": "python",
+     "metadata": {},
+     "outputs": []
+    },
     {
      "cell_type": "code",
      "collapsed": false,