
Commit e0111dd

Added section on training and testing sets
1 parent 1eb67af commit e0111dd

1 file changed

Lines changed: 54 additions & 1 deletion

Iris_DataSet.ipynb

@@ -1,7 +1,7 @@
 {
 "metadata": {
 "name": "",
-"signature": "sha256:72abac1d31697537195855ce452891fee38f41e1df32d4a77d25305918028631"
+"signature": "sha256:dc45d7ea9684401e6148d4fba511d2fdc4a35075dcf957f971f8c4b9e3ca5c78"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
@@ -210,6 +210,59 @@
 "metadata": {},
 "outputs": []
 },
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"Training and Testing Sets\n",
+"===\n",
+"\n",
+"To evaluate a classifier fairly, we need to split our dataset into training and testing sets. Measuring accuracy on the same examples the classifier learned from would overstate how well it performs on new data.\n",
+"\n",
+"Training Set\n",
+"---\n",
+"A portion of the data, usually the majority, used to train a machine learning classifier. These are the labeled examples the classifier learns from in order to predict the labels of new data.\n",
+"\n",
+"Testing Set\n",
+"---\n",
+"A portion of the data, usually smaller than the training set, used to measure the accuracy of the machine learning classifier. The classifier does not \"see\" this data while learning; instead, it predicts a label for each test example. The accuracy of our method is the fraction of test examples whose predicted labels are correct.\n",
+"\n",
+"Creating training and testing sets\n",
+"---\n",
+"Below, we create training and testing sets from the Iris dataset using the [train_test_split()](http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html) function."
+]
+},
+{
+"cell_type": "code",
+"collapsed": false,
+"input": [
+"from sklearn.model_selection import train_test_split  # sklearn.cross_validation was removed in scikit-learn 0.20\n",
+"\n",
+"X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)  # hold out 40% for testing; fixed seed for reproducibility\n",
+"\n",
+"print(X.shape)  # full dataset\n",
+"print(X_train.shape)  # training set\n",
+"print(X_test.shape)  # testing set\n"
+],
+"language": "python",
+"metadata": {},
+"outputs": []
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"More information on other methods for creating training and testing sets, such as k-fold cross-validation, is available on scikit-learn's [cross-validation](http://scikit-learn.org/stable/modules/cross_validation.html) page."
+]
+},
+{
+"cell_type": "code",
+"collapsed": false,
+"input": [],
+"language": "python",
+"metadata": {},
+"outputs": []
+},
 {
 "cell_type": "code",
 "collapsed": false,
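The training/testing split and the accuracy measure described in the notebook's new markdown cell can be sketched without scikit-learn. This is a minimal, stdlib-only illustration under stated assumptions: `holdout_split`, `predict_1nn`, `accuracy` and `evaluate_holdout` are hypothetical helpers written for this example, a 1-nearest-neighbor rule stands in for whatever classifier the notebook trains, and samples are plain lists of numbers rather than NumPy arrays. Like `train_test_split` with a fractional `test_size`, it rounds the number of test samples up.

```python
import math
import random

# Hypothetical helpers for illustration only; these are not scikit-learn functions.


def holdout_split(X, y, test_size=0.4, seed=0):
    """Shuffle sample indices and hold out ceil(test_size * n) of them for testing."""
    n = len(X)
    n_test = math.ceil(test_size * n)  # round the test count up
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([X[i] for i in train_idx], [X[i] for i in test_idx],
            [y[i] for i in train_idx], [y[i] for i in test_idx])


def predict_1nn(X_train, y_train, x):
    """Stand-in classifier: return the label of the nearest training point."""
    nearest = min(range(len(X_train)),
                  key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)))
    return y_train[nearest]


def accuracy(y_true, y_pred):
    """Fraction of test examples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def evaluate_holdout(X, y, test_size=0.4, seed=0):
    """Learn from the training split only, then score on the unseen test split."""
    X_train, X_test, y_train, y_test = holdout_split(X, y, test_size, seed)
    y_pred = [predict_1nn(X_train, y_train, x) for x in X_test]
    return accuracy(y_test, y_pred)
```

Assuming `X` holds all 150 Iris samples, `test_size=0.4` holds out 60 samples and leaves 90 for training, the same counts the notebook's `train_test_split` call produces. Because `predict_1nn` only ever looks at the training split, the score from `evaluate_holdout` reflects performance on examples the classifier never saw.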

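A single holdout split can be lucky or unlucky, which is one motivation for the k-fold cross-validation that scikit-learn's cross-validation page describes: divide the data into k folds, hold out each fold once as the test set, and average the k scores. The sketch below is a stdlib-only illustration, not scikit-learn's `KFold` implementation; `kfold_indices` and `cross_validate` are hypothetical helpers, and `score_fn` is an assumed callback that learns from its first two arguments (training features and labels) and returns a score on its last two (test features and labels).

```python
import random

# Hypothetical helpers for illustration only; not scikit-learn's KFold.


def kfold_indices(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; every sample lands in exactly one test fold."""
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded so the folds are reproducible
    # Spread the remainder over the first n % k folds so sizes differ by at most one.
    fold_sizes = [n // k + (1 if f < n % k else 0) for f in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_validate(score_fn, X, y, k=5, seed=0):
    """Average score_fn over k train/test splits (score_fn is an assumed callback)."""
    scores = []
    for train_idx, test_idx in kfold_indices(len(X), k, seed):
        scores.append(score_fn([X[i] for i in train_idx], [y[i] for i in train_idx],
                               [X[i] for i in test_idx], [y[i] for i in test_idx]))
    return sum(scores) / len(scores)
```

With `k=5` and 150 samples, each fold holds out 30 samples and every sample is used for testing exactly once. In scikit-learn itself, `cross_val_score` combined with a `KFold` splitter fills this role.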