  • Use ! as an emotion
  • Porter stemming and lemmatizing are available in NLTK; they let you treat "messages", "message" and "messaging" as the same token
  • Remove words which occur in fewer than ten seasons, or some similar threshold.
  • Remove unpopular names
  • There are 3950 unique names! We should filter out unpopular individuals, say all those who have fewer than X lines across all seasons
  • grid search for svm
  • Neural networks, using appropriate activation functions
  • We should plot the data points somehow; this would be good for the paper too
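
A rough sketch of three of the filtering ideas above (counting "!" as an emotion feature, dropping speakers with few lines, dropping words seen in few seasons), using only the standard library. The `(season, speaker, line)` row layout, the toy data, the function names and the thresholds are all assumptions for illustration; the notes do not describe the real corpus format.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: (season, speaker, line) tuples.
rows = [
    (1, "Alice", "Hello there!"),
    (1, "Bob", "Messages keep coming."),
    (2, "Alice", "Stop messaging me!!"),
    (2, "Carol", "One message only."),
    (3, "Alice", "Fine."),
    (3, "Bob", "Hello again!"),
]

def filter_rare_speakers(rows, min_lines):
    """Keep only rows whose speaker has at least min_lines lines overall."""
    counts = Counter(speaker for _, speaker, _ in rows)
    return [row for row in rows if counts[row[1]] >= min_lines]

def words_in_enough_seasons(rows, min_seasons):
    """Return the words that occur in at least min_seasons distinct seasons."""
    seasons_by_word = defaultdict(set)
    for season, _, line in rows:
        for word in line.lower().split():
            # Strip surrounding punctuation so "hello" and "hello!" match.
            seasons_by_word[word.strip("!.,?")].add(season)
    return {w for w, s in seasons_by_word.items() if w and len(s) >= min_seasons}

def exclamation_count(line):
    """Number of '!' characters, usable as a crude emotion feature."""
    return line.count("!")

kept = filter_rare_speakers(rows, min_lines=2)
common_words = words_in_enough_seasons(rows, min_seasons=2)
```

On the toy data, `kept` drops Carol's single line and `common_words` contains only "hello". Without stemming, "messages", "message" and "messaging" still count as three separate words here, which is the gap the stemming bullet is meant to close.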

Best param clf__alpha -> 1e-05
Best param clf__n_iter -> 50
Best param clf__penalty -> elasticnet
Best param tfidf__norm -> l2
Best param tfidf__use_idf -> False
Best param vect__max_df -> 0.5
Best param vect__max_features -> 50000
Best param vect__ngram_range -> (1, 2)
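
To reuse these values without retyping them, the recorded output can be parsed back into a dictionary. This is only a sketch: `parse_best_params` is a hypothetical helper, and it assumes each entry has the form `Best param <name> -> <value>` on its own line.

```python
import ast

# The recorded grid-search output, one entry per line.
raw = """\
Best param clf__alpha -> 1e-05
Best param clf__n_iter -> 50
Best param clf__penalty -> elasticnet
Best param tfidf__norm -> l2
Best param tfidf__use_idf -> False
Best param vect__max_df -> 0.5
Best param vect__max_features -> 50000
Best param vect__ngram_range -> (1, 2)
"""

PREFIX = "Best param "

def parse_best_params(text):
    """Turn 'Best param name -> value' lines into a {name: value} dict."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith(PREFIX):
            continue
        name, _, value = line[len(PREFIX):].partition(" -> ")
        try:
            # Numbers, booleans and tuples become Python objects.
            params[name] = ast.literal_eval(value)
        except (ValueError, SyntaxError):
            # Bare words such as 'elasticnet' or 'l2' stay as strings.
            params[name] = value
    return params

best_params = parse_best_params(raw)
```

If the pipeline steps are named `vect`, `tfidf` and `clf`, as the `step__parameter` prefixes suggest, the dictionary could be applied with scikit-learn's `pipeline.set_params(**best_params)`. The notes do not name the estimator; if `clf` is an `SGDClassifier`, note that `n_iter` only exists in older scikit-learn releases and newer ones use `max_iter` instead.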