2 changes: 2 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
.bundle
bayesdata.yml
1 change: 1 addition & 0 deletions .ruby-version
@@ -0,0 +1 @@
2.1.3
10 changes: 10 additions & 0 deletions CHANGELOG.md
@@ -0,0 +1,10 @@
# Revisions
## 0.5.0
* Lots of refactoring, cleanup, making methods private, etc.
* Added tests
* Removed uid from train/untrain. I couldn't think of a good use case, and the logic didn't seem right: the system doesn't keep track of which call to train created a token, so untrain would blindly remove tokens.
* Changed BayesData to BayesPool since that seems more explanatory
* Moved some pool manipulation functions into BayesPool for better encapsulation
* Added to_json method
* Removed data_class from Bayes initializer since I couldn't think of a reason to make that configurable
* The corpus is now created in build cache instead of being maintained in parallel
674 changes: 674 additions & 0 deletions COPYING

Large diffs are not rendered by default.

674 changes: 674 additions & 0 deletions COPYING.LESSER

Large diffs are not rendered by default.

7 changes: 7 additions & 0 deletions Gemfile
@@ -0,0 +1,7 @@
source 'http://rubygems.org'
ruby '2.1.3'
gem 'stemmer'

group :test do
gem 'minitest'
end
12 changes: 12 additions & 0 deletions Gemfile.lock
@@ -0,0 +1,12 @@
GEM
remote: http://rubygems.org/
specs:
minitest (5.4.2)
stemmer (1.0.1)

PLATFORMS
ruby

DEPENDENCIES
minitest
stemmer
40 changes: 40 additions & 0 deletions README.md
@@ -0,0 +1,40 @@
# Introduction
This is a Naive Bayes classifier that can be used to categorize text based on trained "pools".
Training counts how often each word occurs in a pool, skipping any specified stop words.
The Bishop::Bayes#guess method tokenizes the message and then calculates, for each pool, the probability that the message belongs to that pool's classification.
For example, you could train the system with one pool of "spam" email and one pool of "non-spam" email. Then you could ask the guess method which pool each incoming message belongs to.
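The counting step described above can be sketched in plain Ruby. This is only an illustration of the idea, not the gem's implementation: `count_words`, the regular expression, and the sample texts are all hypothetical.

```ruby
# Hypothetical helper (not part of the gem): count how often each word
# appears in a text, skipping any stop words.
def count_words(text, stop_words = [])
  counts = Hash.new(0)
  text.downcase.scan(/[a-z0-9']+/).each do |word|
    next if stop_words.include?(word)
    counts[word] += 1
  end
  counts
end

# One word-count table per pool, keyed by pool name.
pools = {
  'spam'     => count_words('Buy cheap pills now, buy now', %w[now]),
  'non-spam' => count_words('Lunch at noon?')
}

puts pools['spam'].inspect
```

A classifier would then compare a new message's tokens against each pool's counts to estimate how likely the message is to belong to that pool.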

# Usage
1. Create a Bishop::Bayes object:

b = Bishop::Bayes.new

2. Train with multiple pools of text:

b.train('pool1')
b.train('pool2')
b.train('pool3')

3. Call the guess method with a message to categorize:

guesses = b.guess('This is a sentence')

The return value is a hash where the keys are pool names and the values are the probability
that the message belongs to that pool.
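Since the result is described as a hash of pool names to probabilities, choosing the most likely pool is a one-liner. The sketch below uses a made-up hash in that shape rather than real output from the library.

```ruby
# Hypothetical result, shaped like the hash described above
# (pool name => probability). The numbers are invented for illustration.
guesses = { 'pool1' => 0.12, 'pool2' => 0.81, 'pool3' => 0.07 }

# Pick the pool with the highest probability.
best_pool, best_probability = guesses.max_by { |_pool, probability| probability }

# Or rank all pools from most to least likely.
ranked = guesses.sort_by { |_pool, probability| -probability }.map(&:first)

puts "#{best_pool} (#{best_probability})"
puts ranked.inspect
```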

# Features
* Stop words may be specified

b.add_stop_words(an_array_words)
b.add_stop_word('word')

* You can include the default stop words list

b.load_default_stop_words

* You can choose between the default tokenizer, a stemming tokenizer, or a custom tokenizer

b = Bishop::Bayes.new
b = Bishop::Bayes.new(Bishop::StemmingTokenizer)
b = Bishop::Bayes.new(CustomTokenizer)
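A custom tokenizer might look like the sketch below. The interface is an assumption: this diff does not show what `Bishop::Bayes` expects from a tokenizer class, so the `tokenize(text, stop_words)` signature here is hypothetical and should be checked against the library source.

```ruby
# Hypothetical tokenizer. The method name and arguments are assumptions,
# not confirmed by the README or the diff.
class CustomTokenizer
  # Lowercase the text, split on anything that is not a letter or digit,
  # and drop empty strings and stop words.
  def tokenize(text, stop_words = [])
    text.downcase
        .split(/[^a-z0-9]+/)
        .reject { |token| token.empty? || stop_words.include?(token) }
  end
end

tokens = CustomTokenizer.new.tokenize('Hello, World! Hello again.', ['again'])
puts tokens.inspect
```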

30 changes: 30 additions & 0 deletions Rakefile
@@ -0,0 +1,30 @@
require 'rake/testtask'
require 'rdoc/task'

#desc "Default task: test"
task :default => [:test]

desc "Run Tests"
Rake::TestTask.new( :test ) do |t|
t.pattern = "test/test_*.rb"
t.verbose = true
end

RDoc::Task.new(:rdoc) do |rdoc|
rdoc.main = 'README.md'
rdoc.rdoc_files.include 'README.md', 'CHANGELOG.md', "lib/**/*\.rb"
rdoc.rdoc_dir = 'docs/rdoc'
rdoc.title = "Bayes::Bishop Documentation"
rdoc.options << '--line-numbers'
rdoc.options << '--fileboxes'
end

RDoc::Task.new(:rdoc => "rdoc_markdown",:clobber_rdoc => "clobber_rdoc_markdown", :rerdoc => "rerdoc_markdown") do |rdoc|
rdoc.main = 'README.md'
rdoc.rdoc_files.include 'README.md', 'CHANGELOG.md', "lib/**/*\.rb"
rdoc.rdoc_dir = 'docs/md'
rdoc.title = "Bayes::Bishop Documentation"
rdoc.markup = 'MARKUP'
rdoc.options << '--line-numbers'
rdoc.options << '--fileboxes'
end
20 changes: 10 additions & 10 deletions bishop.gemspec
@@ -2,21 +2,21 @@ require 'rubygems'

SPEC = Gem::Specification.new do |s|
s.name = "bishop"
s.version = "0.4.0"
s.author = "Matt Mower"
s.email = "self@mattmower.com"
s.homepage = "http://rubyforge.org/projects/bishop/"
s.version = "0.5.0"
s.author = "Richard Harrington"
s.email = "richard@maymount.com"
s.license = 'LGPL-3.0+'
s.homepage = "https://github.com/maymount/bishop"
s.platform = Gem::Platform::RUBY
s.summary = "Bayesian classification and ART-2 clustering library."

candidates = Dir.glob( "{bin,docs,lib,test}/**/*" )
s.summary = "Bayesian classification library. Refactoring of mmowers/bishop version."
s.description = "Bayesian classification library. Refactoring of mmowers/bishop version."
s.add_runtime_dependency 'stemmer'
candidates = Dir.glob( "{docs,lib,test}/**/*" )

s.files = candidates.delete_if do |item|
item.include?( "CVS" ) || item.include?( "rdoc" )
end
s.extra_rdoc_files = ['README.md','CHANGELOG.md','COPYING','COPYING.LESSER']
s.require_path = "lib"
# s.autorequire = "bishop"
s.has_rdoc = true

#s.add_dependency( "stemmer", ">= 1.0.1" )
end
202 changes: 202 additions & 0 deletions docs/md/Bishop.html
@@ -0,0 +1,202 @@
<!DOCTYPE html>

<html>
<head>
<meta charset="UTF-8">

<title>module Bishop - Bayes::Bishop Documentation</title>

<link href="./fonts.css" rel="stylesheet">
<link href="./rdoc.css" rel="stylesheet">

<script type="text/javascript">
var rdoc_rel_prefix = "./";
</script>

<script src="./js/jquery.js"></script>
<script src="./js/navigation.js"></script>
<script src="./js/search_index.js"></script>
<script src="./js/search.js"></script>
<script src="./js/searcher.js"></script>
<script src="./js/darkfish.js"></script>


<body id="top" role="document" class="module">
<nav role="navigation">
<div id="project-navigation">
<div id="home-section" role="region" title="Quick navigation" class="nav-section">
<h2>
<a href="./index.html" rel="home">Home</a>
</h2>

<div id="table-of-contents-navigation">
<a href="./table_of_contents.html#pages">Pages</a>
<a href="./table_of_contents.html#classes">Classes</a>
<a href="./table_of_contents.html#methods">Methods</a>
</div>
</div>

<div id="search-section" role="search" class="project-section initially-hidden">
<form action="#" method="get" accept-charset="utf-8">
<div id="search-field-wrapper">
<input id="search-field" role="combobox" aria-label="Search"
aria-autocomplete="list" aria-controls="search-results"
type="text" name="search" placeholder="Search" spellcheck="false"
title="Type to search, Up and Down to navigate, Enter to load">
</div>

<ul id="search-results" aria-label="Search Results"
aria-busy="false" aria-expanded="false"
aria-atomic="false" class="initially-hidden"></ul>
</form>
</div>

</div>



<div id="class-metadata">




<!-- Method Quickref -->
<div id="method-list-section" class="nav-section">
<h3>Methods</h3>

<ul class="link-list" role="directory">

<li ><a href="#method-c-robinson">::robinson</a>

<li ><a href="#method-c-robinson_fisher">::robinson_fisher</a>

</ul>
</div>

</div>
</nav>

<main role="main" aria-labelledby="module-Bishop">
<h1 id="module-Bishop" class="module">
module Bishop
</h1>

<section class="description">

</section>




<section id="5Buntitled-5D" class="documentation-section">









<section id="public-class-5Buntitled-5D-method-details" class="method-section">
<header>
<h3>Public Class Methods</h3>
</header>


<div id="method-c-robinson" class="method-detail ">

<div class="method-heading">
<span class="method-name">robinson</span><span
class="method-args">( probs, ignore )</span>

<span class="method-click-advice">click to toggle source</span>

</div>


<div class="method-description">

<p>Default “combiner”, set in initialize. The ignore argument is truly ignored.</p>




<div class="method-source-code" id="robinson-source">
<pre><span class="ruby-comment"># File lib/bayes/bishop.rb, line 419</span>
<span class="ruby-keyword">def</span> <span class="ruby-keyword">self</span>.<span class="ruby-identifier">robinson</span>( <span class="ruby-identifier">probs</span>, <span class="ruby-identifier">ignore</span> )
<span class="ruby-identifier">nth</span> = <span class="ruby-value">1.0</span><span class="ruby-operator">/</span><span class="ruby-identifier">probs</span>.<span class="ruby-identifier">length</span>
<span class="ruby-identifier">what_is_p</span> = <span class="ruby-value">1.0</span> <span class="ruby-operator">-</span> <span class="ruby-identifier">probs</span>.<span class="ruby-identifier">map</span> { <span class="ruby-operator">|</span><span class="ruby-identifier">p</span><span class="ruby-operator">|</span> <span class="ruby-value">1.0</span> <span class="ruby-operator">-</span> <span class="ruby-identifier">p</span>[<span class="ruby-value">1</span>] }.<span class="ruby-identifier">inject</span>( <span class="ruby-value">1.0</span> ) { <span class="ruby-operator">|</span><span class="ruby-identifier">s</span>,<span class="ruby-identifier">v</span><span class="ruby-operator">|</span> <span class="ruby-identifier">s</span> <span class="ruby-operator">*</span> <span class="ruby-identifier">v</span> } <span class="ruby-operator">**</span> <span class="ruby-identifier">nth</span>
<span class="ruby-identifier">what_is_q</span> = <span class="ruby-value">1.0</span> <span class="ruby-operator">-</span> <span class="ruby-identifier">probs</span>.<span class="ruby-identifier">map</span> { <span class="ruby-operator">|</span><span class="ruby-identifier">p</span><span class="ruby-operator">|</span> <span class="ruby-identifier">p</span>[<span class="ruby-value">1</span>] }.<span class="ruby-identifier">inject</span> { <span class="ruby-operator">|</span><span class="ruby-identifier">s</span>,<span class="ruby-identifier">v</span><span class="ruby-operator">|</span> <span class="ruby-identifier">s</span> <span class="ruby-operator">*</span> <span class="ruby-identifier">v</span> } <span class="ruby-operator">**</span> <span class="ruby-identifier">nth</span>
<span class="ruby-identifier">what_is_s</span> = ( <span class="ruby-identifier">what_is_p</span> <span class="ruby-operator">-</span> <span class="ruby-identifier">what_is_q</span> ) <span class="ruby-operator">/</span> ( <span class="ruby-identifier">what_is_p</span> <span class="ruby-operator">+</span> <span class="ruby-identifier">what_is_q</span> )
( <span class="ruby-value">1</span> <span class="ruby-operator">+</span> <span class="ruby-identifier">what_is_s</span> ) <span class="ruby-operator">/</span> <span class="ruby-value">2</span>
<span class="ruby-keyword">end</span></pre>
</div>

</div>




</div>


<div id="method-c-robinson_fisher" class="method-detail ">

<div class="method-heading">
<span class="method-name">robinson_fisher</span><span
class="method-args">( probs, ignore )</span>

<span class="method-click-advice">click to toggle source</span>

</div>


<div class="method-description">

<p>Alternative combiner</p>




<div class="method-source-code" id="robinson_fisher-source">
<pre><span class="ruby-comment"># File lib/bayes/bishop.rb, line 428</span>
<span class="ruby-keyword">def</span> <span class="ruby-keyword">self</span>.<span class="ruby-identifier">robinson_fisher</span>( <span class="ruby-identifier">probs</span>, <span class="ruby-identifier">ignore</span> )
<span class="ruby-identifier">n</span> = <span class="ruby-identifier">probs</span>.<span class="ruby-identifier">length</span>

<span class="ruby-keyword">begin</span>
<span class="ruby-identifier">h</span> = <span class="ruby-identifier">chi2p</span>( <span class="ruby-value">-2.0</span> <span class="ruby-operator">*</span> <span class="ruby-constant">Math</span>.<span class="ruby-identifier">log</span>( <span class="ruby-identifier">probs</span>.<span class="ruby-identifier">map</span> { <span class="ruby-operator">|</span><span class="ruby-identifier">p</span><span class="ruby-operator">|</span> <span class="ruby-identifier">p</span>[<span class="ruby-value">1</span>] }.<span class="ruby-identifier">inject</span>( <span class="ruby-value">1.0</span> ) { <span class="ruby-operator">|</span><span class="ruby-identifier">s</span>,<span class="ruby-identifier">v</span><span class="ruby-operator">|</span> <span class="ruby-identifier">s</span><span class="ruby-operator">*</span><span class="ruby-identifier">v</span> } ), <span class="ruby-value">2</span><span class="ruby-operator">*</span><span class="ruby-identifier">n</span> )
<span class="ruby-keyword">rescue</span>
<span class="ruby-identifier">h</span> = <span class="ruby-value">0.0</span>
<span class="ruby-keyword">end</span>

<span class="ruby-keyword">begin</span>
<span class="ruby-identifier">s</span> = <span class="ruby-identifier">chi2p</span>( <span class="ruby-value">-2.0</span> <span class="ruby-operator">*</span> <span class="ruby-constant">Math</span>.<span class="ruby-identifier">log</span>( <span class="ruby-identifier">probs</span>.<span class="ruby-identifier">map</span> { <span class="ruby-operator">|</span><span class="ruby-identifier">p</span><span class="ruby-operator">|</span> <span class="ruby-value">1.0</span> <span class="ruby-operator">-</span> <span class="ruby-identifier">p</span>[<span class="ruby-value">1</span>] }.<span class="ruby-identifier">inject</span>( <span class="ruby-value">1.0</span> ) { <span class="ruby-operator">|</span><span class="ruby-identifier">s</span>,<span class="ruby-identifier">v</span><span class="ruby-operator">|</span> <span class="ruby-identifier">s</span><span class="ruby-operator">*</span><span class="ruby-identifier">v</span> } ), <span class="ruby-value">2</span><span class="ruby-operator">*</span><span class="ruby-identifier">n</span> )
<span class="ruby-keyword">rescue</span>
<span class="ruby-identifier">s</span> = <span class="ruby-value">0.0</span>
<span class="ruby-keyword">end</span>

( <span class="ruby-value">1</span> <span class="ruby-operator">+</span> <span class="ruby-identifier">h</span> <span class="ruby-operator">-</span> <span class="ruby-identifier">s</span> ) <span class="ruby-operator">/</span> <span class="ruby-value">2</span>
<span class="ruby-keyword">end</span></pre>
</div>

</div>




</div>


</section>

</section>
</main>


<footer id="validator-badges" role="contentinfo">
<p><a href="http://validator.w3.org/check/referer">Validate</a>
<p>Generated by <a href="http://rdoc.rubyforge.org">RDoc</a> 4.1.0.
<p>Based on <a href="http://deveiate.org/projects/Darkfish-Rdoc/">Darkfish</a> by <a href="http://deveiate.org">Michael Granger</a>.
</footer>
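The arithmetic in the documented `robinson` combiner can be checked with a small standalone sketch. It assumes, as the `p[1]` indexing in the source suggests, that `probs` is a list of `[token, probability]` pairs. Because `**` binds tighter than subtraction in Ruby, each term reads as one minus the geometric mean of a product. `robinson_sketch` is a hypothetical name and drops the unused `ignore` argument.

```ruby
# Standalone restatement of the robinson combiner shown in the generated docs.
# probs is assumed to be an array of [token, probability] pairs.
def robinson_sketch(probs)
  nth = 1.0 / probs.length
  values = probs.map { |_token, prob| prob }

  # P = 1 - (product of (1 - p)) ** (1/n)
  p_score = 1.0 - (values.map { |v| 1.0 - v }.inject(1.0) { |acc, v| acc * v }**nth)
  # Q = 1 - (product of p) ** (1/n)
  q_score = 1.0 - (values.inject(1.0) { |acc, v| acc * v }**nth)

  # S is in [-1, 1]; rescale it to [0, 1].
  s_score = (p_score - q_score) / (p_score + q_score)
  (1 + s_score) / 2
end

neutral = robinson_sketch([['a', 0.5], ['b', 0.5]])
strong  = robinson_sketch([['a', 0.9], ['b', 0.9]])
puts neutral
puts strong
```

With every token probability at 0.5 the P and Q terms are equal and the combined score is 0.5. With two tokens at 0.9 the score comes out at about 0.9.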
