
add binomial entropy and kl #149

Open
alicanb wants to merge 2 commits into master from binom-kl

Conversation

Collaborator

@alicanb alicanb commented Jun 26, 2018

This is a larger PR than I intended but basically it adds binomial entropy and binomial-poisson and binomial-geometric KL with some helper functions:

  • binomial._log1pmprobs: I used this a lot, so I made it a separate function. It computes
    (-probs).log1p() safely.
  • binomial._Elnchoosek(): for x ~ Bin(n, p), this computes E[log(nchoosek)], E[log(n!)], E[log(x!)], and E[log((n-x)!)]

@alicanb alicanb requested review from fritzo and vishwakftw June 26, 2018 07:47
Comment thread torch/distributions/binomial.py Outdated
def param_shape(self):
return self._param.size()

def _log1pmprobs(self):

Since it is a function for internal use, I think this can be moved to the top, like in MVN. Something like:

def _log1pmtensor(tensor):
    # Do the same thing

Uses of the function in kl.py can be done via importing this function along with Binomial.

Comment thread torch/distributions/binomial.py Outdated
values = values.expand((-1,) + self._batch_shape)
return values

def _Elnchoosek(self):

Same idea here.

Comment thread torch/distributions/binomial.py Outdated
s = self.enumerate_support()
s[0] = 1 # 0! = 1
# x is factorial matrix i.e. x[k,...] = k!
x = torch.cumsum(s.log(), dim=0)

x is the log of factorial matrix right?
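Yes; the cumulative-sum trick in the quoted snippet can be checked in isolation (a plain `arange` stands in for `enumerate_support()` here):

```python
import torch

n = 5
s = torch.arange(n + 1, dtype=torch.float)  # support 0..n
s[0] = 1  # 0! = 1, so its log contribution must be 0
log_fact = torch.cumsum(s.log(), dim=0)     # log_fact[k] == log(k!)

# lgamma(k + 1) == log(k!) gives an independent reference
expected = torch.lgamma(torch.arange(n + 1, dtype=torch.float) + 1)
assert torch.allclose(log_fact, expected)
```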

Comment thread torch/distributions/binomial.py Outdated
indices[0] = torch.arange(x.size(0) - 1, -1, -1,
dtype=torch.long, device=x.device)
# x[tuple(indices)] is x reversed on first axis
lnchoosek = x[-1] - x - x[tuple(indices)]

I think x.flip(dim=0) will exhibit same behaviour.

Collaborator Author

Weird, I tried using flip before and it didn't work; maybe I messed up the arguments...
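For what it's worth, the two reversal idioms do agree; note that `flip` takes a `dims` argument, so `x.flip(0)` or `x.flip(dims=(0,))` works while `x.flip(dim=0)` raises a TypeError, which may explain the earlier failure. A quick check:

```python
import torch

x = torch.arange(12, dtype=torch.float).reshape(4, 3)
idx = torch.arange(x.size(0) - 1, -1, -1, dtype=torch.long, device=x.device)

# advanced-indexing reversal (as in the PR) vs. flip on the first axis
assert torch.equal(x[idx], x.flip(0))
```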

Comment thread torch/distributions/binomial.py Outdated
elognfac = x[-1]
elogkfac = ((lnchoosek + s * self.logits + self.total_count * self._log1pmprobs()).exp() *
x).sum(dim=0)
elognmkfac = ((lnchoosek + s * self.logits + self.total_count * self._log1pmprobs()).exp() *

E[log(n-k)!] = E[log k!] but for Bin(n, (1 - p)). Can we use this fact here?
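The identity behind this suggestion can be checked numerically; this sketch uses the public `Binomial` distribution only to evaluate the pmf:

```python
import torch
from torch.distributions import Binomial

def expected_log_fact(n, p, of_complement=False):
    # E[log X!] (or E[log (n - X)!]) for X ~ Bin(n, p), by direct summation
    k = torch.arange(n + 1, dtype=torch.float)
    pmf = Binomial(n, probs=torch.tensor(p)).log_prob(k).exp()
    arg = (n - k) if of_complement else k
    return (pmf * torch.lgamma(arg + 1)).sum()

# If X ~ Bin(n, p), then n - X ~ Bin(n, 1 - p), so the two agree:
n, p = 10, 0.3
assert torch.allclose(expected_log_fact(n, p, of_complement=True),
                      expected_log_fact(n, 1 - p))
```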

Comment thread torch/distributions/kl.py Outdated
return kl


@register_kl(Binomial, Poisson)

Heterogeneous combinations are placed below; this section is for homogeneous combinations.

Comment thread torch/distributions/kl.py Outdated
q.rate)


@register_kl(Binomial, Geometric)

Same as above comment.

Comment thread torch/distributions/kl.py Outdated
return -p.entropy() - torch.log1p(-q.probs) / p.probs - q.logits


@register_kl(Geometric, Binomial)

Same as above comment.


@vishwakftw vishwakftw left a comment


Some comments have been given. Please check them.

Could you check whether the KL test passes with a lower tolerance, and how much time it takes at the default tolerance setting?


@fritzo fritzo left a comment


Thanks for adding these!


alicanb commented Jun 26, 2018

@vishwakftw thanks for the comments! One thing I want us to work out before wrapping this up is an approximation to E[log k!] for large n. I tried Stirling's approximation but couldn't come up with a closed form. Any ideas?


vishwakftw commented Jun 26, 2018

I think we have to make use of Stirling's inequality and a Taylor series to compute this. I guess the reason you are unable to come up with a closed form is the log k term.

I tried using them, and got about 0.5% relative error.

[image attachment]

This might help, together with the expansion log k! <= 1 + k log k + 0.5 log k - k.

(source: Wikipedia, Taylor expansions for the moments of functions of random variables: https://en.wikipedia.org/wiki/Taylor_expansions_for_the_moments_of_functions_of_random_variables)
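As a sanity check on the quoted bound (pure Python, using `math.lgamma` for the exact value of log k!):

```python
import math

# Stirling-type upper bound: log k! <= 1 + (k + 1/2) * log k - k, for k >= 1
# (equality holds at k = 1)
for k in [1, 2, 5, 10, 50]:
    exact = math.lgamma(k + 1)                   # log k!
    bound = 1 + (k + 0.5) * math.log(k) - k
    assert exact <= bound + 1e-12
```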


@vishwakftw vishwakftw left a comment


Looks good to me!! @fritzo what do you think?

@vishwakftw

Also, are you going to try the large n approximation using Stirling and Taylor expansions? @alicanb


alicanb commented Jun 27, 2018

@vishwakftw btw, I tried it with 0.01 precision as well. Two things on my wishlist:

  • large n approximation for _Elnchoosek
  • KL(Bin(N,p)|Bin(M,p)) where M > N. Although we can compute this (expensively), making it work for batches is hard... Maybe it isn't worth the effort.

@vishwakftw

@alicanb I have a closed form solution for E[log x!], E[log (n - x)!] and E[log n!] (this is simply log n!) for large n.


alicanb commented Jun 27, 2018

Great, have you experimented with any large n? n=30 doesn't seem large enough for KL(Bin|Geom) for me with 0.1 precision.

@vishwakftw

This is the gist for the approximations.

I ran some tests: n = {10, 20, 50, 75, 100} and p = {0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9}
Max relative error: 0.198 (n = 10, p = 0.9) and min relative error: 0.00025 (n = 100, p = 0.1). This is for E[log(n - x)!]


alicanb commented Jun 27, 2018

btw, lgamma(n * (1-p) + 1) + 0.5 * polygamma(1, n * (1-p) + 1) * n * p * (1-p) is a pretty good approximation even for small n, but it's not differentiable: backpropagating through polygamma(1, x) needs polygamma(2, x), which we don't have...
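A quick numerical check of this approximation (it is the second-order delta method, E[f(X)] ≈ f(E[X]) + 0.5 * f''(E[X]) * Var(X), applied to f(k) = lgamma(k + 1); the n and p values here are illustrative):

```python
import torch
from torch.distributions import Binomial

def exact_e_log_nmk_fact(n, p):
    # E[log (n - X)!] for X ~ Bin(n, p), by direct summation over the support
    k = torch.arange(n + 1, dtype=torch.float)
    pmf = Binomial(n, probs=torch.tensor(p)).log_prob(k).exp()
    return (pmf * torch.lgamma(n - k + 1)).sum()

def approx_e_log_nmk_fact(n, p):
    # the approximation quoted above: expand around the mean n * (1 - p)
    m = torch.tensor(n * (1 - p))
    return torch.lgamma(m + 1) + 0.5 * torch.polygamma(1, m + 1) * n * p * (1 - p)

n, p = 20, 0.3
exact, approx = exact_e_log_nmk_fact(n, p), approx_e_log_nmk_fact(n, p)
assert (exact - approx).abs() / exact.abs() < 0.01  # well under 1% here
```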
