Hybrid sequential-MST parser

Implement a "hybrid" parser that blends sequential information and mutual information (MI), with the extent of blending configurable: the "maximum sequential" mode produces a "sequential parse", and the "maximum MI" mode produces "plain MST-Parses with no account for distance".

There are two perspectives:
A) As Ben Goertzel has suggested ("use *both* the sequential parse *and* some fancier hierarchical parse as inputs to clustering and grammar learning? I.e. don't throw out the information of simple before-and-after co-occurrence, but augment it with information from the statistically inferred dependency parse tree"), we could (I guess) simply implement this in the existing MST-Parser, given the changes that @glicerico and Claudia made a year ago. This could be tried with a "distance_vs_MI" blending parameter in the MST-Parser code, which accounts for word-to-word distance. With distance_vs_MI=1.0 we would get "sequential parses" and with distance_vs_MI=0.0 we would get "Pure MST-Parses"; in between, for example, distance_vs_MI=0.7 would provide "English parses" and distance_vs_MI=0.5 would provide "Russian parses".
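
A minimal sketch of how approach A) could behave, assuming a linear blend of MI with a proximity term of `1 / distance`. The names `blended_score` and `mst_parse` are hypothetical and not taken from the MST-Parser code, and the no-crossing-links constraint that a real parser may apply is left out for brevity.

```python
def blended_score(mi, distance, distance_vs_mi):
    """Linear blend (an assumption): 0.0 uses MI only, 1.0 uses proximity only."""
    proximity = 1.0 / distance  # adjacent words get the highest proximity
    return (1.0 - distance_vs_mi) * mi + distance_vs_mi * proximity


def mst_parse(n_words, mi, distance_vs_mi):
    """Maximum spanning tree over word positions, built with Prim's algorithm.

    `mi` maps (i, j) with i < j to an MI value; missing pairs count as 0.0.
    Returns a sorted list of undirected links (i, j) with i < j.
    """
    in_tree = {0}
    links = []
    while len(in_tree) < n_words:
        best = None  # (score, left, right, newly attached word)
        for i in in_tree:
            for j in range(n_words):
                if j in in_tree:
                    continue
                a, b = min(i, j), max(i, j)
                score = blended_score(mi.get((a, b), 0.0), b - a, distance_vs_mi)
                if best is None or score > best[0]:
                    best = (score, a, b, j)
        links.append((best[1], best[2]))
        in_tree.add(best[3])
    return sorted(links)


# Toy MI table for a four-word sentence (made-up values, for illustration only).
toy_mi = {(0, 1): 1.0, (1, 2): 0.5, (2, 3): 0.2, (0, 3): 5.0, (1, 3): 2.0}

sequential = mst_parse(4, toy_mi, distance_vs_mi=1.0)  # links only adjacent words
pure_mst = mst_parse(4, toy_mi, distance_vs_mi=0.0)    # follows MI alone
```

With these toy values the two extremes give different trees: the sequential chain at 1.0, and a tree containing the long-distance, high-MI link (0, 3) at 0.0.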

B) As Ben Goertzel further wrote:

> I don't think we want an arithmetic average of distance and MI, maybe more like
> 
> f(1) = C > 1
> f(1) > f(2) > f(3) > f(4)
> f(4) = f(5) = ... = 1
> 
> and then
> 
> f(distance) * MI
> 
> i.e. maybe we count the MI significantly more if the distance is
> small... but if MI is large and distance is large, we still count the
> MI a lot...
> 
> (of course the decreasing function f becomes the thing to tune here...)
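
One possible shape for the decreasing function `f` in approach B) is sketched here, assuming a linear decay from `C` at distance 1 down to 1 at distance 4. The names `distance_weight`, `weighted_mi`, `c` and `cutoff` are hypothetical, and the linear form is only one of many functions satisfying the constraints in the quote.

```python
def distance_weight(distance, c=2.0, cutoff=4):
    """f(1) = c > 1, strictly decreasing until `cutoff`, then constant 1.0.

    The linear decay is an assumption; any decreasing function with the
    same endpoints would satisfy the constraints equally well.
    """
    if c <= 1.0:
        raise ValueError("c must be greater than 1")
    if distance >= cutoff:
        return 1.0
    return 1.0 + (c - 1.0) * (cutoff - distance) / (cutoff - 1)


def weighted_mi(mi, distance, c=2.0, cutoff=4):
    """Link score f(distance) * MI, used in place of raw MI when building the tree."""
    return distance_weight(distance, c, cutoff) * mi


weights = [distance_weight(d) for d in range(1, 7)]
```

Here `c` and `cutoff` are the values to tune: a larger `c` favours adjacent links more strongly, while a link with large MI at a large distance still keeps its full MI.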

The task can be broken down into subtasks:
1) Implement configurable blending of sequential and MI information using approach A), approach B), or a combination of the two.
2) Implement a unit test ensuring that the parser can produce sequential, MST, or hybrid parses on a small corpus such as POC-English or POC-Turtle.
3) Study the F1 of the parses on the Gutenberg Children corpus and see whether any configuration outperforms "sequential parses".
4) Extend study 3) using both traditional MI and DNN-MI.
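
For subtasks 2) and 3), the comparison against the sequential baseline could be sketched as follows, assuming parses are represented as sets of undirected links between word positions. The helpers `sequential_links` and `link_f1` are hypothetical; the project's actual evaluation code may treat details such as wall links or punctuation differently.

```python
def sequential_links(n_words):
    """Baseline 'sequential parse': each word is linked to its right neighbour."""
    return {(i, i + 1) for i in range(n_words - 1)}


def link_f1(predicted, reference):
    """F1 of predicted links against reference links (both sets of (i, j), i < j)."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2 * precision * recall / (precision + recall)


# Made-up reference parse for a four-word sentence, for illustration only.
reference = {(0, 1), (1, 3), (2, 3)}
baseline_f1 = link_f1(sequential_links(4), reference)
```

A unit test could then assert that the "maximum sequential" configuration returns exactly `sequential_links(n)` for every sentence in the small corpus, and the F1 study could compare each blending configuration against `baseline_f1` averaged over the corpus.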
 
