Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
This thesis focuses on learning natural language interfaces using synchronous
grammars, λ-calculus and statistical modeling of parse probabilities. A major
focus of the thesis has been to replicate Mooney and Wong's λ-WASP algorithm
and implement it inside the C-PHRASE Natural Language Interface
(NLI) system. Doing so lets us use C-PHRASE's more expressive and transportable
meaning representation language (MRL) instead of the PROLOG-based
MRL used by Mooney and Wong.
Our system, the C-PHRASE LEARNER, relaxes some constraints in λ-WASP
to allow the use of more flexible MRL grammars. We also reformulate the algorithm
in terms of operations on trees to clarify and simplify the approach. We test the
C-PHRASE LEARNER on the US geography corpus GEOQUERY and obtain
precision and recall slightly below those achieved by λ-WASP. This was
expected, since our more expressive and portable MRL grammar imposes fewer
domain restrictions.
Our work on the C-PHRASE LEARNER has also revealed several promising
avenues for future research, including alternative statistical alignment
strategies, integrating linguistic theories into our learning algorithm, and
improving named entity recognition. The C-PHRASE LEARNER is released
as open source so that anyone in the community can build on this work.