Research
I'm a computational linguist with interests in
natural language processing, formal language and automata theory, artificial
neural networks, dynamical systems, and multivariate statistical analysis of
linguistic corpora. From an earlier life, I also retain an interest and some
research activity in the development and cultural role of literacy, and in
early Germanic and Celtic languages and history.
Publication list
Current work
- Implementation of natural language understanding systems using dynamic attractor sequences
I have been developing a strictly sequential natural language
understanding architecture that dispenses with two foundational
principles of generative linguistics, mainstream cognitive science,
and much of artificial intelligence: that natural language strings
have complex syntactic structure processed by structure-sensitive
algorithms, and that this structure is crucial in determining string
semantics. This sequential architecture was originally stated in
terms of standard automata theory as a system of cooperating finite
state automata, but more recently I have become interested in
neuroscientific work which identifies chaotic attractor trajectories
in state space as the fundamental principle of brain function at a
level above that of the individual neuron, and which indicates that
sensory processing, and perhaps higher cognition more generally, are
implemented by cooperating attractor sequence processes. Some
relevant publications are:
Moisl, H. (1992) 'Connectionist finite state natural language processing', Connection Science 4, 67-91.
Moisl, H. (1997) 'Recurrent neural networks and natural language processing', in D. Jones & H. Somers, eds. New Methods in Language Processing. London: UCL Press, 69-82.
Moisl, H. (2000) Handbook of Natural Language Processing, Marcel Dekker (with R. Dale of Macquarie University and H. Somers of UMIST).
Moisl, H. (2001) 'Artificial neural networks and natural language processing', in M. Drake, ed. Encyclopedia of Library and Information Science. Marcel Dekker (in press).
Moisl, H. (2001) 'Linguistic computation with state space trajectories', in S. Wermter, J. Austin & D. Willshaw, eds. Emergent Neural Computational Architectures based on Neuroscience. Springer, 2001.
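To make the idea of strictly sequential processing by cooperating finite state automata more concrete, the Python sketch below wires two automata into the simplest kind of cooperation, a cascade: a lexical stage (a word-to-category lookup, equivalent to a one-state transducer) hands each category, as soon as it is read, to a sentence automaton. No syntactic tree is ever built; the only record of the input is the sequence of states visited. The lexicon, category names, state names and the `Det N V (Det N)` pattern are hypothetical toy choices, not the architecture described in the publications above, and the attractor-based version is not modelled at all.

```python
# A toy cascade of two cooperating finite state automata (hypothetical example).
from dataclasses import dataclass


@dataclass
class FSA:
    """Deterministic finite state automaton with a partial transition table."""
    transitions: dict  # maps (state, symbol) -> next state
    start: str
    accepting: frozenset

    def step(self, state, symbol):
        # None signals that no transition is defined, i.e. the input is rejected.
        return self.transitions.get((state, symbol))


# Stage 1 (hypothetical lexicon): word -> category, a one-state transducer.
LEXICON = {"the": "Det", "a": "Det", "dog": "N", "cat": "N",
           "sees": "V", "sleeps": "V"}

# Stage 2 (hypothetical grammar): accepts the category pattern Det N V (Det N)?
SENTENCE = FSA(
    transitions={
        ("q0", "Det"): "q1",
        ("q1", "N"): "q2",
        ("q2", "V"): "q3",
        ("q3", "Det"): "q4",
        ("q4", "N"): "q5",
    },
    start="q0",
    accepting=frozenset({"q3", "q5"}),
)


def process(words):
    """Read words strictly left to right, passing each category to the
    sentence automaton immediately. Returns (accepted, state trajectory)."""
    state = SENTENCE.start
    trajectory = [state]
    for word in words:
        category = LEXICON.get(word)
        if category is None:          # unknown word: lexical stage fails
            return False, trajectory
        state = SENTENCE.step(state, category)
        if state is None:             # no transition: sentence stage fails
            return False, trajectory
        trajectory.append(state)
    return state in SENTENCE.accepting, trajectory


for sentence in ["the dog sees a cat", "the cat sleeps", "the dog", "sees the dog"]:
    print(sentence, "->", process(sentence.split()))
```

A cascade is only one way for automata to cooperate; they could also exchange state in both directions, which this sketch does not attempt.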
- Natural language corpus creation
Together with Karen Corrigan of Newcastle University and Joan Beal of
Sheffield University, I have recently completed the Newcastle Electronic
Corpus of Tyneside English (NECTE), a corpus of dialect speech from
Tyneside in North-East England. It is based on two pre-existing corpora,
one collected in the late 1960s by the Tyneside Linguistic Survey (TLS)
project, and the other in 1994 by the Phonological Variation and Change
in Contemporary Spoken English (PVC) project. NECTE amalgamates the TLS
and PVC materials into a single Text Encoding Initiative (TEI)-conformant
XML-encoded corpus and makes them available in a variety of aligned
formats: digitized audio, standard orthographic transcription, phonetic
transcription, and part-of-speech-tagged text. The NECTE website
describes the corpus in detail and makes it available to academic
researchers, educationalists, the media in non-commercial applications,
organisations such as language societies, and individuals with a
serious interest in historical dialect materials.
For further information, go to the project website.
- Exploratory multivariate analysis of text corpora
Since completing the NECTE project I have been developing a methodology
for sociolinguistic and dialectological study of the corpus, which aims
to identify interesting regularities in phonetic variation among
informants in the corpus, and any correlations between such variation
and associated social factors. The methodology is based on the one
formulated by the originators of much of the NECTE corpus, the Tyneside
Linguistic Survey (TLS). It was radical at the time and remains so
today. The then-universal and still-dominant approach is theory-driven:
social and linguistic factors are selected by the analyst on the basis
of some combination of an independently specified theoretical framework,
existing case studies, and personal experience of the domain of enquiry.
The TLS instead proposed a fundamentally empirical approach in which
salient factors are extracted from the data itself and then serve as the
basis for model construction. To implement this approach the TLS used a
particular exploratory multivariate analytical technique, hierarchical
cluster analysis, but its work never progressed beyond preliminary
studies, for a variety of theoretical and practical reasons. My
development of the TLS methodology
i. uses a range of linear and nonlinear exploratory analytical methods
in addition to hierarchical cluster analysis, such as multidimensional
scaling and self-organizing maps, and
ii. pays particular attention to issues in data creation that are
crucial to the validity of analytical results: document length
normalization, dimensionality reduction, and data nonlinearity.
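Two of the steps mentioned above, document length normalization followed by hierarchical cluster analysis, can be illustrated with a small Python sketch. It assumes speaker-by-variant frequency counts held in a plain list of rows, uses one common normalization scheme (scaling each row by the ratio of the mean document length to that row's length), and chooses Euclidean distance with average linkage. These choices, the function names and the toy counts are illustrative assumptions, not the NECTE data or the exact procedure used in the publications below.

```python
# Document length normalization + average-linkage hierarchical clustering.
# Toy, hypothetical data; not the NECTE corpus or the published procedure.
from itertools import combinations
from math import dist


def normalize_lengths(rows):
    """Scale each row of counts by mean_length / row_length, so that a
    speaker with a longer transcription does not look different merely
    because all of their counts are larger. Assumes no row sums to zero."""
    lengths = [sum(row) for row in rows]
    mean_length = sum(lengths) / len(lengths)
    return [[c * mean_length / n for c in row] for row, n in zip(rows, lengths)]


def average_linkage(rows):
    """Agglomerative clustering: repeatedly merge the two clusters with the
    smallest mean pairwise Euclidean distance. Returns the merge history as
    (cluster_a, cluster_b, distance), clusters being frozensets of row indices."""
    clusters = [frozenset([i]) for i in range(len(rows))]
    merges = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(clusters, 2):
            d = sum(dist(rows[i], rows[j]) for i in a for j in b) / (len(a) * len(b))
            if best is None or d < best[2]:
                best = (a, b, d)
        a, b, d = best
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((a, b, d))
    return merges


# Hypothetical speaker-by-variant counts (rows: speakers, columns: variants).
counts = [
    [6, 1, 3],
    [60, 10, 30],  # same proportions as speaker 0, ten times as long
    [1, 6, 3],
    [2, 12, 6],    # same proportions as speaker 2, twice as long
]

raw = average_linkage(counts)
normed = average_linkage(normalize_lengths(counts))
for label, history in (("raw", raw), ("normalized", normed)):
    print(label)
    for a, b, d in history:
        print("  merge", sorted(a), sorted(b), "at", round(d, 2))
```

With the raw counts, speaker 0 is merged with speakers 2 and 3 before speaker 1, because speaker 1's much longer transcription inflates every distance involving it; after normalization, speakers 0 and 1 (and likewise 2 and 3), which have identical proportions, merge at distance zero. For real data a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` would replace the hand-written loop.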
Relevant publications are:
Moisl, H. and Beal, J.C. (2001) 'Corpus Analysis and Results: Visualization Using Self-Organizing Maps', Corpus Linguistics 2001, Lancaster University, 386-391. Electronic Publication.
Moisl, H. and Jones, V. (2005) 'Cluster analysis of the Newcastle Electronic Corpus of Tyneside English: a comparison of methods', Centre for Telematics and Information Technology (CTIT), TR 2005/65 {A-53328}, University of Twente.
Moisl, H. and Jones, V. (2005) 'Cluster analysis of the Newcastle Electronic Corpus of Tyneside English: a comparison of methods', Literary and Linguistic Computing 20, 125-46. [Online journal version] [Preprint]
Moisl, H., Maguire, W. and Allen, W. (2006) 'Phonetic variation in Tyneside: exploratory multivariate analysis of the Newcastle Electronic Corpus of Tyneside English', in F. Hinskens, ed. Language Variation. European Perspectives. Amsterdam: Meertens Institute. [Preprint]
Allen, W.H., Beal, J.C., Corrigan, K.P., Maguire, W. and Moisl, H.L. (2007) 'A Linguistic "Time-Capsule": The Newcastle Electronic Corpus of Tyneside English', in J.C. Beal, K.P. Corrigan & H.L. Moisl, eds. Creating and Digitising Language Corpora, Vol. 2: Diachronic Databases. Houndmills: Palgrave Macmillan, 16-48. [Preprint]
Moisl, H. (2007) 'Data nonlinearity in exploratory multivariate analysis of language corpora', in J. Nerbonne, M. Ellison & G. Kondrak, eds. Computing and Historical Phonology. Proceedings of the Ninth Meeting of the ACL Special Interest Group in Computational Morphology and Phonology, June 28 2007. Association for Computational Linguistics, 93-100. [Online publication]
Moisl, H. (2008) 'Exploratory Multivariate Analysis', in A. Lüdeling & M. Kytö, eds. Corpus Linguistics. An International Handbook (Series: Handbücher zur Sprache und Kommunikationswissenschaft/Handbooks of Linguistics and Communication Science). Berlin: Mouton de Gruyter. [Preprint]
Moisl, H. and Maguire, W. (2008) 'Identifying the Main Determinants of Phonetic Variation in the Newcastle Electronic Corpus of Tyneside English', Journal of Quantitative Linguistics 15, 46-69. [Preprint]
Moisl, H. (2008) 'Using electronic corpora to study language variation: the problem of data sparsity', in S. Tsiplakou, M. Karyolemou & P. Pavlou, eds. Language Variation. European Perspectives II. Amsterdam: John Benjamins, 2009, 169-178. [Preprint]
Moisl, H. (2009) 'Using electronic corpora in historical dialectology research: the problem of document length variation', in M. Dossena & R. Lass, eds. Studies in English and European Historical Dialectology. Bern: Peter Lang. [Preprint]
Moisl, H. (2008) 'Normalization for Variation in Document Length in Exploratory Multivariate Analysis of Text Corpora', Proceedings of INFOS2008: 6th International Conference on Informatics and Systems, Cairo University, 27-29 March 2008. [Preprint]
Moisl, H. (2009) 'Sura length and lexical probability estimation in cluster analysis of the Qur'an', Association for Computing Machinery Transactions on Asian Language Information Processing 8. [Online publication]
Moisl, H. (2009) 'Hypothesis Generation', in A. McMahon & W. Maguire, eds. Analysing Variation in English: What we know, what we don't, and why it matters. Cambridge University Press. [Preprint]
Moisl, H. (2010) 'Variable scaling in cluster analysis of linguistic data', Corpus Linguistics and Linguistic Theory 6. [Preprint]
Moisl, H. (2010) 'Finding the minimum document length for reliable clustering of multi-document natural language corpora', Journal of Quantitative Linguistics 18 (in press). [Preprint]
4th Conference on Quantitative Investigations in Theoretical Linguistics, Berlin
Methods in Dialectology 14
Draft DECTE website