DOI Number : 10.5614/itbj.ict.res.appl.2013.7.3.1

Implementation of Kadazan Tagger Based on Brill's Method

Marylyn Alex & Lailatul Qadri Zakaria

CAIT Research Group, Faculty of Information Science and Technology,
Jalan Tun Ismail Ali, Universiti Kebangsaan Malaysia, 43600 Bangi, Malaysia
Email: alexmarylyn@gmail.com


Abstract. We present and evaluate an implementation of part-of-speech (POS) tagging for the Kadazan language using the transformation-based approach. The main purpose of this study is to develop an automatic POS tagger for the Kadazan language, which had never been developed before. Such a tagger can tag a Kadazan corpus automatically and can help reduce the ambiguity problem in this language. The approach was adopted with the aim of achieving an accuracy higher than, or at least similar to, that of other tagging approaches such as the statistical and the original rule-based approaches. It transforms tags according to a prescribed set of rules. Three objectives were set in order to achieve the main purpose of this study: first, to apply lexical and contextual rules to this language; second, to implement Brill's algorithm based on this set of rules; and finally, to determine the effectiveness of Kadazan POS tagging using this approach. The tagging system was trained on four Kadazan corpora containing 5663 words in all. Based on the evaluation results, the tagging system achieved around 93% accuracy.

Keywords: Brill’s tagger; Kadazan language; part-of-speech tagger; rule-based; statistical; transformation-based.
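
The transformation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the lexicon, the tag names, the suffix rule, and the single contextual rule below are toy English assumptions chosen for readability, not Kadazan data or rules from the study. The sketch also takes the rules as given and omits the step in which Brill's method learns them from a training corpus. It shows initial tagging from a lexicon, one lexical rule for unknown words, ordered contextual rules, and a simple accuracy measure.

```python
from dataclasses import dataclass

# Toy English lexicon for illustration only; NOT Kadazan data from the study.
LEXICON = {"the": "DT", "to": "TO", "race": "NN", "horse": "NN", "wins": "VBZ"}
DEFAULT_TAG = "NN"  # assumed fallback tag for unknown words


@dataclass(frozen=True)
class ContextualRule:
    """Change from_tag to to_tag when the previous token carries prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str


def initial_tags(tokens):
    """Give each token its lexicon tag, else a lexical guess, else the default."""
    tags = []
    for tok in tokens:
        word = tok.lower()
        if word in LEXICON:
            tags.append(LEXICON[word])
        elif word.endswith("ly"):  # hypothetical lexical (suffix) rule
            tags.append("RB")
        else:
            tags.append(DEFAULT_TAG)
    return tags


def apply_rules(tags, rules):
    """Apply the contextual rules in order, scanning left to right.

    Tags are updated in place, so a change made earlier in a scan is
    visible to later positions in the same scan.
    """
    tags = list(tags)
    for rule in rules:
        for i in range(1, len(tags)):
            if tags[i] == rule.from_tag and tags[i - 1] == rule.prev_tag:
                tags[i] = rule.to_tag
    return tags


def tag(tokens, rules):
    """Return (token, tag) pairs after initial tagging and rule application."""
    return list(zip(tokens, apply_rules(initial_tags(tokens), rules)))


def accuracy(predicted, gold):
    """Fraction of positions where the predicted tag equals the gold tag."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)


# One assumed contextual rule: a noun directly after "to" becomes a verb.
RULES = [ContextualRule(from_tag="NN", to_tag="VB", prev_tag="TO")]
tokens = ["the", "horse", "wins", "to", "race", "quickly"]
result = tag(tokens, RULES)
print(result)
```

In this toy sentence, "race" starts with the lexicon tag NN and is retagged VB because the preceding tag is TO, while the unknown word "quickly" receives RB from the suffix rule. Comparing the predicted tags with a hand-tagged gold sequence through `accuracy` gives the kind of figure the abstract reports.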



 
       
       