This software was designed by Zachary Bornheimer to do automated morpheme extraction. The goal is an unsupervised, non-statistical, language-blind machine learning algorithm that can parse corpora in a variety of languages.
The paper explaining the research is coming soon.
Here is the README file.
Morpheme Extraction System
==========================

This software allows for the programmatic extraction of morpheme candidates from a corpus into a defined morpheme-list location.

Licensed under the GPLv2.

If you change something or get something to work better, please let me know; it will help me improve in C and will help the project :-)

The research paper that accompanies this project is coming soon.

Software Required for Functionality:

    gcc (with OpenMP compatibility enabled)
    make

How to install? Choose one of the following:

    make optimized
    make debug
    make all

Command-line Arguments:

    Verbose Mode:       --verbose
    Serial Processing:  --serial or --sequential --process-sequentially
    Full Processing:    --process
    Output File:        --output-file REL-FILE-PATH
    Corpus Dir:         --corpus-dir REL-CORPUS-PATH

where REL-FILE-PATH and REL-CORPUS-PATH are relative paths to a desired filename and/or corpus directory.

Verbose Mode gives more visual output; however, it impacts speed.

Serial Processing yields data results for each file processed, as opposed to a conglomerate data processing experience :)

Full Processing yields both per-file and conglomerate results, as if you had run the program once with --serial and then a second time without that flag.

Output File is the place to which data results are appended (it won't overwrite existing data).

Corpus Dir is the place where all the files that need to be processed reside.
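
To make the flag semantics above concrete, here is a minimal sketch in C of how such arguments could be parsed. It is not the project's actual parser: the `mes_options` struct and the `mes_parse_args` function are hypothetical names invented for this illustration, and treating `--serial`, `--sequential`, and `--process-sequentially` as interchangeable spellings is an assumption drawn from the README's wording.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the documented flags (not from the project). */
struct mes_options {
    bool verbose;             /* --verbose */
    bool serial;              /* --serial / --sequential / --process-sequentially */
    bool full;                /* --process */
    const char *output_file;  /* --output-file REL-FILE-PATH */
    const char *corpus_dir;   /* --corpus-dir REL-CORPUS-PATH */
};

/* Returns true when arg is one of the serial-processing spellings.
   Assumption: the README lists three aliases for the same mode. */
static bool is_serial_flag(const char *arg)
{
    return strcmp(arg, "--serial") == 0 ||
           strcmp(arg, "--sequential") == 0 ||
           strcmp(arg, "--process-sequentially") == 0;
}

/* Fills *opts from argv. Returns 0 on success, or -1 on an unknown flag
   or a flag that is missing its path argument. */
int mes_parse_args(int argc, char **argv, struct mes_options *opts)
{
    assert(opts != NULL);
    *opts = (struct mes_options){0};

    for (int i = 1; i < argc; i++) {
        const char *arg = argv[i];

        if (strcmp(arg, "--verbose") == 0) {
            opts->verbose = true;
        } else if (is_serial_flag(arg)) {
            opts->serial = true;
        } else if (strcmp(arg, "--process") == 0) {
            opts->full = true;
        } else if (strcmp(arg, "--output-file") == 0) {
            if (i + 1 >= argc)
                return -1;               /* path argument is missing */
            opts->output_file = argv[++i];
        } else if (strcmp(arg, "--corpus-dir") == 0) {
            if (i + 1 >= argc)
                return -1;               /* path argument is missing */
            opts->corpus_dir = argv[++i];
        } else {
            return -1;                   /* unrecognized flag */
        }
    }
    return 0;
}
```

A caller would pass `argc` and `argv` straight from `main` and check the return value before reading the struct. Since the README says results are appended rather than overwritten, a program built this way would presumably open `output_file` with `fopen(path, "a")`.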