This software was designed by Zachary Bornheimer to perform automated morpheme extraction. The goal is an unsupervised, non-statistical, language-blind machine learning algorithm that can parse corpora in a variety of languages.
The paper explaining the research is coming soon.
Here is the README file.
Morpheme Extraction System
==========================
This software allows for the programmatic
extraction of morpheme candidates from a
corpus into a defined morpheme-list location.
Licensed under the GPLv2.
If you change something or get something to
work better, please let me know. It will help
me improve in C and will help the project :-)
The research paper that accompanies this project is coming soon.
Software Required for Functionality:
gcc (with OpenMP compatibility enabled)
make
How to install?
Choose one of the following:
make optimized
make debug
make all
Command-line Arguments:
Verbose Mode: --verbose
Serial Processing: --serial, --sequential, or --process-sequentially
Full Processing: --process
Output File: --output-file REL-FILE-PATH
Corpus Dir: --corpus-dir REL-CORPUS-PATH
where REL-FILE-PATH and REL-CORPUS-PATH are relative paths to a
desired filename and/or corpus directory.
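The flags listed above could be parsed along the following lines. This is only a minimal sketch, assuming GNU-style getopt_long from <getopt.h>; it is not taken from the project's source. The struct options type, its field names, and the parse_args function are hypothetical names chosen for illustration.

```c
#include <getopt.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for the documented flags. */
struct options {
    int verbose;              /* --verbose */
    int serial;               /* --serial, --sequential, --process-sequentially */
    int full;                 /* --process */
    const char *output_file;  /* --output-file REL-FILE-PATH */
    const char *corpus_dir;   /* --corpus-dir REL-CORPUS-PATH */
};

/* Fills *opts from argv. Returns 0 on success, -1 on an unknown flag
 * or a missing argument. */
int parse_args(int argc, char **argv, struct options *opts)
{
    static const struct option long_opts[] = {
        {"verbose",              no_argument,       NULL, 'v'},
        {"serial",               no_argument,       NULL, 's'},
        {"sequential",           no_argument,       NULL, 's'},
        {"process-sequentially", no_argument,       NULL, 's'},
        {"process",              no_argument,       NULL, 'p'},
        {"output-file",          required_argument, NULL, 'o'},
        {"corpus-dir",           required_argument, NULL, 'c'},
        {NULL, 0, NULL, 0}
    };
    int c;

    memset(opts, 0, sizeof *opts);
    optind = 1;  /* start scanning at the first argument after the program name */

    /* Empty short-option string: only the long flags are accepted. */
    while ((c = getopt_long(argc, argv, "", long_opts, NULL)) != -1) {
        switch (c) {
        case 'v': opts->verbose = 1;          break;
        case 's': opts->serial = 1;           break;
        case 'p': opts->full = 1;             break;
        case 'o': opts->output_file = optarg; break;
        case 'c': opts->corpus_dir = optarg;  break;
        default:  return -1;  /* getopt_long has already printed a message */
        }
    }
    return 0;
}
```

A caller would typically pass argc and argv from main straight to parse_args and exit with a usage message if it returns -1.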
Verbose Mode gives more visual output, but it reduces speed.
Serial Processing yields separate results for each file processed,
as opposed to one combined result for the whole corpus :)
Full Processing yields both per-file and combined results, as if you
had run the program once with --serial and then a second time
without that flag.
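The difference between the three modes can be pictured with a small sketch. It is purely illustrative: the run_mode enum, the report_results function, and the per-file counts are hypothetical, and summing the counts is only a placeholder for whatever the real program does when it processes the corpus as a whole.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical names for the three documented behaviours. */
enum run_mode {
    RUN_COMBINED,  /* default: one result for the whole corpus */
    RUN_SERIAL,    /* --serial: one result per file */
    RUN_FULL       /* --process: per-file results, then the combined result */
};

/* Writes placeholder results to out and returns the number of lines written.
 * per_file[i] stands in for whatever the extraction produced for file i. */
size_t report_results(enum run_mode mode, const size_t *per_file,
                      size_t n_files, FILE *out)
{
    size_t lines = 0;

    if (mode == RUN_SERIAL || mode == RUN_FULL) {
        for (size_t i = 0; i < n_files; i++) {
            fprintf(out, "file %zu: %zu candidates\n", i, per_file[i]);
            lines++;
        }
    }
    if (mode == RUN_COMBINED || mode == RUN_FULL) {
        size_t total = 0;
        for (size_t i = 0; i < n_files; i++)
            total += per_file[i];  /* placeholder for a real combined run */
        fprintf(out, "all files: %zu candidates\n", total);
        lines++;
    }
    return lines;
}
```

With three files, RUN_SERIAL writes three lines, RUN_COMBINED writes one, and RUN_FULL writes all four.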
Output File is the file to which results are appended
(existing data is not overwritten).
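Appending without overwriting is what the standard C "a" mode of fopen provides. The following is a minimal sketch of that behaviour, not the project's actual output code; append_result is a hypothetical helper name.

```c
#include <stdio.h>

/* Appends one line to the file at path, creating the file if it does not
 * exist. Existing contents are left untouched. Returns 0 on success and
 * -1 on failure. */
int append_result(const char *path, const char *line)
{
    FILE *fp = fopen(path, "a");  /* "a": every write goes to the end of the file */
    int ok;

    if (fp == NULL)
        return -1;

    ok = fputs(line, fp) >= 0 && fputc('\n', fp) != EOF;
    if (fclose(fp) != 0)  /* fclose flushes, so a write error can surface here */
        ok = 0;

    return ok ? 0 : -1;
}
```

Calling append_result twice with the same path leaves both lines in the file, in the order they were written.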
Corpus Dir is the place where all the files that need to be
processed reside.