java.lang.Object
  edu.umn.cs.nlp.mt.tools.ExtractWordPairs
public class ExtractWordPairs
Utility to extract aligned word pairs from an aligned corpus.
The files used must use Unix-style newlines.
See Section 4.4 of "Statistical Phrase-Based Translation" by Philipp Koehn, Franz Josef Och, and Daniel Marcu (HLT-NAACL, 2003).
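To make the idea of "aligned word pairs" concrete, here is a minimal sketch of pairing up the words of a single sentence pair. It is not the implementation of this class. The class name `WordPairSketch`, the method `pairs`, the `sourceIndex-targetIndex` alignment format, the output format, and the marker value `"NULL"` are all assumptions made for illustration; the actual value of `UNALIGNED_MARKER` is not given on this page.

```java
import java.util.ArrayList;
import java.util.List;

class WordPairSketch {
    // Hypothetical stand-in for ExtractWordPairs.UNALIGNED_MARKER (real value not documented here).
    static final String MARKER = "NULL";

    /**
     * Returns "sourceWord targetWord" strings for one sentence pair.
     * Assumes alignment points look like "0-0 2-1" (sourceIndex-targetIndex, zero-based).
     */
    public static List<String> pairs(String source, String target, String alignment) {
        String[] src = source.trim().split("\\s+");
        String[] tgt = target.trim().split("\\s+");
        boolean[] srcAligned = new boolean[src.length];
        boolean[] tgtAligned = new boolean[tgt.length];
        List<String> out = new ArrayList<>();

        // Emit one pair per alignment point and remember which words were covered.
        if (!alignment.trim().isEmpty()) {
            for (String point : alignment.trim().split("\\s+")) {
                String[] ij = point.split("-");
                int i = Integer.parseInt(ij[0]);
                int j = Integer.parseInt(ij[1]);
                srcAligned[i] = true;
                tgtAligned[j] = true;
                out.add(src[i] + " " + tgt[j]);
            }
        }

        // Words with no alignment point are paired with the marker.
        for (int i = 0; i < src.length; i++) {
            if (!srcAligned[i]) out.add(src[i] + " " + MARKER);
        }
        for (int j = 0; j < tgt.length; j++) {
            if (!tgtAligned[j]) out.add(MARKER + " " + tgt[j]);
        }
        return out;
    }

    public static void main(String[] args) {
        // Prints: "das the", "haus house", "kleine NULL" (one per line).
        for (String pair : pairs("das kleine haus", "the house", "0-0 2-1")) {
            System.out.println(pair);
        }
    }
}
```

Pairing unaligned words with a special marker keeps every word in the counts, which is what a lexical weighting model of the kind described in the cited paper needs.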
Field Summary | |
---|---|
static String | UNALIGNED_MARKER: Special marker to use with unaligned words. |
Constructor Summary | |
---|---|
ExtractWordPairs() | |
Method Summary | |
---|---|
static void | extract(int number_of_lines, Scanner source_text, Scanner target_text, Scanner alignments, Writer outputFile): Extract aligned word pairs from an aligned corpus. |
static void | main(String[] args): Utility to extract aligned word pairs from an aligned corpus. |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Field Detail |
---|
public static final String UNALIGNED_MARKER
Constructor Detail |
---|
public ExtractWordPairs()
Method Detail |
---|
public static void extract(int number_of_lines, Scanner source_text, Scanner target_text, Scanner alignments, Writer outputFile) throws IOException
This method does not convert from upper case to lower case. All input needs to already be in the proper case.
NOTE: The scanners provided for source text, target text, and alignments must all be backed by data that uses Unix-style newlines.
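Because the method neither changes case nor tolerates other line endings, callers may want to normalize their text before wrapping it in a `Scanner`. The following is a minimal sketch of one way to do that. The class `InputPrepSketch` and its helper methods are hypothetical and not part of this API, and lowercasing is shown only as an example of what "proper case" might mean for a given corpus.

```java
import java.util.Locale;
import java.util.Scanner;

class InputPrepSketch {
    /** Converts Windows (\r\n) and lone carriage-return (\r) line endings to Unix (\n). */
    public static String toUnixNewlines(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    /** Lowercases with a fixed locale so the result does not depend on the JVM default. */
    public static String lowerCase(String text) {
        return text.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        String raw = "Das Haus\r\nEin Buch\r\n";

        // A Scanner backed by normalized text; one like this could be passed as source_text.
        Scanner sourceText = new Scanner(lowerCase(toUnixNewlines(raw)));
        while (sourceText.hasNextLine()) {
            System.out.println(sourceText.nextLine());
        }
    }
}
```

The same preparation would apply to the target text and the alignment data, since all three scanners are subject to the newline requirement.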
Parameters:
number_of_lines - The number of lines to process from the aligned corpus.
source_text - Scanner backed by the source language text.
target_text - Scanner backed by the target language text.
alignments - Scanner backed by the sentence alignment data.
outputFile - Writer to use when producing output results.
Throws:
IOException - Thrown if an I/O error occurs when writing results.

public static void main(String[] args)
Parameters:
args - Command line arguments.