java.lang.Object
  edu.umn.cs.nlp.mt.tools.ExtractWordPairs
public class ExtractWordPairs
Utility to extract aligned word pairs from an aligned corpus.
The files used must use Unix-style newlines.
See section 4.4 of "Statistical Phrase-Based Translation" by Philipp Koehn, Franz Josef Och, & Daniel Marcu (HLT-NAACL, 2003).

| Field Summary | |
|---|---|
| static String | UNALIGNED_MARKER: Special marker to use with unaligned words. |

| Constructor Summary | |
|---|---|
| ExtractWordPairs() | |

| Method Summary | |
|---|---|
| static void | extract(int number_of_lines, Scanner source_text, Scanner target_text, Scanner alignments, Writer outputFile): Extract aligned word pairs from an aligned corpus. |
| static void | main(String[] args): Utility to extract aligned word pairs from an aligned corpus. |

| Methods inherited from class java.lang.Object |
|---|
| clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
| Field Detail |
|---|
public static final String UNALIGNED_MARKER
| Constructor Detail |
|---|
public ExtractWordPairs()
| Method Detail |
|---|
public static void extract(int number_of_lines,
Scanner source_text,
Scanner target_text,
Scanner alignments,
Writer outputFile)
throws IOException
This method does not convert from upper case to lower case. All input needs to already be in the proper case.
NOTE: The scanners provided for source text, target text, and alignments must all be backed by data that uses Unix-style newlines.
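The two requirements above (case is not converted, and newlines must be Unix-style) can be met by preparing the text before wrapping it in a Scanner. The following is a minimal sketch only: the class `InputPrep` and its methods are hypothetical helpers, not part of this API, and lower-casing is shown on the assumption that lower case is the case you want.

```java
import java.util.Locale;
import java.util.Scanner;

public class InputPrep {

    /** Converts Windows (\r\n) and bare carriage-return (\r) line endings to Unix (\n). */
    public static String toUnixNewlines(String text) {
        return text.replace("\r\n", "\n").replace('\r', '\n');
    }

    /** Normalizes newlines, then lower-cases (assumption: lower case is the desired case). */
    public static String normalize(String text) {
        return toUnixNewlines(text).toLowerCase(Locale.ROOT);
    }

    /** Builds a Scanner over the normalized text, suitable for passing on to a consumer. */
    public static Scanner scannerFor(String text) {
        return new Scanner(normalize(text));
    }

    public static void main(String[] args) {
        Scanner s = scannerFor("Das Haus\r\nEin Buch\r\n");
        while (s.hasNextLine()) {
            System.out.println(s.nextLine());
        }
    }
}
```

For large corpora, reading the whole file into a String may be impractical; the same replacement could be applied in a streaming filter instead.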
Parameters:
number_of_lines - The number of lines to process from the aligned corpus.
source_text - Scanner backed by the source language text
target_text - Scanner backed by the target language text
alignments - Scanner backed by the sentence alignment data
outputFile - Writer to use when producing output results
Throws:
IOException - Thrown if an I/O error occurs when writing results

public static void main(String[] args)
Parameters:
args - Command line arguments
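To illustrate the kind of processing the extract method describes, here is a rough sketch of word-pair extraction over three parallel inputs. It is not the implementation of ExtractWordPairs. Everything here is an assumption made for illustration: the class `WordPairSketch`, the alignment format (whitespace-separated `i-j` links with zero-based source and target indices), the output format (one space-separated pair per line), the marker string `<UNALIGNED>` (the actual value of UNALIGNED_MARKER is not stated in this page), and that every input line is non-empty.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Scanner;

public class WordPairSketch {

    // Hypothetical marker; the real value of UNALIGNED_MARKER is not given in this page.
    public static final String UNALIGNED = "<UNALIGNED>";

    /** Writes one "source target" pair per alignment link, then any unaligned words. */
    public static void extractPairs(int numberOfLines, Scanner source, Scanner target,
                                    Scanner alignments, Writer out) throws IOException {
        for (int n = 0; n < numberOfLines; n++) {
            String[] src = source.nextLine().trim().split("\\s+");
            String[] tgt = target.nextLine().trim().split("\\s+");
            boolean[] srcAligned = new boolean[src.length];
            boolean[] tgtAligned = new boolean[tgt.length];

            String alignLine = alignments.nextLine().trim();
            if (!alignLine.isEmpty()) {
                for (String link : alignLine.split("\\s+")) {
                    // Assumed link format: sourceIndex-targetIndex, zero-based.
                    String[] ij = link.split("-");
                    int i = Integer.parseInt(ij[0]);
                    int j = Integer.parseInt(ij[1]);
                    out.write(src[i] + " " + tgt[j] + "\n");
                    srcAligned[i] = true;
                    tgtAligned[j] = true;
                }
            }
            // Words with no link are paired with the marker.
            for (int i = 0; i < src.length; i++) {
                if (!srcAligned[i]) out.write(src[i] + " " + UNALIGNED + "\n");
            }
            for (int j = 0; j < tgt.length; j++) {
                if (!tgtAligned[j]) out.write(UNALIGNED + " " + tgt[j] + "\n");
            }
        }
        out.flush();
    }

    /** Convenience wrapper for a single sentence pair held in memory. */
    public static String pairsFor(String src, String tgt, String align) {
        StringWriter out = new StringWriter();
        try {
            extractPairs(1, new Scanner(src), new Scanner(tgt), new Scanner(align), out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(pairsFor("das haus", "the house now", "0-0 1-1"));
    }
}
```

The sketch mirrors the documented parameter order (line count, source, target, alignments, writer) so that the call shape is comparable, but the real method may read or write a different format.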