Class MakeLmBinaryFromGoogle

java.lang.Object
edu.berkeley.nlp.lm.io.MakeLmBinaryFromGoogle

public class MakeLmBinaryFromGoogle extends Object
Given a directory in Google n-grams format, builds a binary representation of a stupid-backoff language model and writes it to disk. Language model binaries are significantly smaller and faster to load. Note: running this code on the full Google n-grams corpus can be very slow and memory-intensive; on our machines, it takes about 32GB of memory and 15 hours.

Note that input and output files with a .gz suffix are decompressed or compressed automatically as necessary.
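
The suffix-based handling described above can be illustrated with a minimal sketch using only the standard java.util.zip classes. This is not the library's actual implementation; the class GzAwareIo and its methods isGzipped, openReader, and openWriter are hypothetical names introduced only for this example, and UTF-8 encoding is an assumption.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper (not part of the library): picks a gzip or plain
// stream based only on whether the file name ends in ".gz".
public class GzAwareIo {

    public static boolean isGzipped(String path) {
        return path.endsWith(".gz");
    }

    // Opens a text reader, decompressing on the fly for ".gz" files.
    public static BufferedReader openReader(String path) throws IOException {
        InputStream in = Files.newInputStream(Paths.get(path));
        if (isGzipped(path)) {
            in = new GZIPInputStream(in);
        }
        return new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8));
    }

    // Opens a text writer, compressing on the fly for ".gz" files.
    public static BufferedWriter openWriter(String path) throws IOException {
        OutputStream out = Files.newOutputStream(Paths.get(path));
        if (isGzipped(path)) {
            out = new GZIPOutputStream(out);
        }
        return new BufferedWriter(new OutputStreamWriter(out, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        // Round trip: write one made-up n-gram count line, then read it back.
        Path tmp = Files.createTempFile("ngrams", ".gz");
        try (BufferedWriter w = openWriter(tmp.toString())) {
            w.write("the cat\t42");
            w.newLine();
        }
        try (BufferedReader r = openReader(tmp.toString())) {
            System.out.println(r.readLine());
        }
        Files.delete(tmp);
    }
}
```

Keying the decision on the file name alone keeps the caller's code identical for compressed and uncompressed files, at the cost of failing on a mislabeled file.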

Author:
adampauls
  • Constructor Details

    • MakeLmBinaryFromGoogle

      public MakeLmBinaryFromGoogle()
  • Method Details

    • main

      public static void main(String[] argv)