bogoutil { -h | -V }
bogoutil [options] { -d file | -H | -l file | -m | -w file_or_dir | -p file_or_dir } file.db
bogoutil { -r | -R } directory
bogoutil { --db-verify file | --db-prune directory | --db-recover directory | --db-recover-harder directory | --db-remove-environment directory }
where options is
[-v] [-n] [-C] [-D] [-a age] [-c count] [-s min,max] [-y date] [-I file] [-x flags] [--config-file file]
Bogoutil is part of the bogofilter Bayesian spam filter package.
It is used to dump and load bogofilter's Berkeley DB databases to and from text files, to perform database maintenance functions, and to display the values for specific words.
The -d option tells bogoutil to print the contents of the database file to stdout.
The -H option tells bogoutil to print a histogram of the specified database file to stdout. The output is similar to that of bogofilter -vv. In addition, hapaxes (tokens that were seen only once) and pure tokens (tokens that were encountered only in ham or only in spam) are counted.
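The two extra counts can be illustrated with a small Python sketch. It assumes each token is represented as a hypothetical (spam count, ham count) pair; that layout, and the function name count_hapaxes_and_pure, are illustrative only and not bogoutil's internal data structure. The histogram itself is not reproduced.

```python
from typing import Mapping, Tuple

# Hypothetical layout (an assumption, not bogoutil's format):
# token -> (spam_count, ham_count)
Counts = Mapping[str, Tuple[int, int]]


def count_hapaxes_and_pure(counts: Counts) -> Tuple[int, int]:
    """Return (hapaxes, pure_tokens) for the given token counts."""
    hapaxes = 0
    pure = 0
    for spam, ham in counts.values():
        # A hapax was seen exactly once in total, in either class.
        if spam + ham == 1:
            hapaxes += 1
        # A pure token occurs in exactly one class: only spam or only ham.
        if (spam > 0) != (ham > 0):
            pure += 1
    return hapaxes, pure


example = {"viagra": (5, 0), "meeting": (0, 3), "hello": (1, 0), "the": (10, 12)}
print(count_hapaxes_and_pure(example))  # (1, 3)
```

By these definitions every hapax is also a pure token, so the two counts overlap.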
The -l option tells bogoutil to load the data from stdin into the database file.
If the database file exists, the data from stdin is merged into it, with counts added up.
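The "counts added up" merge rule can be sketched in Python as follows. This is a minimal illustration, assuming the database is modeled as a plain token-to-count dictionary; merge_counts is a hypothetical name, and how bogoutil treats dates during a merge is not modeled.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def merge_counts(existing: Dict[str, int],
                 incoming: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Merge (token, count) records into a copy of a token->count map.

    Counts for tokens present in both are added; new tokens are inserted.
    Dates are deliberately ignored in this sketch.
    """
    merged = Counter(existing)
    for token, count in incoming:
        merged[token] += count
    return dict(merged)


db = {"free": 4, "lunch": 1}
print(merge_counts(db, [("free", 2), ("money", 3)]))
# {'free': 6, 'lunch': 1, 'money': 3}
```

The input map is copied rather than modified, so the original counts stay available for comparison.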
The -m option tells bogoutil to perform maintenance functions on the specified database, i.e. discard tokens that are older than desired, have counts that are too small, or have sizes (lengths) that are too long or too short.
The -w option tells bogoutil to display token information from the database. The option takes an argument, which is either the name of the wordlist (usually wordlist.db) or the name of the directory containing it. Tokens can be listed on the command line or piped to bogoutil. When there are extra arguments on the command line, bogoutil uses them as the tokens to look up. If there are no extra arguments, bogoutil reads tokens from stdin.
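The rule for choosing where -w gets its tokens can be sketched in Python. This is an illustration only: tokens_to_look_up is a hypothetical name, and splitting stdin on whitespace is an assumption, since this page does not say how piped tokens are separated.

```python
import io
from typing import List, TextIO


def tokens_to_look_up(extra_args: List[str], stdin: TextIO) -> List[str]:
    """Pick the tokens to look up, following the rule described for -w.

    Extra command-line arguments win; otherwise tokens are read from stdin.
    Splitting stdin on whitespace is an assumption of this sketch.
    """
    if extra_args:
        return list(extra_args)
    return [tok for line in stdin for tok in line.split()]


print(tokens_to_look_up(["cash", "prize"], io.StringIO("ignored\n")))  # ['cash', 'prize']
print(tokens_to_look_up([], io.StringIO("cash\nprize\n")))            # ['cash', 'prize']
```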
The -p option tells bogoutil to display the database information for one or more tokens.
The display includes a probability column with the token's spam score (computed using bogofilter's default values).
Option -p takes the same arguments as option -w.
The -r option tells bogoutil to recalculate the ROBX value and print it as a six-digit fraction.
The -R option does the same as -r, but prints more information and saves the result in the training database.
The -I file option tells bogoutil to read its input from file rather than from stdin.
The -v option produces verbose output on stderr.
This option is primarily useful for debugging.
The -C option inhibits reading configuration files and lets bogoutil use the defaults.
The --config-file file option tells bogoutil to read file instead of the standard configuration file.
The -D option redirects debug output to stdout (it usually goes to stderr).
The -x flags option sets the debugging flags.
Option -n stands for "replace non-ASCII characters".
It replaces characters that have the high bit (0x80) set with question marks.
This can be useful if a word list has many unreadable tokens, for example from Asian spam.
The "bad" characters are converted to question marks, and tokens that then match are combined when used with -m or -l, but not with -d.
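A Python sketch of the -n transformation follows. It assumes tokens are handled as raw bytes and that each byte with the high bit set becomes one question mark; replace_nonascii and combine_nonascii are hypothetical names, and the combining step mirrors the -m/-l behavior described above by adding up the counts of tokens that become identical.

```python
from typing import Dict


def replace_nonascii(token: bytes) -> bytes:
    """Replace every byte with the high bit (0x80) set by b'?'."""
    return bytes(b if b < 0x80 else ord("?") for b in token)


def combine_nonascii(counts: Dict[bytes, int]) -> Dict[bytes, int]:
    """Apply replace_nonascii and add up counts of tokens that now match."""
    combined: Dict[bytes, int] = {}
    for token, count in counts.items():
        key = replace_nonascii(token)
        combined[key] = combined.get(key, 0) + count
    return combined


# Two different 3-byte UTF-8 sequences collapse into the same token b'???'.
words = {b"\xe4\xb8\xad": 2, b"\xe6\x96\x87": 3, b"ok": 1}
print(combine_nonascii(words))  # {b'???': 5, b'ok': 1}
```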
Option -a age indicates an acceptable token age, with older tokens being discarded.
The age can be a date (in the form YYYYMMDD) or a day count, i.e. discard tokens older than age days.
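The two forms of the -a argument can be sketched in Python as a cutoff date. Treating an eight-digit value as a YYYYMMDD date and anything else as a day count is an assumption of this sketch about how the forms are told apart; age_cutoff and is_discarded are hypothetical names.

```python
from datetime import date, timedelta


def age_cutoff(age: str, today: date) -> date:
    """Turn an -a argument into the oldest date that is still kept.

    Assumption: an 8-digit value is read as a YYYYMMDD date; anything
    else is read as a number of days before `today`.
    """
    if len(age) == 8 and age.isdigit():
        return date(int(age[:4]), int(age[4:6]), int(age[6:]))
    return today - timedelta(days=int(age))


def is_discarded(token_date: date, cutoff: date) -> bool:
    """Tokens dated before the cutoff are older than acceptable."""
    return token_date < cutoff


today = date(2024, 3, 1)
print(age_cutoff("20240115", today))  # 2024-01-15
print(age_cutoff("30", today))        # 2024-01-31
```

With a day count of 30, a token dated exactly 30 days ago is kept, because it is not older than the limit.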
Option -c count indicates that tokens with counts less than or equal to count are to be discarded.
Option -s min,max is used to discard tokens based on their size, i.e. length.
All tokens shorter than min or longer than max will be discarded.
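The count and size thresholds used during maintenance can be sketched together in Python. This is a minimal illustration over a hypothetical token-to-count dictionary; prune and parse_size_range are hypothetical names, and measuring length in characters (rather than bytes) is an assumption.

```python
from typing import Dict, Tuple


def parse_size_range(spec: str) -> Tuple[int, int]:
    """Parse the 'min,max' argument of -s."""
    low, high = (int(part) for part in spec.split(","))
    return low, high


def prune(counts: Dict[str, int], count: int, sizes: str) -> Dict[str, int]:
    """Keep tokens whose count exceeds `count` and whose length is in range.

    Mirrors -c (discard count <= value) and -s (discard len < min or > max).
    Length is measured in characters here, which is an assumption.
    """
    low, high = parse_size_range(sizes)
    return {
        token: n
        for token, n in counts.items()
        if n > count and low <= len(token) <= high
    }


wordlist = {"a": 9, "cash": 1, "winner": 7, "x" * 40: 5}
print(prune(wordlist, count=1, sizes="2,30"))  # {'winner': 7}
```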
Option -y date specifies the date to give to tokens that don't have dates. The format is YYYYMMDD.
The -h option prints the help message and exits.
The -V option prints the version number and exits.
The --db-prune option causes bogoutil to checkpoint the database environment and remove inactive log files.
The --db-recover option runs a regular database recovery in the specified database directory. If that fails, it retries with a (usually slower) catastrophic database recovery. If that fails too, your database cannot be repaired and must be rebuilt from scratch.
This is only supported when bogoutil is compiled with Berkeley DB support with transactions enabled. Trying recovery with QDBM or TDB support results in an error.
The --db-recover-harder option runs a catastrophic database recovery in the specified database directory. If that fails, your database cannot be repaired and must be rebuilt from scratch.
This is only supported when bogoutil is compiled with Berkeley DB support with transactions enabled. Trying recovery with QDBM or TDB support results in an error.
The --db-remove-environment option has no short option equivalent. It runs recovery in the given directory and then removes the database environment. Use this before upgrading to a new Berkeley DB version if the new version to be installed requires a log file format update.
The --db-verify option requests that bogoutil verify the database file.
It prints only errors, unless in verbose mode.
Bogoutil reads and writes text files where each nonblank line consists of a word, any amount of horizontal whitespace, a numeric word count, more whitespace, and (optionally) a date in the form YYYYMMDD. Blank lines are skipped.
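A minimal Python reader for this text format might look like the sketch below. It follows the description on this page (one count per line, optional date) rather than any particular bogofilter release, rejects malformed lines with ValueError (a choice of this sketch, not documented bogoutil behavior), and uses a default_date parameter to stand in for the -y option. Record and parse_wordlist are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Record:
    word: str
    count: int
    date: Optional[str]  # "YYYYMMDD", or None when no date is available


def parse_wordlist(lines: Iterable[str],
                   default_date: Optional[str] = None) -> Iterator[Record]:
    """Parse 'word count [date]' lines; blank lines are skipped.

    default_date plays the role of the -y option for undated lines.
    """
    for lineno, line in enumerate(lines, start=1):
        fields = line.split()  # any run of whitespace separates fields
        if not fields:
            continue
        if len(fields) not in (2, 3):
            raise ValueError(f"line {lineno}: expected 'word count [date]'")
        word, count = fields[0], int(fields[1])
        stamp = fields[2] if len(fields) == 3 else default_date
        if stamp is not None and not (len(stamp) == 8 and stamp.isdigit()):
            raise ValueError(f"line {lineno}: date must be YYYYMMDD")
        yield Record(word, count, stamp)


text = ["lottery 12 20240105\n", "   \n", "agenda 3\n"]
for rec in parse_wordlist(text, default_date="20240301"):
    print(rec)
```

Because parse_wordlist is a generator, a large dump can be processed line by line without loading it all into memory.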
0 for successful operation. 1 for most errors. 3 for I/O or other errors. Error 3 usually means that something is seriously wrong with the database files.
Gyepi Sam <gyepi@praxis-sw.com>
Matthias Andree <matthias.andree@gmx.de>
David Relson <relson@osagesoftware.com>
For updates, see the bogofilter project page.