htdig
ht://Dig © 1995, 1996, 1997 Andrew Scherpbier <andrew@contigo.com>
Please see the file COPYING for license information.
Synopsis
- htdig [options]
Description
- htdig retrieves HTML documents over HTTP and gathers
information from them that can later be used to search
those documents. This program is also referred to as
the search robot.
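To make the idea of a link-following robot concrete, here is a minimal sketch of a breadth-first dig with a hop limit similar to the one the -h option describes. It is an illustration only, not htdig's actual implementation: the LINKS table is a hypothetical in-memory stand-in for documents fetched over HTTP, and the function and parameter names are invented for this example.

```python
from collections import deque

# Hypothetical link graph standing in for real HTTP fetches (assumption).
LINKS = {
    "/index.html": ["/a.html", "/b.html"],
    "/a.html": ["/c.html"],
    "/b.html": ["/index.html"],
    "/c.html": ["/d.html"],
    "/d.html": [],
}

def dig(start, links, max_hops=None):
    """Return {url: hops} for every document within max_hops links of start."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if max_hops is not None and hops[url] >= max_hops:
            continue  # do not follow links out of documents already at the limit
        for target in links.get(url, []):
            if target not in hops:  # skip documents that were already seen
                hops[target] = hops[url] + 1
                queue.append(target)
    return hops

reached = dig("/index.html", LINKS, max_hops=2)
print(sorted(reached))
```

With max_hops=2 the sketch reaches /c.html (two links from the start) but not /d.html (three links away). Passing max_hops=None removes the limit.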
Options
- -a
- Use alternate work files. Tells htdig to append .work
to database files, causing a second copy of the
database to be built. This allows the original
files to be used by htsearch during the indexing
run.
- -c configfile
- Use the specified configuration file instead
of the default.
- -h maxhops
- Restrict the dig to documents that are at most maxhops
links away from the starting document. This only
works if -i is also given.
- -i
- Initial. Do not use any old databases. This is
accomplished by first erasing the databases.
- -s
- Print statistics about the dig after completion.
- -t
- Create an ASCII version of the document database.
This database is easy to parse with other
programs so that information can be extracted
from it for purposes other than searching. One
could gather some interesting statistics from
this database.
- -u username:password
- Tells htdig to send the supplied username and
password with each HTTP request. The credentials
will be encoded using the 'Basic' authentication
scheme. A colon (:) is required between
the username and the password.
- -v
- Verbose mode. Each -v increases the verbosity of the
program. Using more than two is probably only
useful for debugging. The default verbose mode
(a single -v) gives a progress report while
digging.
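As an illustration of what the -u option implies, the sketch below shows how a username:password pair is turned into an HTTP 'Basic' Authorization header value: the pair is Base64-encoded and prefixed with the scheme name. The function name is hypothetical, and the use of UTF-8 for the credentials is an assumption; this is not taken from htdig's source.

```python
import base64

def basic_auth_header(credentials):
    """Build a 'Basic' Authorization header value from 'username:password'."""
    username, sep, password = credentials.partition(":")
    if not sep:
        # Mirrors the requirement that a colon separates the two parts.
        raise ValueError("expected username:password")
    raw = f"{username}:{password}".encode("utf-8")  # charset is an assumption
    token = base64.b64encode(raw).decode("ascii")
    return f"Basic {token}"

header = basic_auth_header("Aladdin:open sesame")
print("Authorization:", header)
```

Because Base64 is an encoding and not encryption, credentials sent this way are readable by anyone who can observe the request.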
Files
- CONFIG_DIR/htdig.conf
- The default configuration file.
See Also
- htmerge, htsearch, Configuration file format,
A Standard for Robot Exclusion.
Andrew Scherpbier
<andrew@contigo.com>
Last modified: Wed Jan 1 20:46:32 PST