User Commands                                    ezmlm-archive(1)

NAME
     ezmlm-archive - create thread and author index for a mailing list archive

SYNOPSIS
     ezmlm-archive [ -cCFsSTvV ][ -f msg1 ][ -t msg2 ] dir

DESCRIPTION
     ezmlm-archive reads the index files from a message archive and creates a subject index, a collection of subject files, and a collection of author files. These files are suitable as an index for WWW access to, and navigation through, a mailing list archive by ezmlm-cgi(1).

     The index files read are created by ezmlm-idx(1) on a per-list basis and by ezmlm-send(1) on a per-message basis for an indexed list.

     The output files created are:

     dir/archive/threads/yyyymm
          The thread index. It contains one line per subject, starting with the number of the first message with that subject within the set investigated, ``:'', a 20-character subject hash, blank, ``[n]'' where ``n'' is the number of messages in the thread, blank, and the subject. The file ``yyyymm'' contains entries for all threads that have messages in the month ``yyyymm'' or that have messages both before and after that month. The subject hash is a key to the subject files; the message number is a key to the index file.

          The lines are in ascending order by message number when the index is created de novo on an existing archive. When messages are added one by one, as in normal archive operation, ``n'' is the number of messages in the thread for that particular month, and the lines are ordered by each thread's latest message, i.e. the most recently extended thread is shown last. The message number accompanying a thread is always a message within the thread.
          It is the first message of the thread for indices created de novo on existing archives, and the last message for incrementally created indices. Use the corresponding subject file to get a list of all messages in the thread in ascending order.

     dir/archive/subjects/xx/yyyyyyyyyyyyyyyyyy
          A subject file. The first line is the subject hash, a space, and the subject. This is followed by one line per message with this subject, in the format message number, ``:'', date (yyyymm), ``:'', author hash, blank, author from line. The lines are sorted by message number. The author hash is a key to the author files; the message number is a key to the index file. The file in the example would be for the subject hash ``xxyyyyyyyyyyyyyyyyyy''.

     dir/archive/authors/xx/yyyyyyyyyyyyyyyyyy
          An author file. The first line is the author hash, a space, and the author from line. This is followed by one line per message with this author, in the format message number, ``:'', date (yyyymm), ``:'', subject hash, blank, subject. The lines are sorted by message number. The subject hash is a key to the subject files; the message number is a key to the index file. The file in the example would be for the author hash ``xxyyyyyyyyyyyyyyyyyy''.

     dir/archnum
          Keeps track of the last message processed. Normally, ezmlm-archive processes entries for messages from one above the contents of this file up to and including the message number in dir/num.

SunOS 5.11 Last change: 1

OPTIONS
     ezmlm-archive writes its output files in a crash-proof manner when run in normal mode.
     When the normal message range is overridden with any of the options listed below, the normal sync(3) of the output files is suppressed for efficiency. Should the computer crash during this time, the state of the indices is undefined. Use the -s option in the (extremely rare) cases where this would be a problem.

     -c   Create a new index. This overrides dir/archnum, causing ezmlm-archive to start with the first message in the archive. Synonym for -f 0. NOTE: ezmlm-archive does not remove files in the index. While it will overwrite/update old files, it will not remove files that are obsolete for other reasons.

     -C   (Default.) Process entries starting with the message after the message listed in dir/archnum.

     -f msg1
          Process messages starting from the archive section (set of 100 messages) containing message msg1. This is useful if you have removed part of the archive, as it will shorten processing time and decrease memory use. NOTE: ezmlm-archive does not remove files in the index. While it will overwrite/update old files, it will not remove files that are obsolete for other reasons. The number of messages per thread will be incorrect when use of the -f and -t switches leads to partial re-indexing of already indexed messages.

     -F   (Default.) Do not change the starting message from the default (see -C).

     -s   Always sync files.

     -S   (Default.) Sync files, except when one of the message range modifying options is used.

     -t msg2
          Process messages up to message msg2 instead of the last message in the archive. Again, files written are corrected, but other files are not explicitly removed.

     -T   (Default.) Process entries for messages up to the last message in the archive.
     -v   Display ezmlm-archive version info.

     -V   Display ezmlm-archive version info.

MEMORY USAGE
     ezmlm-archive stores its linked lists in memory. On a 32-bit architecture, it uses 12 bytes per message, 28 bytes per thread (plus one copy of the subject), and 20 bytes per author (plus one copy of the author from line). In normal list use, it processes at most a few messages at a time, but initial processing of a large archive may use considerable amounts of memory. Assuming 40 bytes per subject/from line, 5 messages per thread, 100,000 messages, and 1000 authors, this is about 2.5 MB. For 1,000,000 messages it is about 25 MB. Thus, for large archives, it may be useful to use the -t switch to process the archive in multiple subsets, starting with e.g. the first 100,000 messages, then the next 100,000, and so on.

SEE ALSO
     ezmlm-cgi(1), ezmlm-idx(1), ezmlm-send(1), ezmlm(5)
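The arithmetic behind the MEMORY USAGE estimate can be checked with a few lines of Python. This is a hypothetical helper, not part of ezmlm: it uses the per-item costs listed under MEMORY USAGE for a 32-bit build, rounds the thread count down, and assumes that "MB" means 2^20 bytes.

```python
MIB = 1024 * 1024  # assumption: "MB" in the estimate means 2**20 bytes

# Per-item costs on a 32-bit build, as listed under MEMORY USAGE.
BYTES_PER_MESSAGE = 12
BYTES_PER_THREAD = 28   # plus one copy of the subject
BYTES_PER_AUTHOR = 20   # plus one copy of the author from line


def estimate_bytes(messages: int, msgs_per_thread: int, authors: int,
                   text_len: int = 40) -> int:
    """Rough size of the in-memory lists (hypothetical helper).

    text_len is the assumed length of each subject or author from line.
    """
    threads = messages // msgs_per_thread  # rounded down
    return (messages * BYTES_PER_MESSAGE
            + threads * (BYTES_PER_THREAD + text_len)
            + authors * (BYTES_PER_AUTHOR + text_len))


if __name__ == "__main__":
    for n in (100_000, 1_000_000):
        b = estimate_bytes(n, msgs_per_thread=5, authors=1000)
        print(f"{n:>9,} messages: {b:,} bytes = {b / MIB:.1f} MB")
```

With 40-byte lines, 5 messages per thread, and 1000 authors, this gives 2,620,000 bytes (about 2.5 MB) for 100,000 messages and 25,660,000 bytes (about 24.5 MB) for 1,000,000 messages. The message and thread terms dominate, so memory grows roughly linearly with the number of messages processed in one run, which is why splitting the work with -t keeps it bounded.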
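The thread-index, subject-file, and author-file line formats described under DESCRIPTION are simple enough to read from other tools. The following Python sketch parses them. It is not part of ezmlm, and all function and type names are hypothetical. It assumes that hashes contain no spaces or colons, that every thread line has a non-empty subject, and that author hashes, like subject hashes, are 20 characters long (the page states the length only for subject hashes, although its author-file example is also 20 characters).

```python
from typing import NamedTuple, Tuple

HASH_LEN = 20  # assumption: both subject and author hashes are 20 characters


class ThreadEntry(NamedTuple):
    msgnum: int        # message number (key to the index file)
    subject_hash: str  # key to archive/subjects/xx/yyyyyyyyyyyyyyyyyy
    count: int         # the "[n]" field
    subject: str


class MessageEntry(NamedTuple):
    msgnum: int
    month: str     # yyyymm
    key_hash: str  # author hash in a subject file, subject hash in an author file
    text: str      # author from line, or subject


def _check_hash(h: str) -> str:
    if len(h) != HASH_LEN:
        raise ValueError(f"expected {HASH_LEN}-character hash, got {h!r}")
    return h


def parse_thread_line(line: str) -> ThreadEntry:
    """Parse 'msgnum:hash [n] subject' from archive/threads/yyyymm."""
    num, rest = line.rstrip("\n").split(":", 1)  # subject may contain ':'
    subject_hash, count_field, subject = rest.split(" ", 2)
    if not (count_field.startswith("[") and count_field.endswith("]")):
        raise ValueError(f"bad thread count field {count_field!r}")
    return ThreadEntry(int(num), _check_hash(subject_hash),
                       int(count_field[1:-1]), subject)


def parse_message_line(line: str) -> MessageEntry:
    """Parse 'msgnum:yyyymm:hash text' from a subject or author file."""
    num, month, rest = line.rstrip("\n").split(":", 2)
    key_hash, text = rest.split(" ", 1)
    return MessageEntry(int(num), month, _check_hash(key_hash), text)


def parse_header_line(line: str) -> Tuple[str, str]:
    """Parse the first line of a subject or author file: 'hash text'."""
    key_hash, text = line.rstrip("\n").split(" ", 1)
    return _check_hash(key_hash), text


def hash_path(key_hash: str) -> str:
    """Relative path 'xx/yyyy...' for a hash, as in the DESCRIPTION examples."""
    return f"{key_hash[:2]}/{key_hash[2:]}"
```

Only the first ``:'' is used as a separator in a thread line, and only the first two in a message line, so subjects and from lines that themselves contain colons are kept intact.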