Statistics program for news articles.
Still to be considered in beta stage.

*******************
Running the program
*******************

Type news-stat -h for program options.

File input is a list of spooler directories containing the articles and
the .overview file, for example:

/var/spool/news.other/news/comp/lang/c/
/var/spool/news.other/news/comp/unix/programmer/
...

The trailing / must be added.

Output is an unsorted list of names, each with its number of posts.
Because the program uses stdin/stdout, you can get a sorted list using
pipes:

news-stat -p Jun < input | sort -nr > output-file

**********
KNOWN BUGS
**********

QP (quoted-printable) encoding is not handled at all; encoded text
appears as is.

Posts from Deja News are not counted in the OS and browser statistics,
because they don't contain the appropriate headers (see also below).

Names containing special characters ('<', '>', '(', ')') may not appear
correctly.

An article can sometimes be counted two or more times, because of
duplicate entries in the .overview file. For this reason a small script
(find_dup) is used to track duplicate entries in the .overview files. Its
input file has the same format as the one for news-stat.

*********************
ABOUT NEWS EXTRACTION
*********************

By-person.
This is based on the "From: " header. Not many problems here, since the
news server is usually quite strict in what it accepts here.

By-network and By-domain.
These are based on the "NNTP-Posting-Host: " header.
If it is an IP address, the network is taken with mask 255.255.0.0 and
the domain with mask 255.255.255.0. This works for the local newsgroups I
attend, not for everything out there.
If it is a DNS name, the domain is the string from the first dot to the
end, and the network is the string from the second dot from the right to
the end. This fails if a regional domain suffix (such as com.ua) sits
next to the top-level domain.

By-Browser and By-OS.
These are based on the following headers:

User-Agent:   (slrn, and others)
X-Newsreader: (OE, MicroPlanet Gravity, and others)
X-Mailer:     (Netscape)

Some news clients do not send any of them, and their posts are not
counted at all.
OS type is determined from the slrn header, then the Netscape header,
and finally, for OE and other Windows clients, it is reported simply as
"Microsoft windows".

For debugging reasons, posts that don't have the appropriate headers, or
posts with headers that are not in the expected form, are displayed while
the files are processed. This may happen for one of the reasons explained
above, because someone is playing with headers, or simply because the
news client does not report the appropriate header at all.

For any suggestions/bugs please mail me at:
amanous@cs.ntua.gr
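
Illustration of the By-network and By-domain rules.
The sketch below is not taken from news-stat itself; it only restates, in
Python, the rules described under "By-network and By-domain". The
function name classify_host is hypothetical. Dropping the leading dot from
the result and falling back to the whole name when a host has too few
dots are assumptions, since the text above does not cover those cases.

```python
import ipaddress


def classify_host(host):
    """Return (network, domain) for an NNTP-Posting-Host value.

    Hypothetical helper that mirrors the rules in this README; the real
    program may differ in details.
    """
    host = host.strip()
    try:
        addr = ipaddress.IPv4Address(host)
    except ValueError:
        addr = None

    if addr is not None:
        # IP address: mask 255.255.0.0 keeps the first two octets,
        # mask 255.255.255.0 keeps the first three.
        octets = str(addr).split(".")
        return ".".join(octets[:2]), ".".join(octets[:3])

    # DNS name: domain is everything after the first dot
    # (assumption: the leading dot itself is dropped).
    domain = host.partition(".")[2] or host
    # Network is everything after the second dot from the right,
    # i.e. the last two labels (assumption: the whole name is used
    # when there are fewer than two dots).
    labels = host.split(".")
    network = ".".join(labels[-2:])
    return network, domain


if __name__ == "__main__":
    print(classify_host("147.102.1.1"))        # ('147.102', '147.102.1')
    print(classify_host("host.cs.ntua.gr"))    # ('ntua.gr', 'cs.ntua.gr')
    # Known weakness: a regional suffix is mistaken for the network.
    print(classify_host("pc.example.com.ua"))  # ('com.ua', 'example.com.ua')
```

The last call shows the failure mentioned above: with a regional suffix
such as com.ua, the "network" comes out as the suffix itself rather than
the organisation's name.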