
File: [cvs.NetBSD.org] / src / share / misc / nanpa.sed

Revision 1.1, Mon Mar 3 01:13:36 2003 UTC by jhawk
Branch: MAIN
CVS Tags: wrstuden-fixsa-newbase, wrstuden-fixsa-base-1, wrstuden-fixsa-base, wrstuden-fixsa, netbsd-4-base, netbsd-4-0-RELEASE, netbsd-4-0-RC5, netbsd-4-0-RC4, netbsd-4-0-RC3, netbsd-4-0-RC2, netbsd-4-0-RC1, netbsd-4-0-1-RELEASE, netbsd-4-0, netbsd-4, netbsd-3-base, netbsd-3-1-RELEASE, netbsd-3-1-RC4, netbsd-3-1-RC3, netbsd-3-1-RC2, netbsd-3-1-RC1, netbsd-3-1-1-RELEASE, netbsd-3-1, netbsd-3-0-RELEASE, netbsd-3-0-RC6, netbsd-3-0-RC5, netbsd-3-0-RC4, netbsd-3-0-RC3, netbsd-3-0-RC2, netbsd-3-0-RC1, netbsd-3-0-3-RELEASE, netbsd-3-0-2-RELEASE, netbsd-3-0-1-RELEASE, netbsd-3-0, netbsd-3, netbsd-2-base, netbsd-2-1-RELEASE, netbsd-2-1-RC6, netbsd-2-1-RC5, netbsd-2-1-RC4, netbsd-2-1-RC3, netbsd-2-1-RC2, netbsd-2-1-RC1, netbsd-2-1, netbsd-2-0-base, netbsd-2-0-RELEASE, netbsd-2-0-RC5, netbsd-2-0-RC4, netbsd-2-0-RC3, netbsd-2-0-RC2, netbsd-2-0-RC1, netbsd-2-0-3-RELEASE, netbsd-2-0-2-RELEASE, netbsd-2-0-1-RELEASE, netbsd-2-0, netbsd-2, abandoned-netbsd-4-base, abandoned-netbsd-4

Parse HTML tables from NANPA.COM (used by nanpa.awk to produce
na.phone)

# $NetBSD: nanpa.sed,v 1.1 2003/03/03 01:13:36 jhawk Exp $
#
# Parse HTML tables output by 
#   http://docs.nanpa.com/cgi-bin/npa_reports/nanpa
# Specifically, for each HTML table row (TR),
# print the <TD> elements separated by colons.
#
# This could break on HTML comments.
#
:top
#				Strip ^Ms (a literal carriage return in the
#				original; written here as \r, which GNU sed
#				accepts)
s/\r//g
#				Join all lines with unterminated HTML tags
/<[^>]*$/{
	N
	b top
}
#				Replace all </TR> with EOL tag
s;</[Tt][Rr]>;$;g
#				Join lines ending in <TR> with the next line.
/<[Tt][Rr][^>]*>$/{
	N
	s/\n//g
	b top
}
#				Also, join lines with a row still open (a T/R
#				tag with no EOL marker after it).
/<[TtRr][^>]*>[^$]*$/{
	N
	s/\n//g
	b top
}
#				Remove EOL markers
s/\$$//
#				Remove lines not containing <TR>
/<[Tt][Rr][^>]*>/!d
#				Replace all <TD> with colon
s/[ 	]*<TD[^>]*> */:/g
#				Strip all HTML tags
s/<[^>]*>//g
#				Handle HTML characters
s/&nbsp;/ /g
#				Compress spaces/tabs
s/[ 	][ 	]*/ /g
#				Strip leading colons
s/^://
#				Strip leading/trailing whitespace
s/^ //
s/ $//
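
The per-row substitutions above can be tried on a single line without the full script. The following is a minimal sketch, not part of nanpa.sed: the sample row and the variable names (tab, fields) are made up for illustration, the real nanpa.com markup may differ, and the multi-line joining loop at :top is left out because the sample row already sits on one line.

```shell
# Hypothetical input: one non-table line and one complete table row.
# A literal tab is kept in a variable so the bracket expressions
# below survive copy and paste.
tab=$(printf '\t')

fields=$(printf '%s\n' \
    '<P>Not a table row</P>' \
    '<TR><TD>201</TD> <TD>NJ</TD> <TD>In&nbsp;service</TD></TR>' |
  sed \
    -e 's;</[Tt][Rr]>;$;g' \
    -e 's/\$$//' \
    -e '/<[Tt][Rr][^>]*>/!d' \
    -e "s/[ ${tab}]*<TD[^>]*> */:/g" \
    -e 's/<[^>]*>//g' \
    -e 's/&nbsp;/ /g' \
    -e "s/[ ${tab}][ ${tab}]*/ /g" \
    -e 's/^://' \
    -e 's/^ //' \
    -e 's/ $//')

# One colon-separated record per surviving row.
printf '%s\n' "$fields"
```

The line without a <TR> tag is deleted, and the row comes out as 201:NJ:In service. The rules are applied in the same order as in the script, since the colon replacement has to run before the generic tag stripping removes the <TD> tags.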