The NetBSD Project

CVS log for pkgsrc/textproc/xapian-omega/distinfo

Default branch: MAIN

Revision 1.42, Mon Jul 10 15:08:30 2023 UTC (2 months, 3 weeks ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2023Q3-base, pkgsrc-2023Q3, HEAD
Changes since 1.41: +4 -4 lines

Update to 1.4.23. From the changelog:


* Improve documentation for OmegaScript numerical and logical operators.  Patch
  from Vaibhav Kansagara.

* Improve documentation for DATEVALUE, xFILTERS and $filters.


* omindex:

  + Handle XPS files with multiple FixedDocument parts better.  Previously we
    only extracted text from the first FixedDocument part.

  + Prefer later subparts of multipart/alternative, which is what RFC2046 (and
    the earlier RFCs it obsoletes) says to do.  Previously we used the first
    subpart that we could get text from.

  + Prefer later subparts of multipart/alternative when indexing Outlook
    .msg files too.

  + Fix obscure bug in --mimetype option.  We keep track of the length of the
    longest extension we have a mapping for, but this was being updated using
    the length of the MIME type rather than the length of the extension.
    Theoretically this could have led to us effectively ignoring a --mimetype
    option, but in the real world the MIME type will probably always be longer
    so this just results in us testing long extensions unnecessarily.


* Ignore DATEVALUE CGI parameter if START.n, etc is specified on the same
  slot.  We explicitly document not to do this, but if that advice is ignored
  it's more helpful to at least preserve the property that we only have
  one date range per value slot.

* Add flag_ngrams as a preferred new alias for flag_cjk_ngram.  In the next
  release series this feature has been expanded to cover many more languages
  so the "cjk" in the name has become inaccurate (it stands for
  "Chinese, Japanese and Korean").

* Fix handling of Outlook .msg containing Unicode.  Codepoints <= U+00FF appear
  to have been handled correctly, but anything higher resulted in individual
  bytes of the UTF-8 encoding being treated as separate characters.

  Fixes, reported by uhuntu.


* Fix compatibility code for old libmagic versions.  The code we were using
  seems like it would never have worked.  Nobody's reported this (it was
  spotted while looking at the code) so we could just require libmagic >= 4.22,
  but it's trivial to actually handle so we've fixed the fallback code.

* Remove lingering traces of IRIX support as it's been dead for many years.
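
The --mimetype fix above can be illustrated with a small sketch.  This is not
omindex's code (omindex is written in C++); the `MimeMap` class and its method
names are hypothetical.  The only point is that the cached maximum has to be
updated from the length of the extension, not the length of the MIME type.

```python
class MimeMap:
    """Hypothetical extension -> MIME type map with a cached maximum extension length."""

    def __init__(self):
        self.types = {}
        self.max_ext_len = 0

    def add(self, ext, mime_type):
        self.types[ext] = mime_type
        # Track the longest *extension*.  The bug described above corresponds
        # to using len(mime_type) here instead.
        self.max_ext_len = max(self.max_ext_len, len(ext))

    def lookup(self, filename):
        _, dot, ext = filename.rpartition(".")
        # No extension, or one too long to have a mapping: skip the lookup.
        if not dot or len(ext) > self.max_ext_len:
            return None
        return self.types.get(ext)


mime_map = MimeMap()
mime_map.add("md", "text/markdown")
print(mime_map.lookup("README.md"))
print(mime_map.lookup("archive.backup"))
```

With the buggy update the cached length would be 13 rather than 2, so lookups
would still succeed here but long extensions would be tested unnecessarily.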

Revision 1.41, Sat Feb 4 14:29:45 2023 UTC (7 months, 4 weeks ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2023Q2-base, pkgsrc-2023Q2, pkgsrc-2023Q1-base, pkgsrc-2023Q1
Changes since 1.40: +4 -4 lines

Update to 1.4.22. From the changelog:


* Improve term prefix documentation.


* omindex:

  + Add --date-terms and --no-date-terms options.

  + Extract page/sheet count for OpenDocument text documents and spreadsheets.

  + Extract created date and keywords for MS XML formats.

* scriptindex:

  + Fix handling of an unterminated final line in input file.


* Add OmegaScript commands to report value slot bounds.

* Add OmegaScript $sortableunserialise{} command.

Revision 1.40, Sun Sep 25 12:25:57 2022 UTC (12 months, 1 week ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2022Q4-base, pkgsrc-2022Q4, pkgsrc-2022Q3-base, pkgsrc-2022Q3
Changes since 1.39: +4 -4 lines

Update to 1.4.21. From the changelog:


* Consistently say "macOS" not "Mac OS X", "OS X", etc.


* omindex:

  + Add support for gzip-compressed SVG files (.svgz).

  + Handle <title> in SVG.  Previously only <dc:title> inside <metadata> was
    considered.  If both are present, <title> now takes precedence.


* omegatest: Add skip-for-32-bit-time_t mechanism and use it to conditionally
  enable some testcases which fail on platforms with 32-bit time_t.

build system:

* Update to use AX_CXX_COMPILE_STDCXX, a replacement for
  AX_CXX_COMPILE_STDCXX_11 (which we were using) that also supports newer C++
  standard versions, which will be useful.  For C++11 the only difference seems
  to be that the macro now checks for attribute support - we use C++11
  attributes so that seems a good thing.

Updating during the freeze for the bug and portability fixes.

Revision 1.39, Mon Jul 11 18:27:07 2022 UTC (14 months, 3 weeks ago) by schmonz
Branch: MAIN
Changes since 1.38: +4 -4 lines

Update to 1.4.20. From the changelog:


* omindex:

  + OpenDocument: Previously we only inserted an implicit space before each
    paragraph.  Now we insert them both before and after each paragraph and
    heading, and before each forced line-break and tab.

  + Add extension mapping for .awt (Abiword templates).

  + Index metadata from XPS files.

  + -G and -C short options were documented in --help but not previously
    actually handled. Reported by David Bremner.

  + Show --max-size required argument in --help output.

  + Remove lingering handling for database backends without slot bounds since
    all backends have been required to support these since 1.4.11.

* scriptindex:

  + Process an incomplete final line from a dump file.  Previously if the final
    line lacked a newline scriptindex would quietly ignore it (unless it was
    the only line).

  + The `unique` action now takes an optional `missing` parameter to specify
    what to do if a record doesn't trigger the unique action or triggers it
    with an empty value.  The default is now to issue a warning and create a
    new document (the same as before, except that there was only previously a
    warning for the empty value case). In Omega 1.5.0 the default will change
    to an error as that seems a better default, but is less compatible with
    potential existing use.

  + Explicitly allow multiple blank lines in input files.  Previously such
    extra blank lines were treated as empty records and in many cases these
    got quietly skipped, but e.g. with the new UNIQUE checks this could result
    in a warning or error.

  + If we hit an error while parsing the index script we used to exit right
    away, but now we finish parsing the index script since it's more helpful to
    report all the errors in an index script rather than the user having to
    fix them one by one.  This requires us to sensibly recover after each index
    script parse error - if you find a case where this recovery triggers
    further bogus errors please report it and we'll try to improve the

  + In four cases while handling input data (two cases of bad hex data fed
    to `hextobin`, an input data line without a `=`, and `load` failing to
    load the specified file) we'd emit a diagnostic that was labelled as an
    "error" but really it was handled as a warning as we kept reading input
    and the "error" didn't affect the exit status.  It doesn't really make
    sense to continue in any of these cases so we now exit with non-zero status
    right away.

  + A parameter in the index script which should be an integer but isn't, or
    should be positive but isn't now gives an error rather than a warning since
    an error seems more helpful.

  + All diagnostics issued while parsing the index script now include column

  + Avoid forcibly flushing the output stream after every message.


* Improve test coverage for scriptindex.


* Require PCRE2 instead of PCRE. The original PCRE is now EOL and unmaintained
  (last release was June 2021).  In omega it's potentially used to process
  input from the internet, so security is a real concern hence we're switching
  to PCRE2.
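
As a rough illustration of the scriptindex input handling described above, here
is a minimal sketch of a record reader that accepts an unterminated final line,
tolerates runs of blank lines between records, and stops on a data line without
a `=`.  It is a simplification, not scriptindex's parser: the function name
`read_records` and the dict-of-lists record shape are assumptions made for the
example, and other features of the real input format are ignored.

```python
def read_records(text):
    """Split "field=value" lines into records separated by blank lines."""
    records = []
    current = {}
    # splitlines() yields the final line even if it lacks a trailing newline.
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line:
            # One or more blank lines end the current record; extra blank
            # lines are skipped rather than treated as empty records.
            if current:
                records.append(current)
                current = {}
            continue
        name, sep, value = line.partition("=")
        if not sep:
            # Treated as a hard error, in the spirit of the change above.
            raise ValueError(f"line {lineno}: expected = in {line!r}")
        current.setdefault(name, []).append(value)
    if current:
        records.append(current)
    return records


dump = "id=1\ntitle=First\n\n\n\nid=2\ntitle=Second"
for record in read_records(dump):
    print(record)
```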

Revision 1.38, Sun Jan 2 09:32:06 2022 UTC (21 months ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2022Q2-base, pkgsrc-2022Q2, pkgsrc-2022Q1-base, pkgsrc-2022Q1
Changes since 1.37: +4 -4 lines

Update to 1.4.19. From the changelog:


* configure: Add missing AC_ARG_VAR for all programs so that they are
  documented in --help output, and so that autoconf knows they are "precious"
  and preserves them if configure is rerun even when they're specified via an
  environment variable.

* Add usage examples for $jsonobject.

* Fix path to omega in quickstart document.  Fixes #813, reported by Jim Lynch.

* Update for the IRC channel move from freenode to


* Fix handling of UTF-16 BOMs in XML and HTML - we had the sense of the
  endianness indicated by the BOM the wrong way round.

* Avoid making an extra temporary copy of HTML/XML data which has a UTF-16 BOM.

* We now ignore an end of line immediately after a PHP close tag to match what
  PHP does.

* omindex:

  + Fix handling of formatted xlsx dates in certain cases.

* scriptindex:

  + Add new scriptindex whitespace removal actions `ltrim`, `rtrim`, `squash`,
    and `trim`.

  + Improve `truncate` action - if a word ends exactly on the requested length
    we now leave it in place rather than removing it.

  + Report the location of previous `unique` action in the error given when
    `unique` is used more than once.


* Clamp START and END with packed timestamps.  The 4-byte unsigned packed
  time_t format can't represent dates before 1970 or after Sun 07 Feb 2106
  06:28:15 UTC so clamp dates before or after these - previously they would
  wrap around.

* The JSON produced by $jsonobject no longer contains newlines, which makes it
  usable as a single line serialisation format without post-processing.

* Add $base64 OmegaScript command.

* omega: Add flag_no_positions to wrap new


* Fix topterms template to not trigger early matching.  We were checking $msize
  before including the `query` template, but doing so would trigger the query
  to be run, which means that settings early in the `query` template which
  should affect the result (such as $setmap{prefix,...}) were being ignored
  when the `topterms` template was used.  Partly addresses #815, reported by

* Add field support to opensearch and xml templates.  These templates now also
  search title, topic and filename by default and support `title:`, `author:`
  and `topic:` in the query string (both as the `query` template already
  does). Fixes remaining issue in #815, reported by Gennadiy.


* Expand omegatest.  All scriptindex actions now have test coverage.

build system:

* Replace uses of obsolete autoconf macros, fixing warnings if configure is
  regenerated with a recent release of autoconf.


* Don't automatically use _FORTIFY_SOURCE on mingw-w64.  Recent mingw-w64
  versions require -lssp to be linked when _FORTIFY_SOURCE is enabled, so just
  skip the automatic enabling.  Users who want to enable it can specify it

  Fixes #808, reported by xpbxf4.

* Automatically enable GCC warnings -Wduplicated-cond and -Wduplicated-branches
  if using a GCC version new enough to support them.  The usefulness of
  -Wduplicated-cond was highlighted by dcb in #816.

* Fix GCC -Wshadow warning.

* Use clock_gettime() and nanosleep() under modern mingw as these allow higher
  precision than what we previously used.
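
The clamping of START and END described above can be sketched as follows.
This is an illustration only: the function name `pack_clamped` is hypothetical
and the big-endian byte order is an assumption made so that the packed values
sort in date order.  The upper bound is simply the largest 4-byte unsigned
value, which corresponds to the 2106 date quoted in the changelog.

```python
import struct
from datetime import datetime, timezone

# Largest value a 4-byte unsigned timestamp can hold.
MAX_PACKED = 2**32 - 1


def pack_clamped(timestamp):
    """Pack a Unix timestamp into 4 bytes, clamping instead of wrapping."""
    clamped = min(max(int(timestamp), 0), MAX_PACKED)
    # Big-endian is an assumption for this sketch.
    return struct.pack(">I", clamped)


print(pack_clamped(-86400).hex())    # a date before 1970 clamps to the minimum
print(pack_clamped(2**33).hex())     # a date after 2106 clamps to the maximum
print(datetime.fromtimestamp(MAX_PACKED, tz=timezone.utc))
```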

Revision 1.37, Tue Oct 26 11:23:39 2021 UTC (23 months, 1 week ago) by nia
Branch: MAIN
CVS Tags: pkgsrc-2021Q4-base, pkgsrc-2021Q4
Changes since 1.36: +2 -2 lines

textproc: Replace RMD160 checksums with BLAKE2s checksums

All checksums have been double-checked against existing RMD160 and
SHA512 hashes

Unfetchable distfiles (fetched conditionally?):

Revision 1.36, Thu Oct 7 15:02:46 2021 UTC (23 months, 3 weeks ago) by nia
Branch: MAIN
Changes since 1.35: +1 -2 lines

textproc: Remove SHA1 hashes for distfiles

Revision 1.35, Thu Jan 14 18:21:01 2021 UTC (2 years, 8 months ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2021Q3-base, pkgsrc-2021Q3, pkgsrc-2021Q2-base, pkgsrc-2021Q2, pkgsrc-2021Q1-base, pkgsrc-2021Q1
Changes since 1.34: +5 -5 lines

Update to 1.4.18. From the changelog:


* omindex:

  + Add default MIME mapping for application/rtf.  IANA have registrations for
    text/rtf and (more recently) application/rtf (it seems because newer
    versions of the RTF format can contain 8-bit data) so we now recognise
    application/rtf by default and handle it the same way as text/rtf.

    Current libmagic seems to always return text/rtf (no matches for
    application/rtf in magic.mgc) and we continue to map extension rtf to
    text/rtf, so this change is mainly future-proofing against libmagic future

  + Add support for indexing OpenXPS, which is effectively the same as XPS
    internally in ways we care about, but it uses a different mimetype and a
    different filename extension.


* Explicitly use OR for MORELIKE queries.

  Since 1.3.0 the default value of DEFAULTOP has been AND, which typically
  makes MORELIKE queries much less useful since they'll only match documents
  containing all the terms from the query expansion.  We now explicitly insert
  " OR " between the terms if DEFAULTOP hasn't been set to OR, which makes them
  work much more like they did in 1.2.x.

* Make $stoplist and $unstem consider all query strings by always passing the
  new Xapian::QueryParser::FLAG_ACCUMULATE flag.

* Add $foreach command which works like $map, but just concatenates the
  evaluated results rather than adding tabs to turn them into an OmegaScript

* Extend $include{} to allow handling failure to open the specified file via an
  optional second argument which if specified will be evaluated and returned
  instead.  Patch from Gaurav Arora.

* Support multiple MORELIKE parameters - we now form an RSet from all the
  specified documents and use that to generate the query to run (previously
  only one of multiple MORELIKE parameters was used).
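
The MORELIKE change above (inserting " OR " between expansion terms unless
DEFAULTOP is already OR) amounts to something like the sketch below.  The
function name and the way DEFAULTOP is passed in are assumptions for the
example; omega itself builds the query string in C++.

```python
def morelike_query_string(terms, default_op="and"):
    """Join expansion terms so they are ORed regardless of DEFAULTOP."""
    # If the default operator is already OR, plain spaces are enough.
    separator = " " if default_op.lower() == "or" else " OR "
    return separator.join(terms)


print(morelike_query_string(["xapian", "omega", "index"]))
print(morelike_query_string(["xapian", "omega", "index"], default_op="or"))
```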

Revision 1.34, Fri Aug 21 20:46:05 2020 UTC (3 years, 1 month ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2020Q4-base, pkgsrc-2020Q4, pkgsrc-2020Q3-base, pkgsrc-2020Q3
Changes since 1.33: +5 -5 lines

Update to 1.4.17. From the changelog:


* Document comment format supported by scriptindex index scripts.  We've
  supported comments on a line by themselves and introduced with a # since
  scriptindex was first added back in 2002, but it seems to have never actually
  been documented before now.


* Check for SERVER_PROTOCOL=INCLUDED before anything which might throw an
  exception so that if it is set we suppress the Content-Type: when reporting
  such exceptions.  Spotted by Gaurav Arora.

* Report get_description() for Xapian::Error exceptions instead of get_msg().
  This means we now report the exception's type, context (useful for network
  errors), and errno information.

* Avoid leaking MyStopper object.  The object essentially has the lifespan of
  omega itself, but becomes unreachable when the QueryParser object is
  destroyed.  To make it easier to use leak-checking tools, hand ownership of
  this object to the QueryParser object.


* omegatest: Tell leak sanitizer not to report leaks for allocations which
  aren't explicitly released on exit - the OS will reclaim all memory from the
  process at this point and explicitly releasing everything just takes time for
  no real benefit.  We will still see leaks of objects which become unreachable
  during a run.

Revision 1.33, Wed Jun 10 17:56:10 2020 UTC (3 years, 3 months ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2020Q2-base, pkgsrc-2020Q2
Changes since 1.32: +5 -5 lines

Update to 1.4.16. From the changelog:


* Fix handling of XML empty tag syntax when there's a quoted parameter right
  before the closing `/>`.  This caused `<title xml:lang="en-US"/>` to treat
  the body text as the document title.  Spotted by Gaurav Arora.

* omindex: Fix killing of filter child process if the parent process receives a
  signal.  Spotted by Gaurav Arora.


* Reject $setrelevant without an argument list.  This has never been documented
  as allowed, and previously crashed with a segfault.  Fixes #802, reported by
  Gaurav Arora.

* If there's an error opening the databases we now close any we managed to open
  successfully before the error so that things like $dbsize can't end up
  reporting values for a subset of the specified databases.


* Use our own autoconf cache variable namespace (xo_cv_ prefix instead of
  ac_cv_) to avoid colliding with standard autoconf macro use if or
  a shared config.cache is used.  The former case caused a build failure for
  the OpenBSD port with 1.4.15, reported by Lucas R.

Revision 1.32, Tue Feb 25 17:55:46 2020 UTC (3 years, 7 months ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2020Q1-base, pkgsrc-2020Q1
Changes since 1.31: +5 -5 lines

Update to 1.4.15. From the changelog:


* Update documentation about how to add a new format to omindex.  Patch from
  Bruno Baruffaldi.


* Check for a BOM on HTML files, which for HTML5 should determine the encoding.


* Allow $if{COND} without any actions which is useful as a way to evaluate
  something but ignore the result if you just want the side effects.  Indeed
  we were already recommending to use it if you want to ignore the return value
  of $log.  Fixes bug introduced in 1.4.14, reported by tuftedocelot.

* Add OmegaScript support for $jsonbool{COND} for encoding a boolean value for
  use in JSON.  This is equivalent to $if{COND,true,false} but more readable.

* Add OmegaScript support for $jsonobject{} which allows producing a JSON
  object from an OmegaScript map.

* Allow specifying a format to $jsonarray{} so it is no longer restricted to
  producing an array of strings.

* Add $keys{MAP} OmegaScript command which gives a sorted list of the keys from
  an OmegaScript map.


* Simplify probes for snprintf.  The broken snprintf in libbsd in Linux libc4
  is from ~25 years ago so way too ancient to matter now, and all callers
  already handle the pre-ISO semantics of returning -1 for an undersize buffer
  so we don't need to run a test program to probe for this at configure time,
  which is more cross-compile friendly.

* Avoid deprecation warning on recent Linux.  We were including sys/sysctl.h if
  it existed, which it does on Linux but we don't actually use it there.
  Including it now warns that it is deprecated, so skip including it under
  Linux.  Reported on IRC by kumaran.

Revision 1.31, Tue Dec 17 03:54:17 2019 UTC (3 years, 9 months ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2019Q4-base, pkgsrc-2019Q4
Changes since 1.30: +5 -5 lines

Update to 1.4.14. From the changelog:


* Improve omindex --help docs for --duplicates.
* Document that $log will start to return an error message in 1.5.0, and that
  one can wrap it using a $if with no action now to be future-proof.


* Add built-in support for iso-8859-15 so we can handle it without iconv.
  This charset is a variant of iso-8859-1 with 8 characters changed, most
  notably including the euro currency symbol.  It's the most commonly seen
  charset we didn't have built-in support for.
* Optimise converting us-ascii to UTF-8 to do nothing, like we already do when
  converting UTF-8 to UTF-8.
* scriptindex:
  + Add new 'gap' action which provides a way to leave a gap in the term
    positions between fields to prevent phrases and positional operators from
    matching across fields.


* Fix error handling in $lookup.  We now check for errors from cdb_init()
  and cdb_get().  We've never checked for errors from cdb_init(), while
  for cdb_get() this bug was introduced by a warning fix in 1.2.20.


* Future-proof use of $log against changes in 1.5.0.

Revision 1.30, Fri Aug 2 21:29:11 2019 UTC (4 years, 2 months ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2019Q3-base, pkgsrc-2019Q3
Changes since 1.29: +6 -6 lines

Update to 1.4.12. From the changelog:


* Improve docs for OmegaScript $hitlist{}.

* Fix RST formatting errors in omega docs.

* Clarify use of Q prefix for unique ID terms - it was described as "reserved",
  but the use of "Q" is really just a convention (and in fact omindex uses "U"
  not "Q").

* Clarify scriptindex's weight action takes parameter >= 0.

* Correct typo in OmegaScript $add parameter documentation.


* omindex:

  + Fix typo in mimetypes used for Apple iWork documents ("apply" instead of
    "apple") which meant that these documents weren't actually being indexed.
    Patch from Bruno Baruffaldi.

  + Pipe input to ps2pdf as this accepts input on stdin.  Possibility pointed
    out by Gaurav Arora.

* scriptindex:

  + If parsedate action's format includes %z adjust for the timezone if
    possible (this requires the non-POSIX tm_gmtoff member of struct tm)
    and flag an error for other platforms.

  + If parsedate action's format includes %Z flag an error as that doesn't
    seem to be usefully supported by strptime() anywhere.

  + Fix parsedate action to treat formats without a timezone as being UTC
    instead of localtime.

  + Add date=unixutc.  The existing date=unix works in localtime which is
    unhelpful if you want to use it on the output of parsedate since that's in
    UTC; date=unixutc is just like date=unix except it always works in UTC.

  + The date action now emits a warning for invalid values.  The documentation
    used to say "invalid values are ignored at present", but it's more helpful
    to flag bad data than quietly ignore it.

  + We now check the date action's parameter at script parse time and unknown
    values result in an error and nothing being indexed.  Previously an unknown
    format uselessly resulted in the terms D, M and Y literally being added to
    every document.

  + The split action now supports a new "prefixes" split style.  This gives all
    the prefixes from the split, so split=/,prefixes on a file path gives all
    parent directories.


* Remove documented limitation of $subdb and $subid - the implementation
  assumed that each omega database name corresponded to a single Xapian
  database, and if a database name referred to a stub database file expanding
  to multiple Xapian databases then they would misbehave.  Such cases are now
  handled properly as well.

* Extend $addfilter to support adding negated filters via a new optional second
  argument which specifies the type of filter to add.

* Stop $sort from needlessly ensuring the match has run.

* Handle corner case of nested $hitlist gracefully instead of potentially
  entering an infinite loop.


* omegatest: Avoid setting TZ globally during tests as that hides bugs where
  behaviour depends on the local timezone when it shouldn't.

* omegatest: Support testing when built using LeakSanitizer by suppressing
  leak reports for cached compiled pcre regular expressions.  These aren't
  released when the program exits but aren't memory leaks.

build system:

* Remove outdated deprecation warning suppression which was there to support
  building from git in the run up to 1.3.2 - a development version which was
  released nearly 5 years ago now.


* Fix problems with fallback strptime() implementation which was being included
  in the wrong binary, and was lacking a required const_cast on the return

* Rework setenv() compatibility handling.  Now that Solaris 9 is dead we can
  assume setenv() is provided by Unix-like platforms (POSIX requires it).  For
  other platforms, provide a compatibility implementation of setenv() so the
  compatibility code is encapsulated in one place rather than replicated at
  every use.
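
The "prefixes" split style mentioned above (split=/,prefixes on a file path
giving all parent directories) can be pictured with the sketch below.  It is a
guess at the behaviour from the changelog wording only: the function name is
hypothetical, and whether scriptindex also emits an empty leading prefix or the
full value is not stated here, so this version omits both.

```python
def split_prefixes(text, delimiter):
    """Return every prefix of text that ends just before a delimiter."""
    prefixes = []
    pos = text.find(delimiter)
    while pos != -1:
        # Skip the empty prefix produced by a leading delimiter.
        if pos > 0:
            prefixes.append(text[:pos])
        pos = text.find(delimiter, pos + len(delimiter))
    return prefixes


for prefix in split_prefixes("/home/user/docs/report.txt", "/"):
    print(prefix)
```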

Revision 1.29, Sun Mar 10 13:21:05 2019 UTC (4 years, 6 months ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2019Q2-base, pkgsrc-2019Q2, pkgsrc-2019Q1-base, pkgsrc-2019Q1
Changes since 1.28: +2 -1 lines

Avoid conflicting with system bswap32(). Use SUBST_VARS to mollify pkglint.

Revision 1.28, Mon Mar 4 01:38:10 2019 UTC (4 years, 7 months ago) by schmonz
Branch: MAIN
Changes since 1.27: +5 -5 lines

Update to 1.4.11. From the changelog:


* omindex:

  + outlookmsg2html: Handle Subject, Date, and From headers.


* In $div and $mod we were converting a non-zero denominator from string to int
  twice for no good reason.


* omegatest: Fix testcase which was failing if the local timezone was behind
  UTC.  This testcase was added in 1.4.10.

* omegatest: Tweak to not fail when $time not supported - it seems that the
  OS time functions we use report an error on GNU Hurd for unknown reasons.

build system:

* Sync up probes for OS time functions in omega's configure with those in
  xapian-core which may solve $time not being supported on GNU Hurd.


* Add missing includes of <cerrno>.  Fixes #776, reported by Matthieu Gautier.

* Stop using htonl()/ntohl() in a non-network context which should improve
  portability to platforms without a POSIX-like socket API.

Revision 1.27, Tue Feb 12 19:23:37 2019 UTC (4 years, 7 months ago) by schmonz
Branch: MAIN
Changes since 1.26: +5 -5 lines

Omega 1.4.10 (2019-02-12):


* Use https for URLs where supported.


* omindex:

  + Index .apxl and .kth files as Apple Keynote.  The .apxl extension is used
    for the XML files inside .key bundles/directories which hold the text
    content of the presentation, and by handling them we can index .key
    directories more usefully.  It seems they are also sometimes found by
    themselves.  Keynote themes have a .kth extension, and key2text can also
    handle these.

  + Pipe input to pdftotext, pdfinfo and dpkg.  These tools all support piping
    an input file on stdin, which can be a little more efficient when we
    already have the file open (e.g. to determine its type using libmagic, or
    to calculate its checksum).

  + An empty string for the start directory is now flagged as an error.
    Previously `/` was used instead, which is unlikely to be what is wanted
    (and `/` can be explicitly specified if that really is what is wanted).

  + Fix emulation of stderr redirection when the indexer's stderr has been
    closed.  We try to avoid using the shell when running external filters, and
    emulate 2>/dev/null in commands, but if the indexer's stderr was closed
    this emulation was buggy and would give the filter a closed stderr
    instead of one redirected to /dev/null.

  + When emulating redirection to /dev/null, we now open /dev/null once and
    dup that fd each time which is a little more efficient and simplifies the

* scriptindex:

  + date=unix is now a no-op for empty input - previously it would unhelpfully
    add boolean date terms for 1970-01-01.

  + Warn for empty filename in LOAD action.  Previously this gave a slightly
    confusing error: "Couldn't load file '': No such file or directory"

  + Unknown command-line options now cause scriptindex to give a non-zero exit


* omegatest: Add testcase for SPAN.n on different slots.

* omegatest: Update expected QueryParser output for the xapian-core change to
  produce flatter Query trees.

build system:

* Use AM_ICONV to detect iconv() which should handle non-system install of GNU
  libiconv properly.  Fixes #775, reported by Ryan Schmidt.


* Provide fall-back strptime() implementation for platforms which don't provide
  it, using the C++11 std::get_time() function.  We use strptime() directly
  where it's available as some older C++11 compilers seem to lack
  std::get_time() (GCC 4.8 for example).  This is used by the parsedate action,
  which was added in 1.4.6.

Revision 1.26, Mon Nov 5 05:42:59 2018 UTC (4 years, 10 months ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2018Q4-base, pkgsrc-2018Q4
Changes since 1.25: +5 -5 lines

Update to 1.4.9. From the changelog:


* omindex:

  + Try harder to avoid opening a file being indexed more than once by
    reusing the file descriptor in more cases.

  + Hint to the OS not to cache output from external filters which require
    using a temporary file.

* scriptindex:

  + If the LOAD action successfully opens a file but hits a read error the
    error message now reports the file name correctly.  Previously it would
    report the partial file contents read so far instead of the file name.


* We no longer call posix_fadvise() with POSIX_FADV_NOREUSE under Linux,
  since it's still not implemented there.  We also now only call
  posix_fadvise() with POSIX_FADV_DONTNEED right before we close the file
  descriptor under Linux.

Revision 1.25, Sun Oct 28 03:44:06 2018 UTC (4 years, 11 months ago) by schmonz
Branch: MAIN
Changes since 1.24: +5 -5 lines

Update to 1.4.8. From the changelog:


* omindex:

  + Improve date handling in .eml files.  We now handle a "Date:" header
    without the day of the week, which is allowed by RFC822 and RFC2822
    (though seems rare in practice).  If the date can't be parsed, we now
    just omit the date information rather than failing to process the file.

  + Add support for indexing Apple iWork documents (Keynote (.key), Numbers
    (.numbers) and Pages (.pages)) using libetonyek.  Currently only the file
    variants are handled since omindex doesn't currently support indexing a
    directory as a document.

  + Index Visio files using vsd2xhtml.

  + Extend --filter to support filters which produce SVG as output.

  + Handle SVG embedded in XML with svg: namespace prefix.

  + Add --read-filters option to read a list of filters from a file, each line
    of which is a rule as passed to --filter.  Based on a patch from Gaurav

  + Add new --mime-type-match option which allows specifying a MIME
    Content-Type for a given shell filename pattern (with the special
    Content-Type values "ignore" and "skip" supported, as for --mime-type).

  + Adjust --mime-type to allow ':' in the extension.  A valid MIME
    Content-Type can't contain a colon, so if the argument to --mime-type
    contains more than one colon it makes more sense to split at the *last*
    colon (we used to split at the first), as an extension could conceivably
    contain a colon.  Mostly this change is for consistency with the new
    --mime-type-match option, where the leafname pattern could reasonably
    contain a colon.

  + Remove failed entries for ignored files.  If a file is mapped to
    pseudo-mimetype "ignore" then remove any existing failure record for it so
    that we don't potentially end up with a lot of cruft failure records for
    files we are no longer trying to index.

  + If a file fails to index due to failing to allocate enough memory we now
    try to flag it as failed to index so it will be skipped by default on
    future runs.  This should help to avoid indexing getting stuck on
    problematic files.

  + Add a "pages" field with the number of pages in the document where we
    know how to determine this (currently only for PDF files for which pdfinfo
    reports this information).

  + Handle initially empty database exactly the same was as when --overwrite
    is specified.  This probably has no user-visible consequences, but it's
    cleaner for the handling to be exactly the same.
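
The --mime-type change described above (splitting the argument at the last
colon rather than the first) can be illustrated with a minimal Python sketch.
This is not omindex's code; the function name split_mime_type_arg and the
"EXT:TYPE" argument shape are assumptions made for illustration only.

```python
def split_mime_type_arg(arg):
    """Split an 'EXT:TYPE' argument at the LAST colon (illustrative only).

    A MIME Content-Type cannot contain a colon, so everything after the
    final colon is taken as the type and everything before it as the
    extension, which may itself contain colons.
    """
    ext, sep, mime_type = arg.rpartition(":")
    if not sep:
        raise ValueError("expected EXT:TYPE, got %r" % (arg,))
    return ext, mime_type


print(split_mime_type_arg("txt:text/plain"))
print(split_mime_type_arg("a:b:text/plain"))
```

Splitting at the first colon instead would treat "b:text/plain" as the type in
the second call, which can never be a valid Content-Type.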

* scriptindex:

  + Improve scriptindex diagnostic messages.  All diagnostics are now labelled
    as "error", "warning" or "note" as appropriate, and we now consistently
    report "FILE:LINE:" (and also "COLUMN:" in most cases) to make it clearer
    where the problem lies.

  + Add new "split" action which splits the text on a specified delimiter and
    executes the following actions for each piece.  Based on a patch by Gaurav

  + Missing whitespace after the closing " on an action argument is now
    flagged as an error.  Previously scriptindex would attempt to parse
    the following characters as the next action.

  + Support C-like escapes for quoted parameter values.  Notably this means it
    is now possible to include `"` in quoted parameter values.


  + Value-based date range filters can now be specified via CGI parameters
    START.N, END.N and/or SPAN.N where N is a value slot number, allowing
    multiple concurrent filters on different slots to be specified.

  + Support YYYY and YYYYMM limits in term-based date ranges.  Previously
    value-based date ranges supported these as limits, but term-based date
    ranges gave an error.

  + Add stem_strategy option and deprecate existing stem_all option in favour
    of this new more versatile option.

  + Support "natural" $sort option via new flag "#" which sorts embedded
    natural numbers in numerical order.

  + Support numeric $sort option via new flag "n", similar to GNU sort -n.

  + Rewrite field parsing to be more efficient, and store fields in an
    unordered_map for faster lookup.
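
As a rough illustration of what "natural" ordering means for the new $sort
flag "#" mentioned above, here is a minimal Python sketch.  It only shows the
general idea of comparing embedded runs of digits as numbers; the helper name
natural_key is hypothetical and Omega's actual implementation may differ in
its details.

```python
import re


def natural_key(text):
    """Build a sort key in which embedded digit runs compare numerically.

    re.split with a capturing group alternates non-digit and digit pieces,
    so pieces at the same index are always of the same type when two keys
    are compared.
    """
    pieces = re.split(r"([0-9]+)", text)
    return [int(p) if i % 2 else p for i, p in enumerate(pieces)]


names = ["file10", "file2", "file1"]
print(sorted(names))                    # plain string sort
print(sorted(names, key=natural_key))   # natural sort
```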

Revision 1.24 / (download) - annotate - [select for diffs], Sun Aug 26 13:26:12 2018 UTC (5 years, 1 month ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2018Q3-base, pkgsrc-2018Q3
Changes since 1.23: +5 -5 lines
Diff to previous 1.23 (colored)

Update to 1.4.7. From the changelog:


* New OmegaScript $unique command.  The existing $uniq only removes adjacent
  entries (like the Unix uniq command) so to fully remove duplicates you need a
  sorted input.  Sometimes it is desirable to remove duplicates from an
  unsorted list without changing the order of the entries which are left, so
  add $unique to do that.  If the list is sorted already, then $uniq is more

* Fix $map to cleanly reject a single argument.
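
The difference between $uniq and the new $unique described above can be
sketched in Python.  This is an illustration of the two behaviours only, not
OmegaScript or Omega's implementation; both function names are hypothetical.

```python
def uniq_adjacent(items):
    """Drop only entries equal to their predecessor (like Unix uniq)."""
    out = []
    for item in items:
        if not out or out[-1] != item:
            out.append(item)
    return out


def unique_keep_order(items):
    """Drop every repeated entry, keeping first occurrences in order."""
    seen = set()
    out = []
    for item in items:
        if item not in seen:
            seen.add(item)
            out.append(item)
    return out


data = ["b", "a", "b", "b", "a"]
print(uniq_adjacent(data))
print(unique_keep_order(data))
```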


* templates/query: Merge multiple entries in the term frequency information,
  which came from searching several prefixes by default.  Reported by Alistair
  Buxton on #xapian-discuss.

* When multiple words with the same stem are in the query string we now fully
  eliminate duplicates when showing term frequency information.

Revision 1.23 / (download) - annotate - [select for diffs], Fri Jul 6 16:23:55 2018 UTC (5 years, 2 months ago) by schmonz
Branch: MAIN
Changes since 1.22: +5 -5 lines
Diff to previous 1.22 (colored)

Update to 1.4.6. From the changelog:


* Fix generate_sample() (used by OmegaScript $truncate and omindex) to return
  an empty sample instead of throwing an exception when the requested sample
  size is less than the size of the truncation indicator string.  Patch from
  Addy.  Fixes reported by Gaurav Arora.


* Check for the HTML5 doctype or legacy doctype declaration and use default
  charset UTF-8 if either is present.  Previously we always used ISO-8859-1,
  which is correct for older HTML versions, but not for HTML5.

* omindex:

  + When running commands without going through the shell, emulate shell exit
    codes 127 (for command not found) and 126 (for other cases where we fail to
    run the command).  This means the "missing filter" handling should now work
    properly for such commands.  Noted by Gaurav Arora.

  + Index POD files despite minor formatting errors.  We now pass
    --errors=stderr to pod2text so that minor formatting errors don't prevent
    us from indexing a file.  (It may seem that --errors=none is a better
    option, but for podlators < 4.11 that results in an ERRATA section in the
    generated text version which we then end up indexing; 4.11 fixed that but
    we can't assume that's in use).  Reported by Gaurav Arora.
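
The exit-code emulation described in the first item above can be sketched as
follows.  This is a minimal Python illustration of the convention (127 for
"command not found", 126 for other failures to run the command), not the
omindex code itself; the function name run_without_shell is an assumption.

```python
import subprocess
import sys


def run_without_shell(argv):
    """Run argv directly (no shell) and return a shell-style exit code."""
    try:
        result = subprocess.run(
            argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except FileNotFoundError:
        # The shell would report "command not found" with status 127.
        return 127
    except OSError:
        # Any other failure to execute (e.g. permission denied) maps to 126.
        return 126
    return result.returncode


print(run_without_shell(["no-such-command-for-this-sketch"]))
print(run_without_shell([sys.executable, "-c", "raise SystemExit(3)"]))
```

A caller can then treat 127 as "filter not installed" regardless of whether
the command was run via a shell or directly.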

* omindex:

  + Check file size before calling libmagic to get the mime type, since
    reading the file size is a much cheaper check and we can skip the
    libmagic test if the file is empty or larger than the specified
    maximum size.  Patch from caiyulun.

* scriptindex:

  + Avoid some unnecessary copying of Action objects by making use of C++11

  + Consistently send errors to stderr - some were sent to stdout.
    Patch from Gaurav Arora.

  + Add new "hextobin" action.  Based on a patch from Gaurav Arora.

  + Warn about non-integer arg to hash.

  + Fix hash action without an argument, which was failing with an assertion.
    Based on a patch by Gaurav Arora:

  + Reject 'hash' with argument < 6.  The hashing truncates and then adds a
    6 character hash of the removed part, so can't produce a result shorter
    than 6 characters.  Patch from Gaurav Arora.

  + Look for alphanumerics when parsing index actions.  None of the current
    index actions contain digits, but we give more helpful error messages this

  + Deprecate allowing spaces around = in scripts.  This was never documented
    as supported, and leads to a missing argument quietly swallowing the next
    action rather than using an empty value or giving an error.  Reported by
    Gaurav Arora in

  + In boolean and unique actions, add a colon between prefix and term when
    the term starts with a colon.  This means the mapping is reversible, and
    matches what omega actually does in this case when it tries to reverse the
    mapping.  Thanks to Andy Chilton for pointing out this corner case.

  + Add parsedate and valuepacked actions.  Together these assist adding date
    values for sorting and date range filtering.  Based on a patch from Gaurav

  + Use DB_RETRY_LOCK to wait if the database is already in use rather than
    sleeping for a second and retrying.  On most platforms this means we make a
    blocking request for the lock, and even on platforms where that's not
    supported, we now sleep and retry inside libxapian, and without having to
    throw and catch an exception each time.
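
The reason 'hash' rejects an argument below 6 (see the item above) can be seen
in a small Python sketch of "truncate, then append a 6 character hash of the
removed part".  The hash function used here (the first 6 hex digits of SHA-1)
is a stand-in chosen for illustration and is not the hash scriptindex uses;
the sketch also counts characters rather than bytes.  The name hash_long_term
is likewise an assumption.

```python
import hashlib

HASH_LEN = 6  # length of the appended hash, per the changelog entry


def hash_long_term(term, max_length):
    """Shorten term to max_length, replacing the tail with a 6 char hash."""
    if max_length < HASH_LEN:
        # The result always ends in a 6 character hash, so it cannot be
        # shorter than that.
        raise ValueError("max_length must be at least %d" % HASH_LEN)
    if len(term) <= max_length:
        return term
    keep = max_length - HASH_LEN
    removed = term[keep:]
    # Stand-in hash: NOT the function scriptindex actually uses.
    digest = hashlib.sha1(removed.encode("utf-8")).hexdigest()[:HASH_LEN]
    return term[:keep] + digest


print(hash_long_term("short", 10))
print(len(hash_long_term("x" * 50, 20)))
```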

* scriptindex:

  + Reject index scripts with multiple "unique" actions.  We don't handle this
    case sensibly, and it doesn't seem like it really has a use, so better to
    give an error for people who do this inadvertently.


* $freq: Speed up some cases by avoiding throwing and catching an exception
  when we know the MSet has no term frequency information.

* $sort: New OmegaScript command which does a string sort on an OmegaScript
  list, with u (unique) and r (reverse) options.

* $cond: New OmegaScript multi-way conditional.  Inspired by LISP's
  COND, this provides a neater way to write a cascade of $if checks.

* $switch: New OmegaScript multi-way conditional which provides an even neater
  way to write a cascade of $if{$eq{X,VALUE1},$if{$eq{X,VALUE2},...}}.

* $subdb and $subid: New commands which report the subdatabase name and the
  docid in that subdatabase.

* $termprefix and $unprefix: New OmegaScript commands which expose the existing
  code inside omega for splitting up a term.

* Use str() to convert time_t to string, which is simpler code and faster than
  using snprintf().

* New $seterror command to set the error message.  Implemented by Gaurav Arora.

* Make $highlight more efficient.  Patch from Vivek Pal.


* query: Use $prettyurl for the URL shown at the end of each match (previously
  we only used it on the URL shown as a fallback when the document has no
  title).  Split off from changes by Vivek Pal in

Revision 1.22 / (download) - annotate - [select for diffs], Sun Jul 9 22:31:23 2017 UTC (6 years, 2 months ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2018Q2-base, pkgsrc-2018Q2, pkgsrc-2018Q1-base, pkgsrc-2018Q1, pkgsrc-2017Q4-base, pkgsrc-2017Q4, pkgsrc-2017Q3-base, pkgsrc-2017Q3
Changes since 1.21: +3 -3 lines
Diff to previous 1.21 (colored)

Normalize patch filenames. No functional change.

Revision 1.21 / (download) - annotate - [select for diffs], Sun Jul 9 22:27:47 2017 UTC (6 years, 2 months ago) by schmonz
Branch: MAIN
Changes since 1.20: +6 -6 lines
Diff to previous 1.20 (colored)

Update to 1.4.4. From the changelog:


* omindex:

  + 1.4.3 added a new --sample option, but contrary to the documentation
    the default behaviour was to take the sample from the meta description
    (which was the hard-wired behaviour in 1.4.2 and earlier).  The default
    has now been changed to take the sample from the body.

  + Index .shtm, .xhtml and .xhtm as HTML by default - .shtm is another
    extension used for server-parsed HTML (in addition to the more common
    .shtml), and .xhtm and .xhtml are XHTML.

  + Fix fallback lookup for extension containing upper case.  User mappings
    worked, but built-in extension to MIME type mappings were effectively being
    ignored (because the result of the function call was not being checked).
    Bug introduced in 1.3.4.

  + Fix term-based date ranges, broken by changes in 1.4.2.  Found and
    diagnosed by Gaurav Arora.

  + Handle date range with start after end better - with term-based ranges,
    this used to generate a bogus filter, but now just generates Dlatest.

  + Use Y-term when range starts/ends at year start/end.  Previously we used 12
    M-terms for these cases.

  + Use full leap-year check when constructing term-based date ranges -
    previous code was good until 2100, but even then it would only result
    in an extra term being included for a non-existent February 29th in
    rare cases.

  + Add support for indexing vCard files if Perl and its Text::vCard module
    are available.

  + Recognise application/x-rpm as alternative type since libmagic reports this
    rather than application/x-redhat-package-manager.

  + Use official MIME type application/vnd.debian.binary-package for debian
    packages.  We used to map .deb and .udeb to application/x-debian-package,
    but in 2014 (after we added that support for .deb) an official type was
    registered with IANA.  We now map extensions .deb and .udeb to the official
    type, but the unofficial type is still recognised (older versions of
    libmagic probably report it, and users may be mapping to it).

  + Handle PHP as MIME type text/x-php.  The main difference this makes is that
    PHP files which don't have extension '.php' (e.g. .phtml, .phps, .php5,
    .ph4, etc) get identified by libmagic as text/x-php and will now be indexed.
    It also means that the user can now more easily configure different filters
    for HTML and PHP.

  + Don't use meta description as sample by default.  Now we have dynamic
    snippets (via $snippet), the body text is a better default.  Also generated
    HTML sometimes has unhelpful content in the meta description.  To get the
    previous behaviour, use the new omindex command line option:


* New OmegaScript command $cgiparams which returns a list of the parameter

* Handle tab in a CGI parameter name in the same way as space.  Mostly this is
  a way to avoid having tabs in CGI parameter names - they aren't useful, but
  if they could have tabs in we can't put CGI parameter names in a list.


* query: Fix highlighting of matching terms.  We were using both $snippet and
  $highlight, which results in double highlighting and HTML escaping, most
  noticeable by literal <strong> and </strong> appearing around matching terms
  in the rendered HTML snippet.  Reported by Mark Thomas on xapian-discuss.

build system:

* If gen-mimemap failed after creating mimemap.h, the rule wouldn't get rerun.
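
For reference, the "full leap-year check" mentioned in the omindex notes above
is the standard Gregorian rule.  A minimal Python sketch follows; the function
names are illustrative and not taken from Omega.  A check of only
"divisible by 4" agrees with it for every year from 1901 to 2099, which
matches the remark that the previous code was good until 2100.

```python
def is_leap_year(year):
    """Gregorian rule: every 4th year, except centuries not divisible by 400."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


def days_in_february(year):
    """Number of days in February, as needed when expanding a date range."""
    return 29 if is_leap_year(year) else 28


for y in (2000, 2023, 2024, 2100):
    print(y, days_in_february(y))
```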

Revision 1.20 / (download) - annotate - [select for diffs], Sun Jan 1 10:41:03 2017 UTC (6 years, 9 months ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2017Q2-base, pkgsrc-2017Q2, pkgsrc-2017Q1-base, pkgsrc-2017Q1
Changes since 1.19: +5 -5 lines
Diff to previous 1.19 (colored)

Update to 1.4.2. From the changelog:


* Replace auto-generated list of the supported MIME types with an
  auto-generated table showing the extensions that are mapped to each MIME type
  by default.  Partly addresses #569, reported by catkin.


* omindex: Add support for indexing markdown files (extension .md or .markdown,
  mime-type text/markdown, using "markdown" to convert to HTML).


* Add support for "make installcheck" to run tests against installed version.

build system:

* configure: Fail with clear error with xapian-core < 1.4.0.


* Fix GCC -Wimplicit-fallthrough warning.

* Add missing <ctime> for time_t.

* Avoid snprintf for formatting fixed-width integers - it results in warnings
  about possible output truncation with GCC7 (which aren't actually possible
  due to limited input range) and it's a bit heavyweight for this job anyway.

Revision 1.19 / (download) - annotate - [select for diffs], Mon Nov 7 13:02:45 2016 UTC (6 years, 10 months ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2016Q4-base, pkgsrc-2016Q4
Changes since 1.18: +7 -7 lines
Diff to previous 1.18 (colored)

Update to 1.4.1. From the changelog:


  + Also index leafname with _ and & replaced by spaces.  Literal spaces are
    often avoided in filenames, and "hello_world.txt" ought to be searchable
    via "hello" and "world".  Partly addresses #618, reported by Julien

  + Make named entity look-up (e.g. &eacute; -> 233) use the same keyword-lookup
    table approach we already use for HTML tags and built-in MIME content-types,
    rather than a std::map, which makes it faster while using less memory.

  + Avoid using the shell to run most external commands as it's unnecessary
    overhead.  For the built-in filters, the only cases which now use a shell
    are where we run two unzip commands.  For user-specified commands, a simple
    and slightly conservative test is used, which should avoid a shell in most
    common cases where it isn't needed.  Notably, environment variables set
    before the command are handled.

  + Track files which couldn't be indexed in the user metadata and skip them by
    default on subsequent runs to avoid the costs of repeatedly running a
    filter on a file it can't handle.  Run omindex with --retry-failed to retry
    such files.

  + Overhaul the "per-site" terms:
    - 'H' prefix is hostname as before, except that if the term would be > 240
      bytes (unlikely but possible) the end is hashed in the same way 'U'
      prefix terms are.
    - 'P' terms are now added for every directory level, not just the start
      URL's path.
    - A new 'J' prefix term is added with the start URL (less any trailing
      '/'), which means all files indexed from a particular "site" are now
      indexed by one term.  See #376.

  + Add 'skip' pseudo-mimetype which extensions can be mapped to, and they will
    then be reported and skipped (to complement the existing 'ignore'
    pseudo-mimetype which causes files with the specified extension to be
    quietly ignored).

  + Treat a command of 'true' specially as meaning make the text extraction a
    no-op (as actually running /bin/true effectively would).  This provides a
    way to index some file types by only meta-data.  Fixes #519, reported by
    Brian Burton.

  + Add support for wildcard mimetypes */* and *.  Combined with filter command
    ``true`` for indexing by meta-data only, you can specify a fall back case
    of indexing by meta-data only using ``--filter '*:true'``.  From a
    suggestion by Brian Burton on xapian-discuss.

  + Index message/rfc822 and message/news.  These are individually saved email
    messages and news articles.

  + Index archived web page formats MAFF and MHTML.

  + Handle .xla, yet another XL extension.

  + Handle metadata in LibreOffice HTML export (dcterms.subject,
    dcterms.description, dcterms.creator and dcterms.contributor).

  + Use zlib's gzopen() instead of invoking "gzip -dc" for compressed Abiword

  + Add support for %f in command passed to --filter to allow specifying
    commands where the input file is not the final argument.  Fixed #570,
    reported by Charles Atkinson.

  + Allow --filter to handle commands which produce output in a temporary file
    rather than on stdout.

  + Allow --filter to specify the character set of the output the filter

  + Handle application/, text/x-perl and application/x-dvi via
    default --filter settings instead of hardcoded cases (now possible thanks
    to the new abilities that --filter has).

  + Add support for specifying a MIME subtype of '*' in --filter arguments.

  + Add --track-ctime option to allow omindex to pick up changes to file
    ownership and permissions.

  + Index terms from the leafname with an 'F' prefix, rather than treating them
    as more body text.  (Fixes #633, reported by Emmanuel Garette)

  + The starting URL wasn't previously URL encoded.  In 1.2.18, a minimally
    intrusive fix was implemented.  In 1.3.2, we now encode the starting URL
    as we do for the rest of the filename.

  + Don't assume .doc is application/msword but let libmagic decide, since .doc
    files may actually be RTF, and sometimes people use .doc for plain-text

  + Add support for indexing 'topic' and 'created date' meta-data for
    OpenDocument format and HTML.

  + Index "topic" for PDF documents.

  + Commit changes and exit, rather than skipping the current file on most
    unexpected errors reading directories or initialising libmagic - otherwise
    we can end up deleting a lot of database entries on errors like EHOSTDOWN
    when indexing network mounts.

  + Add --opendir-sleep=SECS option to allow working around problems with
    indexing files on Microsoft DFS shares.

  + If we get ENOTDIR trying to index a file, skip it quietly (unless in
    verbose mode) as we already do if we get ENOENT, since ENOTDIR is what we
    get if the file and the directory it was in got removed between us getting
    the filename and trying to open it.

  + Handle ENOENT, ENOTDIR and EACCES from readdir().

  + If we've already opened the file (as we often will have if using a modern
    libmagic with magic_descriptor() available), then use fstat() on that fd
    rather than stat()/lstat() on the pathname.

  + Pass error message string and errno value in ReadError exceptions.

  + Report strerror(errno) if we can't read a file.

  + Filtering via text/html now handles HTML documents which specify a charset.

  + Add support for indexing Microsoft Publisher files using pub2xhtml.

  + Restrict the length of what we consider to be an extension, currently to 7
    characters or whatever the longest extension in the mime_map is if it is

  + Avoid '//' in temporary filenames (cosmetic only).

  + Extend --filter to handle commands which produce HTML on stdout.

  + Don't report an error if a file is deleted (or renamed) between us reading
    the directory entry for it and trying to read the file itself by default.
    In --verbose mode, the situation is still reported, but now with a
    specific message.

  + If omindex receives any of the signals SIGHUP, SIGINT, SIGQUIT or SIGTERM,
    then kill any active external filter child process, then handle the signal
    as we did before.  If setpgid() is available, put each external filter in
    its own process group and kill the whole process group when we get a

  + Use magic_descriptor() if the version of libmagic we're building against
    is new enough to have it.  This eliminates an extra opening of a file
    being indexed in certain cases.

  + Use rst2html to handle .rst and .rest files.

  + Index title with an 'S' prefix rather than no prefix.

  + If the document with the highest existing docid before the run was updated,
    we were reporting it as "added", but now we correctly report it as

  + Catch and report std::exception explicitly, so failing to allocate memory
    is no longer reported as "Unknown exception".
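
The first omindex item in this list (also indexing the leafname with _ and &
replaced by spaces) amounts to something like the Python sketch below.  It
only illustrates the text transformation; how omindex feeds the result to its
term generator is not shown, and the function name leafname_texts is
hypothetical.

```python
def leafname_texts(leafname):
    """Return the leafname plus a variant with '_' and '&' as spaces."""
    texts = [leafname]
    spaced = leafname.replace("_", " ").replace("&", " ")
    if spaced != leafname:
        # Only add the variant when it differs, to avoid indexing twice.
        texts.append(spaced)
    return texts


print(leafname_texts("hello_world.txt"))
print(leafname_texts("report.pdf"))
```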

omindex-list: New tool to list URLs of all the documents in a database
(or list of databases) indexed by omindex.

* The HTML parser now explicitly handles <APPLET>, <OBJECT> and <TR>.

* Use a generated compact and efficient table to convert HTML tag names
  to enum codes - this is both faster and smaller than the approach we were
  using, with the benefit that the table is auto-generated.

* Always use our built-in conversion code for the character sets it can handle
  (previously we'd use iconv if available; now we only use iconv for other
  character sets).  This gives us more consistent results, and in particular
  means we now handle BOMs better (at least when using GNU iconv).

* A lot of data labelled as "iso-8859-1" is actually "windows-1252".  The two
  only differ in characters which are control characters in iso-8859-1, so
  assume the latter when we see the former.
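
The effect of treating "iso-8859-1" as "windows-1252" can be seen with
Python's built-in codecs.  This is a sketch of the idea rather than Omega's
conversion code, and the helper name decode_labelled is an assumption.  Bytes
0x93 and 0x94 are control characters in iso-8859-1 but curly quotes in
windows-1252.

```python
def decode_labelled(data, charset):
    """Decode bytes, assuming windows-1252 when labelled iso-8859-1."""
    if charset.lower() == "iso-8859-1":
        charset = "windows-1252"
    return data.decode(charset, errors="replace")


raw = b"\x93quoted\x94"
print(repr(raw.decode("iso-8859-1")))           # C1 control characters
print(repr(decode_labelled(raw, "iso-8859-1")))  # curly quotes
```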


  + Remove special error handling case noting that index=nopos was replaced
    with indexnopos - this was removed in 1.1.0 so there's been enough time to


* Add support for sorting by more than one value - e.g. SORT=+1,-2

* Add $msizelower and $msizeupper which provide access to the lower and upper
  bounds on the number of matches.

* Add support for $set{weighting,coord}.

* Add weightingpurefilter option.  Normally a query consisting only of filter
  terms won't have relevance weights calculated.  This new option allows you to
  specify a weighting scheme to use for such queries, with the same values
  supported as for the existing weighting option.  For example,
  $set{weightingpurefilter,coord} will weight such queries by how many filter
  terms match each document.

* $filters now includes DATEVALUE, which means we'll force the first page when
  reloading or changing page starting from existing URLs upon upgrade to 1.4.1,
  but the exact same existing URL could be for a search without the date filter
  where we want to force the first page, so there's an inherent ambiguity
  there.  Forcing first page in this case seems the least problematic

* Implement $match command for omegascript.  Patch from Richhiey Thomas.

* Add optional prefix argument to $terms.

* $snippet now uses MSet::snippet() instead of the Snipper class.

* Add $contains{STRING1,STRING2}.  Contributed by Ayush Gupta.

* Add support for negated boolean filter terms, specified by CGI parameter "N".

* Support a direction prefix on SORT: '+' for ascending, '-' for descending.
  SORTREVERSE set to non-0 now flips the direction.  Fixes #697, reported by
  Andy Chilton.

* Add options argument to $transform.

* Cache compiled regexps used in $transform.

* Add $ord OmegaScript command which returns the Unicode codepoint for the
  first character of a UTF-8 string.

* Add $chr OmegaScript command which returns the UTF-8 string for given Unicode

* Add $csv OmegaScript command which escapes a string for use as a field in a
  CSV file ("always quote" mode inspired by patch from Gaurav Arora.)

* New $filters encoding which avoids collisions.  We also compare CGI parameter
  xFILTERS to what $filters would have returned in previous releases, so that
  on upgrades old format serialised filters are handled correctly.

* Fix $jsonarray not to prepend ']' to the first array element.

* Skip weighting scheme setup for a pure date range query - it won't be
  weighted anyway, so we can avoid having to parse weighting scheme parameters,

* Use value ranges when date range filtering by value.  Should be more
  efficient than a MatchDecider, and will automatically take advantage of any
  future value range optimisations in xapian-core.

* Add default_db and default_template config options.  These allow the default
  template and default database name to be set via the config file, rather than
  being stuck with the respective defaults of "default" and "query".  Fixes
  #310, reported by Marco Hennigs.

* Add support for non-exclusive filters.  Fixes #234, reported by Thomas

* Fix handling of multiple P.<prefix> fields - previously only the first seen
  was used.  These fields are also now taken into account when deciding if the
  query has changed.  $query now returns an OmegaScript list with one entry for
  each CGI parameter passed.

* Allow setting query expansion scheme to "bo1".

* Make $json and $jsonarray force the text to be valid UTF-8, since
  otherwise the output isn't valid JSON.

* Check parameters to $set{weighting,bm25 ...} and $set{weighting,trad ...}
  converted OK.  Based on patch from Aarsh Shah.

* Add support to $set{weighting,...} for bb2, dlh, dph, ifb2, ineb2, inl2, lm,
  pl2 when we're built against a xapian-core which is new enough to have these

* Add $snippet to generate a snippet of text tailored to the search.

* Add new $json and $jsonarray OmegaScript commands to support producing JSON

* Add $truncate command which truncates a string after a word.

* Add support for $set{weighting,tfidf} to allow the new TfIdfWeight weighting
  scheme to be used.

* DEFAULTOP now defaults to AND rather than OR, since that matches what pretty
  much every search engine does these days.  Closes ticket#512.

* Allow mapping a query string prefix to more than one term prefix (which
  xapian-core has supported since 1.0.4).

* Add support for search inputs for multiple probabilistic prefixes, with
  support for per-prefix stemmers.

* Drop legacy support for handling '.' separated terms in xP - that changed in
  Omega 0.9.7, more than 5 years ago now.

* Remove support for OLDP CGI parameter which was superseded by xP
  approximately a decade ago, and isn't even documented!

* Drop special handling for R-prefixed terms in $prettyterm - we stopped
  generating these in Xapian 1.0.


* Lower case all HTML tags, attributes and values; explicitly close <option>
  tags.  Patches from Vivek Pal and Nirmal Singhania.

* Migrate Omega Templates to HTML5.  Patch from Nirmal Singhania.

* templates/query: Remove stray double quote from generated URL for spelling
  suggestion when THRESHOLD is set.  Patch from Nirmal Singhania.

* templates/opensearch: Change response feeds to support OpenSearch 1.1.
  Patch from Nirmal Singhania.

* templates/query: Fix setting of prefix map for P - in 1.3.2, this would
  fail to also search in the subject.  Now it also searches in the subject
  and topic.

* templates/query:

  + We now map unprefixed queries to include S-prefixed terms to match the
    change in omindex to prefixing terms from the title with S.  You may want
    to make the same update to your own templates.

  + Set up prefixes for 'author:' and 'title:'.

Revision 1.18 / (download) - annotate - [select for diffs], Sat Apr 30 14:14:17 2016 UTC (7 years, 5 months ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2016Q3-base, pkgsrc-2016Q3, pkgsrc-2016Q2-base, pkgsrc-2016Q2
Changes since 1.17: +5 -5 lines
Diff to previous 1.17 (colored)

Update to 1.2.23. From the changelog:


* Update links to Xapian website and trac to use https, which is now supported,
  thanks to James Aylett.


* Fix HTML/XML entity decoding to be O(n) not O(n²) - processing HTML/XML with
  a lot of entities is now much faster.


* Remove unused country code to name maps.  These were intended as examples,
  but they aren't very useful as such, and really just bloat the templates

Revision 1.17 / (download) - annotate - [select for diffs], Wed Jan 13 21:03:49 2016 UTC (7 years, 8 months ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2016Q1-base, pkgsrc-2016Q1
Changes since 1.16: +5 -5 lines
Diff to previous 1.16 (colored)

Update to 1.2.22. From the changelog:


* Stop maintaining ChangeLog files.  They make merging patches harder, and stop
  'git cherry-pick' from working as it should.  The git repo history should be
  sufficient for complying with GPLv2 2(a).

* Clarify help text for omindex --mime-type option.

* docs/omegascript.rst:

  + Fix documentation of $last to say it's the MSet index *one beyond* the end
    of the current page.  Reported by Andrew Chilton.

  + Clarify that $split and $substr work in bytes.  Previously we said
    "characters" which could be taken as meaning they work with UTF-8

  + Update documentation for $filters - it was missing these CGI parameters
    from the list of those serialised: COLLAPSE, DOCIDORDER, SORT, SORTREVERSE,

  + Explicitly note user can use $setmap to create their own maps.

* docs/overview.rst:

  + SVG extraction is built-in too.

  + Expand paragraph about command `false`.  Note the versions where explicit
    support was added, and that this will also work with any version on Unix,
    where `false` is a command.

  + Document `cdb_dir`.

* docs/cgiparams.rst: Document behaviour if xDB is not set.

* Change "characters" to "bytes" in a few places to clarify that we don't mean
  Unicode code points.
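
The distinction between bytes and Unicode code points that these
documentation changes clarify can be shown with a short Python sketch.  It
does not call Omega; substr_bytes is a hypothetical helper that mimics a
byte-based substring on UTF-8 text.

```python
def substr_bytes(text, start, length):
    """Take a substring by BYTE offsets of the UTF-8 encoding."""
    data = text.encode("utf-8")
    return data[start:start + length]


word = "na\u00efve"  # 5 code points; the i-diaeresis takes 2 bytes in UTF-8
print(len(word))
print(len(word.encode("utf-8")))
print(substr_bytes(word, 0, 3))  # cuts through the 2-byte character
```

A byte-based cut can therefore end in the middle of a multi-byte character,
which is why the documentation now says "bytes" rather than "characters".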


* omindex:

  + Add '--title-size' option.

  + Handle .oft the same way as .msg - it's some sort of template email, and
    has essentially the same format.


* Make $querydescription ensure the match has been run, so that it includes

* Avoid $allterms, $cgilist, $filterterms and $terms being O(n²) in the number
  of items in the returned list.

* If xFILTERS is not set, don't force the first page as that's unhelpful if
  someone fails to set it in their template.

* When environment variable SERVER_PROTOCOL is set to INCLUDED (as it is when
  we're being included in a page), we already suppress the HTTP headers, but
  now we suppress the blank line after the header too.

* Support option flag_cjk_ngram if built against xapian-core >= 1.2.22.


* Add test coverage for parsing of HTML entities.

build system:

* Fix error reporting if PCRE isn't installed. Fixes #693, reported by lhz7370.


* Avoid warning when building with glibc >= 2.21.

* Don't provide our own implementation of sleep() under __WIN32__ if there
  already is one - mingw provides one, and in some situations it seems to clash
  with ours.  Reported to xapian-discuss by John Alveris.

* Stop trying to use O_STREAMING - the patch to implement it was never merged
  into the Linux kernel, and I can't find any evidence that other platforms
  implement it.  The constant value O_STREAMING used now seems to be used for
  the part of O_SYNC which isn't covered by O_DSYNC, which seems likely to hurt
  performance if anything.

Revision 1.16 / (download) - annotate - [select for diffs], Wed Nov 4 02:00:15 2015 UTC (7 years, 11 months ago) by agc
Branch: MAIN
CVS Tags: pkgsrc-2015Q4-base, pkgsrc-2015Q4
Changes since 1.15: +2 -1 lines
Diff to previous 1.15 (colored)

Add SHA512 digests for distfiles for textproc category

Problems found locating distfiles:
	Package cabocha: missing distfile cabocha-0.68.tar.bz2
	Package convertlit: missing distfile
	Package php-enchant: missing distfile php-enchant/enchant-1.1.0.tgz

Otherwise, existing SHA1 digests verified and found to be the same on
the machine holding the existing distfiles (morden).  All existing
SHA1 digests retained for now as an audit trail.

Revision 1.15 / (download) - annotate - [select for diffs], Sat May 23 18:21:16 2015 UTC (8 years, 4 months ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2015Q3-base, pkgsrc-2015Q3, pkgsrc-2015Q2-base, pkgsrc-2015Q2
Changes since 1.14: +5 -5 lines
Diff to previous 1.14 (colored)

Update to 1.2.21. From the changelog:


* docs/overview.rst: Document 'E' prefixed boolean terms for filtering by
  extension (see #668, reported by bramvdh).

* docs/encodings.rst: Add a document about character encoding, as suggested by
  James Aylett in #550.

* docs/cgiparams.rst: Improve wording of docs for SORT parameter.

* docs/omegascript.rst: Update documentation references to DATE1, DATE2, and
  DAYSMINUS which were renamed in 0.6.x and the compatibility aliases removed
  in 1.0.0.


* omindex:

  + outlookmsg2html: Fix handling of message/rfc822 subparts.

  + Ignore extensions .msi and .msp, which are Microsoft installer files, but
    which libmagic sometimes incorrectly identifies as application/msword.

  + Interpret a command of "false" in "--filter" as meaning to ignore files
    with that MIME type.


* $prettyurl now decodes valid UTF-8 sequences, and some additional ASCII
  characters in the path part: []@!$&'()*+.;= (Fixes #550 and #644, reported by
  catkin and terencz.)

* $prettyurl now leaves the query and fragment parts of the URL alone and won't
  decode an escaped "/" (omindex doesn't create URLs with any of these, so we
  only risk breaking other URLs which have them).

* Drop compilation date and time from output when run from the command line -
  they prevent reproducible builds and the version number is sufficient.

* Handle CGI parameter [=0 as [=1.
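
The $prettyurl rules in the two entries above can be sketched in Python as follows. This is a rough model of the described behaviour, not Omega's code: the names `pretty_url` and `_decode_run` are hypothetical, decoding unreserved characters (letters, digits, "-", "_", "~") is an assumption of the sketch, and escapes that stay encoded are re-emitted in upper case.

```python
import re

# ASCII characters the entry lists as newly decoded in the path part.
EXTRA_ASCII = set("[]@!$&'()*+.;=")
# Assumption of this sketch: unreserved characters are decoded as well.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_~"
)
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def _decode_run(match):
    """Decode one run of %XX escapes, keeping unsafe bytes escaped."""
    raw = bytes.fromhex(match.group(0).replace("%", ""))
    out = []
    i = 0
    while i < len(raw):
        byte = raw[i]
        if byte < 0x80:
            ch = chr(byte)
            safe = ch in EXTRA_ASCII or ch in UNRESERVED
            # "/" is in neither set, so an escaped "/" stays as %2F.
            out.append(ch if safe else "%%%02X" % byte)
            i += 1
            continue
        # Try the shortest valid multi-byte UTF-8 sequence starting here.
        for length in (2, 3, 4):
            try:
                out.append(raw[i:i + length].decode("utf-8"))
            except UnicodeDecodeError:
                continue
            i += length
            break
        else:
            # Not valid UTF-8: leave this byte escaped.
            out.append("%%%02X" % byte)
            i += 1
    return "".join(out)


def pretty_url(url):
    """Decode escapes before the first '?' or '#'; leave the rest alone."""
    cut = len(url)
    for sep in "?#":
        pos = url.find(sep)
        if pos != -1:
            cut = min(cut, pos)
    return _ESCAPE_RUN.sub(_decode_run, url[:cut]) + url[cut:]


print(pretty_url("http://example.org/caf%C3%A9/a%2Fb%5B1%5D?q=%C3%A9#%41"))
```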


* templates/query: When listing matching terms, don't make the commas italic.

* templates/query: Eliminate blank line before <html>.

* templates/xml: Add XML declaration.

* templates/godmode: Specify charset utf-8 in the content-type.

* templates/xml: Update handling of DATE1, DATE2 and DAYSMINUS which were
  renamed in 0.6.x and the compatibility aliases removed in 1.0.0.

build system:

* Link test programs with libtool's '-no-install' or '-no-fast-install', like
  we already do in xapian-core, which means that libtool doesn't need to
  generate shell script wrappers for them on most platforms.

* configure: Use pkg-config in preference to determine flags needed to
  compile and link with PCRE, as this will just work when cross-compiling
  (at least under MXE).

* configure: Define MINGW_HAS_SECURE_API under mingw to get _putenv_s()
  declared in stdlib.h.

* Enable automake option 'subdir-objects' to avoid warning from newer automake.


* Add spaces between literal strings and macros which expand to literal strings
  for C++11 compatibility.

* Remove 'register' as it's deprecated and clang spits out warnings because of
  that.  Any modern compiler likely just ignores it as an optimisation hint.

* Avoid doing link tests with libmagic in configure as they fail on mingw due
  to not automatically picking up libraries which libmagic itself depends on.

Revision 1.14 / (download) - annotate - [select for diffs], Mon Nov 17 09:06:01 2014 UTC (8 years, 10 months ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2015Q1-base, pkgsrc-2015Q1, pkgsrc-2014Q4-base, pkgsrc-2014Q4
Changes since 1.13: +4 -4 lines
Diff to previous 1.13 (colored)

Update to 1.2.19. From the changelog:


* docs/overview.rst: Note that pdftotext is part of poppler as well as xpdf.
  (Noted by Paul Wise)

Revision 1.13 / (download) - annotate - [select for diffs], Sun Jul 6 15:21:32 2014 UTC (9 years, 2 months ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2014Q3-base, pkgsrc-2014Q3
Changes since 1.12: +5 -5 lines
Diff to previous 1.12 (colored)

Update to 1.2.18. From the changelog:


* omindex:

  + Work around libmagic returning a MIME content-type of "Composite Document
    File V2 Document[...]" or "application/CDFV2-corrupt" by returning a more
    suitable filetype based on looking at the file's extension.

  + The starting URL wasn't previously URL encoded.  In 1.3.2, this will be
    fixed by URL encoding it as we do for the rest of the path; for the 1.2
    branch we only URL encode it if it contains a character <= 31 or at least
    one of '#', '%', ':' or '?'.  This avoids a one-off reindex of every
    document in the database in cases which work OK in practice.

  + When we skip a file because it exceeds the configured size limit, include
    that size limit in the message.
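
The conditional encoding of the starting URL described above for the 1.2 branch can be modelled roughly as below. The function names are hypothetical, and using `urllib.parse.quote` with "/" kept safe is an assumption standing in for whatever encoding omindex applies to the rest of the path.

```python
from urllib.parse import quote

# Characters that trigger encoding according to the entry above.
TRIGGER_CHARS = set("#%:?")


def needs_encoding(start_url):
    """True if the URL has a control character (<= 31) or a trigger char."""
    return any(ord(ch) <= 31 or ch in TRIGGER_CHARS for ch in start_url)


def encode_start_url(start_url):
    """Encode only when required, so existing databases keep their URLs."""
    if not needs_encoding(start_url):
        return start_url
    # Assumption: "/" stays literal; everything else unsafe is %-encoded.
    return quote(start_url, safe="/")


for url in ("/docs/my files", "/docs/a#b", "/docs/100% done"):
    print(repr(url), "->", repr(encode_start_url(url)))
```

Leaving URLs such as "/docs/my files" untouched is what avoids the one-off reindex the entry mentions.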


* Add support for setting the query expansion scheme to use.


* Don't compile in - it isn't currently used, and it fails to build
  with mingw.  (fixes #635, reported by Alexis Denis)

* Fix warning when built with GCC 4.7.2 using -Os.

* Removed unused inline function, fixing compiler warning.

Revision 1.12 / (download) - annotate - [select for diffs], Thu Feb 20 19:15:43 2014 UTC (9 years, 7 months ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2014Q2-base, pkgsrc-2014Q2, pkgsrc-2014Q1-base, pkgsrc-2014Q1
Changes since 1.11: +5 -5 lines
Diff to previous 1.11 (colored)

Update to 1.2.17. From the changelog:


* docs/overview.html: Add Abiword as an example use of --filter, based on patch
  from Frank J Bruzzaniti (fixes #383).


* Fix "no previous declaration" warning on platforms which don't have


* omindex:

  + Fix off-by-one when finding documents to delete which would sometimes cause
    omindex to fail to delete documents from the database when they weren't
    refound during an index update.

  + Decode dates in xlsx files.

  + Ignore extensions 'adm', 'cur', and 'ico' by default.

  + Group-readable files which are owner-readable but not world-readable should
    still get a "readable by owner" term added.  Reported by Emmanuel Garette.

build system:

* Compress source tarballs with xz instead of gzip.

* configure: Sync compiler warning flag machinery against xapian-core.  The
  changes are special handling for clang, passing -fshow-column where
  supported, and handling for new warning flags in GCC 4.6 and 4.7.

Revision 1.11 / (download) - annotate - [select for diffs], Tue Jun 4 21:28:26 2013 UTC (10 years, 4 months ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2013Q4-base, pkgsrc-2013Q4, pkgsrc-2013Q3-base, pkgsrc-2013Q3, pkgsrc-2013Q2-base, pkgsrc-2013Q2
Changes since 1.10: +6 -6 lines
Diff to previous 1.10 (colored)

Update to 1.2.15. From the changelog:

Omega 1.2.15 (2013-04-16):


* Don't pointlessly link utf8convert.o into the omega CGI.

Omega 1.2.14 (2013-03-14):


* omindex:

  + Correct "max" -> "min" when reserving space for shared strings in .xlsx
    files.  This just means we now reserve a more appropriate amount of space
    to start with.

  + Ignore .com files by default.

Omega 1.2.13 (2013-01-09):


* omindex:

  + Extracting text using external filters now works for filenames containing a
    newline character - previously the newline got lost during escaping for the

  + Fix segfault when -F option without a ':' is passed.

  + Skip a file if we get a read error while calculating the MD5 checksum (used
    for duplicate detection) - previously we used a checksum of the file up to
    that point.

  + Avoid rereading SVG and Atom files when we calculate their MD5 checksums.

  + Improve --help output and man page, most notably:

    - Say explicitly that --sample-size accepts the same formats as --max-size.

    - Note default size limit on files to index is unlimited.

  + When generating a sample for a CSV file, limit the size we pre-allocate to
    the CSV file size if that's smaller than the requested sample size, in case
    the user sets that limit very high.


* Fix to decode %-encoded character at the end of the query string.
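
The changelog does not say what the bug looked like. As a hedged illustration, a bounds check that is off by one is one plausible way for a %-escape in the last three characters to be missed. The Python sketch below (hypothetical function name, not Omega's decoder) shows a bounds check that accepts an escape at the very end of the string.

```python
HEX_DIGITS = set("0123456789abcdefABCDEF")


def decode_percent(text):
    """Decode %XX escapes, including one that ends the string."""
    out = bytearray()
    i, n = 0, len(text)
    while i < n:
        if (
            text[i] == "%"
            # i + 2 must be a valid index; this also holds when the
            # escape is the final three characters of the string.
            and i + 2 < n
            and text[i + 1] in HEX_DIGITS
            and text[i + 2] in HEX_DIGITS
        ):
            out.append(int(text[i + 1:i + 3], 16))
            i += 3
        else:
            # Incomplete or malformed escapes are copied through as-is.
            out += text[i].encode("utf-8")
            i += 1
    return out.decode("utf-8", "replace")


print(decode_percent("P=caf%C3%A9"))
```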

Omega 1.2.12 (2012-06-27):

No changes since 1.2.11 except to bump the version - this release was made to
fix an incorrect library version information update in xapian-core 1.2.11.

Omega 1.2.11 (2012-06-26):


* Change HTML parser's handling of multiple <body> tags and of text outside of
  <body> to match the behaviour of modern web browsers.  (ticket#599)

* omindex:

  + Add command line option to control the size of the document sample stored.
    Patch from Mihai Bivol.

  + Rework .xlsx parsing to substitute the shared strings into the positions
    they are used in, so that the sample actually matches what appears in the
    spreadsheet, and to index calculated cell contents.

  + Improve handling of headers and footers in OpenDocument documents.

  + pdftotext outputs a formfeed between each page, which messes up our "empty
    body" check, so trim any trailing formfeeds before this check.
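
A small Python sketch of the formfeed trimming described in the last item above, with hypothetical function names. It assumes the "empty body" check is a plain comparison against the empty string.

```python
def trim_trailing_formfeeds(text):
    """Drop the formfeeds pdftotext emits between and after pages."""
    return text.rstrip("\f")


def has_empty_body(text):
    """True if nothing is left once trailing formfeeds are removed."""
    return trim_trailing_formfeeds(text) == ""


# A three-page PDF with no extractable text yields only page separators.
print(has_empty_body("\f\f\f"))
print(has_empty_body("page one\fpage two\f"))
```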

Omega 1.2.10 (2012-05-09):


* Add support for CDATA to HTML/XML parser.

* omindex:

  + Add --max-size option, based on patch from ndaley in ticket#587.

  + Add support for atom feed files, patch from Mihai Bivol in ticket#595.

  + If the document with the highest existing docid before the run was updated,
    we were reporting it as "added", but now we correctly report it as
    "updated".  (Backported from 1.3.0).

  + Catch and report std::exception explicitly, so failing to allocate memory
    is no longer reported as "Unknown exception".  (Backported from 1.3.0).

Omega 1.2.9 (2012-03-08):


* docs/overview.html:

  + Document that libmagic is used to determine the MIME type if the extension
    isn't known.  Partly addresses ticket#569.

  + We now limit time as well as CPU and memory for external filters.


* Our HTML parser now ignores sections bracketed by <!--UdmComment--> and
  <!--/UdmComment-->, like we already do for <!--htdig_noindex-->.

* omindex: Add more extensions to the default ignore list: bin dat db fon jar
  lnk pyc pyd pyo sqlite sqlite3 sqlite-journal tmp ttf
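
To illustrate the <!--UdmComment--> and <!--htdig_noindex--> handling noted above for 1.2.9, here is a minimal regex-based Python sketch. Omega uses its own HTML parser, so this is only a model of the effect. The function name is hypothetical, and leaving an unterminated section in place is an assumption of the sketch.

```python
import re

# Match a section opened by either marker and closed by the same marker.
_NOINDEX_SECTION = re.compile(
    r"<!--(UdmComment|htdig_noindex)-->.*?<!--/\1-->",
    re.DOTALL,
)


def strip_noindex_sections(html):
    """Remove text bracketed by the 'do not index' comment markers."""
    return _NOINDEX_SECTION.sub("", html)


page = (
    "<p>Indexed text</p>"
    "<!--UdmComment--><div>navigation menu</div><!--/UdmComment-->"
    "<p>More indexed text</p>"
)
print(strip_noindex_sections(page))
```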

Revision 1.10 / (download) - annotate - [select for diffs], Tue Jan 10 01:03:59 2012 UTC (11 years, 8 months ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2013Q1-base, pkgsrc-2013Q1, pkgsrc-2012Q4-base, pkgsrc-2012Q4, pkgsrc-2012Q3-base, pkgsrc-2012Q3, pkgsrc-2012Q2-base, pkgsrc-2012Q2, pkgsrc-2012Q1-base, pkgsrc-2012Q1
Changes since 1.9: +6 -6 lines
Diff to previous 1.9 (colored)

Update to 1.2.8. Changelog since 1.0.18 is way too long and highlights
aren't obvious. Lots of bug fixes.

Revision 1.9 / (download) - annotate - [select for diffs], Tue Feb 16 14:53:13 2010 UTC (13 years, 7 months ago) by wiz
Branch: MAIN
CVS Tags: pkgsrc-2011Q4-base, pkgsrc-2011Q4, pkgsrc-2011Q3-base, pkgsrc-2011Q3, pkgsrc-2011Q2-base, pkgsrc-2011Q2, pkgsrc-2011Q1-base, pkgsrc-2011Q1, pkgsrc-2010Q4-base, pkgsrc-2010Q4, pkgsrc-2010Q3-base, pkgsrc-2010Q3, pkgsrc-2010Q2-base, pkgsrc-2010Q2, pkgsrc-2010Q1-base, pkgsrc-2010Q1
Changes since 1.8: +4 -7 lines
Diff to previous 1.8 (colored)

Update to 1.0.18.
The rlimit issue addressed in patches ac, ad, ae was already addressed in
release 1.0.11, so remove them.

Omega 1.0.18 (2010-02-14):


* Make the default charset "utf-8" not "UTF-8" as we lower case explicitly
  specified character sets to compare to see if we need to reparse.  Previously
  XML documents which explicitly specified their character set as UTF-8 would
  cause a needless restart of the parser.

* omindex:

  + Increase the wdf boost for the document title from 2 to 5, since 2 isn't
    really enough.

* scriptindex:

  + Don't abort with "Unknown Exception" if indexing is disallowed or we hit
    </body> for a document which had an overridden character set.  Fixes

Omega 1.0.17 (2009-11-18):


* omindex:

  + On Linux, change the memory limit on external filters to use _SC_PHYS_PAGES
    since _SC_AVPHYS_PAGES excludes pages used by the OS cache and so will
    often report a really low value.  Fixes Debian bug#548987 and ticket#358.

  + Fix likely crash when reading output from external filter program if read()
    is interrupted by a signal.

  + Fix potential crash when indexing PostScript files (fixed by using delete[]
    (not delete) for array allocated by new[]).
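
The difference between the two sysconf values mentioned in the first item above can be inspected with a short Python sketch. It assumes a Unix-like platform where `os.sysconf` exists. The helper name is hypothetical, and it returns None where a value is not available.

```python
import os


def pages_in_bytes(name):
    """Return sysconf page count `name` times the page size, or None."""
    names = os.sysconf_names
    if name not in names or "SC_PAGE_SIZE" not in names:
        return None
    try:
        pages = os.sysconf(name)
        page_size = os.sysconf("SC_PAGE_SIZE")
    except (ValueError, OSError):
        return None
    if pages < 0 or page_size < 0:
        return None
    return pages * page_size


# Total physical memory: the basis the entry says is now used on Linux.
total = pages_in_bytes("SC_PHYS_PAGES")
# Currently free pages: excludes the OS cache, so often much smaller.
available = pages_in_bytes("SC_AVPHYS_PAGES")
print("physical:", total)
print("available right now:", available)
```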


* utf8converttest: Charset "8859_1" isn't understood by Solaris libiconv, and
  isn't a standard charset name, so just test it when using our built-in
  converter and GNU libc.


* Fix build failure on Mac OS X 10.6.

* Also check for socketpair() in -lxnet if it isn't found without, which
  enables resource limits on external filter programs called by omindex on
  Solaris, and possibly some other platforms.  Fixes ticket#412.

Revision 1.8 / (download) - annotate - [select for diffs], Thu Sep 10 18:54:29 2009 UTC (14 years ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2009Q4-base, pkgsrc-2009Q4, pkgsrc-2009Q3-base, pkgsrc-2009Q3
Changes since 1.7: +5 -5 lines
Diff to previous 1.7 (colored)

Update to 1.0.16. From the changelog:

* Fix cross-site scripting vulnerability in reporting of exceptions

Revision 1.7 / (download) - annotate - [select for diffs], Thu Aug 27 13:22:42 2009 UTC (14 years, 1 month ago) by schmonz
Branch: MAIN
Changes since 1.6: +5 -5 lines
Diff to previous 1.6 (colored)

Update to 1.0.15. From the changelog:

* omegascript.vim: The list of OmegaScript commands in the vim mode was rather
  out of date, and a few commands were misclassified.  Fix both problems and
  avoid future recurrences by automatically generating those lists from the
  command list in

* omegascript.html: Document that $date uses UTC.  (ticket#314)

* query: Link to "" rather than "".
* inc/toptermsjs: Use double-quotes rather than single quotes for parameter
  values on the <script> tag.

* omindex: Implement correct handling of paths when calling external filter
  programs on Microsoft Windows.

Revision 1.6 / (download) - annotate - [select for diffs], Thu Jul 23 19:27:21 2009 UTC (14 years, 2 months ago) by schmonz
Branch: MAIN
Changes since 1.5: +4 -4 lines
Diff to previous 1.5 (colored)

Update to 1.0.14:

* omindex: Make sure that output is flushed after every message, not just after
  some of them.

* Avoid infinite loop in omindex and scriptindex when reading files under
  Cygwin with automatic end of line translation enabled.  This same bug can
  also manifest on Unix platforms if the file is truncated by another process
  while being read.
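
The entry does not spell out the loop involved. One common shape for this kind of bug is a loop that keeps reading until it has the number of bytes the file size promised. The Python sketch below (hypothetical function name) assumes that shape and shows the guard that ends the loop when the file delivers fewer bytes than expected.

```python
import io


def read_expected(stream, expected_size, chunk_size=4096):
    """Read up to `expected_size` bytes, stopping early at end of file."""
    chunks = []
    remaining = expected_size
    while remaining > 0:
        chunk = stream.read(min(chunk_size, remaining))
        if not chunk:
            # End of file before the expected size was reached, e.g. after
            # end-of-line translation or truncation by another process.
            # Without this check the loop would never terminate.
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


# The stream is shorter than the size we were told to expect.
print(read_expected(io.BytesIO(b"abc"), 10))
```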

Revision 1.5 / (download) - annotate - [select for diffs], Sat Jul 18 22:28:28 2009 UTC (14 years, 2 months ago) by schmonz
Branch: MAIN
Changes since 1.4: +5 -5 lines
Diff to previous 1.4 (colored)

Update to 1.0.13. From the changelog:

* omindex:
  + If the filter program needed for a file format isn't installed, report this
    explicitly when skipping subsequent files with the extension instead of
    misleadingly reporting "Unknown extension".
  + Make -s actually work as a short-form for --stemmer (as documented by
    "omindex --help" and "man omindex").
  + Drop the copyright info from the output of --version as it's perennially
    out of date and we don't report it for any other Xapian programs.
* scriptindex:
  + Add new "valuenumeric" action to add a document value using
    Xapian::sortable_serialise() to allow numeric sorting (ticket#260).
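
To show why a numeric value needs a special encoding before it can be sorted as a string, here is a Python sketch of a generic order-preserving encoding for IEEE 754 doubles. It is explicitly not the format Xapian::sortable_serialise() produces; the function name is hypothetical and NaN is not handled.

```python
import struct


def sortable_bytes(value):
    """Encode a float so byte-wise comparison matches numeric order.

    Generic IEEE 754 trick, not Xapian's actual encoding.
    """
    (bits,) = struct.unpack(">Q", struct.pack(">d", value))
    if bits & (1 << 63):
        # Negative: invert every bit so larger magnitudes sort first.
        bits ^= 0xFFFFFFFFFFFFFFFF
    else:
        # Non-negative: set the top bit so these sort after negatives.
        bits |= 1 << 63
    return struct.pack(">Q", bits)


values = [10.0, -1.0, 2.0, 0.0, -10.5, 100.0, 0.5]
# Plain decimal strings sort "10.0" before "2.0"; the encoded form does not.
print(sorted(values, key=str))
print(sorted(values, key=sortable_bytes))
```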

Revision 1.4 / (download) - annotate - [select for diffs], Mon Apr 20 22:25:38 2009 UTC (14 years, 5 months ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2009Q2-base, pkgsrc-2009Q2
Changes since 1.3: +7 -8 lines
Diff to previous 1.3 (colored)

Update to 1.0.12. From the changelog:

* $log now retries a partial write, or one interrupted by a system call.
* cgiparams.html: Note the technique of using a stub database file to allow a
  default of searching over multiple databases.
* omindex:
  + Add support for indexing Microsoft Office 2007 formats and XPS files.
  + Fix the extraction of metadata from OpenDocument formats.
  + Fix "-l" which would previously always cause a segmentation fault if used
    ("--depth-limit" wasn't affected).
* Fix to compile when RLIMIT_AS isn't available (as on NetBSD and OpenBSD).
  Instead use RLIMIT_VMEM or RLIMIT_DATA if either is available, else don't try
  to limit the memory the filter process can use.
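
The fallback order in the last item (RLIMIT_AS, else RLIMIT_VMEM or RLIMIT_DATA, else no limit) can be sketched with Python's `resource` module on Unix-like systems. The function names are hypothetical, and trying RLIMIT_VMEM before RLIMIT_DATA is an assumption; the entry only says "if either is available".

```python
import resource


def pick_memory_limit():
    """Return (name, constant) for the first memory rlimit available."""
    for name in ("RLIMIT_AS", "RLIMIT_VMEM", "RLIMIT_DATA"):
        constant = getattr(resource, name, None)
        if constant is not None:
            return name, constant
    return None


def limit_memory(max_bytes):
    """Lower the soft memory limit; return the rlimit name used, if any."""
    choice = pick_memory_limit()
    if choice is None:
        return None  # nothing suitable: leave the process unrestricted
    name, constant = choice
    _soft, hard = resource.getrlimit(constant)
    # The soft limit may not exceed the hard limit.
    if hard != resource.RLIM_INFINITY:
        max_bytes = min(max_bytes, hard)
    resource.setrlimit(constant, (max_bytes, hard))
    return name


print(pick_memory_limit())
```

In a setting like the one described, something like `limit_memory` would run in the child process just before the filter program is executed, so the indexer itself stays unrestricted.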

Revision 1.3 / (download) - annotate - [select for diffs], Wed Jan 7 22:40:14 2009 UTC (14 years, 8 months ago) by wiz
Branch: MAIN
CVS Tags: pkgsrc-2009Q1-base, pkgsrc-2009Q1
Changes since 1.2: +5 -5 lines
Diff to previous 1.2 (colored)

Update to 1.0.10:

Omega 1.0.10 (2008-12-23):

build system:

* This release now uses newer versions of the autotools (autoconf 2.62 ->
  2.63; automake 1.10.1 -> 1.10.2).  The newer autoconf fixes a regression
  in autoconf 2.62 (and so Omega 1.0.7) with detecting the endian-ness of some

Omega 1.0.9 (2008-10-31):


* docs/overview.html: Document HTML parsing a bit, including robots
  meta and htdig_noindex.


* omega: Catch std::exception and report what its what() method returns.

* omega: Remove undocumented and non-functional support for numeric sorting
  via CGI parameter SORT=#<slot> (SORT=<slot> works as before).

build system:

* configure: Sync warning flag handling changes from xapian-core to eliminate
  many warnings from GCC 4.3.

Omega 1.0.8 (2008-09-04):


* Fix a few typos and improve wording in a few places.


* omindex:

  + If the character encoding is specified using <meta http-equiv=...> in an
    HTML document then reparse the document if it isn't the encoding we're
    already using so that any preceding <title> is converted correctly

  + Convert text from meta tag parameters to UTF-8 (bug#293).

  + Handle <meta charset="..."> (new in HTML 5).

  + Fix bug in HTML tag parameter parsing which was probably just a small
    performance penalty in real world cases, but could perhaps result in
    parsing bogus extra parameters in carefully contrived situations.
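
The charset handling in the items above (detect a declaration in a <meta> tag, then reparse if it differs from the encoding in use) can be modelled with the Python sketch below. It uses regular expressions rather than a real HTML parser, assumes http-equiv appears before the content attribute, and uses hypothetical function names.

```python
import re

# <meta charset="..."> (the HTML 5 form).
_META_CHARSET = re.compile(
    rb"<meta\s+charset\s*=\s*[\"']?([A-Za-z0-9._:-]+)", re.IGNORECASE
)
# <meta http-equiv="Content-Type" content="...; charset=...">.
_META_HTTP_EQUIV = re.compile(
    rb"<meta\s+[^>]*http-equiv\s*=\s*[\"']?content-type[\"']?[^>]*"
    rb"charset\s*=\s*([A-Za-z0-9._:-]+)",
    re.IGNORECASE,
)


def declared_charset(raw_html):
    """Return the lower-cased charset named in a <meta> tag, or None."""
    for pattern in (_META_CHARSET, _META_HTTP_EQUIV):
        match = pattern.search(raw_html)
        if match:
            return match.group(1).decode("ascii").lower()
    return None


def decode_html(raw_html, assumed="utf-8"):
    """Decode with the declared charset when it differs from `assumed`."""
    declared = declared_charset(raw_html)
    if declared and declared != assumed.lower():
        try:
            # "Reparse": decode the whole document again, so a <title>
            # that precedes the <meta> tag is converted correctly too.
            return raw_html.decode(declared, "replace")
        except LookupError:
            pass  # unknown charset name: fall back to the assumed one
    return raw_html.decode(assumed, "replace")


raw = (
    b"<html><head><title>caf\xe9</title>"
    b'<meta http-equiv="Content-Type" '
    b'content="text/html; charset=ISO-8859-1"></head></html>'
)
print(declared_charset(raw))
```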


* Add missing <signal.h>, noted on FreeBSD by Henrik Brix Andersen.

Revision 1.2 / (download) - annotate - [select for diffs], Sun Jul 27 04:06:00 2008 UTC (15 years, 2 months ago) by schmonz
Branch: MAIN
CVS Tags: pkgsrc-2008Q4-base, pkgsrc-2008Q4, pkgsrc-2008Q3-base, pkgsrc-2008Q3, cube-native-xorg-base, cube-native-xorg
Changes since 1.1: +5 -1 lines
Diff to previous 1.1 (colored)

Fix build on NetBSD (4.0, at least): include <signal.h> and avoid
RLIMIT_AS on systems without it. Also fix path to Perl interpreter
in installed scripts, and as a result, bump PKGREVISION.

Revision / (download) - annotate - [select for diffs] (vendor branch), Sat Jul 26 23:37:29 2008 UTC (15 years, 2 months ago) by schmonz
Branch: TNF
CVS Tags: pkgsrc-base
Changes since 1.1: +0 -0 lines
Diff to previous 1.1 (colored)

Initial import of Omega, which operates on a set of Xapian databases.
Each database is created and updated separately using either omindex
or scriptindex. You can search these databases (or any other Xapian
database with suitable contents) via a web front-end provided by
omega, a CGI application.  A search can also be done over more than
one database at once.

Revision 1.1 / (download) - annotate - [select for diffs], Sat Jul 26 23:37:29 2008 UTC (15 years, 2 months ago) by schmonz
Branch: MAIN

Initial revision
