      tag is seen. Unlike other literal elements, the text content is
      not 'cdata'.
    * The XML &apos; entity is decoded. The apos char itself is still
      encoded as &#39;, as &apos; is not really an HTML entity and is not
      recognized by many HTML browsers.

3.21 2001-04-10
    * Fix a memory leak which occurred when using filter methods.
    * Avoid a few compiler warnings (DEC C):
        - Trailing comma found in enumerator list
        - "unsigned char" is not compatible with "const char".
    * Doc update.

3.20 2001-04-02
    * Some minor documentation updates.

3.19_94 2001-03-30
    * Implemented 'tag', 'line', 'column' argspecs.
    * HTML::PullParser doc update. eg/hform is an example of
      HTML::PullParser usage.

3.19_93 2001-03-27
    * Shorten 'report_only_tags' to 'report_tags'. I think it reads
      better.
    * Bleadperl portability fixes.

3.19_92 2001-03-25
    * HTML::HeadParser made more efficient by using 'ignore_elements'.
    * HTML::LinkExtor made more efficient by using 'report_only_tags'.
    * HTML::TokeParser generalized into HTML::PullParser.
      HTML::PullParser only supports the get_token/unget_token interface
      of HTML::TokeParser, but is more flexible because the information
      that makes up a token is customisable. HTML::TokeParser is made
      into an HTML::PullParser subclass.

3.19_91 2001-03-19
    * Array references can be passed to the filter methods. Makes it
      easier to use them as constructor options.
    * Example programs updated to use filters.
    * Reset ignored_element state on EOF.
    * Documentation updates.
    * The netscape_buggy_comment() method now generates a mandatory
      warning about its deprecation.

3.19_90 2001-03-13
    * This is a developer-only release. It contains some new
      experimental features. The interface to these might still change.
    * Implemented filters to reduce the number of callbacks generated:
        - $p->ignore_tags()
        - $p->report_only_tags()
        - $p->ignore_elements()
    * New @attr argspec. Less overhead than 'attr' and allows
      compatibility with XML::Parser style start events.
    * The whole argspec can be wrapped up in @{...} to signal flattening.
      Only makes a difference when the target is an array.

3.19 2001-03-09
    * Avoid the entity2char global. That should make the module more
      thread safe. Patch by Gurusamy Sarathy.

3.18 2001-02-24
    * There was a C++ style comment left in util.c. Strict C compilers do
      not like that kind of stuff.

3.17 2001-02-23
    * The 3.16 release broke MULTIPLICITY builds. Fixed.

3.16 2001-02-22
    * The unbroken_text option now works across ignored tags.
    * Fix casting of pointers on some 64 bit platforms.
    * Fix decoding of Unicode entities. Only optionally available for
      perl-5.7.0 or better.
    * Expose internal decode_entities() function at the Perl level.
    * Reindented some code.

3.15 2000-12-26
    * HTML::TokeParser's get_tag() method now takes multiple tags to
      match. Hopefully the documentation is also a bit clearer.
    * #define PERL_NO_GET_CONTEXT: Should speed up things for thread
      enabled versions of perl.
    * Quote some more entities that also happen to be perl keywords.
      This avoids warnings on perl-5.004.
    * Unicode entities only triggered for perl-5.7.0 or higher.

3.14 2000-12-03
    * If a handler triggered by flushing text at eof called the eof
      method then infinite recursion occurred. Fixed. Bug discovered by
      Jonathan Stowe.
    * Allow to be parsed as declaration.

3.13 2000-09-17
    * Experimental support for decoding of Unicode entities.

3.12 2000-09-14
    * Some tweaks to get it to compile with "Optimierender Microsoft (R)
      32-Bit C/C++-Compiler, Version 12.00.8168, fuer x86." Patch by
      Matthias Waldorf.
    * HTML::Entities documentation spelling patch by David Dyck.

3.11 2000-08-22
    * HTML::LinkExtor and eg/hrefsub now obtain %linkElements from the
      HTML::Tagset module.

3.10 2000-06-29
    * Avoid core dump when stack gets relocated as the result of text
      handler invocation while $p->unbroken_text is enabled. Needed to
      refresh the stack pointer.

3.09 2000-06-28
    * Avoid core dump if somebody clobbers the aliased $self argument of
      a handler.
    * HTML::TokeParser documentation update suggested by Paul Makepeace.

3.08 2000-05-23
    * Fix core dump for large start tags. Bug spotted by Alexander Fraser
    * Added yet another example program: eg/hanchors
    * Typo fix by Jamie McCarthy

3.07 2000-03-20
    * Fix perl5.004 builds (was broken in 3.06)
    * Declaration parsing mode now only triggers for and . Based on
      patch by la mouton.

3.06 2000-03-06
    * Multi-threading/MULTIPLICITY compilation fix. Both Doug MacEachern
      and Matthias Urlichs provided a patch.
    * Avoid some "statement not reached" warnings from picky compilers.
    * Remove final commas in enums as ANSI C does not allow them and some
      compilers actually care. Patch by James Walden
    * Added eg/htextsub example program.

3.05 2000-01-22
    * Implemented $p->unbroken_text option
    * Don't parse content of certain HTML elements as CDATA when xml_mode
      is enabled.
    * Offset was reported with wrong sign for text at end of chunk.

3.04 2000-01-15
    * Backed out 3.03-patch that checked for legal handler and attribute
      names in the HTML::Parser constructor.
    * Documentation typo fixed by Michael.

3.03 2000-01-14
    * We did not get out of comment mode for comments ending with an odd
      number of "-" before ">". Patch by la mouton
    * Documentation patch by Michael.

3.02 1999-12-21
    * Hide ~-magic IV-pointer to 'struct p_state' behind a reference.
      This allows copying of the internal _hparser_xs_state element, and
      will make HTML-Tree-0.61 work again.
    * Introduced $p->init() which might be useful for subclasses that
      only want the initialization part of the constructor.
    * Filled out DIAGNOSTICS section of the HTML::Parser POD.

3.01 1999-12-19
    * Rely on ~-magic instead of a DESTROY method to deallocate the
      internal 'struct p_state'. This avoids memory leaks when people
      simply wipe the content of the object hash.
    * One of the assertions in hparser.c had opposite logic. This made
      the parser fail when compiled with a -DDEBUGGING perl.
    * Don't assume any specific order of hash keys in the t/cases.t.
      This test failed with some newer development releases of perl.

3.00 1999-12-14
    * Documentation update (most of it from Michael)
    * Minor patch to eg/hstrip so that it uses a "" handler instead of
      &ignore.
    * Test suite patches from Michael

2.99_96 1999-12-13
    * Patches from Michael:
        - A handler of "" means that the event will be ignored. More
          efficient than using 'sub {}' as handler.
        - Don't use a perl hash for looking up argspec keywords.
        - Documentation tweaks.

2.99_95 1999-12-09
    * (this is a 3.00 candidate)
    * Fixed core dump when "<" was followed by an 8-bit character.
      Spotted and test case provided by Doug MacEachern. Doug had been
      running HTML-Parser-XS through more than 1 million urls that had
      been downloaded via LWP.
    * Handlers can now invoke $p->eof to request the parsing to
      terminate. HTML::HeadParser has been simplified by taking
      advantage of this. Also added a title-extraction example that uses
      this.
    * Michael once again fixed my bad English in the HTML::Parser
      documentation.
    * netscape_buggy_comment will carp instead of warn
    * updated TODO/README
    * Documented that HTML::Filter is deprecated.
    * Made backslash reserved in literal argspec strings.
    * Added several new test scripts.

2.99_94 1999-12-08
    * (should almost be a 3.00 candidate)
    * Renamed 'cdata_flag' as 'is_cdata'.
    * Dropped support for wrapping callback handler and argspec in an
      array and passing a reference to $p->handler. It created
      ambiguities when you want to pass an array as handler destination
      and not update argspec. The wrapping for constructor arguments is
      unchanged.
    * Reworked the documentation after updates from Michael.
    * Simplified internal check_handler(). It should probably simply be
      inlined in handler() again.
    * Added argspec 'length' and 'undef'
    * Fix statement-less label. Fix suggested by Matthew Langford.
    * Added two more example programs: eg/hstrip and eg/htext.
    * Various minor patches from Michael.

2.99_93 1999-12-07
    * Documentation update
    * $p->bool_attr_value renamed as $p->boolean_attribute_value
    * Internal renaming: attrspec --> argspec
    * Introduced internal 'enum argcode' in hparser.c
    * Added eg/hrefsub

2.99_92 1999-12-05
    * More documentation patches from Michael
    * Renamed 'token1' as 'token0' as suggested by Michael
    * For artificial end tags we now report 'tokens', but not 'tokenpos'.
    * Boolean attribute values show up as (0, 0) in 'tokenpos' now.
    * If $p->bool_attr_value is set it will influence 'tokens'
    * Fix for core dump when parsing with $p->strict_names(0). Based on
      fix by Michael.
    * Will av_extend() the tokens/tokenspos arrays.
    * New test suite script by Michael: t/attrspec.t

2.99_91 1999-12-04
    * Implemented attrspec 'offset'
    * Documentation patch from Michael
    * Some more cleanup/updated TODO

2.99_90 1999-12-03
    * (first beta for 3.00)
    * Using "realloc" as a parameter name in grow_tokens created problems
      for some people. Fix by Paul Schinder
    * Patch by Michael that makes array handler destinations really work.
    * Patch by Michael that makes HTML::TokeParser use this. This gave a
      speedup of about 80%.
    * Patch by Michael that makes t/cases into a real test.
    * Small HTML::Parser documentation patch by Michael.
    * Renamed attrspec 'origtext' to 'text' and 'decoded_text' to 'dtext'
    * Split up Parser.xs. Moved stuff into hparser.c and util.c
    * Dropped html_ prefix from internal parser functions.
    * Renamed internal function html_handle() as report_event().

2.99_17 1999-12-02
    * HTML::Parser documentation patch from Michael.
    * Fix memory leaks in html_handler()
    * Patch that makes an array legal as handler destination. Also from
      Michael.
    * The end of marked sections does not eat successive newline any
      more.
    * The artificial end event for empty tag in xml_mode did not report
      an empty origtext.
    * New constructor option: 'api_version'

2.99_16 1999-12-01
    * Support "event" in argspec. It expands to the name of the handler
      (minus "default").
    * Fix core dump for large start tags. The tokens_grow() routine
      needed an adjustment. Added test for this; t/largstags.t.

2.99_15 1999-11-30
    * Major restructuring/simplification of callback interface based on
      initial work by Michael. The main news is that you now need to tell
      what arguments you want to be provided to your callbacks.
    * The following parser options have been eliminated:
        $p->decode_text_entities
        $p->keep_case
        $p->v2_compat
        $p->pass_self
        $p->attr_pos

2.99_14 1999-11-26
    * Documentation update by Michael A. Chase.
    * Fix for declaration parsing by Michael A. Chase.
    * Workaround for perl5.004_05 bug. Can't return &PL_sv_undef.

2.99_13 1999-11-22
    * New Parser.pm POD based on initial work by Michael A. Chase. All
      new features should now be described.
    * $p->callback(start => undef) will not reset the callback.
    * $p->xml_mode() did not parse attributes correctly because the
      HCTYPE_NOT_SPACE_EQ_SLASH_GT flag was never set.
    * A few more tests.

2.99_12 1999-11-18
    * Implemented $p->attr_pos attribute. This causes attr positions
      within $origtext of the start tag to be reported instead of the
      attribute values. The positions are reported as 4 numbers; end of
      previous attr, start of this attr, start of attr value, and end of
      attr. This should make substr() manipulations of $origtext easy.
    * Implemented $p->unbroken_text attribute. This makes sure that text
      segments are never broken and given back as separate text
      callbacks. It delays text callbacks until some other markup has
      been recognized.
    * More English corrections by Michael A. Chase.
    * HTML::LinkExtor now recognizes even more URI attributes as
      suggested by Sean M. Burke
    * Completed marked sections support. It is also now a compile time
      decision if you want this supported or not. The only drawback of
      enabling it should be a possible parsing speed reduction. I have
      not measured this yet.
    * The keys for callbacks initialized in the constructor are now
      suffixed with "_cb".
    * Renamed $p->pass_cbdata to $p->pass_self.
    * Added magic number to the p_state struct.

2.99_11 1999-11-17
    * Don't leak $@ modifications from HTML::Parser constructor.
    * Included HTML::Parser POD.
    * Marked sections almost work. CDATA and RCDATA should work.
    * For tags that take us into literal_mode;