dev@glassfish.java.net

Re: HTML parser

From: Lloyd L Chambers <Lloyd.Chambers_at_Sun.COM>
Date: Thu, 22 May 2008 13:08:18 -0700

Kohsuke,

Thanks. I've taken a very quick look. Do you have any sample code?

For now, my main need is to find structures of the following form within
a blog page, so that I can generate an RSS feed and/or a table of
contents (TOC):
<div class="blog-day">
    <div class="blog-date">
        <div class="blog-item-title">
           <p>blah blah blah</p>
           ...
        </div>
    </div>
</div>

The Swing parser was nice in that it had callbacks, which were perfect
for what I wanted. If you have the time, what's the equivalent with
NekoHTML?
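For what it's worth, the usual callback-style equivalent is a SAX ContentHandler. Below is a minimal sketch, assuming the goal is only to collect the text inside each <div class="blog-item-title">. The class name BlogItemTitleHandler is hypothetical. The demo in main() runs the handler through the JDK's built-in SAX parser, which only works because the sample markup is well-formed. For real-world HTML, NekoHTML's org.cyberneko.html.parsers.SAXParser is an org.xml.sax.XMLReader, so the same handler should plug in through setContentHandler(handler) followed by parse(new InputSource(...)). NekoHTML reports element names in upper case by default, which is why the sketch compares names case-insensitively.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the text inside every <div class="blog-item-title">.
public class BlogItemTitleHandler extends DefaultHandler {
    private final List<String> titles = new ArrayList<>();
    private final StringBuilder text = new StringBuilder();
    // 0 = not inside a title div; otherwise the number of <div>s currently open
    // since (and including) the title div.
    private int divDepth = 0;

    // True if the space-separated "class" attribute contains the wanted class.
    private static boolean hasClass(Attributes attrs, String wanted) {
        String value = attrs.getValue("class");
        if (value == null) return false;
        for (String c : value.trim().split("\\s+")) {
            if (c.equals(wanted)) return true;
        }
        return false;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attrs) {
        // Compare case-insensitively: NekoHTML upper-cases element names by default.
        if (!qName.equalsIgnoreCase("div")) return;
        if (divDepth > 0) {
            divDepth++;                      // a div nested inside a title div
        } else if (hasClass(attrs, "blog-item-title")) {
            divDepth = 1;                    // start capturing text
            text.setLength(0);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (divDepth > 0 && qName.equalsIgnoreCase("div")) {
            divDepth--;
            if (divDepth == 0) {
                // Title div closed: store its text with whitespace collapsed.
                titles.add(text.toString().trim().replaceAll("\\s+", " "));
            }
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (divDepth > 0) text.append(ch, start, length);
    }

    public List<String> getTitles() {
        return titles;
    }

    public static void main(String[] args) throws Exception {
        String xhtml =
            "<div class=\"blog-day\">"
            + "<div class=\"blog-date\">"
            + "<div class=\"blog-item-title\"><p>blah blah blah</p></div>"
            + "</div></div>";
        BlogItemTitleHandler handler = new BlogItemTitleHandler();
        // JDK SAX parser: fine for this well-formed sample, not for sloppy HTML.
        SAXParserFactory.newInstance().newSAXParser()
            .parse(new InputSource(new StringReader(xhtml)), handler);
        System.out.println(handler.getTitles()); // prints [blah blah blah]
    }
}
```

The depth counter is there so that nested divs inside a title block don't end the capture early. The same handler shape would also work for "blog-day" or "blog-date" if the TOC needs those levels.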

Lloyd

On May 21, 2008, at 9:25 AM, Kohsuke Kawaguchi wrote:

> Jason Lee wrote:
>> On Tue, May 20, 2008 at 12:22 PM, Lloyd L Chambers
>> <Lloyd.Chambers_at_sun.com> wrote:
>>> Anyone know of a good HTML (XHTML) parser?
>>>
>>> I've tried the JDK Swing parser, but it can't parse XHTML properly.
>> I've heard good things about NekoHTML
>> (http://sourceforge.net/projects/nekohtml), and it has had a release
>> in the last month or so. FWIW, HttpUnit uses this.
>
> +1. I've been using NekoHTML in most of the projects where I needed
> HTML parsing.
>
> --
> Kohsuke Kawaguchi
> Sun Microsystems kohsuke.kawaguchi_at_sun.com

---
Lloyd L Chambers
lloyd.chambers_at_sun.com
Sun Microsystems, Inc