# LagartoParser

**LagartoParser** is an *event-based* HTML parser. It processes the input and emits events as the content is parsed, using a [visitor pattern](https://en.wikipedia.org/wiki/Visitor_pattern). This makes parsing very fast and keeps memory usage minimal. However, event-based parsing can sometimes be tedious; in that case, try the **LagartoDom** parser instead.

Let's see it in action:

```java
LagartoParser lagartoParser = new LagartoParser("<html><h1>Hello</h1></html>");

TagVisitor tagVisitor = new EmptyTagVisitor() {
    @Override
    public void tag(final Tag tag) {
        if (tag.nameEquals("h1")) {
            System.out.println(tag.getName());
        }
    }
	
    @Override
    public void text(final CharSequence text) {
        System.out.println(text);
    }
};

lagartoParser.parse(tagVisitor);
```

As the input content is parsed, the callback methods in the visitor get invoked. In this case, the result is:

```
h1
Hello
h1
```

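Note that the `tag()` event was printed twice for `h1`: first for the open tag, and then for the close tag. The `html` open and close tags trigger `tag()` as well, but the visitor above ignores them. In other words, **LagartoParser** performs the *tokenization* of the input HTML.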
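To make the event model concrete, here is a minimal, self-contained sketch of event-based tokenization in plain Java. It is not how **LagartoParser** is implemented: `ToyTokenizer`, `ToyVisitor`, and `events` are hypothetical names invented for illustration, and the toy ignores attributes, comments, scripts, and everything else the HTML5 specification requires. It only shows the idea of walking the input once and invoking a callback per token.

```java
import java.util.ArrayList;
import java.util.List;

class ToyTokenizer {

    /** Hypothetical callback interface; not part of Lagarto. */
    interface ToyVisitor {
        void tag(String name, boolean closing);
        void text(CharSequence text);
    }

    /** Walks the input once and emits one event per tag and per text run. */
    static void parse(String input, ToyVisitor visitor) {
        int pos = 0;
        while (pos < input.length()) {
            int open = input.indexOf('<', pos);
            int close = open < 0 ? -1 : input.indexOf('>', open);
            if (close < 0) {
                // No complete tag remains: treat the rest as one text block.
                visitor.text(input.substring(pos));
                return;
            }
            if (open > pos) {
                // Text between two tags is emitted as a single block.
                visitor.text(input.substring(pos, open));
            }
            boolean closing = input.charAt(open + 1) == '/';
            String name = input.substring(open + (closing ? 2 : 1), close);
            visitor.tag(name, closing);
            pos = close + 1;
        }
    }

    /** Collects the emitted events as strings, in the order they occur. */
    static List<String> events(String input) {
        List<String> out = new ArrayList<>();
        parse(input, new ToyVisitor() {
            @Override
            public void tag(String name, boolean closing) {
                out.add((closing ? "close:" : "open:") + name);
            }

            @Override
            public void text(CharSequence text) {
                out.add("text:" + text);
            }
        });
        return out;
    }

    public static void main(String[] args) {
        System.out.println(events("<html><h1>Hello</h1></html>"));
    }
}
```

For the input used above, the toy emits five events in order: `open:html`, `open:h1`, `text:Hello`, `close:h1`, `close:html`. No tree is built at any point, which is why this style of parsing needs so little memory.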

### Parsing specification

HTML parsing (i.e. tokenization) strictly follows the official [HTML5 specification](https://html.spec.whatwg.org). Note the following:

* Text is emitted as a single block, not character by character.
* The case of a tag name (and other tokens) is not changed when emitted.
* **LagartoParser** performs tokenization only. The DOM tree is neither created nor validated.
* The `script` tag is emitted separately.
* Internet Explorer conditional comments are supported.
* XML is supported too.

{% hint style="warning" %}
**LagartoParser** performs tokenization only and does not verify whether the tags make sense. For example, if your HTML has an unclosed tag, **LagartoParser** will not treat it as an error. **LagartoDom**, on the other hand, will handle these cases.
{% endhint %}
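If you stay with plain tokenization and still want to detect unclosed tags, you have to track them yourself. The following is a minimal sketch of one possible approach, using a stack over a stream of tag names. It does not use the Lagarto API: `TagBalanceSketch` and `isBalanced` are hypothetical names, closing tags are represented by a leading `/` purely for illustration, and void elements such as `<br>` are not handled.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

class TagBalanceSketch {

    /**
     * Returns true when every opening tag is closed in properly nested order.
     * Each entry is a tag name; a leading '/' marks a closing tag.
     */
    static boolean isBalanced(List<String> tags) {
        Deque<String> open = new ArrayDeque<>();
        for (String tag : tags) {
            if (!tag.startsWith("/")) {
                // Opening tag: remember it until its close arrives.
                open.push(tag);
            } else if (open.isEmpty() || !open.pop().equals(tag.substring(1))) {
                // Closing tag with no matching, most recently opened tag.
                return false;
            }
        }
        // Anything left on the stack was never closed.
        return open.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isBalanced(List.of("html", "h1", "/h1", "/html")));
        System.out.println(isBalanced(List.of("html", "h1", "/html")));
    }
}
```

In a real visitor, the same stack could be updated from inside the `tag()` callback instead of from a prepared list.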

### Input types

**LagartoParser** accepts both `char[]` and `CharSequence` input. This allows various kinds of input to be used, including a `String`, or even the content of a `Reader`.
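As a rough illustration of the `Reader` case, the sketch below drains a `Reader` into a `char[]` using only the standard library. `ReaderInputSketch` and `readAll` are hypothetical helper names, not part of Lagarto, and the commented-out constructor call assumes an overload that takes `char[]`, as the paragraph above suggests.

```java
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReaderInputSketch {

    /** Drains the reader into a char[]; I/O failures are rethrown unchecked. */
    static char[] readAll(Reader reader) {
        CharArrayWriter buffer = new CharArrayWriter();
        char[] chunk = new char[4096];
        try {
            int n;
            while ((n = reader.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toCharArray();
    }

    public static void main(String[] args) {
        char[] content = readAll(new StringReader("<html><h1>Hello</h1></html>"));
        System.out.println(new String(content));
        // The array could then be handed to the parser (assumed overload):
        // LagartoParser lagartoParser = new LagartoParser(content);
    }
}
```

Reading the whole input up front trades some memory for simplicity; for very large documents that trade-off may not be acceptable.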

