- Method resolution order:
- Parser
- html.parser.HTMLParser
- _markupbase.ParserBase
- builtins.object
Methods defined here:
- error(self, message)
- handle_data(self, data)
- Records the words found.
Called implicitly whenever the parser encounters the content of an
HTML element. The content is split into words, which are appended
to the corresponding list.
Argument:
- `data`: the element content received
- handle_starttag(self, tag, attrs)
- Records the value of the href attribute.
Called implicitly whenever the parser encounters a tag in the
HTML file. If the tag is an anchor tag, the value of its href
attribute is recorded.
Arguments:
- `tag`: name of the tag
- `attrs`: list of attributes
- parse(self, path)
- Reads the content of a file and passes it to the parser.
Argument:
- `path`: path to the file
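
The three methods above could fit together roughly as in the following minimal sketch. It is a reconstruction from the docstrings only, not the original implementation: the attribute names `words` and `links`, the whitespace-based word splitting, and the UTF-8 file encoding are all assumptions.

```python
from html.parser import HTMLParser


class Parser(HTMLParser):
    """Hypothetical reconstruction of the documented Parser class."""

    def __init__(self):
        super().__init__()
        self.words = []  # assumed name: words found in element content
        self.links = []  # assumed name: href values of anchor tags

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)

    def handle_data(self, data):
        # Assumption: "words" means whitespace-separated tokens.
        self.words.extend(data.split())

    def parse(self, path):
        # Assumption: the file is UTF-8 encoded.
        with open(path, encoding="utf-8") as f:
            self.feed(f.read())
        self.close()  # flush anything the parser is still buffering


# Usage with an in-memory string instead of a file.
parser = Parser()
parser.feed('<p>Hello world</p><a href="page.html">next page</a>')
parser.close()
print(parser.links)
print(parser.words)
```

Neither handler is called directly: `feed` invokes them as it walks the markup, which is what "called implicitly" means in the descriptions above.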
Methods inherited from html.parser.HTMLParser:
- __init__(self, *, convert_charrefs=True)
- Initialize and reset this instance.
If convert_charrefs is True (the default), all character references
are automatically converted to the corresponding Unicode characters.
- check_for_whole_start_tag(self, i)
- # Internal -- check to see if we have a complete starttag; return end
# or -1 if incomplete.
- clear_cdata_mode(self)
- close(self)
- Handle any buffered data.
- feed(self, data)
- Feed data to the parser.
Call this as often as you want, with as little or as much text
as you want (may include '\n').
- get_starttag_text(self)
- Return full source of start tag: '<...>'.
- goahead(self, end)
- # Internal -- handle data as far as reasonable. May leave state
# and data to be processed by a subsequent call. If 'end' is
# true, force handling all data as if followed by EOF marker.
- handle_charref(self, name)
- # Overridable -- handle character reference
- handle_comment(self, data)
- # Overridable -- handle comment
- handle_decl(self, decl)
- # Overridable -- handle declaration
- handle_endtag(self, tag)
- # Overridable -- handle end tag
- handle_entityref(self, name)
- # Overridable -- handle entity reference
- handle_pi(self, data)
- # Overridable -- handle processing instruction
- handle_startendtag(self, tag, attrs)
- # Overridable -- finish processing of start+end tag: <tag.../>
- parse_bogus_comment(self, i, report=1)
- # Internal -- parse bogus comment, return length or -1 if not terminated
# see http://www.w3.org/TR/html5/tokenization.html#bogus-comment-state
- parse_endtag(self, i)
- # Internal -- parse endtag, return end or -1 if incomplete
- parse_html_declaration(self, i)
- # Internal -- parse html declarations, return length or -1 if not terminated
# See w3.org/TR/html5/tokenization.html#markup-declaration-open-state
# See also parse_declaration in _markupbase
- parse_pi(self, i)
- # Internal -- parse processing instr, return end or -1 if not terminated
- parse_starttag(self, i)
- # Internal -- handle starttag, return end or -1 if not terminated
- reset(self)
- Reset this instance. Loses all unprocessed data.
- set_cdata_mode(self, elem)
- unescape(self, s)
- # Internal -- helper to remove special character quoting
- unknown_decl(self, data)
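
As a rough illustration of `feed` and `close`, the sketch below feeds one document in arbitrary pieces, including a tag split across two calls. `TextCollector` and its `chunks` attribute are hypothetical names used only for this example.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper that stores every piece of text it is handed."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


collector = TextCollector()
# The input may be cut anywhere, even in the middle of a tag.
for piece in ("<p>Hel", "lo</", "p>tail"):
    collector.feed(piece)
# close() forces any text still held in the internal buffer to be handled.
collector.close()

text = "".join(collector.chunks)
print(text)
```

Text may reach `handle_data` in several pieces, so the example joins the pieces rather than relying on where the boundaries fall.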
Data and other attributes inherited from html.parser.HTMLParser:
- CDATA_CONTENT_ELEMENTS = ('script', 'style')
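
The practical effect of `CDATA_CONTENT_ELEMENTS` is that the content of `script` and `style` elements reaches `handle_data` as raw text, so a word-collecting subclass would also record script and stylesheet source unless it filters those elements out. A small sketch, using a hypothetical `RawCollector` class:

```python
from html.parser import HTMLParser


class RawCollector(HTMLParser):
    """Hypothetical helper that records the text passed to handle_data."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


raw = RawCollector()
# The "<" inside the script is plain text, not the start of a tag.
# The "&lt;" in the paragraph is converted because convert_charrefs is True.
raw.feed("<script>if (a < b) {}</script><p>x &lt; y</p>")
raw.close()

combined = "".join(raw.chunks)
print(combined)
```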
Methods inherited from _markupbase.ParserBase:
- getpos(self)
- Return current line number and offset.
- parse_comment(self, i, report=1)
- # Internal -- parse comment, return length or -1 if not terminated
- parse_declaration(self, i)
- # Internal -- parse declaration (for use by subclasses).
- parse_marked_section(self, i, report=1)
- # Internal -- parse a marked section
# Override this to handle MS-word extension syntax <![if word]>content<![endif]>
- updatepos(self, i, j)
- # Internal -- update line number and offset. This should be
# called for each piece of data exactly once, in order -- in other
# words the concatenation of all the input strings to this
# function should be exactly the entire input.
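
`getpos` is handy inside handlers for reporting where in the input something was found. The sketch below records the position of each start tag; `TagLocator` and `positions` are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class TagLocator(HTMLParser):
    """Hypothetical helper that notes where each start tag begins."""

    def __init__(self):
        super().__init__()
        self.positions = []

    def handle_starttag(self, tag, attrs):
        # getpos() returns (line, offset): line starts at 1, offset at 0.
        self.positions.append((tag, self.getpos()))


finder = TagLocator()
finder.feed('<p>\n  <a href="x">link</a></p>')
finder.close()

for tag, (line, offset) in finder.positions:
    print(f"<{tag}> starts at line {line}, column offset {offset}")
```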
Data descriptors inherited from _markupbase.ParserBase:
- __dict__
- dictionary for instance variables (if defined)
- __weakref__
- list of weak references to the object (if defined)