Package Info

python-pytidylib


Python wrapper for HTML Tidy (tidylib) on Python 2 and 3


Development/Languages/Python

PyTidyLib is a Python package that wraps the HTML Tidy library. It lets you fix invalid (X)HTML markup from Python code. Some of the library's many capabilities include:

  • Clean up unclosed tags and unescaped characters such as ampersands
  • Output HTML 4 or XHTML, strict or transitional, and add missing doctypes
  • Convert named entities to numeric entities, which can then be used in XML documents without an HTML doctype
  • Clean up HTML from programs such as Word (to an extent)
  • Indent the output, including proper (i.e. no) indenting for pre elements, which some (X)HTML indenting code overlooks
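Several of the capabilities above correspond to HTML Tidy configuration options passed through the `options` dictionary. The following is a minimal sketch, not the package's own documentation: the helper names `build_tidy_options` and `tidy_with_options` are hypothetical, and the option names (`output-xhtml`, `indent`, `numeric-entities`) are assumed to be supported by the installed HTML Tidy version. The import is guarded so the sketch degrades gracefully when PyTidyLib or the underlying shared library is not installed.

```python
def build_tidy_options(xhtml=True, indent=True, numeric_entities=True):
    """Map a few of the listed capabilities to HTML Tidy option names.

    Hypothetical helper; the option names are assumed to exist in the
    installed HTML Tidy version.
    """
    return {
        "output-xhtml": 1 if xhtml else 0,
        "indent": 1 if indent else 0,
        "numeric-entities": 1 if numeric_entities else 0,
    }


def tidy_with_options(markup, options):
    """Run tidy_document if PyTidyLib and libtidy are available.

    Returns the (document, errors) pair, or None when the package or
    the shared library cannot be loaded.
    """
    try:
        from tidylib import tidy_document
        return tidy_document(markup, options=options)
    except (ImportError, OSError):
        return None


options = build_tidy_options()
result = tidy_with_options("<p>caf&eacute; & cr&egrave;me", options)
if result is not None:
    document, errors = result
    print(document)
```

Keeping the options in a plain dictionary makes it easy to toggle one behaviour at a time and compare the output.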

Small example of use

The following code cleans up an invalid HTML document and sets an option::

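    from tidylib import tidy_document

    # tidy_document returns the cleaned markup and a string of warnings/errors
    document, errors = tidy_document('''<p>fõo <img src="bar.jpg">''',
                                     options={'numeric-entities': 1})
    # print() with a single argument works on both Python 2 and 3
    print(document)
    print(errors)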
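The second value returned by `tidy_document` is a single string of diagnostics. As a rough sketch, it can be split into individual messages for logging or filtering. The helper name `split_tidy_messages` is hypothetical, and the sample text below is only an illustration of the one-message-per-line layout assumed here, not output captured from a real run.

```python
def split_tidy_messages(errors):
    """Return the non-empty lines of the errors string, one message each."""
    return [line.strip() for line in errors.splitlines() if line.strip()]


# Illustrative sample only; real messages depend on the input document
# and on the installed HTML Tidy version.
sample = (
    "line 1 column 1 - Warning: missing <!DOCTYPE> declaration\n"
    "line 1 column 8 - Warning: <img> lacks \"alt\" attribute\n"
)

messages = split_tidy_messages(sample)
# Assumes severity appears as "Warning:" or "Error:" within each message
warnings = [m for m in messages if "Warning:" in m]

for message in messages:
    print(message)
```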

License: MIT
URL: http://countergram.com/open-source/pytidylib/

Releases

Package Version   Update ID   Released    Package Hub Version  Platforms                        Subpackages
0.2.4-bp150.2.4   GA Release  2018-07-30  15                   AArch64, ppc64le, s390x, x86-64  python-pytidylib
0.2.4-bp151.2.10  GA Release  2019-05-18  15 SP1               AArch64, ppc64le, s390x, x86-64  python-pytidylib
0.2.4-bp151.3.1   GA Release  2019-07-17  15 SP1               AArch64, ppc64le, s390x, x86-64  python-pytidylib
0.3.2-bp152.1.10  GA Release  2020-04-17  15 SP2               AArch64, ppc64le, s390x, x86-64  python2-pytidylib, python3-pytidylib
0.3.2-bp153.1.17  GA Release  2021-03-06  15 SP3               AArch64, ppc64le, s390x, x86-64  python2-pytidylib, python3-pytidylib