Metadata-Version: 1.0
Name: encutils
Version: 0.8.3
Summary: Encoding detection collection for Python.
Home-page: http://cthedot.de/encutils/
Author: Christof Hoeke
Author-email: c@cthedot.de
License: encutils has a dual-license, please choose whatever you prefer:

    * encutils is published under the `LGPL 3 or later <http://cthedot.de/encutils/license/>`__
    * encutils is published under the
      `Creative Commons License <http://creativecommons.org/licenses/by/3.0/>`__.

    
Download-URL: http://cthedot.de/encutils/
Description: 
        ===================================================
        encutils - encoding detection collection for Python
        ===================================================
        
        encutils
        ========
        :Author: Christof Hoeke, see http://cthedot.de/encutils/
        :Copyright: 2005-2008: Christof Hoeke
        :License: encutils is dual-licensed; choose whichever license you prefer:
        
        * `LGPL 3 or later <http://cthedot.de/encutils/license/>`__
        * `Creative Commons License <http://creativecommons.org/licenses/by/3.0/>`__
        
        encutils is free software: you can redistribute it and/or modify
        it under the terms of the GNU Lesser General Public License as published by
        the Free Software Foundation, either version 3 of the License, or
        (at your option) any later version.
        
        encutils is distributed in the hope that it will be useful,
        but WITHOUT ANY WARRANTY; without even the implied warranty of
        MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
        GNU Lesser General Public License for more details.
        
        You should have received a copy of the GNU Lesser General Public License
        along with encutils.  If not, see <http://www.gnu.org/licenses/>.
        
        
        A collection of helper functions for detecting the encoding of text
        resources (HTML, XHTML, XML, CSS, etc.) retrieved via HTTP, from a file,
        or from a string.
        
        ``getEncodingInfo`` is probably the function of most interest. It calls
        the other supplied functions, combines their results, and returns an
        ``EncodingInfo`` object with the following properties:
        
        - ``encoding``: the guessed encoding. This is the explicit or implicit
          encoding, always lowercase, or ``None``.
        
        - from HTTP response
        
          * ``http_encoding``
          * ``http_media_type``
        
        - from HTML <meta> element
        
          * ``meta_encoding``
          * ``meta_media_type``
        
        - from XML declaration
        
          * ``xml_encoding``
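        
        The three sources listed above can be illustrated with a minimal,
        standard-library-only sketch. This is not the encutils implementation:
        the helper names (``http_charset``, ``meta_charset``, ``xml_charset``,
        ``gather``), the regular expressions, and the "HTTP first" precedence
        are assumptions made for illustration only.
        
        ```python
        import re
        from email.message import Message
        
        # Hypothetical helpers for illustration; not part of the encutils API.
        
        def http_charset(content_type):
            """Split a Content-Type header value into (media_type, charset)."""
            if not content_type:
                return None, None
            msg = Message()
            msg["Content-Type"] = content_type
            # get_content_charset() returns the charset lowercased, or None.
            return msg.get_content_type(), msg.get_content_charset()
        
        # Simplified pattern: <meta ... content="text/html; charset=...">
        META = re.compile(r'<meta[^>]+content\s*=\s*["\']([^"\']+)["\']', re.I)
        
        def meta_charset(text):
            """Return (media_type, charset) from the first <meta> declaring one."""
            for content in META.findall(text):
                media_type, charset = http_charset(content)
                if charset:
                    return media_type, charset
            return None, None
        
        # Simplified pattern: <?xml version="1.0" encoding="..."?>
        XML_DECL = re.compile(
            r'^<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')
        
        def xml_charset(text):
            """Return the lowercased encoding from an XML declaration, or None."""
            match = XML_DECL.match(text)
            return match.group(1).lower() if match else None
        
        def gather(content_type, text):
            """Collect all declared encodings and flag disagreement."""
            http_media_type, http_encoding = http_charset(content_type)
            meta_media_type, meta_encoding = meta_charset(text)
            xml_encoding = xml_charset(text)
            declared = [e for e in (http_encoding, meta_encoding, xml_encoding) if e]
            return {
                "http_media_type": http_media_type,
                "http_encoding": http_encoding,
                "meta_media_type": meta_media_type,
                "meta_encoding": meta_encoding,
                "xml_encoding": xml_encoding,
                # Assumed precedence for this sketch: first declared value wins.
                "encoding": declared[0] if declared else None,
                "mismatch": len(set(declared)) > 1,
            }
        ```
        
        Calling ``gather('text/html; charset=UTF-8', page_text)`` on a page
        whose <meta> element also declares ``utf-8`` would report ``utf-8``
        with no mismatch. A real detector also has to consider byte order
        marks and media-type defaults, which this sketch leaves out.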
        
        Example::
        
            >>> import encutils
            >>> info = encutils.getEncodingInfo(url='http://cthedot.de/encutils/')
        
            >>> print info  # = str(info)
            utf-8
        
            >>> info        # = repr(info)
            <encutils.EncodingInfo object encoding='utf-8' mismatch=False at 0xb86d30>
        
            >>> print info.logtext
            HTTP media_type: text/html
            HTTP encoding: utf-8
            HTML META media_type: text/html
            HTML META encoding: utf-8
            Encoding (probably): utf-8 (Mismatch: False)
        
        
        references
        ==========
        XML
            RFC 3023 (http://www.ietf.org/rfc/rfc3023.txt)
        
            Explained more accessibly in:
        
            - http://feedparser.org/docs/advanced.html
            - http://www.xml.com/pub/a/2004/07/21/dive.html
        
        HTML
            http://www.w3.org/TR/REC-html40/charset.html#h-5.2.2
        
        TODO
        ====
        - parse @charset of HTML elements?
        - check for more text types if only text is given
        
Keywords: encoding,i18n,xml,html,css
Platform: Python 2.3 and later.
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Web Environment
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: GNU Library or Lesser General Public License (LGPL)
Classifier: License :: Other/Proprietary License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Topic :: Internet
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Internationalization
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Text Processing :: Markup :: HTML
Classifier: Topic :: Text Processing :: Markup :: XML
