PHP Programming

Whilst creating this website, I realised that saving a Microsoft Word document as HTML (filtered web page) and then importing the source into the Jaws CMS caused quite a few issues. For example, the supposedly "filtered" output from MS Word is far from filtered and is likely to destroy any existing formatting created by the CMS.

This got me thinking, and with some simple code built on the recently updated HTML Purifier, I managed to extract filtered HTML that was suitable for generating the articles you see on the right-hand side. The following code has reduced the time it takes me to convert a Word document into clean XHTML.

The PHP code is based around the PHP5 version of HTML Purifier 2.0:

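// Assumes the HTML Purifier library directory is on the include path.
require_once 'HTMLPurifier.php';

// Only act when the form field "q" has actually been submitted.
if (isset($_POST['q'])) {
        $dirty_html = $_POST['q'];
        if (trim($dirty_html) === '') {
                echo 'You must write some HTML!';
        } else {
                $config = HTMLPurifier_Config::createDefault();
                // Input and output are ISO-8859-1 rather than the default UTF-8.
                $config->set('Core', 'Encoding', 'ISO-8859-1');
                $config->set('HTML', 'Doctype', 'XHTML 1.0 Transitional');
                // Convert deprecated elements and attributes where possible.
                $config->set('HTML', 'TidyLevel', 'heavy');
                // Accept a complete document and purify the contents of its body.
                $config->set('Core', 'AcceptFullDocuments', true);
                // Whitelist of elements and attributes that survive purification.
                $config->set('HTML', 'Allowed', 'a[href|title],em,p,blockquote,img');

                $purifier = new HTMLPurifier($config);
                $clean_html = $purifier->purify($dirty_html);
                echo $clean_html;
        }
}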
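
To give an idea of the kind of residue Word leaves behind, here is a minimal sketch of an optional pre-pass that could run before the purifier. It is not part of HTML Purifier and not something the code above relies on: the function name strip_word_residue is made up for illustration, and the three patterns only cover a few common cases (conditional comments, Office namespace tags such as <o:p>, and Mso* class names). HTML Purifier may well remove much of this on its own.

```php
<?php
// Hypothetical helper (not part of HTML Purifier): strips a few kinds of
// Word-specific markup before the HTML is handed to the purifier.
function strip_word_residue($html)
{
    // Conditional comments, e.g. <!--[if gte mso 9]> ... <![endif]-->
    $html = preg_replace('/<!--\[if[^\]]*\]>.*?<!\[endif\]-->/is', '', $html);
    // Office namespace tags, e.g. <o:p> and </o:p> (any text inside is kept)
    $html = preg_replace('/<\/?[a-z]+:[a-z0-9]+[^>]*>/i', '', $html);
    // Word class names, e.g. class="MsoNormal"
    $html = preg_replace('/\sclass="?Mso\w*"?/i', '', $html);
    return $html;
}

$sample = '<p class="MsoNormal">Hello<o:p></o:p></p>'
        . '<!--[if gte mso 9]><xml>junk</xml><![endif]-->';
echo strip_word_residue($sample); // prints <p>Hello</p>
```

If a pre-pass like this proved useful, it would slot in just before the purify() call, for example $purifier->purify(strip_word_residue($dirty_html)).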
 

Access to a working demo will be available soon in a new "tools" area of the site. Of course, the combination of PHP4 and PHP5 might prove interesting; perhaps a link to the development server will do :)

pdavies | PHP Programming | 23 June, 12:19pm | 1 comments
