Saturday, 15 September 2012

Expanding emoticons, URLs and other DOM manipulations

Introduction

Users of IM applications are used to clicking links in conversations and seeing little smiley faces instead of strings like :) or :(. All of this has to work even though messages are transmitted (at least in Gadu-Gadu) with only bold/italic/underline formatting. So it is our job to find all links and emoticons in the text and replace them.

DOM processing using Visitor pattern

The best way to do that is to walk the DOM tree of the message (converting bold/italic/underline formatting to a proper HTML DOM is not that hard) and do all the work on DomText nodes. Unfortunately, Qt does not offer an easy way to do that: one has to traverse the DOM manually. So we have written the DomVisitor and DomProcessor classes, which allow us to manipulate DOM trees with ease. DomProcessor acts as an acceptor of visitors for a QDomDocument, with the ability to add, remove or update nodes in the DOM tree.
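To make the idea concrete, here is a minimal sketch of the visitor/processor split. It is not our actual C++ code: Python's xml.dom.minidom stands in for QDomDocument, and the method names (visit_text, begin_element, end_element, accept) as well as the UppercaseVisitor example are assumptions made only for illustration.

```python
from xml.dom import minidom, Node


class DomVisitor:
    """Hypothetical visitor interface; the method names are assumptions."""

    def visit_text(self, text_node):
        pass

    def begin_element(self, element):
        pass

    def end_element(self, element):
        pass


class DomProcessor:
    """Walks a document depth-first and hands every node to a visitor."""

    def __init__(self, document):
        self.document = document

    def accept(self, visitor):
        self._walk(self.document.documentElement, visitor)

    def _walk(self, node, visitor):
        if node.nodeType == Node.TEXT_NODE:
            visitor.visit_text(node)
        elif node.nodeType == Node.ELEMENT_NODE:
            visitor.begin_element(node)
            # Iterate over a copy, so a visitor may add or remove siblings.
            for child in list(node.childNodes):
                self._walk(child, visitor)
            visitor.end_element(node)


class UppercaseVisitor(DomVisitor):
    """Toy visitor: touches only text nodes and leaves the markup alone."""

    def visit_text(self, text_node):
        text_node.data = text_node.data.upper()


document = minidom.parseString("<message>hi <b>there</b></message>")
DomProcessor(document).accept(UppercaseVisitor())
result = document.documentElement.toxml()
print(result)
```

The point of the split is that the traversal is written once, in the processor, while each visitor only decides what to do with the nodes it is shown.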

Implemented DomVisitor subclasses are:
  • IgnoreLinksDomVisitor - a very important proxy visitor that disables processing of all content under <a> tags - after all, we do not want to expand emoticons inside links (and :/ is a very popular one)
  • EmoticonExpander - expands emoticons into images
  • DomTextRegexpVisitors - expands every match of a regular expression in the DomText nodes of the whole document into a list of nodes (this requires another subclass)
So all we need to do is parse the message into a DomDocument, wrap it in a DomProcessor and run every required DomVisitor.
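The sketch below shows how a proxy visitor and an emoticon expander could work together. Again, this is only an illustration under stated assumptions: xml.dom.minidom replaces QDomDocument, the emoticon table and image file names are made up, putting the textual form into the alt attribute is our choice for the example, and the class internals are hypothetical rather than a copy of the real implementation.

```python
import re
from xml.dom import minidom, Node

# Hypothetical emoticon table; the file names are made up for this example.
EMOTICONS = {":)": "smile.png", ":(": "sad.png", ":/": "unsure.png"}
PATTERN = re.compile("|".join(re.escape(text) for text in EMOTICONS))


def walk(node, visitor):
    """Depth-first traversal that feeds every node to the visitor."""
    if node.nodeType == Node.TEXT_NODE:
        visitor.visit_text(node)
    elif node.nodeType == Node.ELEMENT_NODE:
        visitor.begin_element(node)
        for child in list(node.childNodes):  # copy: visitors edit the tree
            walk(child, visitor)
        visitor.end_element(node)


class EmoticonExpander:
    """Splits a text node around emoticons and inserts img elements."""

    def begin_element(self, element):
        pass

    def end_element(self, element):
        pass

    def visit_text(self, text_node):
        doc, parent, text = text_node.ownerDocument, text_node.parentNode, text_node.data
        pos = 0
        for match in PATTERN.finditer(text):
            if match.start() > pos:
                before = doc.createTextNode(text[pos:match.start()])
                parent.insertBefore(before, text_node)
            img = doc.createElement("img")
            img.setAttribute("src", EMOTICONS[match.group()])
            img.setAttribute("alt", match.group())  # keep the textual form
            parent.insertBefore(img, text_node)
            pos = match.end()
        if pos == 0:
            return  # no emoticon found, leave the node untouched
        if pos < len(text):
            text_node.data = text[pos:]  # keep the remaining tail
        else:
            parent.removeChild(text_node)


class IgnoreLinksVisitor:
    """Proxy: forwards calls to the wrapped visitor only outside a elements."""

    def __init__(self, wrapped):
        self.wrapped = wrapped
        self.link_depth = 0

    def begin_element(self, element):
        if element.tagName == "a":
            self.link_depth += 1
        elif self.link_depth == 0:
            self.wrapped.begin_element(element)

    def end_element(self, element):
        if element.tagName == "a":
            self.link_depth -= 1
        elif self.link_depth == 0:
            self.wrapped.end_element(element)

    def visit_text(self, text_node):
        if self.link_depth == 0:
            self.wrapped.visit_text(text_node)


document = minidom.parseString(
    '<message>ok :) <a href="http://example.com">http://example.com</a></message>'
)
walk(document.documentElement, IgnoreLinksVisitor(EmoticonExpander()))
print(document.documentElement.toxml())
```

In this example the :) in the message body becomes an img element, while the :/ hidden inside http:// stays untouched because the proxy never forwards nodes under the link.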

Collapsing emoticons

Users also want to be able to copy the content of a message in plain-text format, with a textual representation of emoticons. It would be nice to be able to run our inverted DomVisitor instances (like EmoticonCollapser) on text copied from QWebView. Unfortunately, this is not possible, as many Adium styles use files that are not valid XML (just HTML). We have no way to convert the copied content to valid XML, so we have to use regular expressions to extract emoticon and link content.
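A regular-expression fallback of this kind could look roughly like the sketch below. It assumes that every emoticon image carries its textual form in an alt attribute, which is an assumption made for this example, and the patterns and the collapse function are hypothetical, not the expressions we actually ship.

```python
import re
from html import unescape

# Assumption: each emoticon image keeps its textual form in alt="...".
IMG = re.compile(r'<img\b[^>]*\balt="([^"]*)"[^>]*>', re.IGNORECASE)
# Any other tag, including unclosed HTML ones such as <br>, is dropped.
TAG = re.compile(r"<[^>]+>")


def collapse(html):
    """Turn copied HTML into plain text with textual emoticons."""
    text = IMG.sub(r"\1", html)   # image -> its alt text
    text = TAG.sub("", text)      # link markup goes away, link text stays
    return unescape(text)         # decode entities such as &lt;


copied = 'Hi <img src="smile.png" alt=":)"> see <a href="http://example.com">http://example.com</a><br>'
plain = collapse(copied)
print(plain)
```

Unlike the visitors, this works on markup that is not well-formed XML, at the price of being only as robust as the patterns themselves.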

For some time we considered using an invisible QWebView and manipulating QWebElement objects instead of QDomNode, but this seems like overkill. We also do not want to force our users to use only fully XML-compliant Adium styles just so that copied text is always valid XML... I would be very happy to hear about a working Qt-based HTML to XML converter.

Code

Code is available at our Gitorious repository. Feel free to download it!