
‘LEXSS’ injection: How to bypass lexical parsers by abusing HTML parsing logic


Adam Bannister

24 June 2021 at 15:29 UTC

Updated: 24 June 2021 at 15:34 UTC

Researcher digs deeper into technique that uncovered flaws in popular WYSIWYG HTML text editors


A security researcher has penned a deep dive on bypassing lexical parsers with special HTML tags that leverage HTML parsing logic to ultimately execute arbitrary JavaScript code.

Chris Davis, security consultant at Bishop Fox, has previously deployed the hacking technique to unearth high-risk cross-site scripting (XSS) vulnerabilities in two popular What-You-See-Is-What-You-Get (WYSIWYG) HTML text editors.

The flaws in TinyMCE (disclosed in August 2020) and Froala (disclosed earlier this month) affected a combined 700,000 websites that included the applications.

What is lexical parsing?

“Lexical parsing is a very sophisticated way of preventing XSS because it evaluates whether the data is instructions or plaintext before performing additional logic such as blocking or encoding the data,” says Davis in his technical write-up.

It separates “user data (i.e., non-dangerous text content) from computer instructions (i.e., JavaScript and certain dangerous HTML tags)”, he continues. “In instances where the user is allowed a subset of HTML by design, this type of parsing can be used to determine what is allowed content and what will be blocked or sanitized.”
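
To make the idea concrete, here is a minimal Python sketch of an allowlist sanitizer built on the standard library's `html.parser`. The `ALLOWED` policy, the `AllowlistSanitizer` class, and the `sanitize` helper are hypothetical names invented for illustration. They are not taken from Davis's write-up or from any of the editors mentioned, and the sketch is not a production-grade sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy for illustration: tag name -> permitted attribute names.
ALLOWED = {"b": set(), "i": set(), "p": set(), "a": {"href"}}


class AllowlistSanitizer(HTMLParser):
    """Re-emit allowlisted tags and attributes; treat everything else as text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # disallowed element: the tag itself is dropped
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue  # e.g. onclick, onerror, style
            # Assumption: only http(s) links are acceptable, so javascript: URLs are dropped.
            if name == "href" and not value.lower().startswith(("http://", "https://")):
                continue
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Anything the parser classifies as text is re-encoded before output.
        self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = AllowlistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Under this hypothetical policy, `sanitize('<b onclick="x()">hi</b>')` keeps the `<b>` element but drops the `onclick` attribute. Note that the sketch re-parses the markup with its own parser, which is exactly the kind of design the research targets.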

As well as in WYSIWYG HTML editors, lexical sanitizing parsers are widely used in rich-text editors, email clients, and sanitization libraries such as DOMPurify to guard against XSS attacks.


However, Davis demonstrates how lexical parsers can be tricked into viewing dangerous content “as text data and not computer instructions”.

This is possible because “HTML is not designed to be parsed twice; slight variations in parsing can occur between the initial HTML parser and the sanitizing parser; and sanitizing parsers often implement their own processing logic”.
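
The kind of disagreement described above can be sketched in Python. In this illustrative example, `NAIVE_TAG` and `naive_tags` are a hypothetical sanitizer-side lexer that understands quoted attribute values but nothing else, while the standard library's `html.parser` stands in for a parser that treats `<style>` content as raw text. Neither is a browser, and `html.parser` is not a full implementation of the HTML specification; the payload is an assumption made for illustration, not one quoted from Davis's write-up.

```python
import re
from html.parser import HTMLParser

# Hypothetical sanitizer-side lexer: it respects quoted attribute values
# but has no notion of raw-text elements such as <style>.
NAIVE_TAG = re.compile(r"""<([a-zA-Z][a-zA-Z0-9]*)((?:"[^"]*"|'[^']*'|[^'">])*)>""")


def naive_tags(markup):
    """Return the start-tag names the naive lexer believes are present."""
    return [m.group(1).lower() for m in NAIVE_TAG.finditer(markup)]


class TagCollector(HTMLParser):
    """Record every start tag (with attributes) that html.parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append((tag, dict(attrs)))


def context_aware_tags(markup):
    collector = TagCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags


# Illustrative payload: the closing </style> hides inside an attribute value.
PAYLOAD = '<style><a title="</style><img src=x onerror=alert(1)>">'

print(naive_tags(PAYLOAD))          # the naive lexer sees <style> and <a>
print(context_aware_tags(PAYLOAD))  # html.parser sees <style> and <img onerror=...>
```

The naive lexer files the `<img>` away as harmless attribute text, while the second parser ends the `<style>` element at the first `</style>` and then reports a real `<img>` tag carrying an `onerror` handler. A sanitizer that approves the first reading while the page is rendered according to the second is the gap the technique exploits.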

Context states and namespace confusion

Key to the research are context states: data state categories into which HTML elements are sorted by the HTML parser during tokenization. “Different supplied elements alter how data in those elements is parsed and rendered by switching the context state of the data,” said Davis.
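
A rough way to observe a context switch is to feed the same fragment to a parser inside and outside a `<script>` element. The Python sketch below uses the standard library's `html.parser`, which only approximates the tokenizer states defined in the HTML specification; `EventLog` and `events` are hypothetical helper names introduced for this example.

```python
from html.parser import HTMLParser


class EventLog(HTMLParser):
    """Record the sequence of tokens the parser emits."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        self.events.append(("text", data))


def events(markup):
    log = EventLog()
    log.feed(markup)
    log.close()
    return log.events


FRAGMENT = "<b>hi</b>"

# Ordinary content: <b> is tokenized as a tag.
print(events(FRAGMENT))

# Inside <script>, the parser switches state and the same bytes become text.
print(events(f"<script>{FRAGMENT}</script>"))
```

In the first call the parser reports a `b` start tag; in the second, the only tags reported are `script`, and `<b>hi</b>` arrives as plain text.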

The researcher’s ‘LEXSS’ technique also exploits namespace confusion, an area of research impressively furthered by Michał Bentkowski’s DOMPurify bypass in 2020. “HTML parser will context switch to separate namespaces when it encounters MathML or SVG elements, which can be used to confuse the parser,” said Davis.

Conceptualizing XSS risk

The potential impact of XSS attacks varies by context.

“In many cases the risk will be nominal and in others catastrophic,” Chris Davis tells The Daily Swig. In the most severe cases, XSS could be exploited “to do things like transfer of funds, execution of financial securities trades or exfiltration of top secret data”.

“One way to conceptualize the risk of XSS is to think about when you’re at any website, what could an attacker do if they controlled your actions? As XSS allows that level of control within a site’s origin, often unbeknownst to the user.”


As for preventative steps, “when implementing applications that allow some user-controlled HTML by design”, developers should “process the HTML as close to the original parse as possible”, explains Davis.

“For organizations that are not developing these types of solutions but rather including them in their applications, a good patch policy will go a long way in preventing exploitation.”

Organizations should also “consider implementing a content security policy (CSP) into the application” to “block JavaScript injection at a browser-defined level”.
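
As a rough illustration of the CSP advice, the Python sketch below assembles a `Content-Security-Policy` header value with a per-response nonce. The `build_csp` helper and the specific directive values are assumptions chosen for the example, not a policy recommended in the write-up; a real policy has to be tailored to the application.

```python
import secrets


def build_csp(nonce):
    """Assemble an example Content-Security-Policy header value."""
    # Assumed example policy: scripts load only from the site's own origin
    # or from <script> elements carrying the per-response nonce.
    directives = {
        "default-src": "'self'",
        "script-src": f"'self' 'nonce-{nonce}'",
        "object-src": "'none'",
        "base-uri": "'none'",
    }
    return "; ".join(f"{name} {value}" for name, value in directives.items())


# Generate a fresh, unpredictable nonce for each response.
nonce = secrets.token_urlsafe(16)
header = ("Content-Security-Policy", build_csp(nonce))
print(header)
```

Because the policy omits `'unsafe-inline'`, a browser enforcing it refuses to run inline event handlers such as an injected `onerror` attribute, even if the markup slips past a sanitizer.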

Future research

Asked why he pursued this research avenue, Davis tells The Daily Swig: “This type of context state parsing based research is so widespread yet relatively uncovered.

“So getting a better understanding of how HTML in general is parsed and how rich-text style editors or sanitization libraries then parse that data and how we can exploit that knowledge was, to me, fascinating.”

He adds that he expects similar flaws to surface in “some really impactful targets” such as email clients, and that digging further into HTML parsing could also be fruitful.

“I really hope this work aids other researchers in taking it to the next level,” he concludes.
