Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

External entities can cause officedissector to freeze #13

Closed
dputtick opened this issue Jul 31, 2017 · 3 comments
Closed

External entities can cause officedissector to freeze #13

dputtick opened this issue Jul 31, 2017 · 3 comments

Comments

@dputtick
Copy link
Contributor

Hi there - currently, officedissector is vulnerable to a specific type of denial of service using external entitites. For example, an office document containing an external entity linking to /dev/random will wait for /dev/random to return a character, causing officedissector to hang without returning an error or timing out. Some possible solutions:

  • Add a timeout for xml parsing. Unfortunately, it's tough to design a timeout that works well for every use case.
  • Turn off parsing external entities in lxml. This is an easy switch to flip, however, I'm not sure whether there might sometimes be a valid reason for an office document to contain external entitites.
  • Avoid parsing external entities, but list them. This seems like the best solution, because it prevents the dos attack and allows the user to identify documents that might be using the attack with malicious intent. I spent some time looking through the lxml API docs and couldn't find any hints about how to implement this. It looks like some of the standard library xml parsers might offer some solutions, but perhaps at the expense of performance, or requiring significant changes to officedissector.

Any thoughts? I would be interested in helping with the patch, but wanted to get your opinion first.

@naegelejd
Copy link
Contributor

Thanks for the write-up @dputtick. I like the second and third options. We should initially disable parsing of external entities in lxml (or maybe make it configurable). I like the idea of listing external entities as they are encountered, but if implementing it would require a ton of work then it may not be worthwhile right now.

If you get a chance, could you try disabling parsing external entities and see how officedissector handles the included tests?

@dputtick
Copy link
Contributor Author

dputtick commented Oct 3, 2017

@naegelejd Sorry I'm late to replying! I wrote a small POC for that change last month, and it passed all of the existing tests. I know that external entities are valid xml, but I'm not sure if a valid office document would ever have them.

@naegelejd
Copy link
Contributor

I'm fine with disallowing external entities in OfficeDissector. If you want to share your POC as a pull request I'd be happy to merge it. Thanks for testing!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants