Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

High entropy string not being detected (by default) #323

Open
connorjburton opened this issue Jan 18, 2022 · 2 comments
Open

High entropy string not being detected (by default) #323

connorjburton opened this issue Jan 18, 2022 · 2 comments
Labels
enhancement New feature or request

Comments

@connorjburton
Copy link

connorjburton commented Jan 18, 2022

🐛 Bug Report

I was trying strings generated by this online tool and could not understand why it wasn't working. https://passwordsgenerator.net/

For example, this string qdKy]^$6bc!hU}Z;VRURf=j9F-}6~Xh~ykM.bKxfHdw_z6=Ura2Z".?!U!;;Kxt!Rw(g>"/Fw~e%gF'4)=\[!c>jbB;Kg=8V%\ is not classed as high entropy to Tartufo

To Reproduce

  • Create a file in a folder
  • Run git init
  • Run git add .
  • Run git commit -m "initial commit"
  • Add a the high entropy string above to the file
  • Run git add .
  • Run docker run --rm -v "$(pwd):/git" godaddy/tartufo pre-commit
  • Does not report secrets detected

Expected Behavior

Should report strings like the above as high entropy by default

Code Example

Environment

Using Docker so environment isn't relevant.

@connorjburton connorjburton added the bug Something isn't working label Jan 18, 2022
@rbailey-godaddy
Copy link
Contributor

@cburton-godaddy, this is expected behavior for the current implementation.

The trick is that tartufo only looks at the entropy of certain strings, namely those that appear to be hexadecimal or base64 encodings. Because your sample string contains many punctuation characters that aren't used by any of those encodings, it is broken up into lots of tiny strings that might be valid base64 or hex encodings, but all of them are so short they are discarded without examination.

We've thought in the past about adding support for other less-common encodings (such as the password hashes typically found in /etc/shadow on Linux systems, as an example) but those would probably be easier to handle using regex detection (like GPG signatures, X.509 keys, etc.).

Similarly, if you wanted to spot sequences like your example above, instead of entropy I'd consider a regex that matches your alphabet (just about everything except whitespace, it appears) and alarm if it's detected (without considering its entropy). This is somewhat guided by how you might encounter such a string in your code -- would it be delimited (by quotation marks, etc.) or on a line by itself in input data (i.e. delimited by beginning-of-line and end-of-line)?

Performing unconstrained entropy detection would be a major change, because we wouldn't be able to spot high-entropy sequences without doing some sort of rolling scan (basically considering every substring of length >= 20 in the entire file).

Assuming you aren't just kicking tartufo to see how it screams :) I'd be interested in a use case or example that we could use as a basis for more concrete and considered evaluation of gaps in the present implementation.

In the meantime, I consider this report a bug only to the extent that it's a "documentation bug" -- the general behavior hasn't changed since before the truffleHog fork, but I reviewed the description on tartufo.readthedocs.io and I agree the paragraph on entropy scanning is misleadingly vague.

@connorjburton
Copy link
Author

Hi @rbailey-godaddy, thanks for the quick reply.

Your logic makes sense, after I posted this issue I discussed with team and I think we came to the same conclusion that there must be some heuristic you are using to find strings to analyze (the base64/hex encoding), otherwise you would have to do a rolling scan as you said, which I presume would be much more expensive.

As a consumer of the package, I would like to see Tartufo handle thsese strings, as they are high entropy and could 100% exist in codebases. However I understand your reasoning that this isn't a simple job and apprecicate the thought and work that has already been put into the package.

The reason I came across this as I was demonstrating our pre commit and build integrations and went to go grab a random high entropy string to test with and encountered this problem.

I also agree that updating the documentation would be great, as this tool was described to us (interally) as a solution to preventing secrets in a repo, a more in-depth description of how the tool works would have been greatly beneficial.

Thanks for your time writing this out! Feel free to close this issue if you think that is appropriate.

@sushantmimani sushantmimani added enhancement New feature or request and removed bug Something isn't working labels Jun 3, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

3 participants