
Cerebro

Cerebro finds secrets such as passwords, tokens, and private keys in a Git repo.

Tech Stack

Cerebro requires:

  • Python 3.5
  • SQLite

Getting Started

Configure Target Repositories

Populate the targets.yaml file in the config directory using the example:

$ cp config/targets.example.yaml config/targets.yaml

Local Usage

Clone this repo and export the following environment variables:

  • CEREBRO_DATABASE_URL - the full path to the SQLite database file

If you wish to receive Cerebro results in Slack, also configure:

  • SLACK_API_URL - The Slack Incoming Webhooks endpoint URL
  • SLACK_CHANNEL_OR_USER - The @user or #channel to send scan notifications to
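
For example, the variables can be exported before running a scan (the values below are placeholders):

$ export CEREBRO_DATABASE_URL=/path/to/cerebro.db
$ export SLACK_API_URL=https://hooks.slack.com/services/T00000000/B00000000/XXXXXXXX
$ export SLACK_CHANNEL_OR_USER="#security"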

Set up the environment:

$ make local-install

Execute the following code snippet (or set up a cron job to run it):

$ python cerebro.py

or

$ make local-run
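
For example, a nightly scan could be scheduled with a crontab entry like the following (the 2 a.m. schedule and repo path are placeholders):

0 2 * * * cd /path/to/cerebro && make local-run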

Run the tests:

$ make local-test

Docker-compose Usage

Copy the env-example file and edit it appropriately:

$ cp env-example .env

Build the Docker environment (it uses the latest Ubuntu base image):

$ make docker-build

Run the tests:

$ make docker-tests

Run Cerebro:

$ make docker-run

Tests

Execute:

$ pytest -sv tests/

Reporting

A summary of results is provided in JSON format by default, or can be delivered via Slack. Detailed results can be reviewed directly in SQLite or [Todo - Add the url of the cerebro dashboard once we have a box configured for it].

Definitions

These definitions describe how raw data is processed and stored:

  • BLOCK_SIZE - the length of any contiguous run of characters (i.e. BASE64 or HEXADECIMAL) whose entropy is checked during a scan. Default is 20
  • TOKENS - BLOCK_SIZE-length runs of characters that were matched during the scan process
  • BLOBS - portions of a file that contain a TOKEN
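
For illustration, candidate blocks of this kind could be extracted with a regular expression per character set. This is only a sketch; candidate_blocks is an illustrative name, not Cerebro's actual code:

import re

BLOCK_SIZE = 20  # default block length from the definitions above

BASE64_RUN = re.compile(r"[A-Za-z0-9+/=]{%d,}" % BLOCK_SIZE)
HEX_RUN = re.compile(r"[0-9a-fA-F]{%d,}" % BLOCK_SIZE)

def candidate_blocks(line):
    """Yield contiguous BASE64 or HEXADECIMAL runs of at least BLOCK_SIZE characters."""
    for pattern in (BASE64_RUN, HEX_RUN):
        for match in pattern.finditer(line):
            yield match.group(0)

for block in candidate_blocks("password = 'ZWVTjPQSdhwRgl204Hc51YCsritMIzn8B=/p9UyeX7xu6KkAGqfm3FJ'"):
    print(block)  # prints the high-entropy run between the quotes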

Design Notes

Three high-level components are involved in the operation of Cerebro:

  • Git Level Operations
    • Pulling the latest commit of the master branch for each repo in targets.yaml, checking for diffs if the repo has been scanned (i.e. pulled) before, and creating sub-directories of "diffed" content (stored in workspace/diffs) for subsequent scanning; a sketch of this step appears after this list.
  • Operating System Level Operations:
    • targets.yaml: a list of repos for cerebro to scan.
    • bad_patterns.txt: a list of regexes used by egrep.
    • egrep: performs recursive regex grepping over each repo from targets.yaml using the patterns in bad_patterns.txt.
  • Python Level Operations:
    • Each matched string is tested for entropy using Shannon's algorithm: a BLOCK_SIZE run of BASE64 characters with entropy greater than 4.5, or a BLOCK_SIZE run of HEXADECIMAL characters with entropy greater than 3.0, is flagged as a TOKEN (see the sketch after this list).
    • For config files (e.g. .conf, .yaml, .ini, .erb, .rb), however, BLOCK_SIZE is reduced to 6 so that shorter chunks with sufficient entropy are still matched.
    • These results are then filtered further by options set in the main.yaml configuration file, e.g. excluding test directories, third-party library/framework directories, or specific files from the search.
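
As an illustration of the Git-level step, a subprocess-based sketch might look like the following. refresh_repo and the clone layout are hypothetical; only the workspace/diffs location comes from the notes above:

import subprocess
from pathlib import Path

WORKSPACE = Path("workspace")

def refresh_repo(url, name):
    """Clone or pull one target repo and store the diff since the last scan.
    Hypothetical helper for illustration only, not Cerebro's actual code."""
    repo_dir = WORKSPACE / name
    if not repo_dir.exists():
        # First scan: clone the repo; the whole tree gets scanned, no diff yet.
        subprocess.check_call(["git", "clone", "--branch", "master", url, str(repo_dir)])
        return None
    # Remember the previously scanned commit, pull, then diff against it.
    before = subprocess.check_output(
        ["git", "-C", str(repo_dir), "rev-parse", "HEAD"],
        universal_newlines=True).strip()
    subprocess.check_call(["git", "-C", str(repo_dir), "pull"])
    diff = subprocess.check_output(
        ["git", "-C", str(repo_dir), "diff", before, "HEAD"],
        universal_newlines=True)
    diff_dir = WORKSPACE / "diffs"
    diff_dir.mkdir(parents=True, exist_ok=True)
    (diff_dir / (name + ".diff")).write_text(diff)
    return diff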
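
And a minimal sketch of the entropy test itself, assuming the thresholds stated above (shannon_entropy and is_token are illustrative names, not Cerebro's actual code):

import math

BASE64_CHARS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/="
HEX_CHARS = "0123456789abcdefABCDEF"

def shannon_entropy(data, charset):
    """Shannon entropy in bits per character of data over the given charset."""
    if not data:
        return 0.0
    entropy = 0.0
    for ch in set(charset):
        p = data.count(ch) / len(data)
        if p > 0:
            entropy -= p * math.log2(p)
    return entropy

def is_token(block):
    # Thresholds from the design notes: > 3.0 bits for HEXADECIMAL runs,
    # > 4.5 bits for BASE64 runs.
    if all(c in HEX_CHARS for c in block):
        return shannon_entropy(block, HEX_CHARS) > 3.0
    if all(c in BASE64_CHARS for c in block):
        return shannon_entropy(block, BASE64_CHARS) > 4.5
    return False

print(is_token("8f2a9bc4d17e0356fa4bd98c21e7f053"))  # high-entropy hex -> True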
