A collection of python tools useful in an Internet context.
In particular these functions are of interest:
- Managing a headless Chrome instance to model web browsing.
- Parallel traceroute in python (as of now not compatible with Windows due to WinAPI issues)
- Graph / Network generator of networks captured with either the traceroute or the web-browser simulation
Depends heavily on other python modules and contains a modified version of py-traceroute (https://github.com/dnaeon/pytraceroute).
And a word of warning: this repository exists to show code used in my thesis, not primarily so others can reuse it. The code quality is awful, use at your own risk :-)
If you use this in your research, please cite (if possible):
- Lindeberg, Fredrik. “Coordinating the Internet: Thought Styles, Technology and Coordination.” PhD of Technology, Linköping University, 2021. https://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-173713
bibtex:
@thesis{LindebergCoordinating2021,
title = {Coordinating the {{Internet}}: {{Thought}} Styles, Technology and Coordination},
shorttitle = {Coordinating the {{Internet}}},
author = {Lindeberg, Fredrik},
date = {2021},
institution = {{Link\"oping University}},
url = {https://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-161812},
langid = {english},
type = {PhD of Technology}
}
An example of a visit to four Swedish news sites.
First off, we need all dependencies
This project depends on chrome-har-capturer (see https://github.com/cyrus-and/chrome-har-capturer) to capture HARs with the help of Google Chrome / Chromium.
Chrome or Chromium is easily installed by either your package manager or by googling a download link. As long as the version is relatively modern there should be no issues.
chrome-har-capturer is installed via npm like this (--global is needed for the utility to be on path):
npm install --global chrome-har-capturer
If you are missing npm, head over to https://nodejs.org/en/download/ and download and install npm / nodejs.
Graphviz is the most decent graph generator I have found, so I really suggest it. Graphviz is available on most Linux distributions and in brew for OSX.
You also need some python modules, and of course python (3+) itself. Python is most easily downloaded from its homepage (https://www.python.org/downloads/) or your package manager of choice.
When you have pip(3) installed, run the following:
pip3 install -r requirements.txt
Or manually:
pip3 install networkx matplotlib splinter bs4 pyasn dnspython pycountry elevate pandas cleanco string_grouper pygraphviz requests
Depending on the system, pip3 might not be aliased, in which case pip should be used instead. Also, some systems might require pip3 to be run as root (or use the --user flag to do a user install).
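To check that everything is in place before running the tools, something like the following can help. It is only a minimal sketch and not part of this repository. The import names are assumptions derived from the pip package names above (for example, dnspython is imported as dns), and dot is the Graphviz executable.

```python
import importlib.util
import shutil

# Import names assumed from the pip packages listed above
# (dnspython is imported as "dns"); adjust if your setup differs.
MODULES = [
    "networkx", "matplotlib", "splinter", "bs4", "pyasn", "dns",
    "pycountry", "elevate", "pandas", "cleanco", "string_grouper",
    "pygraphviz", "requests",
]

# Command line tools that should be on PATH ("dot" ships with Graphviz).
TOOLS = ["chrome-har-capturer", "dot"]


def missing_modules(names):
    """Return the module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_tools(names):
    """Return the executables that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    for name in missing_modules(MODULES):
        print("missing python module:", name)
    for name in missing_tools(TOOLS):
        print("missing tool on PATH:", name)
```

If nothing is printed, all modules and tools were found.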
This repository contains tools usable in general python projects, such as (simple and task specific) har-parsing, parallelized UDP traceroute, and a tool for generating graphs of said traceroutes.
See the src/ directory, and in particular generatemap.py, parallelltracert.py, harutilities.py and internetgraph.py. The har generation is done by generatehar.sh, a bash-script, which handles input / output and invokes chrome-har-capturer with decent arguments (i.e. timeouts).
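As an illustration of what simple har-parsing can look like, here is a minimal sketch that counts requests per hostname in a har. It is not the code in harutilities.py. The function names are made up for this example, and it relies only on the standard HAR layout (log, entries, request, url).

```python
import json
from collections import Counter
from urllib.parse import urlsplit


def load_har(path):
    """Read a har file from disk and return it as a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def hosts_in_har(har):
    """Count the requests per hostname in a parsed har (a dict)."""
    hosts = Counter()
    for entry in har.get("log", {}).get("entries", []):
        host = urlsplit(entry["request"]["url"]).hostname
        if host:  # skip entries without a hostname, e.g. data: urls
            hosts[host] += 1
    return hosts
```

The resulting hostnames are the kind of input a traceroute or a graph could then be built from.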
The most interesting tool to use quickly is generatemap.py, which generates a map / graph from either a list of urls or a set of har-files. Due to the nature of the traceroute (modifying packets at low level), root access is required. On most systems this can be attained with sudo. This means that you should make sure that you understand what the Python-script does before running it.
Some example usages:
## generatemap.py will always ask for privilege escalation unless it already has it
## Visit New York Times and CNN, and then show the graph (-w for website)
python3 generatemap.py -w www.nytimes.com www.cnn.com
## Draw a single graph based on a har (or set of hars) and put the graphs in "myoutputfolder"
python3 generatemap.py -e inputfolder/some_data.har someotherfolder/some_data2.har -o myoutputfolder/
## Draw separate graphs based on urls quietly (-s or --separate to do individual runs, -q or --quiet for no output)
python3 generatemap.py -s -q -w thesun.co.uk nytimes.com cnn.com -o testingsep
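To give an idea of what such a graph consists of, here is a minimal sketch of merging several traceroute paths into one adjacency map. It is not the code in internetgraph.py. The function name, the input format (destination mapped to an ordered list of hop addresses, with None for a hop that did not answer) and the choice to link across unanswered hops are all assumptions made for this example.

```python
from collections import defaultdict


def build_graph(routes):
    """Merge traceroute paths into one adjacency map.

    routes maps a destination to its ordered list of hop addresses;
    None marks a hop that did not answer.
    """
    graph = defaultdict(set)
    for hops in routes.values():
        # Drop unanswered hops, which links the neighbours across the gap.
        answered = [hop for hop in hops if hop is not None]
        for near, far in zip(answered, answered[1:]):
            if near != far:  # ignore repeated answers from the same hop
                graph[near].add(far)
    return graph


if __name__ == "__main__":
    # Addresses from documentation ranges, purely illustrative.
    example = {
        "site-a": ["10.0.0.1", "192.0.2.1", None, "198.51.100.7"],
        "site-b": ["10.0.0.1", "192.0.2.1", "203.0.113.9"],
    }
    for node, neighbours in sorted(build_graph(example).items()):
        print(node, "->", sorted(neighbours))
```

Shared hops end up as single nodes, which is what makes paths to different sites show up as one connected map.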