This tools puprose is to scan current jobs in TripleO CI and report about their failure reasons. In addition it could be used for tracking infra issues, bugs, etc.
Based on a clean Fedora 24 cloud image
STEPS::
useradd sova
update sudoers w/ %sova ALL=(ALL) NOPASSWD: ALL
su - sova
sudo yum install -y \
git \
python3-virtualenv \
git \
gcc \
python-devel \
libyaml \
libffi* \
libxml2* \
libxslt* \
python-lxml \
libxslt-python \
redhat-rpm-config \
openssl-devel
git clone https://github.com/sshnaidm/sova.git
cd sova
virtualenv-3 ~/virtenv
source ~/virtenv/bin/activate
pip install -r requirements.txt
# setup gerrit authentication
# contact #oooq on freenode to get the required ssh key, robi_id_rsa
chmod 600 /tmp/robi_id_rsa
cp /tmp/robi_id_rsa ~/sova/
python manual.py
# wait for the data to be populated
# update the flask config to listen on all ports
s/app.run()/app.run(host='0.0.0.0')/ on flaskapp.py
python flaskapp.py
# now point your browser at <ip>:5000 and you should have the equivalent of https://status-tripleoci.rhcloud.com/
There are two pages in TripleO CI now: https://tripleo.org/cistatus.html and https://tripleo.org/cistatus-periodic.html (doesn't work now) which shows current patch and periodic jobs and their status - failed or succeeded.
The scripts that compose these pages is in tripleo-ci repository: https://github.com/openstack-infra/tripleo-ci/blob/master/scripts/tripleo-jobs-gerrit.py and https://github.com/openstack-infra/tripleo-ci/blob/master/scripts/tripleo-jobs.py
First one is working right now when upstream Jenkins are not available (security issues), the second will work when upstream Jenkins will be accessible again.
First script just go over all upstream Jenkinses and requests it for all tripleo jobs (by names), then reads their status and shows on the page. The second one connects to upstream gerrit by ssh and gets patches data from it. It asks for patches from a few projects, which run tripleo jobs (but not all). Periodic jobs are not tracked now, because they are not in gerrit, you can see their logs in:
- https://logs.openstack.org/periodic/periodic-tripleo-ci-f22-ha-liberty/
- https://logs.openstack.org/periodic/periodic-tripleo-ci-f22-ha-mitaka/
- https://logs.openstack.org/periodic/periodic-tripleo-ci-f22-ha/
- https://logs.openstack.org/periodic/periodic-tripleo-ci-f22-nonha/
- https://logs.openstack.org/periodic/periodic-tripleo-ci-f22-upgrades/
This tool shall read the data about all Tripleo CI jobs and report the reasons for failures if they are failed. It could be run from outside by any external script (by importing meow dunction) or running watchcat.py in command line (with appropriate parameters in it).
Firstly it connects to gerrit by ssh using a special user robo
(ask sshnaidm for
its private SSH key) and gets patches for all configured project in config.py.
Gerrit connection is done in utils.py file. The patches amount is limited to 200 to avoid overloading. This gives us all data about the patches that are opened now and run TripleO CI in them.
For periodic jobs it connects to links (see above) and just parses the HTML page to get all periodic jobs info from it.
Then it parses gerrit data or HTML data in case of periodic jobs and create list of all jobs we have with some jobs related data like its length, status, time, etc. See periodic.py and patches.py files for additional info.
Then it passes a given filters (in filters.py) and remain only those jobs we want to analyze.
The script analyzes all jobs (analysis.py) and returns data about them. If script runs in console it prints the info to console, if it run by app.py it creates index.html using Jinja template template.html. For CSS and JS it uses bootstrap framework (https://getbootstrap.com)
The analysis is done by searching log file for specific patterns (patterns.py).
The pattern contains:
- file that it should contain it
- text or regular expression to search
- resulting message if pattern was found
- tag - if it's code issue, or infra, or anything else
The script downloads the mentioned in the pattern file (JobFile in utils.py) and searches there for a text or regular expression.
Regexp pattern could catch particular text and display it in the message.
If pattern is found, the resulting message will be added to output. The output could contain more than one message if more than one pattern was found.
The report in console will contain jobs date, reasons for failure if found, link to logs. The report in HTML page will contain much more info: patch links, commit messages, length and time, project and branch, etc. It will be also a little statistics about the tags - how many percents "infra" issues and how many jobs has known reasons. (TODO)
- Generate statistics for today, yesterday, last week
- Show statistics in the page
- Find acceptable limits for patches amount
- Add grids to pages to look pretty
- Find a hosting for the page (openshift could be as temporary solution)
- Add parallelization to get web pages and analyzing
- Where to keep SSH private key for Gerrit?