Documentation of backend components and admin procedures for Toolforge. See Help:Toolforge for user-facing documentation about actually using Toolforge to run your bots and webservices.
Performing admin procedures requires having admin permissions on Toolforge. There is not a single "admin" flag, but a set of interrelated permissions you can be granted. These are described in detail in the page Toolforge roots and Toolforge admins.
Failover
Tools should be able to survive the failure of any one virt* node. Some items may need manual failover.
WebProxy
The front web proxy is now a stateless web service.
There are two tools-proxy-N VMs in the tools project, which previously ran Dynamicproxy and nowadays just proxy everything to the K8s HAProxies.
The only meaningful thing that currently happens on them is the toolviews counting based on the access logs. Otherwise we could remove those nodes and just point to HAProxy.
If one VM is not working correctly, we can fail over from one VM to the other by manually reassigning the floating IP in Horizon or from the OpenStack CLI.
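The CLI variant of that floating IP move can be sketched as below. This is a minimal sketch, not the team's established procedure: the helper names, the VM names tools-proxy-1 and tools-proxy-2, and the address 203.0.113.10 are placeholders, and it assumes an OpenStack CLI session that already has credentials for the tools project. It only prints the commands unless RUN=1 is set.

```shell
# Dry run by default: print each command. Set RUN=1 to actually execute.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Move a floating IP from the broken VM to the healthy one.
# Arguments: <from-vm> <to-vm> <floating-ip> (all placeholders here).
failover_floating_ip() {
    from_vm="$1"
    to_vm="$2"
    ip="$3"
    run openstack server remove floating ip "$from_vm" "$ip" &&
        run openstack server add floating ip "$to_vm" "$ip"
}

# Example with hypothetical VM names and a documentation-range address.
failover_floating_ip tools-proxy-1 tools-proxy-2 203.0.113.10
```

Keeping the dry run as the default makes it easy to review the exact commands before touching a production address.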
Static webserver
This is a stateless simple nginx http server. Simply switch the floating IP from tools-static-10 to tools-static-11 (or vice versa) to switch over. Recovery is equally trivial: just bring the machine back up and make sure puppet is ok.
Checker service
This is the service that Icinga hits to check status of several services. It's totally stateless.
See Portal:Toolforge/Admin/Toolschecker
Redis
Redis uses Sentinel to automatically fail over in case of a node failure.
Prometheus
See Portal:Toolforge/Admin/Prometheus#Failover.
tools-service
Service nodes run the Toolforge internal aptly service, to serve .deb packages as a repository for all the other nodes.
Command orchestration
Toolforge and Toolsbeta both have a local cumin server.
Administrative tasks
Logging in as root
For normal login root access see Toolforge roots and Toolforge admins.
In case the normal login does not work, for example due to an LDAP failure, administrators can also log in directly as root. To prepare for that occasion, generate a separate key with ssh-keygen, add an entry to the passwords::root::extra_keys hash in Horizon's 'Project Puppet' section with your shell username as key and your public key as value, and wait a Puppet cycle to have your key added to the root accounts. Add to your ~/.ssh/config:
# Use different identity for Tools root.
Match host *.tools.eqiad1.wikimedia.cloud user root
    IdentityFile ~/.ssh/your_secret_root_key
The code that reads passwords::root::extra_keys is in labs/private:modules/passwords/manifests/init.pp.
Blocking all non-root logins is useful for dealing with security-critical situations. Just touch /etc/nologin and PAM will prevent any and all non-root logins.
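A small sketch of switching that lockout on and off follows. The helper names and the NOLOGIN_FILE override are made up for illustration; on a real bastion the functions would need to run as root against /etc/nologin. Writing a message instead of a bare touch is optional: pam_nologin shows the file's contents to users it turns away.

```shell
# Path is overridable so the helpers can be tried on a scratch file first.
NOLOGIN_FILE="${NOLOGIN_FILE:-/etc/nologin}"

# Block non-root logins; the message is shown to rejected users.
block_logins() {
    printf '%s\n' "${1:-Logins are temporarily disabled for maintenance.}" > "$NOLOGIN_FILE"
}

# Re-enable logins by removing the file again.
allow_logins() {
    rm -f "$NOLOGIN_FILE"
}

# Example (as root): block_logins "Security maintenance in progress"
#                    ... investigate ...
#                    allow_logins
```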
Users are increasingly noticing slowness on tools-login due to either CPU or IOPS exhaustion caused by people running processes there instead of on Kubernetes. Here are some tips for finding the processes in need of killing:
$ iotop
$ ps axo user:32,pid,cmd | grep -Ev "^($USER|root|daemon|_lldpd|messagebus|nagios|nslcd|ntp|prometheus|statd|syslog|Debian-exim|www-data)" | grep -ivE 'screen|tmux|-bash|mosh-server|sshd:|/bin/bash|/bin/zsh'
pyb.py: kill with extreme prejudice.
!log something like: !log tools.$TOOL Killed $PROC process running on tools-bastion-NN. See https://wikitech.wikimedia.org/wiki/Help:Toolforge/Jobs_framework for instructions on running jobs on Kubernetes.
Local packages are provided by an aptly repository on tools-services-05.
On tools-services-05, you can manipulate the package database by various commands; cf. aptly(1). Afterwards, you need to publish the database to the file Packages by (for the trusty-tools repository) aptly publish --skip-signing update trusty-tools. To use the packages on the clients you need to wait 30 minutes or run apt-get update. In general, you should never just delete packages, but move them to ~tools.admin/archived-packages.
You can always see where a package is (would be) coming from with apt-cache showpkg $package.
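The add-then-publish cycle described above might look like the following sketch. It is run on the aptly host and prints the commands unless RUN=1 is set. The helper names and the package file name are hypothetical, and it assumes the local aptly repo carries the same name as the published distribution (trusty-tools), which this page does not confirm; only the publish command is taken from the text above.

```shell
# Dry run by default: print each command. Set RUN=1 to actually execute.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "+ $*"
    fi
}

# Add a .deb to a local aptly repo, then republish it.
# Arguments: <repo-and-distribution-name> <path-to-deb>
publish_local_package() {
    repo="$1"
    deb="$2"
    run aptly repo add "$repo" "$deb" &&
        run aptly publish --skip-signing update "$repo"
}

# Example with a hypothetical package file.
publish_local_package trusty-tools ./example-package_1.0_all.deb
```

Clients pick up the result after their next apt-get update.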
Package repositories
Packagers effectively get root on our systems, as they could add a rootkit to the package, or upload an unsafe sshd version, and apt-get will happily install it.
Hardness clause: in extraordinary cases, and for 'grandfathered in' packages, we can deviate from this policy, as long as security and maintainability are kept in mind.
apt.wikimedia.org
We assume that whatever is good for production is also OK for Toolforge.
aptly
We manage the aptly repository ourselves.
A list of locally maintained packages can be found under /local packages.
Building packages
Deploy new misctools package
Testing/QA for a new tools-webservice package
See also tools-webservice source tree README.
There is a simple flask app in toolsbeta using the tool test that is set up to be deployed via webservice on Kubernetes.
After running become test, you can go to the qa/tools-webservice directory. This is checked out via anonymous https, and is suitable for checking out a patch you are reviewing. There is usually an untracked file in there that is useful here. The webservice file at the root is just a copy of the one in the scripts folder in the repo. The only difference is:
9d8
< sys.path.insert(0, '')
That exchanges the distribution installed package in the python path for the local directory, so if you run ./webservice $somecommand it will run what is in your local folder rather than what is in /usr/lib/python3/dist-packages/. If you are testing changes made directly to scripts/webservice in the repo, you will likely need to copy that over the file and add sys.path.insert(0, "") after the import sys line.
If there is no import sys line in this version of the code, add one! This should let you bang on your new version without having to mess with packaging, yet.
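The copy-and-patch step can be scripted roughly as follows. This is a sketch under assumptions: patch_webservice is a hypothetical helper name, it relies on GNU sed (the 0,/re/ address and \n in the replacement), and it assumes the script starts with a shebang line.

```shell
# Copy scripts/webservice to a local runnable file and make it prefer the
# local checkout over the installed package.
# Arguments: <source script> <destination file>
patch_webservice() {
    src="$1"
    dest="$2"
    cp "$src" "$dest" || return 1
    # If there is no "import sys" line, add one after the first (shebang) line.
    if ! grep -q '^import sys$' "$dest"; then
        sed -i '1a import sys' "$dest"
    fi
    # Insert the path override right after the first "import sys", only once.
    if ! grep -q '^sys.path.insert(0, "")$' "$dest"; then
        sed -i '0,/^import sys$/s//import sys\nsys.path.insert(0, "")/' "$dest"
    fi
    chmod +x "$dest"
}

# Example, from the qa/tools-webservice checkout:
#   patch_webservice scripts/webservice ./webservice
#   ./webservice $somecommand
```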
To get a look at webserver statistics, goaccess is installed on the webproxies. Usage:
goaccess --date-format="%d/%b/%Y" --log-format='%h - - [%d:%t %^] "%r" %s %b "%R" "%u"' -q -f/var/log/nginx/access.log
Interactive key bindings are documented on the man page. HTML output is supported by piping to a file. Note that nginx logs are rotated (twice?) daily, so there is only very recent data available.
Banning an IP from tool labs
On Hiera:Tools, add the IP to the list of dynamicproxy::banned_ips, then force a puppet run on the webproxies. Add a note to Help:Toolforge/Banned explaining why. The user will get a message like [1].
Deploying the main web page
This website (plus the 403/500/503 error pages) is hosted under tools.admin. To deploy:
$ become admin
$ cd tool-admin-web
$ git pull
Regenerate replica.my.cnf
This requires access to the cloudcontrol host which is running maintain-dbusers, and can be done as follows:
$ ssh cloudcontrolXXXX.eqiad.wmnet
$ sudo /usr/local/sbin/maintain-dbusers delete tools.${NAME} --account-type=tool
# or
$ sudo /usr/local/sbin/maintain-dbusers delete ${USERNAME} --account-type=user
Once the account has been deleted, the maintain-dbusers service will automatically recreate the user account.
Debugging bad MariaDB credentials
Sometimes things go wrong and a user's replica.my.cnf credentials don't propagate everywhere. You can check the status on various servers to try and narrow down what went wrong.
The database credentials needed are in /etc/dbusers.yaml on the cloudcontrol host running maintain-dbusers.
$ ssh cloudcontrolXXXX.eqiad.wmnet
$ sudo cat /etc/dbusers.yaml
# look for the accounts-backend['password'] for the m5-master connections (user: labsdbaccounts)
# look for the labsdbs['password'] for the other connections (user: labsdbadmin)
$ CHECK_UID=u12345  # User id to check for
# Check if the user is in our meta datastore
$ mariadb -h m5-master.eqiad.wmnet -u labsdbaccounts -p -e "USE labsdbaccounts; SELECT * FROM account WHERE mysql_username='${CHECK_UID}'\G"
# Check if all the accounts are created in the labsdb boxes from meta datastore.
$ ACCT_ID=....  # Account_id is foreign key (id from account table)
$ mariadb -h m5-master.eqiad.wmnet -u labsdbaccounts -p -e "USE labsdbaccounts; SELECT * FROM labsdbaccounts.account_host WHERE account_id=${ACCT_ID}\G"
# Check the actual labsdbs if needed (double quotes outside so ${CHECK_UID} is expanded by the shell)
$ mariadb -h clouddbXXXX.eqiad.wmnet -u labsdbadmin -p -e "SELECT User, Password FROM mysql.user WHERE User LIKE '${CHECK_UID}';"
# Resynchronize account state on the replicas by finding missing GRANTS on each db server
$ sudo maintain-dbusers harvest-replicas
See phab:T183644 for an example of fixing automatic credential creation caused when an old LDAP user becomes a Toolforge member and has an untracked user account on toolsdb.
Regenerate kubernetes credentials for tools (.kube/config)
With admin credentials (root on a control plane node will do), run kubectl -n tool-<toolname> delete cm maintain-kubeusers-<toolname>; it should get regenerated within minutes.
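A sketch of the delete-and-wait cycle follows. The function name, the KUBECTL, POLL_SECONDS and MAX_TRIES knobs, and the idea of polling for the configmap to reappear are illustrative assumptions; this page only states that regeneration happens within minutes, and the configmap coming back is used here as a rough proxy for that.

```shell
# Delete a tool's maintain-kubeusers configmap and wait for it to reappear.
# Argument: <toolname>
regen_kube_creds() {
    tool="$1"
    kubectl_cmd="${KUBECTL:-kubectl}"   # override to test without a cluster
    ns="tool-${tool}"
    cm="maintain-kubeusers-${tool}"

    $kubectl_cmd -n "$ns" delete cm "$cm" || return 1

    # Poll until the configmap exists again or we give up.
    i=0
    while [ "$i" -lt "${MAX_TRIES:-20}" ]; do
        if $kubectl_cmd -n "$ns" get cm "$cm" >/dev/null 2>&1; then
            echo "configmap $cm is back"
            return 0
        fi
        sleep "${POLL_SECONDS:-30}"
        i=$((i + 1))
    done
    echo "configmap $cm not regenerated yet" >&2
    return 1
}

# Example (as root on a control plane node): regen_kube_creds example-tool
```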
See Portal:Toolforge/Admin/Kubernetes#Building_new_nodes
Deleting a tool
For batch or CLI deletion of tools, use the 'mark_tool' command on a cloudcontrol node:
andrew@cloudcontrol1003:~$ sudo mark_tool
usage: mark_tool [-h] [--ldap-user LDAP_USER] [--ldap-password LDAP_PASSWORD]
                 [--ldap-base-dn LDAP_BASE_DN] [--project PROJECT] [--disable]
                 [--delete] [--enable]
                 tool
mark_tool: error: the following arguments are required: tool
[Figure: The awful truth about tool deletion]
Maintainers can mark their tools for deletion using the "Disable tool" button on the tool's detail page on https://toolsadmin.wikimedia.org/. In either case, the immediate effect of disabling a tool is to stop any running jobs, prevent users from logging in as that tool, and schedule archiving and deletion for 40 days in the future.
A tool can be restored within 40 days of being disabled.
Tool archives are stored on the tools NFS server, currently tools-nfs-2.tools.eqiad1.wikimedia.cloud:
root@labstore1004:/srv/disable-tool# ls -ltrah /srv/tools/archivedtools/
total 1.8G
drwxr-xr-x 5 root root 4.0K Jun 21 19:37 ..
-rw-r--r-- 1 root root 102K Jul 22 22:15 andrewtesttooltwo
-rw-r--r-- 1 root root   45 Oct 13 00:47 andrewtesttooltwo.tgz
-rw-r--r-- 1 root root 8.3M Oct 13 03:20 mediaplaycounts.tgz
-rw-r--r-- 1 root root 1.8G Oct 13 04:01 projanalysis.tgz
-rw-r--r-- 1 root root 1.3M Oct 13 21:05 reportsbot.tgz
drwxr-xr-x 2 root root 4.0K Oct 13 21:10 .
-rw-r--r-- 1 root root 719K Oct 13 21:10 wsm.tgz
-rw-r--r-- 1 root root 4.8K Oct 13 21:20 andrewtesttoolfour.tgz
The actual deletion process is shockingly complicated. A tool will only be archived and deleted if all of the prior steps succeed, but disabling of a tool should be a sure thing.
SSL certificates
See Portal:Toolforge/Admin/SSL_certificates.
Granting a tool write access to Elasticsearch
$ ./new-es-password.sh tools.example
tools.example elasticsearch.ini
----
[elasticsearch]
user=tools.example
password=A3rJqgFKxa/x4NlnIhmw2cXcV92it/Zv0Yt+a7yhxCw=
----
tools.example puppet master private (hieradata/labs/tools/common.yaml)
----
profile::toolforge::elasticsearch::haproxy::elastic_users:
  - name: 'tools.example'
    password: '$6$FYwP3wxT4K7O9EE$OA3P5972NWJVG/WUnD240sal34/dsNabbcawItevMYO9uoR.fJBrjSABex0EDW0wlkWHID1Tf4oJoiNvYFGmy/'
$ ssh tools-puppetserver-01.tools.eqiad1.wikimedia.cloud
$ sudo -i
# cd /srv/git/labs/private
# vim hieradata/labs/tools/common.yaml
... merge the hiera data with the existing key...
:wq
# git add hieradata/labs/tools/common.yaml
# git commit -m "[local] Elasticsearch credentials for $TOOL"
cloudcumin1001.eqiad.wmnet:~$ sudo cumin "O{project:tools name:.*elastic.*}" "run-puppet-agent"
$ ssh dev.toolforge.org
$ sudo -i become example-tool
$ toolforge envvars create TOOL_ELASTICSEARCH_USER
Enter the value of your envvar (Hit Ctrl+C to cancel): <insert user>
$ toolforge envvars create TOOL_ELASTICSEARCH_PASSWORD
Enter the value of your envvar (Hit Ctrl+C to cancel): <insert password>
Note: An older procedure placed the credentials in /data/project/$TOOL/.elasticsearch.ini instead.
See Managing package upgrades.
Creating a new Docker image (e.g. for new versions of Node.js)
See Portal:Toolforge/Admin/Kubernetes#Docker_Images
APIs
Toolforge is moving towards an API-oriented model where client tools (such as those installed on bastions) contact the Toolforge API to make changes instead of making them directly.
See the user docs also.
Access
The APIs are presented as a single aggregated endpoint through the API Gateway.
The base endpoint is https://api.svc.[project].eqiad1.wikimedia.cloud:30003. Services are routed with subpaths, for example /jobs for the Jobs API.
For authentication we currently use client certificates issued by the Kubernetes cluster internal CA via maintain-kubeusers. This will change in the future as we evolve how the APIs are accessed and used.
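To illustrate the URL scheme and certificate-based access, here is a minimal sketch. The helper toolforge_api_url is hypothetical, and the certificate paths in the commented curl call are assumptions about where maintain-kubeusers places a tool's client certificate and key; check the tool's home directory before relying on them. Which paths each service exposes under its subpath is not covered here.

```shell
# Build the gateway URL for a project and service subpath.
# Arguments: <project> <subpath>, e.g. "tools" "/jobs"
toolforge_api_url() {
    project="$1"
    subpath="${2#/}"   # drop one leading slash so the URL has exactly one
    echo "https://api.svc.${project}.eqiad1.wikimedia.cloud:30003/${subpath}"
}

toolforge_api_url tools /jobs

# Example request with a client certificate (paths are assumptions):
#   curl --cert ~/.toolskube/client.crt --key ~/.toolskube/client.key \
#        "$(toolforge_api_url tools /jobs)"
```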
Components
Deploying a component
See the docs in GitLab.
API Gateway
See Portal:Toolforge/Admin/API_Gateway
Jobs Service
See Portal:Toolforge/Admin/Jobs_Service
Envvars Service
See Portal:Toolforge/Admin/Envvars_Service
Build Service
See Portal:Toolforge/Admin/Build_Service
Components service
See Portal:Toolforge/Admin/Component_Service
Tools-mail / Exim
See Portal:Toolforge/Admin/Exim and Portal:Cloud_VPS/Admin/Email#Operations
Checker Service
This is the service that Icinga hits to check status of several services. It's totally stateless.
See Portal:Toolforge/Admin/Toolschecker
Redis
See Portal:Toolforge/Admin/Redis.
Prometheus
See Portal:Toolforge/Admin/Prometheus.
Apt repository
See Portal:Toolforge/Admin/Apt_repository
Striker/Toolforge UI
See Portal:Toolforge/Admin/Striker
ToolsDB
See Portal:Toolforge/Admin/ToolsDB
Kubernetes infrastructure
See Portal:Toolforge/Admin/Kubernetes
Users and community
Some information about how to manage users and general community and their relationship with Toolforge.
Project membership request approval
User access requests show up in https://toolsadmin.wikimedia.org/tools/membership/
Some guidelines for account approvals, based on advice from scfc:
Requests left in "Feedback needed" awaiting more information for more than 30 days should usually be declined with a message like "Feel free to apply again later with more complete information."
Quota management
Toolforge quotas are managed via maintain-kubeusers.
See Portal:Toolforge/Admin/Kubernetes#Ingress
What makes a root/Giving root access
See Toolforge roots and Toolforge admins
Servicegroup log
tools.admin runs /data/project/admin/bin/toolhistory, which provides an hourly snapshot of ldaplist -l servicegroup as a git repository in /data/project/admin/var/lib/git/servicegroups.
These tools offer useful information about Toolforge itself: