Showing content from https://github.com/pytorch/ossci-job-dsl below:

pytorch/ossci-job-dsl: Jenkins job definitions for OSSCI

This is a Job DSL project for FAIR/AML's OSSCI infrastructure.

Looking for information about the actual machines the jobs run on? See fairinternal/ossci-infra.

CI failed, but my local build is fine. What should I do?!

Try prepending sudo if you get a permission denied error from the docker commands (and later figure out why your user doesn't have permission to connect to the Docker socket; maybe you need to add yourself to the docker group and reboot).
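A quick way to see whether the docker group is the problem is to look at your group list. The following is a minimal sketch, not part of the CI tooling; the helper name in_docker_group is made up for illustration.

```shell
# Returns 0 if the space-separated group list in $1 contains "docker".
in_docker_group() {
  case " $1 " in
    *" docker "*) return 0 ;;
    *) return 1 ;;
  esac
}

# id -nG prints the names of all groups the current user belongs to.
if in_docker_group "$(id -nG)"; then
  echo 'already in the docker group; sudo should not be needed'
else
  echo 'not in the docker group; consider: sudo usermod -aG docker "$USER"'
fi
```

A group change only applies to new login sessions, which is why logging out and back in (or rebooting) is needed afterwards.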

Want to run a Docker image on a GPU? Standard-issue devgpus don't allow use of Docker, so you will have to either (1) run Docker on devfair, (2) get a GPU-enabled AWS instance (the OSS CI team has a few allocated; get in touch with them to see how to connect), or (3) find a GPU machine that you manage yourself. All of these take some time to provision, so don't leave this to the last minute.

Want to know more about what Docker images are available? See "Available docker images."

If you just want to reproduce a test error, pull and test the particular Docker image for your job. But if you're interested in repurposing our CI Docker images for other purposes, it helps to know the general structure of the Docker images our CI exposes and how they are built (so you can find the URL for a base image you might be interested in).

For historical reasons, there are two sets of Docker images, one for PyTorch and one for Caffe2 (we intend to merge these at some point, but we haven't finished yet).

The PyTorch Dockerfile sources live at https://github.com/pytorch/pytorch-ci-dockerfiles and are built every week at https://ci.pytorch.org/jenkins/job/pytorch-docker-master/

The Caffe2 Dockerfile sources live at https://github.com/pytorch/pytorch/tree/master/docker/caffe2/jenkins and are built upon request at https://ci.pytorch.org/jenkins/job/caffe2-docker-trigger/

Summary for gdb-enabled CPU:

ssh ubuntu@$CPU_HOST
docker run --rm --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -u jenkins -i $DOCKER_IMAGE /bin/bash

Summary for ASAN builds (jobs like pytorch_linux_xenial_py3_clang5_asan_test):

ssh ubuntu@$CPU_HOST
docker run --rm --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -u jenkins -i $DOCKER_IMAGE /bin/bash
export LD_PRELOAD=/usr/lib/llvm-5.0/lib/clang/5.0.0/lib/linux/libclang_rt.asan-x86_64.so
cd ~/workspace
# run your test repro

Summary for gdb-enabled NVIDIA/CUDA GPU:

ssh ubuntu@$GPU_HOST
docker run --rm --cap-add=SYS_PTRACE --security-opt seccomp=unconfined -t -u jenkins -i --runtime=nvidia -e CUDA_VERSION=8 -e NVIDIA_VISIBLE_DEVICES=all $DOCKER_IMAGE /bin/bash

Summary for AMD/ROCM GPU:

ssh -p $AMD_PORT $AMD_USERNAME@$AMD_HOST
docker run --device=/dev/kfd --device=/dev/dri --group-add video -it $DOCKER_IMAGE /bin/bash

Get credentials for an AMD GPU machine at https://fb.quip.com/Luj5AQjlH11U. This should only be necessary if you actually plan to run tests on an AMD GPU; if you are debugging build failures, any old host is OK, though make sure it has at least 16G of RAM.

What is my CPU/GPU HOST?

What is my Docker image?

Your Docker image will look something like registry.pytorch.org/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py2:69-3002.

This image is:

You can tell you've got the right one because the jenkins user's home directory will contain a workspace directory.
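When comparing image references, it can help to split one into its parts. The sketch below uses plain shell parameter expansion on the example reference above; it assumes the reference always carries a tag and that the registry host has no port number, and it does not interpret what the tag means.

```shell
image="registry.pytorch.org/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py2:69-3002"

tag="${image##*:}"        # text after the last ":"
name="${image%:*}"        # everything before the tag
registry="${name%%/*}"    # first path component (the registry host)
repository="${name#*/}"   # the rest of the path

echo "registry:   $registry"
echo "repository: $repository"
echo "tag:        $tag"
```

The full reference is what you pass to docker pull or docker run.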

What do I do once I'm in?

Read the jobs directory to see how the build and test steps are actually run (at the very least, you will need to set PATH to pick up the correct Python executable).
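Setting PATH might look like the sketch below. The helper name use_python_from and the directory in the example are hypothetical; take the real location from the job's build script.

```shell
# Puts the given directory at the front of PATH so that its python
# is found before any other interpreter.
use_python_from() {
  PATH="$1:$PATH"
  export PATH
}

# Example (hypothetical directory):
#   use_python_from /path/to/python/bin
#   command -v python   # shows which interpreter will now be picked up
```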

You DO NOT need to build PyTorch; it will already be installed. But if you want to inject debugging code, feel free to do so and rebuild with the regular python setup.py develop instructions in ~/workspace.

What do all the flags in the docker run command mean?
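As a rough guide, the sketch below rebuilds the gdb-enabled CPU command from the summaries one flag at a time, with a comment on what each flag does in Docker in general. The function name debug_run_cmd is made up for illustration, and it only prints the command rather than running it.

```shell
# Prints the gdb-enabled docker run command for the image given as $1.
debug_run_cmd() {
  image="$1"
  set -- docker run
  set -- "$@" --rm                               # remove the container when the shell exits
  set -- "$@" --cap-add=SYS_PTRACE               # grant the ptrace capability that gdb needs
  set -- "$@" --security-opt seccomp=unconfined  # turn off the default seccomp syscall filter
  set -- "$@" -t                                 # allocate a pseudo-terminal
  set -- "$@" -u jenkins                         # run as the jenkins user
  set -- "$@" -i                                 # keep stdin open (interactive)
  set -- "$@" "$image" /bin/bash                 # image to run, and the command to start in it
  echo "$*"
}

debug_run_cmd '$DOCKER_IMAGE'
```

The GPU variants add a few more: --runtime=nvidia selects the NVIDIA container runtime, each -e sets an environment variable inside the container (NVIDIA_VISIBLE_DEVICES=all exposes every GPU), --device passes a host device such as /dev/kfd or /dev/dri through to the container, and --group-add video adds the container user to the video group.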

The CUDA docker command didn't work.

You need to install nvidia-docker 2.0 which knows how to expose CUDA devices inside Docker.
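One way to check whether the nvidia runtime is registered with Docker is to look at the runtime list reported by docker info. This is a minimal sketch; the helper name has_nvidia_runtime is made up, and it simply searches the JSON text it is given on stdin.

```shell
# Reads Docker's runtime list (JSON) on stdin and succeeds if a
# runtime named "nvidia" appears in it.
has_nvidia_runtime() {
  grep -q '"nvidia"'
}

# On a host with Docker installed:
#   docker info --format '{{json .Runtimes}}' | has_nvidia_runtime \
#     && echo "nvidia runtime available" \
#     || echo "nvidia runtime missing"
```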

Where is my source?

Caffe2 builds don't currently store their source code in test images; you will need to git clone a copy of the source and check out the correct commit.
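Cloning and checking out a specific commit could be wrapped up as below. This is a sketch only: the function name is made up, and the repository URL, commit, and destination are all supplied by the caller.

```shell
# Clones repo_url into dest, checks out the given commit, and
# fetches any submodules recorded at that commit.
fetch_source_at_commit() {
  repo_url="$1"
  commit="$2"
  dest="$3"
  git clone "$repo_url" "$dest" &&
    git -C "$dest" checkout "$commit" &&
    git -C "$dest" submodule update --init --recursive
}

# Example (assumes the source lives in pytorch/pytorch; replace <commit>
# with the commit your CI job built):
#   fetch_source_at_commit https://github.com/pytorch/pytorch <commit> ~/pytorch
```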

OS X builds are not containerized. You probably have a MacBook; first try reproducing locally. Otherwise, see https://fb.quip.com/FIDAOAi7r2A for canonical information about our OS X workers (Facebook employees only).

Changes you make to these machines affect everyone, so please be careful.

You must Remote Desktop into the Windows machines; see this Quip for information on how to access them (you may have to ask for access).

Changes you make to these machines affect everyone, so please be careful.

Taking PyTorch as an example (much of the same applies to Caffe2), here is how we structure our jobs:

.
├── jobs                    # DSL script files
├── resources               # resources for DSL scripts
├── src
│   ├── main
│   │   ├── groovy          # support classes
│   │   └── resources
│   │       └── idea.gdsl   # IDE support for IDEA
│   └── test
│       └── groovy          # specs
└── build.gradle            # build file

./gradlew test runs the specs.

JobScriptsSpec will loop through all DSL files and make sure they don't throw any exceptions when processed. All XML output files are written to build/debug-xml. This can be useful if you want to inspect the generated XML before check-in.
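To look over the generated files after a test run, something like the following sketch can list them. The helper name list_generated_xml is made up; it defaults to the build/debug-xml directory named above.

```shell
# Lists every .xml file under the given directory
# (default: build/debug-xml), sorted by path.
list_generated_xml() {
  find "${1:-build/debug-xml}" -type f -name '*.xml' | sort
}

# Typical use from the repository root:
#   ./gradlew test
#   list_generated_xml
```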

You can create the example seed job via the REST API Runner (see below) using the pattern jobs/seed.groovy.

Or manually create a job with the same structure:

Note that starting with Job DSL 1.60 the "Additional classpath" setting is not available when Job DSL script security is enabled.

Note: the REST API Runner does not work with Automatically Generated DSL.

A gradle task is configured that can be used to create/update jobs via the Jenkins REST API, if desired. Normally a seed job is used to keep jobs in sync with the DSL, but this runner might be useful if you'd rather process the DSL outside of the Jenkins environment or if you want to create the seed job from a DSL script.

./gradlew rest -Dpattern=<pattern> -DbaseUrl=<baseUrl> [-Dusername=<username>] [-Dpassword=<password>]
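For example, a filled-in invocation for the seed job pattern could be assembled as in the sketch below. The function name rest_runner_cmd is made up, the base URL in the usage line is an assumption about where Jenkins is served, and the password is deliberately left out so it does not end up in shell history. The function only prints the command.

```shell
# Prints a ./gradlew rest command line for the given pattern and base URL.
# The username ($3) is optional, mirroring the bracketed option above.
rest_runner_cmd() {
  pattern="$1"
  base_url="$2"
  username="$3"
  set -- ./gradlew rest "-Dpattern=$pattern" "-DbaseUrl=$base_url"
  if [ -n "$username" ]; then
    set -- "$@" "-Dusername=$username"
  fi
  echo "$*"
}

# Example (assumed base URL):
#   rest_runner_cmd jobs/seed.groovy https://ci.pytorch.org/jenkins/
```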

Sometimes, you will be looking for a function in the Jenkins Job DSL, and it will simply not exist. DO NOT DESPAIR. Read this instead: http://www.devexp.eu/2014/10/26/use-unsupported-jenkins-plugins-with-jenkins-dsl/

In particular, http://job-dsl.herokuapp.com/ is really helpful, even if you're not necessarily working on a custom DSL function.

When you do this, you might want to edit a job in the web UI and then see the XML that Jenkins generates for it. Click on "REST API" at the bottom of the job page and click the link for "config.xml", which will give you the config.xml of the job. Example: https://ci.pytorch.org/jenkins/job/skeleton-pull-request/config.xml
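The same file can be fetched from the command line by appending config.xml to the job URL. A minimal sketch follows; the helper name config_xml_url and the JENKINS_USER and JENKINS_TOKEN variables are made up for illustration.

```shell
# Prints the config.xml URL for a Jenkins job URL, with or without
# a trailing slash on the input.
config_xml_url() {
  echo "${1%/}/config.xml"
}

config_xml_url "https://ci.pytorch.org/jenkins/job/skeleton-pull-request/"

# To download it (credentials are placeholders):
#   curl -fsS -u "$JENKINS_USER:$JENKINS_TOKEN" "$(config_xml_url "$job_url")"
```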

You can navigate to https://ci.pytorch.org/jenkins/script and run Groovy scripts to run ad hoc management tasks. This can be very useful for tasks that are tedious to execute manually.

Beware: with great power comes great responsibility!!

Mass removal of stale jobs

import jenkins.model.*
  
def folder = Jenkins.instance.items.find { job ->
  job.name == "caffe2-builds"
}

def jobs = folder.items.findAll { job -> 
  job.name =~ /^caffe2-linux-/
}

jobs.each { job ->
  println("Planning to remove ${job.name}")
  // Dry run: uncomment the next line to actually delete the listed jobs.
  //job.delete()
}

null

Listing the number of queued items waiting on each label

def map = [:]

Jenkins.instance.queue.items.each {
    i = map.get(it.assignedLabel, 0);
    map[it.assignedLabel] = i + 1;
}

sorted = map.sort { a, b -> b.value <=> a.value }

sorted.each { label, count ->
    println("${label}: ${count}");
}

println "---"

Jenkins.instance.slaves.each {
  println "${it.name} (${it.getComputer().countBusy()}/${it.getNumExecutors()}): ${it.getLabelString()}"
}

null

Pruning stale queued jobs

import hudson.model.*
  
def queue = Hudson.instance.queue
  
// Collect the queued items whose task name starts with the given prefix.
def cancel = queue.items.findAll {
  it.task.name.startsWith('name-of-job-to-cleanup')
}

cancel.each {
  queue.cancel(it.task)
}

Developing using IntelliJ

A more pleasant Java development experience can be attained by working on ossci-job-dsl inside a real Java IDE. Here's how to set it up using IntelliJ:

  1. In the opening splash screen, select "Import Project"
  2. Select the directory of ossci-job-dsl
  3. Import project from external model: Gradle
  4. Click through the last screen, finishing the import
  5. To test, click "Run" and "Edit configurations"
  6. Create a new run configuration based on Gradle
  7. Select the current project as the Gradle project, and put "test" in Tasks.

You now have running tests!

