--- /dev/null
+{% comment %}
+Copyright (C) The Arvados Authors. All rights reserved.
+
+SPDX-License-Identifier: CC-BY-SA-3.0
+{% endcomment %}
+
+h2(#cuda). Install NVIDIA CUDA Toolkit (optional)
+
+If you want to use NVIDIA GPUs, "install the CUDA toolkit.":https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
+
+In addition, you must install the NVIDIA Container Toolkit:
+
+<pre>
+DIST=$(. /etc/os-release; echo $ID$VERSION_ID)
+curl -s -L https://nvidia.github.io/libnvidia-container/gpgkey | \
+ sudo apt-key add -
+curl -s -L https://nvidia.github.io/libnvidia-container/$DIST/libnvidia-container.list | \
+ sudo tee /etc/apt/sources.list.d/libnvidia-container.list
+sudo apt-get update
+sudo apt-get install libnvidia-container1 libnvidia-container-tools nvidia-container-toolkit
+</pre>
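+
+You can check that the container toolkit detects the GPU by querying @nvidia-container-cli@ (a quick sanity check; assumes the NVIDIA driver from the CUDA toolkit step is already installed):
+
+<pre>
+sudo nvidia-container-cli info
+</pre>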
* "collections" # Part of keepweb
* "keepproxy"
-Ie., for 'keepproxy', the script will lookup for
+Ie., for 'keepproxy', the script will look for
<notextile>
<pre><code>${CUSTOM_CERTS_DIR}/keepproxy.crt
</code></pre>
</notextile>
-h2. Vocabulary definition format
+h2. Definition format
The JSON file describes the available keys and values, and whether the user is allowed to enter free text not defined by the vocabulary.
Internally, Workbench2 uses the IDs to do property-based searches, so if the user searches by @Animal: Human@ or @Species: Homo sapiens@, both will return the same results.
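+
+For example, a minimal vocabulary matching the search example above could look like this (the IDs and labels here are illustrative, not part of any shipped vocabulary):
+
+<notextile>
+<pre><code>{
+  "strict_tags": false,
+  "tags": {
+    "IDTAGANIMALS": {
+      "strict": false,
+      "labels": [
+        {"label": "Animal"},
+        {"label": "Species"}
+      ],
+      "values": {
+        "IDVALANIMAL1": {
+          "labels": [
+            {"label": "Human"},
+            {"label": "Homo sapiens"}
+          ]
+        }
+      }
+    }
+  }
+}
+</code></pre>
+</notextile>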
+h2. Definition validation
+
+Because the vocabulary definition is prone to syntax or logical errors, the @controller@ service needs to do some validation before answering requests. If the vocabulary validation fails, the service won't start.
+The site administrator can run @arvados-server config-check@ to make sure the vocabulary file is correct before even trying to start the @controller@ service. When the vocabulary definition isn't correct, the administrator will get a list of issues like the one below:
+
+<notextile>
+<pre><code># arvados-server config-check -config /etc/arvados/config.yml
+Error loading vocabulary file "/etc/arvados/vocabulary.json" for cluster zzzzz:
+duplicate JSON key "tags.IDTAGFRUITS.values.IDVALFRUITS1"
+tag key "IDTAGCOMMENT" is configured as strict but doesn't provide values
+tag value label "Banana" for pair ("IDTAGFRUITS":"IDVALFRUITS8") already seen on value "IDVALFRUITS4"
+exit status 1
+</code></pre>
+</notextile>
+
+bq. NOTE: These validation checks are performed only on the node that hosts the vocabulary file defined in the configuration. Because the same configuration file is shared between different nodes, nodes that don't host the file won't produce spurious errors when running @arvados-server config-check@.
+
+h2. Live updates
+
+Sometimes it may be necessary to modify the vocabulary definition in a running production environment.
+When a change to the vocabulary file is detected, the @controller@ service will automatically attempt to load the new vocabulary and check its validity before making it active.
+If the new vocabulary has errors, the last valid one remains active. The service exports any loading errors on its health endpoint so that a monitoring solution can send an appropriate alert.
+With the above mechanisms in place, no outages should occur from making typos or other errors when updating the vocabulary file.
+
+h2. Health status
+
+So that the administrator can guarantee the system's metadata integrity, the @controller@ service exports a health endpoint specific to the vocabulary at @/_health/vocabulary@.
+As a first measure, the service won't start if the vocabulary file is incorrect. Once running, if an update introduces errors (updates may even happen periodically), the service needs to keep running while notifying the operator that a fix is in order.
+An example of a vocabulary health error is included below:
+
+<notextile>
+<pre><code>$ curl --silent -H "Authorization: Bearer xxxtokenxxx" https://controller/_health/vocabulary | jq .
+{
+ "error": "while loading vocabulary file \"/etc/arvados/vocabulary.json\": duplicate JSON key \"tags.IDTAGSIZES.values.IDVALSIZES3\"",
+ "health": "ERROR"
+}
+</code></pre>
+</notextile>
+
+h2. Client support
+
+Workbench2 currently takes advantage of this vocabulary definition by providing an easy-to-use interface for searching and applying metadata to different objects in the system. Because the definition file only resides on the @controller@ node, and Workbench2 is just a static web application running in each user's web browser, there's a mechanism in place that allows Workbench2 and any other client to request the active vocabulary.
+
+The @controller@ service provides an unauthenticated endpoint at @/arvados/v1/vocabulary@ where it exports the contents of the vocabulary JSON file:
+
+<notextile>
+<pre><code>$ curl --silent https://controller/arvados/v1/vocabulary | jq .
+{
+ "kind": "arvados#vocabulary",
+ "strict_tags": false,
+ "tags": {
+ "IDTAGANIMALS": {
+ "labels": [
+ {
+ "label": "Animal"
+ },
+ {
+ "label": "Creature"
+ }
+ ],
+ "strict": false,
+...
+}
+</code></pre>
+</notextile>
+
+Although the vocabulary enforcement is done on the backend, clients can use this information to provide helpful features to users, like ID-to-label translations, preemptive error checking, etc.
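+
+For example, a client could resolve the key ID behind a given label with a query like the following (a sketch using @jq@ against the endpoint shown above):
+
+<notextile>
+<pre><code>$ curl --silent https://controller/arvados/v1/vocabulary | jq -r '.tags | to_entries[] | select(.value.labels[]?.label == "Animal") | .key'
+IDTAGANIMALS
+</code></pre>
+</notextile>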
+
h2. Properties migration
After installing the new vocabulary definition, it may be necessary to migrate preexisting properties that were set up using literal strings. This can be a big task depending on the number of properties in the vocabulary and the number of collections and projects on the cluster.
Path to the public key file that a-d-c will use to log into the compute node
--mksquashfs-mem (default: 256M)
Only relevant when using Singularity. This is the amount of memory mksquashfs is allowed to use.
- --debug
- Output debug information (default: false)
+ --nvidia-gpu-support (default: false)
+ Install all the necessary tooling for Nvidia GPU support
+ --debug (default: false)
+ Output debug information
</code></pre></notextile>
+h2(#building). NVIDIA GPU support
+
+If you plan on using instance types with NVIDIA GPUs, add @--nvidia-gpu-support@ to the build command line. Arvados uses the same compute image for both GPU and non-GPU instance types. The GPU tooling is ignored when using the image with a non-GPU instance type.
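+
+For example (a sketch; combine the flag with the other build options described in the sections below):
+
+<notextile><pre><code>~$ <span class="userinput">./build.sh --json-file arvados-images-aws.json --nvidia-gpu-support</span>
+</code></pre>
+</notextile>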
+
h2(#aws). Build an AWS image
<notextile><pre><code>~$ <span class="userinput">./build.sh --json-file arvados-images-aws.json \
</code></pre>
</notextile>
+h4. NVIDIA GPU support
+
+To specify instance types with NVIDIA GPUs, you must include an additional @CUDA@ section:
+
+<notextile>
+<pre><code> InstanceTypes:
+ g4dn:
+ ProviderType: g4dn.xlarge
+ VCPUs: 4
+ RAM: 16GiB
+ IncludedScratch: 125GB
+ Price: 0.56
+ CUDA:
+ DriverVersion: "11.4"
+ HardwareCapability: "7.5"
+ DeviceCount: 1
+</code></pre>
+</notextile>
+
+The @DriverVersion@ is the version of the CUDA toolkit installed in your compute image (in X.Y format, do not include the patchlevel). The @HardwareCapability@ is the CUDA compute capability of the GPUs available for this instance type. The @DeviceCount@ is the number of GPU devices available for this instance type.
+
h4. Minimal configuration example for Amazon EC2
The <span class="userinput">ImageID</span> value is the compute node image that was built in "the previous section":install-compute-node.html#aws.
h3(#SbatchArguments). Containers.LSF.BsubArgumentsList
-When arvados-dispatch-lsf invokes @bsub@, you can add arguments to the command by specifying @BsubArgumentsList@. You can use this to send the jobs to specific cluster partitions or add resource requests. Set @BsubArgumentsList@ to an array of strings. For example:
+When arvados-dispatch-lsf invokes @bsub@, you can add arguments to the command by specifying @BsubArgumentsList@. You can use this to send the jobs to specific cluster partitions or add resource requests. Set @BsubArgumentsList@ to an array of strings.
+
+Template variables starting with % will be substituted as follows:
+
+|_. Variable |_. Value |
+|%U|container UUID|
+|%C|number of VCPUs|
+|%M|memory in MB|
+|%T|tmp space in MB|
+|%G|number of GPU devices (@runtime_constraints.cuda.device_count@)|
+
+Use %% to express a literal %. The %%J in the default will be changed to %J, which is interpreted by @bsub@ itself.
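+
+For instance, for a container requesting 4 VCPUs and 16384 MB of RAM, the arguments @["-n", "%C", "-D", "%MMB"]@ would expand to @["-n", "4", "-D", "16384MB"]@.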
+
+For example:
<notextile>
<pre> Containers:
LSF:
- <code class="userinput">BsubArgumentsList: <b>["-C", "0", "-o", "/tmp/crunch-run.%J.out", "-e", "/tmp/crunch-run.%J.err"]</b></code>
+ <code class="userinput">BsubArgumentsList: <b>["-o", "/tmp/crunch-run.%%J.out", "-e", "/tmp/crunch-run.%%J.err", "-J", "%U", "-n", "%C", "-D", "%MMB", "-R", "rusage[mem=%MMB:tmp=%TMB] span[hosts=1]", "-R", "select[mem>=%MMB]", "-R", "select[tmp>=%TMB]", "-R", "select[ncpus>=%C]"]</b></code>
</pre>
</notextile>
Note that the default value for @BsubArgumentsList@ uses the @-o@ and @-e@ arguments to write stdout/stderr data to files in @/tmp@ on the compute nodes, which is helpful for troubleshooting installation/configuration problems. Ensure you have something in place to delete old files from @/tmp@, or adjust these arguments accordingly.
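+
+For example, a daily cron job on each compute node could remove week-old log files (a sketch; adjust the path and retention to match your configuration):
+
+<notextile>
+<pre><code>find /tmp -maxdepth 1 -name 'crunch-run.*' -mtime +7 -delete
+</code></pre>
+</notextile>
+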
+h3(#BsubCUDAArguments). Containers.LSF.BsubCUDAArguments
+
+If the container requests access to GPUs (@runtime_constraints.cuda.device_count@ of the container request is greater than zero), the command line arguments in @BsubCUDAArguments@ will be added to the command line _after_ @BsubArgumentsList@. This should consist of the additional @bsub@ flags your site requires to schedule the job on a node with GPU support. Set @BsubCUDAArguments@ to an array of strings. For example:
+
+<notextile>
+<pre> Containers:
+ LSF:
+ <code class="userinput">BsubCUDAArguments: <b>["-gpu", "num=%G"]</b></code>
+</pre>
+</notextile>
h3(#PollPeriod). Containers.PollInterval
See "Set up Docker":../install-docker.html
+{% include 'install_cuda' %}
+
{% assign arvados_component = 'python-arvados-fuse crunch-run arvados-docker-cleaner' %}
{% include 'install_compute_fuse' %}
{% include 'install_packages' %}
+{% include 'install_cuda' %}
+
h2(#singularity). Set up Singularity
Follow the "Singularity installation instructions":https://sylabs.io/guides/3.7/user-guide/quick_start.html. Make sure @singularity@ and @mksquashfs@ are working:
</code></pre>
</notextile>
-h2. Vocabulary configuration (optional)
+h2. Vocabulary configuration
-Workbench2 can load a vocabulary file which lists available metadata properties for groups and collections. To configure the property vocabulary definition, please visit the "Metadata Vocabulary Format":{{site.baseurl}}/admin/metadata-vocabulary.html page in the Admin section.
+Workbench2 will load, if available, a vocabulary definition which lists available metadata properties for groups and collections. To learn how to configure the property vocabulary definition, please visit the "Metadata Vocabulary Format":{{site.baseurl}}/admin/metadata-vocabulary.html page in the Admin section.
{% assign arvados_component = 'arvados-workbench2' %}
<notextile>
<pre><code>scp -r provision.sh local* user@host:
+# if you use custom certificates (not Let's Encrypt), make sure to copy those too:
+# scp -r certs user@host:
ssh user@host sudo ./provision.sh --roles comma,separated,list,of,roles,to,apply
</code></pre>
</notextile>
<notextile>
<pre><code>scp -r provision.sh local* tests user@host:
+# if you use custom certificates (not Let's Encrypt), make sure to copy those too:
+# scp -r certs user@host:
ssh user@host sudo ./provision.sh
</code></pre>
</notextile>
1. Go to Vscode
1. Vscode: Click on the `Terminal` menu
1. Vscode: Click `Run Task…`
- 1. Vscode: Select `Configure Arvados`
- 1. Vscode: Paste text into the `Current API_TOKEN and API_HOST from Workbench` prompt
- 1. Vscode: This will create files called `API_HOST` and `API_TOKEN`
+ 1. Vscode: Select `Set Arvados Host`
+ 1. Vscode: Paste the value of API Host from the Workbench `Get API Token` dialog (found in the User menu) at the prompt
+ 1. Vscode: Next, run task `Set Arvados Token`
+ 1. Vscode: Paste the value of API Token from the Workbench `Get API Token` dialog
+ 1. Vscode: These will create files called `API_HOST` and `API_TOKEN`
## 3. Register & run a workflow
property1: value1
property2: $(inputs.value2)
- arv:CUDARequirement:
+ cwltool:CUDARequirement:
cudaVersionMin: "11.0"
cudaComputeCapabilityMin: "9.0"
deviceCountMin: 1
|_. Field |_. Type |_. Description |
|processProperties|key-value map, or list of objects with the fields {propertyName, propertyValue}|The properties that will be set on the container request. May include expressions that reference `$(inputs)` of the current workflow or tool.|
-h2(#CUDARequirement). arv:CUDARequirement
+h2(#CUDARequirement). cwltool:CUDARequirement
Request support for Nvidia CUDA GPU acceleration in the container. Assumes that the CUDA runtime (SDK) is installed in the container, and the host will inject the CUDA driver libraries into the container (of a version equal to or later than the one requested).
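+
+A minimal sketch of requesting a single GPU (assuming the @cwltool@ namespace prefix is declared as @http://commonwl.org/cwltool#@ in @$namespaces@):
+
+{% codeblock as yaml %}
+hints:
+  cwltool:CUDARequirement:
+    cudaVersionMin: "11.0"
+    cudaComputeCapabilityMin: "7.5"
+    deviceCountMin: 1
+{% endcodeblock %}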
h2(#performance). Performance
-To get the best perfomance from your workflows, be aware of the following Arvados features, behaviors, and best practices:
+To get the best performance from your workflows, be aware of the following Arvados features, behaviors, and best practices.
-If you have a sequence of short-running steps (less than 1-2 minutes each), use the Arvados extension "arv:RunInSingleContainer":cwl-extensions.html#RunInSingleContainer to avoid scheduling and data transfer overhead by running all the steps together at once. To use this feature, @cwltool@ must be installed in the container image.
+Does your application support NVIDIA GPU acceleration? Use "cwltool:CUDARequirement":cwl-extensions.html#CUDARequirement to request nodes with GPUs.
+
+If you have a sequence of short-running steps (less than 1-2 minutes each), use the Arvados extension "arv:RunInSingleContainer":cwl-extensions.html#RunInSingleContainer to avoid scheduling and data transfer overhead by running all the steps together in the same container on the same node. To use this feature, @cwltool@ must be installed in the container image. Example:
+
+{% codeblock as yaml %}
+class: Workflow
+cwlVersion: v1.0
+$namespaces:
+ arv: "http://arvados.org/cwl#"
+inputs:
+ file: File
+outputs: []
+requirements:
+ SubworkflowFeatureRequirement: {}
+steps:
+ subworkflow-with-short-steps:
+ in:
+ file: file
+ out: [out]
+ # This hint indicates that the subworkflow should be bundled and
+ # run in a single container, instead of the normal behavior, which
+ # is to run each step in a separate container. This greatly
+ # reduces overhead if you have a series of short jobs, without
+ # requiring any changes to the CWL definition of the subworkflow.
+ hints:
+ - class: arv:RunInSingleContainer
+ run: subworkflow-with-short-steps.cwl
+{% endcodeblock %}
Avoid declaring @InlineJavascriptRequirement@ or @ShellCommandRequirement@ unless you specifically need them. Don't include them "just in case" because they change the default behavior and may add extra overhead.
Workflows should always provide @DockerRequirement@ in the @hints@ or @requirements@ section.
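+
+For example (the image name here is just a placeholder):
+
+{% codeblock as yaml %}
+hints:
+  DockerRequirement:
+    dockerPull: ubuntu:20.04
+{% endcodeblock %}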
-Build a reusable library of components. Share tool wrappers and subworkflows between projects. Make use of and contribute to "community maintained workflows and tools":https://github.com/common-workflow-language/workflows and tool registries such as "Dockstore":http://dockstore.org .
+Build a reusable library of components. Share tool wrappers and subworkflows between projects. Make use of and contribute to "community maintained workflows and tools":https://github.com/common-workflow-library and tool registries such as "Dockstore":http://dockstore.org .
CommandLineTools wrapping custom scripts should represent the script as an input parameter with the script file as a default value. Use @secondaryFiles@ for scripts that consist of multiple files. For example:
import (
"bytes"
+ "errors"
"flag"
"fmt"
"io"
"os/exec"
"git.arvados.org/arvados.git/lib/cmd"
+ "git.arvados.org/arvados.git/sdk/go/arvados"
"git.arvados.org/arvados.git/sdk/go/ctxlog"
"github.com/ghodss/yaml"
"github.com/sirupsen/logrus"
if err != nil {
return 1
}
+
+ // Check for configured vocabulary validity.
+ for id, cc := range withDepr.Clusters {
+ if cc.API.VocabularyPath == "" {
+ continue
+ }
+ vd, err := os.ReadFile(cc.API.VocabularyPath)
+ if err != nil {
+ if errors.Is(err, os.ErrNotExist) {
+ // If the vocabulary path doesn't exist, it might mean that
+ // the current node isn't the controller; so it's not an
+ // error.
+ continue
+ }
+ logger.Errorf("Error reading vocabulary file %q for cluster %s: %s\n", cc.API.VocabularyPath, id, err)
+ continue
+ }
+ mk := make([]string, 0, len(cc.Collections.ManagedProperties))
+ for k := range cc.Collections.ManagedProperties {
+ mk = append(mk, k)
+ }
+ _, err = arvados.NewVocabulary(vd, mk)
+ if err != nil {
+ logger.Errorf("Error loading vocabulary file %q for cluster %s:\n%s\n", cc.API.VocabularyPath, id, err)
+ continue
+ }
+ }
+
cmd := exec.Command("diff", "-u", "--label", "without-deprecated-configs", "--label", "relying-on-deprecated-configs", "/dev/fd/3", "/dev/fd/4")
for _, obj := range []interface{}{withoutDepr, withDepr} {
y, _ := yaml.Marshal(obj)
SystemRootToken: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
API:
MaxItemsPerResponse: 1234
+ VocabularyPath: /this/path/does/not/exist
Collections:
BlobSigningKey: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
PostgreSQL:
c.Check(stderr.String(), check.Equals, "")
}
+func (s *CommandSuite) TestCheck_VocabularyErrors(c *check.C) {
+ tmpFile, err := ioutil.TempFile("", "")
+ c.Assert(err, check.IsNil)
+ defer os.Remove(tmpFile.Name())
+ _, err = tmpFile.WriteString(`
+{
+ "tags": {
+ "IDfoo": {
+ "labels": [
+ {"label": "foo"}
+ ]
+ },
+ "IDfoo": {
+ "labels": [
+ {"label": "baz"}
+ ]
+ }
+ }
+}`)
+ c.Assert(err, check.IsNil)
+ tmpFile.Close()
+ vocPath := tmpFile.Name()
+ var stdout, stderr bytes.Buffer
+ in := `
+Clusters:
+ z1234:
+ ManagementToken: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+ SystemRootToken: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+ API:
+ MaxItemsPerResponse: 1234
+ VocabularyPath: ` + vocPath + `
+ Collections:
+ BlobSigningKey: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+ PostgreSQL:
+ Connection:
+ sslmode: require
+ Services:
+ RailsAPI:
+ InternalURLs:
+ "http://0.0.0.0:8000": {}
+ Workbench:
+ UserProfileFormFields:
+ color:
+ Type: select
+ Options:
+ fuchsia: {}
+ ApplicationMimetypesWithViewIcon:
+ whitespace: {}
+`
+ code := CheckCommand.RunCommand("arvados config-check", []string{"-config", "-"}, bytes.NewBufferString(in), &stdout, &stderr)
+ c.Check(code, check.Equals, 1)
+ c.Check(stderr.String(), check.Matches, `(?ms).*Error loading vocabulary file.*for cluster.*duplicate JSON key.*tags.IDfoo.*`)
+}
+
func (s *CommandSuite) TestCheck_DeprecatedKeys(c *check.C) {
var stdout, stderr bytes.Buffer
in := `
func (conn *Conn) loadVocabularyFile() error {
vf, err := os.ReadFile(conn.cluster.API.VocabularyPath)
if err != nil {
- return fmt.Errorf("couldn't reading the vocabulary file: %v", err)
+ return fmt.Errorf("while reading the vocabulary file: %v", err)
}
mk := make([]string, 0, len(conn.cluster.Collections.ManagedProperties))
for k := range conn.cluster.Collections.ManagedProperties {
disp.lsfcli.logger = disp.logger
disp.lsfqueue = lsfqueue{
logger: disp.logger,
- period: time.Duration(disp.Cluster.Containers.CloudVMs.PollInterval),
+ period: disp.Cluster.Containers.CloudVMs.PollInterval.Duration(),
lsfcli: &disp.lsfcli,
}
disp.ArvClient.AuthToken = disp.AuthToken
// Try "bkill" every few seconds until the LSF job disappears
// from the queue.
- ticker := time.NewTicker(5 * time.Second)
+ ticker := time.NewTicker(disp.Cluster.Containers.CloudVMs.PollInterval.Duration() / 2)
defer ticker.Stop()
for qent, ok := disp.lsfqueue.Lookup(ctr.UUID); ok; _, ok = disp.lsfqueue.Lookup(ctr.UUID) {
err := disp.lsfcli.Bkill(qent.ID)
c.Assert(err, check.IsNil)
cluster, err := cfg.GetCluster("")
c.Assert(err, check.IsNil)
- cluster.Containers.CloudVMs.PollInterval = arvados.Duration(time.Second)
+ cluster.Containers.CloudVMs.PollInterval = arvados.Duration(time.Second / 4)
+ cluster.Containers.MinRetryPeriod = arvados.Duration(time.Second / 4)
s.disp = newHandler(context.Background(), cluster, arvadostest.Dispatch1Token, prometheus.NewRegistry()).(*dispatcher)
s.disp.lsfcli.stubCommand = func(string, ...string) *exec.Cmd {
return exec.Command("bash", "-c", "echo >&2 unimplemented stub; false")
}
// "queuedcontainer" should be running
if _, ok := s.disp.lsfqueue.Lookup(arvadostest.QueuedContainerUUID); !ok {
+ c.Log("Lookup(queuedcontainer) == false")
continue
}
// "lockedcontainer" should be cancelled because it
// has priority 0 (no matching container requests)
- if _, ok := s.disp.lsfqueue.Lookup(arvadostest.LockedContainerUUID); ok {
+ if ent, ok := s.disp.lsfqueue.Lookup(arvadostest.LockedContainerUUID); ok {
+ c.Logf("Lookup(lockedcontainer) == true, ent = %#v", ent)
continue
}
// "crTooBig" should be cancelled because lsf stub
// reports there is no suitable instance type
- if _, ok := s.disp.lsfqueue.Lookup(s.crTooBig.ContainerUUID); ok {
+ if ent, ok := s.disp.lsfqueue.Lookup(s.crTooBig.ContainerUUID); ok {
+ c.Logf("Lookup(crTooBig) == true, ent = %#v", ent)
continue
}
var ctr arvados.Container
func (q *lsfqueue) init() {
q.updated = sync.NewCond(&q.mutex)
q.nextReady = make(chan (<-chan struct{}))
- ticker := time.NewTicker(time.Second)
+ ticker := time.NewTicker(q.period)
go func() {
for range ticker.C {
// Send a new "next update ready" channel to
"fmt"
"regexp"
"strconv"
+ "strings"
"time"
)
}
return ts, err
}
+
+var errNoSignature = errors.New("locator has no signature")
+
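+// signatureExpiryTime parses the expiry timestamp out of a signed
+// locator's permission hint, or returns errNoSignature if the locator
+// is unsigned.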
+func signatureExpiryTime(signedLocator string) (time.Time, error) {
+ matches := SignedLocatorRe.FindStringSubmatch(signedLocator)
+ if matches == nil {
+ return time.Time{}, errNoSignature
+ }
+ expiryHex := matches[7]
+ return parseHexTimestamp(expiryHex)
+}
+
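+// stripAllHints returns just the hash portion of a locator, dropping
+// the size and any hints that follow it.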
+func stripAllHints(locator string) string {
+ if i := strings.IndexRune(locator, '+'); i > 0 {
+ return locator[:i]
+ }
+ return locator
+}
type collectionFileSystem struct {
fileSystem
uuid string
+ savedPDH atomic.Value
replicas int
storageClasses []string
+ // guessSignatureTTL tracks a lower bound for the server's
+ // configured BlobSigningTTL. The guess is initially zero, and
+ // increases when we come across a signature with an expiry
+ // time further in the future than the previous guess.
+ //
+ // When the guessed TTL is much smaller than the real TTL,
+ // preemptive signature refresh is delayed or missed entirely,
+ // which is OK.
+ guessSignatureTTL time.Duration
+ holdCheckChanges time.Time
+ lockCheckChanges sync.Mutex
}
// FileSystem returns a CollectionFileSystem for the collection.
thr: newThrottle(concurrentWriters),
},
}
+ fs.savedPDH.Store(c.PortableDataHash)
if r := c.ReplicationDesired; r != nil {
fs.replicas = *r
}
return fs, nil
}
-func backdateTree(n inode, modTime time.Time) {
+// caller must have lock (or guarantee no concurrent accesses somehow)
+func eachNode(n inode, ffunc func(*filenode), dfunc func(*dirnode)) {
switch n := n.(type) {
case *filenode:
- n.fileinfo.modTime = modTime
+ if ffunc != nil {
+ ffunc(n)
+ }
case *dirnode:
- n.fileinfo.modTime = modTime
+ if dfunc != nil {
+ dfunc(n)
+ }
for _, n := range n.inodes {
- backdateTree(n, modTime)
+ eachNode(n, ffunc, dfunc)
}
}
}
+// caller must have lock (or guarantee no concurrent accesses somehow)
+func backdateTree(n inode, modTime time.Time) {
+ eachNode(n, func(fn *filenode) {
+ fn.fileinfo.modTime = modTime
+ }, func(dn *dirnode) {
+ dn.fileinfo.modTime = modTime
+ })
+}
+
+// Approximate portion of signature TTL remaining, usually between 0
+// and 1, or negative if some signatures have expired.
+func (fs *collectionFileSystem) signatureTimeLeft() (float64, time.Duration) {
+ var (
+ now = time.Now()
+ earliest = now.Add(time.Hour * 24 * 7 * 365)
+ latest time.Time
+ )
+ fs.fileSystem.root.RLock()
+ eachNode(fs.root, func(fn *filenode) {
+ fn.Lock()
+ defer fn.Unlock()
+ for _, seg := range fn.segments {
+ seg, ok := seg.(storedSegment)
+ if !ok {
+ continue
+ }
+ expiryTime, err := signatureExpiryTime(seg.locator)
+ if err != nil {
+ continue
+ }
+ if expiryTime.Before(earliest) {
+ earliest = expiryTime
+ }
+ if expiryTime.After(latest) {
+ latest = expiryTime
+ }
+ }
+ }, nil)
+ fs.fileSystem.root.RUnlock()
+
+ if latest.IsZero() {
+ // No signatures == 100% of TTL remaining.
+ return 1, 1
+ }
+
+ ttl := latest.Sub(now)
+ fs.fileSystem.root.Lock()
+ {
+ if ttl > fs.guessSignatureTTL {
+ // ttl is closer to the real TTL than
+ // guessSignatureTTL.
+ fs.guessSignatureTTL = ttl
+ } else {
+ // Use the previous best guess to compute the
+ // portion remaining (below, after unlocking
+ // mutex).
+ ttl = fs.guessSignatureTTL
+ }
+ }
+ fs.fileSystem.root.Unlock()
+
+ return earliest.Sub(now).Seconds() / ttl.Seconds(), ttl
+}
+
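+// updateSignatures scans newmanifest for signed block locators and
+// copies the fresh signatures onto the matching storedSegments in the
+// filesystem tree.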
+func (fs *collectionFileSystem) updateSignatures(newmanifest string) {
+ newLoc := map[string]string{}
+ for _, tok := range regexp.MustCompile(`\S+`).FindAllString(newmanifest, -1) {
+ if mBlkRe.MatchString(tok) {
+ newLoc[stripAllHints(tok)] = tok
+ }
+ }
+ fs.fileSystem.root.Lock()
+ defer fs.fileSystem.root.Unlock()
+ eachNode(fs.root, func(fn *filenode) {
+ fn.Lock()
+ defer fn.Unlock()
+ for idx, seg := range fn.segments {
+ seg, ok := seg.(storedSegment)
+ if !ok {
+ continue
+ }
+ loc, ok := newLoc[stripAllHints(seg.locator)]
+ if !ok {
+ continue
+ }
+ seg.locator = loc
+ fn.segments[idx] = seg
+ }
+ }, nil)
+}
+
func (fs *collectionFileSystem) newNode(name string, perm os.FileMode, modTime time.Time) (node inode, err error) {
if name == "" || name == "." || name == ".." {
return nil, ErrInvalidArgument
return ErrInvalidOperation
}
+// Check for and incorporate upstream changes -- unless that has
+// already been done recently, in which case this func is a no-op.
+func (fs *collectionFileSystem) checkChangesOnServer() error {
+ if fs.uuid == "" && fs.savedPDH.Load() == "" {
+ return nil
+ }
+
+ // First try UUID if any, then last known PDH. Stop if all
+ // signatures are new enough.
+ checkingAll := false
+ for _, id := range []string{fs.uuid, fs.savedPDH.Load().(string)} {
+ if id == "" {
+ continue
+ }
+
+ fs.lockCheckChanges.Lock()
+ if !checkingAll && fs.holdCheckChanges.After(time.Now()) {
+ fs.lockCheckChanges.Unlock()
+ return nil
+ }
+ remain, ttl := fs.signatureTimeLeft()
+ if remain > 0.01 && !checkingAll {
+ fs.holdCheckChanges = time.Now().Add(ttl / 100)
+ }
+ fs.lockCheckChanges.Unlock()
+
+ if remain >= 0.5 {
+ break
+ }
+ checkingAll = true
+ var coll Collection
+ err := fs.RequestAndDecode(&coll, "GET", "arvados/v1/collections/"+id, nil, map[string]interface{}{"select": []string{"portable_data_hash", "manifest_text"}})
+ if err != nil {
+ continue
+ }
+ fs.updateSignatures(coll.ManifestText)
+ }
+ return nil
+}
+
+// Refresh signature on a single locator, if necessary. Assume caller
+// has lock. If an update is needed, and there are any storedSegments
+// whose signatures can be updated, start a background task to update
+// them asynchronously when the caller releases locks.
+func (fs *collectionFileSystem) refreshSignature(locator string) string {
+ exp, err := signatureExpiryTime(locator)
+ if err != nil || exp.Sub(time.Now()) > time.Minute {
+ // Synchronous update is not needed. Start an
+ // asynchronous update if needed.
+ go fs.checkChangesOnServer()
+ return locator
+ }
+ var manifests string
+ for _, id := range []string{fs.uuid, fs.savedPDH.Load().(string)} {
+ if id == "" {
+ continue
+ }
+ var coll Collection
+ err := fs.RequestAndDecode(&coll, "GET", "arvados/v1/collections/"+id, nil, map[string]interface{}{"select": []string{"portable_data_hash", "manifest_text"}})
+ if err != nil {
+ continue
+ }
+ manifests += coll.ManifestText
+ }
+ hash := stripAllHints(locator)
+ for _, tok := range regexp.MustCompile(`\S+`).FindAllString(manifests, -1) {
+ if mBlkRe.MatchString(tok) {
+ if stripAllHints(tok) == hash {
+ locator = tok
+ break
+ }
+ }
+ }
+ go fs.updateSignatures(manifests)
+ return locator
+}
+
func (fs *collectionFileSystem) Sync() error {
+ err := fs.checkChangesOnServer()
+ if err != nil {
+ return err
+ }
if fs.uuid == "" {
return nil
}
if err != nil {
return fmt.Errorf("sync failed: %s", err)
}
- coll := &Collection{
+ if PortableDataHash(txt) == fs.savedPDH.Load() {
+ // No local changes since last save or initial load.
+ return nil
+ }
+ coll := Collection{
UUID: fs.uuid,
ManifestText: txt,
}
- err = fs.RequestAndDecode(nil, "PUT", "arvados/v1/collections/"+fs.uuid, nil, map[string]interface{}{
+
+ selectFields := []string{"uuid", "portable_data_hash"}
+ fs.lockCheckChanges.Lock()
+ remain, _ := fs.signatureTimeLeft()
+ fs.lockCheckChanges.Unlock()
+ if remain < 0.5 {
+ selectFields = append(selectFields, "manifest_text")
+ }
+
+ err = fs.RequestAndDecode(&coll, "PUT", "arvados/v1/collections/"+fs.uuid, nil, map[string]interface{}{
"collection": map[string]string{
"manifest_text": coll.ManifestText,
},
- "select": []string{"uuid"},
+ "select": selectFields,
})
if err != nil {
return fmt.Errorf("sync failed: update %s: %s", fs.uuid, err)
}
+ fs.updateSignatures(coll.ManifestText)
+ fs.savedPDH.Store(coll.PortableDataHash)
return nil
}
err = io.EOF
return
}
+ if ss, ok := fn.segments[ptr.segmentIdx].(storedSegment); ok {
+ ss.locator = fn.fs.refreshSignature(ss.locator)
+ fn.segments[ptr.segmentIdx] = ss
+ }
n, err = fn.segments[ptr.segmentIdx].ReadAt(p, int64(ptr.segmentOff))
if n > 0 {
ptr.off += int64(n)
type keepClientStub struct {
blocks map[string][]byte
refreshable map[string]bool
+ reads []string // locators from ReadAt() calls
onWrite func(bufcopy []byte) // called from WriteBlock, before acquiring lock
authToken string // client's auth token (used for signing locators)
sigkey string // blob signing key
var errStub404 = errors.New("404 block not found")
func (kcs *keepClientStub) ReadAt(locator string, p []byte, off int) (int, error) {
+ kcs.Lock()
+ kcs.reads = append(kcs.reads, locator)
+ kcs.Unlock()
kcs.RLock()
defer kcs.RUnlock()
+ if err := VerifySignature(locator, kcs.authToken, kcs.sigttl, []byte(kcs.sigkey)); err != nil {
+ return 0, err
+ }
buf := kcs.blocks[locator[:32]]
if buf == nil {
return 0, errStub404
func (s *CollectionFSSuite) SetUpTest(c *check.C) {
s.client = NewClientFromEnv()
+ s.client.AuthToken = fixtureActiveToken
err := s.client.RequestAndDecode(&s.coll, "GET", "arvados/v1/collections/"+fixtureFooAndBarFilesInDirUUID, nil, nil)
c.Assert(err, check.IsNil)
s.kc = &keepClientStub{
}
}
+func (s *CollectionFSSuite) TestRefreshSignatures(c *check.C) {
+ filedata1 := "hello refresh signatures world\n"
+ fs, err := (&Collection{}).FileSystem(s.client, s.kc)
+ c.Assert(err, check.IsNil)
+ fs.Mkdir("d1", 0700)
+ f, err := fs.OpenFile("d1/file1", os.O_CREATE|os.O_RDWR, 0700)
+ c.Assert(err, check.IsNil)
+ _, err = f.Write([]byte(filedata1))
+ c.Assert(err, check.IsNil)
+ err = f.Close()
+ c.Assert(err, check.IsNil)
+
+ filedata2 := "hello refresh signatures universe\n"
+ fs.Mkdir("d2", 0700)
+ f, err = fs.OpenFile("d2/file2", os.O_CREATE|os.O_RDWR, 0700)
+ c.Assert(err, check.IsNil)
+ _, err = f.Write([]byte(filedata2))
+ c.Assert(err, check.IsNil)
+ err = f.Close()
+ c.Assert(err, check.IsNil)
+ txt, err := fs.MarshalManifest(".")
+ c.Assert(err, check.IsNil)
+ var saved Collection
+ err = s.client.RequestAndDecode(&saved, "POST", "arvados/v1/collections", nil, map[string]interface{}{
+ "select": []string{"manifest_text", "uuid", "portable_data_hash"},
+ "collection": map[string]interface{}{
+ "manifest_text": txt,
+ },
+ })
+ c.Assert(err, check.IsNil)
+
+ // Update signatures synchronously if they are already expired
+ // when Read() is called.
+ {
+ saved.ManifestText = SignManifest(saved.ManifestText, s.kc.authToken, time.Now().Add(-2*time.Second), s.kc.sigttl, []byte(s.kc.sigkey))
+ fs, err := saved.FileSystem(s.client, s.kc)
+ c.Assert(err, check.IsNil)
+ f, err := fs.OpenFile("d1/file1", os.O_RDONLY, 0)
+ c.Assert(err, check.IsNil)
+ buf, err := ioutil.ReadAll(f)
+ c.Check(err, check.IsNil)
+ c.Check(string(buf), check.Equals, filedata1)
+ }
+
+ // Update signatures asynchronously if we're more than half
+ // way to TTL when Read() is called.
+ {
+ exp := time.Now().Add(2 * time.Minute)
+ saved.ManifestText = SignManifest(saved.ManifestText, s.kc.authToken, exp, s.kc.sigttl, []byte(s.kc.sigkey))
+ fs, err := saved.FileSystem(s.client, s.kc)
+ c.Assert(err, check.IsNil)
+ f1, err := fs.OpenFile("d1/file1", os.O_RDONLY, 0)
+ c.Assert(err, check.IsNil)
+ f2, err := fs.OpenFile("d2/file2", os.O_RDONLY, 0)
+ c.Assert(err, check.IsNil)
+ buf, err := ioutil.ReadAll(f1)
+ c.Check(err, check.IsNil)
+ c.Check(string(buf), check.Equals, filedata1)
+
+ // Ensure fs treats the 2-minute TTL as less than half
+ // the server's signing TTL. If we don't do this,
+ // collectionfs will guess the signature is fresh,
+ // i.e., signing TTL is 2 minutes, and won't do an
+ // async refresh.
+ fs.(*collectionFileSystem).guessSignatureTTL = time.Hour
+
+ refreshed := false
+ for deadline := time.Now().Add(time.Second * 10); time.Now().Before(deadline) && !refreshed; time.Sleep(time.Second / 10) {
+ _, err = f1.Seek(0, io.SeekStart)
+ c.Assert(err, check.IsNil)
+ buf, err = ioutil.ReadAll(f1)
+ c.Assert(err, check.IsNil)
+ c.Assert(string(buf), check.Equals, filedata1)
+ loc := s.kc.reads[len(s.kc.reads)-1]
+ t, err := signatureExpiryTime(loc)
+ c.Assert(err, check.IsNil)
+ c.Logf("last read block %s had signature expiry time %v", loc, t)
+ if t.Sub(time.Now()) > time.Hour {
+ refreshed = true
+ }
+ }
+ c.Check(refreshed, check.Equals, true)
+
+ // Second locator should have been updated at the same
+ // time.
+ buf, err = ioutil.ReadAll(f2)
+ c.Assert(err, check.IsNil)
+ c.Assert(string(buf), check.Equals, filedata2)
+ loc := s.kc.reads[len(s.kc.reads)-1]
+ c.Check(loc, check.Not(check.Equals), s.kc.reads[len(s.kc.reads)-2])
+ t, err := signatureExpiryTime(s.kc.reads[len(s.kc.reads)-1])
+ c.Assert(err, check.IsNil)
+ c.Logf("last read block %s had signature expiry time %v", loc, t)
+ c.Check(t.Sub(time.Now()) > time.Hour, check.Equals, true)
+ }
+}
+
var bigmanifest = func() string {
var buf bytes.Buffer
for i := 0; i < 2000; i++ {
err = s.client.RequestAndDecode(nil, "DELETE", "arvados/v1/collections/"+oob.UUID, nil, nil)
c.Assert(err, check.IsNil)
+ wf, err = s.fs.OpenFile("/home/A Project/oob/test.txt", os.O_CREATE|os.O_RDWR, 0700)
+ c.Assert(err, check.IsNil)
+ err = wf.Close()
+ c.Check(err, check.IsNil)
+
err = project.Sync()
c.Check(err, check.NotNil) // can't update the deleted collection
_, err = s.fs.Open("/home/A Project/oob")
import (
"bytes"
"encoding/json"
+ "errors"
"fmt"
"reflect"
+ "strconv"
"strings"
)
}
err = json.Unmarshal(data, &voc)
if err != nil {
- return nil, fmt.Errorf("invalid JSON format error: %q", err)
+ var serr *json.SyntaxError
+ if errors.As(err, &serr) {
+ offset := serr.Offset
+ errorMsg := string(data[:offset])
+ line := 1 + strings.Count(errorMsg, "\n")
+ column := offset - int64(strings.LastIndex(errorMsg, "\n")+len("\n"))
+ return nil, fmt.Errorf("invalid JSON format: %q (line %d, column %d)", err, line, column)
+ }
+ return nil, fmt.Errorf("invalid JSON format: %q", err)
}
if reflect.DeepEqual(voc, &Vocabulary{}) {
return nil, fmt.Errorf("JSON data provided doesn't match Vocabulary format: %q", data)
}
+
+ shouldReportErrors := false
+ errors := []string{}
+
+ // json.Unmarshal() doesn't error out on duplicate keys.
+ dupedKeys := []string{}
+ err = checkJSONDupedKeys(json.NewDecoder(bytes.NewReader(data)), nil, &dupedKeys)
+ if err != nil {
+ shouldReportErrors = true
+ for _, dk := range dupedKeys {
+ errors = append(errors, fmt.Sprintf("duplicate JSON key %q", dk))
+ }
+ }
voc.reservedTagKeys = make(map[string]bool)
for _, managedKey := range managedTagKeys {
voc.reservedTagKeys[managedKey] = true
for systemKey := range voc.systemTagKeys() {
voc.reservedTagKeys[systemKey] = true
}
- err = voc.validate()
+ validationErrs, err := voc.validate()
if err != nil {
- return nil, err
+ shouldReportErrors = true
+ errors = append(errors, validationErrs...)
+ }
+ if shouldReportErrors {
+ return nil, fmt.Errorf("%s", strings.Join(errors, "\n"))
}
return voc, nil
}
-func (v *Vocabulary) validate() error {
- if v == nil {
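+// checkJSONDupedKeys walks the JSON document in d, appending the dotted
+// path of every duplicated object key to *errors. It returns a non-nil
+// error on malformed JSON, or at the top of the recursion if any
+// duplicates were found.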
+func checkJSONDupedKeys(d *json.Decoder, path []string, errors *[]string) error {
+ t, err := d.Token()
+ if err != nil {
+ return err
+ }
+ delim, ok := t.(json.Delim)
+ if !ok {
return nil
}
+ switch delim {
+ case '{':
+ keys := make(map[string]bool)
+ for d.More() {
+ t, err := d.Token()
+ if err != nil {
+ return err
+ }
+ key := t.(string)
+
+ if keys[key] {
+ *errors = append(*errors, strings.Join(append(path, key), "."))
+ }
+ keys[key] = true
+
+ if err := checkJSONDupedKeys(d, append(path, key), errors); err != nil {
+ return err
+ }
+ }
+ // consume closing '}'
+ if _, err := d.Token(); err != nil {
+ return err
+ }
+ case '[':
+ i := 0
+ for d.More() {
+ if err := checkJSONDupedKeys(d, append(path, strconv.Itoa(i)), errors); err != nil {
+ return err
+ }
+ i++
+ }
+ // consume closing ']'
+ if _, err := d.Token(); err != nil {
+ return err
+ }
+ }
+ if len(path) == 0 && len(*errors) > 0 {
+ return fmt.Errorf("duplicate JSON key(s) found")
+ }
+ return nil
+}
+
+func (v *Vocabulary) validate() ([]string, error) {
+ if v == nil {
+ return nil, nil
+ }
tagKeys := map[string]string{}
// Checks for Vocabulary strictness
if v.StrictTags && len(v.Tags) == 0 {
- return fmt.Errorf("vocabulary is strict but no tags are defined")
+ return nil, fmt.Errorf("vocabulary is strict but no tags are defined")
}
// Checks for collisions between tag keys, reserved tag keys
// and tag key labels.
+ errors := []string{}
for key := range v.Tags {
if v.reservedTagKeys[key] {
- return fmt.Errorf("tag key %q is reserved", key)
+ errors = append(errors, fmt.Sprintf("tag key %q is reserved", key))
}
lcKey := strings.ToLower(key)
if tagKeys[lcKey] != "" {
- return fmt.Errorf("duplicate tag key %q", key)
+ errors = append(errors, fmt.Sprintf("duplicate tag key %q", key))
}
tagKeys[lcKey] = key
for _, lbl := range v.Tags[key].Labels {
label := strings.ToLower(lbl.Label)
if tagKeys[label] != "" {
- return fmt.Errorf("tag label %q for key %q already seen as a tag key or label", lbl.Label, key)
+ errors = append(errors, fmt.Sprintf("tag label %q for key %q already seen as a tag key or label", lbl.Label, key))
}
tagKeys[label] = lbl.Label
}
// Checks for value strictness
if v.Tags[key].Strict && len(v.Tags[key].Values) == 0 {
- return fmt.Errorf("tag key %q is configured as strict but doesn't provide values", key)
+ errors = append(errors, fmt.Sprintf("tag key %q is configured as strict but doesn't provide values", key))
}
// Checks for collisions between tag values and tag value labels.
tagValues := map[string]string{}
for val := range v.Tags[key].Values {
lcVal := strings.ToLower(val)
if tagValues[lcVal] != "" {
- return fmt.Errorf("duplicate tag value %q for tag %q", val, key)
+ errors = append(errors, fmt.Sprintf("duplicate tag value %q for tag %q", val, key))
}
// Checks for collisions between labels from different values.
tagValues[lcVal] = val
for _, tagLbl := range v.Tags[key].Values[val].Labels {
label := strings.ToLower(tagLbl.Label)
if tagValues[label] != "" && tagValues[label] != val {
- return fmt.Errorf("tag value label %q for pair (%q:%q) already seen on value %q", tagLbl.Label, key, val, tagValues[label])
+ errors = append(errors, fmt.Sprintf("tag value label %q for pair (%q:%q) already seen on value %q", tagLbl.Label, key, val, tagValues[label]))
}
tagValues[label] = val
}
}
}
- return nil
+ if len(errors) > 0 {
+ return errors, fmt.Errorf("invalid vocabulary")
+ }
+ return nil, nil
}
func (v *Vocabulary) getLabelsToKeys() (labels map[string]string) {
import (
"encoding/json"
+ "regexp"
+ "strings"
check "gopkg.in/check.v1"
)
},
},
}
- err := s.testVoc.validate()
+ _, err := s.testVoc.validate()
c.Assert(err, check.IsNil)
}
},
},
{
- "Valid data, but uses reserved key",
+ "Invalid JSON error with line & column numbers",
+ `{"tags":{
+ "aKey":{
+ "labels": [,{"label": "A label"}]
+ }
+ }}`,
+ false, `invalid JSON format:.*\(line \d+, column \d+\)`, nil,
+ },
+ {
+ "Invalid JSON with duplicate & reserved keys",
`{"tags":{
"type":{
"strict": false,
- "labels": [{"label": "Type"}]
+ "labels": [{"label": "Class", "label": "Type"}]
+ },
+ "type":{
+ "labels": []
}
}}`,
- false, "tag key.*is reserved", nil,
+ false, "(?s).*duplicate JSON key \"tags.type.labels.0.label\"\nduplicate JSON key \"tags.type\"\ntag key \"type\" is reserved", nil,
},
}
tests := []struct {
name string
voc *Vocabulary
- errMatches string
+ errMatches []string
}{
{
"Strict vocabulary, no keys",
&Vocabulary{
StrictTags: true,
},
- "vocabulary is strict but no tags are defined",
+ []string{"vocabulary is strict but no tags are defined"},
},
{
"Collision between tag key and tag key label",
},
},
},
- "", // Depending on how the map is sorted, this could be one of two errors
+ nil, // Depending on how the map is sorted, this could be one of two errors
},
{
"Collision between tag key and tag key label (case-insensitive)",
},
},
},
- "", // Depending on how the map is sorted, this could be one of two errors
+ nil, // Depending on how the map is sorted, this could be one of two errors
},
{
"Collision between tag key labels",
},
},
},
- "tag label.*for key.*already seen.*",
+ []string{"(?s).*tag label.*for key.*already seen.*"},
},
{
"Collision between tag value and tag value label",
},
},
},
- "", // Depending on how the map is sorted, this could be one of two errors
+ nil, // Depending on how the map is sorted, this could be one of two errors
},
{
"Collision between tag value and tag value label (case-insensitive)",
},
},
},
- "", // Depending on how the map is sorted, this could be one of two errors
+ nil, // Depending on how the map is sorted, this could be one of two errors
},
{
"Collision between tag value labels",
},
},
},
- "tag value label.*for pair.*already seen.*on value.*",
+ []string{"(?s).*tag value label.*for pair.*already seen.*on value.*"},
},
{
"Collision between tag value labels (case-insensitive)",
},
},
},
- "tag value label.*for pair.*already seen.*on value.*",
+ []string{"(?s).*tag value label.*for pair.*already seen.*on value.*"},
},
{
"Strict tag key, with no values",
},
},
},
- "tag key.*is configured as strict but doesn't provide values",
+ []string{"(?s).*tag key.*is configured as strict but doesn't provide values"},
+ },
+ {
+ "Multiple errors reported",
+ &Vocabulary{
+ StrictTags: false,
+ Tags: map[string]VocabularyTag{
+ "IDTAGANIMALS": {
+ Strict: true,
+ Labels: []VocabularyLabel{{Label: "Animal"}, {Label: "Creature"}},
+ },
+ "IDTAGSIZES": {
+ Labels: []VocabularyLabel{{Label: "Animal"}, {Label: "Size"}},
+ },
+ },
+ },
+ []string{
+ "(?s).*tag key.*is configured as strict but doesn't provide values.*",
+ "(?s).*tag label.*for key.*already seen.*",
+ },
},
}
for _, tt := range tests {
c.Log(c.TestName()+" ", tt.name)
- err := tt.voc.validate()
+ validationErrs, err := tt.voc.validate()
c.Assert(err, check.NotNil)
- if tt.errMatches != "" {
- c.Assert(err, check.ErrorMatches, tt.errMatches)
+ for _, errMatch := range tt.errMatches {
+ seen := false
+ for _, validationErr := range validationErrs {
+ if regexp.MustCompile(errMatch).MatchString(validationErr) {
+ seen = true
+ break
+ }
+ }
+ if len(validationErrs) == 0 {
+ c.Assert(err, check.ErrorMatches, errMatch)
+ } else {
+ c.Assert(seen, check.Equals, true,
+ check.Commentf("Expected to see error matching %q:\n%s",
+ errMatch, strings.Join(validationErrs, "\n")))
+ }
}
}
}
}
tracker.updates <- c
go func() {
+ fallbackState := Queued
err := d.RunContainer(d, c, tracker.updates)
if err != nil {
text := fmt.Sprintf("Error running container %s: %s", c.UUID, err)
if err, ok := err.(dispatchcloud.ConstraintsNotSatisfiableError); ok {
+ fallbackState = Cancelled
var logBuf bytes.Buffer
fmt.Fprintf(&logBuf, "cannot run container %s: %s\n", c.UUID, err)
if len(err.AvailableTypes) == 0 {
}
}
text = logBuf.String()
- d.UpdateState(c.UUID, Cancelled)
}
d.Logger.Printf("%s", text)
lr := arvadosclient.Dict{"log": arvadosclient.Dict{
"event_type": "dispatch",
"properties": map[string]string{"text": text}}}
d.Arv.Create("logs", lr, nil)
- d.Unlock(c.UUID)
}
-
- d.mtx.Lock()
- delete(d.trackers, c.UUID)
- d.mtx.Unlock()
+ // If checkListForUpdates() doesn't close the tracker
+ // after 2 queue updates, try to move the container to
+ // the fallback state, which should eventually work
+ // and cause the tracker to close.
+ updates := 0
+ for upd := range tracker.updates {
+ updates++
+ if upd.State == Locked || upd.State == Running {
+ // Tracker didn't clean up before
+ // returning -- or this is the first
+ // update and it contains stale
+ // information from before
+ // RunContainer() returned.
+ if updates < 2 {
+ // Avoid generating confusing
+ // logs / API calls in the
+ // stale-info case.
+ continue
+ }
+ d.Logger.Printf("container %s state is still %s, changing to %s", c.UUID, upd.State, fallbackState)
+ d.UpdateState(c.UUID, fallbackState)
+ }
+ }
}()
return tracker
}
d.Logger.Debugf("ignoring %s locked by %s", c.UUID, c.LockedByUUID)
} else if alreadyTracking {
switch c.State {
- case Queued:
+ case Queued, Cancelled, Complete:
+ d.Logger.Debugf("update has %s in state %s, closing tracker", c.UUID, c.State)
tracker.close()
+ delete(d.trackers, c.UUID)
case Locked, Running:
+ d.Logger.Debugf("update has %s in state %s, updating tracker", c.UUID, c.State)
tracker.update(c)
- case Cancelled, Complete:
- tracker.close()
}
} else {
switch c.State {
#
# SPDX-License-Identifier: AGPL-3.0
+{%- set passenger_pkg = 'nginx-mod-http-passenger'
+ if grains.osfinger in ('CentOS Linux-7',) else
+ 'libnginx-mod-http-passenger' %}
+{%- set passenger_mod = '/usr/lib64/nginx/modules/ngx_http_passenger_module.so'
+ if grains.osfinger in ('CentOS Linux-7',) else
+ '/usr/lib/nginx/modules/ngx_http_passenger_module.so' %}
+{%- set passenger_ruby = '/usr/local/rvm/rubies/ruby-2.7.2/bin/ruby'
+ if grains.osfinger in ('CentOS Linux-7', 'Ubuntu-18.04',) else
+ '/usr/bin/ruby' %}
+
### NGINX
nginx:
install_from_phusionpassenger: true
lookup:
- passenger_package: libnginx-mod-http-passenger
- passenger_config_file: /etc/nginx/conf.d/mod-http-passenger.conf
+ passenger_package: {{ passenger_pkg }}
+ ### PASSENGER
+ passenger:
+ passenger_ruby: {{ passenger_ruby }}
+
+ ### SERVER
+ server:
+ config:
+ # This is required to get the passenger module loaded
+ # In Debian it can be done with this
+ # include: 'modules-enabled/*.conf'
+ load_module: {{ passenger_mod }}
+
+ worker_processes: 4
### SNIPPETS
snippets:
# replace with the IP address of your resolver
# - resolver: 127.0.0.1
- ### SERVER
- server:
- config:
- include: 'modules-enabled/*.conf'
- worker_processes: 4
-
### SITES
servers:
managed:
--- /dev/null
+# Copyright (C) The Arvados Authors. All rights reserved.
+#
+# SPDX-License-Identifier: Apache-2.0
+
+{%- set orig_cert_dir = salt['pillar.get']('extra_custom_certs_dir', '/srv/salt/certs') %}
+{%- set dest_cert_dir = '/etc/nginx/ssl' %}
+{%- set certs = salt['pillar.get']('extra_custom_certs', []) %}
+
+{% if certs %}
+extra_custom_certs_file_directory_certs_dir:
+ file.directory:
+ - name: /etc/nginx/ssl
+ - require:
+ - pkg: nginx_install
+
+ {%- for cert in certs %}
+ {%- set cert_file = 'arvados-' ~ cert ~ '.pem' %}
+ {#- set csr_file = 'arvados-' ~ cert ~ '.csr' #}
+ {%- set key_file = 'arvados-' ~ cert ~ '.key' %}
+ {% for c in [cert_file, key_file] %}
+extra_custom_certs_file_copy_{{ c }}:
+ file.copy:
+ - name: {{ dest_cert_dir }}/{{ c }}
+ - source: {{ orig_cert_dir }}/{{ c }}
+ - force: true
+ - user: root
+ - group: root
+ - unless: cmp {{ dest_cert_dir }}/{{ c }} {{ orig_cert_dir }}/{{ c }}
+ - require:
+ - file: extra_custom_certs_file_directory_certs_dir
+ {%- endfor %}
+ {%- endfor %}
+{%- endif %}
{%- set dest_cert_dir = '/etc/nginx/ssl' %}
{%- set certs = salt['pillar.get']('extra_custom_certs', []) %}
+{% if certs %}
extra_custom_certs_file_directory_certs_dir:
file.directory:
- name: /etc/nginx/ssl
- require:
- pkg: nginx_install
-{%- for cert in certs %}
- {%- set cert_file = 'arvados-' ~ cert ~ '.pem' %}
- {#- set csr_file = 'arvados-' ~ cert ~ '.csr' #}
- {%- set key_file = 'arvados-' ~ cert ~ '.key' %}
- {% for c in [cert_file, key_file] %}
+ {%- for cert in certs %}
+ {%- set cert_file = 'arvados-' ~ cert ~ '.pem' %}
+ {#- set csr_file = 'arvados-' ~ cert ~ '.csr' #}
+ {%- set key_file = 'arvados-' ~ cert ~ '.key' %}
+ {% for c in [cert_file, key_file] %}
extra_custom_certs_file_copy_{{ c }}:
file.copy:
- name: {{ dest_cert_dir }}/{{ c }}
- unless: cmp {{ dest_cert_dir }}/{{ c }} {{ orig_cert_dir }}/{{ c }}
- require:
- file: extra_custom_certs_file_directory_certs_dir
+ {%- endfor %}
{%- endfor %}
-{%- endfor %}
+{%- endif %}
# help you deploy them. In order to do that, you need to set `USE_LETSENCRYPT=no` above,
# and copy the required certificates under the directory specified in the next line.
# The certs will be copied from this directory by the provision script.
-CUSTOM_CERTS_DIR="./certs"
+# Please set it to the FULL PATH of the certs dir if you're going to use a different dir.
+# Default is "${SCRIPT_DIR}/certs", where the variable "SCRIPT_DIR" holds the path to the
+# directory where the "provision.sh" script was copied on the destination host.
+# CUSTOM_CERTS_DIR="${SCRIPT_DIR}/certs"
# The script expects cert/key files with these basenames (matching the role except for
-# keepweb, which is split in both downoad/collections):
+# keepweb, which is split in both download/collections):
# "controller"
# "websocket"
# "workbench"
# "webshell"
# "download" # Part of keepweb
# "collections" # Part of keepweb
-# "keep" # Keepproxy
+# "keepproxy" # Keepproxy
-# Ie., 'keep', the script will lookup for
+# Ie., for 'keepproxy', the script will look for
-# ${CUSTOM_CERTS_DIR}/keep.crt
-# ${CUSTOM_CERTS_DIR}/keep.key
+# ${CUSTOM_CERTS_DIR}/keepproxy.crt
+# ${CUSTOM_CERTS_DIR}/keepproxy.key
# The directory to check for the config files (pillars, states) you want to use.
# There are a few examples under 'config_examples'.
DATABASE_PASSWORD=please_set_this_to_some_secure_value
# SSL CERTIFICATES
-# Arvados REQUIRES valid SSL to work correctly. Otherwise, some components will fail
-# to communicate and can silently drop traffic. You can try to use the Letsencrypt
-# salt formula (https://github.com/saltstack-formulas/letsencrypt-formula) to try to
-# automatically obtain and install SSL certificates for your instances or set this
-# variable to "no", provide and upload your own certificates to the instances and
-# modify the 'nginx_*' salt pillars accordingly (see CUSTOM_CERTS_DIR below)
+# Arvados REQUIRES valid SSL to work correctly. Otherwise, some components will
+# fail to communicate and can silently drop traffic. Set USE_LETSENCRYPT="yes"
+# to use the Let's Encrypt salt formula
+# (https://github.com/saltstack-formulas/letsencrypt-formula) to automatically
+# obtain and install SSL certificates for your hostname(s).
+#
+# Alternatively, set this variable to "no" and provide and upload your own
+# certificates to the instances and modify the 'nginx_*' salt pillars
+# accordingly
USE_LETSENCRYPT="no"
-# If you going to provide your own certificates for Arvados, the provision script can
+# If you're going to provide your own certificates for Arvados, the provision script can
# help you deploy them. In order to do that, you need to set `USE_LETSENCRYPT=no` above,
# and copy the required certificates under the directory specified in the next line.
# The certs will be copied from this directory by the provision script.
-CUSTOM_CERTS_DIR="./certs"
+# Please set it to the FULL PATH of the certs dir if you're going to use a different dir.
+# Default is "${SCRIPT_DIR}/certs", where the variable "SCRIPT_DIR" holds the path to the
+# directory where the "provision.sh" script was copied on the destination host.
+# CUSTOM_CERTS_DIR="${SCRIPT_DIR}/certs"
# The script expects cert/key files with these basenames (matching the role except for
-# keepweb, which is split in both downoad/collections):
+# keepweb, which is split in both download/collections):
# "controller"
# "websocket"
# "workbench"
DATABASE_PASSWORD=please_set_this_to_some_secure_value
# SSL CERTIFICATES
-# Arvados REQUIRES valid SSL to work correctly. Otherwise, some components will fail
-# to communicate and can silently drop traffic. You can try to use the Letsencrypt
-# salt formula (https://github.com/saltstack-formulas/letsencrypt-formula) to try to
-# automatically obtain and install SSL certificates for your instances or set this
-# variable to "no", provide and upload your own certificates to the instances and
-# modify the 'nginx_*' salt pillars accordingly
+# Arvados REQUIRES valid SSL to work correctly. Otherwise, some components will
+# fail to communicate and can silently drop traffic. Set USE_LETSENCRYPT="yes"
+# to use the Let's Encrypt salt formula
+# (https://github.com/saltstack-formulas/letsencrypt-formula) to automatically
+# obtain and install SSL certificates for your hostname(s).
+#
+# Alternatively, set this variable to "no" and provide and upload your own
+# certificates to the instances and modify the 'nginx_*' salt pillars
+# accordingly
USE_LETSENCRYPT="no"
# The directory to check for the config files (pillars, states) you want to use.
done
}
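+
+# copy_custom_cert <certs_dir> <cert_name>: stage <certs_dir>/<cert_name>.crt
+# and <certs_dir>/<cert_name>.key under /srv/salt/certs with the
+# arvados-<cert_name>.pem/.key names the salt states expect, exiting with an
+# error if either file is missing.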
+copy_custom_cert() {
+ cert_dir=${1}
+ cert_name=${2}
+
+ mkdir -p /srv/salt/certs
+
+ if [ -f ${cert_dir}/${cert_name}.crt ]; then
+ cp -v ${cert_dir}/${cert_name}.crt /srv/salt/certs/arvados-${cert_name}.pem
+ else
+ echo "${cert_dir}/${cert_name}.crt does not exist. Exiting"
+ exit 1
+ fi
+ if [ -f ${cert_dir}/${cert_name}.key ]; then
+ cp -v ${cert_dir}/${cert_name}.key /srv/salt/certs/arvados-${cert_name}.key
+ else
+ echo "${cert_dir}/${cert_name}.key does not exist. Exiting"
+ exit 1
+ fi
+}
+
DEV_MODE="no"
CONFIG_FILE="${SCRIPT_DIR}/local.params"
CONFIG_DIR="local_config_dir"
WORKBENCH2_EXT_SSL_PORT=3001
USE_LETSENCRYPT="no"
-CUSTOM_CERTS_DIR="./certs"
+CUSTOM_CERTS_DIR="${SCRIPT_DIR}/certs"
## These are ARVADOS-related parameters
# For a stable release, change RELEASE "production" and VERSION to the
else
# If we add individual roles, make sure we add the repo first
echo " - arvados.repo" >> ${S_DIR}/top.sls
+ # We add the custom_certs state
+ grep -q "custom_certs" ${S_DIR}/top.sls || echo " - extra.custom_certs" >> ${S_DIR}/top.sls
+
+ # And we add the basic part for the certs pillar
+ if [ "x${USE_LETSENCRYPT}" != "xyes" ]; then
+ # And add the certs in the custom_certs pillar
+ echo "extra_custom_certs_dir: /srv/salt/certs" > ${P_DIR}/extra_custom_certs.sls
+ echo "extra_custom_certs:" >> ${P_DIR}/extra_custom_certs.sls
+ grep -q "extra_custom_certs" ${P_DIR}/top.sls || echo " - extra_custom_certs" >> ${P_DIR}/top.sls
+ fi
+
for R in ${ROLES}; do
case "${R}" in
"database")
grep -q "letsencrypt" ${S_DIR}/top.sls || echo " - letsencrypt" >> ${S_DIR}/top.sls
else
# Use custom certs
- cp -v ${CUSTOM_CERTS_DIR}/controller.* "${F_DIR}/extra/extra/files/"
- # We add the custom_certs state
- grep -q "custom_certs" ${S_DIR}/top.sls || echo " - extra.custom_certs" >> ${S_DIR}/top.sls
+ copy_custom_cert ${CUSTOM_CERTS_DIR} controller
+ grep -q controller ${P_DIR}/extra_custom_certs.sls || echo " - controller" >> ${P_DIR}/extra_custom_certs.sls
fi
grep -q "arvados.${R}" ${S_DIR}/top.sls || echo " - arvados.${R}" >> ${S_DIR}/top.sls
# Pillars
grep -q "aws_credentials" ${P_DIR}/top.sls || echo " - aws_credentials" >> ${P_DIR}/top.sls
- grep -q "docker" ${P_DIR}/top.sls || echo " - docker" >> ${P_DIR}/top.sls
grep -q "postgresql" ${P_DIR}/top.sls || echo " - postgresql" >> ${P_DIR}/top.sls
grep -q "nginx_passenger" ${P_DIR}/top.sls || echo " - nginx_passenger" >> ${P_DIR}/top.sls
grep -q "nginx_${R}_configuration" ${P_DIR}/top.sls || echo " - nginx_${R}_configuration" >> ${P_DIR}/top.sls
else
# Use custom certs, special case for keepweb
if [ ${R} = "keepweb" ]; then
- cp -v ${CUSTOM_CERTS_DIR}/download.* "${F_DIR}/extra/extra/files/"
- cp -v ${CUSTOM_CERTS_DIR}/collections.* "${F_DIR}/extra/extra/files/"
+ copy_custom_cert ${CUSTOM_CERTS_DIR} download
+ copy_custom_cert ${CUSTOM_CERTS_DIR} collections
else
- cp -v ${CUSTOM_CERTS_DIR}/${R}.* "${F_DIR}/extra/extra/files/"
+ copy_custom_cert ${CUSTOM_CERTS_DIR} ${R}
fi
- # We add the custom_certs state
- grep -q "custom_certs" ${S_DIR}/top.sls || echo " - extra.custom_certs" >> ${S_DIR}/top.sls
-
fi
# webshell role is just a nginx vhost, so it has no state
if [ "${R}" != "webshell" ]; then
${P_DIR}/nginx_${R}_configuration.sls
fi
else
- grep -q ${R} ${P_DIR}/extra_custom_certs.sls || echo " - ${R}" >> ${P_DIR}/extra_custom_certs.sls
-
# As the pillar differ whether we use LE or custom certs, we need to do a final edition on them
# Special case for keepweb
if [ ${R} = "keepweb" ]; then
s#__CERT_PEM__#/etc/nginx/ssl/arvados-${kwsub}.pem#g;
s#__CERT_KEY__#/etc/nginx/ssl/arvados-${kwsub}.key#g" \
${P_DIR}/nginx_${kwsub}_configuration.sls
+ grep -q ${kwsub} ${P_DIR}/extra_custom_certs.sls || echo " - ${kwsub}" >> ${P_DIR}/extra_custom_certs.sls
done
else
sed -i "s/__CERT_REQUIRES__/file: extra_custom_certs_file_copy_arvados-${R}.pem/g;
s#__CERT_PEM__#/etc/nginx/ssl/arvados-${R}.pem#g;
s#__CERT_KEY__#/etc/nginx/ssl/arvados-${R}.key#g" \
${P_DIR}/nginx_${R}_configuration.sls
+ grep -q ${R} ${P_DIR}/extra_custom_certs.sls || echo " - ${R}" >> ${P_DIR}/extra_custom_certs.sls
fi
fi
;;
grep -q "docker" ${S_DIR}/top.sls || echo " - docker.software" >> ${S_DIR}/top.sls
grep -q "arvados.${R}" ${S_DIR}/top.sls || echo " - arvados.${R}" >> ${S_DIR}/top.sls
# Pillars
- grep -q "" ${P_DIR}/top.sls || echo " - docker" >> ${P_DIR}/top.sls
+ grep -q "docker" ${P_DIR}/top.sls || echo " - docker" >> ${P_DIR}/top.sls
;;
"dispatcher")
# States
- grep -q "docker" ${S_DIR}/top.sls || echo " - docker.software" >> ${S_DIR}/top.sls
grep -q "arvados.${R}" ${S_DIR}/top.sls || echo " - arvados.${R}" >> ${S_DIR}/top.sls
# Pillars
# ATM, no specific pillar needed