title: Analyzing workflow performance
Copyright (C) The Arvados Authors. All rights reserved.
SPDX-License-Identifier: CC-BY-SA-3.0
The @crunchstat-summary@ tool analyzes workflow and container performance. Install it from packages (@apt install python3-crunchstat-summary@ or @yum install rh-python36-python-crunchstat-summary@). It parses the crunchstat lines in the logs of a container or workflow and generates a report in text or HTML format.
The @crunchstat-summary@ tool accepts the following command line arguments:
<pre><code>~$ <span class="userinput">crunchstat-summary -h</span>
usage: crunchstat-summary [-h]
                          [--job UUID | --container UUID | --pipeline-instance UUID | --log-file LOG_FILE]
                          [--skip-child-jobs] [--format {html,text}]
                          [--threads THREADS] [--verbose]

Summarize resource usage of an Arvados Crunch job

  -h, --help            show this help message and exit
  --job UUID, --container-request UUID
                        Look up the specified job or container request and
                        read its log data from Keep (or from the Arvados event
                        log, if the job is still running)
  --container UUID      [Deprecated] Look up the specified container find its
                        container request and read its log data from Keep (or
                        from the Arvados event log, if the job is still
                        running)
  --pipeline-instance UUID
                        [Deprecated] Summarize each component of the given
                        pipeline instance (historical pre-1.4)
  --log-file LOG_FILE   Read log data from a regular file
  --skip-child-jobs     Do not include stats from child jobs/containers
  --format {html,text}  Report format
  --threads THREADS     Maximum worker threads to run
  --verbose, -v         Log more information (once for progress, twice for
                        debug)
</code></pre>
h2(#examples). Examples
@crunchstat-summary@ prints its report to stdout. The HTML report, in particular, should be redirected to a file and then opened in a browser.
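When the report is produced from a script rather than an interactive shell, the same redirection can be done programmatically. The following is a minimal sketch, not part of @crunchstat-summary@ itself: the helper names @build_command@ and @write_report@ are hypothetical, and the sketch assumes @crunchstat-summary@ is installed and on the @PATH@. It uses only the @--container-request@ and @--format@ options listed in the help output.

```python
import subprocess


def build_command(uuid, report_format="html"):
    """Return the argv list for a crunchstat-summary run (hypothetical helper)."""
    if report_format not in ("html", "text"):
        raise ValueError("report_format must be 'html' or 'text'")
    return ["crunchstat-summary", "--container-request", uuid,
            "--format", report_format]


def write_report(uuid, path, report_format="html"):
    """Run crunchstat-summary and redirect its stdout to the file at `path`.

    Assumes crunchstat-summary is installed and on the PATH; raises
    subprocess.CalledProcessError if the tool exits with a non-zero status.
    """
    with open(path, "w") as out:
        subprocess.run(build_command(uuid, report_format), stdout=out, check=True)
```

For example, @write_report("pirca-xvhdp-rs0ef250emtmbj8", "report.html")@ would write the HTML report to @report.html@, ready to open in a browser.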
An example text report for a single workflow step:
<pre><code>~$ <span class="userinput">crunchstat-summary --container-request pirca-xvhdp-rs0ef250emtmbj8 --format text</span>
category         metric     task_max      task_max_rate  job_total
blkio:0:0        read       63067755822   53687091.20    63067755822
blkio:0:0        write      64484253320   16376234.80    64484253320
cpu              sys        2147.29       0.60           2147.29
cpu              user       549046.22     15.99          549046.22
cpu              user+sys   551193.51     16.00          551193.51
fuseop:create    count      1             0.10           1
fuseop:create    time       0.01          0.00           0.01
fuseop:destroy   count      0             0              0
fuseop:destroy   time       0             0              0.00
fuseop:flush     count      12            0.70           12
fuseop:flush     time       0.00          0.00           0.00
fuseop:forget    count      0             0              0
fuseop:forget    time       0             0              0.00
fuseop:getattr   count      40            2.70           40
fuseop:getattr   time       0.00          0.00           0.00
fuseop:lookup    count      36            2.90           36
fuseop:lookup    time       0.67          0.07           0.67
fuseop:mkdir     count      0             0              0
fuseop:mkdir     time       0             0              0.00
fuseop:on_event  count      0             0              0
fuseop:on_event  time       0             0              0.00
fuseop:open      count      9             0.30           9
fuseop:open      time       0.00          0.00           0.00
fuseop:opendir   count      0             0              0
fuseop:opendir   time       0             0              0.00
fuseop:read      count      481185        409.60         481185
fuseop:read      time       370.11        2.14           370.11
fuseop:readdir   count      0             0              0
fuseop:readdir   time       0             0              0.00
fuseop:release   count      7             0.30           7
fuseop:release   time       0.00          0.00           0.00
fuseop:rename    count      0             0              0
fuseop:rename    time       0             0              0.00
fuseop:rmdir     count      0             0              0
fuseop:rmdir     time       0             0              0.00
fuseop:setattr   count      0             0              0
fuseop:setattr   time       0             0              0.00
fuseop:statfs    count      0             0              0
fuseop:statfs    time       0             0              0.00
fuseop:unlink    count      0             0              0
fuseop:unlink    time       0             0              0.00
fuseop:write     count      5414406       1123.00        5414406
fuseop:write     time       475.04        0.11           475.04
fuseops          read       481185        409.60         481185
fuseops          write      5414406       1123.00        5414406
keepcache        hit        961402        819.20         961402
keepcache        miss       946           0.90           946
keepcalls        get        962348        820.00         962348
keepcalls        put        961           0.30           961
mem              cache      22748987392   -              -
mem              rss        27185491968   -              -
net:docker0      tx+rx      0             -              0
net:ens5         rx         1100398604    -              1100398604
net:ens5         tx         1445464       -              1445464
net:ens5         tx+rx      1101844068    -              1101844068
net:keep0        rx         63086467386   53687091.20    63086467386
net:keep0        tx         64482237590   20131128.60    64482237590
net:keep0        tx+rx      127568704976  53687091.20    127568704976
statfs           available  398721179648  -              398721179648
statfs           total      400289181696  -              400289181696
statfs           used       1568198656    0              1568002048
time             elapsed    34820         -              34820
# Max CPU time spent by a single task: 551193.51s
# Max CPU usage in a single interval: 1599.52%
# Overall CPU usage: 1582.98%
# Max memory used by a single task: 27.19GB
# Max network traffic in a single task: 127.57GB
# Max network speed in a single interval: 53.69MB/s
# Keep cache miss rate 0.10%
# Keep cache utilization 99.97%
# Temp disk utilization 0.39%
#!! bwamem-samtools-view max RSS was 25927 MiB -- try reducing runtime_constraints to "ram":27541477785
#!! bwamem-samtools-view max temp disk utilization was 0% of 381746 MiB -- consider reducing "tmpdirMin" and/or "outdirMin"
</code></pre>
When @crunchstat-summary@ is given a container or container request UUID for a top-level workflow runner container, it generates a report for the whole workflow. For a large workflow, generating the report can take a long time.
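The text report is also easy to post-process. The sketch below parses the five-column table and recomputes two of the summary figures from the example report for a single workflow step. The function names @parse_report@ and @derived_metrics@ are hypothetical, and the two formulas are assumptions chosen because they reproduce the numbers in that example; they are not a statement of how @crunchstat-summary@ computes them internally.

```python
# A few rows copied from the example text report for a single workflow step.
SAMPLE = """\
category metric task_max task_max_rate job_total
cpu user+sys 551193.51 16.00 551193.51
keepcache hit 961402 819.20 961402
keepcache miss 946 0.90 946
mem rss 27185491968 - -
time elapsed 34820 - 34820
# Overall CPU usage: 1582.98%
# Keep cache miss rate 0.10%
"""


def parse_report(text):
    """Map (category, metric) to the task_max value of each table row."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, the "#" summary lines, and anything malformed.
        if len(fields) != 5 or line.startswith("#") or fields[0] == "category":
            continue
        category, metric, task_max = fields[0], fields[1], fields[2]
        stats[(category, metric)] = None if task_max == "-" else float(task_max)
    return stats


def derived_metrics(stats):
    """Recompute two summary figures (assumed formulas, see the text above)."""
    hits = stats[("keepcache", "hit")]
    misses = stats[("keepcache", "miss")]
    cpu_seconds = stats[("cpu", "user+sys")]
    elapsed = stats[("time", "elapsed")]
    return {
        # Share of Keep cache lookups that missed.
        "keep_cache_miss_pct": 100.0 * misses / (hits + misses),
        # CPU seconds per wall-clock second; above 100% means several cores.
        "overall_cpu_pct": 100.0 * cpu_seconds / elapsed,
    }


metrics = derived_metrics(parse_report(SAMPLE))
print("Keep cache miss rate: {:.2f}%".format(metrics["keep_cache_miss_pct"]))
print("Overall CPU usage: {:.2f}%".format(metrics["overall_cpu_pct"]))
```

Splitting on whitespace works whether the columns are separated by tabs or spaces, so the same parser can be applied to a report saved with @--format text@.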
The equivalent HTML report can be generated as follows:
<pre><code>~$ <span class="userinput">crunchstat-summary --container-request pirca-xvhdp-rs0ef250emtmbj8 --format html > report.html</span>
</code></pre>
When loaded in a browser:
!(full-width)images/crunchstat-summary-html.png!