title: Analyzing workflow performance
Copyright (C) The Arvados Authors. All rights reserved.
SPDX-License-Identifier: CC-BY-SA-3.0
{% include 'tutorial_expectations' %}

The @crunchstat-summary@ tool analyzes workflow and container performance. Install it from packages (@apt install python3-crunchstat-summary@ or @yum install rh-python36-python-crunchstat-summary@) or into a Python virtualenv (@pip install crunchstat_summary@). It reads the crunchstat lines from the logs of a container or workflow and generates a report in text or HTML format.

The @crunchstat-summary@ tool accepts the following command line arguments:

<pre><code>~$ <span class="userinput">crunchstat-summary -h</span>
usage: crunchstat-summary [-h]
                          [--job UUID | --container UUID | --pipeline-instance UUID | --log-file LOG_FILE]
                          [--skip-child-jobs] [--format {html,text}]
                          [--threads THREADS] [--verbose]

Summarize resource usage of an Arvados Crunch job

  -h, --help            show this help message and exit
  --job UUID, --container-request UUID
                        Look up the specified job or container request and
                        read its log data from Keep (or from the Arvados event
                        log, if the job is still running)
  --container UUID      [Deprecated] Look up the specified container find its
                        container request and read its log data from Keep (or
                        from the Arvados event log, if the job is still
                        running)
  --pipeline-instance UUID
                        [Deprecated] Summarize each component of the given
                        pipeline instance (historical pre-1.4)
  --log-file LOG_FILE   Read log data from a regular file
  --skip-child-jobs     Do not include stats from child jobs/containers
  --format {html,text}  Report format
  --threads THREADS     Maximum worker threads to run
  --verbose, -v         Log more information (once for progress, twice for
</code></pre>

h2(#examples). Examples

@crunchstat-summary@ prints its report to stdout. The HTML report in particular should be redirected to a file and then opened in a browser.

An example text report for a single workflow step:

<pre><code>~$ <span class="userinput">crunchstat-summary --container-request pirca-xvhdp-rs0ef250emtmbj8 --format text</span>
category         metric     task_max      task_max_rate  job_total
blkio:0:0        read       63067755822   53687091.20    63067755822
blkio:0:0        write      64484253320   16376234.80    64484253320
cpu              sys        2147.29       0.60           2147.29
cpu              user       549046.22     15.99          549046.22
cpu              user+sys   551193.51     16.00          551193.51
fuseop:create    count      1             0.10           1
fuseop:create    time       0.01          0.00           0.01
fuseop:destroy   count      0             0              0
fuseop:destroy   time       0             0              0.00
fuseop:flush     count      12            0.70           12
fuseop:flush     time       0.00          0.00           0.00
fuseop:forget    count      0             0              0
fuseop:forget    time       0             0              0.00
fuseop:getattr   count      40            2.70           40
fuseop:getattr   time       0.00          0.00           0.00
fuseop:lookup    count      36            2.90           36
fuseop:lookup    time       0.67          0.07           0.67
fuseop:mkdir     count      0             0              0
fuseop:mkdir     time       0             0              0.00
fuseop:on_event  count      0             0              0
fuseop:on_event  time       0             0              0.00
fuseop:open      count      9             0.30           9
fuseop:open      time       0.00          0.00           0.00
fuseop:opendir   count      0             0              0
fuseop:opendir   time       0             0              0.00
fuseop:read      count      481185        409.60         481185
fuseop:read      time       370.11        2.14           370.11
fuseop:readdir   count      0             0              0
fuseop:readdir   time       0             0              0.00
fuseop:release   count      7             0.30           7
fuseop:release   time       0.00          0.00           0.00
fuseop:rename    count      0             0              0
fuseop:rename    time       0             0              0.00
fuseop:rmdir     count      0             0              0
fuseop:rmdir     time       0             0              0.00
fuseop:setattr   count      0             0              0
fuseop:setattr   time       0             0              0.00
fuseop:statfs    count      0             0              0
fuseop:statfs    time       0             0              0.00
fuseop:unlink    count      0             0              0
fuseop:unlink    time       0             0              0.00
fuseop:write     count      5414406       1123.00        5414406
fuseop:write     time       475.04        0.11           475.04
fuseops          read       481185        409.60         481185
fuseops          write      5414406       1123.00        5414406
keepcache        hit        961402        819.20         961402
keepcache        miss       946           0.90           946
keepcalls        get        962348        820.00         962348
keepcalls        put        961           0.30           961
mem              cache      22748987392   -              -
mem              rss        27185491968   -              -
net:docker0      tx+rx      0             -              0
net:ens5         rx         1100398604    -              1100398604
net:ens5         tx         1445464       -              1445464
net:ens5         tx+rx      1101844068    -              1101844068
net:keep0        rx         63086467386   53687091.20    63086467386
net:keep0        tx         64482237590   20131128.60    64482237590
net:keep0        tx+rx      127568704976  53687091.20    127568704976
statfs           available  398721179648  -              398721179648
statfs           total      400289181696  -              400289181696
statfs           used       1568198656    0              1568002048
time             elapsed    34820         -              34820
# Max CPU time spent by a single task: 551193.51s
# Max CPU usage in a single interval: 1599.52%
# Overall CPU usage: 1582.98%
# Max memory used by a single task: 27.19GB
# Max network traffic in a single task: 127.57GB
# Max network speed in a single interval: 53.69MB/s
# Keep cache miss rate 0.10%
# Keep cache utilization 99.97%
# Temp disk utilization 0.39%
#!! bwamem-samtools-view max RSS was 25927 MiB -- try reducing runtime_constraints to "ram":27541477785
#!! bwamem-samtools-view max temp disk utilization was 0% of 381746 MiB -- consider reducing "tmpdirMin" and/or "outdirMin"
</code></pre>

When @crunchstat-summary@ is given a container or container request UUID for a top-level workflow runner container, it generates a report for the whole workflow. For a large workflow, generating the report can take a long time.
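
The summary lines at the end of the text report can be related back to rows of the table. The following is a minimal sketch, not part of @crunchstat-summary@ itself: @parse_report@ and @derived_metrics@ are hypothetical helper names, the parser assumes whitespace-separated columns as shown in the example, and the formulas are assumptions that happen to reproduce the example figures rather than a statement of how the tool computes them.

```python
# Illustrative only: a few rows copied from the example text report above.
SAMPLE = """\
category metric task_max task_max_rate job_total
cpu user+sys 551193.51 16.00 551193.51
keepcache hit 961402 819.20 961402
keepcache miss 946 0.90 946
mem rss 27185491968 - -
statfs total 400289181696 - 400289181696
statfs used 1568198656 0 1568002048
time elapsed 34820 - 34820
# Overall CPU usage: 1582.98%
"""


def parse_report(text):
    """Return {(category, metric): {column: float or None}} for data rows.

    Assumes whitespace-separated columns; '-' is stored as None and
    summary lines starting with '#' are skipped.
    """
    rows = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if header is None:
            header = fields  # first non-comment line names the columns
            continue
        if len(fields) != len(header):
            continue  # ignore anything that is not a complete data row
        values = {}
        for name, raw in zip(header[2:], fields[2:]):
            values[name] = None if raw == "-" else float(raw)
        rows[(fields[0], fields[1])] = values
    return rows


def derived_metrics(rows):
    """Recompute some summary figures (assumed formulas, see lead-in)."""
    cpu_seconds = rows[("cpu", "user+sys")]["job_total"]
    elapsed = rows[("time", "elapsed")]["job_total"]
    hits = rows[("keepcache", "hit")]["job_total"]
    misses = rows[("keepcache", "miss")]["job_total"]
    disk_used = rows[("statfs", "used")]["task_max"]
    disk_total = rows[("statfs", "total")]["task_max"]
    rss_bytes = rows[("mem", "rss")]["task_max"]
    return {
        "overall_cpu_pct": 100 * cpu_seconds / elapsed,
        "keep_miss_pct": 100 * misses / (hits + misses),
        "temp_disk_pct": 100 * disk_used / disk_total,
        "max_rss_gb": rss_bytes / 1e9,
    }


if __name__ == "__main__":
    for name, value in derived_metrics(parse_report(SAMPLE)).items():
        print(f"{name}: {value:.2f}")
```

For the sample rows, the four values round to 1582.98, 0.10, 0.39 and 27.19, which agrees with the "Overall CPU usage", "Keep cache miss rate", "Temp disk utilization" and "Max memory used by a single task" lines in the example report.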

The equivalent HTML report can be generated as follows:

<pre><code>~$ <span class="userinput">crunchstat-summary --container-request pirca-xvhdp-rs0ef250emtmbj8 --format html > report.html</span>
</code></pre>

When loaded in a browser, the report looks like this:

!(full-width)images/crunchstat-summary-html.png!
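
When reports are needed for several container requests, the redirect to a file can be scripted. The sketch below is one possible approach, assuming @crunchstat-summary@ is installed and on the @PATH@. It uses only the @--container-request@ and @--format@ options shown above; @build_command@ and @save_report@ are hypothetical helper names, not part of the tool.

```python
import subprocess


def build_command(uuid, report_format="html"):
    """Build the crunchstat-summary argument list for one container request."""
    if report_format not in ("html", "text"):
        raise ValueError("report_format must be 'html' or 'text'")
    return [
        "crunchstat-summary",
        "--container-request", uuid,
        "--format", report_format,
    ]


def save_report(uuid, path, report_format="html"):
    """Run crunchstat-summary and write its stdout to the given file."""
    with open(path, "w") as out:
        # check=True raises CalledProcessError if the tool exits non-zero.
        subprocess.run(build_command(uuid, report_format), stdout=out, check=True)


# Example call (requires crunchstat-summary and access to the cluster):
# save_report("pirca-xvhdp-rs0ef250emtmbj8", "report.html")
```

Passing the arguments as a list avoids shell quoting issues, and writing stdout directly to the open file mirrors the @> report.html@ redirect used on the command line.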