title: Analyzing workflow performance

Copyright (C) The Arvados Authors. All rights reserved.

SPDX-License-Identifier: CC-BY-SA-3.0

{% include 'tutorial_expectations' %}
*Note:* Starting with Arvados 2.7.2, @arvados-cwl-runner@ generates these reports automatically. Look for @usage_report.html@ in the container request's log collection.

The @crunchstat-summary@ tool analyzes workflow and container performance. Install it from packages (@apt install python3-crunchstat-summary@ or @yum install rh-python36-python-crunchstat-summary@) or in a Python virtualenv (@pip install crunchstat_summary@). It reads the crunchstat lines from the logs of a container or workflow and generates a report in text or HTML format.

The @crunchstat-summary@ tool accepts the following command line arguments:
<pre><code>~$ <span class="userinput">crunchstat-summary -h</span>
usage: crunchstat-summary [-h]
                          [--job UUID | --container UUID | --pipeline-instance UUID | --log-file LOG_FILE]
                          [--skip-child-jobs] [--format {html,text}]
                          [--threads THREADS] [--verbose]

Summarize resource usage of an Arvados Crunch job

  -h, --help            show this help message and exit
  --job UUID, --container-request UUID
                        Look up the specified job or container request and
                        read its log data from Keep (or from the Arvados event
                        log, if the job is still running)
  --container UUID      [Deprecated] Look up the specified container find its
                        container request and read its log data from Keep (or
                        from the Arvados event log, if the job is still
                        running)
  --pipeline-instance UUID
                        [Deprecated] Summarize each component of the given
                        pipeline instance (historical pre-1.4)
  --log-file LOG_FILE   Read log data from a regular file
  --skip-child-jobs     Do not include stats from child jobs/containers
  --format {html,text}  Report format
  --threads THREADS     Maximum worker threads to run
  --verbose, -v         Log more information (once for progress, twice for
                        debug)
</code></pre>
When @crunchstat-summary@ is given the UUID of a container or container request for a top-level workflow runner container, it generates a report for the whole workflow. For a large workflow, generating the report can take a long time.

h2(#examples). Examples

@crunchstat-summary@ prints to stdout. The HTML report, in particular, should be redirected to a file and then loaded in a browser.

The HTML report can be generated as follows:
<pre><code>~$ <span class="userinput">crunchstat-summary --container-request pirca-xvhdp-rs0ef250emtmbj8 --format html > report.html</span>
</code></pre>
When loaded in a browser, the report looks like this:

!(full-width)images/crunchstat-summary-html.png!
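
When reports are needed for several container requests, the same invocation can be scripted. The following is a minimal Python sketch, not part of the tool itself: it assumes @crunchstat-summary@ is on the @PATH@, uses only the @--container-request@ and @--format@ options shown above, and the helper names @build_command@ and @save_report@ are hypothetical.

```python
import subprocess
from pathlib import Path


def build_command(container_request_uuid, report_format="html"):
    """Return the argv list for one crunchstat-summary run (hypothetical helper)."""
    if report_format not in ("html", "text"):
        raise ValueError("report_format must be 'html' or 'text'")
    return [
        "crunchstat-summary",
        "--container-request", container_request_uuid,
        "--format", report_format,
    ]


def save_report(container_request_uuid, output_path, report_format="html"):
    """Run crunchstat-summary and write its stdout to output_path.

    Assumes crunchstat-summary is installed and Arvados credentials are
    configured in the environment; raises CalledProcessError on failure.
    """
    command = build_command(container_request_uuid, report_format)
    with Path(output_path).open("wb") as report_file:
        # The tool prints the report to stdout, so stdout is sent to the file.
        subprocess.run(command, stdout=report_file, check=True)
    return Path(output_path)
```

For example, @save_report("pirca-xvhdp-rs0ef250emtmbj8", "report.html")@ would be equivalent to the shell redirection shown above.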
Using @--format text@ prints a detailed usage table followed by a summary:
<pre><code>~$ <span class="userinput">crunchstat-summary --container-request pirca-xvhdp-rs0ef250emtmbj8 --format text</span>
category         metric     task_max      task_max_rate  job_total
blkio:0:0        read       63067755822   53687091.20    63067755822
blkio:0:0        write      64484253320   16376234.80    64484253320
cpu              sys        2147.29       0.60           2147.29
cpu              user       549046.22     15.99          549046.22
cpu              user+sys   551193.51     16.00          551193.51
fuseop:create    count      1             0.10           1
fuseop:create    time       0.01          0.00           0.01
fuseop:destroy   count      0             0              0
fuseop:destroy   time       0             0              0.00
fuseop:flush     count      12            0.70           12
fuseop:flush     time       0.00          0.00           0.00
fuseop:forget    count      0             0              0
fuseop:forget    time       0             0              0.00
fuseop:getattr   count      40            2.70           40
fuseop:getattr   time       0.00          0.00           0.00
fuseop:lookup    count      36            2.90           36
fuseop:lookup    time       0.67          0.07           0.67
fuseop:mkdir     count      0             0              0
fuseop:mkdir     time       0             0              0.00
fuseop:on_event  count      0             0              0
fuseop:on_event  time       0             0              0.00
fuseop:open      count      9             0.30           9
fuseop:open      time       0.00          0.00           0.00
fuseop:opendir   count      0             0              0
fuseop:opendir   time       0             0              0.00
fuseop:read      count      481185        409.60         481185
fuseop:read      time       370.11        2.14           370.11
fuseop:readdir   count      0             0              0
fuseop:readdir   time       0             0              0.00
fuseop:release   count      7             0.30           7
fuseop:release   time       0.00          0.00           0.00
fuseop:rename    count      0             0              0
fuseop:rename    time       0             0              0.00
fuseop:rmdir     count      0             0              0
fuseop:rmdir     time       0             0              0.00
fuseop:setattr   count      0             0              0
fuseop:setattr   time       0             0              0.00
fuseop:statfs    count      0             0              0
fuseop:statfs    time       0             0              0.00
fuseop:unlink    count      0             0              0
fuseop:unlink    time       0             0              0.00
fuseop:write     count      5414406       1123.00        5414406
fuseop:write     time       475.04        0.11           475.04
fuseops          read       481185        409.60         481185
fuseops          write      5414406       1123.00        5414406
keepcache        hit        961402        819.20         961402
keepcache        miss       946           0.90           946
keepcalls        get        962348        820.00         962348
keepcalls        put        961           0.30           961
mem              cache      22748987392   -              -
mem              rss        27185491968   -              -
net:docker0      tx+rx      0             -              0
net:ens5         rx         1100398604    -              1100398604
net:ens5         tx         1445464       -              1445464
net:ens5         tx+rx      1101844068    -              1101844068
net:keep0        rx         63086467386   53687091.20    63086467386
net:keep0        tx         64482237590   20131128.60    64482237590
net:keep0        tx+rx      127568704976  53687091.20    127568704976
statfs           available  398721179648  -              398721179648
statfs           total      400289181696  -              400289181696
statfs           used       1568198656    0              1568002048
time             elapsed    34820         -              34820
# Elapsed time: 9h 40m 20s
# Assigned instance type: m5.4xlarge
# Instance hourly price: $0.768
# Max CPU usage in a single interval: 1599.52%
# Overall CPU usage: 1582.98%
# Requested CPU cores: 16
# Max memory used: 25926.11MB
# Requested RAM: 50000.00MB
# Maximum RAM request for this instance type: 61736.70MB
# Max network traffic: 127.57GB
# Max network speed in a single interval: 53.69MB/s
# Keep cache miss rate: 0.10%
# Keep cache utilization: 99.97%
# Temp disk utilization: 0.39%
</code></pre>
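
The text report is easy to post-process. The sketch below is an illustration rather than part of @crunchstat-summary@: it assumes the input is the tool's raw stdout, that table columns are separated by whitespace as in the listing above, and that a @-@ cell means "not applicable". The function names are hypothetical, and the two formulas in @derived_percentages@ were inferred by arithmetic from the sample figures, not taken from the tool's source.

```python
def to_number(field):
    """Convert a table cell to float; '-' is treated as 'not applicable'."""
    return None if field == "-" else float(field)


def parse_text_report(text):
    """Return (metrics, summary) parsed from a crunchstat-summary text report."""
    metrics = {}  # (category, metric) -> {"task_max", "task_max_rate", "job_total"}
    summary = {}  # e.g. "Elapsed time" -> "9h 40m 20s"
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("#"):
            # Summary lines look like "# Key: value".
            key, _, value = line.lstrip("# ").partition(":")
            summary[key.strip()] = value.strip()
            continue
        fields = line.split()
        # Skip blank lines, the header row, and anything that is not a 5-column row.
        if len(fields) != 5 or fields[0] == "category":
            continue
        values = [to_number(field) for field in fields[2:]]
        metrics[(fields[0], fields[1])] = dict(
            zip(("task_max", "task_max_rate", "job_total"), values)
        )
    return metrics, summary


def derived_percentages(metrics):
    """Recompute two summary percentages from the table (inferred formulas)."""
    hits = metrics[("keepcache", "hit")]["job_total"]
    misses = metrics[("keepcache", "miss")]["job_total"]
    cpu_seconds = metrics[("cpu", "user+sys")]["job_total"]
    elapsed_seconds = metrics[("time", "elapsed")]["job_total"]
    return {
        # Share of Keep block requests that missed the cache.
        "keep_cache_miss_rate": 100 * misses / (hits + misses),
        # CPU seconds per wall-clock second; 100% corresponds to one busy core.
        "overall_cpu_usage": 100 * cpu_seconds / elapsed_seconds,
    }
```

Applied to the sample report above, these formulas give about 0.10% for the Keep cache miss rate (946 misses out of 962348 requests) and about 1582.98% for overall CPU usage (551193.51 CPU seconds over 34820 elapsed seconds), which agrees with the summary lines.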