embedded - How to identify, what is stalling the system in Linux? -




I have an embedded system. When the user performs I/O operations, the system stalls and carries out the action only after a long time. The system is quite complex and has many processes running. My question is how I can identify what is making the system stall: it does nothing for literally 5 minutes, and only after those 5 minutes do I see the outcome. I don't know what is stalling the system. Any inputs on how to debug this issue? I have run top on the system, but it doesn't lead me to the cause. As seen here, jup_render is taking 30% of the CPU, which is not enough to stall the system, so I am not sure whether top is useful here or not.

~ # top

top - 12:01:05 up 21 min,  1 user,  load average: 1.49, 1.26, 0.87
Tasks: 116 total,   2 running, 114 sleeping,   0 stopped,   0 zombie
Cpu(s): 44.4%us, 13.9%sy,  0.0%ni, 40.3%id,  0.0%wa,  0.0%hi,  1.4%si,  0.0%st
Mem:    822572k total,   389640k used,   432932k free,     1980k buffers
Swap:        0k total,        0k used,        0k free,   227324k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
  850 root      20   0  309m  32m  16m S   30  4.0   3:10.88 jup_render
  870 root      20   0  221m  13m  10m S   27  1.7   2:28.78 jup_render
  688 root      20   0 1156m 4092 3688 S   11  0.5   1:25.49 rxserver
    9 root      20   0     0    0    0 S    2  0.0   0:06.81 ksoftirqd/1
   16 root      20   0     0    0    0 S    1  0.0   0:06.87 ksoftirqd/3
 9294 root      20   0  1904  616  508 R    1  0.1   0:00.10 top
  812 root      20   0  865m  85m  46m S    1 10.7   1:21.17 lippo_main
   13 root      20   0     0    0    0 S    1  0.0   0:06.59 ksoftirqd/2
  800 root      20   0  223m 8316 6268 S    1  1.0   0:08.30 rat-cadaemon
    3 root      20   0     0    0    0 S    1  0.0   0:05.94 ksoftirqd/0
 1456 root      20   0 80060  10m 8208 S    1  1.2   0:04.82 jup_render
 1330 root      20   0  202m  10m 8456 S    0  1.3   0:06.08 jup_render
 8905 root      20   0  1868  556  424 S    0  0.1   0:02.91 dropbear
 1561 root      20   0 80084  10m 8204 S    0  1.2   0:04.92 jup_render
  753 root      20   0 61500 7376 6184 S    0  0.9   0:04.06 ale_app
 1329 root      20   0 79908   9m 8208 S    0  1.2   0:04.77 jup_render
  631 dbus      20   0  3248 1636  676 S    0  0.2   0:13.10 dbus-daemon
 1654 root      20   0 80068  10m 8204 S    0  1.2   0:04.82 jup_render
  760 root      20   0  116m  15m  12m S    0  1.9   0:10.19 jup_server
    8 root      20   0     0    0    0 S    0  0.0   0:00.00 kworker/1:0
    2 root      20   0     0    0    0 S    0  0.0   0:00.00 kthreadd
    7 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/1
  170 root       0 -20     0    0    0 S    0  0.0   0:00.00 kblockd
    6 root      RT   0     0    0    0 S    0  0.0   0:00.00 migration/0
  167 root      20   0     0    0    0 S    0  0.0   0:00.00 sync_supers
  281 root       0 -20     0    0    0 S    0  0.0   0:00.00 nfsiod

For an embedded system that has many processes running, there can be a multitude of reasons for a stall. You may need to investigate it from several angles.

Check your code for race conditions and deadlocks. The kernel might be busy-looping in some state. There can also be a scenario where the application is waiting on a select call, where CPU resources are exhausted (this option is ruled out by the top output you shared), or where it is blocked on a read.

If you are performing blocking I/O operations, the process is put on a wait queue and moves back to the execution path (the ready queue) only after the request completes. That is, it is moved off the scheduler run queue and put into a special state. It is placed back on the run queue only when it wakes from the sleep or when the resource it waited for becomes available.
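A task that is blocked inside the kernel on I/O often sits in state D (uninterruptible sleep) and uses almost no CPU, which is one reason top can look unremarkable during a stall. The following is a minimal sketch, assuming a POSIX shell and a mounted /proc, that lists such tasks while the stall is happening. The helper names stat_state and list_blocked are made up for this example, and /proc/<pid>/wchan is only informative when the kernel exposes symbol names.

```shell
#!/bin/sh
# Print the state field from one /proc/<pid>/stat line.
# The command name is wrapped in parentheses and may itself contain
# spaces or ")", so drop everything up to the last ") " first.
stat_state() {
    rest=${1##*") "}
    echo "${rest%% *}"
}

# List tasks currently in uninterruptible sleep (state D) together with
# the kernel function they are waiting in, when the kernel reports it.
list_blocked() {
    for f in /proc/[0-9]*/stat; do
        line=$(cat "$f" 2>/dev/null) || continue   # task may have exited
        [ "$(stat_state "$line")" = "D" ] || continue
        pid=${f#/proc/}
        pid=${pid%/stat}
        wchan=$(cat "/proc/$pid/wchan" 2>/dev/null)
        echo "$pid $(cat "/proc/$pid/comm" 2>/dev/null) ${wchan:-?}"
    done
}

list_blocked
```

Running it a few times during the 5-minute window and seeing the same PID each time would point at the process, and possibly the kernel routine, that is worth tracing next.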

The immediate step should be to try out 'strace'. It intercepts and records the system calls made by a process and the signals that process receives. It can show you the order of events and the return/resumption paths of the calls, which can take you closer to the problem area.
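As a rough illustration of that step, the sketch below attaches strace to a running process and then filters the log for calls that took a long time. It assumes strace is installed on the target; the flags used are -f (follow children and threads), -tt (wall-clock timestamps), -T (time spent in each call), -p (attach to a PID) and -o (write to a file). The function names trace_pid and slow_calls and the log path are hypothetical.

```shell
#!/bin/sh
# Attach to a running process and log its system calls until interrupted.
# $1 = PID, $2 = optional log file.
trace_pid() {
    strace -f -tt -T -p "$1" -o "${2:-/tmp/strace.$1.log}"
}

# Print the lines of a "strace -T" log whose call lasted at least
# $2 seconds (default 1). With -T, each completed call ends in <seconds>.
slow_calls() {
    awk -v min="${2:-1}" '
        match($0, /<[0-9.]+>$/) {
            t = substr($0, RSTART + 1, RLENGTH - 2)
            if (t + 0 >= min + 0) print
        }' "$1"
}
```

For example, `trace_pid 850` (850 being one of the jup_render PIDs in the top output) could be left running while the stall is reproduced and stopped with Ctrl-C, after which `slow_calls /tmp/strace.850.log 1` would show any call that blocked for a second or more.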

There are many other handy tools that can be tried, depending on your development environment/setup. The key tools are below:

'iotop' - provides a table of current I/O usage by processes or threads on the system, by monitoring the I/O usage information output by the kernel.

'LTTng' - makes tracing of race conditions and interrupt cascades possible. It is the successor to LTT and combines kprobes, tracepoint and perf functionality.

'ftrace' - the Linux kernel's internal tracer, with which you can analyze/debug latency and performance-related issues.

If your system is based on a TI processor, CCS (Trace Analyzer) provides the capability to perform non-intrusive debug and analysis of system activity. So note that, depending on your setup, you may need to use the relevant tool.

I came across a few more ideas. One is the magic SysRq key option in Linux: if a driver is stuck, the SysRq p command can take you to the exact routine that is causing the problem.
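On a target without a keyboard, the same SysRq commands can usually be issued by writing the command letter to /proc/sysrq-trigger as root, assuming the kernel was built with CONFIG_MAGIC_SYSRQ. The sketch below is only an illustration: sysrq_send and dump_blocked are hypothetical helper names, and it uses the w command (dump tasks in blocked state) as a companion to p (dump registers of the current CPU).

```shell
#!/bin/sh
# Write one SysRq command letter to the trigger file.
# $1 = command letter, $2 = optional path (lets you dry-run against a file).
sysrq_send() {
    printf '%s\n' "$1" > "${2:-/proc/sysrq-trigger}"
}

# Ask the kernel to dump all blocked (uninterruptible) tasks with their
# stack traces, then show the most recent kernel log lines.
dump_blocked() {
    sysrq_send w && dmesg | tail -n 100
}
```

Calling `dump_blocked` during the stall should print, for each blocked task, the kernel call chain it is sleeping in; `sysrq_send p` is the equivalent of the SysRq p key combination.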

Profiling data can tell you where time is being spent in the kernel. There are a couple of tools for this, such as readprofile and OProfile. OProfile can be enabled by configuring CONFIG_PROFILING and CONFIG_OPROFILE. The other option is to rebuild the kernel with the profiling option enabled, boot it with profile=2 on the kernel command line, and read the profile counters with the readprofile utility.

mpstat can give 'the percentage of time that the CPU or CPUs were idle during which the system had an outstanding disk I/O request' in its '%iowait' column.
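If mpstat (part of the sysstat package) is not available on the target, roughly the same figure can be derived from the aggregate "cpu" line in /proc/stat, whose fifth counter is iowait. This is a stand-in for mpstat rather than mpstat itself: a minimal sketch with hypothetical helper names iowait_pct and sample_iowait, assuming a POSIX shell and awk.

```shell
#!/bin/sh
# Compute the iowait percentage between two "cpu ..." lines from /proc/stat.
# Fields after the label: user nice system idle iowait irq softirq steal.
iowait_pct() {
    printf '%s\n%s\n' "$1" "$2" | awk '
        {
            total[NR] = 0
            for (i = 2; i <= 9 && i <= NF; i++) total[NR] += $i
            io[NR] = $6
        }
        END {
            dt = total[2] - total[1]
            if (dt > 0) printf "%.1f\n", 100 * (io[2] - io[1]) / dt
            else print "0.0"
        }'
}

# Sample /proc/stat twice, $1 seconds apart (default 5), and print %iowait.
sample_iowait() {
    a=$(grep '^cpu ' /proc/stat)
    sleep "${1:-5}"
    b=$(grep '^cpu ' /proc/stat)
    iowait_pct "$a" "$b"
}
```

Running `sample_iowait 5` while the stall is in progress gives a number comparable to the wa field in the top header; a value near zero, as in the output in the question, would suggest the wait is not on disk I/O.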

