Juce plugins cause realtime kernel lockup

For Linux specific issues

Re: Juce plugins cause realtime kernel lockup

Postby CPB » Mon Apr 30, 2012 1:03 pm

Actually, I don't think the problem lies in combining the recursive and priority inheritance attributes on a mutex. PulseAudio uses this configuration (see http://cgit.freedesktop.org/pulseaudio/ ... ex-posix.c) and behaves itself under an RT kernel. I really thought we were onto something here, too: darn it!
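For anyone following along, here is a minimal sketch of what "recursive plus priority inheritance" means at the pthread level. It is not taken from Juce or PulseAudio; the function name `initRecursivePiMutex` is made up for this example, and it assumes a platform that provides `PTHREAD_PRIO_INHERIT`.

```cpp
#include <pthread.h>

// Hypothetical helper (not Juce or PulseAudio code): initialise `mutex`
// as both recursive and priority-inheriting.
// Returns 0 on success, or the first non-zero pthread error code.
int initRecursivePiMutex (pthread_mutex_t* mutex)
{
    pthread_mutexattr_t attr;
    int err = pthread_mutexattr_init (&attr);
    if (err != 0)
        return err;

    // Recursive: the owning thread may lock again without deadlocking itself.
    err = pthread_mutexattr_settype (&attr, PTHREAD_MUTEX_RECURSIVE);

    // Priority inheritance: while a higher-priority thread waits, the owner
    // temporarily runs at that waiter's priority.
    if (err == 0)
        err = pthread_mutexattr_setprotocol (&attr, PTHREAD_PRIO_INHERIT);

    if (err == 0)
        err = pthread_mutex_init (mutex, &attr);

    pthread_mutexattr_destroy (&attr);
    return err;
}
```

A mutex set up this way can be locked twice by the same thread and must then be unlocked twice before another thread can take it.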

So, ... any other thoughts, ideas, or suggestions from anyone?
CPB
JUCE Obsessive
 
Posts: 58
Joined: Tue Jan 29, 2008 8:26 pm
Location: Bournemouth, UK

Re: Juce plugins cause realtime kernel lockup

Postby X-Ryl669 » Wed May 02, 2012 4:02 pm

The code from PulseAudio doesn't show both being used at the same time. I guess that happens elsewhere in the code, but I don't want to spend time searching for it.

Now, I have two questions:
- Why are recursive mutexes required in Juce?
- What approach does PA take that Juce doesn't (or vice versa)?

From the kernel code you've posted, it *seems* that the first bug means that the mutex waiter must be the low-priority task that the high-priority task is blocked on.
The second case looks like much the same issue.
To me, from a bird's-eye view, it seems that a thread is doing something odd with the mutex (maybe changing priorities while the mutex is locked).

I wonder if you've had time to find the part of the Juce code that triggers this issue?
You can find it with the addr2line tip I wrote a few posts back. You might then be able to spot the code or pattern in Juce that's causing the deadlock.
X-Ryl669
X-Ryl669
JUCE UberWeenie
 
Posts: 1124
Joined: Sun Apr 24, 2005 5:30 pm

Re: Juce plugins cause realtime kernel lockup

Postby CPB » Wed May 02, 2012 4:29 pm

It took me a little searching too, but PulseAudio does use recursive PI mutexes in thread-mainloop.c.

Bearing in mind that it seems this combination isn't necessarily such a bad thing, I think (for the time being, at least), and in the interest of not performing too much unneeded Juce surgery, I'd rather leave the mutexes recursive and look elsewhere. With the Windows CriticalSection being immutably recursive, I'm not sure having Juce's CriticalSection object behave differently on different systems is such a hot idea. If all else fails, it's definitely a possibility for investigation, but only as a last resort.
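To give an idea of the kind of surgery that dropping recursion usually involves, here is a rough sketch. The `EventQueue` class is hypothetical and has nothing to do with Juce's actual CriticalSection; it only shows the common pattern of splitting public, locking methods from private helpers that assume the lock is already held.

```cpp
#include <mutex>
#include <vector>

// Hypothetical class, not Juce code. Public methods take the lock exactly
// once; the private *Unlocked helper assumes the caller already holds it.
class EventQueue
{
public:
    void add (int event)
    {
        const std::lock_guard<std::mutex> guard (lock);
        addUnlocked (event);
    }

    void addTwice (int event)
    {
        const std::lock_guard<std::mutex> guard (lock);
        // With a recursive mutex this could simply call add() twice.
        // With a plain std::mutex, re-locking is undefined behaviour
        // (typically a self-deadlock), so the helper is called instead.
        addUnlocked (event);
        addUnlocked (event);
    }

    int size() const
    {
        const std::lock_guard<std::mutex> guard (lock);
        return static_cast<int> (events.size());
    }

private:
    void addUnlocked (int event)   { events.push_back (event); }

    mutable std::mutex lock;
    std::vector<int> events;
};
```

Every public method that calls another public method needs this treatment, which is why it adds up to a lot of work in an existing codebase.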

I'm afraid my kernel crash stack revealed nothing: not a single address corresponded to any Juce code. I do also have a kernel call trace, but I can't see anything useful here:

Code:
Apr 30 11:22:26 (none) kernel: [ 1440.549114] Call Trace:
Apr 30 11:22:26 (none) kernel: [ 1440.549130]  [<c01767e1>] ? lookup_pi_state+0x171/0x230
Apr 30 11:22:26 (none) kernel: [ 1440.549152]  [<c01799e6>] rt_mutex_start_proxy_lock+0x46/0xa0
Apr 30 11:22:26 (none) kernel: [ 1440.549171]  [<c017782a>] futex_requeue+0x44a/0x760
Apr 30 11:22:26 (none) kernel: [ 1440.549192]  [<c0178728>] do_futex+0x88/0x8e0
Apr 30 11:22:26 (none) kernel: [ 1440.549212]  [<c0179049>] sys_futex+0xc9/0x130
Apr 30 11:22:26 (none) kernel: [ 1440.549256]  [<c02184d9>] ? sys_read+0x59/0x70
Apr 30 11:22:26 (none) kernel: [ 1440.549299]  [<c0474100>] syscall_call+0x7/0xb


But I think your suggestion is sensible: I'll approach this from the perspective of the Juce code and see if I can narrow it down to a specific set of Juce code or classes, or try to get a callstack from the Juce app. Changing priorities on a locked mutex? Hmm, you could be right: that's certainly something else for me to look into.

I'll keep this updated when I've delved a little deeper...
CPB
JUCE Obsessive
 
Posts: 58
Joined: Tue Jan 29, 2008 8:26 pm
Location: Bournemouth, UK

Re: Juce plugins cause realtime kernel lockup

Postby CPB » Fri May 11, 2012 6:58 pm

I hope I'm not being premature or doing anything silly, but I suspect the latest rtmutex.c patch fixes this. If anyone else is up for some kernel building this weekend, it would be great to have a second source confirm or deny this.

On my Arch Linux test machine, which I could previously lock up, always and instantly, by simply running several instances of JuceDemo, I built the latest RT kernel (specifically, linux-rt 3.2.16_rt27-1 from https://aur.archlinux.org/packages/li/l ... -rt.tar.gz), which includes this patch:

Code:
--- lock_rtmutex.c   2012-04-30 11:35:45.000000000 +0100
+++ rtmutex.c   2012-05-11 18:36:17.000000000 +0100
@@ -75,7 +75,8 @@

static int rt_mutex_real_waiter(struct rt_mutex_waiter *waiter)
{
-   return waiter && waiter != PI_WAKEUP_INPROGRESS;
+   return waiter && waiter != PI_WAKEUP_INPROGRESS &&
+      waiter != PI_REQUEUE_INPROGRESS;
}

/*
@@ -1289,7 +1290,7 @@

   debug_rt_mutex_init(lock, name);
}
-EXPORT_SYMBOL_GPL(__rt_mutex_init);
+EXPORT_SYMBOL(__rt_mutex_init);

/**
  * rt_mutex_init_proxy_locked - initialize and lock a rt_mutex on behalf of a
@@ -1353,6 +1354,35 @@
      return 1;
   }

+#ifdef CONFIG_PREEMPT_RT_FULL
+   /*
+    * In PREEMPT_RT there's an added race.
+    * If the task, that we are about to requeue, times out,
+    * it can set the PI_WAKEUP_INPROGRESS. This tells the requeue
+    * to skip this task. But right after the task sets
+    * its pi_blocked_on to PI_WAKEUP_INPROGRESS it can then
+    * block on the spin_lock(&hb->lock), which in RT is an rtmutex.
+    * This will replace the PI_WAKEUP_INPROGRESS with the actual
+    * lock that it blocks on. We *must not* place this task
+    * on this proxy lock in that case.
+    *
+    * To prevent this race, we first take the task's pi_lock
+    * and check if it has updated its pi_blocked_on. If it has,
+    * we assume that it woke up and we return -EAGAIN.
+    * Otherwise, we set the task's pi_blocked_on to
+    * PI_REQUEUE_INPROGRESS, so that if the task is waking up
+    * it will know that we are in the process of requeuing it.
+    */
+   raw_spin_lock_irq(&task->pi_lock);
+   if (task->pi_blocked_on) {
+      raw_spin_unlock_irq(&task->pi_lock);
+      raw_spin_unlock(&lock->wait_lock);
+      return -EAGAIN;
+   }
+   task->pi_blocked_on = PI_REQUEUE_INPROGRESS;
+   raw_spin_unlock_irq(&task->pi_lock);
+#endif
+
   ret = task_blocks_on_rt_mutex(lock, waiter, task, detect_deadlock);

   if (ret && !rt_mutex_owner(lock)) {

This machine has now been running for over an hour, solid as a rock, with JuceDemo instances being intensely tested, and with no sign of any kernel lock-ups. And I've double-checked that I haven't accidentally changed anything else: I'm using an unpatched Juce branch and still have PI mutexes enabled. Uname confirms that I'm definitely still running an RT kernel, specifically "Linux (none) 3.2.16-rt27-1-rt #1 SMP PREEMPT RT".

Unless I'm missing something obvious, or kernel changes have made an easily hit race condition more obscure, I'd say that this was a kernel bug all along.
CPB
JUCE Obsessive
 
Posts: 58
Joined: Tue Jan 29, 2008 8:26 pm
Location: Bournemouth, UK

Re: Juce plugins cause realtime kernel lockup

Postby edekock » Fri May 18, 2012 7:10 pm

I have upgraded my RT kernel based on the comments above from CPB, and can confirm that I too do not see the kernel crashes anymore. Looks like we've got past this problem.

FYI, I upgraded to 3.2.17:
AMD64 3.2.17-rt28 #1 SMP PREEMPT RT Sat May 19 00:40:04 WST 2012 x86_64 AMD Phenom(tm) II X6 1055T Processor AuthenticAMD GNU/Linux

Thanks to everyone for looking at this!
edekock
JUCE Weenie
 
Posts: 14
Joined: Wed Apr 13, 2011 6:56 am

Re: Juce plugins cause realtime kernel lockup

Postby CPB » Sat May 19, 2012 7:58 pm

Great, thanks for the update! I've had another tester confirm that they too no longer experience this freeze since they installed the new RT kernel. Three independent confirmations give me confidence that this is, indeed, fixed.
CPB
JUCE Obsessive
 
Posts: 58
Joined: Tue Jan 29, 2008 8:26 pm
Location: Bournemouth, UK

Re: Juce plugins cause realtime kernel lockup

Postby X-Ryl669 » Fri Jun 01, 2012 2:54 pm

Great. I feared you'd end up removing the PI feature, which I absolutely need.
X-Ryl669
X-Ryl669
JUCE UberWeenie
 
Posts: 1124
Joined: Sun Apr 24, 2005 5:30 pm

Re: Juce plugins cause realtime kernel lockup

Postby dawhead » Wed Nov 14, 2012 7:46 pm

for the record, there is a lot of serious misinformation in this thread, and for posterity's sake, i'd like to correct it.

1) the bug here was a kernel bug, not anything in JUCE or specific plugins. This became clear by the end of the thread, but that should have been clear from the outset: user space code CANNOT cause kernel crashes without the presence of a kernel bug. This is just a matter of definition.

2) it is absolutely NOT true that it is necessary to enable priority inheritance on pthread mutexes in order to avoid deadlocks. code that would deadlock in this fashion without priority inheritance is almost always incorrectly written. there is a tool named "lockdep" which can be used to analyse locking patterns and will identify lock dependencies like the one described by X-Ryl669. when they are discovered, they should be fixed. if you are writing code and find yourself considering lock dependencies like this, go back to the drawing board, because something has gone wrong.

3) recursive mutexes are generally considered extremely poor design. if you don't believe me, start here: http://zaval.org/resources/library/butenhof1.html TL;DR: "Recursive mutexes can be a great tool for prototyping thread support in an existing library, exactly because it lets you defer the hard part: the call path and data dependency analysis of the library. But for that same reason, always remember that you're not DONE until they're all gone, so you can produce a library you're proud of, that won't unnecessarily constrain the concurrency of the entire application."

4) taking locks in a realtime thread is nearly always a sign of the wrong design. from a quick scan of the JUCE code in question, it is not clear why JUCE is not using a ringbuffer (a.k.a. FIFO) which is lockfree for single-reader/single-writer use, rather than a mutex. this may be too superficial of a scan of the code, so i apologize if i've missed something.
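as a rough illustration of point 4, here is a minimal sketch of a single-reader/single-writer ring buffer built on `std::atomic`. the `SpscFifo` type is hypothetical and is not JUCE code; it assumes exactly one thread calls `push()` and exactly one other thread calls `pop()`, and it keeps one slot empty so that full and empty can be told apart.

```cpp
#include <atomic>
#include <cstddef>

// Hypothetical type, not JUCE code. Safe only for one producer thread
// calling push() and one consumer thread calling pop(); neither ever blocks.
template <typename T, std::size_t Capacity>
class SpscFifo
{
public:
    // Producer side. Returns false if the buffer is full.
    bool push (const T& value)
    {
        const std::size_t w = writePos.load (std::memory_order_relaxed);
        const std::size_t next = (w + 1) % Capacity;

        if (next == readPos.load (std::memory_order_acquire))
            return false;   // full: one slot is always left unused

        slots[w] = value;
        // Release: the slot write above becomes visible before the new index.
        writePos.store (next, std::memory_order_release);
        return true;
    }

    // Consumer side. Returns false if the buffer is empty.
    bool pop (T& out)
    {
        const std::size_t r = readPos.load (std::memory_order_relaxed);

        if (r == writePos.load (std::memory_order_acquire))
            return false;   // empty

        out = slots[r];
        // Release: the slot read above completes before the slot is handed back.
        readPos.store ((r + 1) % Capacity, std::memory_order_release);
        return true;
    }

private:
    T slots[Capacity];
    std::atomic<std::size_t> writePos { 0 };
    std::atomic<std::size_t> readPos { 0 };
};
```

with `Capacity` set to 4, three items fit before `push()` starts returning false. the realtime thread never waits on the other side; at worst it finds the buffer empty (or full) and carries on.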

btw, huge fan of JUCE. wish we had used it for Ardour from the start (which would have benefitted both Ardour and JUCE, I think)
dawhead
JUCE Weenie
 
Posts: 1
Joined: Wed Nov 14, 2012 5:06 pm

Re: Juce plugins cause realtime kernel lockup

Postby grebneke » Thu Nov 15, 2012 9:33 am

Thanks @dawhead for your clarifications. It is a really interesting thread to read, but mostly barking up the wrong tree.

dawhead wrote:1) the bug here was a kernel bug, not anything in JUCE or specific plugins. This became clear by the end of the thread, but that should have been clear from the outset: user space code CANNOT cause kernel crashes without the presence of a kernel bug. This is just a matter of definition.

This is crucial and important. If you in any way manage to crash the kernel from userland, the problem exists in the kernel and that is where you should debug. Never blame user applications for causing kernel problems.

dawhead wrote:4) taking locks in a realtime thread is nearly always a sign of the wrong design. from a quick scan of the JUCE code in question, it is not clear why JUCE is not using a ringbuffer (a.k.a. FIFO) which is lockfree for single-reader/single-writer use, rather than a mutex. this may be too superficial of a scan of the code, so i apologize if i've missed something.

Very interested in a follow up on this from Jules and others.

dawhead wrote:btw, huge fan of JUCE. wish we had used it for Ardour from the start (which would have benefitted both Ardour and JUCE, I think)

Never Too Late To Do The Right Thing? :wink:
-- Johan Ekenberg
grebneke
JUCE Obsessive
 
Posts: 99
Joined: Sun Sep 12, 2010 6:24 pm
Location: Göteborg, Sweden

Re: Juce plugins cause realtime kernel lockup

Postby jules » Thu Nov 15, 2012 11:13 am

grebneke wrote:
dawhead wrote:4) taking locks in a realtime thread is nearly always a sign of the wrong design. from a quick scan of the JUCE code in question, it is not clear why JUCE is not using a ringbuffer (a.k.a. FIFO) which is lockfree for single-reader/single-writer use, rather than a mutex. this may be too superficial of a scan of the code, so i apologize if i've missed something.

Very interested in a follow up on this from Jules and others.


I'm not actually sure which locks you're talking about there..?

It is of course very good advice in general. But there are places where I'd say it's ok - e.g. I've used a lock in the device manager, to lock access to the source that's currently being played. In that case, other threads will only grab the lock for a few microseconds while they change a pointer, and they'll only do this very rarely. Under those conditions, the chances of the realtime thread actually getting interrupted are so infinitesimal that it just wouldn't justify a more complicated solution. (And in the case of the device manager, even if there was a tiny glitch when playback starts or stops, nobody would hear it). But if you find places where you do think I've mis-used a lock, do let me know!
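A rough sketch of the pattern described above, for readers who want to see it spelled out. This is not the actual device manager code: `Source`, `ConstantSource` and `Player` are hypothetical names, and `renderOrSilence()` is an extra try-lock variant that is not mentioned in the post.

```cpp
#include <mutex>

// Hypothetical interface standing in for whatever is being played.
struct Source
{
    virtual ~Source() = default;
    virtual float nextSample() = 0;
};

// Trivial source, here only so the sketch can be exercised.
struct ConstantSource : Source
{
    explicit ConstantSource (float v) : value (v) {}
    float nextSample() override   { return value; }
    float value;
};

class Player
{
public:
    // Called rarely, from a non-realtime thread. The lock is held only
    // for the duration of a single pointer assignment.
    void setSource (Source* newSource)
    {
        const std::lock_guard<std::mutex> guard (lock);
        source = newSource;
    }

    // Called from the audio thread. It waits only if setSource() happens
    // to be in the middle of its assignment.
    float render()
    {
        const std::lock_guard<std::mutex> guard (lock);
        return source != nullptr ? source->nextSample() : 0.0f;
    }

    // Variant (an assumption, not from the post): never waits, and outputs
    // silence for this call if the lock could not be taken.
    float renderOrSilence()
    {
        const std::unique_lock<std::mutex> guard (lock, std::try_to_lock);
        if (! guard.owns_lock() || source == nullptr)
            return 0.0f;
        return source->nextSample();
    }

private:
    std::mutex lock;
    Source* source = nullptr;
};
```

The try-lock variant trades a possible one-block dropout for a guarantee that the audio thread never sleeps on the mutex, which is the "tiny glitch" trade-off in a slightly different form.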
jules
Fearless Leader
 
Posts: 17225
Joined: Mon Sep 06, 2004 9:03 am
Location: London, UK

Re: Juce plugins cause realtime kernel lockup

Postby X-Ryl669 » Mon Nov 19, 2012 5:17 pm

dawhead wrote:2) it is absolutely NOT true that it is necessary to enable priority inheritance on pthread mutexes in order to avoid deadlocks. code that would deadlock in this fashion without priority inheritance is almost always incorrectly written. there is a tool named "lockdep" which can be used to analyse locking patterns and will identify lock dependencies like the one described by X-Ryl669. when they are discovered, they should be fixed. if you are writing code and find yourself considering lock dependencies like this, go back to the drawing board, because something has gone wrong.

Well, read this:
http://en.wikipedia.org/wiki/Priority_inversion
and why PI solves it (with the usual no-warranty sign):
http://en.wikipedia.org/wiki/Priority_inheritance

You can't fix other people's software or library code, so you have to deal with it.
X-Ryl669
X-Ryl669
JUCE UberWeenie
 
Posts: 1124
Joined: Sun Apr 24, 2005 5:30 pm
