Juce plugins cause realtime kernel lockup

For Linux specific issues

Re: Juce plugins cause realtime kernel lockup

Postby edekock » Tue Apr 24, 2012 5:14 pm

I can confirm that Introjucer definitely also crashes the system. My small test App also has no setPriority calls and also crashes.

The fix suggested by falkTX earlier does work though (Thanks!), so if you intend to research it further that would be a good place to start looking. For now I'm just going to go with that suggestion, and will suggest Pianoteq adopt that too.
edekock
JUCE Weenie
 
Posts: 14
Joined: Wed Apr 13, 2011 6:56 am

Re: Juce plugins cause realtime kernel lockup

Postby X-Ryl669 » Tue Apr 24, 2012 5:24 pm

Can you post the crash stack so we can try to guess where it crashes?
What is your exact kernel version?
Did you try the suggestion from Jules?
What's running on the computer at the same time?
Can you try to run the same software in an "init 1" mode (minimal single user mode: run "init 1", then "Xorg &", then "Introjucer") to check if it's still crashing?
X-Ryl669
JUCE UberWeenie
 
Posts: 1124
Joined: Sun Apr 24, 2005 5:30 pm

Re: Juce plugins cause realtime kernel lockup

Postby edekock » Tue Apr 24, 2012 6:26 pm

$ uname -a
Linux Euan_AMD64 3.0.9-rt25-rt25 #13 SMP PREEMPT RT Sat Mar 10 18:09:58 WST 2012 x86_64 AMD Phenom(tm) II X6 1055T Processor AuthenticAMD GNU/Linux

My machine is running Gentoo Linux.

There is very little running on the computer at the time - just some typical utility apps - dropbox etc. I use XFCE to keep my system as light as possible.

top reports about 600 MB in use out of a total of 12 GB of RAM (Firefox is responsible for at least half of that).

Running from init 1 (+ /home manually mounted to enable normal user login) still causes a crash.

I've uploaded a screen photo showing the relevant details of the crash. Unfortunately I can't do anything with the computer once it crashes, and even with "ulimit -c" I don't get a core dump.

Let me know if you have any other ideas. Thanks for the effort.
Attachment: Crash1.jpg (crash screen, 480 KiB)
edekock
JUCE Weenie
 
Posts: 14
Joined: Wed Apr 13, 2011 6:56 am

Re: Juce plugins cause realtime kernel lockup

Postby jpo » Tue Apr 24, 2012 7:29 pm

As it is written "kernel bug", I think one can hardly accuse Juce of being buggy here. What I notice is that the patch suggested by falkTX removes the PTHREAD_PRIO_INHERIT attribute from the mutexes used with pthread condition variables (in juce::WaitableEvent), but it does not remove it from the "normal" mutexes of juce::CriticalSection. Maybe it is a bug specific to condition variables.

Anyway, since that patch seems to fix all issues related to this kernel bug, I would suggest that Jules applies it.

UPDATE: well... I have one user saying that he still gets some freezes of the application UI (not the whole OS), so maybe that patch is not the silver bullet.
jpo
JUCE UberWeenie
 
Posts: 336
Joined: Thu Mar 20, 2008 2:45 pm

Re: Juce plugins cause realtime kernel lockup

Postby X-Ryl669 » Wed Apr 25, 2012 8:46 am

No, the PI is required. If you remove it, you MUST forbid RT threads (because as soon as an RT thread takes a mutex, the whole computer is dead).
And if you forbid RT threads, the audio code suddenly drops samples depending on the CPU usage (and I'm not even speaking about video here, which is even worse).

A kernel bug is likely not due to the application (well, sort of), so there is nothing you can do in the application. You should debug your kernel instead.
I can only help you debug the kernel. So, if you have the source of your kernel (you likely do), take a look at kernel/rtmutex.c:724.
You'll have a line "BUG_ON(some condition)".
Then search for this line on Google; it's likely other users have hit that bug, and there is probably already a fix for it.

Also, if you have debug information in your Juce software, use addr2line to find out the file & line source code where the kernel crashed:
# addr2line -e /path/to/your/juce/App 0xffffffff8130F3a8
or
# addr2line -e /path/to/your/juce/App 0xffffffff81494897

Please post all the data here.
X-Ryl669
JUCE UberWeenie
 
Posts: 1124
Joined: Sun Apr 24, 2005 5:30 pm

Re: Juce plugins cause realtime kernel lockup

Postby jpo » Wed Apr 25, 2012 9:58 am

Did anyone manage to reproduce this freezing in a virtual machine ? I tried running the linux-rt kernel of ubuntu lts 10.04 in virtualbox and it did not freeze.

(I also tried to set /proc/sys/kernel/sched_rt_runtime_us equal to /proc/sys/kernel/sched_rt_period_us on a regular Linux distro, not in a VM, and it did not freeze.)
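As a rough illustration of what that setting does, here is a small sketch that reads the two values and reports whether realtime throttling is effectively off. The function names are hypothetical, and the rule used (a runtime of -1, or a runtime at least as large as the period, means RT tasks are never throttled) is my understanding of the scheduler documentation rather than something tested in this thread.

```cpp
#include <fstream>
#include <string>

// Assumption: a runtime of -1, or a runtime >= the period, means realtime
// tasks may use 100% of the CPU, i.e. the scheduler never throttles them.
bool rtThrottlingDisabled (long runtimeUs, long periodUs)
{
    return runtimeUs < 0 || runtimeUs >= periodUs;
}

// Reads a single integer from a /proc file.
// Returns false if the file cannot be opened or parsed.
bool readProcValue (const std::string& path, long& value)
{
    std::ifstream file (path);
    return static_cast<bool> (file >> value);
}
```

You would call `readProcValue` once for each of the two /proc paths above and pass the results to `rtThrottlingDisabled`. With throttling off, a runaway RT thread can starve everything else, which is why this setting is a plausible way to try to provoke a freeze on a non-RT kernel.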
jpo
JUCE UberWeenie
 
Posts: 336
Joined: Thu Mar 20, 2008 2:45 pm

Re: Juce plugins cause realtime kernel lockup

Postby X-Ryl669 » Wed Apr 25, 2012 10:02 am

@jpo, me neither, as I'm using the vanilla Juce code on my rt-kernel and it works ok.
X-Ryl669
JUCE UberWeenie
 
Posts: 1124
Joined: Sun Apr 24, 2005 5:30 pm

Re: Juce plugins cause realtime kernel lockup

Postby edekock » Wed Apr 25, 2012 12:37 pm

OK, my progress so far:

I have had a look at the rtmutex.c code.

The affected line (724) is this...

BUG_ON(rt_mutex_owner(lock) == self); in the rt_spin_lock_slowlock function.

There was an attempt to report this as a bug a while back, but it has been defended as a valid check. It looks like the app is trying to obtain a spin lock twice from the same function. The trail around this can be found at:
http://lkml.indiana.edu/hypermail/linux/kernel/0706.2/3258.html

However, to test, I changed this to a WARN_ON call instead (recompiled everything, rebooted etc.), and now it crashes at line 472 (coincidentally a transposition of the same digits; I had to double check that). This is now the result of another BUG_ON statement:
BUG_ON(rt_mutex_real_waiter(task->pi_blocked_on)); in the task_blocks_on_rt_mutex function.

I remain convinced that this is a fault in Juce somewhere, but am at a loss on how to progress.
edekock
JUCE Weenie
 
Posts: 14
Joined: Wed Apr 13, 2011 6:56 am

Re: Juce plugins cause realtime kernel lockup

Postby X-Ryl669 » Wed Apr 25, 2012 3:38 pm

edekock wrote:It looks like the app is trying to obtain a spin lock twice from the same function

Yes it's the case.
This behaviour is forbidden unless the recursivity of the mutex is switched on.
But in the Juce code, you have "pthread_mutexattr_settype (&atts, PTHREAD_MUTEX_RECURSIVE);" (in juce_posix_SharedCode.h)
Usually, people using PI don't use recursivity at the same time, but it's not the case in Juce.

So, this code path is probably not tested that much.
Anyway, have you tried the addr2line call I've written above so we can figure out the position in the Juce code that's causing the issue ?
X-Ryl669
JUCE UberWeenie
 
Posts: 1124
Joined: Sun Apr 24, 2005 5:30 pm

Re: Juce plugins cause realtime kernel lockup

Postby jpo » Wed Apr 25, 2012 9:43 pm

I'm being told that after removing both uses of PTHREAD_PRIO_INHERIT in juce_posix_SharedCode (the one in WaitableEventImpl *and* the one in CriticalSection), everything seems to work.
jpo
JUCE UberWeenie
 
Posts: 336
Joined: Thu Mar 20, 2008 2:45 pm

Re: Juce plugins cause realtime kernel lockup

Postby X-Ryl669 » Thu Apr 26, 2012 8:56 am

jpo wrote: everything seems to work

Well, it depends on what "everything" means. If you're testing the Introjucer that's not using any RT thread, of course, it's going to work.
Can you try with a RT thread that's taking a mutex, something like this:
Code: Select all
CriticalSection mutex;

// High-priority thread: spins for a while, then spins again while holding the mutex.
class RT : public Thread
{
public:
     RT() : Thread ("RT") {}

     void run()
     {
           setCurrentThreadPriority (10);
           while (!threadShouldExit())
           {
                 for (volatile int i = 0; i < 500000; i++) {}

                 {
                      ScopedLock scope (mutex);
                      for (volatile int i = 0; i < 1000000; i++) {}
                 }
           }
     }
};


// Default-priority thread contending for the same mutex.
class LowP : public Thread
{
public:
     LowP() : Thread ("LowP") {}

     void run()
     {
           while (!threadShouldExit())
           {
                 {
                      ScopedLock scope (mutex);
                      for (volatile int i = 0; i < 1000000; i++) {}
                 }
           }
     }
};



Also, can you try to disable recursive mutex (pthread_mutexattr_settype (&atts, PTHREAD_MUTEX_RECURSIVE);) instead of disabling PI ?
In POSIX, the behaviour of using recursive mutex with cond_var is undefined, so I wonder if the bug is due to this (instead of PI).
see: http://en.wikipedia.org/wiki/Reentrant_mutex
From there, it reads:
It is advised that an application should not use a PTHREAD_MUTEX_RECURSIVE mutex with condition variables because the implicit unlock performed for a pthread_cond_timedwait() or pthread_cond_wait() may not actually release the mutex (if it had been locked multiple times). If this happens, no other thread can satisfy the condition of the predicate.
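To illustrate the advice quoted above, here is a minimal sketch of an event object whose condition variable is paired with its own default (non-recursive) mutex, so the implicit unlock inside pthread_cond_timedwait() always releases it. `SimpleEvent` is a hypothetical class written for this post; it is not JUCE's WaitableEvent and says nothing about how JUCE actually implements it.

```cpp
#include <pthread.h>
#include <time.h>

// Hypothetical event class (not JUCE's WaitableEvent): the condition variable
// has its own default, non-recursive mutex that is only ever locked once.
class SimpleEvent
{
public:
    SimpleEvent()
    {
        pthread_mutex_init (&mutex, nullptr);
        pthread_cond_init (&condition, nullptr);
    }

    ~SimpleEvent()
    {
        pthread_cond_destroy (&condition);
        pthread_mutex_destroy (&mutex);
    }

    // Waits up to timeoutMs for signal(); returns true if the event was signalled.
    bool wait (int timeoutMs)
    {
        timespec deadline;
        clock_gettime (CLOCK_REALTIME, &deadline);
        deadline.tv_sec += timeoutMs / 1000;
        deadline.tv_nsec += (timeoutMs % 1000) * 1000000L;
        if (deadline.tv_nsec >= 1000000000L)
        {
            deadline.tv_sec += 1;
            deadline.tv_nsec -= 1000000000L;
        }

        pthread_mutex_lock (&mutex);
        while (! triggered)   // re-check the predicate after every wakeup
            if (pthread_cond_timedwait (&condition, &mutex, &deadline) != 0)
                break;        // timed out (or the wait failed)

        const bool result = triggered;
        triggered = false;    // auto-reset
        pthread_mutex_unlock (&mutex);
        return result;
    }

    void signal()
    {
        pthread_mutex_lock (&mutex);
        triggered = true;
        pthread_cond_signal (&condition);
        pthread_mutex_unlock (&mutex);
    }

private:
    pthread_mutex_t mutex;
    pthread_cond_t condition;
    bool triggered = false;
};
```

If the mutex here were recursive and already held twice by the waiting thread, the wait might release only one level and no other thread could ever call signal(), which is the hazard the quoted text describes.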

This would be a major flaw that needs fixing in that case.
X-Ryl669
JUCE UberWeenie
 
Posts: 1124
Joined: Sun Apr 24, 2005 5:30 pm

Re: Juce plugins cause realtime kernel lockup

Postby jpo » Thu Apr 26, 2012 4:39 pm

No, I was not referring to the Introjucer, but to a synth with SCHED_RR threads for MIDI and audio. The thing is, I'm trying to be pragmatic here: without the fix I'm told that the application immediately freezes the computer, with the fix I'm told that it works -- it has also been working for a few years with an older version of JUCE that did not have this priority inheritance attribute. Maybe it is pure luck, or maybe the application just avoids the dangerous situations, I don't know.

I can't reproduce anything on my computers, so I cannot really try your suggestions; however, I don't think Juce uses recursive mutexes for condition variables -- recursive is only used for its CriticalSection objects.
jpo
JUCE UberWeenie
 
Posts: 336
Joined: Thu Mar 20, 2008 2:45 pm

Re: Juce plugins cause realtime kernel lockup

Postby CPB » Sun Apr 29, 2012 12:21 am

jpo wrote:Did anyone manage to reproduce this freezing in a virtual machine ? I tried running the linux-rt kernel of ubuntu lts 10.04 in virtualbox and it did not freeze.

Yes, I've managed to reliably get this to lock in Arch RT using the JuceDemo app. In fact, on my VM, it's as easy as just running multiple instances of the JuceDemo simultaneously. 100% of the time, this will lock immediately with the aforementioned rtmutex.c assertion:

Code: Select all
[  683.649543] ------------[ cut here]------------
[  683.649561] kernel BUG at kernel/rtmutex.c:472!
[  683.649574] invalid opcode: 0000 [#1] PREEMPT SMP
[1072] exited with preempt_count 2


I'm admittedly way out of my comfort zone when it comes to RT kernel debugging, but I will take a look at X-Ryl669's suggestions regarding PTHREAD_MUTEX_RECURSIVE and see what I can come up with...
CPB
JUCE Obsessive
 
Posts: 58
Joined: Tue Jan 29, 2008 8:26 pm
Location: Bournemouth, UK

Re: Juce plugins cause realtime kernel lockup

Postby jules » Sun Apr 29, 2012 11:24 am

I'm also way out of my depth when it comes to RT linux, so am eagerly waiting for you guys to come to a consensus on this one!
jules
Fearless Leader
 
Posts: 17193
Joined: Mon Sep 06, 2004 9:03 am
Location: London, UK

Re: Juce plugins cause realtime kernel lockup

Postby CPB » Mon Apr 30, 2012 12:19 pm

Nothing to add, but I can confirm what edekock has been saying. The only slight difference is that, even without a kernel patch to change the original assert from a BUG_ON to a WARN_ON, I get both kernel.log errors: first at rtmutex.c:472 and then at rtmutex.c:724. These correspond to the following lines, shown with a little preceding code for context:

rtmutex.c:472 = BUG_ON(rt_mutex_real_waiter(task->pi_blocked_on));
Code: Select all
/*
* Task blocks on lock.
*
* Prepare waiter and propagate pi chain
*
* This must be called with lock->wait_lock held.
*/
static int task_blocks_on_rt_mutex(struct rt_mutex *lock,
               struct rt_mutex_waiter *waiter,
               struct task_struct *task,
               int detect_deadlock)
{
   struct task_struct *owner = rt_mutex_owner(lock);
   struct rt_mutex_waiter *top_waiter = waiter;
   unsigned long flags;
   int chain_walk = 0, res;

   raw_spin_lock_irqsave(&task->pi_lock, flags);

   /*
    * In the case of futex requeue PI, this will be a proxy
    * lock. The task will wake unaware that it is enqueueed on
    * this lock. Avoid blocking on two locks and corrupting
    * pi_blocked_on via the PI_WAKEUP_INPROGRESS
    * flag. futex_wait_requeue_pi() sets this when it wakes up
    * before requeue (due to a signal or timeout). Do not enqueue
    * the task if PI_WAKEUP_INPROGRESS is set.
    */
   if (task != current && task->pi_blocked_on == PI_WAKEUP_INPROGRESS) {
      raw_spin_unlock_irqrestore(&task->pi_lock, flags);
      return -EAGAIN;
   }

   BUG_ON(rt_mutex_real_waiter(task->pi_blocked_on));


rtmutex.c:724 = BUG_ON(rt_mutex_owner(lock) == self);
Code: Select all
/*
* Slow path lock function spin_lock style: this variant is very
* careful not to miss any non-lock wakeups.
*
* We store the current state under p->pi_lock in p->saved_state and
* the try_to_wake_up() code handles this accordingly.
*/
static void  noinline __sched rt_spin_lock_slowlock(struct rt_mutex *lock)
{
   struct task_struct *lock_owner, *self = current;
   struct rt_mutex_waiter waiter, *top_waiter;
   int ret;

   rt_mutex_init_waiter(&waiter, true);

   raw_spin_lock(&lock->wait_lock);
   init_lists(lock);

   if (__try_to_take_rt_mutex(lock, self, NULL, STEAL_LATERAL)) {
      raw_spin_unlock(&lock->wait_lock);
      return;
   }

   BUG_ON(rt_mutex_owner(lock) == self);


As suggested, I've also tried changing the CriticalSection mutex to remove the PTHREAD_MUTEX_RECURSIVE attribute. With this gone, the JuceDemo doesn't even get as far as displaying its GUI, which I assume means Juce does currently rely on recursive mutexes. No kernel lock though, which could suggest either that there is an incompatibility between PI-aware and recursive mutexes, or (perhaps more likely?) that the offending code no longer gets a chance to run due to an early deadlock.

Any suggestions on how I should proceed? Does anyone know for sure if recursive and priority aware attributes are mutually exclusive? Unless anyone has a better suggestion, perhaps it might be a sensible next step for me to try and replicate this (perhaps forbidden) mutex use in a stand-alone (Juce-less) app.
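As a starting point for such a stand-alone test, something along these lines might do. It is only a sketch under assumptions: the function name is made up, the two threads run at normal priority (a real reproduction would presumably need SCHED_FIFO or SCHED_RR threads, which require privileges), and nothing here is known to trigger the bug. On an affected RT kernel it could lock the machine, so run it somewhere disposable.

```cpp
#include <pthread.h>
#include <thread>

// Hypothetical stand-alone test (no JUCE): two threads repeatedly take a mutex
// that is both recursive and priority-inheriting, locking it twice each time.
// Returns the final counter value, or -1 if the mutex could not be created.
long runRecursivePiTest (int iterationsPerThread)
{
    pthread_mutexattr_t atts;
    pthread_mutexattr_init (&atts);
    pthread_mutexattr_settype (&atts, PTHREAD_MUTEX_RECURSIVE);
    pthread_mutexattr_setprotocol (&atts, PTHREAD_PRIO_INHERIT);

    pthread_mutex_t mutex;
    const int rc = pthread_mutex_init (&mutex, &atts);
    pthread_mutexattr_destroy (&atts);
    if (rc != 0)
        return -1;   // e.g. PI mutexes are not supported on this system

    long counter = 0;
    auto worker = [&]
    {
        for (int i = 0; i < iterationsPerThread; ++i)
        {
            pthread_mutex_lock (&mutex);
            pthread_mutex_lock (&mutex);     // recursive re-lock by the owner
            ++counter;
            pthread_mutex_unlock (&mutex);
            pthread_mutex_unlock (&mutex);
        }
    };

    std::thread a (worker), b (worker);
    a.join();
    b.join();

    pthread_mutex_destroy (&mutex);
    return counter;
}
```

If the counter comes back as twice the iteration count with no kernel messages, the combination of attributes at least works under light contention. Adding a condition variable and RT scheduling would be the next variables to introduce one at a time.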
CPB
JUCE Obsessive
 
Posts: 58
Joined: Tue Jan 29, 2008 8:26 pm
Location: Bournemouth, UK
