[mythtv-users] MySQL related BE deadlocks - collective wisdom needed

Warpme warpme at o2.pl
Wed Sep 21 07:10:53 UTC 2011


On 8/31/11 7:33 AM, Warpme wrote:
> On 8/24/11 1:46 PM, Warpme wrote:
>> On 8/5/11 9:44 AM, Warpme wrote:
>>> On 8/4/11 10:39 PM, Michael T. Dean wrote:
>>>> Thought I'd mention that Daniel K just pushed a change that reworks
>>>> reconnections.  It removes the MySQL-library provided auto-reconnect
>>>> that I just put in (for the exact same reasons MySQL changed their API
>>>> to disable auto-reconnect--because it can't work with certain
>>>> application designs, and MythTV has changed since 0.21 to a design
>>>> where auto-reconnect cannot work).  So there's no need for further
>>>> testing that patch.
>>>>
>>>> We have high hopes that this will work around the frequent MySQL
>>>> connection drop-out issues some users were seeing.  And I'd still be
>>>> very interested to hear details if anyone can identify the specific
>>>> MySQL server/client library configuration that's causing these
>>>> frequent drop outs.
>>>>
>>>> If you're running master, please pull an update (to 4dfcdb8dd0c80 or
>>>> later), and please report back if it helps.  For those of you on
>>>> 0.24-fixes, I will try to create a backport patch and post it here for
>>>> testing later today.  We plan to let this sit in unstable for a bit of
>>>> testing and to prove it actually works before pushing it to -fixes,
>>>> though.
>>>>
>>>> Mike
>>>>
>>>> _______________________________________________
>>>> mythtv-users mailing list
>>>> mythtv-users at mythtv.org
>>>> http://www.mythtv.org/mailman/listinfo/mythtv-users
>>>>
>>> Daniel, Mike,
>>>
>>> Many thanks for working on this issue!
>>> At one point I was almost sure the issue was specific to my setup and
>>> that I couldn't expect you to spend time on it.
>>>
>>> Browsing the Internet, I can't find any bug reports describing an issue
>>> like mine. Given how popular mysql is, it seems the root cause of my
>>> problem isn't within the mysqlclient lib itself but rather in the
>>> environment/app that uses this lib.
>>>
>>> Over the last few days I switched back to MyISAM and upgraded to
>>> 5.5.15. With this change I still get at least one deadlock per day
>>> (with the famous stack trace).
>>> So 2 days ago I decided to do, in one step, a full reinstall and
>>> upgrade of ALL system packages. I'm now in the testing phase.
>>> As I'm trying to find the root cause, I want to test one change at a
>>> time, so I will apply Daniel's commits as soon as I get the first
>>> deadlock.
>>>
>>> BTW, what I observe: this lib is reentrant, and during a hang I see
>>> other threads successfully accessing the DB. It looks like the lib
>>> hangs only in the context of a given thread.
>>> Looking for hangs in vio_net, I found an interesting thread on the
>>> mysql forums: http://bugs.mysql.com/bug.php?id=33384
>>> It concerns a lib crash, not a hang, but with a quite similar stack
>>> trace, so maybe it is somehow correlated?
>>>
>>> br 
>> Daniel, Mike,
>>
>> Sorry for the long silence - I was on holiday :-)
>>
>> 3 weeks ago I did a full system upgrade.
>> I have 33 testing rec. rules configured, plus on average 10-15 user
>> rec. rules. This gives 45-50 recordings per day.
>>
>> Running 20110803-ge41e314 (upgraded OS, but without Daniel's recent
>> mysql enhancements) for 5 days, I observed the following:
>> -so far no 9773 (myth_proto) type deadlocks
>> -I had one 9792-like (scheduler) deadlock (but with slightly different
>> symptoms)
>> -the trace of the above deadlock doesn't have any references to
>> /usr/lib/libmysqlclient_r.so.16
>>
>> Next I upgraded to 20110807-gda10d33 (with 4dfcdb8: Fix SQL reconnection
>> logic. Refs #9704. Refs #9773. Refs #9792;).
>> Running it for 5 days of tests:
>> -no 9773 or 9792 deadlocks
>> -in that 5-day period, 1 successful DB reconnect was reported.
>>
>> Next I upgraded to 20110812-g50606cd. This build has been running
>> continuously since 08/12.
>> -no deadlocks
>> -no DB reconnects in the BE log.
>>
>> My conclusions:
>>
>> 1. It looks like the deadlocks with entries referencing
>> /usr/lib/libmysqlclient_r.so.16 are more likely the result of a specific
>> combination of OS components, as after changing the OS components I
>> wasn't able to catch any in 10 days. Before the upgrade I had on
>> average 1 per day.
>> If I read the traces correctly, the components directly involved in the
>> stack trace are: kernel, libpthread, mysql, Qt & myth.
>> The OS upgrade from 08/03 changed: kernel, libpthread (as part of
>> glibc) & myth.
>> I think the best candidate for the root cause is glibc. The upgrade was
>> from 2.13 to 2.14.
>> The second candidate is the kernel. The upgrade was from 2.6.38.7 to
>> 2.6.39.3.
>> I may eventually run tests with a reverted kernel, but I'm not sure I
>> will find the right time to do this, as I'm experimenting on a
>> production system.
>>
>> 2. It looks like 4dfcdb8 (Fix SQL reconnection logic) solves
>> deadlocks like those in #9773 & #9792, but the reports of successful DB
>> reconnections indicate that there is still room for improvement.
>>
>> 3. 20110812-g50606cd so far works great for me (10d uptime, 500+ rec.,
>> so far no deadlocks or DB reconnects).
>> I plan to extend the code freeze on this git pull for another 10d and
>> see how it goes.
>> Currently the only DB-related issues I have with 20110812-g50606cd are
>> reflected in the following log entries (avg. 1 per day):
>>
>> 2011-08-22 11:07:27.198329 E DB Error (change_program):
>> Query was:
>> UPDATE program SET starttime = ?, endtime = ? WHERE chanid = ? AND 
>> starttime = ?
>> Bindings were:
>> :CHANID=12808, :NEWEND=2011-08-24T03:40:00, 
>> :NEWSTART=2011-08-23T17:25:00,
>> :OLDSTART=2011-08-23T16:35:00
>> Driver error was [2/1062]:
>> QMYSQL3: Unable to execute statement
>> Database error was:
>> Duplicate entry '12808-2011-08-23 17:25:00-0' for key 'PRIMARY'
>> 2011-08-22 11:07:27.239906 E DB Error (change_program):
>> Query was:
>> UPDATE program SET starttime = ?, endtime = ? WHERE chanid = ? AND 
>> starttime = ?
>> Bindings were:
>> :CHANID=9305, :NEWEND=2011-08-24T03:40:00, 
>> :NEWSTART=2011-08-23T17:25:00,
>> :OLDSTART=2011-08-23T16:35:00
>> Driver error was [2/1062]:
>> QMYSQL3: Unable to execute statement
>> Database error was:
>> Duplicate entry '9305-2011-08-23 17:25:00-0' for key 'PRIMARY'
>>
>> I want to say that Daniel's recent mysql-related commits are GREAT for
>> making myth production/server-grade software.
>>
>> Daniel, a really BIG thank you for the impressively accurate problem
>> diagnosis and the very quick & 100% effective solution.
>> This is REALLY impressive, especially taking into account that the
>> system in question is a remote install without direct access and that
>> it has many 3rd-party components.
>> This is really incredible!
>>
>> br
>>
> Hi,
> Small update:
> 2 days ago I updated the BE to v0.25pre-3260-ga15740a-dirty-20110829 and
> enabled active EIT scan.
> Unfortunately, today the BE deadlocked. Details are in:
> http://code.mythtv.org/trac/ticket/9704#comment:48
> This time, however, there are no symptoms of a client mysql lib hang in
> the traces, nor any mysql reconnect reports in the BE logs.
> I hope this means the mysql issue is resolved and this deadlock is the
> result of an easy-to-correct thread locking issue.
> -br
>
Hi,
I think we can close this thread. For the last 3 weeks I have been testing
the latest master. What I see:
-the code seems to be really stable if EIT active scan is not used (1000
rec., not a single deadlock, not a single DB reconnect)
-with active EIT scan, the code seems to be stable as long as LiveTV is not used
-if I use LiveTV on a tuner other than the one with active scan enabled: no
issues, deadlocks, etc.
-if I use LiveTV and active scan on the same tuner, quite frequently
(once per 5-10 LiveTV starts) I hit the issue described in
http://code.mythtv.org/trac/ticket/10016. Once every few days I also get a
deadlock with MYTH_PROTO_RESPONSE empty (trace attached in #10016).
Once again, thanks for the recent great improvements in the code. It now
works great as a PVR.
Clearing up the issue from #10016 will make it a really good system.
-br
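
A note on the "Duplicate entry ... for key 'PRIMARY'" errors from
change_program quoted earlier in this thread: the UPDATE moves a row's
starttime onto a (chanid, starttime) pair that another row already holds.
Below is a minimal sketch of that collision. It is not MythTV code. It uses
Python's sqlite3 in place of MySQL, a cut-down program table, and assumes
the third key column (the trailing "-0" in the error) is called manualid.
The helper name and the second row's data are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Cut-down stand-in for the program table. The three-column primary key
# (with a column assumed to be named manualid) mirrors the
# '12808-2011-08-23 17:25:00-0' entry shown in the log.
conn.execute(
    "CREATE TABLE program ("
    " chanid INTEGER, starttime TEXT, endtime TEXT,"
    " manualid INTEGER DEFAULT 0,"
    " PRIMARY KEY (chanid, starttime, manualid))"
)

# First row: the programme being moved (old start taken from the log).
# Second row: hypothetical programme already sitting at the new start time.
conn.executemany(
    "INSERT INTO program (chanid, starttime, endtime) VALUES (?, ?, ?)",
    [
        (12808, "2011-08-23 16:35:00", "2011-08-23 17:25:00"),
        (12808, "2011-08-23 17:25:00", "2011-08-24 03:40:00"),
    ],
)


def change_program(db, chanid, old_start, new_start, new_end):
    """Illustrative helper: same UPDATE shape as in the log entry.

    Returns True if the row was moved, False on a primary key collision.
    """
    try:
        db.execute(
            "UPDATE program SET starttime = ?, endtime = ? "
            "WHERE chanid = ? AND starttime = ?",
            (new_start, new_end, chanid, old_start),
        )
        return True
    except sqlite3.IntegrityError:
        # SQLite's counterpart of MySQL error 1062 (duplicate entry).
        return False


moved = change_program(
    conn, 12808,
    "2011-08-23 16:35:00",  # OLDSTART
    "2011-08-23 17:25:00",  # NEWSTART
    "2011-08-24 03:40:00",  # NEWEND
)
row_count = conn.execute("SELECT COUNT(*) FROM program").fetchone()[0]
```

In this sketch the failed UPDATE leaves both rows untouched, which fits the
log entries being reported as errors while the BE keeps running.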

