[mythtv-users] How big is your database?
Michael T. Dean
mtdean at thirdcontact.com
Fri Dec 2 14:06:19 EST 2005
Ant Daniel wrote:
>On 01/12/05, chris at cpr.homelinux.net <chris at cpr.homelinux.net> wrote:
>>No doubt most of that is taken up by the "previously recorded" data.
>>That could probably be reduced to just the series # and epsiode # if it
>>wasn't for the "forget old" feature. The other thing that a good
>>clean-up would do is check the relationships so that there aren't
>>orphans in large tables like the markup table. Another nice thing to do
>>would be to have an ability to remove old frontend data from the system.
>Once again us poor non-US not considered ;)
On the bright side, I don't think this idea would be considered for
inclusion--even for US users.
>In the UK our listing data doesn't appear to give a decent series # /
>epsiode #, I know myth's supposed to generate something when not
>provided, but I haven't worked out if that works. (I guess with almost
>2yrs data I should go and have a look).
>However when first reading your post (again we don't tend to use # to
>symbolise number), I read it as series hash, epsiode hash. Now that's
>a good idea.
>Why not hash the episode description rather than store the full length
>text, wouldn't that make a significant impact on previously recorded?
OK, so why shouldn't this be considered?
Total number of records (SELECT COUNT(*) FROM ...):
recordedmarkup: 1746031 (Yes, that is 1.75M records.)
SELECT MIN(starttime) FROM oldrecorded; -> 2004-05-03 03:00:00
SELECT NOW(); -> 2005-12-02 13:07:4
So, my database is approximately 19 months old. In more than a year and
a half, I've got 3787 records in oldrecorded.
SELECT MAX(LENGTH(description)) FROM oldrecorded; -> 251
(and this is with DataDirect, which typically has much longer, more
detailed descriptions than other grabbers).
So, let's assume that every description is this same length. That means
I have 950537 bytes of description after 1 1/2 years.
Let's add in title and subtitle:
SELECT MAX(LENGTH(title)) FROM recorded; -> 39
SELECT MAX(LENGTH(subtitle)) FROM recorded; -> 58
Once again, assume all titles/subtitles are the as long as the longest,
and we have an additional 147693 and 219646 bytes. Adding these up, we
have 1317876 bytes. That's 1.26MiB. OK, so what about the actual
storage requirements (length + 1 for the two VARCHAR types and length +
2 for the TEXT type). If we don't assume that's taken care of by the
fact that most values are smaller than the max current value, we need to
add an additional 15148 bytes, So, we're up to 1.27MiB...
Looking at the actual files:
# ls -al oldrecorded.*
-rw-rw---- 1 mysql mysql 1004848 Dec 2 12:59 oldrecorded.MYD
-rw-rw---- 1 mysql mysql 462848 Dec 2 12:59 oldrecorded.MYI
-rw-rw---- 1 mysql mysql 9040 Nov 2 16:46 oldrecorded.frm
So, including other fields and the index, we have 1.41MiB (and notice
that we've overestimated the table size by assuming the max length).
Now, I could go down the road of comparing sizes when using the length
of episode ID (12) or the length of a hash, but it all comes down
to--even if we save 1.4MiB, have we accomplished anything?
To answer this question, let's look at recordedmarkup. As mentioned
above, I have 189 recordings which give me 1.75 million records in
recordedmarkup. Since recordedmarkup has a relatively simple structure
(only one VARCHAR), we can figure the row size using column type storage
Field Type Storage Required
----- ---- ----------------
chanid int(10) unsigned 4
starttime datetime 8
mark bigint(20) 8
offset varchar(32) l+1 (we'll call it 11)
type int(11) 4
------- ---- ----------------
and we get about 100 of these records per minute of an MPEG-2 recording.
From inside the directory containing the mythconverg binary files:
# du -sh
# ls -al recordedmarkup.*
-rw-rw---- 1 mysql mysql 106883020 Dec 2 13:45 recordedmarkup.MYD
-rw-rw---- 1 mysql mysql 69468160 Dec 2 13:45 recordedmarkup.MYI
-rw-rw---- 1 mysql mysql 8668 Jul 19 2004 recordedmarkup.frm
So, as Gregorio said, most of the storage (168MiB of 192MiB) required is
in the recordedmarkup table. And, there are two ways for you to
minimize the size of your recordedmarkup table:
a) delete your recordings, or
b) transcode all your recordings to NUV's with MPEG-4
Note that recordedmarkup will be small for users with framegrabbers
because they use RTJPEG or MPEG-4 instead of MPEG-2, so they don't have
to work to get a small database. However, IMHO, it's not worth the
effort of transcoding even for MPEG-2 users. I have one transcoded
recording out of 189 total and my 192MiB database is the equivalent of
less than 10 min of video even with the terribly low bitrate (2200kbps
average) I use.
More information about the mythtv-users