[mythtv] [PATCH] IMDB search fix for MythVideo

Tim Harvey tharvey at alumni.calpoly.edu
Fri Jan 16 22:07:43 EST 2004


The attached patches fix the currently broken IMDB search in MythVideo's
VideoManager against current CVS.  The search apparently broke sometime
before 0.13.

The problem (of course) was that IMDB changed their format again.  More
than that, they started using redirects.  My previous patch added
redirect support to httpcomms, but the patch that made use of it never
made it into CVS.  No worries: now that every search redirects, I found
a bug in the redirect logic and have fixed it (hence the httpcomms patch
for libmyth).  The patch to videomanager makes use of the redirect
functionality in httpcomms (and, I feel, cleans up the code a little)
and handles the changes made to IMDB's pages.  My original patch from
a few months ago added some more intelligent ranking of the results;
however, I haven't patched that back in because I am thinking we can
perhaps do the queries in a more abstracted way (see below).
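
To illustrate the redirect handling described above, here is a minimal
sketch in plain C++ (no Qt) of two steps: resolving a redirect target
that lacks a host against the host of the original request, and pulling
the movie number out of a redirected ".../title/ttNNNNNNN/" URL.  The
function names resolveRedirect and extractTitleNumber are hypothetical
and are not part of HttpComms or VideoManager; the patch itself does
this with QUrl and QString.

```cpp
#include <string>

// Hypothetical helper (not in the patch): build a full URL from a
// redirect target, falling back to the host of the original request
// when the target is only a path.
std::string resolveRedirect(const std::string &originalHost,
                            const std::string &location)
{
    // Already absolute (has a scheme): use it unchanged.
    if (location.find("://") != std::string::npos)
        return location;

    // Path starting at the server root.
    if (!location.empty() && location[0] == '/')
        return "http://" + originalHost + location;

    // Bare path: treated here as relative to the server root
    // (a simplification; assumption of this sketch).
    return "http://" + originalHost + "/" + location;
}

// Hypothetical helper (not in the patch): return the digits that follow
// "title/tt" in a URL, or an empty string if the URL is not a title page.
std::string extractTitleNumber(const std::string &url)
{
    const std::string marker = "title/tt";
    std::string::size_type pos = url.find(marker);
    if (pos == std::string::npos)
        return "";

    std::string::size_type start = pos + marker.size();
    std::string::size_type end = url.find('/', start);
    if (end == std::string::npos)
        return url.substr(start);          // no trailing slash
    return url.substr(start, end - start);
}
```

A search that redirects straight to a title page is an only match, so a
non-empty result from extractTitleNumber can be used as the movie number
without parsing a result list.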

I've noticed that the IMDB search has broken twice (because of IMDB
changing their page layouts) in the 3 or 4 months that I've been
monitoring the MythTV project.  I would be happy to try to implement a
better way to abstract grabbing data from unrelated HTTP pages if I can.
Does anyone have any good ideas?  My thoughts are:

   - abstract logic into an executable (called out in mythconverg) that
would be passed the name of the file, and would return a list of IMDB
number/movie name pairs (i.e. a Perl script)
   - abstract logic into a QSA script.  QSA is Trolltech's fairly new
scripting language, which I played with when it was in beta, but I'm
thinking that it requires Qt 3.2?
   - other ideas?

If these queries are abstracted into scripts it could be much easier to
make them pluggable to offer different types of searches, or different
data sources, and make them updatable by a broader group of people.  I'm
thinking Perl would be nice but my Perl is very rusty.  There must be a
nice Perl library out there for doing HTTP requests?
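
As a rough sketch of the first idea, the C++ side would only need to
read the script's output and turn it into number/name pairs.  The
example below assumes a hypothetical output format of one match per
line, "number:title"; neither that format nor the names MovieMatch and
parseScriptOutput come from the patch.  It uses only the standard
library, and lines that do not fit the format are skipped.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// (IMDB number, movie name) pair; hypothetical type for this sketch.
typedef std::pair<std::string, std::string> MovieMatch;

// Parse the text printed by an external search script.  Assumed format:
// one "number:title" entry per line.  The split is on the first ':' so
// that titles containing a colon survive intact.
std::vector<MovieMatch> parseScriptOutput(const std::string &output)
{
    std::vector<MovieMatch> matches;
    std::istringstream in(output);
    std::string line;

    while (std::getline(in, line))
    {
        std::string::size_type sep = line.find(':');

        // Skip lines with no separator, no number, or no title.
        if (sep == std::string::npos || sep == 0 || sep + 1 >= line.size())
            continue;

        matches.push_back(MovieMatch(line.substr(0, sep),
                                     line.substr(sep + 1)));
    }
    return matches;
}
```

With an interface this small, the scraping logic could be swapped for a
different script or data source without touching the caller.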

Thoughts?

Please let me know if I've submitted the patch incorrectly (or correctly
for that matter).  It's been a while and this would only be my 2nd patch.

Tim
-------------- next part --------------
Index: mythvideo/mythvideo/videomanager.cpp
===================================================================
RCS file: /var/lib/mythcvs/mythvideo/mythvideo/videomanager.cpp,v
retrieving revision 1.23
diff -u -r1.23 videomanager.cpp
--- mythvideo/mythvideo/videomanager.cpp	10 Dec 2003 21:09:23 -0000	1.23
+++ mythvideo/mythvideo/videomanager.cpp	17 Jan 2004 02:53:06 -0000
@@ -26,6 +26,8 @@
 {
     db = ldb;
     updateML = false;
+    debug = 0;
+    isbusy = false;    // ignores keys when true (set when doing http request)
 
     RefreshMovieList();
 
@@ -49,14 +51,7 @@
     listCountMovie = 0;
     dataCountMovie = 0;
 
-    GetMovieListingTimeoutCounter = 0;
-    stopProcessing = false;
-
     m_state = SHOWING_MAINWINDOW;
-    httpGrabber = NULL;
-
-    urlTimer = new QTimer(this);
-    connect(urlTimer, SIGNAL(timeout()), SLOT(GetMovieListingTimeOut()));
 
     theme = new XMLParse();
     theme->SetWMult(wmult);
@@ -100,13 +95,6 @@
 
 VideoManager::~VideoManager(void)
 {
-    if (httpGrabber)
-    {
-        httpGrabber->stop();
-        delete httpGrabber;
-    }
-    delete urlTimer;
-
     delete theme;
     delete bgTransBackup;
 
@@ -120,6 +108,9 @@
     QStringList actions;
     gContext->GetMainWindow()->TranslateKeyPress("Video", e, actions);
 
+    if (isbusy) // ignore keypresses while doing http query 
+       return;
+
     for (unsigned int i = 0; i < actions.size() && !handled; i++)
     {
         QString action = actions[i];
@@ -247,12 +238,12 @@
     }
 }
 
+// returns text within 'data' between 'beg' and 'end' matching strings
 QString VideoManager::parseData(QString data, QString beg, QString end)
 {
-    bool debug = false;
     QString ret;
 
-    if (debug == true)
+    if (debug > 2)
     {
         cout << "MythVideo: Parse HTML : Looking for: " << beg << ", ending with: " << end << endl;
     }
@@ -264,13 +255,13 @@
 
         replaceNumCharRefs(ret);
 
-        if (debug == true)
+        if (debug > 2)
             cout << "MythVideo: Parse HTML : Returning : " << ret << endl;
         return ret;
     }
     else
     {
-        if (debug == true)
+        if (debug > 2)
             cout << "MythVideo: Parse HTML : Parse Failed...returning <NULL>\n";
         ret = "<NULL>";
         return ret;
@@ -279,10 +270,9 @@
 
 QString VideoManager::parseDataAnchorEnd(QString data, QString beg, QString end)
 {
-    bool debug = false;
     QString ret;
 
-    if (debug == true)
+    if (debug > 2)
     {
         cout << "MythVideo: Parse (Anchor End) HTML : Looking for: " << beg << ", ending with: " << end << endl;
     }
@@ -295,13 +285,13 @@
 
         replaceNumCharRefs(ret);
 
-        if (debug == true)
+        if (debug > 2)
             cout << "MythVideo: Parse HTML : Returning : " << ret << endl;
         return ret;
     }
     else
     {
-        if (debug == true)
+        if (debug > 2)
             cout << "MythVideo: Parse HTML : Parse Failed...returning <NULL>\n";
         ret = "<NULL>";
         return ret;
@@ -395,27 +385,15 @@
     QString host = "www.imdb.com";
     QString path = "";
 
-    QUrl url("http://" + host + "/title/tt" + movieNum + "/posters");
-
-    //cout << "Grabbing Poster HTML From: " << url.toString() << endl;
-
-    if (httpGrabber)
-    {
-        httpGrabber->stop();
-        delete httpGrabber;
-    }
-
-    httpGrabber = new HttpComms(url);
-
-    while (!httpGrabber->isDone())
-    {
-        qApp->processEvents();
-        usleep(10000);
-    }
-
-    QString res;
-    res = httpGrabber->getData();
-
+    QString url = "http://" + host + "/title/tt" + movieNum + "/posters";
+    if (debug > 0)
+        cout << "Grabbing Poster HTML From: " << url.latin1() << endl;
+    isbusy = true;
+    QString res = HttpComms::getHttp(url);
+    isbusy = false;
+    if (debug > 0)
+        cout << "Got " << res.length() << " byte result: " << res.latin1() 
+             << endl;
 
     QString beg, end, filename = "<NULL>";
 
@@ -429,34 +407,21 @@
     {
 
         impsite = "http://www.impawards.com" + impsite;
-	//cout << "Retreiving poster from " << impsite << endl; 
-	    
-        QUrl impurl(impsite);
-
-        //cout << "Grabbing Poster HTML From: " << url.toString() << endl;
-
-        if (httpGrabber)
-        {
-            httpGrabber->stop();
-            delete httpGrabber;
-        }
-
-        httpGrabber = new HttpComms(impurl);
-
-        while (!httpGrabber->isDone())
-        {
-            qApp->processEvents();
-            usleep(10000);
-        }
+	if (debug > 0)
+            cout << "Retrieving poster from " << impsite << endl; 
+        isbusy = true;
+        QString impres = HttpComms::getHttp(impsite);
+        isbusy = false;
+	if (debug > 0)
+            cout << "Got " << impres.length() << " bytes: " 
+                 << impres.latin1() << endl; 
 
-	QString impres;
-	
-        impres = httpGrabber->getData();
         beg = "<img SRC=\"posters/";
 	end = "\" ALT";
 
 	filename = parseData(impres, beg, end);
-	//cout << "Imp found: " << filename << endl;
+	if (debug > 0)
+            cout << "Imp found: " << filename << endl;
 
 	host = parseData(impsite, "//", "/");
 	path = impsite.replace(QRegExp("http://" + host), QString(""));
@@ -515,70 +480,47 @@
     movieNumber = movieNum;
     QString host = "www.imdb.com";
 
-    QUrl url("http://" + host + "/title/tt" + movieNum + "/");
-
-    //cout << "Grabbing Data From: " << url.toString() << endl;
-
-    if (httpGrabber)
-    {
-        httpGrabber->stop();
-        delete httpGrabber;
-    }
-
-    httpGrabber = new HttpComms(url);
-
-    while (!httpGrabber->isDone())
-    {
-        qApp->processEvents();
-        usleep(10000);
-    }
-
-    QString res;
-    res = httpGrabber->getData();
+    QString url = "http://" + host + "/title/tt" + movieNum + "/";
+    if (debug > 0)
+        cout << "Grabbing Data From: " << url.latin1() << endl;
+    isbusy = true;
+    QString res = HttpComms::getHttp(url);
+    isbusy = false;
 
     //cout << "Outputting Movie Data Page\n" << res << endl;
 
     ParseMovieData(res);
 }
 
+// Obtain a movie listing via popular website(s)
 int VideoManager::GetMovieListing(QString movieName)
 {
     int ret = -1;
     QString host = "us.imdb.com";
     theMovieName = movieName;
 
-    QUrl url("http://" + host + "/Tsearch?title=" + movieName + 
-             "&type=fuzzy&from_year=1890" +
-             "&to_year=2010&sort=smart&tv=off&x=12&y=14");
-
-    //cout << "Grabbing Listing From: " << url.toString() << endl;
-
-    if (httpGrabber)
-    {
-        httpGrabber->stop();
-        delete httpGrabber;
-    }
-
-    httpGrabber = new HttpComms(url);
-
-    urlTimer->stop();
-    urlTimer->start(10000);
-
-    stopProcessing = false;
-    while (!httpGrabber->isDone())
-    {
-        qApp->processEvents();
-        if (stopProcessing)
-            return 1;
-        usleep(10000);
-    }
-
-    urlTimer->stop();
-
-    QString res;
-    res = httpGrabber->getData();
+    QString url = "http://" + host + "/Tsearch?title=" + movieName 
+       + "&from_year=1890&to_year=2010&sort=smart&tv=off&x=12&y=14";
 
-    QString movies = parseData(res, "<A NAME=\"mov\">Movies</A></H2>", "</TABLE>");
+    if (debug > 0) 
+        cout << "Grabbing Listing From: " << url.latin1() << endl;
+    isbusy = true;
+    QString res = HttpComms::getHttp(url);
+    isbusy = false;
+
+    // If URL has been redirected to a movie then it was an only match 
+    if (url.find("title/tt") != -1) {
+        int fnd = url.find("title/tt") + 8;
+        movieNumber = url.mid(fnd, url.findRev("/") - fnd);
+        return 1;  // this does a re-request but simplest for now
+    }
+
+    QString exact = parseData(res, "<b>Exact Matches</b>", "</table>");
+    QString partial = parseData(res, "<b>Partial Matches</b>", "</table>");
+    QString movies = exact + partial;
+    if (debug > 0)
+        cout << "Got " << movies.length() << " bytes of movies:" 
+             << movies.latin1() << endl;
 
     movieList.clear();
 
@@ -1087,7 +1029,6 @@
         backup.end();
         update(fullRect);
         noUpdate = false;
-        urlTimer->stop();
     }
     else
         emit accept();
@@ -1639,27 +1580,3 @@
     curitem->updateDatabase(db);
     RefreshMovieList();
 }
-
-void VideoManager::GetMovieListingTimeOut()
-{
-    //Increment the counter and check were not over the limit
-    if(++GetMovieListingTimeoutCounter != 3)
-    {
-        //Try again
-        GetMovieListing(theMovieName);
-    }
-    else
-    {
-        GetMovieListingTimeoutCounter = 0;
-        cerr << "Failed to contact  server" << endl;
-
-        //Set the stopProcessing var so the other thread knows what to do
-        stopProcessing = true;
-
-        //Let the exitWin method take care of closing the dialog screen
-        exitWin();
-    }
-
-    return;
-}
-
Index: mythvideo/mythvideo/videomanager.h
===================================================================
RCS file: /var/lib/mythcvs/mythvideo/mythvideo/videomanager.h,v
retrieving revision 1.8
diff -u -r1.8 videomanager.h
--- mythvideo/mythvideo/videomanager.h	28 Nov 2003 19:00:53 -0000	1.8
+++ mythvideo/mythvideo/videomanager.h	17 Jan 2004 02:53:06 -0000
@@ -58,6 +58,7 @@
   private:
     bool updateML;
     bool noUpdate;
+    int  debug;
 
     QPixmap getPixmap(QString &level);
     QSqlDatabase *db;
@@ -73,7 +74,6 @@
     QMap<QString, QString> parseMovieList(QString);
     void ResetCurrentItem();
 
-    HttpComms *httpGrabber;
     void RefreshMovieList();
     QString ratingCountry;
     void GetMovieData(QString);
@@ -132,13 +132,9 @@
     QString movieRating;
     int movieRuntime;
     QString movieNumber;
-    
-    QTimer *urlTimer;
-    int GetMovieListingTimeoutCounter;
-    bool stopProcessing;
     QString theMovieName;
-
     bool allowselect;
+    bool isbusy;
 };
 
 #endif
-------------- next part --------------
Index: mythtv/libs/libmyth/httpcomms.cpp
===================================================================
RCS file: /var/lib/mythcvs/mythtv/libs/libmyth/httpcomms.cpp,v
retrieving revision 1.4
diff -u -r1.4 httpcomms.cpp
--- mythtv/libs/libmyth/httpcomms.cpp	23 Oct 2003 19:57:17 -0000	1.4
+++ mythtv/libs/libmyth/httpcomms.cpp	17 Jan 2004 02:53:25 -0000
@@ -16,8 +16,8 @@
 {
     init(url);
     m_timer = new QTimer();
-    m_timer->start(timeoutms, TRUE);
     connect(m_timer, SIGNAL(timeout()), SLOT(timeout()));
+    m_timer->start(timeoutms, TRUE);
 }
 
 HttpComms::HttpComms(QUrl &url, QHttpRequestHeader &header)
@@ -63,6 +63,7 @@
     m_responseReason = "";
     m_timer = NULL;
     m_timeout = false;
+    m_url = url.toString();
 
     connect(http, SIGNAL(done(bool)), this, SLOT(done(bool)));
     connect(http, SIGNAL(stateChanged(int)), this, SLOT(stateChanged(int)));
@@ -85,8 +86,9 @@
 {
     if (error)
     {
-       cout << "MythVideo: NetworkOperation Error on Finish: "
-            << http->errorString() << ".\n";
+       cerr << "HttpComms::done() - NetworkOperation Error on Finish: "
+            << http->errorString() << " (" << error << ": url: " 
+            << m_url.latin1() << ")" << endl;
     }
     else if (http->bytesAvailable())
         m_data = QString(http->readAll());
@@ -113,7 +115,7 @@
             case QHttp::Reading: cerr << "reading\n"; break;
             case QHttp::Connected: cerr << "connected\n"; break;
             case QHttp::Closing: cerr << "closing\n"; break;
-            default: break;
+            default: cerr << "unknown state: " << state << endl; break;
         }
     }
 }
@@ -143,6 +145,7 @@
 
 void HttpComms::timeout() 
 {
+   cerr << "HttpComms::Timeout for url: " << m_url.latin1() << endl;
    m_timeout = true;
    m_done = true;
 }
@@ -158,10 +161,18 @@
     QString res = "";
     HttpComms *httpGrabber = NULL; 
     int m_debug = 0;
+    QString hostname = "";
 
     while (1) 
     {
         QUrl qurl(url);
+        if (hostname == "")
+           hostname = qurl.host();  // hold onto original host
+        if (!qurl.hasHost())        // can occur on redirects to partial paths
+           qurl.setHost(hostname);
+        if (m_debug > 0)
+           cerr << "getHttp: grabbing: " << qurl.toString() << endl;
+
         if (httpGrabber != NULL)
             delete httpGrabber; 
         httpGrabber = new HttpComms(qurl, timeoutMS);
@@ -197,13 +208,12 @@
         // Check for redirection
         if (!httpGrabber->getRedirectedURL().isEmpty()) 
         {
+            if (m_debug > 0) 
+                cerr << "redirection:" 
+                     << httpGrabber->getRedirectedURL().latin1() << " count:" 
+                     << redirectCount << " max:" << maxRedirects << endl;
             if (redirectCount++ < maxRedirects)
-            {
                 url = httpGrabber->getRedirectedURL();
-                if (m_debug > 0)
-                    cerr << "redirect " << redirectCount << "/" << maxRedirects
-                         << " to url:" << url.latin1() << endl;
-            }
 
             // Try again
             timeoutCount = 0;
@@ -219,6 +229,8 @@
     if (m_debug > 1)
         cerr << "Got " << res.length() << " bytes from url: '" 
              << url.latin1() << "'" << endl;
+    if (m_debug > 2)
+        cerr << res;
 
     return res;
 }
Index: mythtv/libs/libmyth/httpcomms.h
===================================================================
RCS file: /var/lib/mythcvs/mythtv/libs/libmyth/httpcomms.h,v
retrieving revision 1.3
diff -u -r1.3 httpcomms.h
--- mythtv/libs/libmyth/httpcomms.h	23 Oct 2003 19:57:17 -0000	1.3
+++ mythtv/libs/libmyth/httpcomms.h	17 Jan 2004 02:53:25 -0000
@@ -50,6 +50,7 @@
     QHttp *http;
     bool m_done;
     QString m_data;
+    QString m_url;
     QTimer* m_timer;
     bool m_timeout;
     int  m_debug;