User talk:Dekarl

From MythTV Official Wiki

What do you mean by "as its violating site TOSs"? -- SilentButeo2 12:24, 8 January 2014 (UTC)

That grabber appears to collect information from web sites in a way that goes against the wishes of the site owners. EPG/metadata sites usually have something like "no automated scraping" in their Terms of Service (TOS).
For example, it appears to support scraping the site that is discussed at http://www.bucksch.org/1/projects/various/xmltv/
It also appears to access IMDB in a way that violates http://www.imdb.com/help/show_article?conditions -- Dekarl 12:58, 8 January 2014 (UTC)

Enable WebGrab+Plus again

Please, could you enable WebGrab+Plus again?

I ask this for the following reasons:

1. It satisfies the site owners' TOS by implementing "the ROBOTS exclusion standard".
2. When a site owner complains about WG++ grabbing their site, the site is removed at the first request.
3. IMDB is not accessed by WG++ by default. It is only an option that the user can enable. Work is also in progress to satisfy the IMDB conditions.

I don't know which site you are referring to with:

"For example it appears to support scraping the site that is talked about on http://www.bucksch.org/1/projects/various/xmltv/"

reply

We do not want to promote legally questionable behaviour and tools.

  • You have got it the wrong way around. Manually programming a scraper that violates the terms of service, as they are written on the web site in plain language, is not OK. That has nothing to do with a robots.txt aimed at generic web crawlers. I also don't understand "Conforms to the ROBOTS exclusion standard through a screen warning." at http://webgrabplus.com/features. It sounds like greenwashing to me.
  • I picked tvmovie.de at random, as it is documented at bucksch.org that they don't want a grabber. Their current ToS contains strange legalese instead of a plain "do not scrape", though.
  • The site listed before them at http://www.webgrabplus.com/epg-channels, tvinfo.de, states very clearly "(3) Zugriffe/..., die auf tvinfo.de befindliche Inhalte/... extern speichern, ... (insbesondere durch automatisierte Scripte/Programme) sind untersagt." at http://www.tvinfo.de/agb (which translates to "requests which save content from tvinfo.de externally, especially by automated programs or scripts, are forbidden.")
  • If there is work in progress to comply with IMDB's rules, that sounds as if the current implementation violates the IMDB conditions and merely has to be enabled manually. Just saying "but you have to push an extra button!" is not enough to greenwash the misbehaviour.

There are companies and communities that are fine with open source and free (as in beer) data. May I suggest working together with the people who want to be part of a solution? For example, the first response to "what direction should webgrab+plus take" at http://webgrabplus.com/content/poll-what-should-wg-developers-focus is "add support for Atlas". Atlas is run by exactly such people. As Atlas is a middleware/service, someone could help guide-creation communities (like OZTivo / Kazer / TV-Browser / etc.) make their guides available through this service (I'm looking into that myself), and help generalize the toolsets of these communities so that it becomes easier for them to provide guide data themselves. The Atlas middleware also supports merging multiple data sets, e.g. a multilingual TV content database with a single-language TV guide (the poll shows six votes for a translating post-processor).