Some Israeli news sites object too loudly to being included in Google News

This is actually a far too common problem with news sites around the world. They see their pages being included in a search engine or a news portal as someone stealing their content, instead of as someone helping them get more readers.

And now a few of the big Israeli news sites are joining the fray, making a lot more noise about it than they need to.

The letter went on to say that collating news items from leading sites in Israel crossed boundaries. “All over the world, the issue of copyright infringement is gaining momentum, with an emphasis on the Internet. We believe there is no place to injure original Israeli content, which, to the contrary, should be encouraged. I am confident that the other leading sites in Israel will not lend a hand to injury of their property and will demand that Google refrain from using their content.”

Dimwits. Being included in a search engine index, or in a popular news aggregation service, doesn’t injure anyone’s property. It doesn’t hurt anyone. On the contrary, it helps them.

They don’t want their content copied, because they want readers to come to their own sites to read it. That’s fine. But that’s exactly what will happen. Sites like Google News don’t usually show full stories anyway; they show headlines and short briefs. Anyone who wants to read the story has to go to the site that published it.

And a lot more people go to places like Google News and Yahoo News than directly to individual news sites. And for a very good reason.

If someone is searching for a story, or for coverage of an issue, they don’t know in advance which paper covered it best, or whether any did. So option one is to go to one news site and search there, then to another, then to a third, and so on, until something good enough turns up or the searcher gets tired.

But a site that lets you search for the story across several papers at once, and shows enough of each version to decide which is the most interesting or relevant, or even lets you open all of them directly, is a much more appealing destination.

True, if the story is bad, people won’t go read it. But a paper that believes it’s in the business of writing bad stories probably can’t expect many readers to come to it directly anyway.

By being excluded from the index, a newspaper only ensures that fewer people come to it, because it will get only the readers who were looking for it specifically in the first place. Everyone else won’t find it, won’t stumble upon its stories, and won’t discover that it covered the issues. That attitude doesn’t make much sense.

Besides, search engines have been covering some of these Israeli news sites for years now. You can already run a search on a general search engine, in Hebrew, and get news results. Not from all of them (some Israeli sites have stayed out for a long time now), but from the rest.

So this new outburst is about the localized Hebrew version of Google News, which is about to launch. But it isn’t all that different from what was available before, beyond offering a page dedicated to news in Hebrew. It does make a more obvious entry point for people looking for news, but it won’t index any content that Google wasn’t already indexing.

The way these protests were made is also telling. There’s a very simple way to ask civilized search engines not to include your pages in their index, and all the big players, Google included, are civilized in this way: put a robots.txt file at the root of the site, and in it exclude either all web crawlers or only the ones you specifically object to.

Most crawlers, the kind search engines use to go over sites and index them, look for this file and check which parts of the site they’re asked not to index. It’s very simple to do, and it works.
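To make the mechanism concrete, here is a minimal sketch of the check a crawler performs, using Python’s standard `urllib.robotparser` module. The robots.txt content, the crawler name `SomeOtherBot`, the helper `may_crawl`, and the example.com URL are all hypothetical, chosen only to show a file that shuts out one named crawler while admitting everyone else. Real search engines implement this check in their own code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration: shut out one named crawler
# ("Googlebot") entirely, and let every other crawler in everywhere.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /

User-agent: *
Disallow:
"""


def may_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines; a live crawler would instead call
    # set_url(".../robots.txt") and then read() to download the file.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/story.html"  # hypothetical page
    for agent in ("Googlebot", "SomeOtherBot"):
        print(agent, may_crawl(ROBOTS_TXT, agent, url))
```

A crawler that gets `False` here skips the page; one that gets `True` goes ahead and fetches it.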

Feder said that the Ynet site manager, Yacov Netzer, had written to Google Israel manager Meir Brand asking that the site refrain from using Ynet content.

One of the news sites mentioned in the article, Ynet, already has that file:

User-agent: *
Disallow:

This file explicitly says that all crawlers and web robots may access every page of the site. Come index us, they say. It’s right there.

All they need to do is add a single character:

User-agent: *
Disallow: /

This version is very different: it blocks the entire site, news section included, for all crawlers, which means their content will not appear on Google News. It’s that simple. Better yet, the file is nearly done already; they only have to add that slash. They don’t have to appeal to Google directly, and their manager doesn’t need to waste his time writing to the manager of Google Israel. There’s no point to it. They’re making the wrong choice, but once they make it, Google will indeed refrain from using their content. Problem solved before it started.
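The difference that one slash makes can be checked with Python’s standard `urllib.robotparser`. This is a sketch: the helper name `allowed`, the variable names, and the example.com article URL are hypothetical, and the empty rule list only approximates what a crawler does when a site has no robots.txt at all.

```python
from urllib.robotparser import RobotFileParser

# The two files from the text: Ynet's current one, and the same file
# with a single "/" added after "Disallow:".
YNET_NOW = ["User-agent: *", "Disallow:"]
WITH_SLASH = ["User-agent: *", "Disallow: /"]


def allowed(lines: list[str], url: str, agent: str = "Googlebot") -> bool:
    """Return True if a crawler named `agent` may fetch `url` under `lines`."""
    parser = RobotFileParser()
    parser.parse(lines)
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "https://example.com/news/some-article.html"  # hypothetical page
    print("Disallow:   ->", allowed(YNET_NOW, url))    # empty value: allow all
    print("Disallow: / ->", allowed(WITH_SLASH, url))  # "/" matches every path
    # No rules at all, roughly a site without robots.txt: also allows all.
    print("no rules    ->", allowed([], url))
```

An empty `Disallow:` value disallows nothing, while `/` is a prefix of every path on the site, so it disallows everything.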

The letter on Walla!’s behalf was sent by the prestigious law firm of Herzog, Fox & Neeman. The letter said that as Google knew, articles appearing on the Walla! Web site were Walla!’s exclusive, copyright-protected property. “Therefore, unauthorized use that your company is making of these items on its Web site constitutes a grave infringement on my client’s property rights, by infringing copyright,” the letter said.

And that letter from a law firm on behalf of Walla!, what about it? I bet it took a lot of time and money to draft and send. Lawyers charge for consultations, for their work, for writing letters. Getting the site designer to write a robots.txt file would have been much simpler, much cheaper, and much quicker. As of right now, however, the Walla and Walla! News sites don’t have one.

On today’s Internet, not having a robots.txt file amounts to saying “Please come, index me, and let people search my content, thank you” to the entire world, only implicitly rather than explicitly, as Ynet currently does. So Walla! is saying exactly that, while at the same time paying its lawyers to talk to Google’s lawyers.

Brilliant. Just brilliant. And we’re supposed to trust these guys as our news source. Newsflash, people: robots.txt is an old, old standard by now. And Google also respects the newer robots meta tags, which do the same thing for individual pages and can be embedded in each one (Walla!’s pages don’t have them).
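The per-page alternative is the robots meta tag, for example `<meta name="robots" content="noindex">` in a page’s `<head>`. The sketch below uses Python’s standard `html.parser` to show roughly how a crawler could spot that directive; the class and function names are hypothetical, only the generic `robots` name is handled, and this is not Google’s actual implementation.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None (e.g. a bare attribute), so default to "".
        attributes = {name: (value or "") for name, value in attrs}
        # Crawler-specific meta names are deliberately ignored in this sketch.
        if attributes.get("name", "").lower() == "robots":
            for directive in attributes.get("content", "").split(","):
                self.directives.add(directive.strip().lower())


def page_may_be_indexed(html: str) -> bool:
    """Return False if the page carries a robots meta tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return "noindex" not in finder.directives


if __name__ == "__main__":
    blocked = '<html><head><meta name="robots" content="noindex, nofollow"></head></html>'
    open_page = "<html><head><title>Story</title></head></html>"
    print(page_may_be_indexed(blocked))    # False
    print(page_may_be_indexed(open_page))  # True
```

Unlike robots.txt, which a crawler reads once per site, this check happens page by page, which is what lets a site opt out selectively.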

They’re all so clueless that it’s quite staggering…
