It's now over a month since Google released its open source web browser, Chrome. An interesting theory we heard recently is that Google will use Chrome to index the password protected Web - a.k.a. the 'dark web'.
Right now the Chrome Terms of Service (TOS) prevents Google from indexing private data. But when you consider that Chrome was initially presented as a browser for applications, instead of just web pages, this theory begins to make more sense.
Most web apps are password-protected and so there's no way for a normal search engine to index the data - even data that's generalized and doesn't identify individual users. But with a full-fledged browser to complement its search engine, Google now theoretically has the means to index this previously inaccessible data.
So is Google planning to use Chrome in the future to index password protected data on the Web? This needn't be a sinister question to ask, because the Web has evolved into something that is not easily indexed. Neil McAllister wrote a great article back in July entitled Is the Web still the Web? (via Slashdot), that delved into this subject. Neil wrote:
"Is it still the Web if it's not really hypertext? Is it still the Web if you can't navigate directly to specific content? Is it still the Web if the content can't be indexed and searched? Is it still the Web if you can only view the application on certain clients or devices? Is it still the Web if you can't view source?"
As he also pointed out, RIA Flash and Silverlight content can now be searched - see our own writeup of this in July.
So the next step is to be able to search and index web applications that rely on user-generated content. Chrome is the perfect vehicle to do that. There would have to be a change in the TOS to allow it, because indexing private data is of course still a no-no among search engines - especially the market leader Google. And there would be a big privacy issue with indexing your personal browsing history. But what if Google could convince users of the value of indexing web app data without identifying the individual user...
What do you think of this theory - too far out? Remember that Chrome has already become by most accounts the 4th leading browser, after IE, Firefox and Safari. It's already usurped Opera and it's only 1 month old, still in beta and there's no Mac version. In ReadWriteWeb's stats for September, Chrome was used by 6.3% of our readers - not bad when you consider we have a higher proportion of Mac users than mainstream sites.
When Chrome is 2nd or 3rd in the browser market, then it may be in a position to start implementing some grand plans - like indexing password protected data. Let us know if this is too crazy, or if you can foresee a socially acceptable use case for this scenario.
Update: Chris Messina notes that Flock already does this:
"Flock already DOES index every page you visit with Lucene and keeps the data in an offline cache. I could imagine that if I were to want to use Flock on another computer, I wouldn't want to limit my search result to only what I visited on THAT machine -- I'd want to pull from my entire browsing history.
We simply need protections to enable this kind of circumstance to be offered safely -- or at least with minimized risk."
Comments
you are crazy.
why would i want search engines to index password-protected data?
how the web is not the web if you can't reach some areas? (there's always redundant connections in any network, esp of this sort)
is this article some kind of tryout by google to see what the core web community thinks about their plans? or you just had this idea but were afraid to be negative about it because google would lower rww in their search results?
Posted by: deemeetree | October 5, 2008 2:32 PM
strange coincidence. I'm currently working on a firefox extension that crawls password protected websites. But strictly for private purposes.
Posted by: Jo | October 5, 2008 2:40 PM
Sounds like spyware to me.
Even if Chrome convinces users to let it index their password protected data, I doubt they'd be so belligerent to website owners in providing what would essentially be a distributed leeching network for private data.
Then again...
Posted by: Hamish | October 5, 2008 2:41 PM
deemeetree, I do realise it is a controversial suggestion and I fully appreciate the privacy implications. But I think it's worth asking the question: what if Google could index password protected data with promises / guarantees that no personally identifiable data will be indexed. I admit I'm not entirely sure what the use case(s) would be, but it would have to be valuable data and certainly would extend their search index.
But this post was an open question, that's why I didn't focus on the negative bits (privacy etc). I think this is entirely viable, although I don't quite know why at this point...
Posted by: Richard MacManus | October 5, 2008 2:45 PM
Of course this is already happening, but it doesn't require Chrome. Consider Google Desktop or Gears. It's very challenging to think through exactly what a document that I create online in Google Docs is when it's synced to my desktop and stored via Google Gears, and then I search for it within the Google Docs web application and it retrieves the document from the local storage.
I understand that you're talking about a slightly different use case, but the reality is that people are going to want to search for documents, media and web pages that they've seen. To the extent that the distinction between the web and local storage gets in their way in pursuing a document, they will likely prefer success over inhibition due to online/offline data indexing policies.
Flock already DOES index every page you visit with Lucene and keeps the data in an offline cache. I could imagine that if I were to want to use Flock on another computer, I wouldn't want to limit my search result to only what I visited on THAT machine -- I'd want to pull from my entire browsing history.
We simply need protections to enable this kind of circumstance to be offered safely -- or at least with minimized risk. The way that you phrased this situation makes it sound increasingly nefarious, but if you actually consider the use case, from a *personal* service provider's perspective, having access to your browsing history can help make their service vastly more useful, and that's true for anyone, beyond Google.
Posted by: factoryjoe.com | October 5, 2008 2:49 PM
I don't think this would fly. There would be too many organizations worried about indexing their intranets and sensitive data. Once Chrome started getting banned from a few private sites, the browser would fall out of favor from its users. Bans from certain private websites would be inevitable in this situation, regardless of the safeguards put in place.
Posted by: Nick Molnar | October 5, 2008 2:53 PM
Chris, excellent point and I'm glad you jumped in. I've updated the post to note the Flock example.
Posted by: Richard MacManus | October 5, 2008 2:58 PM
Hi Richard, controversial but I wonder why would the search engine want to crawl password-protected sites? They are password-protected, so the visitors won't be able to reach them. Maybe for some evil reasons? But even in this case, I don't think anyone can afford the social implications and the costs of that.
However, as a site owner you may want your password-protected pages to be crawled, and there are ways of doing that. But as far as I know, Google even penalizes such sites, because it leads to a poor search experience for its users.
Posted by: Emre Sokullu | October 5, 2008 3:02 PM
Richard,
Crawling engines do not take the detour of tunneling through client machines to index site content. This is a misunderstanding of how crawling and indexing works.
Crawling is a server to server business, and there are plenty of engines that poke form values and yes, some types of pre-authorized passwords or administrator permitted access.
There is no need to use Chrome, or any other client, to get behind a password protected gateway. There are robots for cracking passwords and engines for getting by 99% of captchas.
You need some schoolin' buddy. Actually, the act of tunneling through client browsers has a name - Malware.
Posted by: Alan Wilensky | October 5, 2008 3:16 PM
This is nonsense for two reasons:
1. What's the use of showing data in the SERPs that isn't accessible?
They already created ways to allow their Mediabot (for AdSense) to get behind registration walls, so it's absolutely possible from a technical standpoint.
2. Web startups that could see their data in SERPs will do everything they can to make that happen, because of the resulting traffic.
Even Facebook created public search results for the registered users, and I expect them to do the same with stuff like events.
Posted by: Sebastian | October 5, 2008 3:29 PM
Interesting thought. I even think this could be a good thing, if there is user control over which data is indexed and what happens to it. It could be a way of liberating data stuck behind login walls and exporting it as an RSS feed. That'd open up Facebook. Of course it could also be of use to spammers browsing profiles. Then again, that's already happening.
Posted by: Jonas | October 5, 2008 3:57 PM
Alan, thanks for your kindly condescending comment. But I'm not at all suggesting "tunneling through client browsers". What I was driving at is that Chrome's browser history could be used in some way to expand Google's search index. Note that Google may not even present that data to users of its search engine. It may just collect all the data, then use it to enhance the PageRank algorithm. In other words, perhaps Google uses Chrome user data to expand its index, but that data is still not accessible to end users; in aggregate, however, it is used to enhance search results.
The other option is the one Chris referred to, where users can actually search on their own data - it's still password protected, but it's more easily found across machines and devices.
Posted by: Richard MacManus | October 5, 2008 4:07 PM
I don't think so. Why would Google use Chrome to index password-protected content when Google Desktop can already do the same, whether you are using Chrome or Firefox or IE?
Also, the idea of the web no longer becoming the web because of fancy sites and password protected areas makes no sense. Just because there are lots of new services allowing people to store their personal information in the cloud doesn't mean the end to public information.
As long as there is public information and products to sell and sites that want to make money off of published content, the regular search engines will do just fine.
So, Google clearly didn't create Chrome to supposedly fix an "indexing issue." The reality is, Google made Chrome to make more powerful web applications possible. If you have seen the JavaScript benchmarks, you will see that Chrome runs JavaScript an order of magnitude faster and more efficiently than any other browser. Also, the design of the browser allows it to more flexibly blur the line between a desktop application and a web application.
By making this push, even if Chrome does not become a dominant browser, all of the other browsers are likely going to play catch-up. This means more and more powerful web browsers. As a result, Google can commence making more and more powerful web applications that require more and more power from the browser.
In their push to possibly allow people to unlock the data in password protected sites and, perhaps, move this data into Google's services, this will likely be accomplished by individual applications on a per-service basis, or will be accomplished by Google Desktop which is already understood to be designed to index private data. Google Chrome is not an indexer... it is merely meant to process and display applications. The fact that it stores this data in an index is merely a by-product of previous browsers and the fact that most people expect to be able to browse their viewing history.
Once Chrome has been expanded, it will likely be plugged into Google's cloud more and more (Google Bookmarks, Google Web History, etc...) and then Google Desktop can be combined to take care of local data and password protected data (as long as the user has opted into this)... so that wherever you are, you will be able to access public data as well as your own personal data. When the personal data was data which originated in a Google service, this will of course be utilized in Gears which was specifically designed for Google's services or any service that chooses to use it.
Posted by: StareClips.com | October 5, 2008 5:49 PM
Hasn't Google Web Accelerator (GWA) been suspected of this, too?
Posted by: christefano | October 5, 2008 6:12 PM
It's a little bit (actually, quite a lot) looney. Most user generated content-based sites allow you to read their pages regardless of whether you're logged in or not. I'm having a hard time thinking of a site that provides a considerable amount of (free) content and that is only accessible to registered users. Those that do, do it to guard people's privacy (see Facebook - but even they have profile bits google can index).
Also, content that is only available to registered (and logged in) users is likely to be personalised. In this case, what would google show - whatever, say, Last.fm provided as MY recommendations? How would they link to such pages?
And then there are the privacy issues. Would they be indexing my internet banking? Private health data? Private communication? Nope, this will never, ever happen.
The "hidden" web is hidden for a reason.
Posted by: Vlad B | October 5, 2008 6:52 PM
Please just leave MY machine & data alone!!!! I'll take care of it MYSELF, thank you!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Besides, don't you think there just MAY BE A REASON there is a password on there??????? GET REAL!!!!!!!!!!
Posted by: Buford Jones | October 5, 2008 6:54 PM
Indexing password protected sites could violate many TOS agreements for other services, not to mention it could be a breach of company's NDA policies. Chrome has huge potential, but I don't want every URL I type in to be sent to google.
The web is becoming a monster data mining project for corporations like Google and Governments to track people and information. Control, regulate and archive the information, you can control the people.
http://www.hackerforums.org
Posted by: UNiHacker | October 5, 2008 8:05 PM
Automatic TrackBacking doesn't seem to work here. So I'm doing it manually: Beyond Web Standards.
RWW Webmaster, please erase this comment if the TrackBack starts working.
Posted by: Ziv Levin | October 6, 2008 1:25 AM
Non-issue. Chrome is open source. Duh.
Posted by: J. Adam Moore | October 6, 2008 4:47 AM
I hope this is not true, because I have accessed password protected pages using Chrome.
Posted by: Raj S | October 6, 2008 5:09 AM
Well you don't have to hypothesize whether google is doing that or not since the source code for Chrome is GPL and freely available - check it out and stop spreading rumors.
Paul/
Posted by: Paul | October 6, 2008 5:22 AM
The reason people are responding strongly and negatively to this article (and rightfully so) has little to do with how they feel about Google but everything to do with the fact that Google Chrome is open source and GPL, and there is absolutely no way you can do this, with or without users' consent, without the world having to know about it.
Even for a speculation this is absolutely absurd.
Posted by: pavs | October 6, 2008 5:32 AM
Wow, that is scary. I am telling you dude, one day Google is going to rule the world! Just watch and see.
Jiff
www.privacy-center.ru.tc
Posted by: Jim Jensen | October 6, 2008 5:33 AM
this made me laugh. you funny little people with such a big and still developing paranoia.
yeah, sure. google = the beast, sure.
Posted by: savocado | October 6, 2008 5:40 AM
deemeetree is absolutely right.
Suggesting that google could or would index private data is ludicrous.
1. It would be highly illegal, in the UK at least.
2. you could not present it to others (password required)
3. not even possible (the index would only contain pages visited)
4. protected sites are often applications, where the data is user driven (rarely the same, so worthless)
Posted by: timbo | October 6, 2008 5:46 AM
This wouldn't make sense at all.
Why would they want to index it if people couldn't access it when searching Google?
Posted by: Shane | October 6, 2008 5:54 AM
Google has had issues like this many times. If you remember, the first few editions of Google had to fight security issues too! They will figure it out and make it better, I have no doubts.
Jesse W
http://churchofcowherd.wordpress.com/
Posted by: Jesse Wojdylo | October 6, 2008 6:12 AM
Not going to happen. If webmasters want to reveal their content to the SERPs then Google provides webmaster tools for doing just that, but it would be of limited use to Google to index the deep web without some sort of affiliate agreement with the sites. If Google could index the deep web, and say 3 of the top 10 results were pages that required me to pay for an account to view a page, then 3 of the top 10 results are as useless as spam as far as I'm concerned.
I would ditch Google for a search engine which indexes content I can access for free and without having to sign up. There may be some specialist uses for searching behind a login (scientific journals, porn sites, etc) but these are already covered by specialist search engines.
What Google needs is better spam filtering, not more - essentially useless - results for inaccessible pages.
Posted by: strix | October 6, 2008 6:24 AM
Google does already index pages you visit. I use firefox all day, every day and some subfolders that I use for a development platform get indexed quite quickly. One previous project got 10 pages indexed in 5 days ... not a great feat but not terrible either seeing there were 0 links pointing to these subfolders on the domain or anywhere else on the web.
Yes, I can see google doing this
Posted by: cash index | October 6, 2008 6:27 AM
IE8 and Chrome are the first browsers to run each tab in a separate process. This multi-process design makes these two browsers superior platforms for applications (I'm taking rendering engines out of the argument; that's a different ball of wax).
As a developer, when I read "Chrome was initially presented as a browser for applications", I translate that to "it's a stable platform to run my web app"; not "it's going to index my web app someday".
Posted by: Jeremy | October 6, 2008 6:49 AM
Erm... source code snippet for proof?
Blatantly stupid article. Richard MacManus needs to go back to journalism school.
Posted by: john | October 6, 2008 6:50 AM
The answer is no.
Posted by: zeeol | October 6, 2008 7:07 AM
Or Google is more likely to use it to get data to better serve adwords, their cashcow
e.g.
http://seosnafu.blogspot.com/2008/09/google-chrome-is-it-spyware.html
Posted by: eric | October 6, 2008 7:09 AM
We have to be careful in distinguishing between private and personalized content. Both are information accessible only post-login, but private content should be indexed and searched privately (not necessarily locally), whereas personalized content should always reveal a URI "stub" that can be indexed publicly.
Posted by: Q dub | October 6, 2008 8:49 AM
Actually, to the people who are writing this article off, we've already had a problem with this where I work. I work at a company that deals with some big name clients and we have several webapps that are locked down with every security method, in authentication, coding, etc, but all of that was circumvented when Google toolbar cached a client's instance of one of our programs and all his company's info was available in the google search cache. We've also had problems with register_globals in php being exploited by google "logging in" with username and password fields as url parameters.
To say that Google isn't going after people's private info is ludicrous, as that's how they make money. It used to be freaky that you could see the top of your neighbor's house (and in sun lights)... Then we could see inside their front windows. 3D scans of the planet's surface and objects next? A phonebook to address to google maps to facebook to picture connection? You'd be able to see exactly who lives in each house when perusing google earth. Scary thing is - everything is in place for it to be done.
Posted by: Dan | October 6, 2008 9:02 AM
google can pretty much do anything they want right now because we are too dependent on them.
Posted by: web hustler | October 6, 2008 9:46 AM
I don't think so...
Google cannot do that ...
Posted by: mnvamsi | October 6, 2008 10:51 AM
Actually, I think that Google Chrome will be adapted to a robot that will shop for you at the grocery store. Think I'm joking? Just do a search for GOOGLE GROCERY STORE ROBOT. When they are ready to take over the world, these robots will poison our food and will begin attacking. When we try to fight back, we will use email and IM and phones to try to gather ourselves together, but Google will already own it all... Gmail, Google Talk, and GrandCentral. We will be powerless. And it all starts with Google Chrome.
Honestly, this is about as silly as the original article this is a reply to. Google Chrome is exactly what its name implies. It is "chrome" for web applications. Its minimalistic design is meant to make web applications more like applications and less like websites. That is all. If Google wants to do something else like crawl password-protected sites, they'll release software made just for this, or use existing technology which is closer to doing this (like Google Desktop.) They won't try to "slip it by" in a piece of open source software.
On a side note, to those who are saying that Chrome is "open source"... I'm pretty sure Chrome is closed source. Chromium, which Chrome is based on, IS open source, however.
Posted by: StareClips.com | October 6, 2008 10:55 AM
Richard, I'm sorry, but I see no use in Google browsing password-protected data. Absolutely no use.
The only reason when it could happen is when a publisher (for example some magazine, which has online archives) allows them to index their archive, but it could be done through their Webmaster interface.
Introducing this feature in a browser is a disaster. It should be the content publisher deciding whether their content should be viewable to others, not the readers. I think in the end it will just result in some sites blocking Chrome from viewing them. Well, good thing it was just your fantasy and is not really implemented (yet?)
Posted by: deemeetree | October 6, 2008 12:50 PM
"Richard, I'm sorry, but I see no use in Google browsing password-protected data. Absolutely no use."
Obviously you don't think with a wide mind. Private data is required more than public data, not for a common user, but yes for hidden interests.
Posted by: John | October 6, 2008 2:46 PM
Sorry but no way... this would be the biggest security threat the world ever faced. People's private data secured behind usernames and passwords should never be indexed - it basically would mean that any of that data's security would be circumvented, and it would render usernames / passwords and "private" data compromised.
I think you are simply scare mongering and trying to get more hits on your own website here - and it's working!
Any web browser that records data or access details to private data should have its offices and staff be bombed by the nearest available fighter jet.
What you are describing is a kind of cyber-terrorism. No one for any good reasons would ever want that data or dare to store it.
Scenario:
Mr X works for a nuclear facility and uses Google Chrome to login to a web based email client and a secure site run by his company "nuclear power international". Then Mr Y does a Google search and can see all of the operational information of a nuclear facility (albeit without "personal information").
Crazy - stupid nonsense... completely 100% infeasible...
Unless Osama Bin Laden has a major share in Google this is total end of the world science fiction...
Anyway well done on driving up the traffic to your website LOL.
Posted by: Dan | October 7, 2008 5:04 AM
wow! i like the way of your paranoid thinking!
sure indexing password protected sites is very interesting for google! they can make a lot of cash with it! of course they would not add those sites to the public search index. but for their own use like insider know-how for stock exchange things and so on. and of course interesting for federal bureaus :)
Posted by: tom | October 7, 2008 7:50 AM
You have created doubt and fear in web users, and this article can harm Chrome's spread on the net. Many may think Google is already so powerful that using its browser could really be harmful to them!
Posted by: Satya Prakash Karan | October 8, 2008 10:12 AM
Google Chrome is open source, no? So why do you bother about it? Just do another chrome, minus the password protected crawler?
Posted by: Julien | October 8, 2008 6:13 PM
It is a good article all right, but it's a bit too long
Posted by: px234 | October 9, 2008 7:06 PM
This is great news, information should always be free and available to the public.
Posted by: Lebat | October 10, 2008 3:56 PM
My opinion is Google, aside from being a money making industry, is benign. They will squeeze money if they can but not at the risk of alienating their users
Posted by: Mike | October 14, 2008 12:14 PM