Trouble In the House of Google
Let's look at where stackoverflow.com traffic came from for the year of 2010.
When 88.2% of all traffic for your website comes from a single source, criticizing that single source feels … risky. And perhaps a bit churlish, like looking a gift horse in the mouth, or saying something derogatory in public about your Valued Business Partner™.
Still, looking at the statistics, it's hard to avoid the obvious conclusion. I've been told many times that Google isn't a monopoly, but they apparently play one on the internet. You are perfectly free to switch to whichever non-viable alternative web search engine you want at any time. Just breathe in that sweet freedom, folks.
Sarcasm aside, I greatly admire Google. My goal is not to be acquired, because I'm in this thing for the long haul – but if I had to pick a company to be acquired by, it would probably be Google. I feel their emphasis on the information graph over the social graph aligns more closely with our mission than almost any other potential suitor I can think of. Anyway, we've been perfectly happy with Google as our de facto traffic sugar daddy since the beginning. But last year, something strange happened: the content syndicators began to regularly outrank us in Google for our own content.
Syndicating our content is not a problem. In fact, it's encouraged. It would be deeply unfair of us to assert ownership over the content so generously contributed to our sites and create an underclass of digital sharecroppers. Anything posted to Stack Overflow, or any Stack Exchange Network site for that matter, is licensed back to the community in perpetuity under Creative Commons cc-by-sa. The community owns their contributions. We want the whole world to teach each other and learn from the questions and answers posted on our sites. Remix, reuse, share – and teach your peers! That's our mission. That's why I get up in the morning.
However, implicit in this strategy was the assumption that we, as the canonical source for the original questions and answers, would always rank first. Consider Wikipedia – when was the last time you clicked through to a page that was nothing more than a legally copied, properly attributed Wikipedia entry encrusted in advertisements? Never, right? But it is in theory a completely valid, albeit dumb, business model. That's why Joel Spolsky and I were confident in sharing content back to the community with almost no reservations – because Google mercilessly penalizes sites that attempt to game the system by unfairly profiting on copied content. Remixing and reusing is fine, but mass-producing cheap copies encrusted with ads … isn't.
I think of this as common sense, but it's also spelled out explicitly in Google's webmaster content guidelines.
However, some webmasters attempt to improve their page's ranking and attract visitors by creating pages with many words but little or no authentic content. Google will take action against domains that try to rank more highly by just showing scraped or other auto-generated pages that don't add any value to users. Examples include:

Scraped content. Some webmasters make use of content taken from other, more reputable sites on the assumption that increasing the volume of web pages with random, irrelevant content is a good long-term strategy. Purely scraped content, even from high-quality sources, may not provide any added value to your users without additional useful services or content provided by your site. It's worthwhile to take the time to create original content that sets your site apart. This will keep your visitors coming back and will provide useful search results.
In 2010, our mailboxes suddenly started overflowing with complaints from users – complaints that they were doing perfectly reasonable Google searches, and ending up on scraper sites that mirrored Stack Overflow content with added advertisements. Even worse, in some cases, the original Stack Overflow question was nowhere to be found in the search results! That's particularly odd because our attribution terms require linking directly back to us, the canonical source for the question, without nofollow. Google, in indexing the scraped page, cannot avoid seeing that the scraped page links back to the canonical source. This culminated in, of all things, a special browser plug-in that redirects to Stack Overflow from the ripoff sites. How totally depressing. Joel and I thought this was impossible. And I felt like I had personally failed all of you.
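The attribution requirement is mechanical enough that a crawler can verify it. As a rough illustration only (this is not Stack Overflow's or Google's actual code, and the canonical URL is hypothetical), a Python sketch of the check might look like this:

```python
# Illustrative sketch only: given the HTML of a page that republishes a question,
# check whether it carries the attribution the cc-by-sa terms ask for - a direct
# link back to the canonical question URL, without rel="nofollow".
from html.parser import HTMLParser

class AttributionChecker(HTMLParser):
    def __init__(self, canonical_url):
        super().__init__()
        self.canonical_url = canonical_url
        self.attributed = False

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href") or ""
        rel = (attrs.get("rel") or "").lower()
        # Attribution only counts if the link points at the original question
        # and is not marked rel="nofollow".
        if href.startswith(self.canonical_url) and "nofollow" not in rel:
            self.attributed = True

def links_back(scraped_html, canonical_url):
    checker = AttributionChecker(canonical_url)
    checker.feed(scraped_html)
    return checker.attributed

# Hypothetical scraped page: the link is present, but nofollow'ed.
page = '<p>Copied answer...</p><a rel="nofollow" href="http://stackoverflow.com/questions/12345">source</a>'
print(links_back(page, "http://stackoverflow.com/questions/12345"))  # False
```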
The idea that there could be something wrong with Google was inconceivable to me. Google is gravity on the web, an omnipresent constant; blaming Google would be like blaming gravity for my own clumsiness. It wasn't even an option. I started with the golden rule: it's always my fault. We did a ton of due diligence on webmasters.stackexchange.com to ensure we weren't doing anything overtly stupid, and uber-mensch Matt Cutts went out of his way to investigate the hand-vetted search examples contributed in response to my tweet asking for search terms where the scrapers dominated. Issues were found on both sides, and changes were made. Success!
Despite the semi-positive resolution, I was disturbed. If these dime-store scrapers were doing so well and generating so much traffic on the back of our content – how was the rest of the web faring? My enduring faith in the gravitational constant of Google had been shaken. Shaken to the very core.
Throughout my investigation I had nagging doubts that we were seeing serious cracks in the algorithmic search foundations of the house that Google built. But I was afraid to write an article about it for fear I'd be branded an incompetent kook. I wasn't comfortable sharing that opinion widely, because we might be doing something obviously wrong. Which we tend to do frequently and often. Gravity can't be wrong. We're just clumsy … right?
I can't help noticing that we're not the only site to have serious problems with Google search results in the last few months. In fact, the drum beat of deteriorating Google search quality has been practically deafening of late:
Anecdotally, my personal search results have also been noticeably worse lately. As part of Christmas shopping for my wife, I searched for "iPhone 4 case" in Google. I had to give up completely on the first two pages of search results as utterly useless, and searched Amazon instead.
People whose opinions I respect have all been echoing the same sentiment -- Google, the once essential tool, is somehow losing its edge. The spammers, scrapers, and SEO'ed-to-the-hilt content farms are winning.
Like any sane person, I'm rooting for Google in this battle, and I'd love nothing more than for Google to tweak a few algorithmic knobs and make this entire blog entry moot. Still, this is the first time since 2000 that I can recall Google search quality ever declining, and it has inspired some rather heretical thoughts in me -- are we seeing the first signs that algorithmic search has failed as a strategy? Is the next generation of search destined to be less algorithmic and more social?
It's a scary thing to even entertain, but maybe gravity really is broken.
Posted by Jeff Atwood
I'm pretty sure I would buy a stress ball shaped like your head.
William Stevenson on January 3, 2011 3:41 AMAnecdotally I had the same problem the other day. Horror of horrors Bing was better in the end!
Alexandronov on January 3, 2011 3:47 AMThis is a really interesting article and reflects what I have been feeling for a while, that the relentless and exponential rise in SEO activity would eventually start to affect the usefulness of Google.
In a weird way it feels analogous to when people say that Windows gets more virus attacks because it has a bigger audience than other operating systems, rather than because it is necessarily less secure (I'm not saying it isn't).
In other words, perhaps Bing has an advantage here as it is less targeted by SEO activity, the same way that non-Windows operating systems don't suffer the same level of virus attacks?
You’re confusing a model with reality: Google isn’t gravity, they’re trying to *model* it using their ranking algorithm. And as your experience clearly shows, it is currently modelling gravity (i.e. the “reality” of which sites are relevant and which aren’t) badly.
To stay in your metaphor, it’s clearly time for a paradigm shift, a kind of Einstein of web ranking algorithms. Sounds like a very interesting thesis topic. cstheory.SE, anyone?
Konrad on January 3, 2011 3:53 AMI recall that for a while mirror sites started edging out Wikipedia. Google apparently added a bonus for Wikipedia just to force it to the top over its clones. Is it reasonable to expect them to do that for everyone? I don't know.
Peter Da Silva on January 3, 2011 3:58 AMThis is exactly the reason why I supported webmasters.se so adamantly. Where else could someone go to get great expert help for problems like this?
I've often wondered about the case of Wikipedia, and how scrapers that simply syndicate it are quickly penalized. I think Google has _some_ sort of 'special intervention' when it comes to Wikipedia. I just hope they afford the same luxury to Stack Exchange.
Tinkertim on January 3, 2011 4:06 AMI have been frustrated with this in recent months also. A scraping website has recently copied the top posts from my blog verbatim without even attributing the original. Google somehow ranks the copies with a higher page rank, causing a significant drop in my stats.
The worst thing is that you are mostly powerless in this situation; who do you complain to, google?
Meekrosoft.wordpress.com on January 3, 2011 4:06 AMI was recently shown an application for blind testing search engines ( http://blindsearch.fejus.com/ ), and was surprised to find that Bing and Yahoo often delivered better results than Google.
Krisjoh on January 3, 2011 4:11 AMThere are search terms that do not list Wikipedia content on the first page, but scraped content as the first result, e.g. http://www.google.com/search?q=elvett+semic
I have tested it from Germany, if it makes any difference.
Residuum on January 3, 2011 4:13 AMIt is because, contrary to Wikipedia, people don't link back to Stack Exchange; they just want the info and close the tab, so the original content provider doesn't gain anything.
But we should try to find out why spammers are doing better than you. What are they doing better? You should ask them.
Hokkos on January 3, 2011 4:23 AMIn a techcrunch post (http://techcrunch.com/2011/01/01/why-we-desperately-need-a-new-and-better-google-2), Vivek Wadhwa said the same thing. Trouble over Google's head?
Tyseo on January 3, 2011 4:55 AMThis is a really interesting article. More and more recently these sites are taking priority, but I have only seen it with Stack Overflow related content. I always end up ignoring them and trying to find the Stack Overflow link, as ultimately the number of adverts shows you the content didn't originate on that site. But, as other people have said, it would be interesting to find out why these sites are taking priority.
Scsmith on January 3, 2011 4:56 AMWait - "broken" doesn't mean "broken beyond repair", let alone that somebody else would be able to provide a better fix.
At first sight this is merely a problem of Google neglecting to use a specific bit of information in their filtering, namely the information who copied content from who. It won't be possible to do this automatically with 100% accuracy, but I guess if you take all content from two webpages as prepared for indexing (that is, basically the plaintext conversion), compare them for similarity, and when found to be similar, downgrade the ranking of the one to appear most recently (which is not so very easy to determine) then you can mostly fix this problem.
The basic problem I see is that even with smart optimization techniques (e.g. limit comparisons to pages with similar sets of keywords) the similarity testing required probably won't scale. However there are ways around that too (e.g. don't calculate it at indexing time but spread the calculation over queries at querying time, to make ranking progressively smarter, but I have no idea if Google's infrastructure allows stuff like this).
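As a rough sketch of the similarity test described above (purely illustrative; the shingle size and threshold are arbitrary, and real systems use far more scalable techniques), the core comparison might look like this in Python:

```python
# Illustrative sketch of duplicate detection: reduce each page to plain text,
# compare word shingles with Jaccard similarity, and flag near-copies.
def shingles(text, k=5):
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(len(words) - k + 1, 1))}

def jaccard(a, b):
    sa, sb = shingles(a), shingles(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def looks_like_a_copy(page_a, page_b, threshold=0.8):
    # Threshold is an arbitrary illustrative choice.
    return jaccard(page_a, page_b) >= threshold

original = "How do I bind a select list in MVC? You can use the DropDownList helper and pass it a SelectList."
scraped  = "How do I bind a select list in MVC? You can use the DropDownList helper and pass it a SelectList. [ads]"
print(jaccard(original, scraped), looks_like_a_copy(original, scraped))
```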
reinierpost on January 3, 2011 4:59 AMWhat we need is a quality measure for search result and this can only be provided by the user. The best way to collect user feedback is by making the process social.
Google's previous vote up/down feature is lacking in social interaction; it could, for example, show how many users voted a certain site up or down.
Charles Gunawan on January 3, 2011 5:10 AMIt's something annoying indeed. There will always be ways of cheating the system, but I think Google should close the holes at a faster pace.
Leniel Macaferi on January 3, 2011 5:12 AMGoogle should buy Delicious and add relevance based on the number of bookmarks and its tags. =)
When search results show only a bunch of ad-based sites with no real content, I go to Delicious, and the same search most of the time brings helpful results.
Thank you for this. For nearly two years I've seen an increasing deterioration in the usefulness of Google results and thought that was what was happening to the web in general ("blogging is dead," for example). The few times I used another search engine, the results were better and that confused me, given my past experience with how good Google once was. At least now I know it's not the web in general and it's not been my imagination: It's been Google.
twitter.com/mikecane on January 3, 2011 5:25 AMI used to work at a search engine company, so I know this is a very difficult problem, and I honestly do believe Google is trying their hardest, but clearly not enough, and you have to wonder if it's partly because they are making money in many cases from those scraper sites (IE they have Google ads).
My site - http://www.ausedcar.com used to be in the top 10 on Google for the very popular keywords "used cars" but I've been knocked back to around #45 over the years simply because everyone in front of me uses black hat SEO. Including major million dollar companies! The only difference is that Google is not going to blacklist a major company, while they probably would blacklist me, so there is essentially no way for the little guy to do any more than hope Google will give some scraps off the table.
The fact that you simply can't do a product review search on Google anymore without getting spam is serious trouble for Google. Unfortunately Bing seems to copy the same algorithm; if they were the "anti-spam" engine, they could gain real ground.
Phil Anderson on January 3, 2011 5:26 AMThere's one thing that I never understand by any critic of Google (especially in the search domain), when you say:
"Still, looking at the statistics, it's hard to avoid the obvious conclusion. I've been told many times that Google isn't a monopoly, but they apparently play one on the internet. You are perfectly free to switch to whichever non-viable alternative web search engine you want at any time. Just breathe in that sweet freedom, folks."
What do you mean?
Do you expect that a better search engine will just materialize out of thin air? What are you expecting? That perhaps Google should fund their competition? Maybe they should hold back a bit on improving their technology and let the other poor souls catch up?
One thing I dislike very much is when people feel entitled, like it's their birthright, to something they should work hard for. Have all the free internet services gotten you into the mindset of a spoiled consumer? This is not how an entrepreneur thinks, and this kind of thinking is not going to make the world go round.
Yes, you are right, you are being "churlish", show me a better search engine and then we'll talk.
Victorbstan on January 3, 2011 5:28 AMI'm a member of StackOverflow, but I got into the habit of using Google to search for results on StackOverflow. When I remember, I just scope the search to limit to only the stackoverflow.com site.
But, as with your Amazon search experience, I'd just search StackOverflow directly if I felt the search results would be as good as Google's. (I haven't compared recently.)
Bikeoid on January 3, 2011 5:30 AMMoving away from content-based ranking feels scary to me. I'd rather have things stay as they are. Or, if you will, give the social search engine as an optional approach to enrich the algorithmic search.
I feel the problem with Google however is not the algorithms, but the absence of essential information that can no longer be ignored; i.e. Google has to stop presenting results as a veritable shopping list and seriously consider the introduction of categories into its search engine.
This much was attempted by Cuil and was my favorite feature of that otherwise failed attempt at producing an alternative to Google. Backed up by intelligent algorithms like Google is capable of doing, scrapers wouldn't be able to avoid being moved to their own category away from normal searches.
Mario Figueiredo on January 3, 2011 5:39 AMI remember very well that I asked some friends if they'd noticed how Google results deteriorated the day Google Instant search was deployed.
Another clear difference is the way Apple SDK documentation doesn't appear among Google search results anymore (although in that case it may very well be Apple's decision).
this may be overly simplistic but could the magic dial google needs to turn simply be to adjust how the date something is published adds to the ranking?
on most searches say for "lady gaga" they should return the content with the most current date.
on a search for "binding a select list with MVC.Net" the scraped site is going to have a more recent date but should be ranked lower.
either way this problem seems to show up when using google to research technical solutions more so than other things.
I link to StackOverflow quite a bit in my Buzz. I wonder if Google has thought of using Buzz as an input? :)
Peter Da Silva on January 3, 2011 6:07 AMGoogle has been giving terrible results for product searches for a long time now. I tried to research a new dishwasher a few years ago and Google was a mess.
Google's main problem right now is that they think they can stay ahead of the black hat SEO with algorithms. Take it from somebody who used to work in the anti-spam and anti-virus world ... you can't! You need to use more than algorithms, in particular you need to use feedback from your users.
Google has a huge database of data on its users. Google knows I've been using Google and Gmail for years, that I'm a real person, I use Google search dozens of times a day and I'm pretty technical. If Google had a "this is spam" or "this is not useful" button they would have millions of pre-validated curators to help them filter results.
Robert Osborne on January 3, 2011 6:22 AMI've been seeing a ton of efreedom.com & questionhub.com data dumps of stackoverflow outranking you all through the holiday break.
Comforteagle on January 3, 2011 6:25 AMI don't think "social" is the answer to better search results. In fact, I think it makes it worse.
- First of all, there is no reason to believe that spammers cannot manipulate a social search engine. Just look at sites like Digg.com where at one point people were getting paid to blindly vote content up.
- Even with educated, moral users minus the spammers social search ranking would most likely result in search result popularity indicators, which is not necessarily actually the best search result.
- It is questionable in the first place whether users at all would rank search engine results. They just want the result.
I'm hoping this recent problem can be solved by an algorithm, as I place more trust into that than mankind itself.
Fchristant on January 3, 2011 6:26 AMThe scrapers are probably doing a lot of SEO optimization. It is time for Stack Overflow to hire some SEO services. Wikipedia is not monetizing in any way other than donations, whereas Stack Overflow does display ads of its own, so why not hire someone to do SEO and stay on top?
Nilesh Jethwa on January 3, 2011 6:43 AMAs I understand it, you're talking about scrapers that try to cheat Google's algorithms (if the algos change, the cheating will eventually evolve), but you're blaming Google, and not the scrapers??
What about SPAM, whose fault is it?
(just to be clear, I see google in this case more as a victim than as a villain)
Leonardo de Oliveira Martins on January 3, 2011 6:56 AMI hate to play devil's advocate here, but I've noticed that efreedom.com, one of the SO scrapers, actually provides significantly better search and related question generation than SO does. There's a fine line between being a leech and being a value-added content aggregator.
Erikengbrecht.blogspot.com on January 3, 2011 7:07 AMI've had a bad taste in my mouth for all things Google for a looong time.
At first there were some growing pains with it, but I've moved to Bing as a search engine and at this point, to me it feels speedy and lean like Google used to.
When I have gone to Google to look something up, I find that the result list is a near useless mess of ads, spam, and a general waste of time.
me.yahoo.com/a/wFcHGRpgnua_OqaxYt1WsXU_M_Lw1A-- on January 3, 2011 7:07 AM
Google has always been about organizing oceans of chaos into something manageable and searchable. Lately they've failed in two respects. First, their searches are finding more noise and less signal. Google is supposed to be chock full of the best minds on the Internet and their algorithms are being beaten soundly. Not only has SO been s
Clintp on January 3, 2011 7:24 AM(continued... damned mouse touchpad error)
...Not only has SO been squeezed out of the rankings, but I found Christmas shopping online to be much worse this year than last because what I wanted was buried under crap search results. Maybe they've stopped trying or caring about search.
Secondly, Google is the 800lb Gorilla of the Internet. If they wanted, they could simply crush anything that opposed them. I've been dying for a "never show me results from this site again" button in Google's search results. One click and the offensive scrapers go away.
Bing, on the other hand, seems active in tweaking and refining results...
Clintp on January 3, 2011 7:30 AM>> I've been dying for a "never show me results from this site again" button in Google's search results.
Indeed. A content-based algorithmic search with the addition of user tools should be the way to go. I'd really like to manage my search results, and I don't mean in the way of voting for links like Google has implied sometime ago with their social searching services. That won't solve the problems and will introduce new ones (like social engineering or regional/cultural encroachment).
The current model is becoming expired and the "market" of content consumers is becoming less relevant in Google search results. This was once the great novelty of Google and what elevated them to their present status.
Mario Figueiredo on January 3, 2011 8:11 AMReminds me of Ben Croshaw's comments on user-created video game DLC: "...and don't tell me user ranking is the answer, because anything that references Naruto will automatically get five stars."
Chris Doherty on January 3, 2011 8:43 AM@Clintp Interestingly that "never show me results from this site again" button used to be in google's results but isn't anymore. I miss it! Meanwhile we can try this extension: https://chrome.google.com/extensions/detail/ddgjlkmkllmpdhegaliddgplookikmjf
Scott Willeke on January 3, 2011 8:44 AMI've noticed Google results being gamed for the last 6 months at least. Too many sites devoid of actual content being listed at or near the top of the results. software.informer.com was the first I noticed, but it's only gotten worse.
Lately I've switched to duckduckgo.com and msdn for any technical searches. MSDN even includes stackoverflow results!
Jobu on January 3, 2011 8:45 AMThis is an interesting twist on the software monoculture problem. By being the overwhelming favorite, Google gives spammers a single target to focus on.
On the other hand, unlike viruses, there are no inherent platform differences that prevent spammers from tweaking their content scrapers to poison other search engines as well. So if Bing were to gain more market share its results would likely start to drown in noise as well unless they have some secret sauce (or armies of content reviewers) to defend the walls against the barbarian hordes of spammers.
The problem with adding a button that says "this web site is useful" or "report abuse" is that it shifts the battle to gaming that metric instead.
The other problem is that from the algorithm's perspective, information is information, so who cares if some content scraper serves up information copied from someone else as long as you the searcher get the answer you're looking for, right? In some cases, the sites in question could be legitimate mirrors. When discussing flaws in the ranking algorithm, this goes to the heart of how you phrase the question -- assuming there are no flaws in the software, it is probably performing as intended, and this is essentially a garbage in/garbage out problem. Unfortunately, there is a lot of garbage on the Internet.
Peter Amstutz on January 3, 2011 8:46 AMI do find that if I search for a product I end up getting drek back, but on the flip side it is drek offering to sell me the product, which is a reasonable assumption. The search for iphone cases mentioned above is a good case in point.
I will admit to being somewhat of a Google fanboy, but then I also have a lot of patience and am willing to venture as far as page 20 in search of useful results, I'm also making friends with Google shopping, not to mention continuing to read magazines. I find that helps me find products to search for.
Personally I've never had a problem with searching for technical data, but I understand that if you're producing it, that could be an issue due to the amount of drek and scraping that is served up.
I'm sure they'll get on top of it, otherwise people will vote with their feet so to speak.
John Doh on January 3, 2011 8:52 AMIt seems like search result quality went down with the recent real-time searching update. Maybe stuff like real-time twitter search beating Google made them rush out a non bulletproof real-time algorithm. Most likely this will improve with time as long as search is still the highest priority at Google.
Ryan Christensen on January 3, 2011 9:06 AMGoogle is not gravity, it is a part (only a part) of the mechanics of natural selection (Darwin is The Dude, not Newton). The ecosystem has changed, which is all too predictable, but the law (force?) is still there. That something else will have to be added to the picture, possibly on top of Google - probably inside it too - is pretty obvious: we are still at the rehearsal phase of the Web. And Google still has a lot to show on the infrastructural front. Also, we need to think differently about our own data, and we will.
But I have to say, agreeing with you, that it is not the semantics; the secret is still in the syntax. Syntax is our stuff.
Aslemos on January 3, 2011 9:27 AMMaybe it's time to use blekko?
Adam Rich on January 3, 2011 9:32 AMJust thinking out loud, but Google did make some changes this year... adding Caffeine and all.
Ian Philpot on January 3, 2011 10:01 AMI wholly agree as to the poor recent performance of Google's search results. I was just recently looking for binaural audio (I have my doubts, but was curious) and the results Google returned were garbage, with many results hosting identical content. The scammers are currently winning, or Google has failed to continue to implement its core concept.
That being said, in the search realm, this seems similar to what happened in the earlier days when Lycos promoted itself as having more indexed pages than any other search engine. That was a relatively easy hallmark to beat, so Lycos died rapidly. If someone else comes along with a much better search algorithm than Google at this stage, they just might have inertia like the early Google did. Or, more likely, the wealthy Bank of Google will buy them out.
Greg Webster on January 3, 2011 10:06 AMGreat post, and a great idea, Iraê, to add Delicious into the algorithm.
I agree that we should be using other search engines and also like comparison engines such as http://blindsearch.fejus.com/ but believe it will take a huge upheaval for what we have learned as a species over the last 10 years to change behaviour on the sort of scale needed to alter the statistics as above.
Google has fallen foul of its own algorithmic success and is at a stage where diversification seems to be their strategy rather than adding in other methods for improving results. Personally I can't see Google winning and am keen to see what new players will come into this space. I believe more complex algorithms that search multiple resources and think/calculate longer before results are returned could be the way to go.
timaldiss on January 3, 2011 10:07 AMWhile annoying when it happens, this is really nothing new. Several years back Google search results were inundated with parked domains only serving ads and other such useless pages. Google eventually cleaned up their algorithm. Sometimes their ranking system is gamed, other times they may make changes reducing the quality of rankings.
I considered switching to a new search engine and before I could find a better one, they fixed the problem.
Complain and then give them a chance to improve. It's a well established cycle that works very well for them.
Michael Silver on January 3, 2011 10:45 AMI've been noticing for some time now that the results have not been as good as they were and I regularly find myself either trying more and more detailed terms or other search filters to help find what I'm looking for, and often it is not in the first 3-5 results.
Right now, my default engine is Bing and I'm finding better results there for some queries but they haven't indexed as much of the web as Google has so Bing's results are good, but still needs more time to develop.
Google, on the other hand, has been getting less and less reliable, and I'm finding more and more content scrapers showing up in the top results. From what I can tell, I've been noticing the decline in quality ever since the Mayday Update from last year. Since that time the quality of results for longer tail stuff has been a lot more inconsistent and there seems to be a lot more non-relevant data showing up in the longer tail searches.
Also, I don't like Google trying to show me results before I've even typed in what I want. If I'm looking for a new car and type 'new cars' in Google, they end up giving me 'netflix' for the first letter (n), 'netflix' again with the second letter added (ne), 'New York Times' when I type in the third letter (new), 'New York Times' again when I add the space (new ), and if I add the first letter of the next word I still don't get what I'm looking for: 'New Century Bus' for (new c), 'New Carrollton Metro Station' for the next letter (new ca), and it still doesn't get the search right even when I type in the last letter. So now I've typed in 'new car' and the instant results are for 'New Carrollton', and nothing about new cars is coming up. However, if I look at their suggestion list, the actual result that I wanted is showing up as #3 on their list, but the problem is that I typed in the exact term I wanted to search for, and instead of giving me what I wanted, I'm getting some other search that is not even close to being relevant or of any quality for me.
With this in mind, I personally think their instant search is a joke, a waste of time, and a big nuisance, because I got 7 different search results for what I typed in and not one of them is actually relevant or what I'm looking for. What a big mistake Google made on releasing what I call their 'Crystal Ball' search, where they are trying to predict what you want but are not doing a very good job of it.
SalSurra on January 3, 2011 11:12 AMGreat blog post. I can't imagine what effect an increase of that 88.2% will have if Google makes a better model.
Brian R. Bondy on January 3, 2011 11:24 AMJeff,
Do you continue to interact with Matt Cutts on this matter? Matt's opinion would be most interesting and most convincing.
You have not provided any material evidence, any particular search, etc., to prove what you said in this article. Do you have any statistics of your own? I admire StackOverflow, and I know that you are a well-known person in Web / programming circles, but it has lately become very fashionable to attack Google.
Somewhat related article was published on TechCrunch (no real data either) at http://techcrunch.com/2011/01/01/why-we-desperately-need-a-new-and-better-google-2/
I'm pretty sure there will be related discussion in Google Buzz, hopefully with Google employees, including Matt Cutts at
http://goo.gl/6eVTw
http://goo.gl/xCDsD
Thanks
Vladimir Kelman on January 3, 2011 11:26 AMOver the last year or so I've noticed Google's results getting worse. I basically taught myself design and front-end development by googling. Now, when I try to google for the most basic of searches I have a hard time finding the good content that was once right in front of me.
I've had to type more detailed searches, use the timeline features on the left sidebar and I started using delicious as a search engine for web related stuff more and more.
An article I read last night mentioned blekko.com (often times much easier to find what I'm looking for there) - I tried it out a few months ago but added it to my bookmarks bar recently and have been getting used to using it the last few days.
I'm glad other people are starting to notice Google's bad results; hopefully the more people talk about it, the more Google will work on improving and getting back to the search results of a year or two ago, when I was actually able to find stuff...quickly.
I also noticed 'stack scrapers' recently and figured you guys would be a little upset. It's not their users creating the content, why the hell should they make money piggy backing off of you. But yes, if someone scrapes your content the original content should always be placed first in the results.
Good luck, hope you guys get it resolved and hope I get my google back.
hav0k on January 3, 2011 11:27 AMI don't think we should be surprised by this. Google might say that they're in business to make the world a better place, but let's be honest... they have a responsibility to their stock holders to be as profitable as possible, and I believe that letting some of the scrapers move to the top of the list can only be padding the bottom line at Google. After all, how many of those scraper sites have placed AdSense ads on their sites?
AJ Rabe on January 3, 2011 11:29 AMSo, let's see:
1. Google's primary income comes from AdSense ads.
2. StackOverflow doesn't have AdSense ads.
3. efreedom has Google AdSense ads.
If you were Google, which site would you want people to go to?
You do the math.
InsomniacGeek on January 3, 2011 11:45 AMPerfect example: I just searched "Android tablet" and clicked on the Google News tab. The 1st link I'm offered is for a site "TMCnet.com". It talks about Toshiba's new tablet being launched later this year. But throughout the article, it constantly links back to the "REAL" sources of the article, engadget and crunchgear.
So, how does an article by TMCnet.com, which is basically regurgitating what the other 2 more legitimate sources are saying, JUMP AHEAD of the actual sources? How is that possible? Interestingly, I didn't find any Google Adwords on their site, so that's not the motivation in this case.
RobertNaum on January 3, 2011 11:48 AMYes, converging feelings about social vs algorithmic search; I would say it is a recurrent rhetoric now at each year's end.
At last year's passage (2009/2010) we had predictions and high praise of "real-time search", which was going to make Google bite the dust.
Where is real-time search now? Where is real-time social search, for that matter? What was the point of it, and what were the results?
thierryl on January 3, 2011 11:54 AMI have long held that Search (ie, Google) would eventually bend to an anti-network-effect, where the SEO-gamers would eventually win and smaller search engines would flourish. I'm fond of DuckDuckGo right now. However, Google has gotten their edge back several times over the last few years, and I wouldn't count them out. I agree that Google is currently losing, and I recommend people try the blind search tests themselves.
I believe Google's strength will come in personalized search results, and I don't think Google is using personalization as much as they need to. If I bypass all the other links and go direct to StackOverflow every time, it would seem that - for me - this should work itself out quickly. I would be interested in whether your experiments with StackOverflow were with "clean accounts" or crusty accounts like mine, where the searcher is a known technologist.
Regarding the comment about categorization: Google's Caffeine architecture, and the search results currently being returned, show a high amount of diversity. They're clearly categorizing and showing "best in category" in the front page results. This effect is positive for a number of kinds of search, but negative for "tight searches" (like iPhone 4 covers, where you might end up with an iPad cover taking a slot due to a diversity algorithm, to the point where you're only delivering 4 or 5 results that were tight).
An improvement I would like to see in Google is a Google Labs experiment with a prominent "more like this" button. I'd rather do my first search and drill down, and it's clear Google has the categories and pre-calculated math to do so.
If I were out to game Google right now, I'd be building a very human-like browser (or using mechanical turk) to search for terms and click on my links. I suspect Google has greatly raised the priority of link-click in their reputation scheme, and gaming that system wouldn't be terribly hard. The benefit of blending personalization is hopefully I don't look like most mechanical turks.
Bbulkow on January 3, 2011 12:18 PM@Konrad: "To stay in your metaphor, it’s clearly time for a paradigm shift, a kind of Einstein of web ranking algorithms."
That's not staying in any metaphor, bud. It's all over the place. =)
Tony on January 3, 2011 12:19 PMI'd written about what is the intrinsic flaw of algorithmic search a while back, this may be of interest:
http://lesswrong.com/lw/28r/is_google_paperclipping_the_web_the_perils_of/
AlexanderM on January 3, 2011 1:10 PMNot meant as an insult: BUT it's very very difficult for Google to decide if a site is a content farm, a ripoff or "valid" content. How should Google decide if a link at stackoverflow is a link that comes from an SEO idiot or valid? If a link on del.icio.us/digg/reddit is valid or simply SEO?
But I'd agree that Google should react faster. Especially ripoffs that do not conform to the cc-license could be detected (at least mostly) automatically.
Ulrichvoss on January 3, 2011 1:13 PMA social approach is the solution, but it MUST be designed so that it may not be gamed. The best way to do that is to allow me to vote up or down the search results and to allow me to blacklist/whitelist sites and to OPTIONALLY include my friends black and white lists.
It is this component of including friends, i.e. people I already trust, that ensures that it won't be gamed. If a friend of mine tries to game me, he/she won't be my friend for long. So it's self-policing.
Charles Scalfani on January 3, 2011 1:15 PMShoot me a DM next time you need a case, I'll send you a good one for free, save you from needless Googling and Amazoning! :) http://www.myGearStore.com
Benvanderbeek on January 3, 2011 1:17 PMGoogle does a great job filtering spam in gmail. I'm not sure how important the "report spam" button is in this, but it is certainly somewhat satisfying to press it.
Where is the will to do the same thing for their search results?
I'd like to see a similar button in Chrome (for starters) for social rating of spammy websites. Other comments have noted that this would shift the goal posts to gaming the social rating system.
Maybe one solution to this would be to weight ratings by reputation. So Google detects that you are someone who rates spammy web sites highly, and devalues the rating you have applied to all other sites.
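A minimal sketch of that reputation-weighting idea, with made-up users and scores (illustrative only, not anything Google actually does), might look like this:

```python
# Illustrative sketch: weight each user's "this is spam" report by a reputation
# score, so raters who habitually flag good sites count for less.
def spam_score(reports, reputation):
    """reports: list of (user, is_spam) votes for one site.
    reputation: dict mapping user -> weight in [0, 1]; unknown users get 0.5."""
    weighted = sum(reputation.get(user, 0.5) * (1 if is_spam else -1)
                   for user, is_spam in reports)
    total = sum(reputation.get(user, 0.5) for user, _ in reports)
    return weighted / total if total else 0.0

reputation = {"longtime_gmail_user": 0.9, "fresh_account": 0.1}
reports = [("longtime_gmail_user", True), ("fresh_account", False)]
print(spam_score(reports, reputation))  # positive => probably spam
```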
Jasonharrop on January 3, 2011 1:18 PMYou're right on except for this one statement:
"when was the last time you clicked through to a page that was nothing more than a legally copied, properly attributed Wikipedia entry encrusted in advertisements?"
On a growing number of search queries in Google, I'm seeing results from Ask.com that are Wikipedia articles with ads outranking the original Wikipedia posting.
Jakeludington on January 3, 2011 1:26 PMI've been running into a flood of these scraper sites in my search results, and more than anything I just want to exclude them. I would like to click a link next to the result to exclude that site from future searches; there's no content I'm interested in on that site that shouldn't show up as a hit on the original source.
Providing that feature might solve the problem for two reasons: I don't see the scraper sites so my searches are more to my liking (and google works for me so I come back to it,) and also, google can use a large number of explicit "exclusions" to affect the rankings. They could treat it as feedback, equivalent to users saying "this site is not relevant."
Jim Rogers on January 3, 2011 1:51 PMI actually have been hoping for a change in Google for a while. While Blekko shows promise, it isn't exactly what I was hoping for. And, though I know some of the following are a bit of a stretch now, they will be invaluable in the future.
First, I want to be able to filter results from my search. (This is the opposite of what Bbulkow suggested, but his option would be good too.) I want to be able to click something which says, "This site is bogus and should not be in this result set" or "That has nothing to do with what I am looking for". When I look for a legitimate answer for a question, I want to be able to tell Google to take about.com and shove it.
I want to be able to search for symbols. I mean seriously, if I'm trying to find an email, why does it need to be changed from "foo@bar.com" to "foo bar com" (I'm a bit sensitive here, my last name is Allen-Poole).
True Boolean logic. I want to look for ((this and that) or (that and another)) and not (some-other-thing).
I want a means to search for linguistic constructs. For example, if I am looking up John Smith, I want to have a search which looks for the name (two words in close proximity, separated by a middle name or a middle initial). This is more than possible.
I want regexp. That is just insane though. I don't expect to grep the web any time in the near (or maybe even distant) future.
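For what it's worth, the "John Smith" construct described above is expressible as an ordinary regular expression; a small, purely illustrative Python sketch (the pattern is just an example, not a complete name matcher):

```python
# Illustrative sketch: match "John Smith" with an optional middle name or initial.
import re

name_pattern = re.compile(r"\bJohn(?:\s+[A-Z](?:\.|[a-z]+)?)?\s+Smith\b")

for text in ["John Smith wrote...", "John Q. Smith wrote...",
             "John Quincy Smith wrote...", "John met Mr. Smith later"]:
    print(bool(name_pattern.search(text)), text)
```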
And what am I willing to trade? Time. I remember the 90's. I remember preferring AltaVista because its results were just slightly better and its logic seemed more reliable. But the amount of time I will save in proper results is invaluable and worth far more than whatever extra seconds that it takes crunching the numbers on their end (even extra minutes!).
Just think about this: it takes at least a second to read the title of a google link. It will take another couple of seconds to evaluate the text beneath. It is also not unreasonable for a website to take 3-5 seconds to load completely (though this can be optimized with tabbed browsing (though that can also decrease as the full
Now, if you were to have a questionable search (say, the dishwasher ratings example) and the first result is bad, the second and third results are maybes, the fourth result is Amazon, and the fifth result is one which is relatively useful. This means that you will waste at least 5 * link-text + 4 * subtext + 2 * site-viewing to get to the result (assuming you stay on the good result). That makes a minimum of 19 seconds of completely wasted time before getting to something truly useful. In this case probably more because you likely will stay on the mediocre results for a while longer than 3 seconds.
If Google were to give us these options, if it were to make our searches better, even if it were at 100% increase in search time, we would end up with a net benefit (I've not had too many searches take 10 seconds recently). The first point alone could net some extreme benefits and it reminds me of (http://en.wikipedia.org/wiki/Travelling_salesman_problem#Ant_colony_optimization) a solution to the travelling salesman problem. And, while this is still something which advertisers could use to our disadvantage, it would be a lot harder for them to do so, especially if these were implemented on a per-user basis.
Now, I know that I am a lowly voice in a sea of spam, but seriously. Google has the ability to implement this. I've read their specs and I think that, if they wanted, they could even make a way to grep the web. For the first task, it wouldn't even need to involve stored data -- it could all be tracked within one session. The next question is whether Google will care.
Amusingly, I feel it obligatory to add a link to http://allen-poole.com so that some day Google may look upon me and smile.
Cwallenpoole.wordpress.com on January 3, 2011 1:52 PMThe only feature I need Google to implement right now is giving me the ability to blacklist sites in all my queries. I've long wished that I could blacklist experts-exchange, and with the proliferation of scraping sites over the last year, that desire has become even greater.
You could possibly make it social (my friends' blacklists can be added to my own), but don't use blacklists to influence rankings. And no, don't do any peer voting for ranks either, as these will lead to more abuse and just be added to the list of SEO techniques.
This can't be that hard. I can already add "-site:experts-exchange.com" to my queries to remove the sites. Why can't it be an option in my Google account settings to add that to all my searches?
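A small sketch of that per-user blacklist idea, implemented today on the client side by appending -site: operators to the query (the listed domains and the merged friend's list are only examples):

```python
# Illustrative sketch: keep a personal list of domains you never want to see and
# append "-site:" operators to every query before sending it to the search engine.
from urllib.parse import quote_plus

my_blacklist = {"experts-exchange.com", "efreedom.com"}
friends_blacklist = {"questionhub.com"}

def build_query(terms, *blacklists):
    excluded = sorted(set().union(*blacklists))
    return terms + " " + " ".join(f"-site:{domain}" for domain in excluded)

query = build_query("binding a select list with MVC.Net", my_blacklist, friends_blacklist)
print(query)
print("https://www.google.com/search?q=" + quote_plus(query))
```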
Evan Morgoch on January 3, 2011 1:55 PMI find it hard to believe that this isn't intentional by Google, although I imagine the attention gathered by this article will change things dramatically.
Davidsimbroglio on January 3, 2011 1:57 PMBtw, there is an extension button for Google Chrome to report spam. It automates a part of filling in their report form.
HenkPoley on January 3, 2011 2:10 PMIt's not you, dude. Google is becoming the new Yahoo, one spam result at a time.
The sad part is without google the web is nothing. With all the technology improvements, nothing has really improved.
Its time to get VC out of tech and start building things that work.
The web has turned into a get rich quick scheme.
Hi Jeff, I passed on the examples that you sent back in December and the team is actively looking at improvements and changes they can make based on that feedback--thanks for sending it.
I was curious about the link to "Google, Google, Why Hast Thou Forsaken the Manolo?" and so I checked that one out. It's true that our algorithms don't currently think that's a great site, so I looked into it more. The disclaimer says "Manolo the Shoeblogger is not Mr. Manolo Blahnik." It's a *different* Manolo in the shoe industry.
So I picked a url, let's say http://basement.shoeblogs.com/category/bedding/ . Pretty much every post looked like "buy this type of bedding," usually with an affiliate link. And over on the right-hand side are links like "Shop hassle free and buy unique Duvet Covers at thecompanystore.com" that look an awful lot to us like paid links that pass PageRank.
I support the right of this blogger to put whatever they want on their domain, but I also support Google's right to decide how to rank our search results, and I don't think we should be obligated to rank that site highly.
I appreciated the rest of your post and it's safe to say that people inside Google are discussing it and how we can do better.
Matt Cutts on January 3, 2011 3:03 PMI'm no expert but what about taking the new syndication-source and original-source meta tags a step further.
Original content can be pinged, timestamped, etc. with these tags. Webmaster Tools could be used to report sites that are outranking the original content. The database would verify and adjust rankings.
Rewrites etc. would still happen but should help clean things up a bit in addition to giving content producers (and Google) an easier way of dealing with this problem.
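For reference, these are ordinary meta tags in the page head, so a crawler or webmaster tool can read them back out. A minimal, illustrative Python sketch (the URLs are hypothetical):

```python
# Illustrative sketch: pull the syndication-source / original-source meta tags the
# comment refers to out of a page, so a tool could check whether the page ranking
# for a query actually claims to be the original.
from html.parser import HTMLParser

class SourceTags(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        if tag == "meta":
            attrs = dict(attrs)
            name = (attrs.get("name") or "").lower()
            if name in ("original-source", "syndication-source"):
                self.tags[name] = attrs.get("content")

page_head = '''<head>
  <meta name="original-source" content="http://stackoverflow.com/questions/12345">
  <meta name="syndication-source" content="http://stackoverflow.com/questions/12345">
</head>'''
parser = SourceTags()
parser.feed(page_head)
print(parser.tags)
```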
Kevin_Szprychel on January 3, 2011 3:41 PMThis looks pretty inevitable.
Two aspects come to mind:
1) It's an algorithm, not human thought, that's at work here. That gives an arms race; the dark part of SEO will catch up even if they started out pretty dumb.
2) Google makes its money from adverts. A site that has a lot of adverts is working for Google. I really can't imagine them stamping on such sites like they're bugs. For me some of them are just that: bugs, so we have a disconnect! In the absence of a published algorithm (even if it needs updating daily or more often) this sort of suspicion can't be resolved.
Human-judged content (DMOZ anyone!) looks like an answer. Many times when I look at what "social" delivers, I shudder. A great average of everybody, it seems to me, is not the answer.
Maybe the web just needs to fracture. Personal control over how your own search works, sharing data with people whose opinions you respect, sites that work your way, less rubbish, less time wasted, more productivity.
We could end up with different worlds, as sketched in some SciFi books for a long time. Those who live on the web, consuming, following, never creating. Those who disconnect, think for themselves enough that they deliver new and valuable work.
The web has altered our lives. It's time those who care get back into the loop. Control your web so that your life is yours, not a side-effect of a cacophony of "important" web companies.
Mike Gale on January 3, 2011 4:03 PMSeems to me that relying purely on content for indexing isn't going to work any more. Each web site comes from a hierarchical division of address blocks. The existence of a "bad" web site within a given address block can and should impair the score of every other web site within that address block, to a lesser degree as we ascend the address block hierarchy. The same concept should be applied to registrars.
In other words, if my ISP hosts a lot of spam sites or there are a lot of them in my address block, my site is going to take a penalty, regardless of its content. I therefore have an incentive to seek out a reputable ISP, and reputable ISPs have a very solid reason to push out spam sites.
Eradicating this trash means making it harder and harder for it to find a "home". I can't think of a better way to do that than to have ISPs actively working on the problem, to retain their wider customer base. If they don't have a wider customer base, and it's all spam? Page ranks from that ISP will snuggle up to each other at the bottom of the pit.
The basic problem here is that its too hard to keep adapting like crazy to all the ways of restructuring content, times all the possible web sites. The number of ISPs and address blocks is, however, entirely tractable for this kind of problem.
Of course, this can punish entirely innocent web sites, until the system as a whole shakes itself out. It would be nice to have this particular omelette be break-free, but I don't see how to do that.
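A toy sketch of that neighborhood penalty, scoring each address block by its share of known-bad sites (the IPs, labels, and weights are invented for illustration):

```python
# Illustrative sketch: penalize a site by the fraction of known-spam sites sharing
# its /24 and /16 address blocks, weighting the nearer neighborhood more heavily.
from collections import defaultdict

sites = {  # site -> (ip address, known_spam?)
    "goodblog.example": ("203.0.113.10", False),
    "scraper1.example": ("203.0.113.20", True),
    "scraper2.example": ("203.0.113.21", True),
    "other.example":    ("203.0.120.5",  False),
}

def prefix(ip, octets):
    return ".".join(ip.split(".")[:octets])

def spam_rates(octets):
    counts = defaultdict(lambda: [0, 0])  # prefix -> [spam, total]
    for ip, is_spam in sites.values():
        p = prefix(ip, octets)
        counts[p][0] += int(is_spam)
        counts[p][1] += 1
    return {p: spam / total for p, (spam, total) in counts.items()}

def neighborhood_penalty(ip, w24=0.7, w16=0.3):
    r24 = spam_rates(3).get(prefix(ip, 3), 0.0)
    r16 = spam_rates(2).get(prefix(ip, 2), 0.0)
    return w24 * r24 + w16 * r16

print(neighborhood_penalty("203.0.113.10"))  # the innocent neighbor takes a hit too
```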
Ross Judson on January 3, 2011 4:12 PMI use Bing at work and Google at home (don't ask) and as odd as it is, I do get better results with Bing.
Craig Deubler on January 3, 2011 4:21 PMThere was an algorithmic thing back in 2006 that included TrustRank. While it wasn't exactly a social recommendation type of thing, it did distribute GoogleJuice based on links from trusted sources.
http://weblogs.asp.net/jgalloway/archive/2006/01/11/435076.aspx
If there's an element of TrustRank in the current algorithms, it seems like that probably needs both a reset and a higher weighting.
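For readers unfamiliar with it, TrustRank is essentially PageRank personalized to a hand-vetted seed set, so trust decays with link distance from the seeds. A toy sketch (the link graph and parameters are made up for illustration):

```python
# Illustrative sketch of TrustRank-style propagation: start from hand-vetted seeds
# and let trust flow along outgoing links, decaying with distance from the seeds.
links = {
    "seed.example":      ["blog.example", "stackoverflow.com"],
    "blog.example":      ["stackoverflow.com"],
    "stackoverflow.com": ["blog.example"],
    "scraper.example":   ["stackoverflow.com"],  # nothing trusted links to the scraper
}
seeds = {"seed.example"}

def trustrank(links, seeds, damping=0.85, iterations=20):
    pages = set(links) | {p for targets in links.values() for p in targets}
    base = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    trust = dict(base)
    for _ in range(iterations):
        nxt = {p: (1 - damping) * base[p] for p in pages}
        for page, targets in links.items():
            if targets:
                share = damping * trust[page] / len(targets)
                for t in targets:
                    nxt[t] += share
        trust = nxt
    return trust

for page, score in sorted(trustrank(links, seeds).items(), key=lambda kv: -kv[1]):
    print(f"{page:20s} {score:.3f}")  # the scraper ends up with no trust at all
```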
Jongalloway on January 3, 2011 5:06 PM@Matt Cutts
I can see you took the time to read, analyze and post a comment. That's very decent of you. Unfortunately I can also see you only addressed Jeff, ignored any comments from commenters in here and approached the matter purely as a ranking issue.
Since Google Search is meant to be a service to the "user who searches" and not a service to the "user who publishes", I'm unsatisfied by your comment. But not surprised.
Mario Figueiredo on January 3, 2011 5:18 PMThe whole PageRank conundrum reminds me of the parable in Gödel Escher Bach about the phonograph that breaks when you play a specific well designed record. GEB was referring to incompleteness but it's an equally good metaphor for computer security and quality-algorithms like PageRank.
If there's sufficient motivation to find your algorithm's weak points and exploit them, it's going to happen. Complicated algorithms just require more complicated and better designed inputs.
Justin Scheiner on January 3, 2011 5:19 PMHi Mario, it's actually my 11 year anniversary this week. I'm out of town with my wife, so I only have limited time to slip away and post responses. Suffice it to say that plenty of people in Google have read this article and the other articles Jeff mentioned, and lots of people will be discussing what we need to do next to improve things.
Matt Cutts on January 3, 2011 5:28 PM@Matt Cutts
Yes, Manolo the Shoeblogger's site is like a lot of fashion blogs, in that it has a decent number of affiliate links.
However, you didn't answer the central question posed by the Manolo in the post you've referenced, "why are the scrapers ranking higher than the original content?"
You, and others at Google, have harped on for years about the need to produce interesting and original content, and yet, if a site which produces plenty of original content doesn't throw exactly the right levers in Google's Rube Goldberg system, you'll preference a dozen content scrapers over it.
Manolo is very well known among fashion people and in the fashion press, in fact he pretty much invented the Fashion Blog...
http://en.wikipedia.org/wiki/Fashion_blog#Early_fashion_blogs
So, again, why should the content scrapers who are stealing his work be ranked higher than he is?
Del Davis on January 3, 2011 5:54 PMLet's hope they do and things do get improved, Matt. There's been a growing disconnect between Google Search and its users for the past... couple of years, I'd say. To the point that previously very rare statements like "Google's search engine isn't good anymore" are becoming more prevalent. Something that would have been unthinkable before.
Being that this is also the period in which Google introduced the most relevant new features and changes to the search engine UI since its inception, maybe it's time (and excuse me the bluntness) Google realizes that may not be what users actually require the most.
I'm prepared to accept also we are simply a non representative minority. But I do seem to witness a growing cry of protest. With alternative search engines taking their place in the market offering competitive possibilities, all care is not enough. Remember how Google itself rose.
And my congratulations, BTW!
Mario Figueiredo on January 3, 2011 5:57 PMLike it was said way upthread... I see this as a manifestation of the Windows/Mac malware thing. All the bad guys are optimizing their dark SEO for Google, not Bing. If they decided to focus on Bing, given time to catch up to their extensive knowledge of Google internals, the same would happen to our bingy buddy.
A thought: does Google factor in domain registration time to its algorithm? This seems like a reasonably accurate heuristic for tracking original content vs. scrapers. Obviously a "reused" scraper domain would be the problem.
0xabad1dea on January 3, 2011 7:01 PMI remember seeing this auction on flippa a while back:
https://flippa.com/auctions/102189/1-iPhone-case-site-11kmonth-profits--2-million--pageviewsmonth
That's the number one search result for "iphone 4 case", above Apple, above Amazon... crazy. Their auction description gives more detail on their SEO efforts.
I troll AM forums like wickedfire where I often find insightful threads, like Overstock dominating the SERPs for very generic keywords. Why is the #1 result for "watches", "luggage", and "crib sets" Overstock? Aren't there more deserving and relevant results? Are Google and Overstock profit sharing?
http://www.wickedfire.com/shooting-shit/111865-fuck-overstock.html
As a web developer and SEO enthusiast, I've been increasingly surprised at how hard it is to find anything on Google anymore. This holiday season was particularly frustrating. After three or four attempts to find a Tiffany's bracelet, I gave up and went over to Bing, where I actually found several pages of relevant content to choose from.
One of my professors in college told us about a theory (can't remember the name right now) where, as you try to narrow down your hypothesis to get more and more specific, at some point you actually become less and less effective at what you're trying to achieve. He used the visualization of an hourglass. As you narrow your results, you reach a finite point (the apex where the sand drops into the next chamber), after which you get further away from what you're trying to achieve.
To me, this is where Google is right now. They're trying WAY too hard to continue to generate revenue while delivering the most personalized, specific results to the user. People are catching on and they're gaming the system, without penalty. The fact is Google is broken and it will take a while before it's fixed.
Crash Override on January 3, 2011 7:25 PM
Well, my blog / site is a lot smaller than yours, and I have put zero effort so far into SEO, but I get very little traffic from search. Most of my traffic comes from Twitter, Hacker News and DZone, with DZone being the biggest contributor.
In any event, I do think you're right about the content farms and other "spam" clogging Google. It seems we need some kind of reverse Turing Test. A *person* can easily tell a chatbot from a human conversation partner, but can a few cubic miles of MapReduce engines?
Well, unless you count sports reporting, that is. ;-)
http://borasky-research.net/2010/12/30/sure-why-not-five-predictions-for-2011/
Znmeb on January 3, 2011 11:00 PM
Really useful article,, thanks google chrome..
Indian Sarees
sarees on January 3, 2011 11:17 PM
I have been disturbed by Google for some time and in actual fact never use it as a search engine. From a hidden program installer on my computer to the new Google Chrome, which outright says it's keeping all, and I mean all, your information on a cloud. It won't even let you download programs. I also noticed recently that Microsoft gave the makers of a game information about how many players were playing the game for the month as well as how long they played it. It seems to me every iota of privacy is disappearing. Quite frankly it scares me silly that people continue to believe it will do no harm. What if the US government or some other controlling force demands the info be handed over? The same with the new Apple patent for the application they intend putting in your phones. There is really only one person you can trust with your personal information, and that is yourself. I am not a scaremonger, but I have seen what harm a dictator can do. The only way to keep yourself and your information safe is to keep it off the net entirely.
Nicki on January 4, 2011 12:02 AM
When I have a programming problem, I google for '{error message} {platform}', get the useless results, and then google for '{error message} {platform} stackoverflow' and I get a lot of good SO results and nothing else. In that respect, the system works, but yeah - you guys are missing out on a hell of a lot of good traffic.
John Senner on January 4, 2011 12:10 AM
The internet would be a much better place if we finally deployed some cryptography-based ranking/kudos solution. I don't know -- like http://www.bitcoin.org/ but for rankings. Something that would make SEO impossible altogether.
dpc on January 4, 2011 1:09 AM
Social search will work until the minute it won't work anymore. It won't take long before spammers and scrapers find a way to beat the system, like they managed to beat algorithmic search engines.
It's a lot easier to beat a social system than an algorithmic one. I'm just saying, humans like to see the dancing bunnies.
Pop Catalin on January 4, 2011 2:00 AM
+1 to personal website blacklists: I've felt a need for them for years. If I take the time to click a link and realize that a website is somewhere I never wish to visit again, I would like to be able to leverage that investment.
+1 also to being able to "follow" other people's blacklists. It would be very dangerous, though, to make the stats about following public: it would probably create a gravity effect towards the most followed blacklisters, who could then become too powerful - and be tempted to monetize that power, as it happens on many social networks. This should be of interest to Google - it would also allow them to have better social graph data.
If enough "trust communities" grow, this should also lower the incentive to game the system, making SEO-only websites less lucrative.
Talking about the scrapers, there are two very different situations here. The first one is where the content is Creative Commons (or similar) and legitimately reproduced. In this case, there is no reason why Google should automatically give the first publisher a better ranking: if one of the scrapers published it in a "better" (whatever the metric) way for the searcher, why shouldn't it get a higher placement? It would be very nice, though, if Google aggregated the similar pages, like it does in News: it is very annoying to click through several copycat links in a search.
The other case is when someone steals copyrighted content. This should definitely be penalized, and in theory the law should enforce it. Considering the reality of things, though, it would also be very much in Google's interest to help original content producers protect themselves - giving them an incentive to produce even more good content. Considering how fast and how often it crawls the web, Google could very often find who really published first, and if there were a meta tag about the copyright and license, it could at least warn the original publisher, if not find a way to penalize the infringer.
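A rough sketch of the "who published first" idea -- assuming the crawler records a timestamp per URL, and using a naive hash of normalized text as the content fingerprint (a real engine would need far more robust near-duplicate detection); the URLs are made up:

```python
import hashlib

# Maps a content fingerprint to the (url, crawl_time) that first carried it.
first_seen: dict[str, tuple[str, float]] = {}

def fingerprint(text: str) -> str:
    """Crude content fingerprint: hash of lowercased, whitespace-normalized text."""
    normalized = " ".join(text.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

def record_crawl(url: str, text: str, crawl_time: float) -> str:
    """Record a crawled page and return the URL believed to be the original publisher."""
    key = fingerprint(text)
    if key not in first_seen or crawl_time < first_seen[key][1]:
        first_seen[key] = (url, crawl_time)
    return first_seen[key][0]

# The copy crawled earliest wins; later duplicates point back to it.
record_crawl("https://original.example/post", "How do I parse JSON in Python?", 1000.0)
print(record_crawl("https://scraper.example/post", "How do I parse JSON in Python?", 2000.0))
# -> https://original.example/post
```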
Daniele Mazzini on January 4, 2011 2:39 AM
I think what this shows is that over the long term, Search as we know it is broken.
The only way out will be "hybrid curation" -- basically back to the Yahoo model at the highest level -- with algorithmic results (a la Google) for the depths of the curated high-level web sites.
Amazon is a great place to research products even if you don't shop there because there is curation and great reviews.
[dc]
Dave Chapin on January 4, 2011 3:32 AM
Well, Matt C did say that Google had taken resources away from some aspects of antispam and that they would be returning this year.
Hauntingthunder.wordpress.com on January 4, 2011 3:54 AM
One of the biggest flaws of Google is that it gives way too much importance to domain names. If you search for 'iphone 4 case' you see plenty of websites like iphone4case.com, getiphonecase.com, iphone4gcasereview.com, www.iphone-4g-case.net, www.4iphonecases.com, etc. I don't understand why the domain name is such an important criterion for ranking search results. That needs an immediate fix.
Vasuadiga on January 4, 2011 4:27 AM
Interesting article. My responses to a couple of the comments:
First: "[...] where the content is Creative Commons (or similar) and legitimately reproduced. In this case, there is no reason why Google should automatically give the first publisher a better ranking: if one of the scrapers published it in a "better" (whatever the metric) way for the searcher, why shouldn't it get a higher placement?"
As a consumer I would rather reward (with traffic) the content creators for making knowledge available to mankind than the scrapers who have not generated anything new. By doing this, I assume, I am encouraging them to continue to create content - rewarding the scrapers instead is less likely to have that effect. So the scrapers are by definition not "better", and if the metrics think they are, then the metrics are broken.
Second point: the categorisation into "social" and "algorithmic" search seems to me terminologically inexact when a key element of the "algorithmic" search is which sites have incoming links from other people. If those links are put there by people, that's a pretty social algorithm ;-) Perhaps the distinction would better be drawn between "anonymous" and "personal social" search.
@Vasuadiga
Well, it really isn't the domain name that is influencing the results. Along with that domain there's a legion of SEO techniques that are actually responsible for the website's placement. There never was, and still isn't, any reason to believe the domain name itself factors into a website's rank. Nor would it make any sense. What happens instead is that a domain name like iphone4case.com facilitates the creation of link anchor text that is more relevant to Google's algorithms (anchor text is believed to be important).
So with a domain like that, the owner is effectively creating a commercial name that reads "iPhone 4 Case". Contrast that with the same business, had it been named mobileshell.com. When someone links to the business, the link anchor text and surrounding text could read as:
- Find your iPhone cases at "iPhone 4 Case"
- Find your iPhone cases at "Mobile Shell"
In the first case, both the commercial name and the anchor text accurately reflect the business, whereas the more creative second option produces anchor text that doesn't. So when searching for the company name, "Mobile Shell" may produce a lot of false positives with links to military or engineering topics, whereas "iPhone 4 Case" will not. On the other hand, when searching for the more generic term "iphone cases", the first company is at an advantage because there's a real chance that the vast majority of anchor texts linking to its website include those exact terms (the plural form is largely ignored by Google).
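To make the anchor-text argument concrete, here is a toy illustration (not Google's actual algorithm; the scoring is invented): a page whose inbound links mostly use the exact query phrase as anchor text scores higher on that phrase than one whose anchors carry an unrelated brand name.

```python
def anchor_match_score(anchor_texts: list[str], query: str) -> float:
    """Fraction of inbound anchor texts containing the query as an exact phrase."""
    q = query.lower()
    if not anchor_texts:
        return 0.0
    hits = sum(1 for anchor in anchor_texts if q in anchor.lower())
    return hits / len(anchor_texts)

# iphone4case.com naturally accumulates exact-match anchors for the query...
print(anchor_match_score(
    ["Find your iPhone cases at iPhone 4 Case", "iPhone 4 Case", "great iPhone 4 case shop"],
    "iphone 4 case"))  # 1.0

# ...while mobileshell.com mostly collects its brand name instead.
print(anchor_match_score(
    ["Find your iPhone cases at Mobile Shell", "Mobile Shell", "Mobile Shell review"],
    "iphone 4 case"))  # 0.0
```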
Mario Figueiredo on January 4, 2011 5:26 AM
Have you tried Googling "sugar bowl" lately? Very misleading first listing.
StarTrekRedneck on January 4, 2011 6:15 AM
How does Bing fare in all this??
Is Bing equally scraper-infested??
Should and could Google and Bing et al. create scraper blacklists similar to anti-spam blacklists??
I have not yet experienced this scraping etc. -- are such issues related to how general or specific one's search terms are??
Thank you, Tom
Tom Lyczko on January 4, 2011 6:23 AM
Crowdsourcing is the answer, IMO. If there's one thing Google has, it's a lot of users. Whatever happened to SearchWiki (http://googleblog.blogspot.com/2008/11/searchwiki-make-search-your-own.html)? It had a pleasant user interface, well integrated with the results. I feel that such a mechanism, with a reputation system of some sort (perhaps subscribing to result weightings curated by trusted groups of users), could drastically improve search result quality.
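A minimal sketch of the kind of reputation-weighted voting that comment imagines (the votes and reputation weights are invented): each user's up/down vote on a result is scaled by their reputation, and the aggregate nudges the ranking.

```python
def weighted_vote_score(votes: list[tuple[int, float]]) -> float:
    """Aggregate (vote, reputation) pairs, where vote is +1 or -1.

    Returns a value in [-1.0, 1.0]; votes from high-reputation users move it more.
    """
    total_weight = sum(rep for _, rep in votes)
    if total_weight == 0:
        return 0.0
    return sum(vote * rep for vote, rep in votes) / total_weight

# Two trusted users bury a scraper page; one low-reputation account votes it up.
print(weighted_vote_score([(-1, 900.0), (-1, 750.0), (+1, 10.0)]))  # about -0.99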
Goran Zec on January 4, 2011 6:50 AM
From Google's perspective:
How about letting users vote to bury sites that just copy content?
From the web browser's perspective:
How about making a plugin to preprocess Google results, filtering out sites on a blacklist? (See the sketch after this list.)
From the user's perspective:
Use alternatives to Google (Yahoo, Bing, etc.); the fewer people use Google, the more Google is forced to improve. Google replaced Yahoo, but it can be replaced too if it doesn't listen to its users.
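Here is a tiny sketch of that blacklist-plugin idea (the blacklisted domains and result URLs are made up for illustration): drop any result whose hostname matches a personal blacklist before the results are shown.

```python
from urllib.parse import urlparse

# Hypothetical personal blacklist of scraper domains.
BLACKLIST = {"scraped-answers.example", "copycat-qa.example"}

def filter_results(result_urls: list[str], blacklist: set[str]) -> list[str]:
    """Drop results whose hostname (or a parent domain) is on the blacklist."""
    kept = []
    for url in result_urls:
        host = urlparse(url).hostname or ""
        if not any(host == domain or host.endswith("." + domain) for domain in blacklist):
            kept.append(url)
    return kept

print(filter_results(
    ["https://stackoverflow.com/q/123", "https://www.scraped-answers.example/q/123"],
    BLACKLIST))  # only the stackoverflow.com result survives
```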
@Scott Willeke, Thanks! I installed that blacklist plugin. I've been wanting such an extension for some time: https://chrome.google.com/extensions/detail/ddgjlkmkllmpdhegaliddgplookikmjf
I want to acknowledge Google as an innovative company that almost single-handedly made the world wide web useful. As of last year they'd crawled over 1 trillion unique URLs, an astounding amount of noise to sift through. I admire their engineering ethos and feel their business largely adheres to "don't be evil".
That said, there is a real, serious problem with result quality. Google is a victim of its own success. The ecosystem they created is so profitable that it requires Google to spend inordinate time (possibly 50% of engineering?) keeping webmasters honest. Pick your metaphor -- traders gaming the stock market or bacteria growing antibiotic resistant -- bad websites are out-evolving Google.
Ranking knowledge has become ubiquitous, and sadly knowledge of gaming an engine has become more important to content sites than writing valid, expert content. It's not just the spammers, malware sites, and scraper sites writing worthless keyword-stuffed content and buying links.
Google also made a deal with the McContent devil, Demand Media: http://techcrunch.com/2009/12/13/the-end-of-hand-crafted-content/
Demand buys up search queries and pays writers a paltry sum (dollars) to write poorly researched content on subject areas they often have little to no experience in. Demand makes a few ad dollars per article, with traffic driven exclusively by search (I've never met anyone who goes directly to eHow.com to browse). In turn, Google takes a cut of AdWords dollars. In the short run, Google's bottom line looks better, especially on YouTube, a site they've had trouble monetizing. Demand runs eHow, but you'll find equally vapid content on Q&A sites like Wikia and Yahoo Answers.
Google needs to respond or their flagship search will suffer. The solution will be complex and multifaceted. In addition to small, incremental changes, I think Google will need to make some seismic ones. Google will face cries of injustice from "content producers" in the gray areas, but they need to stand tough.
I run the SEO program at a large US media organization, NPR. From the beginning, we've stayed above board - fixing coding issues, working on syndication, and training our writers on the very basics. We write first for humans. That ensures that Google crawls us adequately, but we do lose traffic to sites that out-SEO us, legitimately or otherwise. My long view is that the current state of search is not sustainable, and any effort we spend beyond the basics comes at the expense of other products we could build.
It's easy for my organization to take this tack, however, because we're a well-known brand and we can focus on other channels, such as social media and viral sites. Content producers should think about the tradeoffs they make when going for broke on SEO -- it's impossible to quantify the traffic you don't get from Facebook/Twitter when you water down your content.