Prevent duplicate content by blocking archives from search engines
Why would you want to block archive pages anyway? The answer is to prevent Google from penalizing your blog for duplicate content. Archive pages have no content of their own. What you see on them is duplicated from the individual post pages. If both the individual post page and the archive page carrying the same content are indexed by Google, that constitutes duplicate content.
That’s exactly the case for this blog: duplicate content due to archive pages. And I paid the price for it during the last Google PageRank update, when this blog’s PR went down from 4 to 3.
The problem probably started when I added the Blogger Archive widget to the sidebar. This widget gave web crawlers access to the archive pages, which eventually led to those pages being indexed by Google.
Preventing duplicate content
You can prevent duplicate content by telling search engines not to index archive pages. This can be achieved by adding a “noindex” robots meta tag to the archive pages (and archive pages only).
Here’s how:
- Log in to your Blogger account.
- Go to Dashboard > Design > Edit HTML.
- Find the <head> tag and add the following code below it:
<b:if cond='data:blog.pageType == "archive"'>
  <meta content='noindex,noarchive' name='robots'/>
</b:if>
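To confirm that the tag really ends up on archive pages and nowhere else, the page source can be checked for a robots meta tag. The following is a minimal Python sketch of such a check, not part of the Blogger setup itself. The two HTML strings are made-up stand-ins for a saved archive page and a saved post page, and the helper name robots_directives is only an illustration.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            # "noindex,noarchive" -> {"noindex", "noarchive"}
            self.directives.update(
                part.strip().lower() for part in content.split(",") if part.strip()
            )


def robots_directives(html):
    """Return the set of robots directives found in the given HTML source."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


# Hypothetical page sources: an archive page carrying the tag, a post page without it.
archive_html = (
    "<html><head><meta content='noindex,noarchive' name='robots'/></head>"
    "<body></body></html>"
)
post_html = "<html><head><title>A post</title></head><body></body></html>"

print("archive page:", sorted(robots_directives(archive_html)))
print("post page:", sorted(robots_directives(post_html)))
```

In practice you would paste in the source of one of your own archive pages (View Source in the browser) in place of the sample strings.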
What happens to the archive pages already listed in SERPs?
The archive pages will eventually drop out of the search result pages. However, if you want them gone quickly, remove them from the SERPs using the URL removal tool.
94 comments to "Prevent duplicate content by blocking archives from search engines"
Hi Greenlava..
This is very useful. I have found duplicated content so many times, thank you for sharing this!
Wait.. doesn't putting "NOINDEX" mean your blog will not be indexed by Google???
Greetings..
I once wrote an entry about this, because I had read on a Western forum that our entries would last longer if we turned off the blog archive.. I didn't know for sure and didn't dare do it, until I read this entry of yours. Thank you, bro..
http://www.zulkbo.com/2011/01/no-archive-perlukah-tip-blogging.html
Great tip! Used it right away on our village newspaper !
@Yana
This code will only prevent indexing of archives pages. The rest of the blog will be indexed as usual.
@zulkbo
Zulkbo's method works too. But if you want to use the archive widget, you need to turn archives on and then apply the meta tag as shown above.
@Greenlava: Are you sure? What's the difference between them? I'm still confused, your meta tag contains robots and NOINDEX. You know, I asked on the Google webmaster forum about that "noindex" robots meta tag, and John from Google said I shouldn't add a robots meta tag with "NOINDEX".. so I removed it and replaced it immediately...
Please let me know which one is right? :S
Thanks in advance!
I see. I never knew about that. This is new info to me.
I just added this code to my blog. How long does it take to activate? Google still shows my blog's archive pages. It's a really good trick. Many thanks!
@Yana
Yes, I'm sure. The robots meta tag is wrapped in an archive-page conditional tag, so it won't affect other pages.
@Chris
The code takes effect instantly, but the SERPs will only update the next time Googlebot crawls those pages.
If you're in a hurry, use the URL removal tool.
Removing archive pages manually is a hard process. Never mind, thanks for your reply!
Does this affect your Alexa page rank? I added it and my rank has been dropping. I am going to remove it and see what happens. I will keep you posted.
Thanks for this useful tip!
Best Regards
Classier Corn
Thanks Greenlava For Tips :)
any idea how to use this on wordpress?
@Kun
Install All in One SEO plugin. You then simply tick the "Use noindex for Archives" checkbox.
If you use a table of contents, wouldn't it be the same thing as the archive?
Can you use this on a static page for a table of contents as well?
@Hannah
No, it wouldn't be the same thing. To the search engine, your table of contents IS unique. (And even if it isn't, the table of contents is just one page, so it won't harm your site the way archives do.)
This Is A GREAT Blog...
Nice and helpful tutorial. I am going to use this. Thanks for share.
Thanks for sharing nice tutorial. Keep it up dude
Thanks for this tip. I have been wondering how to prevent Google from indexing the archives.
I was sceptical when I saw this issue mentioned in the help-forum first, but recently saw evidence of the very problem, and just applied your fix to the problem blog now.
Isn't it rather crazy that we have to do this? It seems odd that Google manages to tell itself not to index based on Labels, but we have to tell it not to do Archive.
Same experience here, I encountered this problem too.. duplicate content for my articles
Thanks Greenlava For Tips :)
I use all in one SEO pack for this in my wordpress blog
It's absolutely incredible sharing, and honestly, before reading your post I was unaware of this. Now, after reading your post, I can say that I have good knowledge about it.
Does this affect your Alexa page rank? I added it and my rank has been dropping. I am going to remove it and see what happens. I will keep you posted.
@Garold walker
No, it doesn't affect Alexa.
Thank you for this info, but I am confused because Google says the following on their site: "Google no longer recommends blocking crawler access to duplicate content on your website, whether with a robots.txt file or other methods. If search engines can't crawl pages with duplicate content, they can't automatically detect that these URLs point to the same content and will therefore effectively have to treat them as separate, unique pages. A better solution is to allow search engines to crawl these URLs, but mark them as duplicates by using the rel="canonical" link element, the URL parameter handling tool, or 301 redirects. In cases where duplicate content leads to us crawling too much of your website, you can also adjust the crawl rate setting in Webmaster Tools."
Doesn't this mean that we should handle the noindex / noarchive of the Archive pages in this fashion?
@LC Hunt
I believe the excerpt refers to blocking crawlers from accessing a web page using "Disallow" in robots.txt, as explained here. This method is discouraged because it doesn't really prevent the duplicate page from being indexed (i.e. appearing in search results). You don't need to worry about robots.txt anyway, since Blogger doesn't give you access to it.
The noindex meta tag, on the other hand, allows the bots to crawl the page but forces them not to index it. The page never enters the search engine's index, hence no duplication takes place.
As for rel="canonical", the canonical link only points to the original page. It doesn't promise to do anything. As I understand it, the duplicate page will still get indexed, and it may or may not appear in search results. Besides, it's nearly impossible to implement rel="canonical" in this particular case because of Blogger's limitations.
So in the end, I believe the noindex meta tag is the best tool for the job.
Found this article very useful. I am going to use this for my blog.
Great one, I will use it. My blog is still new and Google has never visited it..
I followed your steps and completed them successfully. I hope it will prevent the duplicate content problem on my blog. Today I also implemented the post-title-before-blog-title code on my blog.
Greetings greenlarva.. Nice info. I avoid duplicate content too, and this is indeed the way to do it. Anyway, I simply switched off the archives function and use another approach.
As an extra, in some situations you can also add rel="canonical" ;)
nice nice dude.
I have followed your instructions and now I can see that the number of my pages in Google's index has dropped drastically. Still, I am happy that only quality links to my blog will appear in the search engine. Thank you.
This is such helpful information. THANK YOU for writing this and sharing your knowledge with the world. You just made my day. :) Keep up the great work and have a great day!
Thanks for the tip. I have added this code in my blog.
Thanks for the tip. Well-written and helpful.
Thanks for this tip. I never knew that widgets could affect the SERP like that.
Great help. I'd been wondering why people kept hitting up my archive pages instead of my main post pages from searches, and this should solve that issue :) Many thanks!
Thanks for the tips. I hope that after implementing this trick, the index will not show my archive anymore.
On my site todayprice.in I put the code in my HTML, but archives are still appearing in the search engine.
Thank you for the useful and easy-to-digest information.
This is a good suggestion. It prevents our site from being downgraded.
Hi, I've wondered about this issue for quite some time. I didn't want to remove my Blog Archives widget and make the blog hard to browse. Thanks for a great tip! I'm putting this to the test right now :)
Thank you sir, Now I will just give it a test to see what will happen.
How do I block search engines from indexing labels?
@Sagar Nargolka
Label-search pages are blocked by Blogger by default in robots.txt
Hi, I really need your help. I want to keep several pages on my Blogger blog from being crawled and indexed by robots. Could you give me some tips about this?
@napnipnop
Use this for each page you don't want to index:
<b:if cond='data:blog.url == "PUT_PAGE_URL_HERE"'>
  <meta content='noindex,noarchive' name='robots'/>
</b:if>
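If there are many pages to exclude, writing one conditional per URL by hand gets tedious. As a rough sketch, a small Python helper could generate the blocks from a list of URLs. The function name noindex_blocks and the example URLs are hypothetical, and the output still has to be pasted inside the head tag of the template manually.

```python
from xml.sax.saxutils import escape

# One Blogger conditional per page, in the same shape as the snippet above.
TEMPLATE = (
    "<b:if cond='data:blog.url == \"{url}\"'>\n"
    "  <meta content='noindex,noarchive' name='robots'/>\n"
    "</b:if>"
)


def noindex_blocks(urls):
    """Build one noindex conditional for each URL in the list."""
    blocks = []
    for url in urls:
        # Quotes would break the attribute quoting, so refuse them outright.
        if '"' in url or "'" in url:
            raise ValueError(f"quote characters are not supported in URL: {url!r}")
        # escape() turns &, < and > into entities so the template stays valid XML.
        blocks.append(TEMPLATE.format(url=escape(url)))
    return "\n".join(blocks)


# Hypothetical page URLs, for illustration only.
pages = [
    "http://example.blogspot.com/p/private-page.html",
    "http://example.blogspot.com/2012/05/old-post.html",
]
print(noindex_blocks(pages))
```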
Thanks Greenlava, but how do I block label pages from search engines?
@nptechs.blogspot.com
Label-search pages are already blocked via Blogger's robots.txt by default.
@Greenlava if you think so, look at my blog: the label search pages there are indexed by Google. See this.
@nptechs.blogspot.com
This is your robots.txt:
User-agent: *
Allow: /
which is different from Blogger's default:
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Allow: /
It looks like you're using a custom robots.txt (via Settings > Search Preferences > Crawlers and indexing).
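The difference between the two files can be checked with Python's standard urllib.robotparser module. This is only a sketch: the blog URLs are made up, and Python's parser applies the first matching rule, which may differ from how Google resolves conflicting rules in other cases, although for these two files the outcome should be the same.

```python
from urllib.robotparser import RobotFileParser

# The Blogger default quoted above.
BLOGGER_DEFAULT = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /
"""

# The custom file quoted above.
CUSTOM = """\
User-agent: *
Allow: /
"""


def can_crawl(robots_txt, url, agent="Googlebot"):
    """Return True if the given robots.txt text lets the agent fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Hypothetical URLs: a label-search page and an ordinary post page.
label_url = "http://example.blogspot.com/search/label/seo"
post_url = "http://example.blogspot.com/2011/01/some-post.html"

for name, rules in (("default", BLOGGER_DEFAULT), ("custom", CUSTOM)):
    print(name, "label page:", can_crawl(rules, label_url),
          "post page:", can_crawl(rules, post_url))
```

With the default file the label-search page is blocked while the post page stays crawlable. With the custom file both are open to crawlers, which is consistent with the label pages showing up in the index.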
Hey Greenlava,
Will this help hide the archive from being shown instead of individual posts on google search?
e.g. http://i.imgur.com/DAVqO.jpg
@Anonymous
Yes exactly. Once you add the meta tag, those monthly archive links will be removed (albeit not immediately) from search results.
^ great! thank you very much.
Excellent article. It seems I have been penalized by Google on account of having archives in the search index. Now I have removed them.. Hoping to see my traffic rise again :-) Earlier I associated this with a Penguin/Panda penalty.. little worries BTW
This is exactly what I was looking for.
Thanks for your tip.
Hi,
I understand the point very well: archive pages cause duplicate content to be indexed by search engines, and we can eliminate this problem with noindex. But I have another question about this. Is it related to PR in any way? Will it cause a decrease or an increase in PR?
@Adam from KeywordLuv
No-index has no effect on PR.
Nice Article.Thank you for sharing.
I found out that the monthly archive pages are my top pages in Google's index, so I'm going to use this for better SEO. It's a pity that Blogger templates don't do this automatically, as the monthly archives don't seem to generate much traffic in Google Analytics.
Awesome.. so useful.. thanks ^^
Hello!
At SETTINGS> SEARCH PREFERENCES> CUSTOM HEADER TAGS,
I already ticked 'noindex' at "Archive and Search pages"..
I also entered the code you posted.
will it have conflicts?
Thanks
@carlo
No, it will not conflict.
Why aren't you using noindex, noarchive yourself?? I looked at your archive pages but couldn't find either.
@Dracuula
I am using them, but with another method: Custom robots header tags, which are accessible via Settings > Search preferences.
Blogger introduced this feature in March 2012. It basically does the same thing, so you can use either method.
hi greenlava. Is it necessary to put this piece of code right after the head tag ?
Can i move this code a little bit down within the head tag ?
plz reply
@AbHi Shek
You can place it anywhere within the head tag.
I hope this will work for me. Thanks!!!
You suggested 'noarchive' in the robots tag. What does noarchive mean? Is it necessary?
@Nina Octoviana
It will prevent Google from showing a cached copy of your archive pages in search results.
This is a very nice tool for webmasters. Duplicate content is a big issue in search engines. I have applied this to my blog. Thanks a lot for sharing this.
Thanks much. This post has been very helpful for me and my Blogger blog.
Excellent :D thanks
Awesome, this is what I was looking for. Luckily not many of my archive pages have been indexed by Google yet (only two of them). Thanks for the nice info.
Hello,
I have the same duplicate issue under HTML Suggestions in Webmaster Tools. I am also using Blogger, but in my case the duplicate tag issue is created automatically by a parameter at the end of my post URLs, for example:
christmas-twister-2012.html?m=0
christmas-twister-2012.html?m=1
When I went to configure the URL parameters, I was shocked to find the m parameter already added in Webmaster Tools and set to let Google decide, with no option to delete it. Could you please tell me the best setting for the "m" parameter under URL Parameters in Webmaster Tools? Can I set it to "No URL"?
I am really confused. What should I do now, and what is the best setting to remove all these duplicates?
Please assist me...
Sania
@Anonymous
Yes setting it to "No URLs" should solve the duplicate issue.
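The two URLs in the question differ only in the m query parameter, which is why they are reported as duplicates of the same post. A minimal Python sketch of that comparison follows. The host name and the /2012/12/ path are assumptions added for illustration, and strip_mobile_param is a hypothetical helper, not anything provided by Blogger or Webmaster Tools.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_mobile_param(url):
    """Return the URL with any "m" query parameter removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "m"
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


# Hypothetical full URLs built around the file names from the comment.
urls = [
    "http://example.blogspot.com/2012/12/christmas-twister-2012.html?m=0",
    "http://example.blogspot.com/2012/12/christmas-twister-2012.html?m=1",
]

# Both variants collapse to a single address once "m" is ignored.
print({strip_mobile_param(u) for u in urls})
```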
hello,
Thank you for sharing. I really need this information. Yesterday, when I checked my URL in Google Webmaster Tools, it said I have 15 duplicate descriptions and 32 duplicate titles. All these duplicates are because of labels and archives. Let me copy the code and paste it into my template. Again, thank you for helping.
Could it be fixed if we choose not to display the post body on archive pages by using css??
@Waseem Rahmani
With CSS you only hide the content from humans. Search engine spiders will still be able to see it.
@Greenlava
What about using instead of on archive and label pages? The post.snippet tag brings in only a few characters of the blog post.
or you could also choose not to include either of and on archive pages and the result would be archive pages containing only links to the blogposts. Would it give any SEO advantage, I mean the link juice?
Sorry for all the questions. I'm quite new to Blogger, and advice from Blogger ninjas like you would be great. :)
Thanks a lot for sharing this info. I hope it helps.
Really great blog with all the seo optimized posts..
This site really help my blog to improve its visibility in search results...
How to prevent duplicate content by blocking label links?
I have a lot of label links indexed in google.
@Ankur Choudhary
To exclude Archive and Label pages from index:
1. Go to Settings > Search Preferences > Crawlers and Indexing > Custom robots header tags.
2. Click Edit and enable it by checking "Yes" radio button.
3. Check 'noindex' and 'noarchive' checkboxes under "Archive and Search pages".
Hello Greenlava, first, thanks for your helpful article for newbies like me. I used it successfully on my blog. Now I have another question: how do I block my labels from search engines? I found that indexed label pages are not helpful for Google.
@Shariful Islam Razu
Read reply #90.
Very informative and helpful.
Thanks for sharing.
Shariful Islam says that label pages indexed from Blogger aren't good for Google. What's your take on that, GREENLAVA? I will go with your recommendation. I have already excluded archives and labels from being indexed by Google, following the approach you gave in comment number 90, but I still want to make sure that indexed labels really aren't good for Google.
Kindly share your thoughts on this one and I AM YOUR FAN FOREVER!!!
One last thing: I applied your recommendation from comment number 90 for excluding archives and labels, although Settings > Search Preferences > Crawlers and Indexing > Custom robots header tags was disabled before that.
I just want to know whether I need to check any other boxes in there or leave everything else as it is. Please don't mind, I just don't know about all these things, being a newbie :)
@Arshad Amin
Indexed label pages add more of your pages to the SERPs. However, they risk duplicate content and add extra steps for searchers before they reach the post.
With labels page: SERP > label page > scroll and find > the post
Without labels page: SERP > the post
Just check the 'noindex' and 'noarchive' checkboxes under "Archive and Search pages". Don't touch anything else.
Comments on this post are closed.