
Wednesday, May 9, 2012

40 Very Important Factors On SEO

Here is a list of the most influential factors in SEO, sorted by priority. I found it on another site.

1 * HIGH * Title tag – description of the website
2 * HIGH * Domain name
3 * HIGH * H1 tag, or first headline of the document content
4 * MID * Description meta tag
5 * MID * Keyword meta tag
6 * MID * Body text in italics
7 * MID * Body text in bold
8 * MID * Generic body text
9 * LOW * Keyword density of 5–20%
10 * HIGH * Latent Semantic Indexing – related words on the topic
11 * MID * Sub-headlines (H2, H3, etc.)
12 * MID * Phrase order within the page
13 * MID * Proximity of keywords to each other
14 * MID * Font size +2 for sub-topics
15 * MID * Keywords within the alt text of image descriptions
16 * HIGH * Keyword early within the page
17 * HIGH * Keyword in links to other pages (on or off site)
18 * HIGH * Quality of the other sites you link to
19 * HIGH * Topic of the other sites you link to
20 * HIGH * Tree-like navigation structure
21 * MID * Internal links being valid
22 * HIGH * Number of links on the page itself (fewer is better)
23 * MID * Domain names you link to (.gov is best, then .edu, .org, etc.)
24 * LOW * Web page size (e.g. under 100 KB)
25 * HIGH * Hyphens in domain or file names (more than 4 is bad)
26 * MID * Page changes – frequently updated pages are preferred
27 * HIGH * Domain age
28 * MID * Age of the page itself
29 * MID * Sites with more internal pages (e.g. over 100 internal pages)
30 * MID * Page theme
31 * MID * Frequency of the updates themselves
32 * HIGH * Interesting title tag – gets more SERP clicks than another
33 * MID * Appropriate links between the page and other pages
34 * LOW * Having a robots.txt file
35 * LOW * Stating a physical address (trust)
36 * LOW * Stating a support email address
37 * LOW * Describing every image
38 * LOW * Naming the images themselves thematically
39 * MID * Keyword in the name of the page itself
40 * MID * Document within a related folder or subdomain

Read more @ bestblackhatforum.com: 40 Very Important Factors On SEO – http://bestblackhatforum.com/Thread-40-Very-Important-Factors-On-SEO

Friday, February 10, 2012

What is Robots.txt


It is great when search engines frequently visit your site and index your content, but there are often cases where indexing parts of your online content is not what you want. For instance, if you have two versions of a page (one for viewing in the browser and one for printing), you would rather have the printing version excluded from crawling; otherwise you risk a duplicate-content penalty. Also, if you happen to have sensitive data on your site that you do not want the world to see, you will prefer that search engines not index those pages (although in that case the only sure way to keep sensitive data out of the index is to keep it offline, on a separate machine). Additionally, if you want to save some bandwidth by excluding images, stylesheets, and JavaScript from indexing, you also need a way to tell spiders to keep away from those items.

One way to tell search engines which files and folders on your Web site to avoid is the Robots meta tag. But since not all search engines read meta tags, the Robots meta tag can simply go unnoticed. A better way to inform search engines of your wishes is to use a robots.txt file.
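For reference, the page-level alternative mentioned above is a single line placed in the page's head section; "noindex" and "nofollow" are the standard values for keeping a page out of the index and its links unfollowed:

```
<head>
  <!-- Ask robots not to index this page and not to follow its links -->
  <meta name="robots" content="noindex, nofollow">
</head>
```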

What Is Robots.txt?

Robots.txt is a text (not HTML) file you put on your site to tell search robots which pages you would like them not to visit. Robots.txt is by no means mandatory for search engines, but generally search engines obey what they are asked not to do. It is important to clarify that robots.txt is not a way of preventing search engines from crawling your site (i.e. it is not a firewall or a kind of password protection); putting up a robots.txt file is something like putting a note “Please, do not enter” on an unlocked door – you cannot prevent thieves from coming in, but the good guys will not open the door and enter. That is why, if you have really sensitive data, it is too naïve to rely on robots.txt to protect it from being indexed and displayed in search results.

The location of robots.txt is very important. It must be in the main directory because otherwise user agents (search engines) will not be able to find it – they do not search the whole site for a file named robots.txt. Instead, they look first in the main directory (i.e. http://mydomain.com/robots.txt) and if they don't find it there, they simply assume that this site does not have a robots.txt file and therefore they index everything they find along the way. So, if you don't put robots.txt in the right place, do not be surprised that search engines index your whole site.

The concept and structure of robots.txt were developed more than a decade ago. If you are interested in learning more about it, visit http://www.robotstxt.org/ or go straight to the Standard for Robot Exclusion, because this article deals only with the most important aspects of a robots.txt file. Next, we will continue with the structure of a robots.txt file.

Structure of a Robots.txt File

The structure of a robots.txt file is pretty simple (and barely flexible) – it is a list of user agents and of the files and directories disallowed to them. Basically, the syntax is as follows:

User-agent:

Disallow:

“User-agent:” names the search engine crawler a record applies to, and “Disallow:” lists the files and directories to be excluded from indexing. In addition to “User-agent:” and “Disallow:” entries, you can include comment lines – just put the # sign at the beginning of the line:

# All user agents are disallowed to see the /temp directory.

User-agent: *

Disallow: /temp/
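A few more common patterns, built from the same two standard directives (the crawler name "BadBot" below is just a placeholder for whichever robot you want to exclude):

```
# Block every crawler from the whole site
User-agent: *
Disallow: /

# Allow every crawler everywhere (an empty Disallow disallows nothing)
User-agent: *
Disallow:

# Block one specific crawler from the whole site
User-agent: BadBot
Disallow: /
```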

The Traps of a Robots.txt File

When you start making more complicated files – i.e. you decide to allow different user agents access to different directories – problems can arise if you do not pay special attention to the traps of a robots.txt file. Common mistakes include typos and contradictory directives. Typos are misspelled user agents or directories, missing colons after User-agent and Disallow, and so on. Typos can be tricky to find, but in some cases validation tools help.

The more serious problem is with logical errors. For instance:

User-agent: *

Disallow: /temp/

User-agent: Googlebot

Disallow: /images/

Disallow: /temp/

Disallow: /cgi-bin/

The above example is from a robots.txt that allows all agents to access everything on the site except the /temp directory. Up to here it is fine, but further down there is another record that specifies more restrictive terms for Googlebot. What happens depends on the crawler: under the Standard for Robot Exclusion, a robot should obey only the single record that names it specifically, falling back to the “*” record otherwise – the two records do not combine. But a robot that naively stops at the first matching record will apply the “*” rules and index everything except /temp/ – including /images/ and /cgi-bin/, which you thought you had told it not to touch. You see, the structure of a robots.txt file is simple, but serious mistakes can still be made easily.
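You can check how a given parser resolves such overlapping records with Python's standard-library urllib.robotparser. Note that this parser follows the standard and applies the specific Googlebot record rather than simply stopping at the first match (the file paths below are made up for illustration):

```python
import urllib.robotparser

# The contradictory robots.txt from the example above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /temp/

User-agent: Googlebot
Disallow: /images/
Disallow: /temp/
Disallow: /cgi-bin/
"""

parser = urllib.robotparser.RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Googlebot matches its own, more restrictive record...
print(parser.can_fetch("Googlebot", "/images/pic.png"))      # False
print(parser.can_fetch("Googlebot", "/docs/page.html"))      # True

# ...while every other crawler falls back to the "*" record.
print(parser.can_fetch("SomeOtherBot", "/images/pic.png"))   # True
print(parser.can_fetch("SomeOtherBot", "/temp/file.html"))   # False
```

Testing your file this way before uploading it is a cheap insurance against the logical errors described above.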
----------------------------------------------------------------------------------
Article Resource:http://www.webconfs.com/what-is-robots-txt-article-12.php
----------------------------------------------------------------------------------