How to Create the Perfect Robots.txt File: How Do I Prevent Robots from Scanning My Site?


That’s why the technique I’m going to tell you about today is one of my absolute favorites. It’s a legitimate SEO hack that you can start using right away.

It’s a way to improve your SEO by taking advantage of a natural part of almost every website that rarely gets talked about. It’s not difficult to implement either.

It’s the robots.txt file (also called the robots exclusion protocol or standard).

This teeny tiny text file is part of almost every website on the Internet, but most people don’t even know about it.

It’s designed to work with search engines, but surprisingly, it’s a source of SEO juice just waiting to be unlocked.

I’ve seen client after client bend over backward trying to enhance their SEO. When I tell them that they can edit a little text file, they almost don’t believe me.

However, there are many methods of enhancing SEO that aren’t difficult or time-consuming, and this is one of them.

You don’t need to have any technical experience to leverage the power of robots.txt. If you can find the source code for your website, you can use this.

So when you’re ready, follow along with me, and I’ll show you exactly how to change up your robots.txt file so that search engines will love it.

Why the robots.txt file is important

First, let’s take a look at why the robots.txt file matters in the first place.

The robots.txt file, also known as the robots exclusion protocol or standard, is a text file that tells web robots (most often search engines) which pages on your site to crawl.

It also tells web robots which pages not to crawl.

Let’s say a search engine is about to visit a site. Before it visits the target page, it will check the robots.txt for instructions.

There are different types of robots.txt files, so let’s look at a few different examples of what they look like.

Let’s say the search engine finds a robots.txt file with just two lines: “User-agent: *” followed by “Disallow: /”.

This is the basic skeleton of a robots.txt file.

The asterisk after “user-agent” means that the robots.txt file applies to all web robots that visit the site.

The slash after “Disallow” tells the robot not to visit any pages on the site.
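If you’d like to see that skeleton in action without touching a live site, here’s a minimal sketch in Python, whose standard library ships a robots.txt parser (urllib.robotparser). The example.com URLs are placeholders I’ve picked for illustration, and Python’s parser is only an approximation of how any particular search engine reads the file.

```python
from urllib.robotparser import RobotFileParser

# The two-line skeleton: every robot ("*") is told to stay out of every path ("/").
SKELETON = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(SKELETON.splitlines())

# example.com is a placeholder domain, not a site from this article.
home_allowed = parser.can_fetch("Googlebot", "https://example.com/")
post_allowed = parser.can_fetch("Googlebot", "https://example.com/blog/post")

print(home_allowed, post_allowed)  # prints: False False
```

Both checks come back False because the lone slash matches the start of every path on the site.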

You might be wondering why anyone would want to stop web robots from visiting their site.

After all, one of the major goals of SEO is to get search engines to crawl your site easily so they increase your ranking.

This is where the secret to this SEO hack comes in.

You probably have a lot of pages on your site, right? Even if you don’t think you do, go check. You might be surprised.

If a search engine crawls your site, it will try to crawl every single one of your pages.

And if you have a lot of pages, it will take the search engine bot a while to crawl them, which can have negative effects on your ranking.

That’s because Googlebot (Google’s search engine bot) has a “crawl budget.”

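This breaks down into two parts. The first is crawl rate limit, which caps how fast Googlebot fetches pages so that crawling doesn’t slow the site down for visitors.

The second part is crawl demand, which reflects how much Google wants to crawl your URLs based on how popular they are and how stale Google’s copy of them has become.

Basically, crawl budget is “the number of URLs Googlebot can and wants to crawl.”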

You want to help Googlebot spend its crawl budget for your site in the best way possible. In other words, it should be crawling your most valuable pages.

According to Google, having a lot of low-value URLs, such as duplicate content, soft error pages, and faceted navigation, will “negatively affect a site’s crawling and indexing.”

So let’s come back to robots.txt.

If you create the right robots.txt page, you can tell search engine bots (and especially Googlebot) to avoid certain pages.

Think about the implications. If you tell search engine bots to only crawl your most useful content, the bots will crawl and index your site based on that content alone.

As Google puts it:

“You don’t want your server to be overwhelmed by Google’s crawler or to waste crawl budget crawling unimportant or similar pages on your site.”

By using your robots.txt the right way, you can tell search engine bots to spend their crawl budgets wisely. And that’s what makes the robots.txt file so useful in an SEO context.

Intrigued by the power of robots.txt?

You should be! Let’s talk about how to find and use it.

Finding your robots.txt file

If you just want a quick look at your robots.txt file, there’s a super easy way to view it.

In fact, this method will work for any site. So you can peek on other sites’ files and see what they’re doing.

All you have to do is type the basic URL of the site into your browser’s address bar. Then add /robots.txt onto the end.
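If you ever want to do this for a whole list of sites, the same “basic URL plus /robots.txt” rule is easy to script. Here’s a small sketch in Python; robots_url is a helper name I made up, example.com is a placeholder, and the function assumes you pass a full URL that includes http:// or https://.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt address for the site that page_url belongs to.

    Assumes page_url includes the scheme (http:// or https://).
    """
    parts = urlsplit(page_url)
    # Keep only the scheme and host, then point at /robots.txt in the root.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_url("https://example.com/blog/post?page=2"))
# prints: https://example.com/robots.txt
```

The file always lives at the root of the host, so everything after the domain name is thrown away before /robots.txt is added.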

One of three situations will happen:

1) You’ll find a robots.txt file.

2) You’ll find an empty file.

For example, Disney seems to lack a robots.txt file.

3) You’ll get a 404.

Method returns a 404 for robots.txt.

Take a second and view your own site’s robots.txt file.

If you find an empty file or a 404, you’ll want to fix that.
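The three situations above can also be told apart in code. The sketch below is one way to do it in Python; classify and check are hypothetical helper names, and the labels “found”, “empty”, and “missing” are my own shorthand for the three cases. The check function makes a real network request, so treat it as illustrative.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def classify(status: int, body: str) -> str:
    """Map an HTTP status code and response body to one of the three situations."""
    if status == 404:
        return "missing"   # situation 3: you get a 404
    if status == 200 and not body.strip():
        return "empty"     # situation 2: the file exists but has nothing in it
    if status == 200:
        return "found"     # situation 1: a robots.txt file with content
    return "other"         # anything else (redirects, server errors, and so on)


def check(robots_txt_url: str) -> str:
    """Fetch a robots.txt URL and classify the result (performs a network request)."""
    try:
        with urlopen(robots_txt_url, timeout=10) as response:
            text = response.read().decode("utf-8", errors="replace")
            return classify(response.status, text)
    except HTTPError as error:
        # urlopen raises HTTPError for 4xx and 5xx responses.
        return classify(error.code, "")
```

Calling check with your own site’s robots.txt address tells you whether you have a file to edit or need to create one.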

If you do find a valid file, it’s probably set to default settings that were created when you made your site.

I especially like this method for looking at other sites’ robots.txt files. Once you learn the ins and outs of robots.txt, this can be a valuable exercise.

Now let’s look at actually changing your robots.txt file.

Locating your robots.txt file for editing

Your next steps are all going to depend on whether or not you have a robots.txt file. (Check if you do by using the method described above.)

If you don’t have a robots.txt file, you’ll need to create one from scratch. Open a plain text editor like Notepad (Windows) or TextEdit (Mac).

Only use a plain text editor for this. If you use programs like Microsoft Word, the program could insert additional code into the text.

If you have a robots.txt file, you’ll need to locate it in your site’s root directory.

If you’re not used to poking around in source code, then it might be a little difficult to locate the editable version of your robots.txt file.

Usually, you can find your root directory by going to your hosting account website, logging in, and heading to the file management or FTP section of your site.

You should see a list of your site’s files and folders.

Find your robots.txt file and open it for editing. Delete all of the text, but keep the file.

Note: If you’re using WordPress, you might see a robots.txt file when you visit its URL in your browser, but you won’t be able to find it in your files.

This is because WordPress creates a virtual robots.txt file if there’s no robots.txt in the root directory.

If this happens to you, you’ll need to create a new robots.txt file.

Creating a robots.txt file

You can create a new robots.txt file by using the plain text editor of your choice. (Remember, only use a plain text editor.)

If you already have a robots.txt file, make sure you’ve deleted the text (but not the file).

First, you’ll need to become familiar with some of the syntax used in a robots.txt file.

Google’s documentation has a nice explanation of the basic robots.txt terms, such as user-agent, disallow, allow, and sitemap.

I’m going to show you how to set up a simple robots.txt file, and then we’ll take a look at how to customize it for SEO.

Start by setting the user-agent term. We’re going to set it so that it applies to all web robots.

Do this by using an asterisk after the user-agent term, so the line reads “User-agent: *”.

Next, type “Disallow:” but don’t type anything after that.

Since there’s nothing after the disallow, web robots will be directed to crawl your entire site. Right now, everything on your site is fair game.

So far, your robots.txt file should contain two lines: “User-agent: *” followed by “Disallow:”.

I know it looks super simple, but these two lines are already doing a lot.
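To convince yourself that an empty Disallow really does open up the whole site, here’s a short sketch using Python’s built-in urllib.robotparser. The three paths are arbitrary examples I chose, and the parser is Python’s interpretation of the file rather than Google’s.

```python
from urllib.robotparser import RobotFileParser

# The two-line starter file: all robots, nothing disallowed.
ALLOW_ALL = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ALLOW_ALL.splitlines())

# Arbitrary sample paths; none of them is blocked by an empty Disallow.
sample_paths = ["/", "/blog/", "/wp-admin/"]
everything_allowed = all(parser.can_fetch("*", path) for path in sample_paths)

print(everything_allowed)  # prints: True
```

Compare that with “Disallow: /”, where a single extra character flips the result for every path.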

You can also link to your XML sitemap, but it’s not necessary. If you want to, add a line that starts with “Sitemap:” followed by the full URL of your sitemap.

Believe it or not, this is what a basic robots.txt file looks like.
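Here’s a quick sketch of that basic file with a sitemap line added, read back with Python’s urllib.robotparser. The sitemap address is a placeholder on example.com, and the site_maps() method needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser

# Placeholder sitemap URL; swap in the real address of your own sitemap.
BASIC_FILE = """\
User-agent: *
Disallow:

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(BASIC_FILE.splitlines())

# site_maps() returns the list of Sitemap URLs found in the file (Python 3.8+).
sitemaps = parser.site_maps()

print(sitemaps)  # prints: ['https://example.com/sitemap.xml']
```

The Sitemap line stands apart from the user-agent rules, so it doesn’t change which pages are allowed.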

Now let’s take it to the next level and turn this little file into an SEO booster.

Optimizing robots.txt for SEO

How you optimize robots.txt all depends on the content you have on your site. There are all kinds of ways to use robots.txt to your advantage.

I’ll go over some of the most common ways to use it.

(Keep in mind that you should not rely on robots.txt to keep pages out of search engines’ results. That’s a big no-no.)

One of the best uses of the robots.txt file is to maximize search engines’ crawl budgets by telling them not to crawl the parts of your site that aren’t displayed to the public.

For example, if you visit the robots.txt file for this site, you’ll see that it disallows the login page (wp-admin).

Since that page is just used for logging into the backend of the site, it wouldn’t make sense for search engine bots to waste their time crawling it.

(If you have WordPress, you can use that same exact disallow line: “Disallow: /wp-admin/”.)

You can use a similar directive (or command) to prevent bots from crawling specific pages. After the disallow, enter the part of the URL that comes after the .com. Put that between two forward slashes.

So if you want to tell a bot not to crawl a page whose URL ends in /page/, you can type “Disallow: /page/”.
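It helps to know that a disallow rule matches by prefix, which is why those two slashes matter. The sketch below uses Python’s urllib.robotparser with a made-up /page/ rule and a few made-up paths to show what gets caught; other crawlers may add their own extensions on top of this basic behavior.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule blocking one page (and anything nested under it).
RULES = """\
User-agent: *
Disallow: /page/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

# Made-up paths: two fall under /page/, two do not.
paths = ["/page/", "/page/comments/", "/pages/", "/about/"]
results = {path: parser.can_fetch("*", path) for path in paths}

print(results)
# prints: {'/page/': False, '/page/comments/': False, '/pages/': True, '/about/': True}
```

Notice that /pages/ stays crawlable because the closing slash in the rule stops it from matching similar-looking paths.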


You might be wondering specifically what types of pages to exclude from indexation. Here are a couple of common scenarios where that would happen:

Purposeful duplicate content. While duplicate content is mostly a bad thing, there are a handful of cases in which it’s necessary and acceptable.

For example, if you have a printer-friendly version of a page, you technically have duplicate content. In this case, you could tell bots not to crawl one of those versions (typically the printer-friendly version).

This is also handy if you’re split-testing pages that have the same content but different designs.

Thank you pages. The thank you page is one of the marketer’s favorite pages because it means a new lead.


As it turns out, some thank you pages are accessible through Google. That means people can access these pages without going through the lead capture process, and that’s bad news.

By blocking your thank you pages, you can make sure only qualified leads are seeing them.

So let’s say your thank you page lives at a path such as /thank-you/. In your robots.txt file, blocking that page would take a single line: “Disallow: /thank-you/”.

Since there are no universal rules for which pages to disallow, your robots.txt file will be unique to your site. Use your judgment here.
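Because every site ends up with its own list, you might find it handy to generate the file from a list of paths instead of typing it by hand. Here’s a rough sketch in Python; build_robots_txt is a hypothetical helper, and the /wp-admin/ and /thank-you/ paths are only examples of the kind of pages discussed above.

```python
from typing import Iterable, Optional


def build_robots_txt(disallowed_paths: Iterable[str], sitemap_url: Optional[str] = None) -> str:
    """Build robots.txt text that applies one set of Disallow rules to all robots."""
    lines = ["User-agent: *"]
    paths = list(disallowed_paths)
    if paths:
        lines.extend(f"Disallow: {path}" for path in paths)
    else:
        # An empty Disallow means nothing is blocked.
        lines.append("Disallow:")
    if sitemap_url:
        lines.extend(["", f"Sitemap: {sitemap_url}"])
    return "\n".join(lines) + "\n"


# Example paths only; use the pages that make sense for your own site.
robots_txt = build_robots_txt(["/wp-admin/", "/thank-you/"])
print(robots_txt)
```

Save the returned text as robots.txt in your root directory, and regenerate it whenever your list of pages changes.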

There are two other directives you should know: noindex and nofollow.

You know that disallow directive we’ve been using? It doesn’t actually prevent the page from being indexed.

So theoretically, you could disallow a page, but it could still end up in the index.

Generally, you don’t want that.

That’s why you need the noindex directive. It tells bots not to index a page. Older guides show a “Noindex:” line sitting next to the disallow line in robots.txt, but that line was never an official part of the standard, and Google stopped honoring it in September 2019.

If you have any pages that you don’t want indexed (like those precious thank you pages), put the noindex directive on the page itself, in a robots meta tag. Leave that page crawlable rather than disallowing it, because a bot that is blocked from the page never gets to read the tag.

Once the page has been recrawled, it won’t show up in the SERPs.

Finally, there’s the nofollow directive. This is actually the same as a nofollow link. In short, it tells web robots not to crawl the links on a page.

Like noindex, the nofollow directive isn’t part of the robots.txt file. It lives in the page’s own source code.

However, it is still instructing web robots, so it’s the same concept. The only difference is where it takes place.

Find the source code of the page you want to change, and make sure you’re in between the <head> tags.

Then paste this line: <meta name="robots" content="nofollow">

Make sure you’re not putting this line between any other tags, just the <head> tags.

This is another good option for thank you pages since web robots won’t crawl links to any lead magnets or other exclusive content.

If you want to add both noindex and nofollow directives, use this line of code: <meta name="robots" content="noindex,nofollow">

This will give web robots both directives at once.
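Once the tag is in place, it’s worth double-checking that it actually made it into the page. The sketch below uses Python’s built-in html.parser to pull the robots directives out of a page’s HTML; RobotsMetaParser is a class name I invented, and the sample page is a stand-in for your real thank you page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from any <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma-separated, e.g. "noindex, nofollow".
        for part in content.split(","):
            if part.strip():
                self.directives.add(part.strip().lower())


# Stand-in HTML for illustration; in practice you would feed in your page's source.
SAMPLE_PAGE = """<html><head><title>Thank you</title>
<meta name="robots" content="noindex, nofollow">
</head><body><p>Thanks for signing up!</p></body></html>"""

finder = RobotsMetaParser()
finder.feed(SAMPLE_PAGE)
directives = finder.directives

print(sorted(directives))  # prints: ['nofollow', 'noindex']
```

If the set comes back empty, the tag is missing or ended up somewhere the parser doesn’t recognize as a meta tag.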

Testing everything out

Finally, test your robots.txt file to make sure everything’s valid and operating the right way.

Google provides a free robots.txt tester as part of the Webmaster tools.

First, sign in to your Webmasters account by clicking “Sign In” on the top right corner.

Select your property (i.e., website) and click on “Crawl” in the left-hand sidebar.

You’ll see “robots.txt Tester.” Click on that.

If there’s any code in the box already, delete it and replace it with your new robots.txt file.

Click “Test” on the lower right part of the screen.

If the “Test” text changes to “Allowed,” that means your robots.txt is valid.

Google’s help pages have more information about the tool so you can learn what everything means in detail.
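If you’d like a second opinion before uploading, you can also run a quick local check. This sketch uses Python’s urllib.robotparser to compare a robots.txt file against a list of paths you expect to be allowed or blocked. The rules, the paths, and the failures helper are all illustrative assumptions, and Python’s parser applies the first matching rule and doesn’t understand wildcard patterns, so a complex file may behave differently for Googlebot.

```python
from urllib.robotparser import RobotFileParser

# Example file to test; replace with the text of your own robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /thank-you/
"""

# Path -> whether you expect robots to be allowed to crawl it (example values).
EXPECTED = {
    "/": True,
    "/blog/seo-tips/": True,
    "/wp-admin/": False,
    "/thank-you/": False,
}


def failures(robots_txt: str, expected: dict, agent: str = "Googlebot") -> list:
    """Return the paths whose allowed/blocked status differs from what was expected."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        path
        for path, should_allow in expected.items()
        if parser.can_fetch(agent, path) != should_allow
    ]


problems = failures(ROBOTS_TXT, EXPECTED)
print(problems)  # prints: []
```

An empty list means every path behaved the way you expected. Anything that shows up in the list is a rule worth a second look.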

Finally, upload your robots.txt to your root directory (or save it there if you already had one). You’re now armed with a powerful file, and you should see an increase in your search visibility.


I always love sharing little-known SEO “hacks” that can give you a real advantage in more ways than one.

By setting up your robots.txt file the right way, you’re not just enhancing your own SEO. You’re also helping out your visitors.

If search engine bots can spend their crawl budgets wisely, they’ll organize and display your content in the SERPs in the best way, which means you’ll be more visible.

It also doesn’t take a lot of effort to set up your robots.txt file. It’s mostly a one-time setup, and you can make little changes as needed.

Whether you’re starting your first or fifth site, using robots.txt can make a significant difference. I recommend giving it a spin if you haven’t done it before.