Cyletech
Member
Hello friends,
In this tutorial we will show you two methods for blocking bad robots with .htaccess.
Method 1 - Using RewriteRules
Alternate RewriteCond Rules
Method 2 - Using SetEnvIfNoCase
List of bad web robots
Translation: Alireza Eskandarpour
Source: askapache.com
Thanks to: Mr. Farmani
Method 1 - Using RewriteRules
Code (.htaccess):
ErrorDocument 403 /403.html
RewriteEngine On
RewriteBase /
# IF THE UA STARTS WITH THESE
RewriteCond %{HTTP_USER_AGENT} ^(aesop_com_spiderman|alexibot|backweb|bandit|batchftp|bigfoot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(black.?hole|blackwidow|blowfish|botalot|buddy|builtbottough|bullseye) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(cheesebot|cherrypicker|chinaclaw|collector|copier|copyrightcheck) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(cosmos|crescent|curl|custo|da|diibot|disco|dittospyder|dragonfly) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(drip|easydl|ebingbong|ecatch|eirgrabber|emailcollector|emailsiphon) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(emailwolf|erocrawler|exabot|eyenetie|filehound|flashget|flunky) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(frontpage|getright|getweb|go.?zilla|go-ahead-got-it|gotit|grabnet) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(grafula|harvest|hloader|hmview|httplib|httrack|humanlinks|ilsebot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(infonavirobot|infotekies|intelliseek|interget|iria|jennybot|jetcar) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(joc|justview|jyxobot|kenjin|keyword|larbin|leechftp|lexibot|lftp|libweb) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(likse|linkscan|linkwalker|lnspiderguy|lwp|magnet|mag-net|markwatch) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(mata.?hari|memo|microsoft.?url|midown.?tool|miixpc|mirror|missigua) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(mister.?pix|moget|mozilla.?newt|nameprotect|navroad|backdoorbot|nearsite) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(net.?vampire|netants|netcraft|netmechanic|netspider|nextgensearchbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(attach|nicerspro|nimblecrawler|npbot|octopus|offline.?explorer) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(offline.?navigator|openfind|outfoxbot|pagegrabber|papa|pavuk) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(pcbrowser|php.?version.?tracker|pockey|propowerbot|prowebwalker) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(psbot|pump|queryn|recorder|realdownload|reaper|reget|true_robot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(repomonkey|rma|internetseer|sitesnagger|siphon|slysearch|smartdownload) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(snake|snapbot|snoopy|sogou|spacebison|spankbot|spanner|sqworm|superbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(superhttp|surfbot|asterias|suzuran|szukacz|takeout|teleport) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(telesoft|the.?intraformant|thenomad|tighttwatbot|titan|urldispatcher) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(turingos|turnitinbot|urly.?warning|vacuum|vci|voideye|whacker) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(libwww-perl|widow|wisenutbot|wwwoffle|xaldon|xenu|zeus|zyborg|anonymouse) [NC,OR]
# STARTS WITH WEB
RewriteCond %{HTTP_USER_AGENT} ^web(zip|emaile|enhancer|fetch|go.?is|auto|bandit|clip|copier|master|reaper|sauger|site.?quester|whack) [NC,OR]
# ANYWHERE IN UA -- GREEDY REGEX
RewriteCond %{HTTP_USER_AGENT} ^.*(craftbot|download|extract|stripper|sucker|ninja|clshttp|webspider|leacher|collector|grabber|webpictures).*$ [NC]
# ISSUE 403 / SERVE ERRORDOCUMENT
# "^" MATCHES EVERY REQUEST, INCLUDING THE EMPTY PATH OF THE SITE ROOT; [F] IMPLIES [L]
RewriteRule ^ - [F]
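One thing to watch with this rule set: the final RewriteRule also applies to the internal request Apache makes for /403.html, so a blocked client may receive the server's default 403 page instead of the custom one. A minimal sketch of one way around that, assuming the error page lives at /403.html as configured above; the three agent names are only a short sample taken from the list:

```apache
ErrorDocument 403 /403.html
RewriteEngine On
# Exempt the error page itself so the 403 response can still be served
RewriteCond %{REQUEST_URI} !^/403\.html$
# Sample of blocked agents (illustrative subset of the full list)
RewriteCond %{HTTP_USER_AGENT} ^(blackwidow|httrack|webzip) [NC]
RewriteRule ^ - [F]
```

Conditions without [OR] are ANDed, so the exemption applies to every blocked agent. A quick check from a shell could be `curl -I -A "HTTrack" http://example.com/`, where example.com stands in for your own site.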
Alternate RewriteCond Rules
Code (.htaccess):
RewriteEngine on
#Block spambots
RewriteCond %{HTTP:User-Agent} (?:Alexibot|Art-Online|asterias|BackDoorbot|Black.Hole|\
BlackWidow|BlowFish|botALot|BuiltbotTough|Bullseye|BunnySlippers|Cegbfeieh|Cheesebot|\
CherryPicker|ChinaClaw|CopyRightCheck|cosmos|Crescent|Custo|DISCo|DittoSpyder|DownloadsDemon|\
eCatch|EirGrabber|EmailCollector|EmailSiphon|EmailWolf|EroCrawler|Express\sWebPictures|ExtractorPro|\
EyeNetIE|FlashGet|Foobot|FrontPage|GetRight|GetWeb!|Go-Ahead-Got-It|Go!Zilla|GrabNet|Grafula|\
Harvest|hloader|HMView|httplib|HTTrack|humanlinks|Image\sStripper|Image\sSucker|Indy\sLibrary|\
InfonaviRobot|InterGET|Internet\sNinja|Jennybot|JetCar|JOC\sWeb\sSpider|Kenjin.Spider|Keyword.Density|\
larbin|LeechFTP|Lexibot|libWeb/clsHTTP|LinkextractorPro|LinkScan/8.1a.Unix|LinkWalker|lwp-trivial|\
Mass\sDownloader|Mata.Hari|Microsoft.URL|MIDown\stool|MIIxpc|Mister.PiX|Mister\sPiX|moget|\
Mozilla/3.Mozilla/2.01|Mozilla.*NEWT|Navroad|NearSite|NetAnts|NetMechanic|NetSpider|Net\sVampire|\
NetZIP|NICErsPRO|NPbot|Octopus|Offline.Explorer|Offline\sExplorer|Offline\sNavigator|Openfind|\
PageGrabber|Papa\sFoto|pavuk|pcBrowser|Program\sShareware\s1|ProPowerbot/2.14|ProWebWalker|\
psbot/0.1|QueryN.Metasearch|ReGet|RepoMonkey|RMA|SiteSnagger|SlySearch|SmartDownload|Spankbot|spanner|\
Superbot|SuperHTTP|Surfbot|suzuran|Szukacz/1.4|tAkeOut|Teleport|Teleport\sPro|Telesoft|The.Intraformant|\
TheNomad|TightTwatbot|Titan|toCrawl/UrlDispatcher|toCrawl/UrlDispatcher|True_Robot|turingos|\
Turnitinbot/1.5|URLy.Warning|VCI|VoidEYE|WebAuto|WebBandit|WebCopier|WebEMailExtrac.*|WebEnhancer|\
WebFetch|WebGo\sIS|Web.Image.Collector|Web\sImage\sCollector|WebLeacher|WebmasterWorldForumbot|\
WebReaper|WebSauger|Website\sExtractor|Website.Quester|Website\sQuester|Webster.Pro|WebStripper|\
Web\sSucker|WebWhacker|WebZip|Wget|Widow|[Ww]eb[Bb]andit|WWW-Collector-E|WWWOFFLE|\
Xaldon\sWebSpider|Xenu's|Zeus) [NC]
RewriteRule .? - [F]
Method 2 - Using SetEnvIfNoCase
Code (.htaccess):
ErrorDocument 403 /403.html
# IF THE UA CONTAINS ANY OF THESE (THE LEADING .* MATCHES ANYWHERE IN THE STRING)
SetEnvIfNoCase ^User-Agent$ .*(aesop_com_spiderman|alexibot|backweb|bandit|batchftp|bigfoot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(black.?hole|blackwidow|blowfish|botalot|buddy|builtbottough|bullseye) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(cheesebot|cherrypicker|chinaclaw|collector|copier|copyrightcheck) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(cosmos|crescent|curl|custo|da|diibot|disco|dittospyder|dragonfly) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(drip|easydl|ebingbong|ecatch|eirgrabber|emailcollector|emailsiphon) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(emailwolf|erocrawler|exabot|eyenetie|filehound|flashget|flunky) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(frontpage|getright|getweb|go.?zilla|go-ahead-got-it|gotit|grabnet) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(grafula|harvest|hloader|hmview|httplib|httrack|humanlinks|ilsebot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(infonavirobot|infotekies|intelliseek|interget|iria|jennybot|jetcar) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(joc|justview|jyxobot|kenjin|keyword|larbin|leechftp|lexibot|lftp|libweb) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(likse|linkscan|linkwalker|lnspiderguy|lwp|magnet|mag-net|markwatch) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(mata.?hari|memo|microsoft.?url|midown.?tool|miixpc|mirror|missigua) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(mister.?pix|moget|mozilla.?newt|nameprotect|navroad|backdoorbot|nearsite) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(net.?vampire|netants|netcraft|netmechanic|netspider|nextgensearchbot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(attach|nicerspro|nimblecrawler|npbot|octopus|offline.?explorer) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(offline.?navigator|openfind|outfoxbot|pagegrabber|papa|pavuk) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(pcbrowser|php.?version.?tracker|pockey|propowerbot|prowebwalker) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(psbot|pump|queryn|recorder|realdownload|reaper|reget|true_robot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(repomonkey|rma|internetseer|sitesnagger|siphon|slysearch|smartdownload) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(snake|snapbot|snoopy|sogou|spacebison|spankbot|spanner|sqworm|superbot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(superhttp|surfbot|asterias|suzuran|szukacz|takeout|teleport) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(telesoft|the.?intraformant|thenomad|tighttwatbot|titan|urldispatcher) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(turingos|turnitinbot|urly.?warning|vacuum|vci|voideye|whacker) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(widow|wisenutbot|wwwoffle|xaldon|xenu|zeus|zyborg|anonymouse) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*web(zip|emaile|enhancer|fetch|go.?is|auto|bandit|clip|copier|master|reaper|sauger|site.?quester|whack) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(craftbot|download|extract|stripper|sucker|ninja|clshttp|webspider|leacher|collector|grabber|webpictures) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(libwww-perl|aesop_com_spiderman) HTTP_SAFE_BADBOT
Deny from env=HTTP_SAFE_BADBOT
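The `Deny from env=` line is Apache 2.2 access-control syntax; on Apache 2.4 it only works when mod_access_compat is loaded. A rough sketch of the 2.4 equivalent, assuming mod_setenvif and mod_authz_core are available and using a shortened sample pattern in place of the full list:

```apache
ErrorDocument 403 /403.html
# Flag the request when the User-Agent contains any of these names (case-insensitive)
SetEnvIfNoCase User-Agent "(blackwidow|httrack|webzip)" HTTP_SAFE_BADBOT
# Allow everyone except requests carrying the flag
<RequireAll>
    Require all granted
    Require not env HTTP_SAFE_BADBOT
</RequireAll>
```

The variable name HTTP_SAFE_BADBOT is kept from the rules above, so the full set of SetEnvIfNoCase lines can be reused unchanged with this block.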
List of bad web robots
1. WebBandit
2. 2icommerce
3. Accoona
4. ActiveTouristBot
5. adressendeutschland
6. aipbot
7. Alexibot
8. Alligator
9. AllSubmitter
10. almaden
11. anarchie
12. Anonymous
13. Apexoo
14. Aqua_Products
15. asterias
16. ASSORT
17. ATHENS
18. AtHome
19. Atomz
20. attache
21. autoemailspider
22. autohttp
23. b2w
24. bew
25. BackDoorBot
26. Badass
27. Baiduspider
28. Baiduspider+
29. BecomeBot
30. berts
31. Bitacle
32. Biz360
33. Black.Hole
34. BlackWidow
35. bladder fusion
36. Blog Checker
37. BlogPeople
38. Blogshares Spiders
39. Bloodhound
40. BlowFish
41. Board Bot
42. Bookmark search tool
43. BotALot
44. BotRightHere
45. Bot mailto:[email protected]
46. Bropwers
47. Browsezilla
48. BuiltBotTough
49. Bullseye
50. BunnySlippers
51. Cegbfeieh
52. CFNetwork
53. CheeseBot
54. CherryPicker
55. Crescent
56. charlotte/
57. ChinaClaw
58. Convera
59. Copernic
60. CopyRightCheck
61. cosmos
62. Crescent
63. c-spider
64. curl
65. Custo
66. Cyberz
67. DataCha0s
68. Daum
69. Deweb
70. Digger
71. Digimarc
72. digout4uagent
73. DIIbot
74. DISCo
75. DittoSpyder
76. DnloadMage
77. Download
78. dragonfly
79. DreamPassport
80. DSurf
81. DTS Agent
82. dumbot
83. DynaWeb
84. e-collector
85. EasyDL
86. EBrowse
87. eCatch
88. ecollector
89. edgeio
90. [email protected]
91. EirGrabber
92. Email Extractor
93. EmailCollector
94. EmailSiphon
95. EmailWolf
96. EmeraldShield
97. Enterprise_Search
98. EroCrawler
99. ESurf
100. Eval
101. Everest-Vulcan
102. Exabot
103. Express
104. Extractor
105. ExtractorPro
106. EyeNetIE
107. FairAd
108. fastlwspider
109. fetch
110. FEZhead
111. FileHound
112. findlinks
113. Flaming AttackBot
114. FlashGet
115. FlickBot
116. Foobot
117. Forex
118. Franklin Locator
119. FreshDownload
120. FrontPage
121. FSurf
122. Gaisbot
123. Gamespy_Arcade
124. genieBot
125. GetBot
126. Getleft
127. GetRight
128. GetWeb!
129. Go!Zilla
130. Go-Ahead-Got-It
131. GOFORITBOT
132. GrabNet
133. Grafula
134. grub
135. Harvest
136. Hatena Antenna
137. heritrix
138. HLoader
139. HMView
140. holmes
141. HooWWWer
142. HouxouCrawler
143. HTTPGet
144. httplib
145. HTTPRetriever
146. HTTrack
147. humanlinks
148. IBM_Planetwide
149. iCCrawler
150. ichiro
151. iGetter
152. Image Stripper
153. Image Sucker
154. imagefetch
155. imds_monitor
156. IncyWincy
157. Industry Program
158. Indy
159. InetURL
160. InfoNaviRobot
161. InstallShield DigitalWizard
162. InterGET
163. IRLbot
164. Iron33
165. ISSpider
166. IUPUI Research Bot
167. Jakarta
168. java/
169. JBH Agent
170. JennyBot
171. JetCar
172. jeteye
173. jeteyebot
174. JoBo
175. JOC Web Spider
176. Kapere
177. Kenjin
178. Keyword Density
179. KRetrieve
180. ksoap
181. KWebGet
182. LapozzBot
183. larbin
184. leech
185. LeechFTP
186. LeechGet
187. leipzig.de
188. LexiBot
189. libWeb
190. libwww-FM
191. libwww-perl
192. LightningDownload
193. LinkextractorPro
194. Linkie
195. LinkScan
196. linktiger
197. LinkWalker
198. lmcrawler
199. LNSpiderguy
200. LocalcomBot
201. looksmart
202. LWP
203. Mac Finder
204. Mail Sweeper
205. mark.blonin
206. MaSagool
207. Mass
208. Mata Hari
209. MCspider
210. MetaProducts Download Express
211. Microsoft Data Access
212. Microsoft URL Control
213. MIDown
214. MIIxpc
215. Mirror
216. Missigua
217. Missouri College Browse
218. Mister
219. Monster
220. mkdb
221. moget
222. Moreoverbot
223. mothra/netscan
224. MovableType
225. Mozi!
226. Mozilla/22
227. Mozilla/3.0 (compatible)
228. Mozilla/5.0 (compatible; MSIE 5.0)
229. MSIE_6.0
230. MSIECrawler
231. MSProxy
232. MVAClient
233. MyFamilyBot
234. MyGetRight
235. nameprotect
236. NASA Search
237. Naver
238. Navroad
239. NearSite
240. NetAnts
241. netattache
242. NetCarta
243. NetMechanic
244. NetResearchServer
245. NetSpider
246. NetZIP
247. Net Vampire
248. NEWT ActiveX
249. Nextopia
250. NICErsPRO
251. ninja
252. NimbleCrawler
253. noxtrumbot
254. NPBot
255. Octopus
256. Offline
257. OK Mozilla
258. OmniExplorer
259. OpaL
260. Openbot
261. Openfind
262. OpenTextSiteCrawler
263. Oracle Ultra Search
264. OutfoxBot
265. P3P
266. PackRat
267. PageGrabber
268. PagmIEDownload
269. panscient
270. Papa Foto
271. pavuk
272. pcBrowser
273. perl
274. PerMan
275. PersonaPilot
276. PHP version
277. PlantyNet_WebRobot
278. playstarmusic
279. Plucker
280. Port Huron
281. Program Shareware
282. Progressive Download
283. ProPowerBot
284. prospector
285. ProWebWalker
286. Prozilla
287. psbot
288. psycheclone
289. puf
290. PushSite
291. PussyCat
292. PuxaRapido
293. Python-urllib
294. QuepasaCreep
295. QueryN
296. Radiation
297. RealDownload
298. RedCarpet
299. RedKernel
300. ReGet
301. relevantnoise
302. RepoMonkey
303. RMA
304. Rover
305. Rsync
306. RTG30
307. Rufus
308. SAPO
309. SBIder
310. scooter
311. ScoutAbout
312. script
313. searchpreview
314. searchterms
315. Seekbot
316. Serious
317. Shai
318. shelob
319. Shim-Crawler
320. SickleBot
321. sitecheck
322. SiteSnagger
323. Slurpy Verifier
324. SlySearch
325. SmartDownload
326. sna-
327. snagger
328. Snoopy
329. sogou
330. sootle
331. So-net
332. SpankBot
333. spanner
334. SpeedDownload
335. Spegla
336. Sphere
337. Sphider
338. SpiderBot
339. sproose
340. SQ Webscanner
341. Sqworm
342. Stamina
343. Stanford
344. studybot
345. SuperBot
346. SuperHTTP
347. Surfbot
348. SurfWalker
349. suzuran
350. Szukacz
351. tAkeOut
352. TALWinHttpClient
353. tarspider
354. Teleport
355. Telesoft
356. Templeton
357. TestBED
358. The Intraformant
359. TheNomad
360. TightTwatBot
361. Titan
362. toCrawl/UrlDispatcher
363. True_Robot
364. turingos
365. TurnitinBot
366. Twisted PageGetter
367. UCmore
368. UdmSearch
369. UMBC
370. UniversalFeedParser
371. URL Control
372. URLGetFile
373. URLy Warning
374. URL_Spider_Pro
375. UtilMind
376. vayala
377. vobsub
378. VCI
379. VoidEYE
380. VoilaBot
381. voyager
382. w3mir
383. Web Image Collector
384. Web Sucker
385. Web2WAP
386. WebaltBot
387. WebAuto
388. WebBandit
389. WebCapture
390. webcollage
391. WebCopier
392. WebCopy
393. WebEMailExtrac
394. WebEnhancer
395. WebFetch
396. WebFilter
397. WebFountain
398. WebGo
399. WebLeacher
400. WebMiner
401. WebMirror
402. WebReaper
403. WebSauger
404. WebSnake
405. Website
406. WebStripper
407. WebVac
408. webwalk
409. WebWhacker
410. WebZIP
411. Wells Search
412. WEP Search 00
413. WeRelateBot
414. Wget
415. WhosTalking
416. Widow
417. Wildsoft Surfer
418. WinHttpRequest
419. WinHTTrack
420. WUMPUS
421. WWWOFFLE
422. wwwster
423. WWW-Collector
424. Xaldon
425. Xenu's
426. Xenus
427. XGET
428. Y!TunnelPro
429. YahooYSMcm
430. YaDirectBot
431. Yeti
432. Zade
433. ZBot
434. zerxbot
435. Zeus
436. ZyBorg
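A name from this list can be added to either method by putting it into a pattern. A minimal sketch, assuming mod_rewrite is enabled; the three names are picked from the list purely as examples, and note that spaces and dots in a name need escaping:

```apache
RewriteEngine On
# \s stands for the space in "Flaming AttackBot"; \. keeps the dot in "leipzig.de" literal
RewriteCond %{HTTP_USER_AGENT} (Flaming\sAttackBot|leipzig\.de|Python-urllib) [NC]
RewriteRule ^ - [F]
```

Because the pattern has no ^ anchor, it matches the name anywhere in the User-Agent string, which is the same behaviour as the "anywhere in UA" rule in Method 1.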