Blocking bad bots with .htaccess

Cyletech

Hello friends,

In this tutorial we cover two methods for blocking bad bots with .htaccess.


Method 1 - Using RewriteRule (mod_rewrite)

Code:
ErrorDocument 403 /403.html
 
RewriteEngine On
RewriteBase /
 
# IF THE UA STARTS WITH THESE
RewriteCond %{HTTP_USER_AGENT} ^(aesop_com_spiderman|alexibot|backweb|bandit|batchftp|bigfoot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(black.?hole|blackwidow|blowfish|botalot|buddy|builtbottough|bullseye) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(cheesebot|cherrypicker|chinaclaw|collector|copier|copyrightcheck) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(cosmos|crescent|curl|custo|da|diibot|disco|dittospyder|dragonfly) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(drip|easydl|ebingbong|ecatch|eirgrabber|emailcollector|emailsiphon) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(emailwolf|erocrawler|exabot|eyenetie|filehound|flashget|flunky) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(frontpage|getright|getweb|go.?zilla|go-ahead-got-it|gotit|grabnet) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(grafula|harvest|hloader|hmview|httplib|httrack|humanlinks|ilsebot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(infonavirobot|infotekies|intelliseek|interget|iria|jennybot|jetcar) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(joc|justview|jyxobot|kenjin|keyword|larbin|leechftp|lexibot|lftp|libweb) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(likse|linkscan|linkwalker|lnspiderguy|lwp|magnet|mag-net|markwatch) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(mata.?hari|memo|microsoft.?url|midown.?tool|miixpc|mirror|missigua) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(mister.?pix|moget|mozilla.?newt|nameprotect|navroad|backdoorbot|nearsite) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(net.?vampire|netants|netcraft|netmechanic|netspider|nextgensearchbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(attach|nicerspro|nimblecrawler|npbot|octopus|offline.?explorer) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(offline.?navigator|openfind|outfoxbot|pagegrabber|papa|pavuk) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(pcbrowser|php.?version.?tracker|pockey|propowerbot|prowebwalker) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(psbot|pump|queryn|recorder|realdownload|reaper|reget|true_robot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(repomonkey|rma|internetseer|sitesnagger|siphon|slysearch|smartdownload) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(snake|snapbot|snoopy|sogou|spacebison|spankbot|spanner|sqworm|superbot) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(superhttp|surfbot|asterias|suzuran|szukacz|takeout|teleport) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(telesoft|the.?intraformant|thenomad|tighttwatbot|titan|urldispatcher) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(turingos|turnitinbot|urly.?warning|vacuum|vci|voideye|whacker) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} ^(libwww-perl|widow|wisenutbot|wwwoffle|xaldon|xenu|zeus|zyborg|anonymouse) [NC,OR]
 
# STARTS WITH WEB
RewriteCond %{HTTP_USER_AGENT} ^web(zip|emaile|enhancer|fetch|go.?is|auto|bandit|clip|copier|master|reaper|sauger|site.?quester|whack) [NC,OR]
 
# ANYWHERE IN UA -- GREEDY REGEX
RewriteCond %{HTTP_USER_AGENT} ^.*(craftbot|download|extract|stripper|sucker|ninja|clshttp|webspider|leacher|collector|grabber|webpictures).*$ [NC]
 
# ISSUE 403 / SERVE ERRORDOCUMENT
RewriteRule . - [F,L]
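
One thing to watch with the listing above: the final rule also matches the internal request for /403.html, so a blocked client may get Apache's generic 403 response instead of the custom page. A minimal sketch of one way around this, assuming the error page really lives at /403.html and using a shortened bot list as a stand-in, is to put an exclusion condition in front of the user-agent conditions:

```apache
ErrorDocument 403 /403.html

RewriteEngine On
RewriteBase /

# Assumption: the error page is /403.html, as in the listing above.
# A condition without [OR] is ANDed with the OR-group that follows,
# so the error page stays reachable even for blocked user agents.
RewriteCond %{REQUEST_URI} !^/403\.html$

# Shortened stand-in for the full user-agent list.
RewriteCond %{HTTP_USER_AGENT} ^(blackwidow|httrack|webzip) [NC,OR]
RewriteCond %{HTTP_USER_AGENT} (grabber|sucker|stripper) [NC]

# "^" matches every request path, including the empty one; [F] implies [L].
RewriteRule ^ - [F]
```

The last condition in the OR-group must not carry the [OR] flag, otherwise it would be chained to whatever condition comes next.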

Alternate RewriteCond Rules

Code:
RewriteEngine on
 
#Block spambots
RewriteCond %{HTTP:User-Agent} (?:Alexibot|Art-Online|asterias|BackDoorbot|Black.Hole|\
BlackWidow|BlowFish|botALot|BuiltbotTough|Bullseye|BunnySlippers|Cegbfeieh|Cheesebot|\
CherryPicker|ChinaClaw|CopyRightCheck|cosmos|Crescent|Custo|DISCo|DittoSpyder|DownloadsDemon|\
eCatch|EirGrabber|EmailCollector|EmailSiphon|EmailWolf|EroCrawler|Express\sWebPictures|ExtractorPro|\
EyeNetIE|FlashGet|Foobot|FrontPage|GetRight|GetWeb!|Go-Ahead-Got-It|Go!Zilla|GrabNet|Grafula|\
Harvest|hloader|HMView|httplib|HTTrack|humanlinks|ImagesStripper|ImagesSucker|IndysLibrary|\
InfonaviRobot|InterGET|Internet\sNinja|Jennybot|JetCar|JOC\sWeb\sSpider|Kenjin.Spider|Keyword.Density|\
larbin|LeechFTP|Lexibot|libWeb/clsHTTP|LinkextractorPro|LinkScan/8.1a.Unix|LinkWalker|lwp-trivial|\
Mass\sDownloader|Mata.Hari|Microsoft.URL|MIDown\stool|MIIxpc|Mister.PiX|Mister\sPiX|moget|\
Mozilla/3.Mozilla/2.01|Mozilla.*NEWT|Navroad|NearSite|NetAnts|NetMechanic|NetSpider|Net\sVampire|\
NetZIP|NICErsPRO|NPbot|Octopus|Offline.Explorer|Offline\sExplorer|Offline\sNavigator|Openfind|\
PageGrabber|Papa\sFoto|pavuk|pcBrowser|Program\sShareware\s1|ProPowerbot/2.14|ProWebWalker|\
psbot/0.1|QueryN.Metasearch|ReGet|RepoMonkey|RMA|SiteSnagger|SlySearch|SmartDownload|Spankbot|spanner|\
Superbot|SuperHTTP|Surfbot|suzuran|Szukacz/1.4|tAkeOut|Teleport|Teleport\sPro|Telesoft|The.Intraformant|\
TheNomad|TightTwatbot|Titan|toCrawl/UrlDispatcher|True_Robot|turingos|\
Turnitinbot/1.5|URLy.Warning|VCI|VoidEYE|WebAuto|WebBandit|WebCopier|WebEMailExtrac.*|WebEnhancer|\
WebFetch|WebGo\sIS|Web.Image.Collector|Web\sImage\sCollector|WebLeacher|WebmasterWorldForumbot|\
WebReaper|WebSauger|Website\sExtractor|Website.Quester|Website\sQuester|Webster.Pro|WebStripper|\
Web\sSucker|WebWhacker|WebZip|Wget|Widow|[Ww]eb[Bb]andit|WWW-Collector-E|WWWOFFLE|\
Xaldon\sWebSpider|Xenu's|Zeus) [NC]
RewriteRule .? - [F]

Method 2 - Using SetEnvIfNoCase

Code:
ErrorDocument 403 /403.html
 
# IF THE UA CONTAINS ANY OF THESE (the leading .* matches anywhere, not only at the start)
SetEnvIfNoCase ^User-Agent$ .*(aesop_com_spiderman|alexibot|backweb|bandit|batchftp|bigfoot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(black.?hole|blackwidow|blowfish|botalot|buddy|builtbottough|bullseye) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(cheesebot|cherrypicker|chinaclaw|collector|copier|copyrightcheck) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(cosmos|crescent|curl|custo|da|diibot|disco|dittospyder|dragonfly) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(drip|easydl|ebingbong|ecatch|eirgrabber|emailcollector|emailsiphon) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(emailwolf|erocrawler|exabot|eyenetie|filehound|flashget|flunky) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(frontpage|getright|getweb|go.?zilla|go-ahead-got-it|gotit|grabnet) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(grafula|harvest|hloader|hmview|httplib|httrack|humanlinks|ilsebot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(infonavirobot|infotekies|intelliseek|interget|iria|jennybot|jetcar) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(joc|justview|jyxobot|kenjin|keyword|larbin|leechftp|lexibot|lftp|libweb) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(likse|linkscan|linkwalker|lnspiderguy|lwp|magnet|mag-net|markwatch) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(mata.?hari|memo|microsoft.?url|midown.?tool|miixpc|mirror|missigua) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(mister.?pix|moget|mozilla.?newt|nameprotect|navroad|backdoorbot|nearsite) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(net.?vampire|netants|netcraft|netmechanic|netspider|nextgensearchbot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(attach|nicerspro|nimblecrawler|npbot|octopus|offline.?explorer) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(offline.?navigator|openfind|outfoxbot|pagegrabber|papa|pavuk) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(pcbrowser|php.?version.?tracker|pockey|propowerbot|prowebwalker) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(psbot|pump|queryn|recorder|realdownload|reaper|reget|true_robot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(repomonkey|rma|internetseer|sitesnagger|siphon|slysearch|smartdownload) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(snake|snapbot|snoopy|sogou|spacebison|spankbot|spanner|sqworm|superbot) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(superhttp|surfbot|asterias|suzuran|szukacz|takeout|teleport) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(telesoft|the.?intraformant|thenomad|tighttwatbot|titan|urldispatcher) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(turingos|turnitinbot|urly.?warning|vacuum|vci|voideye|whacker) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(widow|wisenutbot|wwwoffle|xaldon|xenu|zeus|zyborg|anonymouse) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*web(zip|emaile|enhancer|fetch|go.?is|auto|bandit|clip|copier|master|reaper|sauger|site.?quester|whack) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(craftbot|download|extract|stripper|sucker|ninja|clshttp|webspider|leacher|collector|grabber|webpictures) HTTP_SAFE_BADBOT
SetEnvIfNoCase ^User-Agent$ .*(libwww-perl|aesop_com_spiderman) HTTP_SAFE_BADBOT
Deny from env=HTTP_SAFE_BADBOT
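
The `Deny from env=` line is Apache 2.2 access-control syntax; on Apache 2.4 it only works when mod_access_compat is loaded. A rough 2.4-style equivalent, assuming mod_authz_core is available and reusing the HTTP_SAFE_BADBOT variable from the listing, might look like this (the short bot pattern is only a stand-in for the full list):

```apache
ErrorDocument 403 /403.html

# Stand-in for the full SetEnvIfNoCase list above.
SetEnvIfNoCase User-Agent (httrack|blackwidow|webzip) HTTP_SAFE_BADBOT

# Allow everyone except requests flagged as bad bots.
<RequireAll>
    Require all granted
    Require not env HTTP_SAFE_BADBOT
</RequireAll>

# Assumption: the error page is /403.html; keep it reachable so a
# blocked client still receives the custom page.
<Files "403.html">
    Require all granted
</Files>
```

Mixing the old `Order`/`Allow`/`Deny` directives with `Require` in the same .htaccess is best avoided; pick one style per file.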

List of bad web bots

   1. WebBandit
   2. 2icommerce
   3. Accoona
   4. ActiveTouristBot
   5. adressendeutschland
   6. aipbot
   7. Alexibot
   8. Alligator
   9. AllSubmitter
  10. almaden
  11. anarchie
  12. Anonymous
  13. Apexoo
  14. Aqua_Products
  15. asterias
  16. ASSORT
  17. ATHENS
  18. AtHome
  19. Atomz
  20. attache
  21. autoemailspider
  22. autohttp
  23. b2w
  24. bew
  25. BackDoorBot
  26. Badass
  27. Baiduspider
  28. Baiduspider+
  29. BecomeBot
  30. berts
  31. Bitacle
  32. Biz360
  33. Black.Hole
  34. BlackWidow
  35. bladder fusion
  36. Blog Checker
  37. BlogPeople
  38. Blogshares Spiders
  39. Bloodhound
  40. BlowFish
  41. Board Bot
  42. Bookmark search tool
  43. BotALot
  44. BotRightHere
  45. Bot mailto:[email protected]
  46. Bropwers
  47. Browsezilla
  48. BuiltBotTough
  49. Bullseye
  50. BunnySlippers
  51. Cegbfeieh
  52. CFNetwork
  53. CheeseBot
  54. CherryPicker
  55. Crescent
  56. charlotte/
  57. ChinaClaw
  58. Convera
  59. Copernic
  60. CopyRightCheck
  61. cosmos
  62. Crescent
  63. c-spider
  64. curl
  65. Custo
  66. Cyberz
  67. DataCha0s
  68. Daum
  69. Deweb
  70. Digger
  71. Digimarc
  72. digout4uagent
  73. DIIbot
  74. DISCo
  75. DittoSpyder
  76. DnloadMage
  77. Download
  78. dragonfly
  79. DreamPassport
  80. DSurf
  81. DTS Agent
  82. dumbot
  83. DynaWeb
  84. e-collector
  85. EasyDL
  86. EBrowse
  87. eCatch
  88. ecollector
  89. edgeio
  90. [email protected]
  91. EirGrabber
  92. Email Extractor
  93. EmailCollector
  94. EmailSiphon
  95. EmailWolf
  96. EmeraldShield
  97. Enterprise_Search
  98. EroCrawler
  99. ESurf
 100. Eval
 101. Everest-Vulcan
 102. Exabot
 103. Express
 104. Extractor
 105. ExtractorPro
 106. EyeNetIE
 107. FairAd
 108. fastlwspider
 109. fetch
 110. FEZhead
 111. FileHound
 112. findlinks
 113. Flaming AttackBot
 114. FlashGet
 115. FlickBot
 116. Foobot
 117. Forex
 118. Franklin Locator
 119. FreshDownload
 120. FrontPage
 121. FSurf
 122. Gaisbot
 123. Gamespy_Arcade
 124. genieBot
 125. GetBot
 126. Getleft
 127. GetRight
 128. GetWeb!
 129. Go!Zilla
 130. Go-Ahead-Got-It
 131. GOFORITBOT
 132. GrabNet
 133. Grafula
 134. grub
 135. Harvest
 136. Hatena Antenna
 137. heritrix
 138. HLoader
 139. HMView
 140. holmes
 141. HooWWWer
 142. HouxouCrawler
 143. HTTPGet
 144. httplib
 145. HTTPRetriever
 146. HTTrack
 147. humanlinks
 148. IBM_Planetwide
 149. iCCrawler
 150. ichiro
 151. iGetter
 152. Image Stripper
 153. Image Sucker
 154. imagefetch
 155. imds_monitor
 156. IncyWincy
 157. Industry Program
 158. Indy
 159. InetURL
 160. InfoNaviRobot
 161. InstallShield DigitalWizard
 162. InterGET
 163. IRLbot
 164. Iron33
 165. ISSpider
 166. IUPUI Research Bot
 167. Jakarta
 168. java/
 169. JBH Agent
 170. JennyBot
 171. JetCar
 172. jeteye
 173. jeteyebot
 174. JoBo
 175. JOC Web Spider
 176. Kapere
 177. Kenjin
 178. Keyword Density
 179. KRetrieve
 180. ksoap
 181. KWebGet
 182. LapozzBot
 183. larbin
 184. leech
 185. LeechFTP
 186. LeechGet
 187. leipzig.de
 188. LexiBot
 189. libWeb
 190. libwww-FM
 191. libwww-perl
 192. LightningDownload
 193. LinkextractorPro
 194. Linkie
 195. LinkScan
 196. linktiger
 197. LinkWalker
 198. lmcrawler
 199. LNSpiderguy
 200. LocalcomBot
 201. looksmart
 202. LWP
 203. Mac Finder
 204. Mail Sweeper
 205. mark.blonin
 206. MaSagool
 207. Mass
 208. Mata Hari
 209. MCspider
 210. MetaProducts Download Express
 211. Microsoft Data Access
 212. Microsoft URL Control
 213. MIDown
 214. MIIxpc
 215. Mirror
 216. Missauga
 217. Missouri College Browse
 218. Mister
 219. Monster
 220. mkdb
 221. moget
 222. Moreoverbot
 223. mothra/netscan
 224. MovableType
 225. Mozi!
 226. Mozilla/22
 227. Mozilla/3.0 (compatible)
 228. Mozilla/5.0 (compatible; MSIE 5.0)
 229. MSIE_6.0
 230. MSIECrawler
 231. MSProxy
 232. MVAClient
 233. MyFamilyBot
 234. MyGetRight
 235. nameprotect
 236. NASA Search
 237. Naver
 238. Navroad
 239. NearSite
 240. NetAnts
 241. netattache
 242. NetCarta
 243. NetMechanic
 244. NetResearchServer
 245. NetSpider
 246. NetZIP
 247. Net Vampire
 248. NEWT ActiveX
 249. Nextopia
 250. NICErsPRO
 251. ninja
 252. NimbleCrawler
 253. noxtrumbot
 254. NPBot
 255. Octopus
 256. Offline
 257. OK Mozilla
 258. OmniExplorer
 259. OpaL
 260. Openbot
 261. Openfind
 262. OpenTextSiteCrawler
 263. Oracle Ultra Search
 264. OutfoxBot
 265. P3P
 266. PackRat
 267. PageGrabber
 268. PagmIEDownload
 269. panscient
 270. Papa Foto
 271. pavuk
 272. pcBrowser
 273. perl
 274. PerMan
 275. PersonaPilot
 276. PHP version
 277. PlantyNet_WebRobot
 278. playstarmusic
 279. Plucker
 280. Port Huron
 281. Program Shareware
 282. Progressive Download
 283. ProPowerBot
 284. prospector
 285. ProWebWalker
 286. Prozilla
 287. psbot
 288. psycheclone
 289. puf
 290. PushSite
 291. PussyCat
 292. PuxaRapido
 293. Python-urllib
 294. QuepasaCreep
 295. QueryN
 296. Radiation
 297. RealDownload
 298. RedCarpet
 299. RedKernel
 300. ReGet
 301. relevantnoise
 302. RepoMonkey
 303. RMA
 304. Rover
 305. Rsync
 306. RTG30
 307. Rufus
 308. SAPO
 309. SBIder
 310. scooter
 311. ScoutAbout
 312. script
 313. searchpreview
 314. searchterms
 315. Seekbot
 316. Serious
 317. Shai
 318. shelob
 319. Shim-Crawler
 320. SickleBot
 321. sitecheck
 322. SiteSnagger
 323. Slurpy Verifier
 324. SlySearch
 325. SmartDownload
 326. sna-
 327. snagger
 328. Snoopy
 329. sogou
 330. sootle
 331. So-net
 332. SpankBot
 333. spanner
 334. SpeedDownload
 335. Spegla
 336. Sphere
 337. Sphider
 338. SpiderBot
 339. sproose
 340. SQ Webscanner
 341. Sqworm
 342. Stamina
 343. Stanford
 344. studybot
 345. SuperBot
 346. SuperHTTP
 347. Surfbot
 348. SurfWalker
 349. suzuran
 350. Szukacz
 351. tAkeOut
 352. TALWinHttpClient
 353. tarspider
 354. Teleport
 355. Telesoft
 356. Templeton
 357. TestBED
 358. The Intraformant
 359. TheNomad
 360. TightTwatBot
 361. Titan
 362. toCrawl/UrlDispatcher
 363. True_Robot
 364. turingos
 365. TurnitinBot
 366. Twisted PageGetter
 367. UCmore
 368. UdmSearch
 369. UMBC
 370. UniversalFeedParser
 371. URL Control
 372. URLGetFile
 373. URLy Warning
 374. URL_Spider_Pro
 375. UtilMind
 376. vayala
 377. vobsub
 378. VCI
 379. VoidEYE
 380. VoilaBot
 381. voyager
 382. w3mir
 383. Web Image Collector
 384. Web Sucker
 385. Web2WAP
 386. WebaltBot
 387. WebAuto
 388. WebBandit
 389. WebCapture
 390. webcollage
 391. WebCopier
 392. WebCopy
 393. WebEMailExtrac
 394. WebEnhancer
 395. WebFetch
 396. WebFilter
 397. WebFountain
 398. WebGo
 399. WebLeacher
 400. WebMiner
 401. WebMirror
 402. WebReaper
 403. WebSauger
 404. WebSnake
 405. Website
 406. WebStripper
 407. WebVac
 408. webwalk
 409. WebWhacker
 410. WebZIP
 411. Wells Search
 412. WEP Search 00
 413. WeRelateBot
 414. Wget
 415. WhosTalking
 416. Widow
 417. Wildsoft Surfer
 418. WinHttpRequest
 419. WinHTTrack
 420. WUMPUS
 421. WWWOFFLE
 422. wwwster
 423. WWW-Collector
 424. Xaldon
 425. Xenu's
 426. Xenus
 427. XGET
 428. Y!TunnelPro
 429. YahooYSMcm
 430. YaDirectBot
 431. Yeti
 432. Zade
 433. ZBot
 434. zerxbot
 435. Zeus
 436. ZyBorg
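
Names from this list cannot always be pasted into a pattern as they are: entries with spaces need quoting (or `\s`), and a literal dot should be escaped. A small sketch of adding a few list entries to method 2, assuming the same HTTP_SAFE_BADBOT variable as above:

```apache
# Entries containing spaces: quote the whole pattern.
SetEnvIfNoCase User-Agent "(Mata Hari|Web Sucker|Keyword Density)" HTTP_SAFE_BADBOT

# Entries containing a dot: escape it so it matches a literal ".".
SetEnvIfNoCase User-Agent (leipzig\.de|Python-urllib|WinHttpRequest) HTTP_SAFE_BADBOT
```

Very short or generic entries (for example "perl", "script" or "fetch") can match legitimate clients when used unanchored, so it may be worth testing each addition against your own access logs first.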

Translation: Alireza Eskandarpour
Source: askapache.com
Thanks to: Mr. Farmani
 
