Automating Proxy Rotation with Bash and Cron





To prevent IP blocking during web scraping, proxy rotation is essential.



You can achieve seamless proxy rotation without heavy frameworks by leveraging bash and cron.



This approach avoids the complexity of third-party libraries, making it ideal for minimal environments or for advanced users who want granular control.



First, prepare a plain-text configuration file listing all available proxies.



Every proxy entry must follow the pattern: IP address, port, and optionally username and password for authenticated proxies.



Valid entries might look like 192.168.10.5:8080 for an open proxy, or 192.168.1.20:9090:john:doe for an authenticated one.
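For reference, a proxy list along these lines would work; the proxies.txt filename is just a convention assumed in the examples below:

192.168.10.5:8080
192.168.1.20:9090:john:doe
10.0.0.7:3128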



Maintain the integrity of your proxy file by removing dead IPs and adding new working ones.



Next, write a simple bash script to select a random proxy from the list and update a configuration file or environment variable that your scraper uses.



It parses the proxy list, determines the total count, randomly selects a line, and outputs the result to current_proxy.txt.



Here’s a sample implementation, rotate_proxy.sh (the proxy list filename is an assumption; point PROXY_FILE at wherever your list lives):

#!/bin/bash

# Path to the proxy list and to the file your scraper reads
PROXY_FILE="proxies.txt"
OUTPUT_FILE="current_proxy.txt"

if [[ ! -f "$PROXY_FILE" ]]; then
    echo "Proxy list not found"
    exit 1
fi

# Count the available proxies; abort if the list is empty
LINE_COUNT=$(wc -l < "$PROXY_FILE")

if [[ $LINE_COUNT -eq 0 ]]; then
    echo "No proxies available in the list"
    exit 1
fi

# Pick a random line number, then extract that line
RANDOM_LINE=$(((RANDOM % LINE_COUNT) + 1))
head -n "$RANDOM_LINE" "$PROXY_FILE" | tail -n 1 > "$OUTPUT_FILE"



Set execution permissions using chmod +x rotate_proxy.sh.



Run the script manually to verify it outputs exactly one proxy to current_proxy.txt.
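For example:

./rotate_proxy.sh
cat current_proxy.txt    # expect a single line such as 192.168.10.5:8080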



Use cron to automate proxy rotation at your desired frequency.



Open your crontab with crontab -e and add an entry like this (the script path is a placeholder):

0 * * * * /path/to/rotate_proxy.sh



This configuration rotates proxies hourly.



Tune the cron timing to match your scraping rate: a minute field of 0,30 rotates half-hourly, and */15 rotates every 15 minutes.
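As concrete crontab entries (the script path is again a placeholder):

0,30 * * * * /path/to/rotate_proxy.sh    # every 30 minutes
*/15 * * * * /path/to/rotate_proxy.sh    # every 15 minutes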



Be mindful not to rotate too frequently if your scraper runs continuously, as it may cause connection drops.



Your scraper must reload the proxy from current_proxy.txt on each request to stay updated.



Most tools like curl or wget can use the proxy by reading from a file or environment variable.



In your scraper, set proxy=$(cat current_proxy.txt) and pass that value to each request.
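A minimal curl sketch along these lines re-reads the file before every request; the target URL is a placeholder, and it assumes plain IP:PORT entries (authenticated IP:PORT:USER:PASS lines would need to be split out for curl's --proxy-user flag):

#!/bin/bash
# Re-read the rotated proxy immediately before each request
proxy=$(cat current_proxy.txt)
curl --proxy "http://$proxy" -s -o page.html "https://example.com/page"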



Maintain proxy quality by automating checks and pruning non-responsive entries.



Enhance the script to test proxy connectivity and auto-remove failed entries.
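One way to sketch that health check, assuming curl as the prober, plain IP:PORT entries, and example.com as the test target:

#!/bin/bash
# check_proxies.sh: keep only proxies that respond within 10 seconds
PROXY_FILE="proxies.txt"
TMP_FILE=$(mktemp)

while IFS= read -r proxy; do
    # Probe a lightweight URL through the proxy; keep the entry only on success
    if curl --proxy "http://$proxy" --max-time 10 -s -o /dev/null "https://example.com"; then
        echo "$proxy" >> "$TMP_FILE"
    fi
done < "$PROXY_FILE"

# Overwrite the list with the proxies that survived
mv "$TMP_FILE" "$PROXY_FILE"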



Logging each rotation to a file such as proxy_rotation.log makes it easy to tell which proxy was active at any given time.
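For instance, appending a timestamped line at the end of rotate_proxy.sh would do it (the log path is an assumption):

# Record which proxy was selected and when, for later auditing
echo "$(date '+%Y-%m-%d %H:%M:%S') rotated to $(cat "$OUTPUT_FILE")" >> proxy_rotation.log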



This method is simple, reliable, and scalable.



No heavy dependencies are needed—it runs on any Linux, macOS, or BSD system.



A simple bash script and a cron entry are all you need to sustain uninterrupted, undetected scraping.