Automating Proxy Rotation With Bash And Cron

To prevent IP blocking during web scraping, proxy rotation is essential.



You can achieve seamless proxy rotation without heavy frameworks by leveraging bash and cron.



It avoids the complexity of third-party libraries, making it perfect for minimalist environments or advanced users seeking granular control.



First, prepare a plain-text configuration file listing all available proxies.



Every proxy entry must follow the pattern: IP address, port, and optionally username and password for authenticated proxies.



Valid entries might look like 192.168.10.5:8080 or, for an authenticated proxy, 192.168.1.20:9090:john:doe.
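For example, a list file (the name proxies.txt and these addresses are purely illustrative) might contain:

192.168.10.5:8080
10.0.0.7:3128
192.168.1.20:9090:john:doe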



Maintain the integrity of your proxy file by removing dead IPs and adding new working ones.



Next, write a simple bash script to select a random proxy from the list and update a configuration file or environment variable that your scraper uses.



It parses the proxy list, determines the total count, randomly selects a line, and outputs the result to current_proxy.txt.



Here’s a sample implementation; the proxy list and output paths are placeholders, so adjust them to your setup:

#!/usr/bin/env bash
# rotate_proxy.sh: pick one random proxy and hand it to the scraper.

# Placeholder paths; point these at your own files.
PROXY_FILE="$HOME/proxies.txt"
OUTPUT_FILE="$HOME/current_proxy.txt"

if [[ ! -f "$PROXY_FILE" ]]; then
    echo "Proxy list not found"
    exit 1
fi

# Count the available proxies
LINE_COUNT=$(wc -l < "$PROXY_FILE")

if [[ $LINE_COUNT -eq 0 ]]; then
    echo "No proxies available in the list"
    exit 1
fi

# Pick a random line number between 1 and LINE_COUNT
RANDOM_LINE=$(( (RANDOM % LINE_COUNT) + 1 ))

# Extract that line and save it for the scraper to read
head -n "$RANDOM_LINE" "$PROXY_FILE" | tail -n 1 > "$OUTPUT_FILE"



Set execution permissions using chmod +x rotate_proxy.sh.



Run the script manually to verify it outputs exactly one proxy to current_proxy.txt.
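For example (the printed proxy depends on your list):

./rotate_proxy.sh
cat current_proxy.txt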



Use cron to automate proxy rotation at your desired frequency.



Open your crontab with crontab -e and add an entry like the following (the script path is a placeholder):

0 * * * * /path/to/rotate_proxy.sh



This configuration rotates proxies hourly.



Tune the cron timing to match your scraping rate: use 0,30 in the minute field for half-hourly rotation, or a */N step value for other intervals.
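For instance, with the same placeholder path:

# every 30 minutes
0,30 * * * * /path/to/rotate_proxy.sh
# every 10 minutes
*/10 * * * * /path/to/rotate_proxy.sh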



Be mindful not to rotate too frequently if your scraper runs continuously, as it may cause connection drops.



Your scraper must reload the proxy from current_proxy.txt on each request to stay updated.



Most tools like curl or wget can use the proxy by reading from a file or environment variable.
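A minimal environment-variable sketch, assuming an unauthenticated host:port entry and a stand-in target URL:

export http_proxy="http://$(cat current_proxy.txt)"
export https_proxy="$http_proxy"
wget -q -O page.html "https://example.com/"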



In your scraper, set proxy=$(cat current_proxy.txt) and use that value for each request.
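A per-request sketch with curl, under the same unauthenticated host:port assumption:

proxy=$(cat current_proxy.txt)
curl --proxy "http://$proxy" -s "https://example.com/"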



Maintain proxy quality by automating checks and pruning non-responsive entries.



Enhance the script to test proxy connectivity and auto-remove failed entries.
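One possible sketch, assuming unauthenticated host:port entries and the same placeholder paths as the rotation script:

#!/usr/bin/env bash
# check_proxies.sh: keep only proxies that answer a quick test request.
PROXY_FILE="$HOME/proxies.txt"
TMP_FILE=$(mktemp)

while IFS= read -r proxy; do
    # Keep the entry only if a request through it succeeds within 10 seconds
    if curl --proxy "http://$proxy" --max-time 10 -s -o /dev/null "https://example.com/"; then
        echo "$proxy" >> "$TMP_FILE"
    fi
done < "$PROXY_FILE"

# Replace the list with the proxies that passed
mv "$TMP_FILE" "$PROXY_FILE"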



You can also append each rotation’s output to a log file such as proxy_rotation.log for troubleshooting.
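For instance, the hourly cron entry could become (paths are placeholders):

0 * * * * /path/to/rotate_proxy.sh >> /path/to/proxy_rotation.log 2>&1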



This method is simple, reliable, and scalable.



No heavy dependencies are needed; it runs on any Linux, macOS, or BSD system.



A simple bash script and a cron entry are all you need to sustain uninterrupted, undetected scraping.