You have a list of the most common English words and want to check all domains like https://WORD.com. There is a good chance that such a domain is registered. You can check it with curl.
Obviously, this is a problem for parallel processing: run a hundred curl processes in parallel. The Go PL book has many examples like that. They do this via Go channels.
Of course, Go PL is cool. But my philosophy is to use the simplest possible tool for a task. I know that a Go channel is a sophisticated/advanced Unix pipe. Can I use bash and pipes for that?
Here I create a pipe, run the workers in the background, read the list of English words and feed it to the pipe. Then I send a 'stop' string to the pipe, once for each worker, and wait for complete shutdown.
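A minimal sketch of that single check, before any parallelism is involved. The helper names domain_url and check_domain are made up for this illustration and are not part of the scripts in this post. Note that curl's exit status only says whether something answered over HTTPS within the timeout; a registered domain without a web server would still show up as "did not answer".

```shell
#!/usr/bin/env bash
# Sketch only: domain_url and check_domain are hypothetical helper names.

# Build the URL to probe for a given word.
domain_url() {
    printf 'https://%s.com\n' "$1"
}

# Probe one word and report whether anything answered.
check_domain() {
    local url
    url=$(domain_url "$1")
    # -s: no progress meter; -o /dev/null: discard the response body.
    if curl --connect-timeout 5 -s -o /dev/null "$url"; then
        echo "$1.com answered"
    else
        echo "$1.com did not answer (curl exit status $?)"
    fi
}

# Check every word given on the command line; does nothing without arguments.
for word in "$@"; do
    check_domain "$word"
done
```

Saved under a name of your choice, it could be run with a few words as arguments. Doing this one word at a time is slow, because most of the time is spent waiting for the network.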
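As a rough illustration of the analogy, here is a miniature sketch in which a named pipe plays the role of a channel: one writer, one reader, and a 'stop' line as the shutdown signal. It is a simplified variant written for this illustration, with a single reader that keeps the pipe open and reads line by line; the file names and the reader function are assumptions.

```shell
#!/usr/bin/env bash
# Sketch: one writer, one reader, one FIFO, 'stop' as the sentinel.
set -eu

tmpdir=$(mktemp -d)
fifo="$tmpdir/demo.fifo"
mkfifo "$fifo"

# Reader: keeps the FIFO open and handles one line at a time.
reader() {
    while IFS= read -r word; do
        if [ "$word" = "stop" ]; then
            break
        fi
        echo "received: $word"
    done < "$fifo"
}

# Run the reader in the background and collect its output in a log file.
reader > "$tmpdir/log" &

# Writer: open the FIFO once, so the reader sees no EOF between words.
{
    printf '%s\n' apple banana
    echo stop
} > "$fifo"

# Wait for the reader to finish, then show what it received.
wait
result=$(cat "$tmpdir/log")
rm -r "$tmpdir"
echo "$result"
```

With one reader this is easy. With many readers sharing one pipe, deciding who gets which line becomes the hard part.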
#!/usr/bin/env bash
# run it as: cat words.txt | ./main.sh

WORKERS=128
PIPE_FNAME=testpipe

mkfifo "$PIPE_FNAME"
mkdir -p out

# run all workers:
for i in $(seq 1 "$WORKERS")
do
    ./worker.sh "$PIPE_FNAME" "$i" &
done

# read from stdin and pass it to workers
while IFS= read -r line
do
    #echo main: $line
    echo "$line" > "$PIPE_FNAME"
    # flush pipe. https://stackoverflow.com/questions/3348614/how-to-flush-a-pipe-using-bash
    dd if="$PIPE_FNAME" iflag=nonblock of=/dev/null 2> /dev/null
done

# send stop to workers:
for i in $(seq 1 "$WORKERS")
do
    echo stop > "$PIPE_FNAME"
    dd if="$PIPE_FNAME" iflag=nonblock of=/dev/null 2> /dev/null
done

# wait all workers to exit:
wait

rm "$PIPE_FNAME"
And this is worker.sh, which reads from the pipe and runs curl. It exits when it gets the 'stop' string:
#!/usr/bin/env bash
# usage: ./worker.sh PIPE_FNAME WORKER_ID

while true; do
    # blocks until main.sh writes something to the pipe
    tmp=$(cat "$1")

    if [ "$tmp" = "stop" ]; then
        echo "$2 stopping"
        exit
    fi

    if [ -z "$tmp" ]
    then
        true
        #echo empty string received
    else
        echo "$2 received: $tmp"
        DOMAIN="https://$tmp.com"
        FNAME="out/$tmp.com"
        # skip words that were already fetched
        if [ ! -f "$FNAME" ]; then
            curl --connect-timeout 5 "$DOMAIN" > "$FNAME" 2>/dev/null
            #curl --connect-timeout 5 $DOMAIN > $FNAME
        fi
    fi
done
This may be the simplest possible solution, but problems exist. For example, leftover workers may have to be killed by hand:
ps a | grep worker.sh | awk '{print $1}' | xargs kill -9
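The command above matches processes by name. A gentler option, sketched here under my own assumptions and not taken from main.sh, is to remember each worker's PID at launch and kill exactly those from a trap. The functions start_worker and cleanup are hypothetical, and sleep stands in for ./worker.sh.

```shell
#!/usr/bin/env bash
# Sketch: remember worker PIDs at launch and kill exactly those on exit.
pids=()

start_worker() {
    # Stand-in for "./worker.sh $PIPE_FNAME $i &"; sleep is only a placeholder.
    sleep 60 &
    pids+=("$!")
}

cleanup() {
    if [ "${#pids[@]}" -gt 0 ]; then
        # Signal the recorded workers, then reap them; errors are ignored
        # because some of them may already be gone.
        kill "${pids[@]}" 2>/dev/null
        wait "${pids[@]}" 2>/dev/null
    fi
    return 0
}

# Run cleanup on any exit; turn Ctrl-C and TERM into a normal exit.
trap cleanup EXIT
trap 'exit 130' INT TERM

for i in 1 2 3; do
    start_worker
done

# Call cleanup directly here to demonstrate it, then count the survivors.
cleanup
alive=0
for pid in "${pids[@]}"; do
    if kill -0 "$pid" 2>/dev/null; then
        alive=$((alive + 1))
    fi
done
echo "workers still alive: $alive"
```

This only signals the worker shells themselves. Anything a worker has started, such as cat or curl, is not tracked here and may need separate handling.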
As seen on reddit: 1, 2, HN, twitter.