Multithreaded URL fetching with PHP

To be honest, it is not a good idea to use fork in PHP, since every forked child is a copy of the parent process and memory use grows very quickly. However, in some situations it can be really useful, or it can serve as a temporary solution.

Here is an example using the PHP function pcntl_fork (it requires the pcntl extension and the command-line version of PHP):


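<?php
// Read one URL per line, dropping trailing newlines and empty lines.
$urls = file('urls.txt', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($urls as $i => $url) {
    $pid = pcntl_fork();
    if ($pid == -1) {
        die('could not fork');
    } elseif ($pid) {
        // parent: nothing to do here, go on and fork the next child
    } else {
        // child: download one URL and exit.
        // The file name is derived from the URL's index so that two
        // children can never overwrite each other's output.
        file_put_contents("page_$i", file_get_contents($url));
        exit;
    }
}

// parent: wait until every child has finished
while (pcntl_wait($status) > 0) {
}

?>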
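
Since every fork copies the parent process, one way to keep memory under control is to cap how many children run at the same time. The following is only a minimal sketch of that idea, not a definitive implementation: the function name fetch_all_limited, its parameters $maxChildren and $outDir, and the page_N file names are assumptions made for illustration. It needs the pcntl extension and the PHP CLI.

```php
<?php
// Hypothetical helper: fork at most $maxChildren downloaders at a time.
// Returns the number of URLs that could not be fetched.
function fetch_all_limited(array $urls, int $maxChildren, string $outDir): int
{
    $maxChildren = max(1, $maxChildren);
    $running = 0;
    $failed = 0;

    // Wait for one child and record whether it exited cleanly.
    $reap = function () use (&$running, &$failed) {
        $pid = pcntl_wait($status);
        if ($pid > 0) {
            $running--;
            if (!pcntl_wifexited($status) || pcntl_wexitstatus($status) !== 0) {
                $failed++;
            }
        }
        return $pid;
    };

    foreach (array_values($urls) as $i => $url) {
        // At the limit: block until one child exits before forking another.
        if ($running >= $maxChildren) {
            $reap();
        }
        $pid = pcntl_fork();
        if ($pid === -1) {
            $failed++; // could not fork; count it and move on
            continue;
        }
        if ($pid === 0) {
            // child: fetch one URL, save it, report the outcome via exit code
            $body = @file_get_contents($url);
            if ($body === false) {
                exit(1);
            }
            file_put_contents("$outDir/page_$i", $body);
            exit(0);
        }
        $running++; // parent
    }

    // Collect the children that are still running.
    while ($running > 0 && $reap() > 0) {
    }
    return $failed;
}
```

With the urls.txt file from the example above, a call could look like fetch_all_limited($urls, 5, '/tmp'), which keeps no more than five children alive at once. The right limit depends on available memory, so treat 5 as a placeholder.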

Another useful function, built on PHP's curl_multi functions, downloads from several URLs at the same time within a single process:


<?php
// Fetch all URLs in parallel; returns the response bodies keyed like $urls.
function multi_request($urls)
{
    $curly = array();
    $result = array();
    $mh = curl_multi_init();

    foreach ($urls as $id => $url) {
        $curly[$id] = curl_init();
        curl_setopt($curly[$id], CURLOPT_URL, $url);
        curl_setopt($curly[$id], CURLOPT_HEADER, 0);
        curl_setopt($curly[$id], CURLOPT_RETURNTRANSFER, 1);
        curl_setopt($curly[$id], CURLOPT_TIMEOUT, 30);
        curl_setopt($curly[$id], CURLOPT_FOLLOWLOCATION, true);
        // The next two options turn off TLS certificate checks;
        // drop them outside of testing.
        curl_setopt($curly[$id], CURLOPT_SSL_VERIFYPEER, 0);
        curl_setopt($curly[$id], CURLOPT_SSL_VERIFYHOST, 0);
        curl_multi_add_handle($mh, $curly[$id]);
    }

    // Drive all transfers. curl_multi_select sleeps until there is
    // activity, so the loop does not spin the CPU while waiting.
    $running = null;
    do {
        curl_multi_exec($mh, $running);
        if ($running > 0) {
            curl_multi_select($mh);
        }
    } while ($running > 0);

    foreach ($curly as $id => $c) {
        $result[$id] = curl_multi_getcontent($c);
        curl_multi_remove_handle($mh, $c);
    }
    curl_multi_close($mh);
    return $result;
}
?>
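
The function above returns only the bodies, so a failed download is hard to tell apart from an empty page. Here is a hedged sketch of a variant that also reports a per-URL error, using curl_multi_info_read and curl_strerror. The name multi_request_checked, the $timeout parameter and the shape of the returned array are my own assumptions for illustration.

```php
<?php
// Hypothetical variant of multi_request that records transfer errors.
// Returns [id => ['body' => string|null, 'error' => string|null]].
function multi_request_checked(array $urls, int $timeout = 30): array
{
    $mh = curl_multi_init();
    $handles = array();
    foreach ($urls as $id => $url) {
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        curl_setopt($ch, CURLOPT_TIMEOUT, $timeout);
        curl_multi_add_handle($mh, $ch);
        $handles[$id] = $ch;
    }

    $errors = array();
    $running = null;
    do {
        $status = curl_multi_exec($mh, $running);
        // Each finished transfer posts one message with its result code.
        while ($info = curl_multi_info_read($mh)) {
            if ($info['result'] !== CURLE_OK) {
                $id = array_search($info['handle'], $handles, true);
                $errors[$id] = curl_strerror($info['result']);
            }
        }
        if ($running > 0) {
            curl_multi_select($mh, 1.0); // wait for activity, up to 1 second
        }
    } while ($running > 0 && $status === CURLM_OK);

    $results = array();
    foreach ($handles as $id => $ch) {
        $failed = isset($errors[$id]);
        $results[$id] = array(
            'body'  => $failed ? null : curl_multi_getcontent($ch),
            'error' => $failed ? $errors[$id] : null,
        );
        curl_multi_remove_handle($mh, $ch);
    }
    curl_multi_close($mh);
    return $results;
}
```

A caller can then loop over the result and skip or retry every entry whose 'error' is not null. Note that this sketch only detects transport-level failures; an HTTP 404 still counts as a successful transfer.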

A kind of multithreading can also be achieved with the popen function and curl or wget:

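$pps = array();
foreach ($urls as $url) {
    // -q silences wget's progress output; -O - writes the page to stdout
    $pps[] = popen('wget -q -O - ' . escapeshellarg($url), 'r');
}
foreach ($pps as $pp) {
    // read until the command closes its end of the pipe
    // (a single fread call may return only part of the page)
    $out = stream_get_contents($pp);
    pclose($pp);
    // do something with the output
}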

Of course, the last example is not real multithreading, since the output has to be processed one pipe after another, but the downloads themselves run in parallel.
In addition, this approach allows using external commands such as wget with all their features.
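
If reading the pipes one after another becomes a bottleneck, the output can be collected as it arrives with stream_select. This is only a rough sketch under a few assumptions: the function name read_pipes_concurrently is made up for illustration, the commands are expected to be already escaped, and it targets Unix-like systems, since stream_select does not work on popen pipes under Windows.

```php
<?php
// Hypothetical helper: run several shell commands and read their output
// concurrently. Returns [id => output], keyed like $commands.
function read_pipes_concurrently(array $commands): array
{
    $pipes = array();
    $output = array();
    foreach ($commands as $id => $cmd) {
        $pp = popen($cmd, 'r');
        if ($pp === false) {
            continue; // could not start this command
        }
        stream_set_blocking($pp, false);
        $pipes[$id] = $pp;
        $output[$id] = '';
    }

    while ($pipes) {
        // stream_select rewrites its arguments, so pass copies.
        $read = $pipes;
        $write = null;
        $except = null;
        if (stream_select($read, $write, $except, 1) === false) {
            break; // select failed; stop reading
        }
        foreach ($read as $pp) {
            $id = array_search($pp, $pipes, true);
            $chunk = fread($pp, 8192);
            if ($chunk !== false && $chunk !== '') {
                $output[$id] .= $chunk;
            }
            if (feof($pp)) {
                pclose($pp);
                unset($pipes[$id]);
            }
        }
    }

    // Close anything left open if the loop ended early.
    foreach ($pipes as $pp) {
        pclose($pp);
    }
    return $output;
}
```

For the wget case, each command can be built the same way as in the popen example, that is 'wget -q -O - ' followed by escapeshellarg($url), and the resulting array passed to the function.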