cURL is an abbreviation for client URL and a command-line tool for sending data to and receiving data from a server. It ships with most modern operating systems, including Windows 10 and many Linux distributions. It is a convenient tool (built on the libcurl library) that lets you exchange data with websites, which makes it vital for your web scraping needs. Before looking at a simple example, let’s find out what you need to know in order to install it.
On Debian- or Ubuntu-based Linux distributions, you can install it by running sudo apt install curl in a terminal.
On Windows, open your terminal or Command Prompt and type curl followed by the URL of the page you want to fetch. cURL then prints the HTML of that page to the console.
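As a minimal sketch of that first request, the helper below wraps the command in a small shell function for a POSIX shell (Linux, macOS, or Git Bash/WSL on Windows). The function name fetch_page and the URL https://example.com are placeholders chosen for illustration and are not part of the original instructions.

```shell
# Hypothetical helper: download a page and print its HTML to the console.
# $1 = URL of the page to fetch
fetch_page() {
    curl "$1"
}

# Example call (needs network access):
# fetch_page "https://example.com"
```

In Command Prompt, the equivalent one-off command is simply curl followed by the quoted URL.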
cURL transfers data to and from servers using internet protocols. Although cURL was initially developed to work with HTTP, it now supports many network protocols such as FTP, IMAP, IMAPS, SMTP, POP3, POP3S, and others.
It also supports GET, POST, PUT, and other HTTP methods when sending requests. For example, to send data with the POST method, you pass it with the -d option. The -d option tells cURL to put the given data in the request body and use POST, so you can pass your name and some value to the post page of examplewebsite.com.
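A POST request of this kind could look roughly like the sketch below. The helper name post_name, the field name "name", and the /post path are assumptions made for illustration; substitute whatever the target page actually expects.

```shell
# Hypothetical helper: send one form field with POST.
# $1 = value for the "name" field, $2 = URL that accepts the POST
post_name() {
    # -d puts the data in the request body, so curl sends a POST request
    curl -d "name=$1" "$2"
}

# Example call (the /post path is an assumed placeholder):
# post_name "value" "https://examplewebsite.com/post"
```

Note that -d sends the data as given; if the value contains characters such as spaces or &, curl's --data-urlencode option is the safer choice.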
Now that you know what cURL is, let’s move on to using it with proxies.
Configuring cURL with a proxy address routes your requests through that proxy, which gives you the main benefits proxies offer: anonymity and access to geo-restricted content.
Using cURL with a proxy lets users hide their identity from the server. Users who prefer to retrieve information without revealing their actual identity can add a proxy address to their cURL command. The proxy then forwards the request on the user’s behalf, and the user’s actual identity stays hidden. Proxyscrape provides proxies of all protocol types, such as HTTPS, SOCKS4, and SOCKS5, that can maintain anonymity for all types of requests.
When users in one location are restricted from scraping content from sites in other regions, proxies help them bypass those restrictions. Proxyscrape provides proxies in multiple countries, so users can choose the one they need to bypass geo-blocks.
You can connect to a website through a proxy using cURL. For instance, proxies are essential when you use cURL to scrape data, because they keep you anonymous to the target website that you’re scraping.
To connect through a proxy, you need the proxy server address, port number, and protocol type; if authentication is required, you also need the username and password. Let’s look at a simple example in which we assume the proxy address is 127.0.0.1 and the port number is 8920. The examples in this section cover the fundamentals of connecting to proxies with cURL and work with any proxy service.
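The syntax for connecting through a proxy is curl --proxy, followed by the proxy address in the form protocol://host:port, and then the target URL. With the values assumed above, the proxy address becomes http://127.0.0.1:8920, and the resulting command routes your connection via the proxy to examplewebsite.com.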
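A minimal sketch of that proxied request, assuming the proxy values from the text and an https target (the original does not state the scheme of examplewebsite.com). The helper name fetch_via_proxy is hypothetical.

```shell
# Hypothetical helper: fetch a URL through a proxy.
# $1 = proxy address as protocol://host:port, $2 = target URL
fetch_via_proxy() {
    curl --proxy "$1" "$2"
}

# Example call with the values assumed in the text:
# fetch_via_proxy "http://127.0.0.1:8920" "https://examplewebsite.com"
```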
Now let’s look at an example that requires authentication, where the username is username and the password is password.
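One way to sketch the authenticated case is to embed the credentials in the proxy URL as protocol://username:password@host:port. The helper names proxy_url and fetch_with_auth below are hypothetical, and the values are the placeholders assumed in the text.

```shell
# Hypothetical helper: build a proxy URL with embedded credentials.
# $1 = protocol, $2 = host, $3 = port, $4 = username, $5 = password
proxy_url() {
    printf '%s://%s:%s@%s:%s\n' "$1" "$4" "$5" "$2" "$3"
}

# Hypothetical helper: fetch a URL through the authenticated proxy.
# $1 = target URL; the proxy values are the ones assumed in the text
fetch_with_auth() {
    curl --proxy "$(proxy_url http 127.0.0.1 8920 username password)" "$1"
}

# Example call:
# fetch_with_auth "https://examplewebsite.com"
```

Here proxy_url http 127.0.0.1 8920 username password prints http://username:password@127.0.0.1:8920. If you would rather keep the credentials out of the URL, curl also accepts them separately through its -U (--proxy-user) option.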
You can find out which options to use when connecting cURL to a proxy by running curl --help (on recent versions, curl --help all prints the full list).
The list is long, so we will focus on the most fundamental option: -x or --proxy, which takes the proxy in the form [protocol://]host[:port].
Both -x and --proxy specify the proxy details, and you can use either one because they are equivalent. However, be mindful that -x is case-sensitive: the uppercase -X sets the request method instead.
Also, to confirm that your requests actually go through the proxy, you can request a service that echoes back the IP address it sees, such as httpbin.org/ip. It returns the IP address of the request’s origin, so if you’re using a proxy server, it shows the IP address of the proxy server instead of yours.
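The check could be sketched as follows, assuming https://httpbin.org/ip as the echo service and its JSON reply containing an "origin" field. The helper names extract_origin and origin_ip are hypothetical.

```shell
# Hypothetical helper: pull the "origin" value out of the JSON reply.
extract_origin() {
    sed -n 's/.*"origin": *"\([^"]*\)".*/\1/p'
}

# Hypothetical helper: print the IP address the server sees.
# $1 = optional proxy address; leave it out for a direct request
origin_ip() {
    if [ -n "${1:-}" ]; then
        curl --silent --proxy "$1" "https://httpbin.org/ip" | extract_origin
    else
        curl --silent "https://httpbin.org/ip" | extract_origin
    fi
}

# Example calls (need network access and a working proxy):
# origin_ip                          # your own IP address
# origin_ip "http://127.0.0.1:8920"  # the proxy's IP address
```

If the two calls print different addresses, the proxied request is leaving through the proxy.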
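So now, putting it all together, you can send the request with -x followed by the proxy URL (including the username and password) and then the target URL. The same command written with --proxy instead of -x behaves identically.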
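Put together, the two equivalent forms might look like the sketch below. The proxy values are the placeholders from the text; the target https://httpbin.org/ip and the function names request_short and request_long are assumptions for illustration.

```shell
# Proxy URL and target URL assumed for this example
proxy="http://username:password@127.0.0.1:8920"
target="https://httpbin.org/ip"

# Short form of the option
request_short() {
    curl -x "$proxy" "$target"
}

# Long form of the option; behaves the same as request_short
request_long() {
    curl --proxy "$proxy" "$target"
}

# Example calls (need a proxy listening on 127.0.0.1:8920):
# request_short
# request_long
```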
Keep in mind that, as a best practice, you should wrap both the proxy URL and the target URL in quotes. URLs often contain special characters, such as & or ?, that the shell would otherwise interpret.
Also, if you get SSL certificate errors, you can add the lowercase -k option (long form: --insecure) to the end of the command. This makes cURL skip certificate verification, which allows insecure connections to pass through when using SSL.
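A short sketch of a proxied request with certificate checks disabled; the helper name fetch_insecure is hypothetical, and the example values are the placeholders used earlier.

```shell
# Hypothetical helper: proxied request that skips certificate verification.
# $1 = proxy URL, $2 = target URL
fetch_insecure() {
    # -k (long form: --insecure) disables SSL certificate checks
    curl --proxy "$1" "$2" -k
}

# Example call:
# fetch_insecure "http://username:password@127.0.0.1:8920" "https://httpbin.org/ip"
```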
When using proxies, cURL assumes the HTTP protocol unless you explicitly specify another one. Therefore, writing the proxy as http://127.0.0.1:8920 or simply as 127.0.0.1:8920 gives the same result, and both forms are correct.
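To make the defaulting rule concrete, the hypothetical helper below prints a proxy address with the scheme spelled out. It only mirrors the rule described above for illustration; curl applies the default on its own, so you never need to call anything like this in practice.

```shell
# Hypothetical helper: show a proxy address with its scheme made explicit.
# No scheme means HTTP; an explicit scheme is left untouched.
with_default_scheme() {
    case "$1" in
        *://*) printf '%s\n' "$1" ;;
        *)     printf 'http://%s\n' "$1" ;;
    esac
}

# These two commands therefore reach the same HTTP proxy:
# curl --proxy "http://127.0.0.1:8920" "https://httpbin.org/ip"
# curl --proxy "127.0.0.1:8920" "https://httpbin.org/ip"
```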
If you want cURL to use a proxy on every request, you can create a cURL configuration file in the following manner.
If you’re on macOS or Linux, first open the terminal and go to your home directory (cd ~). If there is already a .curlrc file, open it in a text editor; otherwise, create a new empty file with that name.
Then add a line to the file that sets the proxy option to your proxy URL, for example proxy="http://username:password@127.0.0.1:8920".
Save the file, and you can now use cURL with the proxy. Simply run cURL as usual, and it will read the proxy from that file.
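If you prefer to add the line from the terminal instead of an editor, a sketch like the following could work. The helper name set_curlrc_proxy is hypothetical, and the proxy URL is the placeholder assumed throughout this article.

```shell
# Hypothetical helper: append a proxy line to a cURL config file.
# $1 = proxy URL, $2 = config file (defaults to ~/.curlrc)
set_curlrc_proxy() {
    config_file="${2:-$HOME/.curlrc}"
    printf 'proxy = "%s"\n' "$1" >> "$config_file"
}

# Example call, followed by a normal request that picks up the proxy:
# set_curlrc_proxy "http://username:password@127.0.0.1:8920"
# curl "https://httpbin.org/ip"
```

Because the helper appends rather than overwrites, check the file for an older proxy line before running it a second time.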
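On Windows, the configuration file is named _curlrc and goes in the %APPDATA% directory. Running echo %APPDATA% in Command Prompt returns the path, and you have to navigate to it. Then create the _curlrc file there and set the proxy in the same way as on macOS or Linux.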