When it comes to fetching data from the web, few tools are as powerful and versatile as cURL. Short for "Client URL", cURL is a command-line tool for transferring data using a variety of protocols. It lets you make HTTP requests, download files, set headers and options, and work with many types of data – including XML.
In this guide, you'll learn how to leverage cURL to fetch and parse XML data from the web. We'll cover making HTTP requests, setting headers, parsing response data, saving data to files, and lots of handy tips and tricks. By the end of this post, you'll be a pro at slurping up XML with cURL!
Prerequisites
Before we dive in, make sure you have cURL installed on your system. On most Unix-like systems (Linux and macOS), cURL is installed by default, although minimal installations such as container images may leave it out. You can check whether it's available by opening a terminal and running:
curl --version
If you get version output, you're good to go! If the curl command isn't found, check the official cURL download page for installation instructions.
What is cURL?
As mentioned, cURL is a command-line tool for transferring data to and from a server. It supports a huge range of protocols like HTTP, HTTPS, FTP, SFTP, and many more.
Some key cURL features include:
- Make GET, POST, PUT, DELETE and other HTTP requests
- Submit data via forms
- Set custom headers
- Handle cookies
- Follow redirects
- Send requests through a proxy
- Authenticate with servers
- And much more!
cURL is commonly used for testing APIs, debugging requests, downloading files, and generally interacting with web servers and data. It's an indispensable tool for developers and power users.
What is XML?
XML stands for eXtensible Markup Language. It's a markup language and file format for storing, transmitting, and reconstructing arbitrary data. Like HTML, XML uses a tree structure with tags, attributes, and content, but where HTML is meant for displaying data, XML is designed for carrying data.
Here's what a snippet of XML looks like:
<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>John</to>
  <from>Jane</from>
  <heading>Reminder</heading>
  <body>Don't forget our meeting today at 2pm!</body>
</note>
XML is used extensively in APIs, data interchange between systems, configuration files, and more. Its simple, flexible structure makes it a popular choice for many data serialization needs.
Making an XML Request with cURL
To fetch an XML document or response using cURL, we need to make an HTTP request with a couple of key settings:
- Set the HTTP method to GET (or POST for some APIs)
- Set the Accept header to application/xml to tell the server we want XML data back
Here's what that looks like in a cURL command:
curl -X GET -H "Accept: application/xml" https://api.example.com/data
Let's break that down:
- curl invokes the cURL tool
- -X GET sets the HTTP method to GET (this is the default, so it's often omitted)
- -H "Accept: application/xml" sets the Accept header to request XML data
- The last argument is the URL we're requesting data from
When we run this, cURL will make the request and output the response body (the XML data) to the terminal.
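If you'd rather make the same request from a script, here's a minimal Python sketch that uses only the standard library. The name fetch_xml and the 10-second timeout are our own choices for illustration. Like curl, urllib sends a GET request when no request body is supplied.

```python
import urllib.request
import xml.etree.ElementTree as ET


def fetch_xml(url: str, timeout: float = 10.0) -> ET.Element:
    """Fetch url with an XML Accept header and return the parsed root element.

    Roughly mirrors: curl -H "Accept: application/xml" URL
    """
    request = urllib.request.Request(url, headers={"Accept": "application/xml"})
    # No data argument is given, so urllib sends a GET request.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read()
    # Parse the raw bytes so the parser honours the encoding in the XML declaration.
    return ET.fromstring(body)


# Example (needs network access):
# root = fetch_xml("https://httpbin.org/xml")
# print(root.tag, root.get("title"))
```

Parsing bytes rather than decoded text lets the XML parser pick up the encoding from the document's own declaration, such as us-ascii in the httpbin response below.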
Here's an example hitting an API endpoint that returns XML:
curl -H "Accept: application/xml" https://httpbin.org/xml
And here's a snippet of the response:
<?xml version='1.0' encoding='us-ascii'?>
<slideshow
    title="Sample Slide Show"
    date="Date of publication"
    author="Yours Truly"
    >
  <slide type="all">
    <title>Wake up to WonderWidgets!</title>
  </slide>
  <slide type="all">
    <title>Overview</title>
    <item>Why <em>WonderWidgets</em> are great</item>
    <item/>
    <item>Who <em>buys</em> WonderWidgets</item>
  </slide>
</slideshow>
We've made a request for XML data, and the API has returned an XML document, which cURL outputs to the terminal.
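To pull specific values out of a response like this one, Python's ElementTree supports a small subset of XPath. The sketch below embeds a copy of the httpbin.org/xml response shown above as a bytes literal. summarize is a hypothetical helper name. It uses itertext() because items such as "Why <em>WonderWidgets</em> are great" mix plain text with child elements.

```python
import xml.etree.ElementTree as ET

# A copy of the httpbin.org/xml response shown above.
SLIDESHOW = b"""<?xml version='1.0' encoding='us-ascii'?>
<slideshow title="Sample Slide Show" date="Date of publication" author="Yours Truly">
  <slide type="all">
    <title>Wake up to WonderWidgets!</title>
  </slide>
  <slide type="all">
    <title>Overview</title>
    <item>Why <em>WonderWidgets</em> are great</item>
    <item/>
    <item>Who <em>buys</em> WonderWidgets</item>
  </slide>
</slideshow>"""


def summarize(xml_bytes: bytes) -> dict:
    """Collect the show title, slide titles and item texts from the slideshow."""
    root = ET.fromstring(xml_bytes)
    return {
        # Attributes are read with .get() on the element that carries them.
        "title": root.get("title"),
        # "./slide/title" is an ElementTree path: <title> children of each <slide>.
        "slide_titles": [t.text for t in root.findall("./slide/title")],
        # itertext() joins an element's text with the text of its children.
        "items": ["".join(item.itertext()) for item in root.iter("item")],
    }
```

Calling summarize(SLIDESHOW) returns the show title "Sample Slide Show", the two slide titles, and the three item texts. The empty <item/> comes back as an empty string.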
Saving XML Response Data
In many cases we don't just want to see the XML response in the terminal, but save it to a file for later processing or use. We can do that by using cURL's output options.
To save the response data to a file, we use the -o or --output option followed by a file path:
curl -H "Accept: application/xml" https://httpbin.org/xml -o response.xml
This will save the XML response to a file named response.xml in the current directory. You can specify any file path you'd like.
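For comparison, here's a minimal Python sketch of the same idea: stream the response body straight into a file. download_xml is a hypothetical name. One behavioural difference is worth knowing. Plain curl -o saves whatever body the server sends, even an error page, whereas urlopen raises HTTPError on 4xx/5xx responses before the file is opened. That is closer to running curl with -f.

```python
from __future__ import annotations

import shutil
import urllib.request
from pathlib import Path


def download_xml(url: str, dest: str | Path, timeout: float = 10.0) -> Path:
    """Stream the response body for url into dest, much like curl -o dest URL."""
    dest = Path(dest)
    request = urllib.request.Request(url, headers={"Accept": "application/xml"})
    with urllib.request.urlopen(request, timeout=timeout) as response, dest.open("wb") as out:
        # copyfileobj copies in chunks, so large documents never sit fully in memory.
        shutil.copyfileobj(response, out)
    return dest


# Example (needs network access):
# download_xml("https://httpbin.org/xml", "response.xml")
```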
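Alternatively, you can use the -O or --remote-name option to name the local file after the last part of the URL path. (If you want the file name the server suggests in its Content-Disposition header instead, add -J / --remote-header-name alongside -O.)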
curl -H "Accept: application/xml" https://example.com/data/records.xml -O
This will save the XML data to a local file named records.xml.
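The naming step behind -O is easy to approximate in Python: take the last segment of the URL path. remote_name is a hypothetical helper, and it is only an approximation. It ignores the query string and raises ValueError when the path ends in a slash. How curl itself handles such edge cases may differ.

```python
import posixpath
from urllib.parse import urlsplit


def remote_name(url: str) -> str:
    """Return the last segment of the URL path as a local file name."""
    # Only the path matters here; the scheme, host and query string are dropped.
    name = posixpath.basename(urlsplit(url).path)
    if not name:
        raise ValueError(f"no file name in URL path: {url!r}")
    return name
```

With the URL from the example above, remote_name("https://example.com/data/records.xml") gives "records.xml".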
Dealing with Redirects
Sometimes when requesting data the server will return a redirect to another URL. By default, cURL doesn't follow redirects. If you want cURL to automatically follow redirects to the final destination URL, use the -L or --location option:
curl -H "Accept: application/xml" -L https://example.com/redirect/xml
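Conceptually, -L boils down to a loop: if the response is a redirect, resolve its Location header against the current URL and request again, up to some limit. Here's a minimal Python sketch of that loop. The fetch callable is a hypothetical stand-in for whatever HTTP client you use, and the limit of 10 is an arbitrary choice. The sketch skips details a real client handles, such as how the request method changes on some redirect codes.

```python
from typing import Callable, Mapping, Tuple
from urllib.parse import urljoin

# Hypothetical client interface: takes a URL, returns (status, headers, body).
Fetch = Callable[[str], Tuple[int, Mapping[str, str], bytes]]

REDIRECT_CODES = {301, 302, 303, 307, 308}


def follow_redirects(url: str, fetch: Fetch, max_redirects: int = 10) -> Tuple[str, bytes]:
    """Follow Location headers until a non-redirect response, like curl -L."""
    for _ in range(max_redirects + 1):
        status, headers, body = fetch(url)
        if status not in REDIRECT_CODES:
            return url, body
        # Location may be relative (e.g. "/final/data.xml"), so resolve it
        # against the URL we just requested.
        url = urljoin(url, headers["Location"])
    raise RuntimeError(f"more than {max_redirects} redirects")
```

Injecting fetch as a parameter keeps the redirect logic testable with canned responses and no network at all.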
Setting Other Headers
Besides the Accept header to get XML, you can set any arbitrary headers using one or more -H options:
curl -H "Accept: application/xml" -H "Cache-Control: no-cache" https://example.com/api/xml
This sets both the Accept and Cache-Control headers in the request.
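If you're porting cURL commands into Python, the repeated -H arguments map naturally onto a header dictionary. This is a minimal sketch with hypothetical helper names (headers_from_args, build_request). It splits each "Name: value" string on its first colon, so values that themselves contain colons, such as host:port, survive intact.

```python
from __future__ import annotations

import urllib.request


def headers_from_args(header_args: list[str]) -> dict[str, str]:
    """Turn curl-style "Name: value" strings into a header dictionary."""
    headers: dict[str, str] = {}
    for arg in header_args:
        name, sep, value = arg.partition(":")
        if not sep or not name.strip():
            raise ValueError(f"expected 'Name: value', got {arg!r}")
        headers[name.strip()] = value.strip()
    return headers


def build_request(url: str, header_args: list[str]) -> urllib.request.Request:
    # Like repeating -H, every header in the list is attached to one request.
    # Note: urllib stores header names via str.capitalize(), e.g. "Cache-control".
    return urllib.request.Request(url, headers=headers_from_args(header_args))
```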
Debugging with cURL
When things aren't working as expected, cURL has some great debugging tools. To see the full request and response details, use the -v or --verbose option:
curl -H "Accept: application/xml" https://example.com/data -v
This will print out the request headers, the response headers, and other useful debugging info.
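You can also use --trace or --trace-ascii to get an even more detailed trace, including the raw data sent and received. Both options take a file name to write the trace to (use - to print it to the terminal).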
Parsing and Processing XML Data
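Once you've fetched XML data with cURL, you'll likely want to parse and extract relevant bits of data from it.
On the command line, you can pipe the XML output to xmllint to pretty-print it or run XPath queries, or to xml2 to flatten it into one path=value line per node (xml2's output is plain text, not XML):
curl -s -H "Accept: application/xml" https://example.com/data | xmllint --format -
curl -s -H "Accept: application/xml" https://example.com/data | xmllint --xpath '//item/text()' -
curl -s -H "Accept: application/xml" https://example.com/data | xml2 > output.txt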
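To get a feel for the flattened view xml2 produces, here's a rough Python imitation. flatten is a hypothetical helper. It only approximates xml2's format: it emits attributes as @name lines and non-empty element text, but it skips mixed-content tail text and does not reproduce xml2's exact whitespace handling.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET


def flatten(xml_text: str) -> list[str]:
    """Emit path=value lines, loosely in the style of the xml2 tool."""
    lines: list[str] = []

    def walk(elem: ET.Element, path: str) -> None:
        path = f"{path}/{elem.tag}"
        # Attributes become /path/@name=value lines.
        for name, value in elem.attrib.items():
            lines.append(f"{path}/@{name}={value}")
        text = (elem.text or "").strip()
        if text:
            lines.append(f"{path}={text}")
        for child in elem:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    return lines
```

Applied to the <note> example from earlier, this yields lines such as /note/to=John and /note/from=Jane, which are easy to grep or feed into other line-oriented tools.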
In a programming language like Python, you can use a library such as xml.etree.ElementTree to parse the XML string:
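import xml.etree.ElementTree as ET

# The XML declaration must be the very first thing in the string,
# so the literal starts immediately after the opening quotes.
xml_data = '''<?xml version="1.0"?>
<data>
  <items>
    <item>Item 1</item>
    <item>Item 2</item>
  </items>
</data>
'''

root = ET.fromstring(xml_data)
# './/item' matches <item> elements at any depth below the root.
items = root.findall('.//item')
for item in items:
    print(item.text)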
This parses the XML string into an Element (the root of the tree), finds all <item> elements, and prints their text content.
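Once you've saved a response to disk (for example with -o response.xml), you can parse the file instead of a string. For large exports, ElementTree's iterparse reads the file incrementally. The sketch below yields each <item>'s text and clears the element afterwards to keep memory use down. iter_item_texts is a hypothetical name, and the sketch assumes items hold plain text rather than nested markup.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from pathlib import Path
from typing import Iterator


def iter_item_texts(path: str | Path) -> Iterator[str]:
    """Yield the text of each <item> in an XML file, reading it incrementally."""
    # An "end" event fires once an element and all its children are parsed.
    for _event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == "item":
            yield elem.text or ""
            # Drop the element's text and children now that we're done with it.
            elem.clear()


# Example: for text in iter_item_texts("response.xml"): print(text)
```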
cURL and APIs
One of the most common use cases for cURL is working with APIs that return XML data. Many APIs use XML as their primary data format or offer it as an option alongside JSON.
To get XML from an API endpoint, you'd make a request like:
curl -H "Accept: application/xml" https://api.example.com/v1/records
Some APIs may require authentication via an API token or key. In that case you can pass the token in a header:
curl -H "Accept: application/xml" -H "Authorization: Bearer token123" https://api.example.com/v1/records
Or, for some older APIs that use query parameter authentication (quote the URL so your shell doesn't treat characters like ? and & specially):
curl -H "Accept: application/xml" "https://api.example.com/v1/records?api_key=secretkey"
Always check the API documentation for the proper authentication method and any required headers.
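Both authentication styles translate directly to Python. In this minimal sketch, api_request is a hypothetical helper, the token and key values are the placeholders from the commands above, and base_url is assumed to carry no query string of its own. The request is only built here, not sent.

```python
from __future__ import annotations

import urllib.request
from urllib.parse import urlencode


def api_request(
    base_url: str, token: str | None = None, api_key: str | None = None
) -> urllib.request.Request:
    """Build (but don't send) an XML request with optional authentication."""
    headers = {"Accept": "application/xml"}
    if token:
        # Same as: -H "Authorization: Bearer <token>"
        headers["Authorization"] = f"Bearer {token}"
    url = base_url
    if api_key:
        # urlencode percent-escapes characters such as '&' or '=' in the key.
        url = f"{base_url}?{urlencode({'api_key': api_key})}"
    return urllib.request.Request(url, headers=headers)


# Example: read the token from an environment variable (API_TOKEN is a
# hypothetical name) instead of hard-coding it:
# import os
# req = api_request("https://api.example.com/v1/records", token=os.environ["API_TOKEN"])
```

Keeping secrets in environment variables rather than in scripts also keeps them out of your shell history.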
XML vs JSON
These days, JSON has largely overtaken XML as the most common data format for web APIs. JSON is simpler and more lightweight than XML, but XML still has its place and is used by many legacy APIs.
One advantage of XML is its ability to represent more complex document structures with attributes, namespaces, and a richer schema language. For APIs with sophisticated data models and a document-based architecture, XML may still be the preferred choice.
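Namespaces are the XML feature most likely to trip you up when parsing. Here's a minimal Python sketch using a made-up Atom feed; Atom's real namespace URI is http://www.w3.org/2005/Atom. ElementTree stores namespaced tags in {uri}local form, so you pass a prefix-to-URI mapping to find() and findall(). The atom prefix and the entry_titles helper are our own choices for illustration.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# A made-up feed; the default namespace applies to every element in it.
ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example Feed</title>
  <entry><title>First post</title><id>urn:example:1</id></entry>
  <entry><title>Second post</title><id>urn:example:2</id></entry>
</feed>"""

# Map a prefix of our choosing to the namespace URI used in the document.
NS = {"atom": "http://www.w3.org/2005/Atom"}


def entry_titles(xml_text: str) -> list[str]:
    root = ET.fromstring(xml_text)
    # A bare findall("entry") would match nothing, because each tag is
    # really "{http://www.w3.org/2005/Atom}entry".
    return [
        e.findtext("atom:title", default="", namespaces=NS)
        for e in root.findall("atom:entry", NS)
    ]
```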
To fetch JSON instead of XML with cURL, you'd simply change the Accept header:
curl -H "Accept: application/json" https://api.example.com/data
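If the server supports content negotiation, it will see that we want JSON and return the response data in that format instead of XML. Servers that don't support it may ignore the Accept header, so check the Content-Type response header (visible with -v or -i) to confirm what you actually received.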
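Because the Accept header is only a request, client code should branch on the Content-Type the server actually sent. Here's a minimal Python sketch. parse_body is a hypothetical helper, and the set of recognised media types (including +json and +xml suffixes such as application/atom+xml) is our own choice.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any


def parse_body(body: bytes, content_type: str) -> Any:
    """Decode a response body according to its Content-Type header."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json" or media_type.endswith("+json"):
        return json.loads(body)
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return ET.fromstring(body)
    raise ValueError(f"unexpected content type: {content_type!r}")
```

Raising on anything unexpected, such as an HTML error page, surfaces problems early instead of letting a parser fail later with a confusing message.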
Saving XML Data to a File
As we saw earlier, you can save the fetched XML data directly to a file using the -o or -O options. This is handy when you need to persist the data locally for later processing or ingestion into another system.
For example, you might want to fetch daily XML exports from an API and save them to dated files:
curl -H "Accept: application/xml" https://api.example.com/data/2023-01-01 -o data-2023-01-01.xml
curl -H "Accept: application/xml" https://api.example.com/data/2023-01-02 -o data-2023-01-02.xml
This fetches the data for a given date and saves it to a corresponding file. You could easily script this to run daily or hourly.
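One way to script this is a small Python wrapper that generates the dated URL and file name for each day and shells out to curl. This is a sketch under stated assumptions. The endpoint is the placeholder from the commands above, the helper names are hypothetical, and curl must be on your PATH. The -sS and -f flags are optional extras that keep the output quiet while still reporting failures.

```python
from __future__ import annotations

import subprocess
from datetime import date, timedelta
from typing import Iterator

BASE_URL = "https://api.example.com/data"  # placeholder endpoint from the examples above


def daily_jobs(start: date, end: date) -> Iterator[tuple[str, str]]:
    """Yield (url, filename) pairs for every day from start to end, inclusive."""
    day = start
    while day <= end:
        stamp = day.isoformat()  # YYYY-MM-DD
        yield f"{BASE_URL}/{stamp}", f"data-{stamp}.xml"
        day += timedelta(days=1)


def fetch_range(start: date, end: date) -> None:
    """Run one curl command per day; needs curl on PATH and network access."""
    for url, filename in daily_jobs(start, end):
        # -sS hides the progress meter but still shows errors; -f makes curl
        # exit non-zero on HTTP errors instead of saving the error page.
        subprocess.run(
            ["curl", "-sS", "-f", "-H", "Accept: application/xml", url, "-o", filename],
            check=True,
        )
```

Separating URL generation from the curl call means you can check the dates (month boundaries, for instance) without touching the network, then run fetch_range from cron or another scheduler.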
XML Use Cases
So why might you need to fetch XML data with cURL in the first place? Here are a few common scenarios:
- Integrating with a legacy API that only supports XML
- Exporting data from a system that provides XML exports
- Scraping websites that publish data in XML format
- Fetching RSS or Atom feeds (which use XML)
- Querying a SOAP-based web service (SOAP uses XML extensively)
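Feeds are a good example of how little code you need once cURL has fetched the XML. Here's a minimal Python sketch that reads item titles and links from an RSS 2.0 feed. The sample feed is made up and rss_items is a hypothetical helper. Core RSS 2.0 elements carry no namespace, so plain tag names work here, unlike Atom.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

# A made-up RSS 2.0 feed for illustration.
RSS = """<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item><title>First story</title><link>https://example.com/1</link></item>
    <item><title>Second story</title><link>https://example.com/2</link></item>
  </channel>
</rss>"""


def rss_items(xml_text: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for each <item> in an RSS 2.0 feed."""
    channel = ET.fromstring(xml_text).find("channel")
    if channel is None:
        raise ValueError("not an RSS 2.0 document: missing <channel>")
    # findtext's second argument is the default used when a child is missing.
    return [(item.findtext("title", ""), item.findtext("link", "")) for item in channel.findall("item")]
```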
Whenever you're working with a system or interface that speaks XML, cURL is a great tool to have in your belt for fetching and debugging that data.
Conclusion
In this guide, we've looked at how to use cURL to fetch XML data from web services and APIs. We covered making HTTP requests, setting the Accept header to specify the XML content type, saving response data to files, handling redirects, debugging requests, and parsing XML data.
While XML may not be as popular as it once was, it's still widely used and is an important data format to know. With cURL in your toolbox, you can easily work with XML APIs and data sources to fetch, process, and integrate that data into your applications and workflows.