How to Get XML Data with cURL: The Complete Guide

When it comes to fetching data from the web, few tools are as powerful and versatile as cURL. Short for "Client URL", cURL is a command-line tool for transferring data using a variety of protocols. It lets you make HTTP requests, download files, set headers and options, and work with many types of data – including XML.

In this guide, you'll learn how to leverage cURL to fetch and parse XML data from the web. We'll cover making HTTP requests, setting headers, parsing response data, saving data to files, and lots of handy tips and tricks. By the end of this post, you'll be a pro at slurping up XML with cURL!

Prerequisites

Before we dive in, make sure you have cURL installed on your system. On most Unix-like systems (Linux and macOS), cURL is installed by default. You can check if it's available by opening a terminal and running:

curl --version

If you get version output, you're good to go! If the curl command isn't found, check the official cURL download page for installation instructions.

What is cURL?

As mentioned, cURL is a command-line tool for transferring data to and from a server. It supports a huge range of protocols like HTTP, HTTPS, FTP, SFTP, and many more.

Some key cURL features include:

  • Make GET, POST, PUT, DELETE and other HTTP requests
  • Submit data via forms
  • Set custom headers
  • Handle cookies
  • Follow redirects
  • Proxy support
  • Fetch data using authentication
  • And much more!

cURL is commonly used for testing APIs, debugging requests, downloading files, and generally interacting with web servers and data. It's an indispensable tool for developers and power users.

What is XML?

XML stands for eXtensible Markup Language. It's a markup language and file format for storing, transmitting, and reconstructing arbitrary data. Like HTML, XML uses a tree structure with tags, attributes, and content, but where HTML is meant for displaying data, XML is designed for carrying data.

Here's what a snippet of XML looks like:

<?xml version="1.0" encoding="UTF-8"?>
<note>
  <to>John</to>
  <from>Jane</from>
  <heading>Reminder</heading>
  <body>Don't forget our meeting today at 2pm!</body>
</note>

XML is used extensively in APIs, data interchange between systems, configuration files, and more. Its simple, flexible structure makes it a popular choice for many data serialization needs.

Making an XML Request with cURL

To fetch an XML document or response using cURL, we need to make an HTTP request with a couple of key settings:

  1. Set the HTTP method to GET (or POST for some APIs)
  2. Set the Accept header to application/xml to tell the server we want XML data back

Here's what that looks like in a cURL command:

curl -X GET -H "Accept: application/xml" https://api.example.com/data

Let's break that down:

  • curl invokes the cURL tool
  • -X GET sets the HTTP method to GET (this is the default so it's often omitted)
  • -H "Accept: application/xml" sets the Accept header to fetch XML data
  • The last argument is the URL we're requesting data from

When we run this, cURL will make the request and output the response body (the XML data) to the terminal.

Here's an example hitting an API endpoint that returns XML:

curl -H "Accept: application/xml" https://httpbin.org/xml

And here's a snippet of the response:

<?xml version='1.0' encoding='us-ascii'?>

<slideshow 
    title="Sample Slide Show"
    date="Date of publication"
    author="Yours Truly"
    >

    <slide type="all">
      <title>Wake up to WonderWidgets!</title>
    </slide>

    <slide type="all">
        <title>Overview</title>
        <item>Why <em>WonderWidgets</em> are great</item>
        <item/>
        <item>Who <em>buys</em> WonderWidgets</item>
    </slide>

</slideshow>

We've made a request for XML data and the API has returned an XML document which cURL outputs to the terminal.
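
To give a sense of what can be done with a response like this, here is a minimal Python sketch that pulls the slide titles out of it. It assumes the response has already been captured as a string; SAMPLE is an abbreviated copy of the output shown above, and slide_titles is a hypothetical helper name, not part of cURL or of any library.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the httpbin.org/xml response shown above.
SAMPLE = """<?xml version='1.0' encoding='us-ascii'?>
<slideshow title="Sample Slide Show" date="Date of publication" author="Yours Truly">
    <slide type="all">
        <title>Wake up to WonderWidgets!</title>
    </slide>
    <slide type="all">
        <title>Overview</title>
        <item>Why <em>WonderWidgets</em> are great</item>
        <item/>
        <item>Who <em>buys</em> WonderWidgets</item>
    </slide>
</slideshow>
"""


def slide_titles(xml_text):
    """Return the <title> text of every <slide>, in document order."""
    root = ET.fromstring(xml_text)  # root is the <slideshow> element
    return [slide.findtext("title") for slide in root.findall("slide")]


if __name__ == "__main__":
    print(slide_titles(SAMPLE))
```

Attributes on the root element, such as the slideshow's title, are available through root.get("title").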

Saving XML Response Data

In many cases we don't just want to see the XML response in the terminal, but save it to a file for later processing or use. We can do that by using cURL's output options.

To save the response data to a file, we use the -o or --output option followed by a file path:

curl -H "Accept: application/xml" https://httpbin.org/xml -o response.xml

This will save the XML response to a file named response.xml in the current directory. You can specify any file path you'd like.

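Alternatively, you can use the -O or --remote-name option to save the response under the file name taken from the last part of the URL path:

curl -H "Accept: application/xml" https://example.com/data/records.xml -O

This will save the XML data to a local file named records.xml. If you would rather use a file name suggested by the server in a Content-Disposition header, add the -J or --remote-header-name option alongside -O.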
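
Once a response is on disk, it can be worth confirming that the file really contains XML, since a failed request may leave an HTML error page or an empty file behind. The following is a minimal Python sketch of such a check; is_well_formed is a hypothetical helper, and it only tests well-formedness, not validity against a schema.

```python
import sys
import xml.etree.ElementTree as ET


def is_well_formed(path):
    """Return True if the file at `path` parses as XML, False otherwise."""
    try:
        ET.parse(path)
    except ET.ParseError:
        # Raised for empty files, unclosed tags, stray text, and similar problems.
        return False
    return True


if __name__ == "__main__":
    # Usage: python check_xml.py response.xml records.xml
    for name in sys.argv[1:]:
        print(name, "ok" if is_well_formed(name) else "NOT well-formed XML")
```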

Dealing with Redirects

Sometimes when requesting data the server will return a redirect to another URL. By default, cURL doesn't follow redirects. If you want cURL to automatically follow redirects to the final destination URL, use the -L or --location option:

curl -H "Accept: application/xml" -L https://example.com/redirect/xml 

Setting Other Headers

Besides the Accept header to get XML, you can set any arbitrary headers using one or more -H options:

curl -H "Accept: application/xml" -H "Cache-Control: no-cache" https://example.com/api/xml

This sets both the Accept and Cache-Control headers in the request.

Debugging with cURL

When things aren't working as expected, cURL has some great debugging tools. To see the full request and response details, use the -v or --verbose option:

curl -H "Accept: application/xml" https://example.com/data -v

This will print out the request headers, the response headers, and other useful debugging info.

You can also use --trace <file> or --trace-ascii <file> to write an even more detailed trace to a file (pass - as the file name to print it to the terminal instead).

Parsing and Processing XML Data

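Once you've fetched XML data with cURL, you'll likely want to parse and extract relevant bits of data from it.

On the command line, you can pipe the XML output to a tool like xmllint to pretty-print it or run XPath queries (xml2, by contrast, flattens XML into line-oriented text rather than producing XML). Here the -s option silences cURL's progress meter and xmllint reads the document from standard input:

curl -s -H "Accept: application/xml" https://example.com/data | xmllint --format - > output.xml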

In a programming language like Python, you can use a library like xml.etree.ElementTree to parse the XML string:

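import xml.etree.ElementTree as ET

# The XML declaration must be the very first thing in the string,
# so the literal starts right after the opening quotes.
xml_data = '''<?xml version="1.0"?>
<data>
    <items>
        <item>Item 1</item>
        <item>Item 2</item>
    </items>
</data>
'''

# fromstring() returns the root Element (<data>)
root = ET.fromstring(xml_data)
# './/item' matches <item> elements at any depth below the root
items = root.findall('.//item')
for item in items:
    print(item.text)

This parses the XML data into an Element (the root of the document tree), finds all <item> elements, and prints out their text content.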
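
To connect the two halves, cURL's output can be piped straight into a script instead of being pasted into a string. Here is a minimal sketch of that idea, assuming the response uses the same <item> structure as the example above; the script name items.py and the helper item_texts are hypothetical.

```python
import sys
import xml.etree.ElementTree as ET


def item_texts(stream):
    """Parse XML from a binary file-like object and return each <item>'s text."""
    # Reading bytes lets the parser honor the encoding named in the XML declaration.
    tree = ET.parse(stream)
    return [item.text for item in tree.getroot().iter("item")]


# Usage (hypothetical script name; "-" means "read from standard input"):
#   curl -s -H "Accept: application/xml" https://example.com/data | python items.py -
if __name__ == "__main__" and sys.argv[1:] == ["-"]:
    for text in item_texts(sys.stdin.buffer):
        print(text)
```

The parsing lives in a function that takes any binary stream, so the same code works on a file opened with open(path, "rb").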

cURL and APIs

One of the most common use cases for cURL is working with APIs that return XML data. Many APIs use XML as their primary data format or offer it as an option alongside JSON.

To get XML from an API endpoint, you'd make a request like:

curl -H "Accept: application/xml" https://api.example.com/v1/records

Some APIs may require authentication via an API token or key. In that case you can pass the token in a header:

curl -H "Accept: application/xml" -H "Authorization: Bearer token123" https://api.example.com/v1/records

Or, for some older APIs that authenticate via a query parameter (quote the URL so the shell does not try to interpret the ? character):

curl -H "Accept: application/xml" "https://api.example.com/v1/records?api_key=secretkey"

Always check the API documentation for the proper authentication method and any required headers.
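
When requests like these are scripted, it helps to keep the token out of the script itself. The sketch below is one possible way to do that, assuming the token is stored in an environment variable; the variable name API_TOKEN and the helper curl_command are hypothetical, and the URL is the placeholder used above.

```python
import os
import shlex


def curl_command(url, token_var="API_TOKEN"):
    """Build a curl argument list that asks for XML and sends a bearer token.

    `token_var` names the environment variable (an assumed name) holding the token.
    """
    token = os.environ[token_var]  # raises KeyError if the variable is unset
    return [
        "curl",
        "--silent",  # no progress meter
        "--fail",    # exit with an error on HTTP status codes of 400 and above
        "-H", "Accept: application/xml",
        "-H", f"Authorization: Bearer {token}",
        url,
    ]


if __name__ == "__main__":
    # Placeholder token from the example above, used only when none is set.
    os.environ.setdefault("API_TOKEN", "token123")
    cmd = curl_command("https://api.example.com/v1/records")
    # Note: this prints the token too, so avoid doing it with a real secret.
    print(shlex.join(cmd))
```

Passing the list to subprocess.run(cmd, check=True) would execute it without going through a shell, which avoids quoting problems with characters such as ? and &.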

XML vs JSON

These days, JSON has largely overtaken XML as the most common data format for web APIs. JSON is simpler and more lightweight than XML, but XML still has its place and is used by many legacy APIs.

One advantage of XML is its ability to represent more complex document structures with attributes, namespaces, and a richer schema language. For APIs with sophisticated data models and a document-based architecture, XML may still be the preferred choice.
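
As a rough illustration of attributes and namespaces in practice, here is a small Python sketch. The document, its namespace URIs, and the helper books_in_stock are all invented for the example and do not come from any real API.

```python
import xml.etree.ElementTree as ET

# Invented sample document; the namespace URIs and element names are illustrative.
DOC = """<?xml version="1.0"?>
<catalog xmlns="http://example.com/ns/catalog"
         xmlns:inv="http://example.com/ns/inventory">
  <book id="b1" lang="en">
    <title>XML Basics</title>
    <inv:stock warehouse="north">12</inv:stock>
  </book>
  <book id="b2" lang="fr">
    <title>cURL en pratique</title>
    <inv:stock warehouse="south">0</inv:stock>
  </book>
</catalog>
"""

# Prefixes used in queries are local to this script; only the URIs must match.
NS = {
    "c": "http://example.com/ns/catalog",
    "inv": "http://example.com/ns/inventory",
}


def books_in_stock(xml_text):
    """Return (id, title) for each book whose <inv:stock> count is above zero."""
    root = ET.fromstring(xml_text)
    result = []
    for book in root.findall("c:book", NS):
        stock = int(book.findtext("inv:stock", default="0", namespaces=NS))
        if stock > 0:
            title = book.findtext("c:title", namespaces=NS)
            result.append((book.get("id"), title))  # attributes via get()
    return result


if __name__ == "__main__":
    print(books_in_stock(DOC))
```

Two vocabularies share one document here without name clashes, and metadata such as id and lang sits in attributes rather than child elements.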

To fetch JSON instead of XML with cURL, you'd simply change the Accept header:

curl -H "Accept: application/json" https://api.example.com/data

If the server supports content negotiation, it will see that we want JSON and return the response in that format instead of XML. Servers that only produce one format may ignore the header.
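
Because the Accept header is only a request, a script that handles both formats may want to look at the Content-Type the server actually sent (visible with cURL's -i option, for instance) before choosing a parser. The following is a minimal sketch of that dispatch; parse_body is a hypothetical helper and the list of media types it recognizes is an assumption, not an exhaustive one.

```python
import json
import xml.etree.ElementTree as ET


def parse_body(content_type, body):
    """Parse `body` (bytes) according to the response's Content-Type value."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type == "application/json" or media_type.endswith("+json"):
        return json.loads(body)
    if media_type in ("application/xml", "text/xml") or media_type.endswith("+xml"):
        return ET.fromstring(body)
    raise ValueError(f"unsupported content type: {content_type!r}")


if __name__ == "__main__":
    print(parse_body("application/json; charset=utf-8", b'{"status": "ok"}'))
    print(parse_body("application/xml", b"<status>ok</status>").text)
```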

Saving XML Data to a File

As we saw earlier, you can save the fetched XML data directly to a file using the -o or -O options. This is handy when you need to persist the data locally for later processing or ingestion into another system.

For example, you might want to fetch daily XML exports from an API and save them to dated files:

curl -H "Accept: application/xml" https://api.example.com/data/2023-01-01 -o data-2023-01-01.xml
curl -H "Accept: application/xml" https://api.example.com/data/2023-01-02 -o data-2023-01-02.xml

This fetches the data for a given date and saves it to a corresponding file. You could easily script this to run daily or hourly.
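
One way to script this is to generate the URL and file name for each date rather than typing them by hand. Here is a minimal sketch under the assumption that the endpoint keeps the date-in-path layout of the example; BASE_URL is the placeholder from above and export_plan is a hypothetical helper.

```python
from datetime import date, timedelta

BASE_URL = "https://api.example.com/data"  # placeholder endpoint from the example above


def export_plan(start, days):
    """Return (url, filename) pairs for `days` consecutive dates from `start`."""
    plan = []
    for offset in range(days):
        day = (start + timedelta(days=offset)).isoformat()  # e.g. "2023-01-01"
        plan.append((f"{BASE_URL}/{day}", f"data-{day}.xml"))
    return plan


if __name__ == "__main__":
    for url, filename in export_plan(date(2023, 1, 1), 2):
        # Print the equivalent curl command instead of running it.
        print(f'curl -H "Accept: application/xml" {url} -o {filename}')
```

Using date arithmetic rather than string manipulation means month and year boundaries are handled correctly.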

XML Use Cases

So why might you need to fetch XML data with cURL in the first place? Here are a few common scenarios:

  • Integrating with a legacy API that only supports XML
  • Exporting data from a system that provides XML exports
  • Scraping websites that publish data in XML format
  • Fetching RSS or Atom feeds (which use XML)
  • Querying a SOAP-based web service (SOAP uses XML extensively)

Whenever you're working with a system or interface that speaks XML, cURL is a great tool to have in your belt for fetching and debugging that data.
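
Taking the feed scenario as an example, here is a short Python sketch that lists the entries of an RSS 2.0 feed. The feed content is invented for illustration (a real one would come from cURL's output), and feed_items is a hypothetical helper that assumes the usual rss/channel/item layout.

```python
import xml.etree.ElementTree as ET

# Invented RSS 2.0 sample; a real feed would come from curl's output.
FEED = """<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example Feed</title>
    <link>https://example.com/</link>
    <item>
      <title>First post</title>
      <link>https://example.com/posts/1</link>
    </item>
    <item>
      <title>Second post</title>
      <link>https://example.com/posts/2</link>
    </item>
  </channel>
</rss>
"""


def feed_items(xml_text):
    """Return (title, link) for every <item> in an RSS 2.0 channel."""
    root = ET.fromstring(xml_text)  # root is the <rss> element
    return [
        (item.findtext("title"), item.findtext("link"))
        for item in root.findall("./channel/item")
    ]


if __name__ == "__main__":
    for title, link in feed_items(FEED):
        print(f"{title}: {link}")
```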

Conclusion

In this guide, we've looked at how to use cURL to fetch XML data from web services and APIs. We covered making HTTP requests, setting the Accept header to specify the XML content type, saving response data to files, handling redirects, debugging requests, and parsing XML data.

While XML may not be as popular as it once was, it's still widely used and is an important data format to know. With cURL in your toolbox, you can easily work with XML APIs and data sources to fetch, process, and integrate that data into your applications and workflows.