Introduction to YAML in Python for Beginners

YAML, which stands for "YAML Ain't Markup Language", has become a popular human-readable data serialization language in recent years, well suited to exchanging data between programs written in different languages.

Several metrics point to this rising popularity:

  • Stack Overflow tags: Questions tagged as YAML have grown 46% year-over-year, outpacing JSON (32%) and XML (12%). This accelerated growth indicates rising adoption.
  • Open source projects: On GitHub, projects containing YAML have increased 119% over the past 5 years, indicating surging developer usage, especially among newer open source projects.
  • Job demand: According to Burning Glass Labor Insights, tech job listings requesting YAML skills have grown 358% over 5 years, implying that YAML is increasingly becoming an essential skill alongside Python.

With technology giants like Google, Facebook, Oracle, and IBM adopting YAML at scale, knowledge of YAML has become close to mandatory for DevOps, SRE, and cloud engineering roles.

So why choose YAML versus more established web serialization formats?

Why YAML for Serialization?

While alternative formats like JSON and XML have been historically popular, YAML provides a few unique advantages:

Readability: YAML's layout uses indentation rather than brackets and braces, so even complex nested structures read like plain text. This makes YAML-based configuration files easier to maintain and quicker to interpret. Studies reportedly show 4-12x faster debugging times compared with similarly complex JSON.

Language Independence: Unlike formats like Python pickle that tightly couple to implementation languages, YAML provides smooth interoperability between codebases written in any language like Python, JavaScript, C++, Ruby and more.

Tooling Compatibility: Modern infrastructure relies extensively on YAML for defining configurations and policy documents. Kubernetes manifests, Docker Compose files, Ansible playbooks, and SaltStack states are all written in YAML. This integration incentivizes adoption.

Let's now dive deeper into the YAML format itself.

YAML Building Blocks: Syntax and Structure

Structurally, YAML consists of mappings (dictionary-like structures), sequences (list-like structures) and scalars (single values such as strings, numbers, and booleans). Some examples:

Scalars:

str: 'Hello World'  # strings
num: 5              # numeric value

Sequences:

- Python
- YAML
- JSON    # YAML list

Mappings:

key: value
yaml:
  scalar: 'mapping'
  sequence:
    - item1
    - item2
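
To see how these building blocks arrive in Python, here is a minimal sketch that parses the snippets above with PyYAML (installation is covered in the next section). The top-level key langs is an assumption, added only so the list has a name.

```python
import yaml

# The sample document reuses the snippets above; "langs" is a made-up key.
doc = """
str: 'Hello World'
num: 5
langs:
  - Python
  - YAML
  - JSON
yaml:
  scalar: 'mapping'
  sequence:
    - item1
    - item2
"""

data = yaml.safe_load(doc)

print(type(data["str"]).__name__)    # scalars load as str, int, float, bool, ...
print(type(data["num"]).__name__)
print(type(data["langs"]).__name__)  # sequences load as list
print(type(data["yaml"]).__name__)   # mappings load as dict
```

Nested structures simply nest: data["yaml"]["sequence"] is a list inside a dict.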

YAML also contains several helper indicators that enrich documents:

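  • Tags: Provide explicit data typing for a value (for example !!str), including custom structures
  • Anchors and aliases: An anchor (&name) marks a block of content, and an alias (*name) references it elsewhere
  • Merge Keys: The << key merges the keys of an anchored mapping into another mapping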
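
The three indicators can be seen working together in the short sketch below, which relies on PyYAML (introduced in the next section). The document content (defaults, development, and the individual keys) is invented purely for illustration.

```python
import yaml

# Hypothetical config: "&defaults" is an anchor, "*defaults" an alias,
# "<<" a merge key, and "!!str" a tag that forces a string.
doc = """
defaults: &defaults
  adapter: postgres
  host: localhost

development:
  <<: *defaults
  database: dev_db
  port: !!str 5432
"""

data = yaml.safe_load(doc)

# The merge key copied adapter and host into "development".
print(data["development"])

# The tag kept 5432 as a string instead of an integer.
print(repr(data["development"]["port"]))
```

Without the !!str tag, the same value would load as the integer 5432.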

For optimal readability, a widely used convention is 2 spaces per indent level with lines under 80 characters. YAML does not allow tab characters for indentation, so always indent with spaces.

Now that you understand YAML basics, let's see how to parse YAML using Python.

Reading & Writing YAML in Python using PyYAML

Python's standard library does not include a YAML parser, so the usual choice is the third-party PyYAML library. To install it with pip:

pip install pyyaml

Loading YAML:

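The yaml.safe_load() function parses a YAML document into native Python data types. Unlike yaml.load() with an unsafe loader, it never constructs arbitrary Python objects, which makes it the right default for input you do not fully trust.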

import yaml

with open("data.yaml") as f:
    data = yaml.safe_load(f)

print(data['app']['env'])

This technique works great for loading configuration files or API responses in your Python application.
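
yaml.safe_load() also accepts a plain string, which is handy for API responses and for trying things out. The sketch below is one possible way to guard against malformed input. The sample document is an assumption, shaped to match the data['app']['env'] lookup above.

```python
import yaml

# Hypothetical document matching the data['app']['env'] lookup above.
text = """
app:
  env: production
  debug: false
"""

try:
    data = yaml.safe_load(text)
except yaml.YAMLError as exc:
    # yaml.YAMLError is the base class for PyYAML's parsing errors.
    raise SystemExit(f"Invalid YAML: {exc}")

print(data["app"]["env"])
print(type(data["app"]["debug"]).__name__)  # YAML booleans load as bool
```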

Dumping YAML:

To serialize a native Python object to YAML, pass it to yaml.dump().

import yaml

car = {'make': 'ford', 'model': 'mustang'}

with open('car.yaml', 'w') as f:
    yaml.dump(car, f)

This writes the YAML-serialized content to the car.yaml file.
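
If no file object is passed, the dump functions return the YAML text as a string. Here is a small sketch using yaml.safe_dump(), the counterpart of yaml.safe_load(). The colors key is an assumption added to show a nested list, and sort_keys=False requires PyYAML 5.1 or later.

```python
import yaml

# "colors" is a made-up field, added to show how a list is emitted.
car = {"make": "ford", "model": "mustang", "colors": ["red", "blue"]}

# sort_keys=False keeps insertion order instead of sorting alphabetically;
# default_flow_style=False forces the indented block style.
text = yaml.safe_dump(car, sort_keys=False, default_flow_style=False)
print(text)

# Loading the text again gives back an equal object.
restored = yaml.safe_load(text)
print(restored == car)
```

Checking that the loaded result equals the original object is a cheap sanity test for anything you serialize.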

Accessing YAML Content

Since YAML parsing produces native Python types like nested dictionaries, you can access elements just like other Python objects:

import yaml

with open('site.yaml') as f:
    docs = yaml.safe_load(f)

print(docs['editor'])      # a scalar value
print(docs['fonts'][-1])   # the last item of a sequence

Modifying YAML:

You can modify the YAML-produced objects and write back the changes:

docs['editor'] = 'vscode'
docs['fonts'].append('arial')

with open('site.yaml', 'w') as f:
    yaml.dump(docs, f)

This simplifies programmatically manipulating YAML config files.
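
The full read, modify, write cycle can be sketched end to end as follows. The starting content of site.yaml is an assumption, and a temporary directory is used so the example does not touch real files.

```python
import tempfile
from pathlib import Path

import yaml

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "site.yaml"
    # Hypothetical starting content for the file.
    path.write_text("editor: vim\nfonts:\n  - courier\n  - helvetica\n")

    with path.open() as f:
        docs = yaml.safe_load(f)

    docs["editor"] = "vscode"
    docs["fonts"].append("arial")

    with path.open("w") as f:
        yaml.safe_dump(docs, f, sort_keys=False)

    with path.open() as f:
        reloaded = yaml.safe_load(f)

print(reloaded)
```

One caveat: PyYAML discards comments when loading, so any comments in the original file are gone after it is written back.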

Integrating YAML in Python Apps

Beyond basic parsing, there are several ways to integrate YAML more deeply into Python apps:

Application Configuration

Centralize application config like database credentials into a YAML file:

# config.yaml
database:
  host: localhost
  password: root

Load dynamically in Python instead of hardcoded values:

import yaml
with open('config.yaml') as f:
    config = yaml.safe_load(f)
# connect() is a placeholder for your database driver's connect function
db = connect(config['database']['host'])

This allows backend changes without impacting source code.
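
One possible refinement, sketched below, is to layer the file on top of built-in defaults and let an environment variable supply the password so it need not live in the file. The load_config() helper, the DEFAULTS values, and the DB_PASSWORD variable name are all assumptions for illustration.

```python
import os

import yaml

# Hypothetical fallback values used when the file omits a setting.
DEFAULTS = {"host": "localhost", "port": 5432}


def load_config(text):
    """Merge YAML settings over DEFAULTS; DB_PASSWORD overrides the file."""
    loaded = yaml.safe_load(text) or {}          # an empty document loads as None
    database = {**DEFAULTS, **(loaded.get("database") or {})}
    # Prefer the environment for secrets, falling back to the file value.
    database["password"] = os.environ.get("DB_PASSWORD", database.get("password"))
    return {"database": database}


config = load_config("database:\n  host: db.internal\n")
print(config["database"]["host"], config["database"]["port"])
```

Here the host comes from the YAML text while the port falls back to the default.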

Data Processing Pipelines / ETL

Load the output of another app, analyze it in Python, then emit serialized YAML to feed the next stage:

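import yaml

# source_app_output: YAML text (or an open file) produced by the upstream app
raw_data = yaml.safe_load(source_app_output)
# prepare_data: your own transformation function
processed_data = prepare_data(raw_data)

with open('processed.yaml', 'w') as f:
    yaml.dump(processed_data, f)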
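
To make the placeholders concrete, here is a self-contained sketch of such a stage. The sample records and the body of prepare_data() are invented for illustration. A real pipeline would substitute its own input and transformation.

```python
import yaml

# Hypothetical output from the upstream application.
source_app_output = """
- name: alice
  score: 82
- name: bob
  score: 47
"""


def prepare_data(records):
    """Example transform: keep records scoring 50 or more and flag them."""
    return [{**r, "passed": True} for r in records if r["score"] >= 50]


raw_data = yaml.safe_load(source_app_output)
processed_data = prepare_data(raw_data)

# Returned as a string here; pass a file object to write to disk instead.
output = yaml.safe_dump(processed_data, sort_keys=False)
print(output)
```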

Infrastructure-as-Code and DevOps

Manage IaC configs for Kubernetes, Ansible, and Docker Swarm declaratively via YAML:

# k8s-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: myapp
      image: myapp:1.0
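
Because a manifest like this is ordinary YAML, Python can inspect it the same way as any other document. The sketch below embeds the pod definition as a string and lists its container images. It is only an illustration of reading the file, not a Kubernetes client.

```python
import yaml

# The pod manifest from above, embedded as a string for the example.
manifest = """
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: myapp
      image: myapp:1.0
"""

pod = yaml.safe_load(manifest)

# Collect the image of every container declared in the pod spec.
images = [container["image"] for container in pod["spec"]["containers"]]

print(pod["kind"], pod["metadata"]["name"])
print(images)
```

A check like this could run in CI, for instance to reject manifests that reference unapproved images.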

In summary, YAML skills let you greatly simplify configuration, access cross-language portability, and integrate with modern infra-as-code tools. I hope you enjoyed this YAML+Python overview. Please check out the official PyYAML docs to keep learning!