YAML, which stands for "YAML Ain't Markup Language", has grown quickly in recent years into a popular human-readable data serialization language, well suited to cross-language serialization and deserialization tasks.
Consider the rising popularity of YAML in recent years based on various metrics:
- Stack Overflow tags: Questions tagged YAML have grown 46% year-over-year, outpacing JSON (32%) and XML (12%). This accelerated growth indicates rising adoption.
- Open source projects: On GitHub, projects containing YAML files have increased 119% over the past 5 years, indicating surging developer usage, especially among newer open source projects.
- Job demand: According to Burning Glass Labor Insights, tech job listings requesting YAML skills have grown 358% over 5 years, implying that YAML is increasingly becoming an essential skill alongside Python.
With technology giants like Google, Facebook, Oracle and IBM adopting YAML at scale, knowledge of YAML has become almost mandatory for DevOps, SRE and Cloud Engineering roles.
So why choose YAML versus more established web serialization formats?
Why YAML for Serialization?
While alternative formats like JSON and XML have been historically popular, YAML provides a few unique advantages:
Readability: YAML's layout uses indentation rather than brackets and braces, so complex nested structures in config files read more like plain text. This makes YAML-based configuration files easier to maintain and quicker to interpret. Studies show 4-12x faster debugging times compared to debugging similarly complex JSON.
Language Independence: Unlike formats like Python pickle that tightly couple to implementation languages, YAML provides smooth interoperability between codebases written in any language like Python, JavaScript, C++, Ruby and more.
Tooling Compatibility: Modern infrastructure tooling relies heavily on YAML for configuration and policy documents. Kubernetes manifests, Docker Compose files, Ansible playbooks and SaltStack states are all written in YAML. This integration encourages adoption.
Let's now dive deeper into the YAML format itself.
YAML Building Blocks: Syntax and Structure
Structurally, YAML consists of mappings (dictionary-like structures), sequences (list-like structures) and scalars (string or numeric values). Some examples:
Scalars:
str: 'Hello World' # string
num: 5 # numeric value
Sequences:
- Python
- YAML
- JSON # YAML list
Mappings:
key: value
yaml:
  scalar: 'mapping'
  sequence:
    - item1
    - item2
YAML also contains several helper indicators that enrich documents:
- Tags: Provide explicit data typing for custom structures
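- Anchors and aliases: Mark a block of content with an anchor (&name) and reference it elsewhere with an alias (*name)
- Merge Keys: Pull the keys of an anchored mapping into another mapping with the << key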
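The three indicators listed above can be seen together in a small sketch. The document below is made up for illustration (the `defaults`, `dev`, `prod` and `version` keys are assumptions, not part of any real config), and it is parsed with PyYAML's `yaml.safe_load()`, which resolves aliases, merge keys and the standard `!!` tags.

```python
import yaml

# Hypothetical document; all key names are invented for this example.
DOC = """
# Anchor: name this mapping "defaults" so it can be reused.
defaults: &defaults
  host: localhost
  port: 5432

# Merge key: copy every key from the anchored mapping.
dev:
  <<: *defaults

# Keys written explicitly override the merged ones.
prod:
  <<: *defaults
  host: db.example.com

# Tag: force the scalar 5 to load as a string instead of an integer.
version: !!str 5
"""

data = yaml.safe_load(DOC)

print(data["dev"])      # merged copy of the defaults
print(data["prod"])     # defaults with host overridden
print(repr(data["version"]))
```

After loading, `dev` and `prod` are ordinary dictionaries: the anchor and merge syntax exists only in the YAML text, not in the resulting Python objects.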
The YAML specification does not prescribe an indent width, but it does forbid tabs for indentation. For readability, a common convention is 2 spaces per indent level with lines under 80 characters.
Now that you understand YAML basics, let's see how to parse YAML using Python.
Reading & Writing YAML in Python using PyYAML
Python's standard library does not include a YAML parser, so the usual choice is the third-party PyYAML library. To install it with pip:
pip install pyyaml
Loading YAML:
The yaml.safe_load() function parses a YAML document into native Python data types.
import yaml
with open("data.yaml") as f:
    data = yaml.safe_load(f)
print(data['app']['env'])
This technique works great for loading configuration files or API responses in your Python application.
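To make the "native Python datatypes" part concrete, here is a minimal sketch that parses YAML from a string rather than a file. The `app` section and its fields are invented for illustration; only `yaml.safe_load()` comes from PyYAML.

```python
import yaml

# Hypothetical config text; the field names are assumptions for this example.
TEXT = """
app:
  name: demo
  env: staging
  debug: true
  workers: 4
  timeout: 2.5
  tags: [web, api]
  owner: null
"""

config = yaml.safe_load(TEXT)
app = config["app"]

# Each YAML scalar is converted to the matching Python type.
for key, value in app.items():
    print(f"{key}: {value!r} ({type(value).__name__})")

# Use .get() with a default for keys that may be absent.
region = app.get("region", "unknown")
```

Because the result is a plain dictionary, a missing key raises `KeyError` like any other dict lookup, which is why `.get()` with a default is often the safer choice for optional settings.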
Dumping YAML:
To emit serialized YAML, use yaml.dump() by passing in a native Python object.
import yaml
car = {'make': 'ford', 'model': 'mustang'}
with open('car.yaml', 'w') as f:
    yaml.dump(car, f)
This writes the YAML-serialized content to the car.yaml file.
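The output layout can be adjusted with keyword arguments. The sketch below uses `yaml.safe_dump()`, the counterpart of `safe_load()`, and returns a string instead of writing a file. The `years` field is added here for illustration, and `sort_keys=False` assumes PyYAML 5.1 or newer.

```python
import yaml

# "years" is a made-up field added to show how lists are emitted.
car = {"model": "mustang", "make": "ford", "years": [1964, 2024]}

# With no stream argument, safe_dump returns the YAML text as a string.
text = yaml.safe_dump(
    car,
    default_flow_style=False,  # block style: one key or item per line
    sort_keys=False,           # keep the dictionary's insertion order
)
print(text)

# Round trip: loading the text gives back an equal object.
restored = yaml.safe_load(text)
```

Leaving `sort_keys` at its default sorts keys alphabetically, which gives stable diffs but may reorder a hand-written config when it is written back.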
Accessing YAML Content
Since YAML parsing produces native Python types like nested dictionaries, you can access elements just like other Python objects:
with open('site.yaml') as f:
    docs = yaml.safe_load(f)
print(docs['editor'])     # value of a top-level key
print(docs['fonts'][-1])  # last item of the fonts list
Modifying YAML:
You can modify the YAML-produced objects and write back the changes:
docs['editor'] = 'vscode'
docs['fonts'].append('arial')
with open('site.yaml', 'w') as f:
    yaml.dump(docs, f)
This simplifies programmatically manipulating YAML config files.
Integrating YAML in Python Apps
Beyond basic parsing, there are several ways to integrate YAML more deeply into Python apps:
Application Configuration
Centralize application config like database credentials into a YAML file:
# config.yaml
database:
  host: localhost
  password: root
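Load it at runtime in Python instead of hardcoding values:
import yaml
with open('config.yaml') as f:  # the context manager closes the file
    config = yaml.safe_load(f)
# connect() is a placeholder for your database driver's connect function
db = connect(config['database']['host'])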
This allows backend changes without impacting source code.
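One way to make config loading more robust is to check the file's shape right after parsing. The following is a minimal sketch, not a standard PyYAML feature: `load_config` and `REQUIRED_DB_KEYS` are hypothetical names, and the demo writes the sample config to a temporary directory so it can run on its own.

```python
import os
import tempfile

import yaml

# Hypothetical list of keys this application expects under "database".
REQUIRED_DB_KEYS = ("host", "password")


def load_config(path):
    """Load a YAML config file and check that the database section is complete."""
    with open(path, encoding="utf-8") as f:
        config = yaml.safe_load(f)
    if not isinstance(config, dict) or not isinstance(config.get("database"), dict):
        raise ValueError("config must contain a 'database' mapping")
    missing = [key for key in REQUIRED_DB_KEYS if key not in config["database"]]
    if missing:
        raise ValueError(f"database section is missing: {', '.join(missing)}")
    return config


# Demo: write the sample config to a temporary directory and load it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "config.yaml")
    with open(path, "w", encoding="utf-8") as f:
        f.write("database:\n  host: localhost\n  password: root\n")
    config = load_config(path)

print(config["database"]["host"])
```

Failing early with a clear message tends to be easier to debug than a `KeyError` raised deep inside the application at connection time.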
Data Processing Pipelines / ETL
Parse another application's YAML output, process it in Python, then write serialized YAML to feed the next stage:
import yaml
# source_app_output and prepare_data are placeholders for your own data and logic
raw_data = yaml.safe_load(source_app_output)
processed_data = prepare_data(raw_data)
with open('processed.yaml', 'w') as f:
    yaml.dump(processed_data, f)
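A self-contained version of this pipeline stage might look like the sketch below. The input records and the `prepare_data` transform are hypothetical stand-ins, and an in-memory `io.StringIO` buffer replaces the output file so the example runs without touching disk.

```python
import io

import yaml

# Hypothetical upstream output: a YAML list of records.
source_app_output = """
- user: alice
  score: 82
- user: bob
  score: 47
"""


def prepare_data(records):
    """Hypothetical transform: keep records scoring 50 or more and flag them."""
    return [dict(record, passed=True) for record in records if record["score"] >= 50]


raw_data = yaml.safe_load(source_app_output)
processed_data = prepare_data(raw_data)

# Any object with a write() method works as the output stream.
buffer = io.StringIO()
yaml.safe_dump(processed_data, buffer, sort_keys=False)
print(buffer.getvalue())
```

Swapping the buffer for `open('processed.yaml', 'w')` gives the file-based behavior described above.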
Infrastructure-as-Code and DevOps
Manage IaC configs for Kubernetes, Ansible and Docker Swarm declaratively via YAML:
# k8s-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: myapp
      image: myapp:1.0
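Since a manifest like this is plain YAML, Python can inspect it with the same `yaml.safe_load()` call used earlier. The sketch below embeds the pod definition as a string and lists its container images; it only reads the document and does not talk to a Kubernetes cluster.

```python
import yaml

# The pod manifest from above, embedded as a string for a runnable example.
MANIFEST = """
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
spec:
  containers:
    - name: myapp
      image: myapp:1.0
"""

pod = yaml.safe_load(MANIFEST)

# Collect the image of every container declared in the pod spec.
images = [container["image"] for container in pod["spec"]["containers"]]

print(pod["kind"], pod["metadata"]["name"], images)
```

A check like this can run in CI, for example to flag manifests whose images lack a pinned version tag.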
In summary, YAML skills let you greatly simplify configuration, access cross-language portability, and integrate with modern infra-as-code tools. I hope you enjoyed this YAML+Python overview. Please check out the official PyYAML docs to keep learning!