What is a Subprocess in Python? A Deep Dive with 10 Practical Examples

Subprocesses empower Python developers to interact with their operating systems on a whole new level.

By Dan Alvarez

You've likely used many subprocesses on your computer without even realizing it. That Reddit thread you have open? Modern browsers often run each tab in its own process. That complex database query you just ran from a terminal? The database client ran as a child process of your shell.

At their core, subprocesses let one program start other programs, each running as its own process that the operating system schedules. The parent program can either wait for a child to finish or keep doing its thing while the child runs in the background.

Python makes working with subprocesses incredibly straightforward with the aptly named subprocess module. It comes baked right into Python without any external libraries required!

In this comprehensive guide, you'll learn:

  • What are subprocesses and how do they work?
  • Core concepts of Python's subprocess module
  • 10 hands-on subprocess coding examples
  • Best practices and pro tips for Python subprocesses

So if you're ready to take your Python automation to the next level, let's get started!

What Exactly is a Subprocess?

A subprocess is exactly what it sounds like – a process spawned by another parent process.

Your operating system is constantly juggling countless processes behind the scenes. Opening your web browser, launching an editor, running a build: each involves processes starting up, doing their jobs, then shutting back down.

Here's a quick metaphor…

Think of your computer like a restaurant. Programs place orders (create subprocesses) which the kitchen (OS) then prepares. While the kitchen handles the orders in parallel, the programs can keep requesting new ones!

One key benefit of subprocesses is concurrency: the parent process doesn't have to stop and wait while the subprocess does its work (when you launch it with subprocess.Popen, for example). This makes subprocesses a good fit for delegating work like calling command-line tools, fetching data with utilities such as curl, and running intensive jobs.

Let‘s visualize the hierarchy:

[Diagram showing tree structure of parent/child processes linked together]

On Unix-like systems, the root of this tree is the init process (PID 1; systemd on most modern Linux distributions), which the kernel starts at boot. From there, processes spawn children, which spawn children of their own, and so on down the tree.
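You can see one parent/child link of this tree from Python itself. The sketch below launches a child Python interpreter (using sys.executable, so it doesn't assume any particular python command is on the PATH) and has it report its parent's PID via os.getppid(). On Linux and macOS that number should match the PID of the script that launched it; on Windows, a virtual environment's python.exe can be a small launcher, so the reported parent may be that launcher instead. The helper name is illustrative.

```python
import os
import subprocess
import sys


def child_reports_parent_pid() -> tuple[int, int]:
    """Return (this script's PID, the parent PID reported by a child process)."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getppid())"],
        capture_output=True,
        text=True,
        check=True,
    )
    return os.getpid(), int(result.stdout)


if __name__ == "__main__":
    mine, childs_parent = child_reports_parent_pid()
    print(f"This script's PID:  {mine}")
    print(f"Child's parent PID: {childs_parent}")  # Same number: we are its parent
```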

As a Python developer, you can tap into this functionality to interface with your OS and automate tasks programmatically.

The sky is truly the limit once you master subprocess workflows!

With that foundation laid, let's look under the hood at Python's subprocess library itself.

A Crash Course on Python's Subprocess Module

Before we can start scripting, we need to first understand how Python's subprocess module works.

At its core, subprocess enables 3 key things:

  1. Spawning new processes (like running other programs)
  2. Interacting with process input/output (via pipes)
  3. Obtaining return codes (to check for errors)

Let's explore some key functions that make this possible:

subprocess.run()

The subprocess.run() function is the simplest interface for running subprocesses.

Introduced in Python 3.5, it runs a command, waits for it to finish, and hands back the return code along with any output you asked it to capture.

Here is its basic signature (simplified; run() accepts several more keyword arguments):

subprocess.run(args, *, input=None, capture_output=False, shell=False, timeout=None, check=False, text=None)

You invoke a program by passing the executable name and its arguments, as a list, to args. This returns a CompletedProcess result object containing useful metadata like the return code and any captured output.

Let's try it out:

import subprocess

result = subprocess.run(['ls', '-l'], capture_output=True, text=True)

print(result.args)        # The command executed: ['ls', '-l']
print(result.returncode)  # 0 means success
print(result.stdout)      # Captured stdout, as a str because text=True

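By default, run() doesn't capture anything: the child shares the parent's stdin, stdout and stderr, so its output goes straight to your terminal and result.stdout is None. Passing capture_output=True (or stdout=subprocess.PIPE) tells run() to set up pipes and collect the output for you.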
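Two run() parameters do a lot of the error-handling work: check=True turns a non-zero exit code into a subprocess.CalledProcessError, and timeout raises subprocess.TimeoutExpired (after killing the child) if the command runs too long. In this sketch a child Python interpreter stands in for any external command, and run_checked is just an illustrative helper name.

```python
import subprocess
import sys


def run_checked(code: str, timeout: float = 5.0) -> str:
    """Run a snippet of Python in a child interpreter and describe what happened."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            check=True,       # Raise CalledProcessError on a non-zero exit code
            timeout=timeout,  # Kill the child and raise TimeoutExpired if it runs too long
        )
    except subprocess.CalledProcessError as err:
        return f"failed with exit code {err.returncode}"
    except subprocess.TimeoutExpired:
        return "timed out"
    return f"ok: {result.stdout.strip()}"


if __name__ == "__main__":
    print(run_checked("print('hello')"))                          # ok: hello
    print(run_checked("import sys; sys.exit(3)"))                 # failed with exit code 3
    print(run_checked("import time; time.sleep(10)", timeout=1))  # timed out
```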

subprocess.Popen()

The subprocess.Popen class provides maximum flexibility for advanced use cases.

Unlike run(), Popen starts the subprocess but doesn't wait for it to complete. This allows you to interact with its input and output manually.

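Here's a common Popen workflow:

import subprocess

process = subprocess.Popen(['cat'],  # Any command works; cat echoes its input
                           stdin=subprocess.PIPE,
                           stdout=subprocess.PIPE,
                           stderr=subprocess.PIPE,
                           text=True)

# Send input and read all output; communicate() also waits for the process to exit
stdout, stderr = process.communicate(input='Custom input')

# The return code is already set because communicate() waited
rc = process.returncode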

As you can see, the pipe options give you tons of control to interact at a low level.

Some other key benefits of Popen():

  • Connect subprocess I/O to file handles
  • Start long-running background tasks
  • Dynamically send input and process outputs
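Here is a sketch of the last point: processing output while the child is still running. A child Python interpreter stands in for any long-running program (the child code and the stream_lines name are illustrative). Iterating over proc.stdout yields each line as the child writes it; the child flushes after every print so lines aren't held back in its own buffer.

```python
import subprocess
import sys

# Stand-in for a long-running program that reports progress as it goes
CHILD_CODE = (
    "import time\n"
    "for i in range(3):\n"
    "    print(f'step {i}', flush=True)\n"
    "    time.sleep(0.1)\n"
)


def stream_lines() -> list[str]:
    """Collect the child's output line by line while it is still running."""
    seen = []
    with subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdout=subprocess.PIPE,
        text=True,
    ) as proc:
        for line in proc.stdout:  # Blocks until the next line arrives
            seen.append(line.rstrip("\n"))
            print("got:", line, end="")
    # Leaving the with-block waits for the process, so returncode is set
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, proc.args)
    return seen


if __name__ == "__main__":
    stream_lines()
```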

Now let's shift gears and put our newfound knowledge to work with some hands-on examples!

10 Practical Subprocess Examples in Python

While the theory may take some time to sink in, you'll solidify your understanding of subprocesses fastest by coding them up in projects.

Let's go through some common use cases with examples…

1. List a Directory with ls

It always helps to start simple. Let's try listing the contents of the current directory using the ls command:

import subprocess

result = subprocess.run(['ls', '-l'], capture_output=True, text=True)

print(result.stdout)  # Without capture_output=True this would print None

Most Linux/Unix commands like ls, cat and grep can be invoked right within Python scripts thanks to subprocesses.

2. Check if a Program is Installed

Here's a handy snippet that checks if a command exists using which:

import subprocess

result = subprocess.run(['which', 'aws'], capture_output=True, text=True)

if result.returncode == 0:  # which exits with 0 when it finds the program
    print('aws cli is installed at', result.stdout.strip())
else:
    print('aws cli is not installed')

By calling the which command, we can try locating any program on the PATH. If which exits with a non-zero return code, the program isn't on the PATH. (A FileNotFoundError would only tell you that which itself is missing, not aws.)

This demonstrates simple control flow with subprocess return codes.
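If you only need the answer inside Python, the standard library's shutil.which() performs the same PATH lookup without spawning a process at all, and it works on Windows too. A minimal sketch (is_installed is an illustrative helper name):

```python
import shutil


def is_installed(program: str) -> bool:
    """Return True if `program` can be found on the PATH."""
    # shutil.which() searches PATH in-process, much like the which command,
    # and returns None when nothing matches
    return shutil.which(program) is not None


if __name__ == "__main__":
    print("aws cli is installed" if is_installed("aws") else "aws cli is not installed")
```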

3. Read and Write Files

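Subprocesses make working with files a breeze with standard Unix utilities like echo and cat:

import subprocess

# Write content to a file by redirecting echo's stdout into it
with open('file.txt', 'w') as f:
    subprocess.run(['echo', 'Some content'], stdout=f, check=True)

# Print file contents to console
print(subprocess.check_output(['cat', 'file.txt'], text=True))

Here the open file object becomes echo's stdout (cat would treat 'Some content' as a file name, not as text to write), and check_output() captures cat's output, as a str thanks to text=True, so we can further process it in Python.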
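The same file-handle trick works in the other direction: any open file can serve as a child's stdin. In this sketch a child Python interpreter that uppercases whatever it reads stands in for a filter such as sort or tr; the helper name and the temporary file (from the standard tempfile module) are illustrative.

```python
import os
import subprocess
import sys
import tempfile

# Stand-in filter: uppercase everything read from stdin
UPPER = "import sys; sys.stdout.write(sys.stdin.read().upper())"


def uppercase_file(path: str) -> str:
    """Feed the file at `path` to a child process's stdin and return its output."""
    with open(path) as f:
        result = subprocess.run(
            [sys.executable, "-c", UPPER],
            stdin=f,              # The child reads directly from the open file
            capture_output=True,
            text=True,
            check=True,
        )
    return result.stdout


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
        tmp.write("some content\n")
    try:
        print(uppercase_file(tmp.name), end="")  # SOME CONTENT
    finally:
        os.remove(tmp.name)
```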

4. Web Scraping with curl + jq

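Let's scrape a JSON API response because why not:

import json
import subprocess

# Step 1: fetch the raw JSON with curl (-s hides the progress meter)
curl = subprocess.run(
    ['curl', '-s', 'example.com/api/data'],
    capture_output=True,
    text=True,
    check=True,
)

# Step 2: feed curl's output to jq on stdin to extract the .value field
jq = subprocess.run(
    ['jq', '.value'],
    input=curl.stdout,
    capture_output=True,
    text=True,
    check=True,
)

data = json.loads(jq.stdout)
print(f"API value: {data}")

We feed the curl response to jq to parse out just the JSON field we need. Note that a '|' inside an argument list is not a pipe: without a shell, it would simply be passed to curl as a literal argument. Subprocess pipelines still let you leverage the Unix philosophy!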
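For a genuine pipe between two running processes (the equivalent of producer | consumer in a shell), you can connect one Popen's stdout to the next one's stdin. This sketch follows the pattern from the subprocess documentation; two child Python interpreters stand in for tools like curl and jq so it runs anywhere, and the producer/consumer code is illustrative.

```python
import subprocess
import sys

PRODUCER = "print('alpha'); print('beta'); print('gamma')"
CONSUMER = "import sys\nfor line in sys.stdin:\n    print(line.strip().upper())"


def run_pipeline() -> str:
    """Equivalent of `producer | consumer`, with no shell involved."""
    producer = subprocess.Popen([sys.executable, "-c", PRODUCER], stdout=subprocess.PIPE)
    consumer = subprocess.Popen(
        [sys.executable, "-c", CONSUMER],
        stdin=producer.stdout,   # Read directly from the producer's output pipe
        stdout=subprocess.PIPE,
        text=True,
    )
    # Close our copy of the pipe so the producer gets SIGPIPE (on POSIX)
    # if the consumer exits early
    producer.stdout.close()
    output, _ = consumer.communicate()
    producer.wait()
    return output


if __name__ == "__main__":
    print(run_pipeline(), end="")  # ALPHA, BETA, GAMMA on separate lines
```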

5. Background Tasks and Jobs

Time to dip our toes into some parallel processing:

import subprocess 
import time

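proc = subprocess.Popen(['ping', '8.8.8.8'])  # On Linux/macOS, ping runs until stopped

time.sleep(10)  # Do other stuff here in the meantime

proc.terminate()  # Ask it to stop (SIGTERM on POSIX)
proc.wait()       # Reap the process so it doesn't linger as a zombie

Because Popen() returns immediately instead of waiting, we can stop the process later, once we're done with other tasks. terminate() asks politely; kill() forces it (SIGKILL on POSIX).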

This is perfect for ongoing jobs like scraping batches of URLs in parallel across multiple processes.
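To run a batch of commands in parallel without launching them all at once, one option is a thread pool from the standard concurrent.futures module, where each worker thread starts one subprocess and waits on it; max_workers caps how many children run at the same time. The "fetch" is simulated by a child Python interpreter (a real scraper might call curl instead), and the URLs are placeholders.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

# Placeholder work items (e.g. URLs to scrape)
URLS = [f"https://example.com/page/{n}" for n in range(8)]

# Stand-in for a real fetch command: pause briefly, then report the URL
FAKE_FETCH = "import sys, time; time.sleep(0.2); print('fetched', sys.argv[1])"


def fetch(url: str) -> str:
    """Simulate fetching `url` in a separate process and return its output."""
    result = subprocess.run(
        [sys.executable, "-c", FAKE_FETCH, url],
        capture_output=True,
        text=True,
        check=True,
        timeout=30,
    )
    return result.stdout.strip()


def fetch_all(urls: list[str], max_workers: int = 4) -> list[str]:
    # At most max_workers subprocesses run at the same time
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, urls))  # Results come back in input order


if __name__ == "__main__":
    for line in fetch_all(URLS):
        print(line)
```

Threads (rather than extra Python processes) are enough here because each worker just waits on its child.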

6. System Administration Automation

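Let's use subprocesses to script some sysadmin flows (these commands need root privileges on Linux):

import secrets
import subprocess

new_user = 'johndoe'
new_password = secrets.token_hex()  # Generate a random password

subprocess.run(['useradd', new_user], check=True)

# passwd prompts interactively and doesn't take the password as an argument,
# so pipe "user:password" into chpasswd on stdin instead
subprocess.run(['chpasswd'], input=f'{new_user}:{new_password}\n', text=True, check=True)

Now we've created a user with a random password without typing anything at the command line! Passing the password on stdin also keeps it out of the process list, where command-line arguments are visible to other users. You can wrap this in a function to reuse anywhere.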

Subprocesses are a must for any sysadmin looking to automate their job.

7. Test Python Scripts

Let's evolve from code monkeys 🐒 to code scientists 👩‍🔬 by testing hypotheses:

import subprocess
import sys
from hypothesis import given, settings, strategies

# Stand-in for the script under test: copies stdin to stdout unchanged
SCRIPT = 'import sys; sys.stdout.write(sys.stdin.read())'

@given(sample=strategies.text(alphabet='abcdefghijklmnopqrstuvwxyz0123456789 '))
@settings(max_examples=20)
def test_script_echoes_input(sample):
    result = subprocess.run([sys.executable, '-c', SCRIPT],
                            input=sample, capture_output=True, text=True, check=True)
    assert result.stdout == sample  # Property: output always equals input

if __name__ == '__main__':
    test_script_echoes_input()  # Hypothesis generates and runs the examples

Here we use the Hypothesis testing library to generate random inputs, run the script under test as a subprocess for each one, and assert that a property (the output always equals the input) holds every time.

This "scientific method for software" approach lets us check a property against many generated inputs rather than a handful of hand-picked ones.

8. Database Clients

While Python has great DB connector libraries, sometimes using CLI tools is faster for one-off scripts:

import subprocess

CREATE_QUERY = """
  CREATE TABLE users (
    id int, 
    name varchar(50)
  );  
"""

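DATABASE = 'mydb'  # Placeholder: the database to create the table in

subprocess.run(['mysql', '-D', DATABASE, '-e', CREATE_QUERY], check=True)  # Run the CREATE query
print(subprocess.check_output(['mysql', '-D', DATABASE, '-e', 'SHOW TABLES;'], text=True))  # See tables

Here we use the MySQL CLI instead of connecting a library just to create/show tables. The same approach applies to Redis, MongoDB, Cassandra, PostgreSQL, and other databases with command-line clients.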

9. Data Science Pipelines

Let's demonstrate piping data into a model training process:

import subprocess
import sys

import pandas as pd

# Placeholder data; replace with your own dataset
data = {'feature': [1, 2, 3, 4], 'label': [0, 0, 1, 1]}
df = pd.DataFrame(data)  # Pandas for preprocessing

# Send the CSV to the training script on its stdin. sklearn_train.py is assumed
# to read CSV from stdin, fit a scikit-learn model, and print its accuracy.
result = subprocess.run(
    [sys.executable, 'sklearn_train.py'],
    input=df.to_csv(index=False),
    capture_output=True,
    text=True,
    check=True,
)

print(result.stdout)  # Print model accuracy

By connecting inputs/outputs between processes, you can create complex workflows for data science, ML engineering, ETL, and more!

10. Meta-Programming Magic ✨

Finally, the cherry on top… who said other processes have to be lower level?

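import subprocess
import sys

# Run another Python script as a subprocess (sys.executable is the current interpreter)
result = subprocess.run([sys.executable, 'other_script.py'], capture_output=True, text=True)

# The child starts a fresh interpreter: imports, constants and functions are all re-initialized
print(result.stdout)

This lets you run other Python scripts, or even dynamically generated code, in a separate interpreter, so the child's imports, global state and crashes stay isolated from the parent.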
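To run code generated at runtime rather than a file, you can hand the source string to the interpreter's -c option and read structured results back from stdout. This sketch assumes JSON as the exchange format; the helper name and the generated snippet are illustrative, and the child shares no variables or imports with the parent.

```python
import json
import subprocess
import sys


def run_in_fresh_interpreter(source: str) -> object:
    """Execute `source` in a new Python process and decode the JSON it prints."""
    result = subprocess.run(
        [sys.executable, "-c", source],
        capture_output=True,
        text=True,
        check=True,
        timeout=10,
    )
    return json.loads(result.stdout)


# Code built at runtime; it must print exactly one JSON document
generated = "import json, math; print(json.dumps({'sqrt2': round(math.sqrt(2), 3)}))"

if __name__ == "__main__":
    print(run_in_fresh_interpreter(generated))  # {'sqrt2': 1.414}
```

Only feed this kind of helper code you trust: the child runs with the same permissions as your script.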

Talk about subprocesses opening up infinite possibilities! ✨

Now over to you: see what other creative use cases you can come up with!

Best Practices when Working with Subprocesses

While subprocess workflows unlock immense power, they also introduce complexity.

Let's review some best practices to use them safely, efficiently, and correctly:

🔐 Validate user inputs – Use regexes or whitelisting to prevent command injection attacks

🦺 Minimize scripts running as root – Execute scripts as limited users whenever possible

🛡️ Enable security hardening features like SELinux, AppArmor, seccomp filters, etc.

Avoid shell=True with untrusted input, since the shell will run any extra commands injected into the string

🛠 Prefer passing argument list instead of entire command string

👪 Cap your worker pool size for subprocess parallelism rather than launching an unbounded number of processes

💤 Use Popen() to avoid blocking, and pass timeouts (to run(), wait() or communicate()) to prevent hanging waits

📊 Capture outputs for further processing when possible

📚 Wrap logic in functions/classes so subprocess code can be reused

🐞 Handle errors gracefully with try/except blocks and exit codes
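The argument-list and shell=True tips are easiest to see side by side. When you pass a list (and leave shell=False), each element reaches the program as exactly one argument, so shell metacharacters in user input are just text. If a shell is genuinely unavoidable, shlex.quote() from the standard library escapes a value for POSIX shells. The "malicious" input below is a made-up example, and the child that echoes its arguments is a harmless stand-in.

```python
import json
import shlex
import subprocess
import sys

# Hypothetical untrusted input containing a shell command separator
user_input = "report.txt; echo INJECTED"

# Child that reports the arguments it received, as JSON
SHOW_ARGS = "import json, sys; print(json.dumps(sys.argv[1:]))"


def args_seen_by_child(value: str) -> list[str]:
    """Pass `value` to a child as one argument; no shell ever parses it."""
    result = subprocess.run(
        [sys.executable, "-c", SHOW_ARGS, value],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    print(args_seen_by_child(user_input))  # ['report.txt; echo INJECTED']
    # If a POSIX shell is unavoidable, quote untrusted values first
    print(shlex.quote(user_input))          # 'report.txt; echo INJECTED'
```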

By following these tips and architecting your applications intentionally, you'll avoid headaches down the road!

Alright my friend, you made it to the finish line! 🏁

Let's recap all we learned…

Key Takeaways: Mastering Python Subprocesses

  • Subprocesses enable powerful program concurrency by delegating tasks to the OS
  • Python's built-in subprocess module makes subprocess management easy
  • Tools like run(), Popen and check_output() handle the heavy lifting
  • Subprocesses shine for I/O flows, automation scripts, microservices, distributed computing and more!
  • Follow best practices around security, permissions, error handling and code style

I hope these 10 examples gave you a solid idea of what's possible. But they truly just scratch the surface!

The best way forward is to build real projects with subprocesses. Test the waters, run some experiments, break things, then put them back together again.

Programming is a journey of discovery rather than a destination. And subprocesses are the vehicles that'll take your Python skills places unimagined!

Build on, my friend!

Dan