How to Learn Java Stream API [+5 Resources]

The Stream API was introduced in Java 8 to simplify data processing on collections. A stream represents a sequence of elements that can be processed in a declarative way without mutating the original data source.

In this comprehensive guide, we'll cover everything you need to master streams in Java.

What is a Stream in Java?

A stream in Java is a sequence of elements that supports sequential and parallel aggregate operations. Common sources for streams include:

  • Collections like lists and sets (maps expose streams through entrySet(), keySet(), or values())
  • Arrays
  • Static factory methods
  • Files
  • Random number generation
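
As a rough sketch of a few of the sources listed above, the class below streams a map's entries, a fixed set of values, and pseudo-random numbers. The class name StreamSources, its method names, and the sample data are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class StreamSources {

    // A Map is not a Collection, so stream one of its views (here: entrySet()).
    static List<String> describe(Map<String, Integer> stock) {
        return stock.entrySet().stream()
                .map(e -> e.getKey() + "=" + e.getValue())
                .sorted()
                .collect(Collectors.toList());
    }

    // Stream.of is one of the static factory methods.
    static long countLetters() {
        return Stream.of("a", "b", "c").count();
    }

    // Random.ints(size, origin, bound) yields an IntStream of pseudo-random values
    // in the range [origin, bound).
    static List<Integer> diceRolls(int n, long seed) {
        return new Random(seed).ints(n, 1, 7)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(describe(Map.of("apples", 3, "pears", 5))); // [apples=3, pears=5]
        System.out.println(countLetters());                            // 3
        System.out.println(diceRolls(5, 42L).size());                  // 5
    }
}
```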

Some key features of streams:

  • Declarative: Specify what transformations to apply without managing iteration
  • Lazy: Operations are only evaluated when a terminal operation is invoked
  • Parallel-capable: Stream pipelines can run on multiple cores with a one-line change, although speedups and correctness are not automatic
  • Reusable: The same pipeline logic can be applied to different data sources, though each stream instance can be consumed only once
  • Stateless: Most stream ops don't maintain state between elements

Unlike collections, which store their elements in memory, a stream holds no data of its own: it pulls elements from its source and applies ops on demand.

Working with Java Streams

There are a few common ways to obtain a stream instance in Java:

From Collections

Every Collection, such as a List or Set, has a stream() method that returns a stream over its elements:

List<Integer> list = List.of(1, 2, 3);
Stream<Integer> stream = list.stream(); 

From Arrays

Use the Arrays.stream() static factory method:

String[] arr = {"a", "b", "c"};
Stream<String> stream = Arrays.stream(arr);

The overload Arrays.stream(arr, startInclusive, endExclusive) streams only a range of array indexes.
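
A small sketch of the range variant follows. The class name ArrayRange and the helper slice are illustrative only; the point to notice is that the end index is exclusive.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ArrayRange {

    // Streams arr[from] .. arr[to - 1]; the end index is exclusive.
    static List<String> slice(String[] arr, int from, int to) {
        return Arrays.stream(arr, from, to).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String[] arr = {"a", "b", "c", "d"};
        System.out.println(slice(arr, 1, 3)); // [b, c]
    }
}
```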

Empty Streams

The static Stream.empty() method returns a stream with no elements:

Stream<String> empty = Stream.empty();

It is a helpful default to return instead of null, so callers avoid NullPointerExceptions.
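
One way this might look in practice is a small null-safe helper. The helper streamOf and the class SafeStreams are hypothetical names for this sketch, not part of the JDK.

```java
import java.util.List;
import java.util.stream.Stream;

class SafeStreams {

    // Hypothetical helper: treats a null list as "no elements".
    static <T> Stream<T> streamOf(List<T> list) {
        return list == null ? Stream.empty() : list.stream();
    }

    public static void main(String[] args) {
        List<String> missing = null;
        System.out.println(streamOf(missing).count());           // 0
        System.out.println(streamOf(List.of("a", "b")).count()); // 2
    }
}
```

Callers can then chain operations on the result without a null check.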

Stream Operations

Once we have a stream, there are two kinds of operations:

Intermediate Operations

Intermediate operations return a new stream that applies a transformation. Common examples:

  • filter – Returns stream of elements that match predicate
  • map – Apply function to each element
  • flatMap – Map each element to a stream and flatten the results into one stream

These are always lazy – nothing happens until the stream is consumed.
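
The sketch below illustrates flatMap and the laziness described above. The class and method names are made up, and the side effect inside the filter lambda exists only to make the laziness observable; it is not a recommended pattern.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyIntermediateOps {

    // flatMap turns each inner list into a stream and concatenates the results.
    static List<Integer> flatten(List<List<Integer>> nested) {
        return nested.stream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    // Returns {elements seen before the terminal op, elements seen after it}.
    static int[] visitedBeforeAndAfterTerminalOp() {
        List<Integer> seen = new ArrayList<>();
        Stream<Integer> pipeline = Stream.of(1, 2, 3)
                .filter(n -> { seen.add(n); return n > 1; }); // side effect only for the demo
        int before = seen.size();   // nothing has run yet
        pipeline.forEach(n -> { }); // the terminal operation pulls the elements through
        int after = seen.size();
        return new int[] {before, after};
    }

    public static void main(String[] args) {
        System.out.println(flatten(List.of(List.of(1, 2), List.of(3, 4)))); // [1, 2, 3, 4]
        int[] counts = visitedBeforeAndAfterTerminalOp();
        System.out.println(counts[0] + " then " + counts[1]); // 0 then 3
    }
}
```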

Terminal Operations

Terminal ops consume the stream and produce a final result. Examples:

  • forEach – Perform action on each element
  • count – Count number of elements
  • collect – Aggregate elements to a collection
  • reduce – Combine elements using a reducer function

After a terminal operation is invoked, the stream is considered consumed and no longer usable.
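
To make the terminal operations above concrete, here is a minimal sketch using count, collect, and reduce, plus a check that a second terminal operation on the same stream fails. The class name TerminalOps and the word list are illustrative.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TerminalOps {

    static long countLongWords(List<String> words) {
        return words.stream().filter(w -> w.length() > 3).count();
    }

    static List<String> upper(List<String> words) {
        return words.stream().map(String::toUpperCase).collect(Collectors.toList());
    }

    // reduce with an identity value (0) and a combining function.
    static int totalLength(List<String> words) {
        return words.stream().map(String::length).reduce(0, Integer::sum);
    }

    // A stream is single-use: the second terminal operation throws.
    static boolean secondUseFails(List<String> words) {
        Stream<String> s = words.stream();
        s.count(); // first terminal operation consumes the stream
        try {
            s.count();
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        List<String> words = List.of("java", "is", "fun", "stream");
        System.out.println(countLongWords(words)); // 2
        System.out.println(upper(words));          // [JAVA, IS, FUN, STREAM]
        System.out.println(totalLength(words));    // 15
        System.out.println(secondUseFails(words)); // true
    }
}
```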

Here is a simple pipeline example:

List<Integer> numbers = List.of(1, 2, 3, 4);

numbers.stream()
       .filter(n -> n % 2 == 0)
       .map(n -> n * 2)  
       .forEach(System.out::println); // 4 8

The order of stream operations matters: each element passes through the intermediate operations in the order they are declared, and nothing runs until the terminal operation is invoked.

Stream Pipeline

A stream pipeline consists of a data source, zero or more intermediate operations, and a terminal operation:

Source -> Intermediate Op -> Intermediate Op -> ... -> Terminal Op

The key things to know about the stream execution model:

  • Each element flows through the operations in the order they are declared
  • Each intermediate operation returns a new stream
  • Evaluation is "lazy" – elements are traversed only when the terminal op is invoked
  • Once consumed, a stream cannot be reused

This enables efficient pipelines that process only the elements needed for the output.

Stateful ops such as sorted must buffer all elements before emitting a result, and distinct must remember every element it has already seen. Most other intermediate ops, like filter, map, and limit, can work element-by-element.
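
The difference between element-by-element processing and buffering can be observed with peek, as in the sketch below. The class name, method names, and the use of an AtomicInteger counter are choices made for this demonstration; the counts shown assume a sequential stream.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ShortCircuitDemo {

    // How many elements of an infinite stream are inspected to find `wanted` even numbers?
    static int inspectedForEvens(int wanted) {
        AtomicInteger inspected = new AtomicInteger();
        Stream.iterate(1, n -> n + 1)
                .peek(n -> inspected.incrementAndGet())
                .filter(n -> n % 2 == 0)
                .limit(wanted)                 // stops the pipeline once enough elements passed
                .collect(Collectors.toList());
        return inspected.get();
    }

    // sorted() has to see every element before it can emit the smallest one.
    static int inspectedForSmallest(List<Integer> values) {
        AtomicInteger inspected = new AtomicInteger();
        values.stream()
                .peek(n -> inspected.incrementAndGet())
                .sorted()
                .limit(1)
                .collect(Collectors.toList());
        return inspected.get();
    }

    public static void main(String[] args) {
        System.out.println(inspectedForEvens(3));                          // 6
        System.out.println(inspectedForSmallest(List.of(5, 3, 1, 4, 2))); // 5
    }
}
```

Only six elements of the infinite source are visited in the first case, while the sorted pipeline has to read all five inputs even though a single element is kept.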

Enabling Parallel Streams

By default, streams operate sequentially. We can leverage multi-core architectures by calling parallelStream() on a collection, or parallel() on an existing stream:

list.parallelStream()
    .forEach(System.out::println); // element order is not guaranteed

The stream API handles partitioning elements across threads. Generally, parallel streams can offer better throughput for large datasets and CPU-bound tasks, provided there is little or no shared mutable state.

But for simpler or IO-bound ops, parallel streams might add overhead without much gain. Always test both performance and correctness when using parallel streams.
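
A minimal sketch of a parallel pipeline with no shared mutable state is shown below. The class ParallelDemo and its methods are illustrative; whether the parallel version is actually faster depends on data size and hardware, so this only checks that both modes agree.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelDemo {

    // Sum of squares 1..n, computed sequentially or in parallel; no shared mutable state.
    static long sumOfSquares(int n, boolean parallel) {
        IntStream range = IntStream.rangeClosed(1, n);
        if (parallel) {
            range = range.parallel();
        }
        return range.mapToLong(i -> (long) i * i).sum();
    }

    // collect() preserves encounter order for an ordered source, even in parallel.
    static List<Integer> doubledInParallel(List<Integer> input) {
        return input.parallelStream()
                .map(n -> n * 2)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1_000, false));             // 333833500
        System.out.println(sumOfSquares(1_000, true));              // 333833500
        System.out.println(doubledInParallel(List.of(1, 2, 3, 4))); // [2, 4, 6, 8]
    }
}
```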

Stream Use Cases

Here are some common use cases where Java streams shine:

Filtering and Transforming Data

Streams make it easy to describe data filtering pipelines without mutating the underlying data source.
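
For example, a cleanup pipeline might look like the sketch below. The class name FilterTransform, the method normalize, and the sample names are hypothetical; the input list is left untouched and a new list is returned.

```java
import java.util.List;
import java.util.stream.Collectors;

class FilterTransform {

    // Trims, drops blanks, lower-cases, de-duplicates, and sorts without touching `raw`.
    static List<String> normalize(List<String> raw) {
        return raw.stream()
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(String::toLowerCase)
                .distinct()
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> raw = List.of(" Bob ", "alice", "", "ALICE", "carol ");
        System.out.println(normalize(raw)); // [alice, bob, carol]
        System.out.println(raw.size());     // 5 (source is unchanged)
    }
}
```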

Numeric Ranges

Generate sequences of numbers, dates, and so on without first building a collection:

Stream.iterate(1, n -> n + 1) 
      .limit(10)
      .forEach(System.out::print); // 12345678910
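
For plain integer ranges, IntStream.rangeClosed is a common alternative to iterate plus limit, and the same iterate pattern works for dates. The sketch below shows both; the class name Ranges and the sample date are illustrative.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

class Ranges {

    // 1..n inclusive, boxed into a List<Integer>.
    static List<Integer> oneTo(int n) {
        return IntStream.rangeClosed(1, n).boxed().collect(Collectors.toList());
    }

    // `days` consecutive dates starting at `start`.
    static List<LocalDate> daysFrom(LocalDate start, int days) {
        return Stream.iterate(start, d -> d.plusDays(1))
                .limit(days)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(oneTo(5));                              // [1, 2, 3, 4, 5]
        System.out.println(daysFrom(LocalDate.of(2024, 2, 28), 3)); // [2024-02-28, 2024-02-29, 2024-03-01]
    }
}
```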

File Processing

Process lines of large files efficiently without loading everything into memory:

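// try-with-resources closes the underlying file when the pipeline finishes
try (Stream<String> lines = Files.lines(path)) {
    lines.filter(line -> line.contains("foo"))
         .forEach(System.out::println);
}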
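
A self-contained sketch of this file-processing pattern is shown below. It assumes a throwaway temp file and made-up file contents; the class ClosingIoStreams and the helper demo are hypothetical. Files.lines holds an open file handle, so the stream is wrapped in try-with-resources.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

class ClosingIoStreams {

    static long countMatching(Path file, String needle) throws IOException {
        // try-with-resources closes the underlying file handle when the block exits
        try (Stream<String> lines = Files.lines(file)) {
            return lines.filter(line -> line.contains(needle)).count();
        }
    }

    // Demo helper: writes a small temp file, counts matching lines, then cleans up.
    static long demo() {
        try {
            Path tmp = Files.createTempFile("stream-demo", ".txt");
            try {
                Files.write(tmp, List.of("foo bar", "baz", "another foo"));
                return countMatching(tmp, "foo");
            } finally {
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 2
    }
}
```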

Concurrency

Parallel bulk data processing and computing:

int sum = numbers.parallelStream()
                 .mapToInt(n -> n * 2)
                 .sum();

Streams also suit many other tasks, such as aggregation and reporting.

Stream Limitations

While streams enable declarative data processing, there are some limitations:

  • Single-use – A stream can only be consumed once. We have to obtain a new stream to rerun the pipeline.
  • Stateful ops – Lambdas that depend on mutable state can produce nondeterministic results in parallel, and built-in stateful ops like sorted and distinct can be costly there.
  • Long-running – Streams are best for short-lived batch ops, not long-running services.

So streams might not be efficient for transactional systems or continuous data monitoring.
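
One common workaround for the single-use limitation is to wrap pipeline construction in a Supplier, so each call builds a fresh stream. This is a sketch under that assumption; the names ReusablePipeline and evensOf are made up.

```java
import java.util.List;
import java.util.function.Supplier;
import java.util.stream.Stream;

class ReusablePipeline {

    // Each call to get() builds a new stream over the same source.
    static Supplier<Stream<Integer>> evensOf(List<Integer> source) {
        return () -> source.stream().filter(n -> n % 2 == 0);
    }

    public static void main(String[] args) {
        Supplier<Stream<Integer>> evens = evensOf(List.of(1, 2, 3, 4));
        System.out.println(evens.get().count());                           // 2
        System.out.println(evens.get().mapToInt(Integer::intValue).sum()); // 6
    }
}
```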

Best Practices

Here are some key best practices when working with Java streams:

  • Prefer immutable data sources like unmodifiable collections
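  • Avoid stateful lambdas; built-in stateful operations like sorted and distinct are safe, but they buffer elements, so use them deliberately
  • Keep lambdas free of side effects, and let collect or reduce build the result instead of mutating outside state
  • Close streams that process IO resources, for example with try-with-resources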
  • Limit parallel stream usage for simple or IO-bound operations

Following these will help avoid unexpected behavior and efficiency bottlenecks.
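
As an illustration of keeping pipelines free of side effects, the sketch below contrasts mutating an outside list with letting collect build the result. The class and method names are hypothetical. Both versions give the same answer sequentially, but only the second stays safe if the stream is later made parallel.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

class SideEffectFree {

    // Discouraged: mutates a list declared outside the pipeline
    // (ArrayList is not thread-safe, so this breaks under parallel()).
    static List<Integer> squaresWithSideEffect(List<Integer> input) {
        List<Integer> out = new ArrayList<>();
        input.stream().map(n -> n * n).forEach(out::add);
        return out;
    }

    // Preferred: collect() builds the result without touching outside state.
    static List<Integer> squares(List<Integer> input) {
        return input.stream().map(n -> n * n).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(squaresWithSideEffect(List.of(1, 2, 3))); // [1, 4, 9]
        System.out.println(squares(List.of(1, 2, 3)));               // [1, 4, 9]
    }
}
```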

Learning Resources

To take your Java stream skills to the next level, here are some great references:

Java 8 in Action

My favorite book that covers all aspects of streams for Java 8 including operations, concurrency, best practices and more.


Java 8 Lambdas by Richard Warburton

Great focus on lambda expressions and functional programming using practical Java examples.


Java SE 8 for the Really Impatient

Concise book for experienced Java devs to get an overview of all Java 8 additions including streams.


Learn Java Functional Programming

Very hands-on Udemy course full of exercises on streams and lambda expressions.

Java Class Library on Coursera

For a deep dive into Java generics and class libraries to write better type-safe code.

Conclusion

The stream API introduces an extremely useful declarative model for working with data in Java. It enables simpler code that can efficiently leverage parallelism for large datasets.

However, streams also take some experience to use correctly. The resources provided above will help accelerate your learning, especially their examples of common data processing patterns.

With some practice, streams can simplify previously complex tasks and remove many error-prone loops from your code!