The Stream API was introduced in Java 8 to simplify data processing on collections. A stream represents a sequence of elements that can be processed in a declarative way without mutating the original data source.
In this comprehensive guide, we'll cover everything you need to master streams in Java.
What is a Stream in Java?
A stream in Java is a sequence of elements that supports sequential and parallel aggregate operations. Common sources for streams include:
- Collections like lists and sets (maps are streamed through views such as entrySet())
- Arrays
- Static factory methods
- Files
- Random number generation
Some key features of streams:
- Declarative: Specify what transformations to apply without managing iteration
- Lazy: Operations are only evaluated when a terminal operation is invoked
- Parallel-capable: Stream pipelines can run on multiple cores with very little extra code
- Reusable: The same pipeline logic can be applied to different data sources, although each stream instance can be consumed only once
- Stateless: Most stream ops don't maintain state between elements
Compared to collections that store data in-memory, streams simply refer to the source data and apply ops as needed.
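To make the lazy, non-mutating behavior concrete, here is a minimal sketch. The class name LazyStreamDemo and the log list are illustrative assumptions, and peek is used only to observe when elements are visited:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyStreamDemo {
    // Returns a log of events in the order they happened.
    static List<String> demo() {
        List<String> log = new ArrayList<>();
        List<String> source = List.of("a", "b", "c");

        // Building the pipeline does not touch any element yet.
        Stream<String> upper = source.stream()
                .peek(s -> log.add("visit " + s))
                .map(String::toUpperCase);
        log.add("pipeline built");

        // The terminal operation triggers the traversal.
        List<String> result = upper.collect(Collectors.toList());
        log.add("result " + result + ", source " + source);
        return log;
    }

    public static void main(String[] args) {
        demo().forEach(System.out::println);
    }
}
```

The "pipeline built" entry is logged before any "visit" entry, and the source list still holds its original lowercase values at the end.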
Working with Java Streams
There are a few common ways to obtain a stream instance in Java:
From Collections
All Collection types, such as List and Set, have a stream() method that returns a stream:
List<Integer> list = List.of(1, 2, 3);
Stream<Integer> stream = list.stream();
From Arrays
Use the Arrays.stream() static factory method:
String[] arr = {"a", "b", "c"};
Stream<String> stream = Arrays.stream(arr);
An overload also accepts a start index (inclusive) and an end index (exclusive) to stream only part of the array.
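As a small sketch of streaming only part of an array, the example below uses the three-argument Arrays.stream overload. The class and method names are made up for illustration:

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class ArrayRangeDemo {
    // Streams arr[from] up to, but not including, arr[to].
    static List<String> slice(String[] arr, int from, int to) {
        return Arrays.stream(arr, from, to).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String[] arr = {"a", "b", "c", "d"};
        System.out.println(slice(arr, 1, 3)); // [b, c]
    }
}
```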
Empty Streams
The static Stream.empty() method returns a stream with no elements:
Stream<String> empty = Stream.empty();
Returning an empty stream instead of null is a helpful default that lets callers avoid NullPointerExceptions.
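One way this might look in practice is a helper that falls back to Stream.empty() when its input is missing. The helper names namesOrEmpty and countNames are hypothetical, not part of any library:

```java
import java.util.List;
import java.util.stream.Stream;

class EmptyStreamDemo {
    // Hypothetical helper: never returns null, so callers can chain operations safely.
    static Stream<String> namesOrEmpty(List<String> names) {
        return names == null ? Stream.empty() : names.stream();
    }

    static long countNames(List<String> names) {
        return namesOrEmpty(names).count();
    }

    public static void main(String[] args) {
        System.out.println(countNames(null));             // 0
        System.out.println(countNames(List.of("x", "y"))); // 2
    }
}
```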
Stream Operations
Once we have a stream, there are two kinds of operations:
Intermediate Operations
Intermediate operations return a new stream that applies a transformation. Common examples:
- filter: returns a stream of the elements that match a predicate
- map: applies a function to each element
- flatMap: flattens nested streams into a single stream
These are always lazy – nothing happens until the stream is consumed.
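The three operations can be combined in one pipeline. The sketch below assumes a nested list of words as input; the class and method names are illustrative only:

```java
import java.util.List;
import java.util.stream.Collectors;

class IntermediateOpsDemo {
    static List<String> longWordsUpper(List<List<String>> nested) {
        return nested.stream()
                .flatMap(List::stream)          // flatten the inner lists into one stream
                .filter(w -> w.length() > 3)    // keep words longer than three characters
                .map(String::toUpperCase)       // transform each remaining word
                .collect(Collectors.toList());  // terminal op that triggers evaluation
    }

    public static void main(String[] args) {
        List<List<String>> nested = List.of(List.of("java", "io"), List.of("stream", "api"));
        System.out.println(longWordsUpper(nested)); // [JAVA, STREAM]
    }
}
```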
Terminal Operations
Terminal ops consume the stream and produce a final result. Examples:
- forEach: performs an action on each element
- count: counts the number of elements
- collect: aggregates elements into a collection
- reduce: combines elements using a reducer function
After a terminal operation is invoked, the stream is considered consumed and no longer usable.
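As a rough illustration of count, reduce, and collect, each method below obtains a fresh stream from the list before applying its terminal operation. The class and method names are assumptions for this sketch:

```java
import java.util.List;
import java.util.stream.Collectors;

class TerminalOpsDemo {
    static long countEven(List<Integer> nums) {
        return nums.stream().filter(n -> n % 2 == 0).count();
    }

    static int sum(List<Integer> nums) {
        // reduce starts from the identity 0 and adds each element to it.
        return nums.stream().reduce(0, Integer::sum);
    }

    static List<Integer> doubled(List<Integer> nums) {
        return nums.stream().map(n -> n * 2).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> nums = List.of(1, 2, 3, 4);
        System.out.println(countEven(nums)); // 2
        System.out.println(sum(nums));       // 10
        System.out.println(doubled(nums));   // [2, 4, 6, 8]
    }
}
```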
Here is a simple pipeline example:
List<Integer> numbers = List.of(1, 2, 3, 4);
numbers.stream()
.filter(n -> n % 2 == 0)
.map(n -> n * 2)
.forEach(System.out::println); // 4 8
The order of stream operations matters, and intermediate operations are evaluated lazily: they run only when the terminal operation pulls elements through the pipeline.
Stream Pipeline
A stream pipeline consists of a data source, zero or more intermediate operations, and a terminal operation:
Source -> Intermediate Op -> Intermediate Op -> ... -> Terminal Op
The key things to know about the stream execution model:
- Each element passes through the operations in the order they are defined
- Each intermediate operation returns a new stream
- Evaluation is "lazy" – elements traverse only when terminal op is invoked
- Once consumed, a stream cannot be reused
This enables very efficient pipelines that process only the elements needed for the output.
Certain ops like sorted and distinct may require buffering all elements before emitting a result. But most intermediate ops like filter, map, and limit can work element-by-element.
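The element-by-element behavior can be observed with an infinite source. This is a sketch under stated assumptions: the visited counter and the class name are illustrative, and peek is used only to count how many elements were pulled:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class ShortCircuitDemo {
    // Counts how many source elements the pipeline actually pulled.
    static final AtomicInteger visited = new AtomicInteger();

    static List<Integer> firstEvens(int howMany) {
        visited.set(0);
        return Stream.iterate(1, n -> n + 1)            // infinite source
                .peek(n -> visited.incrementAndGet())   // observe traversal only
                .filter(n -> n % 2 == 0)
                .limit(howMany)                         // stops the pipeline once enough elements pass
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(firstEvens(3));              // [2, 4, 6]
        System.out.println("visited: " + visited.get());
    }
}
```

The pipeline terminates even though the source is infinite, because limit stops pulling elements once it has what it needs.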
Enabling Parallel Streams
By default, streams operate sequentially. We can leverage multi-core architectures by calling parallelStream():
list.parallelStream()
.forEach(System.out::println);
The stream API handles partitioning elements across threads. Generally parallel streams can offer better throughput for large datasets and CPU-bound tasks, provided there are limited shared state mutations.
But for simpler or IO-bound ops, parallel streams might add overhead without much gain. Always test for both performance and correctness with parallel streams.
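On the correctness side, one common approach is to let collect gather the results rather than mutating a shared list from inside forEach. The sketch below shows only the safe variant; the class and method names are illustrative:

```java
import java.util.List;
import java.util.stream.Collectors;

class ParallelCollectDemo {
    static List<Integer> squares(List<Integer> nums) {
        // Unsafe alternative (avoid): calling add() on a shared ArrayList from a
        // parallel forEach, since ArrayList is not thread-safe.
        return nums.parallelStream()
                .map(n -> n * n)                 // stateless, no shared mutation
                .collect(Collectors.toList());   // collect merges per-thread results safely
    }

    public static void main(String[] args) {
        System.out.println(squares(List.of(1, 2, 3, 4))); // [1, 4, 9, 16]
    }
}
```

Because the source list is ordered, collect preserves the encounter order even when the work is split across threads.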
Stream Use Cases
Here are some common use cases where Java streams shine:
Filtering and Transforming Data
Streams make it easy to describe data filtering pipelines without mutating the underlying data source.
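A filtering pipeline of this kind might look like the following sketch. The Product type and the affordableNames method are hypothetical and exist only for this illustration:

```java
import java.util.List;
import java.util.stream.Collectors;

class FilterTransformDemo {
    // Hypothetical domain type used only for this illustration.
    static final class Product {
        final String name;
        final double price;

        Product(String name, double price) {
            this.name = name;
            this.price = price;
        }
    }

    // Returns the sorted names of products at or below maxPrice; the input list is not modified.
    static List<String> affordableNames(List<Product> products, double maxPrice) {
        return products.stream()
                .filter(p -> p.price <= maxPrice)
                .map(p -> p.name)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Product> products = List.of(
                new Product("pen", 2.0),
                new Product("laptop", 900.0),
                new Product("book", 15.0));
        System.out.println(affordableNames(products, 20.0)); // [book, pen]
    }
}
```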
Numeric Ranges
Generating streams of numbers, dates etc. instead of collections:
Stream.iterate(1, n -> n + 1)
.limit(10)
.forEach(System.out::print); // 12345678910
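When the bounds are known up front, IntStream.rangeClosed is a possible alternative that avoids boxing each number into an Integer. The class and method names in this sketch are illustrative:

```java
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class RangeDemo {
    static String oneToTen() {
        return IntStream.rangeClosed(1, 10)          // 1 through 10, both inclusive
                .mapToObj(Integer::toString)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        System.out.println(oneToTen()); // 1,2,3,4,5,6,7,8,9,10
    }
}
```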
File Processing
Process lines of large files efficiently without loading everything into memory:
try (Stream<String> lines = Files.lines(path)) { // closes the underlying file when done
    lines.filter(line -> line.contains("foo"))
         .forEach(System.out::println);
}
Concurrency
Parallel bulk data processing and computing:
int sum = numbers.parallelStream()
.mapToInt(n -> n * 2)
.sum();
There are many more use cases, such as aggregation and reporting.
Stream Limitations
While streams enable declarative data processing, there are some limitations:
- Single-use – A stream can only be consumed once. We have to obtain a new stream to rerun the pipeline.
- Stateful ops – Some stateful intermediate ops may produce unintended results in parallel.
- Long-running – Streams are best for short-lived batch ops, not long-running services.
So streams might not be efficient for transactional systems or continuous data monitoring.
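The single-use limitation, and one common workaround, can be sketched as follows. Wrapping the pipeline in a Supplier is a convention rather than a dedicated API, and the class and method names are assumptions:

```java
import java.util.function.Supplier;
import java.util.stream.Stream;

class SingleUseDemo {
    // Returns true if consuming the same stream twice raises IllegalStateException.
    static boolean secondUseThrows() {
        Stream<Integer> s = Stream.of(1, 2, 3);
        s.forEach(n -> { });
        try {
            s.forEach(n -> { });
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    // A Supplier builds a fresh stream each time the pipeline needs to run again.
    static long[] countAndSumOfEvens() {
        Supplier<Stream<Integer>> evens =
                () -> Stream.of(1, 2, 3, 4).filter(n -> n % 2 == 0);
        long count = evens.get().count();
        long sum = evens.get().mapToInt(Integer::intValue).sum();
        return new long[] {count, sum};
    }

    public static void main(String[] args) {
        System.out.println(secondUseThrows());        // true
        long[] r = countAndSumOfEvens();
        System.out.println(r[0] + " " + r[1]);        // 2 6
    }
}
```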
Best Practices
Here are some key best practices when working with Java streams:
- Prefer immutable data sources like unmodifiable collections
- Avoid stateful intermediate operations like sorted and distinct where possible, especially in parallel pipelines
- Be careful with operations that mutate shared state; prefer side-effect-free lambdas
- Close streams that process IO resources
- Limit parallel stream usage for simple or IO-bound operations
Following these will help avoid unexpected behavior and efficiency bottlenecks.
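For the advice on closing IO-backed streams, a minimal sketch with try-with-resources is shown below. The temporary file, its contents, and the method names are assumptions made so the example is self-contained:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Stream;

class FileLinesDemo {
    static long countMatching(Path path, String needle) throws IOException {
        // try-with-resources closes the stream, and with it the open file.
        try (Stream<String> lines = Files.lines(path)) {
            return lines.filter(line -> line.contains(needle)).count();
        }
    }

    // Writes a small temporary file, counts the matching lines, then deletes it.
    static long demo() {
        try {
            Path tmp = Files.createTempFile("stream-demo", ".txt");
            try {
                Files.write(tmp, List.of("foo bar", "baz", "more foo"));
                return countMatching(tmp, "foo");
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // 2
    }
}
```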
Learning Resources
To take your Java stream skills to the next level, here are some great references:
Java 8 in Action
My favorite book that covers all aspects of streams for Java 8 including operations, concurrency, best practices and more.
Java 8 Lambdas by Richard Warburton
Great focus on lambda expressions and functional programming using practical Java examples.
Java SE 8 for the Really Impatient
Concise book for experienced Java devs to get an overview of all Java 8 additions including streams.
Learn Java Functional Programming
Very hands-on Udemy course full of exercises on streams and lambda expressions.
Java Class Library on Coursera
For a deep dive into Java generics and class libraries to write better type-safe code.
Conclusion
The stream API introduces an extremely useful declarative model for working with data in Java. It enables simpler code that can efficiently leverage parallelism for large datasets.
However, streams also take some experience to use correctly. The resources listed above can help accelerate your learning, especially their examples of common data processing patterns.
With some practice, streams can simplify previously complex tasks and remove many error-prone loops from your code!