Petabyte Scale Data Growth is Accelerating – Why You Need On-Premises Object Storage

Hey there – feeling overwhelmed by the nonstop data deluge? As connected technologies proliferate across industries, unstructured data is expanding at mind-blowing rates – 73% CAGR projected through 2025, according to IDC. We're talking exabytes and zettabytes arriving faster than infrastructure budgets can keep up!

Traditional file and block storage like SANs and NAS can't cut it for these extreme workloads driven by digital media, genomic sequencing data, satellite imagery, sensor feeds and machine logs. The future calls for…object storage.

Object Storage to the Rescue

You're forgiven if this term is still foreign – object storage manages data very differently from legacy approaches:

  • A single storage pool serves limitless files and objects
  • Each object includes metadata for context + content retention rules
  • Scale capacity and throughput simply by adding low-cost nodes
  • Software defines the features, so hardware refreshes stay simple

The best part? The Amazon S3 API has become a universal abstraction layer across vendors, so you can deploy affordable on-premises object stores and still reap public cloud conveniences.
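
To see that portability concretely, here's a minimal Python sketch using boto3 – the same client code talks to AWS or to any on-prem S3-compatible store just by swapping the endpoint. The internal endpoint URL, credentials and bucket name below are placeholders, not values from any real deployment:

```python
import boto3

# Point the standard AWS SDK at a hypothetical on-prem S3-compatible endpoint.
s3 = boto3.client(
    "s3",
    endpoint_url="https://objects.internal.example.com",  # placeholder endpoint
    aws_access_key_id="ACCESS_KEY",                       # placeholder credentials
    aws_secret_access_key="SECRET_KEY",
)

# Identical calls work against AWS S3 or the on-prem store.
s3.create_bucket(Bucket="analytics-archive")
s3.put_object(Bucket="analytics-archive", Key="logs/2024-01.gz", Body=b"...")
print(s3.list_objects_v2(Bucket="analytics-archive")["KeyCount"])
```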

Let's unpack the 7 leading options for self-hosted S3 object storage that put you in the driver's seat. With certain platforms scaling to exabytes cost-effectively, these beasts can slay your biggest data center challenges!

Why On-Prem Object Storage Beats the Public Cloud

Before highlighting the products, what prompts companies to deploy their own object store foundation rather than rent S3 capacity from AWS, Google or Microsoft? A few motivations…

Data gravity + egress costs: Major analytics and ML workloads crunch terabytes on-premises. Shuttling all that data into AWS and back out racks up massive bandwidth tolls, with egress fees in particular adding up fast.

Data sovereignty: Regulations mandate financial services and healthcare data stay geographically contained. Government agencies also favor on-prem.

Latency: IoT apps speed up with local processing. Media workflows stream content faster with internal object backends.

Security: Complete control over physical security, encryption keys and access controls reduces risks.

Let's see how the leading purpose-built platforms stack up!

Object Storage Software Criteria Breakdown

To compare apples-to-apples, I analyzed vendors across several pivotal criteria:

  • Scalability: Billions of objects? Exabyte capacity? How far can it grow?
  • Performance: Speed counts too – are SSDs supported? How parallel is it?
  • Resiliency: Reed-Solomon erasure coding? Node failure tolerance? Bucket versioning? (See the overhead sketch after this list.)
  • Interoperability: Dev ecosystem maturity? Hardware flexibility?
  • Ease of use: Can a non-PhD wield this without six months training?
  • Tiering: Hot/cold data shifting to balance cost/performance?
  • Security: Encryption scheme details? Immutable/WORM support?
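
To ground the resiliency criterion, here's a back-of-the-envelope Python sketch comparing the raw capacity bill for 3-way replication against a hypothetical 8+3 Reed-Solomon erasure coding layout. The figures are illustrative arithmetic, not vendor benchmarks:

```python
# Raw storage needed to keep 100 TB of usable data protected.
usable_tb = 100

# 3-way replication: every byte stored three times.
replication_factor = 3
replication_raw_tb = usable_tb * replication_factor  # 300 TB raw

# Hypothetical 8+3 erasure coding: 8 data + 3 parity fragments,
# surviving any 3 simultaneous drive/node losses.
ec_data, ec_parity = 8, 3
ec_raw_tb = usable_tb * (ec_data + ec_parity) / ec_data  # 137.5 TB raw

print(f"Replication: {replication_raw_tb} TB raw ({replication_factor}x overhead)")
print(f"Erasure coding: {ec_raw_tb} TB raw "
      f"({(ec_data + ec_parity) / ec_data:.2f}x overhead)")
```

That roughly 1.4x versus 3x overhead gap is why erasure coding dominates at petabyte scale, traded against extra CPU during writes and rebuilds.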

OK, time for the heavyweights!

1. MinIO – Lightweight, High-Speed Object Store

MinIO is an open source, high-performance, S3-compatible object store. With tens of thousands of GitHub stars, it leads the pack in community adoption. And MinIO's benchmarks make it arguably the fastest too…

Scalability

MinIO distributes objects intelligently across nodes in an erasure coded cluster. This storage pool starts at just 4 nodes yet scales past petabytes smoothly. Software expansion beats appliance limits.

Performance

It speaks the S3 API but outpaces AWS S3 itself in raw speed by wide margins! MinIO hit 171 GB/s write and 183 GB/s read throughput on commodity servers with NVMe flash storage in a recent benchmark.

Resiliency

MinIO safeguards data integrity through bit rot protection and object versioning. Configurable erasure coding distributes parts across cluster drives and nodes, and rebuilds kick off automatically upon failures.
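
Because MinIO speaks the S3 API, versioning toggles with stock boto3 calls (note that MinIO only supports versioning on erasure-coded, multi-drive deployments). A minimal sketch, assuming a hypothetical internal MinIO endpoint and placeholder credentials:

```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="http://minio.internal.example.com:9000",  # placeholder endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Turn on versioning so overwrites preserve prior object generations.
s3.put_bucket_versioning(
    Bucket="research-data",
    VersioningConfiguration={"Status": "Enabled"},
)

s3.put_object(Bucket="research-data", Key="report.csv", Body=b"v1")
s3.put_object(Bucket="research-data", Key="report.csv", Body=b"v2")

versions = s3.list_object_versions(Bucket="research-data", Prefix="report.csv")
print(len(versions["Versions"]))  # 2 – both generations retained
```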

Interoperability

It integrates natively with Kubernetes and is mostly platform agnostic beyond Linux OS prerequisites. The surrounding tool ecosystem, however, lags behind some competitors.

Ease of Use

A clean, intuitive UI and a Kubernetes Operator package simplify management. Open source means direct GitHub access for customizations too. Unrivaled for small ops teams.

Tiering

Some large deployments utilize MinIO alongside Ceph to tier older data at scale. For tiering within MinIO itself, support for remote targets such as Azure Blob or AWS Glacier comes through add-ons.

Security

MinIO's authorization framework ties IAM-like policies to users and service accounts. TLS encryption secures all communications, and bucket policies round out robust access controls.

For blazing throughputs served by a lightweight container-friendly object gateway, prioritize MinIO.

2. Ceph – Unified Block, File and Object

Originating from a doctoral thesis on distributed file systems at UC Santa Cruz, the Ceph project now powers some of the largest storage clusters on the planet – think CERN's LHC experiments!

Scalability

Ceph stands unmatched for scaling capacity across hundreds of thousands of OSD daemons. The largest clusters hold tens of petabytes today and are growing toward exabyte scale.

Performance

Parallelism wins as Ceph divides and conquers objects across nodes for heavy analytics like genomic sequencing data lakes. Throughput routinely hits 10+ GB/sec and beyond.

Resiliency

Triple replication, erasure coding and the CRUSH placement algorithm provide fault tolerance and high availability. Bucket policies tier data across media, and the self-healing architecture auto-balances at global scale.

Interoperability

Ceph plays nice with Kubernetes plus every major hypervisor. Block and filesystem access broadens appeal beyond just S3 object use cases, and a rich ecosystem exists via the Inktank partner network (now part of Red Hat).
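
For a taste of that beyond-S3 access, here's a minimal sketch using Ceph's librados Python binding to read and write objects natively, skipping the S3 gateway entirely. The config path and pool name are placeholders for your cluster's values:

```python
import rados

# Connect using a hypothetical cluster config file.
cluster = rados.Rados(conffile="/etc/ceph/ceph.conf")
cluster.connect()

# Work directly against a placeholder RADOS pool.
ioctx = cluster.open_ioctx("demo-pool")
ioctx.write_full("sensor-0001", b'{"temp": 21.4}')  # create/overwrite an object
print(ioctx.read("sensor-0001"))                    # read it straight back

ioctx.close()
cluster.shutdown()
```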

Ease of Use

Don't kid yourself – Ceph's flexibility rewards those who master distributed systems intricacies. But Ansible automation helps, and Red Hat integrates Ceph throughout its stack.

Tiering

Built-in data placement and load balancing features mix flash/HDD media optimally based on frequency of object access. Intelligent global data distribution.

Security

Authentication via Cephx or LDAP/Kerberos integration. Granular capabilities control privileges. NIST compliance helps highly regulated organizations.

If you seek storage consolidation under one mammoth software-defined platform or require native block/filesystem integration alongside object APIs, Ceph brings the rain.

3. Zenko – Geo-Distributed Data Control

Engineered by Scality using the same RING object store that protects trillions of objects behind the scenes, Zenko goes beyond S3 API compatibility to orchestrate data placement globally across on-prem and public cloud resources.

Scalability

Distributed architecture scales linearly to 100s of petabytes. Multi-region capabilities mean massive repositories can expand without boundaries.

Performance

Workloads stay performant through work scheduling automation that colocates computation near the data, plus SSD pooling, parallel processing and WAN optimization.

Resiliency

The Zenko Insights dashboard provides admins visibility into data replication status across all integrated repositories – whether on-premises or sitting in Google Cloud buckets.

Interoperability

A vast partner network integrates VAULT cloud, Wasabi hot cloud storage, and other tiers to geo-distribute massive datasets based on usage patterns.

Ease of Use

Dashboards analyze data gravity trends, and a migration advisor picks optimal repositories. Lifecycle management automatically shifts infrequently accessed data to cheaper tiers based on policies.

Tiering

Rules-based tiering places data in the most economical location matching retrieval patterns. Automated movement between memory, SSDs, HDDs and public cloud.
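
Since Zenko exposes the S3 API, that rules-based tiering can be expressed as a standard lifecycle policy. A hedged sketch, where the endpoint, bucket and the "COLD" storage class name are placeholders for whatever tiers your deployment actually defines:

```python
import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://zenko.internal.example.com",  # placeholder endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Move objects under raw/ to a cheaper tier after 90 days, expire after 10 years.
s3.put_bucket_lifecycle_configuration(
    Bucket="media-archive",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-then-expire",
                "Status": "Enabled",
                "Filter": {"Prefix": "raw/"},
                "Transitions": [{"Days": 90, "StorageClass": "COLD"}],
                "Expiration": {"Days": 3650},
            }
        ]
    },
)
```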

Security

Encryption, access controls, IAM policies and activity auditing logs follow data wherever it flows across endpoints. Integrates with Kubernetes secrets.

For orchestrating vast object storage across regions and cloud providers, Zenko delivers intelligent automation found nowhere else.

4. Riak S2 – Purpose Built for Billions of Objects

Originally developed by Basho, Riak S2 provides industrial-grade object storage for hyperscale cloud applications with extreme data retention needs. We're talking petabytes of observational data from networks of instrumentation and sensors.

Scalability

Massive clusters holding 50+ petabytes run Riak in production across multiple datacenters, and linear scale-out has been proven in testing.

Performance

Riak excels for apps needing sustained IO rather than raw throughput. Parallel GETs/PUTs coordinate across nodes, and SSD pooling responds to hot spots.
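
To illustrate the sustained-IO pattern Riak S2 targets, here's a generic Python sketch fanning out many concurrent GETs through the S3 API. The endpoint, bucket and key scheme are invented for the example:

```python
from concurrent.futures import ThreadPoolExecutor

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://riak.internal.example.com",  # placeholder endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

def fetch(key: str) -> int:
    """Download one object and return its size in bytes."""
    return len(s3.get_object(Bucket="sensor-data", Key=key)["Body"].read())

keys = [f"readings/device-{i:04d}.json" for i in range(1000)]

# boto3 clients are thread-safe, so one client can serve the whole pool.
with ThreadPoolExecutor(max_workers=32) as pool:
    total = sum(pool.map(fetch, keys))

print(f"Fetched {total} bytes across {len(keys)} objects")
```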

Resiliency

Cluster-wide self-healing capabilities built on Amazon Dynamo principles. Replication and peer synchronization make multi-datacenter configurations resilient.

Interoperability

The Basho partner network offers preconfigured cloud infrastructure solutions. Integrates with Apache Spark. A bring-your-own-hardware (BYOH) approach supports your choice of public or private infrastructure.

Ease of Use

Expertise in distributed systems helps when harnessing extreme scale. Straightforward to bootstrap but advanced tuning requires skills.

Tiering

Bucket properties define storage classes across hot, warm and cold media types – SSDs, HDDs etc. Rules reactively balance cost, performance and data protection levels.

Security

Server-side or client-side encryption via AES-256, plus SSL/TLS, protects data and communications. Bucket lifecycle expiration improves compliance.

If your architecture calls for keeping countless objects perpetually accessible to distributed processing, Riak S2 offers battle-hardened blueprints proven at unmatched magnitudes.

5. Triton – Joyent's Software-Defined Object Store

A pioneer in container orchestration known for Node.js language stewardship, Joyent created the Triton Elastic Object Store as a secure, performant and S3-compatible storage backend for modern cloud native apps.

Scalability

While not yet benchmarked at petabyte magnitudes seen by Ceph or Riak, Triton scales incrementally with consistent throughput and latency by adding nodes.

Performance

Response times under 50 milliseconds for typical IO profiles. Triton also uniquely offers hardware accelerated erasure coding using GPUs or FPGAs for blazing rebuild rates.

Resiliency

Triton makes erasure coding a first-class citizen, ensuring bit rot protection, timely rebuilds upon disk failures and room for hot spares, all while reducing capacity overhead.

Interoperability

Docker registry and Kubernetes (including CSI driver) exemplify Triton's cloud native DNA. Integrates with popular DevOps tools like Terraform, Ansible, Grafana, Prometheus, and Logstash.

Ease of Use

Admins utilize CLI tools for snapshot management, bucket policies and access controls. Triton Cloud Analytics provides usage insights and statistics.

Tiering

Automatic replication across data centers gives admins control over data gravity. Storage node allocation optimized based on workload profiles.

Security

Encrypts data in flight and at rest. Integrates with enterprise identity sources including LDAP. Supports PCI and HIPAA workloads.
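
If you want to hold the encryption keys yourself no matter the backend, client-side encryption layers neatly on top of any S3-compatible store. A hedged sketch using the third-party cryptography package; the endpoint, bucket and key handling are placeholders, not Triton-specific machinery:

```python
import boto3
from cryptography.fernet import Fernet

key = Fernet.generate_key()  # in production, keep this in your own KMS/secret store
fernet = Fernet(key)

s3 = boto3.client(
    "s3",
    endpoint_url="https://triton.internal.example.com",  # placeholder endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

plaintext = b"patient-record-123"
s3.put_object(Bucket="phi-vault", Key="records/123", Body=fernet.encrypt(plaintext))

# The store only ever sees ciphertext; decryption happens client-side.
ciphertext = s3.get_object(Bucket="phi-vault", Key="records/123")["Body"].read()
assert fernet.decrypt(ciphertext) == plaintext
```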

If your priorities center on airtight data integrity guarantees plus feeding distributed cloud native applications, Triton warrants your consideration.

6. LeoFS – Lightning-Fast Object Storage

Originating from Japanese web scale companies building CDNs, LeoFS takes a minimalist approach while reaching unmatched throughput benchmarks. Designed for flash storage and GPU/FPGA rich nodes, its lightweight architecture simplifies cluster management enormously.

Scalability

LeoFS divides objects into small blocks, then fragments them across drives for parallel IO. Scaling past 1,000 nodes has been proven at customer sites, and roadmaps take it higher!

Performance

By optimizing memory/CPU efficiency, LeoFS delivered an astounding 457 GB/sec aggregate throughput with just 84 servers – putting it in a class of its own!

Resiliency

Extremely short rebuild times thanks to tiny fragments plus intelligent replica layout. 99.9999999% durability by design.

Interoperability

REST and Erlang client APIs promote integration across apps and tools written in popular languages. The BYOH approach supports your choice of infrastructure.

Ease of Use

Admins appreciate the simplicity. Configuration uses an easy-to-understand, INI-like syntax. Dev friendly!

Tiering

Ring partitioning groups hot/cold object clusters based on access frequency analysis. Automated rebalancing.

Security

API security integrates with AWS Signature Version 4. Supports tenant-based access controls and encrypted communications.
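
In practice that means any stock S3 client works once Signature Version 4 is pinned. A minimal sketch, with the LeoFS gateway endpoint and credentials as placeholders:

```python
import boto3
from botocore.client import Config

s3 = boto3.client(
    "s3",
    endpoint_url="http://leofs.internal.example.com:8080",  # placeholder gateway
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
    config=Config(signature_version="s3v4"),  # explicitly select SigV4
)

print(s3.list_buckets()["Buckets"])
```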

Need for speed at web scale? LeoFS's astonishing numbers reveal a lightweight contender punching far above its weight class while simplifying cluster operations.

7. Cloudian HyperStore – Turnkey Enterprise Scale

Finally, what about prepackaged solutions that take the hassle out of hardware planning? Purpose-built for S3 object workloads up to multi-petabyte levels, Cloudian offers turnkey appliances with all-inclusive licensing.

Scalability

With dozens of petabyte-scale installations, Cloudian specializes in scaling capacity without downtime by consolidating silos spread across disk and tape onto one platform.

Performance

Patented caching optimizes response times, and adaptive workload compression reduces overhead. Performance scales in tandem with capacity, with SSD tiers available for demanding workloads.

Resiliency

Data durability and availability increase through erasure coding schemes tailored to required levels. Rebuilds complete quickly without admin involvement.

Interoperability

In addition to S3 support, HyperStore uniquely offers NFS/SMB protocols to natively integrate with legacy apps. Appliance model eases procurement.

Ease of Use

Cloudian engineers pre-install, optimize and support the platform end-to-end so your team focuses on data – not infrastructure plumbing.

Tiering

In-place conversions shift data between replication types, and policies automate transitions from performant SSDs down to colder HDDs over time.

Security

It starts safe with AES-256 encryption. Extend protections using WORM data retention and blockchain-style techniques that prove data wasn't altered.
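
On stores that support S3 Object Lock, WORM retention is driven through the standard API calls. A hedged sketch, with the endpoint, bucket and retention date as placeholders rather than Cloudian-specific settings:

```python
from datetime import datetime, timezone

import boto3

s3 = boto3.client(
    "s3",
    endpoint_url="https://hyperstore.internal.example.com",  # placeholder endpoint
    aws_access_key_id="ACCESS_KEY",
    aws_secret_access_key="SECRET_KEY",
)

# Object Lock must be enabled at bucket creation time.
s3.create_bucket(Bucket="audit-logs", ObjectLockEnabledForBucket=True)

s3.put_object(Bucket="audit-logs", Key="2024/ledger.log", Body=b"...")

# COMPLIANCE mode: nobody, not even admins, can delete before the date.
s3.put_object_retention(
    Bucket="audit-logs",
    Key="2024/ledger.log",
    Retention={
        "Mode": "COMPLIANCE",
        "RetainUntilDate": datetime(2031, 1, 1, tzinfo=timezone.utc),
    },
)
```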

Seeking a proven object store without the hardware planning headaches? Let Cloudian HyperStore's large-scale blueprints guide you into exabyte territory!

Recommendations – Finding the Right Fit

I evaluated these self-hosted object store systems across criteria like scale, security and ecosystem maturity so you can zero in on ideal candidates matching your budget and skills. A few closing thoughts:

  • If lightning speed trumps all for high performance apps, shortlist LeoFS and MinIO.
  • If global namespace control across regions and tiers ranks highest, consider Zenko.
  • If massive consolidation or legacy app integration calls, explore Cloudian HyperStore and Ceph.
  • For bulletproof geo-distributed architecture with extreme replication, Riak enters the chat.

Of course, requirements differ across organizations and workload mixes based on growth, compliance and data gravity priorities. I aimed to provide comprehensive analysis that arms you to pick the shortlist most deserving of PoCs. Storage challenges demand bespoke solutions!

Torn between public cloud sticker shock and trepidation about managing object storage at scale on-prem? These modern platforms bridge the best of both worlds when configured thoughtfully.

Hopefully this tour demystified self-hosted object storage as a viable alternative to AWS S3, Azure Blob and Google Cloud Storage – stay empowered as the data onslaught marches on! Questions welcome.
