Simplifying Shared File Storage for your AWS Environment with Elastic File System

Dealing with user file shares, content repositories, and config files that need to be accessed across multiple application servers can be a chore.

Traditionally, teams would have to build out complex networked storage with NFS and NAS devices to enable these shared storage workloads.

The AWS Elastic File System (EFS) service aims to massively simplify setting up fully managed scalable file storage for use with cloud resources like EC2 instances.

In this detailed guide, you'll gain expert insights on how EFS works under the hood and all the steps needed to get shared file storage up and running for your fleet of Linux application servers.

Overview of Elastic File System Service

At a high level, EFS provides a serverless network file system that can be simultaneously mounted to thousands of EC2 instances in order to share files and data.

Some key benefits over building your own NFS include:

Fully Managed Service

EFS abstracts away all the file system management, capacity planning, scaling, patching, redundancy and availability concerns behind a simple-to-use service.

High Availability

Data in EFS is stored across multiple availability zones automatically to ensure high durability. The service is designed for 99.999999999% durability.

Fully Scalable

You can grow your EFS filesystems to petabyte scale automatically without any disruption. Performance and throughput capabilities scale seamlessly as you add more data.

Cost Effective

You only pay for what you use with EFS thanks to its elastic usage model, so there is no need to provision excess storage upfront. This can save over 62% compared with self-managed NAS, depending on the workload.

Encrypted

EFS enables encryption at rest and encryption of data in transit for added security, meeting compliance requirements.

Use Case Examples

EFS works great for:

  • Serving shared storage across container clusters
  • Web server content repositories accessed by an Auto Scaling group
  • ML model stores needed by distributed training jobs
  • Shared configuration files, logs, data lakes

Below we walk through the full process of creating your EFS filesystems and connecting them to EC2 Linux instances as mount points.

Prerequisites

Before getting started with EFS, you should have:

  • AWS account
  • EC2 instances deployed
  • VPC networks and Security Groups built out
  • IAM permissions to create EFS & EC2 resources

Now let's dive in and walk through the step-by-step process…

Step 1 – Creating your EFS File Systems

First, log in to the AWS console and navigate to the EFS section. Click "Create File System" to begin.

Choosing Performance Mode

Choose from:

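  • General Purpose – best for most use cases. Offers the lowest per-operation latency, which suits latency-sensitive workloads such as web serving and content management.
  • Max I/O – higher aggregate throughput and IOPS at the cost of slightly higher latency per operation. Good for highly parallel workloads spread across many instances.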

Throughput Modes

EFS offers several throughput modes. The two covered here are:

Bursting

  • Throughput scales automatically with the size of the file system
  • 1 TiB of data = 50 MiB/s baseline, with bursts up to 100 MiB/s
  • Simple, no monitoring or provisioning needed
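
The sizing rule above can be turned into quick arithmetic. The sketch below assumes the rule as stated in this guide (50 MiB/s baseline and 100 MiB/s burst per TiB stored) plus a 100 MiB/s burst floor for file systems under 1 TiB, which is an assumption based on how AWS has described bursting mode. The function name efs_burst_estimate is hypothetical, and the current EFS quotas should be checked before relying on the numbers.

```shell
# Estimate bursting-mode throughput from the amount of data stored (in GiB).
# Assumed rule: 50 MiB/s baseline and 100 MiB/s burst per TiB,
# with a 100 MiB/s burst floor for file systems under 1 TiB.
efs_burst_estimate() {
  awk -v gib="$1" 'BEGIN {
    tib = gib / 1024
    baseline = 50 * tib
    burst = 100 * tib
    if (burst < 100) burst = 100
    printf "baseline=%.1f MiB/s burst=%.1f MiB/s\n", baseline, burst
  }'
}
```

For example, efs_burst_estimate 256 shows that a 256 GiB file system has a fairly small baseline even though it can still burst, which is often the first hint that provisioned throughput is worth considering.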

Provisioned

  • Choose a fixed throughput rate in MiB/s, independent of how much data is stored
  • More predictable performance tuning
  • Use CloudWatch alarms to signal when the provisioned rate needs adjusting

Unless you have defined throughput requirements, bursting is the easiest to work with.

Encryption

You can opt to encrypt the contents of EFS using AES-256 encryption. This provides encryption at rest for added security. It has to be chosen when the file system is created and cannot be switched on later.

Configuring Network Access

Select the VPC, subnets, and security groups that will grant your EC2 servers access to the EFS mount targets.

Once configured, click "Create File System" and your EFS will be ready within minutes.
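
The same setup can also be scripted with the AWS CLI instead of the console. This is a minimal sketch, assuming the CLI is installed and configured with suitable permissions; the function names and the Name tag are illustrative, and the settings mirror the choices discussed above (General Purpose, bursting, encrypted).

```shell
# Create an encrypted, General Purpose, bursting-mode file system
# and print its ID. The argument is used as the Name tag.
create_efs() {
  aws efs create-file-system \
    --performance-mode generalPurpose \
    --throughput-mode bursting \
    --encrypted \
    --tags "Key=Name,Value=$1" \
    --query 'FileSystemId' --output text
}

# Add a mount target in one subnet; repeat once per Availability Zone.
# Arguments: file system ID, subnet ID, security group ID.
create_efs_mount_target() {
  aws efs create-mount-target \
    --file-system-id "$1" \
    --subnet-id "$2" \
    --security-groups "$3"
}
```

A typical run would call create_efs shared-content first, then create_efs_mount_target once for each subnet that hosts instances needing access.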

Now that your empty EFS filesystem is up and running, we need to connect it to EC2…

Step 2 – Install NFSv4 Client

SSH into your EC2 Linux instances that need access to the shared EFS storage.

Use your distro package manager to install the NFS client tools:

# Debian/Ubuntu
sudo apt install -y nfs-common

# RHEL/CentOS
sudo yum install -y nfs-utils

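Note that the mount command in Step 4 uses the efs mount type, which comes from the amazon-efs-utils package rather than the stock NFS client. On Amazon Linux you can install it with sudo yum install -y amazon-efs-utils; on other distributions it can be built from source from the aws/efs-utils repository on GitHub.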
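
If you manage a mixed fleet, the choice of package manager can be scripted. The sketch below reads the ID field from an os-release file and prints the matching install command; the function name and the list of distro IDs are assumptions for illustration and are not exhaustive.

```shell
# Print the NFS client install command for the distro described by an
# os-release file (defaults to /etc/os-release).
nfs_client_install_cmd() {
  os_release=${1:-/etc/os-release}
  # Source the file in a subshell so its variables do not leak out.
  (
    . "$os_release" || exit 1
    case "$ID" in
      debian|ubuntu)
        echo "sudo apt install -y nfs-common" ;;
      rhel|centos|fedora|amzn|rocky|almalinux)
        echo "sudo yum install -y nfs-utils" ;;
      *)
        echo "unsupported distro: $ID" >&2
        exit 1 ;;
    esac
  )
}
```

Running nfs_client_install_cmd on an instance prints the command rather than executing it, so it can be reviewed first.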

Step 3 – Create Local Mount Point

sudo mkdir -p /efs-mount-point

This creates the local directory where the EFS file system will be mounted.

Step 4 – Mounting EFS File System

Grab the mount command string from the EFS console info panel. It will look similar to:

sudo mount -t efs -o tls fs-12345a67:/ /efs-mount-point

Make sure to replace fs-12345a67 with the actual file system ID shown in the console. The tls option encrypts the NFS traffic in transit.
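
If only the stock NFS client from Step 2 is available, a plain NFSv4.1 mount is an alternative to the efs mount type. The sketch below builds that command from a file system ID and region. The mount options follow the settings AWS commonly recommends for EFS, while the function name and the us-east-1 region in the usage line are placeholders. Unlike the tls option, this form does not encrypt traffic in transit.

```shell
# Print a plain NFSv4.1 mount command for an EFS file system.
# Arguments: file system ID, AWS region, local mount point.
efs_nfs_mount_cmd() {
  fs_id=$1 region=$2 mount_point=$3
  opts="nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport"
  echo "sudo mount -t nfs4 -o $opts $fs_id.efs.$region.amazonaws.com:/ $mount_point"
}
```

For example, efs_nfs_mount_cmd fs-12345a67 us-east-1 /efs-mount-point prints the full command so it can be checked before running it.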

Step 5 – Validate Access

Check that EFS mounted properly:

df -h

You should see the EFS file system listed at its mount location. EFS reports a nominal size of 8.0E (exabytes) because its capacity is elastic.
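
For scripted checks, parsing the kernel's mount table is more reliable than reading df output by eye. The sketch below looks up a mount point and prints its file system type; an EFS mount normally shows up as nfs4. The function name is hypothetical, and the optional second argument exists only so the lookup can be pointed at a different mounts file.

```shell
# Print the file system type mounted at a path, reading a mounts table
# (defaults to /proc/mounts). Returns non-zero if nothing is mounted there.
mount_fstype() {
  awk -v mp="$1" '
    $2 == mp { fstype = $3; found = 1 }
    END { if (found) print fstype; else exit 1 }
  ' "${2:-/proc/mounts}"
}
```

A health check could then run mount_fstype /efs-mount-point and alert whenever the command fails or prints something other than nfs4.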

Try writing a test file to the mount point to check read/write access:

echo 'EFS mount working!' | sudo tee /efs-mount-point/test.txt
cat /efs-mount-point/test.txt

Hooray! Our EC2 instance can now access the network storage from EFS.

Repeat these steps on any other instances that need to share this file system.
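
A mount made by hand does not survive a reboot. One common way to make it persistent is an /etc/fstab entry. The sketch below prints such a line for the EFS mount helper, so it assumes the amazon-efs-utils package is installed; the function name is hypothetical.

```shell
# Print an /etc/fstab line that remounts an EFS file system at boot
# through the EFS mount helper, with TLS enabled.
# Arguments: file system ID, local mount point.
efs_fstab_line() {
  printf '%s:/ %s efs _netdev,noresvport,tls 0 0\n' "$1" "$2"
}
```

To apply it, append the output with efs_fstab_line fs-12345a67 /efs-mount-point | sudo tee -a /etc/fstab and then run sudo mount -a to confirm the entry works before the next reboot.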

Next we'll explore some best practices…

Auto-Scaling Capacity

A huge benefit of EFS is the ability to scale storage capacity on demand without any disruption to applications using the mount.

Some metrics worth monitoring in CloudWatch include:

PercentIOLimit

Tracks how close your file system is to the I/O limit of the General Purpose performance mode. Values that stay near 100% suggest the workload may be better suited to Max I/O.

BurstCreditBalance

For file systems using bursting throughput mode, this tracks the credits available for burst capacity. A balance that keeps trending toward zero indicates a need for more throughput, for example by moving to provisioned mode.

Storage capacity grows and shrinks automatically to meet application demands, and in bursting mode throughput grows along with it, all without taking any file system offline.
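
These metrics can be pulled from the command line as well as viewed in the console. The sketch below fetches the hourly minimum BurstCreditBalance over the last 24 hours; it assumes a configured AWS CLI and GNU date for the relative start time, and the function name is illustrative.

```shell
# Fetch the hourly minimum BurstCreditBalance for the last 24 hours.
# Argument: file system ID. Requires GNU date for '24 hours ago'.
efs_burst_credit_min() {
  aws cloudwatch get-metric-statistics \
    --namespace AWS/EFS \
    --metric-name BurstCreditBalance \
    --dimensions "Name=FileSystemId,Value=$1" \
    --start-time "$(date -u -d '24 hours ago' +%Y-%m-%dT%H:%M:%SZ)" \
    --end-time "$(date -u +%Y-%m-%dT%H:%M:%SZ)" \
    --period 3600 \
    --statistics Minimum
}
```

Swapping the metric name for PercentIOLimit (typically with the Maximum statistic) gives the equivalent view of the I/O limit.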

Data Lifecycle Management

As data grows within your EFS filesystem, you may wish to optimize costs by tiering storage classes based on access patterns.

EFS offers:

  • EFS Standard – The default storage class, for frequently accessed data
  • EFS Infrequent Access (EFS IA) – A lower cost storage tier for files not accessed often

With EFS lifecycle management policies, you can automatically transition files to IA once they have not been accessed for a set period of time, saving up to 85% in storage costs for those files.

Some use cases for EFS IA:

  • Old log files after 30 days
  • Project workspaces not accessed after 90 days
  • Infrequent batch job directories
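
A lifecycle policy like the 30-day and 90-day examples above can be applied with the AWS CLI. This is a sketch under the assumption that the CLI is configured; the function name is hypothetical, and the case statement accepts only a subset of the periods the API supports.

```shell
# Move files to Infrequent Access after N days without access.
# Arguments: file system ID, number of days (7, 14, 30, 60 or 90 here).
efs_set_ia_transition() {
  fs_id=$1 days=$2
  case "$days" in
    7|14|30|60|90) policy="AFTER_${days}_DAYS" ;;
    *) echo "unsupported period: $days" >&2; return 1 ;;
  esac
  aws efs put-lifecycle-configuration \
    --file-system-id "$fs_id" \
    --lifecycle-policies "TransitionToIA=$policy"
}
```

For the old log files case, efs_set_ia_transition fs-12345a67 30 would do. The policy applies to the whole file system rather than to individual directories, so pick the period that fits the bulk of your data.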

Performance Optimization

While EFS scales elastically, you can tune performance using provisioned throughput mode for production workloads.

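It lets you set a fixed throughput rate in MiB/s that is available regardless of how much data is stored, even under heavy I/O.

EFS traffic flows over the network rather than through EBS, so choose EC2 instance types with enough network bandwidth to match the throughput you provision.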

Good monitoring metrics like PermittedThroughput and BurstCreditBalance can indicate when to scale up throughput limits.
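
Switching an existing file system to provisioned throughput is a single API call. The sketch below wraps it with the AWS CLI; the function name is illustrative and the 128 MiB/s figure in the usage line is a placeholder rather than a recommendation.

```shell
# Switch a file system to provisioned throughput at a fixed rate.
# Arguments: file system ID, throughput in MiB/s.
efs_set_provisioned() {
  aws efs update-file-system \
    --file-system-id "$1" \
    --throughput-mode provisioned \
    --provisioned-throughput-in-mibps "$2"
}
```

For example, efs_set_provisioned fs-12345a67 128. AWS limits how often the throughput mode can be changed or the provisioned rate lowered, so it is worth settling on a figure from your metrics before making the call.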

Security Best Practices

Since EFS contains shared data accessed by multiple EC2 instances, it's good to implement some basic security hygiene.

  1. Enable encryption at rest using AES-256.

  2. Ensure only required VPC subnets and security groups grant access to EFS from EC2.

  3. Utilize IAM roles and policies to enforce least privilege permissions for EC2 instances.

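Mount targets are only reachable from inside your VPC or from networks connected to it, so NFS traffic does not cross the public internet. For even more isolation, you can create VPC interface endpoints so that EFS API calls stay off the public internet too.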
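
The network restriction in point 2 usually comes down to one security group rule: the mount target's group accepts NFS only from the instances' group. A minimal sketch with the AWS CLI follows; the function name and the two group IDs in the usage line are placeholders.

```shell
# Allow NFS (TCP 2049) into the mount target security group, but only
# from members of the instance security group.
# Arguments: mount target security group ID, instance security group ID.
efs_allow_nfs() {
  efs_sg=$1 instance_sg=$2
  aws ec2 authorize-security-group-ingress \
    --group-id "$efs_sg" \
    --protocol tcp \
    --port 2049 \
    --source-group "$instance_sg"
}
```

For example, efs_allow_nfs sg-0aaa1111 sg-0bbb2222. Referencing a source group instead of a CIDR range means new instances gain access simply by joining the instance group.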

Now let's look at some more advanced ways to utilize EFS…

Advanced Integration Patterns

While this guide focused on basic EFS usage with EC2, you can leverage EFS in powerful ways across many AWS services:

Shared Storage for ECS/EKS Containers

EFS mounts can provide persistent shared storage volumes to Docker containers running on ECS and EKS.

Serving Data Lakes

Applications like Spark, Hadoop, Presto, etc can leverage EFS instead of standalone HDFS.

Shared Training Data

Machine learning training jobs from SageMaker or batch processing can access common datasets on EFS.

Lambda Function Storage

EFS file systems can provide shared scratch space for Lambda functions that are connected to your VPC.

Cross-Region Replication

Replicate your EFS file systems to another region for disaster recovery using EFS-to-EFS replication.
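
Replication can be set up from the AWS CLI. The sketch below is a minimal example that assumes a configured CLI and lets EFS create the destination file system in the target region; the function name and the us-west-2 region in the usage line are placeholders.

```shell
# Start replicating a file system to another region.
# Arguments: source file system ID, destination region.
efs_replicate_to_region() {
  aws efs create-replication-configuration \
    --source-file-system-id "$1" \
    --destinations "Region=$2"
}
```

For example, efs_replicate_to_region fs-12345a67 us-west-2. The replica is read-only while replication is active, so a failover plan needs a step for promoting it.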

Cost Optimization

With the auto-scaling capabilities of EFS, it's easy to end up with "runaway" costs from consuming far more storage or throughput than required.

Here are tips to keep costs in check:

  • Analyze CloudWatch metrics over time to right-size your provisioned throughput up or down. There is no reason to overprovision throughput without analyzing usage patterns.

  • As data ages, leverage Lifecycle Management to transition to Infrequent Access storage class for cost savings.

  • For workloads with only occasional spikes, provisioned mode may cost more than staying in bursting mode, which has no separate throughput charge.

Perform cost modeling with your typical usage numbers and storage capacity needs.
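
A rough storage cost model needs only a few lines. The sketch below multiplies the GiB held in each storage class by a per-GiB monthly price. The prices are inputs because they vary by region and over time, and the figures in the usage line are illustrative placeholders rather than quoted AWS prices. The model ignores throughput charges and the per-access fees of the IA class.

```shell
# Estimate monthly storage cost from GiB stored in each class.
# Arguments: Standard GiB, IA GiB, Standard price, IA price (per GiB-month).
efs_monthly_cost() {
  awk -v std="$1" -v ia="$2" -v p_std="$3" -v p_ia="$4" 'BEGIN {
    printf "%.2f\n", std * p_std + ia * p_ia
  }'
}
```

Comparing efs_monthly_cost 1000 0 0.30 0.025 with efs_monthly_cost 100 900 0.30 0.025 shows how much a lifecycle policy might save when 90% of the data is rarely touched.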

Troubleshooting Issues

If you run into mount problems or slow access times, here are some things to check:

Mount Failures

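  • Validate that the mount target's security group allows inbound NFS traffic on TCP port 2049 from the EC2 instances.
  • Check that a mount target exists in the instance's Availability Zone and that subnet routing lines up.
  • Recheck that the file system ID in the mount command matches the file system you created.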
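
A quick way to test the first item is to attempt a TCP connection to port 2049 from the instance. The sketch below does this with bash's built-in /dev/tcp redirection, so it needs bash and the timeout utility but no extra tools; the function name and the DNS name in the usage line are placeholders.

```shell
# Return 0 if a TCP connection to host:port succeeds within 5 seconds.
# Arguments: host, optional port (defaults to 2049, the NFS port).
efs_port_open() {
  host=$1 port=${2:-2049}
  [ -n "$host" ] || { echo "usage: efs_port_open HOST [PORT]" >&2; return 2; }
  # $0 and $1 inside the inner script are the host and port passed below.
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$host" "$port" 2>/dev/null
}
```

For example, efs_port_open fs-12345a67.efs.us-east-1.amazonaws.com && echo reachable. A timeout here usually points at security groups or routing, while a DNS failure suggests there is no mount target in the instance's Availability Zone.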

Slow File Access

  • Burstable instance families such as T3 can exhaust their CPU credits, which slows EFS access. Consider moving to a fixed-performance instance type.

Network Dropout

  • Is an unstable VPN connection causing NFS protocol errors?
  • Ensure resilient mount options such as hard, timeo=600, retrans=2 and noresvport are in place (the EFS mount helper applies these by default).

That completes our deep dive on everything you need to know to leverage EFS shared file storage effectively!