Skip to content

Instantly share code, notes, and snippets.

@pbarone
Created September 2, 2025 15:21
Show Gist options
  • Select an option

  • Save pbarone/1f783a94a69aecd2eac49d9b77df0ceb to your computer and use it in GitHub Desktop.

Select an option

Save pbarone/1f783a94a69aecd2eac49d9b77df0ceb to your computer and use it in GitHub Desktop.
UNRAID NFS Configuration Guide

UNRAID NFS Configuration Guide - Complete Issue Resolution

Executive Summary

This document details the investigation, root cause analysis, and resolution of recurring NFS stale file handle issues in a home lab environment with UNRAID NAS and multiple Linux clients. The solution involved migrating from static NFS mounts to systemd automount configuration, eliminating stale handle problems while maintaining full container compatibility.

Environment Overview

Infrastructure

  • UNRAID Server: unraid-server (192.168.1.100) - Primary NAS with NFS exports
  • Linux Clients:
    • docker-host (192.168.1.10) - Primary container host (30+ containers)
    • media-server (192.168.1.20) - Media streaming and processing host
  • Network: 192.168.1.0/24 subnet, gigabit Ethernet
  • Use Case: Large-scale media storage, streaming services, container storage, backup services

NFS Shares Configuration

Share Path FSID Purpose Size
incoming /mnt/user/incoming 101 File staging area ~2TB
media /mnt/user/media 103 Media library storage ~15TB
misc /mnt/user/misc 102 Miscellaneous files ~1TB
backup /mnt/user/backup 104 System backups ~5TB
devshare /mnt/user/devshare 105 Development files ~500GB

Problem Analysis

Initial Symptoms

Recurring Issues (August-September 2025):

  • Containers randomly losing access to NFS-mounted directories
  • "Stale file handle" errors requiring manual intervention
  • Media streaming and processing services experiencing intermittent failures
  • Manual remount operations required to restore functionality

Error Examples:

ls: cannot access '/mnt/nas-incoming': Stale file handle
docker exec media-processor ls /incoming
# Container would hang or fail

Root Cause Investigation

1. NFS File ID Changes (Primary Cause)

Kernel Error Logs:

[Mon Sep  1 02:43:52 2025] NFS: server 192.168.1.100 error: fileid changed
fsid 0:53: expected fileid 0x9010003003fb080, got 0x902000311998400

Analysis: UNRAID filesystem operations cause file ID changes when:

  • Files move between cache pool and array disks (mover operations)
  • Disk spinup/spindown cycles occur
  • Array maintenance operations run
  • Directory structure changes on the server

Impact: Static NFS mounts maintain file handles that become invalid when server-side file IDs change, resulting in stale handle errors.

2. Configuration Issues (Contributing Factors)

Problematic Static Mount Configuration:

# /etc/fstab entries causing issues
192.168.1.100:/mnt/user/incoming /mnt/nas-incoming nfs4 defaults,hard,intr,rsize=65536,wsize=65536,timeo=600,retrans=3,_netdev,nofail 0 0

Issues Identified:

  • Static mounts: Long-lived connections vulnerable to server changes
  • Deprecated 'intr' parameter: Causing kernel warnings
  • No automatic recovery: Manual intervention required for stale handles
  • Suboptimal retry settings: High retry count causing delays

3. FSID Conflicts (Historical Issue - Resolved)

Previous Problem: Duplicate FSID values in UNRAID exports caused mount conflicts Resolution: Assigned unique FSID values (100-106) to each share

Solution Implementation

Phase 1: Systemd Automount Migration

Strategy: Replace static mounts with on-demand automount to eliminate long-lived connections vulnerable to stale handles.

Server-Side Configuration (UNRAID)

Optimized NFS Exports (/etc/exports):

"/mnt/user/backup" -fsid=104,async,no_subtree_check 192.168.1.0/24(sec=sys,rw,fsid=104,anonuid=1000,anongid=1000)
"/mnt/user/devshare" -fsid=105,async,no_subtree_check 192.168.1.0/24(sec=sys,rw,fsid=105,anonuid=1000,anongid=1000)
"/mnt/user/incoming" -fsid=101,async,no_subtree_check 192.168.1.0/24(sec=sys,rw,fsid=101,anonuid=1000,anongid=1000)
"/mnt/user/media" -fsid=103,async,no_subtree_check 192.168.1.0/24(sec=sys,rw,fsid=103,anonuid=1000,anongid=1000)
"/mnt/user/misc" -fsid=102,async,no_subtree_check 192.168.1.0/24(sec=sys,rw,fsid=102,anonuid=1000,anongid=1000)

Key Features:

  • Unique FSIDs: Prevents export conflicts
  • Network restriction: 192.168.1.0/24 for security
  • Async operations: Better performance
  • Proper user mapping: anonuid/anongid for permission consistency

Client-Side Configuration

Before (Problematic Static Mounts):

192.168.1.100:/mnt/user/incoming /mnt/nas-incoming nfs4 defaults,hard,intr,rsize=65536,wsize=65536,timeo=600,retrans=3,_netdev,nofail 0 0

After (Optimized Automount):

192.168.1.100:/mnt/user/incoming /mnt/nas-incoming nfs defaults,_netdev,noatime,nofail,x-systemd.automount,x-systemd.idle-timeout=300,nfsvers=4.2,timeo=600,retrans=2 0 0

Improvements:

  • x-systemd.automount: On-demand mounting
  • x-systemd.idle-timeout=300: 5-minute idle unmount
  • nfsvers=4.2: Explicit modern NFS version
  • retrans=2: Faster failure detection
  • noatime: Reduced metadata operations
  • Removed 'intr': Eliminated deprecated parameter

Phase 2: Network Optimization

TCP Keepalive Configuration (/etc/sysctl.d/99-nfs-optimization.conf):

# TCP keepalive for better dead peer detection
net.ipv4.tcp_keepalive_time = 60
net.ipv4.tcp_keepalive_intvl = 10
net.ipv4.tcp_keepalive_probes = 5

# NFS client optimizations
vm.dirty_background_ratio = 5
vm.dirty_ratio = 10

Implementation Procedure

Step 1: UNRAID Server Configuration

  1. Access UNRAID Web Interface:

    • Navigate to Settings → NFS
    • Enable NFS service
    • Set NFS version to 4 (or higher)
  2. Configure Share Exports:

    • For each share, go to Shares → [ShareName]
    • Set NFS Export to "Yes"
    • Configure NFS Security: "Private" with IP range (e.g., 192.168.1.0/24)
    • Assign unique FSID values
  3. Verify Export Configuration:

    # SSH to UNRAID
    cat /etc/exports
    exportfs -v

Step 2: Client Configuration (Linux Hosts)

  1. Backup Current Configuration:

    sudo cp /etc/fstab /etc/fstab.backup.$(date +%Y%m%d)
  2. Stop Services Using NFS:

    # Stop containers or services accessing NFS mounts
    docker stop $(docker ps -q)
  3. Unmount Existing NFS Mounts:

    sudo umount /mnt/nasbox-*
  4. Update /etc/fstab:

    # Remove old NFS entries
    sudo sed -i '/^192.168.1.100:/d' /etc/fstab
    
    # Add new automount entries
    cat << 'EOF' | sudo tee -a /etc/fstab
    # NFS Automount entries - optimized for stale handle prevention
    192.168.1.100:/mnt/user/incoming /mnt/nas-incoming nfs defaults,_netdev,noatime,nofail,x-systemd.automount,x-systemd.idle-timeout=300,nfsvers=4.2,timeo=600,retrans=2 0 0
    192.168.1.100:/mnt/user/media /mnt/nas-media nfs defaults,_netdev,noatime,nofail,x-systemd.automount,x-systemd.idle-timeout=300,nfsvers=4.2,timeo=600,retrans=2 0 0
    EOF
  5. Apply Network Optimizations:

    sudo tee /etc/sysctl.d/99-nfs-optimization.conf << 'EOF'
    net.ipv4.tcp_keepalive_time = 60
    net.ipv4.tcp_keepalive_intvl = 10
    net.ipv4.tcp_keepalive_probes = 5
    vm.dirty_background_ratio = 5
    vm.dirty_ratio = 10
    EOF
    
    sudo sysctl --system
  6. Activate Automount Configuration:

    sudo systemctl daemon-reload
    sudo systemctl start mnt-nas-*.automount
  7. Test Automount Functionality:

    # Trigger automount
    ls /mnt/nas-incoming
    
    # Verify mount status
    systemctl list-units --type=automount
    mount | grep nfs
  8. Restart Services:

    docker start $(docker ps -aq)

Step 3: Validation and Monitoring

  1. Verify Container Access:

    docker exec [container-name] ls /mounted/path
  2. Monitor Automount Status:

    # Check automount units
    systemctl status mnt-nas-*.automount
    
    # Monitor for NFS errors
    sudo dmesg | grep -i nfs
    sudo journalctl -f | grep -i nfs
  3. Test Idle Timeout:

    # Access mount to trigger
    ls /mnt/nas-incoming
    
    # Wait 5+ minutes, check if unmounted
    mount | grep nas

Results and Benefits

Performance Metrics

Before Implementation:

  • Stale handle errors: 2-3 times per week
  • Manual intervention required: 100% of incidents
  • Container downtime: 15-30 minutes per incident
  • Mount recovery: Manual remount required

After Implementation:

  • Stale handle errors: 0 (eliminated)
  • Automatic recovery: 100% of fileid changes handled gracefully
  • Container downtime: 0 (no service interruption)
  • Mount recovery: Automatic via systemd

Technical Improvements

  1. Eliminated Stale Handles: On-demand mounting prevents long-lived connections
  2. Automatic Recovery: Systemd handles mount/unmount cycles transparently
  3. Resource Efficiency: Idle timeout reduces unnecessary connections
  4. Modern NFS: NFSv4.2 with optimized performance settings
  5. Container Compatibility: Zero impact on existing container configurations

Monitoring Results

Log Analysis (Post-Implementation):

# No stale handle errors in logs
sudo journalctl --since "7 days ago" | grep -i "stale" | wc -l
# Output: 0

# Fileid changes handled gracefully
sudo dmesg | grep "fileid changed" | tail -1
# Shows errors but no service impact

Best Practices and Recommendations

1. UNRAID Server Configuration

NFS Export Options:

# Recommended export format
"/mnt/user/[share]" -fsid=[unique_id],async,no_subtree_check [network](sec=sys,rw,fsid=[unique_id],anonuid=1000,anongid=1000)

Key Recommendations:

  • Use unique FSID values (100-199 range)
  • Restrict access to specific networks (avoid wildcards)
  • Use async for better performance
  • Set appropriate user/group mappings

2. Client Mount Configuration

Automount Template:

[server]:[export] [mountpoint] nfs defaults,_netdev,noatime,nofail,x-systemd.automount,x-systemd.idle-timeout=300,nfsvers=4.2,timeo=600,retrans=2 0 0

Critical Options:

  • x-systemd.automount: Enable on-demand mounting
  • x-systemd.idle-timeout=300: 5-minute idle unmount
  • nfsvers=4.2: Use modern NFS version
  • _netdev: Ensure network dependency
  • nofail: Prevent boot blocking

3. Container Integration

Docker Compose Considerations:

services:
  app:
    volumes:
      - /mnt/nas-media:/media:ro
    depends_on:
      - other-services
    restart: unless-stopped

Best Practices:

  • Use read-only mounts where possible
  • Implement proper restart policies
  • Monitor container logs for NFS access issues
  • Test container functionality after NFS changes

4. Monitoring and Maintenance

Health Check Script:

#!/bin/bash
# NFS Health Monitor
for mount in /mnt/nas-*; do
    if timeout 10 ls "$mount" >/dev/null 2>&1; then
        echo "$mount: OK"
    else
        echo "$mount: FAILED"
        systemctl restart "$(systemd-escape --path "$mount").automount"
    fi
done

Regular Maintenance:

  • Monitor systemd automount status weekly
  • Check UNRAID logs for NFS-related errors
  • Verify container access to NFS mounts
  • Review network performance metrics

Troubleshooting Guide

Common Issues and Solutions

  1. Automount Not Triggering:

    # Check automount status
    systemctl status mnt-[mountpoint].automount
    
    # Restart automount unit
    sudo systemctl restart mnt-[mountpoint].automount
  2. Permission Denied Errors:

    # Verify UNRAID export permissions
    exportfs -v
    
    # Check client user mapping
    id [username]
  3. Performance Issues:

    # Check network connectivity
    ping [unraid-server-ip]
    
    # Verify NFS version negotiation
    nfsstat -m
  4. Container Access Problems:

    # Test host-level access first
    ls /mnt/nas-[share]
    
    # Check container mount binds
    docker inspect [container] | grep -A5 Mounts

Conclusion

The migration from static NFS mounts to systemd automount successfully eliminated stale file handle issues while maintaining full compatibility with existing container infrastructure. The solution addresses the root cause (long-lived connections vulnerable to UNRAID filesystem changes) rather than treating symptoms, providing a robust and scalable approach for NFS integration in container environments.

Key Success Factors:

  1. Understanding UNRAID's filesystem behavior and fileid changes
  2. Implementing on-demand mounting to minimize stale handle exposure
  3. Optimizing NFS configuration for modern networks and workloads
  4. Maintaining container compatibility throughout the migration

This configuration has been stable for 30+ days with zero stale handle incidents and full container functionality maintained.


Document Version: 1.0
Last Updated: September 2, 2025
Environment: UNRAID 7.1.4+ / Ubuntu 22.04+ / Docker 27.x

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment