Why do we call file systems a tree when they can have symbolic links?
Last updated: April 1, 2026
Key Facts
- Unix's hierarchical file system was first formally described in the Unix Programmer's Manual, 1st Edition, published in November 1971 by Ken Thompson and Dennis Ritchie at Bell Labs.
- Symbolic links were introduced in BSD Unix version 4.2, released in 1983, approximately 12 years after the original Unix tree file system design was established.
- Linux limits symlink traversal to a maximum of 40 hops (defined as MAXSYMLINKS in the kernel source) before returning an ELOOP error to prevent infinite loop path resolution.
- Modern Ubuntu and Debian systems use UsrMerge, in which the top-level directories /bin, /sbin, /lib, and /lib64 are themselves symbolic links pointing into /usr. The layout was fully adopted in Ubuntu 20.04 (April 2020).
- Windows added full symbolic link support in Windows Vista (released January 2007) via the CreateSymbolicLink API; directory junction points had existed since Windows 2000 (released February 2000).
Overview: The Tree Metaphor and Its Unix Origins
The term tree for file systems comes directly from computer science's tree data structure: a hierarchical arrangement in which each node has exactly one parent (except the root, which has none) and no cycles exist. When Unix introduced its hierarchical file system in the early 1970s, the directory structure genuinely resembled a mathematical tree. It had a single root directory at the top, directories branching outward into subdirectories, and files sitting at the leaves, with no paths looping back. The name was natural and intuitive, and when it was coined it was technically accurate at the directory level.
The problem is that symbolic links (symlinks), introduced in BSD Unix 4.2 in 1983, let a special kind of file point to any other file or directory anywhere in the system. Such links can make a single file or directory appear in several locations at once. They can also create loops, where following a path repeatedly leads back to the same point. Strictly speaking, then, the modern Unix, Linux, and macOS file system is no longer a pure tree. Mathematically, it is a directed graph that may contain cycles. The operating system does not stop you from creating a circular symlink; it only caps how many symlinks it will follow while resolving a single path. The structure is therefore not guaranteed to be a directed acyclic graph (DAG) either.
Yet we still say the file system tree, and this phrasing appears throughout official Linux documentation, university operating systems textbooks, shell tool manpages, and everyday developer conversation. The reason is a combination of historical inertia, the fact that the underlying inode-based structure genuinely is still tree-shaped at the directory level, and the continued usefulness of the tree mental model for navigating, organizing, and reasoning about files in everyday work.
Technical Reality: Trees, DAGs, and Graphs in File System Structure
To fully understand why the terminology is both imprecise and defensible, it helps to distinguish three levels of graph structure that are relevant here:
- Tree: A connected graph with no cycles where every node except the root has exactly one parent. A file system with no hard links or symlinks at all would be a strict tree — each file and directory reachable by exactly one path from root.
- DAG (Directed Acyclic Graph): A graph where edges have direction and no cycles exist, but a node can have multiple parents. Hard links create DAG structure because a single file (identified by its inode number) can have multiple directory entries pointing to it simultaneously in different locations.
- General directed graph: Can have cycles. Symbolic links can easily create them, for example a symlink inside directory A that points back to directory A itself. Operating systems do not forbid creating such links. Instead, they limit how many symlinks a single path lookup may follow, so a lookup cannot loop forever.
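The three categories can be told apart mechanically. Below is a minimal Python sketch, not taken from any file system tool. The helper `classify` is hypothetical and assumes the structure is given as (parent, child) edges between named nodes. It uses Kahn's topological-sort algorithm to detect cycles, then checks how many parents each node has.

```python
from collections import defaultdict


def classify(edges: list[tuple[str, str]]) -> str:
    """Classify a directed graph given as (parent, child) edges.

    Returns "graph" if it contains a cycle, "tree" if it is acyclic with a
    single root and at most one parent per node, and "dag" otherwise.
    """
    children: dict[str, list[str]] = defaultdict(list)
    indegree: dict[str, int] = defaultdict(int)
    nodes: set[str] = set()
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1
        nodes.update((parent, child))

    # Kahn's algorithm: repeatedly remove nodes that have no remaining parents.
    remaining = {n: indegree[n] for n in nodes}
    queue = [n for n in nodes if remaining[n] == 0]
    removed = 0
    while queue:
        node = queue.pop()
        removed += 1
        for child in children[node]:
            remaining[child] -= 1
            if remaining[child] == 0:
                queue.append(child)

    if removed < len(nodes):
        return "graph"  # leftover nodes sit on, or behind, a cycle
    roots = [n for n in nodes if indegree[n] == 0]
    if len(roots) == 1 and all(indegree[n] <= 1 for n in nodes):
        return "tree"
    return "dag"


tree = [("/", "home"), ("home", "alice"), ("alice", "notes.txt")]
# A hard link: notes.txt also appears under /tmp, so it has two parents.
hard_link = tree + [("/", "tmp"), ("tmp", "notes.txt")]
# A symlink inside alice that points back up to home closes a cycle.
symlink_loop = tree + [("alice", "home")]

for label, edges in [("tree", tree), ("hard link", hard_link), ("symlink loop", symlink_loop)]:
    print(f"{label}: {classify(edges)}")
# prints: "tree: tree", "hard link: dag", "symlink loop: graph"
```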
The key insight is that Unix and Linux file systems operate at two distinct structural layers that behave differently:
- The inode layer: The actual on-disk storage structure. Each inode is a data structure, identified by a number, that holds file metadata: permissions, ownership, timestamps, and pointers to data blocks. Excluding symlinks, the directory hierarchy at this level is essentially tree-shaped. Every directory contains the special entries . (the directory itself) and .. (its parent), which the file system manages itself. Apart from those, most systems prohibit hard links to directories precisely to preserve this structure, so that no directory can appear in two places at the inode level.
- The namespace layer: The path-based view of the system that users and programs navigate with commands like cd and ls. Once symbolic links are included in this view, the namespace is no longer a strict tree. The same underlying inode can be reached via multiple paths, symlinks can point anywhere (including outside the current file system), and cycles become possible.
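The two layers can be observed from user space. This minimal Python sketch assumes a POSIX system whose temporary directory supports hard links and symlinks; the helper name `inode_views` is hypothetical. It relies on a difference between two calls. `os.stat()` follows a symlink the way a namespace lookup does, while `os.lstat()` reports on the link file itself.

```python
import os
import tempfile


def inode_views(directory: str) -> dict[str, int]:
    """Create a file, a hard link, and a symlink, and report inode numbers."""
    original = os.path.join(directory, "data.txt")
    hard = os.path.join(directory, "hard.txt")
    soft = os.path.join(directory, "soft.txt")

    with open(original, "w") as f:
        f.write("hello\n")
    os.link(original, hard)       # a second directory entry for the same inode
    os.symlink("data.txt", soft)  # a separate file whose content is a path

    return {
        "original": os.stat(original).st_ino,
        "hard": os.stat(hard).st_ino,           # same inode as the original
        "soft_followed": os.stat(soft).st_ino,  # stat() follows the symlink
        "soft_itself": os.lstat(soft).st_ino,   # lstat() reports the link's own inode
    }


with tempfile.TemporaryDirectory() as d:
    for name, ino in inode_views(d).items():
        print(f"{name:14} inode {ino}")
```

Three different paths lead to one inode, and the symlink itself occupies a fourth entry with its own inode.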
When Dennis Ritchie and Ken Thompson designed Unix at Bell Labs between 1969 and 1971, they chose a deliberately simple hierarchical design, simpler than the file system structures of earlier systems. The original Unix Programmer's Manual (1st Edition, November 1971) describes the file system as a hierarchy with a single root, and the directory structure at that time was a genuine tree. Hard links were present from the very beginning, technically making the structure a DAG at the file level. However, hard links to directories were restricted (in early Unix, only the superuser could create them). That restriction preserved the tree structure at the directory level, which is the structure users actually navigate and reason about.
Symbolic links changed the picture significantly. A symlink is a special file whose entire content is a path string. When the kernel resolves a path and encounters a symlink, it substitutes the symlink's target string and continues resolution from that point. A relative target is interpreted from the directory that contains the link. This mechanism allows directories to appear in multiple locations, enables cross-filesystem references that hard links cannot make, and makes resolution cycles possible. Linux handles cycles by keeping a counter during path resolution that increments with each symlink followed. If a single lookup would need to follow more than 40 symlinks (the value of MAXSYMLINKS), the system call fails with ELOOP, whose error message reads "Too many levels of symbolic links".
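Creating a cycle is allowed; following it is what fails. This Python sketch uses a hypothetical helper, `resolve_loop`, and assumes a POSIX system. It creates a symlink whose target is its own name, then shows `os.stat()` failing with ELOOP once the kernel's limit is reached.

```python
import errno
import os
import tempfile


def resolve_loop(directory: str) -> int:
    """Create a self-referential symlink and return the errno from following it."""
    loop = os.path.join(directory, "loop")
    os.symlink("loop", loop)  # creating the cycle succeeds
    try:
        os.stat(loop)         # following it does not
    except OSError as exc:
        return exc.errno
    return 0                  # not expected on Linux or macOS


with tempfile.TemporaryDirectory() as d:
    code = resolve_loop(d)
    print(errno.errorcode[code], os.strerror(code))
```

The exact message text comes from the C library, so it may differ slightly between systems.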
Common Misconceptions About File System Structure
Misconception 1: The file system is a tree. This statement is technically imprecise for any modern Unix, Linux, macOS, or Windows system in widespread use today. At the namespace level — the paths you actually traverse — the presence of symbolic links makes the file system a directed graph, not a tree. The more accurate statement is: the underlying inode-based directory hierarchy is tree-shaped (or a DAG due to hard links on files), but the full namespace including symbolic links forms a general directed graph. Most documentation and textbooks simplify this to tree because the concept is more useful for everyday file navigation than graph theory, and for most real-world use the tree model holds well enough.
Misconception 2: Symbolic links break or corrupt the file system. Symlinks do not break anything. They are a deliberate, well-supported, and widely useful feature that has been part of Unix since 1983. The kernel bounds path resolution with the hop counter described above. Traversal tools provide explicit options to control whether symlinks are followed or preserved as links:
- find follows them only with -L; the default, -P, does not.
- rsync copies them as links with -l/--links, or copies their targets with -L/--copy-links.
- GNU tar archives the targets instead of the links when given -h/--dereference.

Languages offer similar controls. Python's os.walk, for example, defaults to followlinks=False. The graph nature of the namespace is a known, well-managed property of the system, not a defect.
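The os.walk default can be seen in a small experiment. This minimal Python sketch assumes a POSIX system, and `make_loop_tree`, `walk_default`, `walk_naive`, and `walk_once` are hypothetical helpers. The first builds a directory that contains a symlink back to its parent. `walk_once` shows one common way to follow symlinks safely: it remembers each directory's (st_dev, st_ino) pair and prunes any directory it has already visited.

```python
import os
import tempfile


def make_loop_tree(root: str) -> None:
    """Create root/a and a symlink root/a/back pointing to root (a cycle)."""
    os.mkdir(os.path.join(root, "a"))
    os.symlink("..", os.path.join(root, "a", "back"))


def walk_default(top: str) -> list[str]:
    """followlinks=False: symlinked directories are listed but not entered."""
    return [os.path.relpath(p, top) for p, _dirs, _files in os.walk(top)]


def walk_naive(top: str) -> list[str]:
    """followlinks=True with no guard: re-enters the same directories repeatedly."""
    return [os.path.relpath(p, top) for p, _dirs, _files in os.walk(top, followlinks=True)]


def walk_once(top: str) -> list[str]:
    """Follow directory symlinks, but enter each real directory only once."""
    seen: set[tuple[int, int]] = set()
    visited = []
    for dirpath, dirnames, _files in os.walk(top, followlinks=True):
        st = os.stat(dirpath)  # identity of the directory this path resolves to
        key = (st.st_dev, st.st_ino)
        if key in seen:
            dirnames[:] = []   # prune in place so os.walk does not descend
            continue
        seen.add(key)
        visited.append(os.path.relpath(dirpath, top))
    return visited


with tempfile.TemporaryDirectory() as d:
    make_loop_tree(d)
    print(walk_default(d))
    print(len(walk_naive(d)), "paths yielded without a guard")
    print(walk_once(d))
```

Even without a guard, `walk_naive` does not run forever. It stops only because resolving the ever-longer path a/back/a/back/... eventually fails with ELOOP, and by then it has yielded the same two directories dozens of times.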
Misconception 3: Windows uses a fundamentally different file system structure. Windows NTFS has supported directory junction points since Windows 2000 (released February 2000) and full symbolic links for both files and directories since Windows Vista (released January 2007) via the CreateSymbolicLink Win32 API call. The NTFS directory structure faces exactly the same theoretical graph issues as Unix when symlinks are present, yet Windows documentation also consistently uses the phrase directory tree throughout its official materials. The simplification to tree in naming is a universal convention across all major operating systems, not a quirk specific to the Unix tradition.
Practical Implications for Developers and System Administrators
Understanding that a file system is a graph rather than a pure tree has real, concrete consequences for anyone writing tools or scripts that traverse directory structures:
- Recursive operations can loop infinitely: Scripts that traverse directories recursively without symlink awareness can enter infinite loops when circular symlinks are present. The find command avoids this by default because it does not follow symlinks unless asked to. Python's os.walk(followlinks=False) is the safe default for the same reason. Always decide explicitly whether a traversal should follow symlinks rather than relying on default behavior.
- Disk usage calculations can mislead: Code that follows symlinks while summing file sizes can count the same files, or entire directory trees, more than once. This makes a path appear to consume more storage than it actually does. The du (disk usage) command follows symlinks only when given -L; by default it does not. This distinction matters when auditing disk usage on systems with many symlinks.
- Backup tools must make an explicit choice: Backup utilities must decide whether to follow symlinks (capturing the linked content) or preserve them (backing up the link itself as a link). rsync offers both modes: -l preserves symlinks as symlinks in the backup, while -L follows them and copies the linked content. The wrong choice can produce backups that are either incomplete or bloated with duplicate content.
- Security (symlink attacks and TOCTOU vulnerabilities): Symbolic links can be exploited in time-of-check to time-of-use (TOCTOU) attacks. An attacker replaces a regular file with a symlink between a security check and a subsequent operation on that path. This is particularly dangerous in world-writable directories like /tmp. Secure code avoids re-resolving attacker-controlled paths with two tools. The O_NOFOLLOW open flag makes open() fail if the final path component is a symlink. openat() resolves names relative to an already-open directory file descriptor. Because O_NOFOLLOW checks only the final component, the two are typically used together.
- Modern systems use symlinks at the root level: The Linux UsrMerge initiative, fully adopted in Ubuntu 20.04 and later distributions, makes /bin, /sbin, /lib, and /lib64 symbolic links pointing into /usr. The very top level of a standard Linux installation therefore contains symlinks. Any traversal tool must handle them correctly from the root of the hierarchy, not just in deeper subdirectories.
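The security item above can be sketched in Python, which exposes openat() through the `dir_fd` parameter of `os.open`. This is a minimal illustration, not a complete defense. It assumes Linux or another POSIX system that supports `dir_fd` and `O_NOFOLLOW`. `open_no_follow` is a hypothetical helper, and `name` is assumed to be a single path component, since O_NOFOLLOW checks only the last one. The `/etc/passwd` target is just an example of where a planted link might point.

```python
import errno
import os
import tempfile


def open_no_follow(dir_path: str, name: str) -> int:
    """Open the single path component `name` inside `dir_path`, refusing symlinks.

    The directory is opened once and `name` is looked up relative to that
    descriptor (Python passes dir_fd to openat()). Swapping `dir_path` for a
    symlink after it has been opened therefore cannot redirect the lookup.
    O_NOFOLLOW makes the open fail if `name` itself is a symlink.
    """
    dir_fd = os.open(dir_path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        return os.open(name, os.O_RDONLY | os.O_NOFOLLOW, dir_fd=dir_fd)
    finally:
        os.close(dir_fd)


with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "report.txt"), "w") as f:
        f.write("data\n")
    os.symlink("/etc/passwd", os.path.join(d, "planted"))  # an attacker-style link

    fd = open_no_follow(d, "report.txt")
    print(os.read(fd, 100))
    os.close(fd)

    try:
        open_no_follow(d, "planted")
    except OSError as exc:
        print("refused:", errno.errorcode[exc.errno])
```

Linux reports ELOOP for a refused O_NOFOLLOW open; some BSDs report EMLINK instead.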
The persistence of the tree metaphor is ultimately a lesson in how naming conventions outlive their technical precision once they become embedded in culture, documentation, education, and tooling ecosystems. The tree model is genuinely useful for understanding how to navigate a file system, reasoning about permission inheritance through directory hierarchies, and organizing directory structures for projects. The graph reality matters when you are writing tools that traverse the file system programmatically, auditing security configurations for symlink-based vulnerabilities, or managing complex deployment environments that use symbolic links extensively for version management or configuration abstraction across environments.
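For tools that traverse the file system programmatically, the disk-usage pitfall described earlier can be avoided by counting each inode only once. The following Python sketch is a simplified, hypothetical `disk_usage` helper, not a reimplementation of du. It sums apparent sizes (st_size) rather than allocated blocks, and it assumes a POSIX system.

```python
import os
import stat
import tempfile


def disk_usage(top: str, follow_symlinks: bool = False) -> int:
    """Sum the apparent sizes of regular files under `top`, counting each inode once.

    A set of (st_dev, st_ino) pairs prevents double counting from hard links.
    When follow_symlinks is True, it also prevents double counting from
    symlinks to files or directories already counted, including symlink loops.
    """
    seen: set[tuple[int, int]] = set()
    total = 0
    for dirpath, dirnames, filenames in os.walk(top, followlinks=follow_symlinks):
        dir_st = os.stat(dirpath)
        dir_key = (dir_st.st_dev, dir_st.st_ino)
        if dir_key in seen:
            dirnames[:] = []  # already counted via another path; do not descend
            continue
        seen.add(dir_key)
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path) if follow_symlinks else os.lstat(path)
            except OSError:
                continue  # e.g. a dangling symlink when following links
            key = (st.st_dev, st.st_ino)
            if key in seen or not stat.S_ISREG(st.st_mode):
                continue  # already counted, or not a regular file (e.g. a symlink)
            seen.add(key)
            total += st.st_size
    return total


with tempfile.TemporaryDirectory() as d:
    sub = os.path.join(d, "sub")
    os.mkdir(sub)
    with open(os.path.join(sub, "big.bin"), "wb") as f:
        f.write(b"\0" * 4096)
    os.link(os.path.join(sub, "big.bin"), os.path.join(sub, "big-hardlink.bin"))
    os.symlink("sub", os.path.join(d, "alias"))  # a second path to the same directory
    print(disk_usage(d), disk_usage(d, follow_symlinks=True))
```

In this example a naive sum would count the same 4096 bytes four times if it followed the symlink and ignored inode identity: once per hard link, under both sub and alias.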
Related Questions
What is the difference between a hard link and a symbolic link?
A hard link is a direct directory entry pointing to an inode (the actual on-disk data structure), meaning multiple filenames literally refer to the same physical file — deleting one does not remove the data as long as at least one other hard link exists. A symbolic link is an indirect reference: a special file that contains a path string, and the kernel follows this path when the symlink is accessed during path resolution. Hard links cannot cross file system boundaries or link to directories in most systems, while symbolic links can do both freely. A file with 3 hard links will show a link count of 3 in <code>ls -l</code> output; symlinks are shown as a separate file type indicated by the letter l at the start of the permissions field.
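The link count and file-type letter can also be read from Python. This minimal sketch assumes a POSIX system. `describe` is a hypothetical helper that uses `os.lstat`, so a symlink is reported as itself. It formats the mode with `stat.filemode`, which produces the same mode string format as `ls -l`.

```python
import os
import stat
import tempfile


def describe(path: str) -> tuple[str, int]:
    """Return the ls -l style mode string and hard link count for `path` itself."""
    st = os.lstat(path)  # lstat: report on a symlink, not on its target
    return stat.filemode(st.st_mode), st.st_nlink


with tempfile.TemporaryDirectory() as d:
    first = os.path.join(d, "a.txt")
    with open(first, "w") as f:
        f.write("shared data\n")
    os.link(first, os.path.join(d, "b.txt"))
    os.link(first, os.path.join(d, "c.txt"))
    os.symlink("a.txt", os.path.join(d, "link"))

    print(describe(first))                    # link count 3
    print(describe(os.path.join(d, "link")))  # mode string starts with "l"

    os.remove(first)  # the data survives through b.txt and c.txt
    with open(os.path.join(d, "b.txt")) as f:
        print(f.read(), describe(os.path.join(d, "b.txt"))[1])
    print(os.path.exists(os.path.join(d, "link")))  # False: the symlink now dangles
```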
How does Linux detect and prevent infinite loops from circular symbolic links?
Linux uses a simple counter during path resolution. Every time the kernel follows a symbolic link while resolving a path, it increments the counter. If the count would exceed 40 (the value of MAXSYMLINKS in the kernel source), the system call fails with ELOOP and the message "Too many levels of symbolic links". The counter does not distinguish a genuine loop from a merely long chain: any lookup that needs more than 40 symlinks fails. Even so, 40 is far more than legitimate symlink chains normally need. The counter is scoped to a single path resolution and starts again from zero with each new system call, so it does not penalize unrelated paths. This approach is simpler and cheaper than tracking visited inodes, which would need memory proportional to the number of links followed.
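The limit can be measured empirically. This Python sketch uses a hypothetical helper, `longest_symlink_chain`, and assumes a POSIX system. It builds an ever-longer chain of symlinks and reports the longest one `os.stat()` still resolves. On Linux, given the MAXSYMLINKS value described above, it should report 40; other kernels use different limits.

```python
import errno
import os
import sys
import tempfile


def longest_symlink_chain(directory: str, max_links: int = 100) -> int:
    """Return the longest chain of symlinks that os.stat() still resolves.

    Builds link1 -> target, link2 -> link1, ... and stops at the first length
    that fails with ELOOP. Resolving linkN requires following N symlinks.
    """
    open(os.path.join(directory, "target"), "w").close()
    previous = "target"
    for n in range(1, max_links + 1):
        name = f"link{n}"
        os.symlink(previous, os.path.join(directory, name))
        try:
            os.stat(os.path.join(directory, name))
        except OSError as exc:
            if exc.errno == errno.ELOOP:
                return n - 1
            raise
        previous = name
    return max_links  # no limit hit within max_links


with tempfile.TemporaryDirectory() as d:
    print(sys.platform, longest_symlink_chain(d))
```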
What is an inode in a Unix file system?
An inode (index node) is a data structure stored on disk that contains all metadata about a file except its name and actual content. That metadata includes:
- file type and permissions
- ownership (user and group IDs)
- link count
- file size in bytes
- timestamps for last access, last modification, and last status change (ext4 also records creation time)
- pointers to the data blocks on disk where the file's content resides

File names exist only in directory entries, which map a human-readable name to an inode number. This is exactly why hard links work: multiple names in different locations can map to the identical inode number. In the ext4 file system commonly used in Linux, each inode is 256 bytes by default. The total number of inodes, and therefore the maximum number of files, is fixed when the file system is created, which can cause a disk to run out of inodes before it runs out of raw storage space.
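Both halves of this description are visible from Python on a POSIX system. In the sketch below, two helpers are hypothetical. `name_to_inode` reads the inode numbers stored in directory entries through `os.scandir`'s `DirEntry.inode()`. `inode_usage` reports the file system's total and free inode counts via `os.statvfs`, the same figures `df -i` shows. File systems that allocate inodes dynamically may report zero for both.

```python
import os
import tempfile


def name_to_inode(directory: str) -> dict[str, int]:
    """Map each directory entry's name to the inode number stored in that entry."""
    with os.scandir(directory) as entries:
        return {entry.name: entry.inode() for entry in entries}


def inode_usage(path: str) -> tuple[int, int]:
    """Return (total inodes, free inodes) for the file system containing `path`."""
    fs = os.statvfs(path)
    return fs.f_files, fs.f_ffree


with tempfile.TemporaryDirectory() as d:
    original = os.path.join(d, "a.txt")
    open(original, "w").close()
    os.link(original, os.path.join(d, "b.txt"))  # a second name, same inode
    print(name_to_inode(d))
    total, free = inode_usage(d)
    print(f"{total - free} of {total} inodes in use")
```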
Why can't you create hard links to directories in most file systems?
Hard links to directories are prohibited in most Unix-like systems to prevent cycles in the directory graph. Such cycles would cause the traversal algorithms used by tools like <code>find</code>, <code>du</code>, and backup utilities to loop infinitely without special cycle detection. If directory hard links were permitted, directory A could contain directory B while B contained a hard-linked entry pointing back to A. That would be a genuine cycle at the inode level, which no path-based traversal could safely handle. The POSIX standard explicitly permits implementations to restrict directory hard links. Linux, macOS, and most BSD variants enforce this restriction, failing such attempts with EPERM ("Operation not permitted"). On some legacy Unix systems only the root user could create directory hard links, and even then system documentation strongly discouraged the practice.
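The restriction is easy to observe. This Python sketch uses a hypothetical helper, `try_directory_hard_link`, and assumes a POSIX system. It attempts link() on a directory. On Linux the call fails with EPERM regardless of privileges.

```python
import errno
import os
import sys
import tempfile


def try_directory_hard_link(directory: str) -> int:
    """Try to hard-link a directory; return the errno, or 0 if link() succeeded."""
    source = os.path.join(directory, "a")
    os.mkdir(source)
    try:
        os.link(source, os.path.join(directory, "a_again"))
    except OSError as exc:
        return exc.errno
    return 0


with tempfile.TemporaryDirectory() as d:
    code = try_directory_hard_link(d)
    print(sys.platform, errno.errorcode.get(code, "success"), os.strerror(code))
```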
How do other hierarchical data structures compare to a file system in terms of graph theory?
Once symbolic links are considered, a file system's directory structure is the most complex of several common hierarchical structures in everyday computing. The DNS (Domain Name System) namespace is a strict tree in which each domain name has exactly one parent. However, CNAME records add alias edges that behave much like symbolic links, and resolvers must guard against CNAME loops. XML and HTML documents form strict trees where each element has exactly one parent node. Git's commit history is a DAG: merge commits have multiple parents, but cycles are practically impossible because each commit's hash covers the hashes of its parents. Unlike XML, HTML, and Git history, a Unix file system with symbolic links is a general directed graph that can contain cycles at the namespace level.