File Organisation
File organisation refers to the logical structure of the records within a file and the method used to access them. The choice of organisation depends on the nature of the application and the required access speed.
Common Organisations
Unstructured Sequence of Bytes
The simplest logical structure: the file is an unstructured stream of bytes, and the operating system imposes no internal record structure on it.
- Usage: Standard in Unix and modern Windows for most files; interpretation of the bytes is left entirely to the application.
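A minimal Python sketch of the idea that meaning lives in the application, not the file system: the four bytes below are a hypothetical file's contents, and two different readers interpret them in two different ways. Nothing in the bytes themselves says which reading is correct.

```python
import struct

# Hypothetical file contents: the OS stores and returns only these four bytes.
raw = bytes([0x48, 0x69, 0x00, 0x01])

# One application might read the bytes as Latin-1 text...
as_text = raw.decode("latin-1")

# ...another might read the same bytes as two big-endian unsigned 16-bit integers.
as_uint16 = struct.unpack(">2H", raw)

print(as_text[:2], as_uint16)  # Hi (18537, 1)
```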
Pile
The simplest of the record-based organisations. Records are stored in the order in which they arrive (chronological order).
- Structure: Records can have variable lengths and a variable number of fields.
- Access: With no key and no ordering, retrieving a record requires an exhaustive (linear) search, which is slow for large files.
- Usage: Useful for log files or temporary data collection.
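The pile can be sketched in Python as an append-only list of dictionaries; the field names and records are made up for illustration. Because records share no fixed layout or key order, `find` has no choice but to examine every record.

```python
from typing import Any, Dict, List

# A pile: records are appended in arrival order, with no key and no fixed layout.
pile: List[Dict[str, Any]] = []

def append(record: Dict[str, Any]) -> None:
    """Store a record at the end of the pile (chronological order)."""
    pile.append(record)

def find(field: str, value: Any) -> List[Dict[str, Any]]:
    """Exhaustive search: every record is examined, since nothing is ordered."""
    return [r for r in pile if r.get(field) == value]

# Records may carry different fields and have different lengths.
append({"time": 1, "event": "login", "user": "alice"})
append({"time": 2, "event": "error", "code": 500})
append({"time": 3, "event": "logout", "user": "alice"})

print([r["time"] for r in find("user", "alice")])  # [1, 3]
```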
Sequential File
Records are stored in a fixed format and ordered based on a specific key field.
- Structure: Fixed-length records with a fixed set of fields in a predetermined order.
- Access: Efficient when the whole file is processed in key order. Locating a single record is faster than in a pile (e.g., by binary search on the key) but still requires multiple disk accesses.
- Maintenance: Inserting a record in place would mean shifting every later record, so new records are usually written to an overflow file and merged into the main file during periodic reorganisation.
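A rough sketch of a sequential file, assuming a hypothetical 16-byte record layout (a 4-byte key plus a 12-byte null-padded name) and an in-memory byte buffer standing in for the disk. Fixed-length records make the offset of the i-th record computable, which is what allows binary search; inserts go to an overflow list that a periodic `reorganise` merges back. The class and method names are illustrative, not any particular system's API.

```python
import struct
from typing import List, Optional, Tuple

# Hypothetical fixed layout: 4-byte unsigned key + 12-byte name = 16 bytes.
# struct pads short names with b"\0" and silently truncates names longer than 12 bytes.
RECORD = struct.Struct("<I12s")

class SequentialFile:
    def __init__(self, records: List[Tuple[int, str]]) -> None:
        self.main = self._build(records)            # records stored contiguously in key order
        self.overflow: List[Tuple[int, str]] = []   # unsorted inserts awaiting reorganisation

    @staticmethod
    def _build(records: List[Tuple[int, str]]) -> bytes:
        return b"".join(RECORD.pack(k, n.encode()) for k, n in sorted(records))

    def count(self) -> int:
        return len(self.main) // RECORD.size

    def read(self, i: int) -> Tuple[int, str]:
        # Fixed-length records: the i-th record starts at byte i * RECORD.size.
        key, raw = RECORD.unpack_from(self.main, i * RECORD.size)
        return key, raw.rstrip(b"\0").decode()

    def search(self, key: int) -> Optional[str]:
        # Binary search over the ordered main file: about log2(n) record reads.
        lo, hi = 0, self.count()
        while lo < hi:
            mid = (lo + hi) // 2
            k, name = self.read(mid)
            if k == key:
                return name
            if k < key:
                lo = mid + 1
            else:
                hi = mid
        # Not in the main file: fall back to a linear scan of the overflow file.
        for k, name in self.overflow:
            if k == key:
                return name
        return None

    def insert(self, key: int, name: str) -> None:
        # Appending to overflow avoids shifting every later record in the main file.
        self.overflow.append((key, name))

    def reorganise(self) -> None:
        # Periodic batch merge: rebuild the main file with overflow records in key order.
        merged = [self.read(i) for i in range(self.count())] + self.overflow
        self.main = self._build(merged)
        self.overflow = []

f = SequentialFile([(30, "carol"), (10, "alice"), (20, "bob")])
f.insert(15, "dave")
print(f.search(20), f.search(15))                  # bob dave
f.reorganise()
print([f.read(i)[0] for i in range(f.count())])    # [10, 15, 20, 30]
```

Until `reorganise` runs, a lookup that misses the main file still falls through to the linear overflow scan, which is why the overflow is merged back periodically.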
Indexed Sequential File
An enhancement of the sequential file that adds an index to support direct access.
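- Mechanism: Each index entry holds a key value and a pointer to the start of the corresponding block in the primary file. To find a record, the system locates the last index entry whose key does not exceed the search key, then scans that block sequentially.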
- Maintenance: Uses an overflow file for new insertions to maintain order without shifting all subsequent records immediately.
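A simplified in-memory sketch of indexed sequential access: the primary file is a key-ordered list of fixed-size blocks, and a sparse index stores the lowest key of each block. A lookup binary-searches the index to pick one block and scans only that block. Giving each block its own overflow list is a simplification (classic descriptions instead chain each overflow record from its predecessor in the main file); the block size and names are assumptions, and the sketch expects at least one initial record.

```python
import bisect
from typing import Dict, List, Optional, Tuple

Record = Tuple[int, str]

class IndexedSequentialFile:
    def __init__(self, records: List[Record], block_size: int = 2) -> None:
        ordered = sorted(records)
        # Main file: key-ordered records grouped into fixed-size blocks.
        self.blocks = [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]
        # Sparse index: one entry per block, holding that block's lowest key.
        self.index = [blk[0][0] for blk in self.blocks]
        # Overflow area, simplified here to one unsorted list per block.
        self.overflow: Dict[int, List[Record]] = {i: [] for i in range(len(self.blocks))}

    def _block_for(self, key: int) -> int:
        # Last index entry whose key <= search key; smaller keys map to block 0.
        return max(bisect.bisect_right(self.index, key) - 1, 0)

    def search(self, key: int) -> Optional[str]:
        b = self._block_for(key)
        # Scan one block plus its overflow, never the whole file.
        for k, v in self.blocks[b] + self.overflow[b]:
            if k == key:
                return v
        return None

    def insert(self, key: int, value: str) -> None:
        # New records go to the owning block's overflow instead of shifting the main file.
        self.overflow[self._block_for(key)].append((key, value))

isf = IndexedSequentialFile([(10, "a"), (20, "b"), (30, "c"), (40, "d"), (50, "e")])
isf.insert(35, "x")
print(isf.index, isf.search(35))   # [10, 30, 50] x
```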
Indexed File
Unlike an indexed sequential file, which has a single index on the key field, an indexed file may have multiple indexes, one for each field that might be used as a search criterion.
- Structure: The primary file records are not necessarily ordered. The indexes provide the logical ordering.
- Usage: Common in database systems where high-speed lookup by multiple attributes is required.
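The multi-index idea can be sketched as an unordered primary list plus one index per searchable field. Here each index is a Python dict mapping a field value to record positions, an assumption made for brevity; database systems more commonly use B-tree indexes, which keep keys ordered. The `in_order` method (a hypothetical name) shows how an index, not the primary file, supplies the logical ordering.

```python
from collections import defaultdict
from typing import Any, Dict, List

class IndexedFile:
    def __init__(self, indexed_fields: List[str]) -> None:
        self.records: List[Dict[str, Any]] = []  # primary file: arrival order, unordered by key
        # One index per searchable field: field value -> positions of matching records.
        self.indexes: Dict[str, Dict[Any, List[int]]] = {
            f: defaultdict(list) for f in indexed_fields
        }

    def insert(self, record: Dict[str, Any]) -> None:
        pos = len(self.records)
        self.records.append(record)
        for field, index in self.indexes.items():
            if field in record:  # records lacking a field are simply absent from that index
                index[record[field]].append(pos)

    def lookup(self, field: str, value: Any) -> List[Dict[str, Any]]:
        # Direct lookup through the chosen index; no scan of the primary file.
        return [self.records[p] for p in self.indexes[field].get(value, [])]

    def in_order(self, field: str) -> List[Dict[str, Any]]:
        # Walking an index's keys in sorted order yields a logical ordering of the records.
        index = self.indexes[field]
        return [self.records[p] for v in sorted(index) for p in index[v]]

db = IndexedFile(["dept", "city"])
db.insert({"id": 3, "name": "carol", "dept": "ops", "city": "Leeds"})
db.insert({"id": 1, "name": "alice", "dept": "dev", "city": "York"})
db.insert({"id": 2, "name": "bob", "dept": "dev", "city": "Leeds"})

print([r["name"] for r in db.lookup("dept", "dev")])  # ['alice', 'bob']
print([r["name"] for r in db.in_order("city")])       # ['carol', 'bob', 'alice']
```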
Hash (Direct) File
Uses a hash function applied to the key field to determine the physical address of the record.
- Access: Provides very fast direct access (ideally $O(1)$).
- Collision Handling: When two keys hash to the same address (a collision), the extra record is placed in an overflow file or chained from the home address.
- Usage: Ideal when direct access is the primary requirement and sequential processing is rare.
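A minimal sketch of a hashed file, assuming integer keys, division-remainder hashing (`key % n_buckets`), small fixed-capacity home buckets standing in for disk blocks, and a per-bucket overflow chain for collisions. The bucket count, capacity, and names are hypothetical.

```python
from typing import List, Optional, Tuple

class HashFile:
    def __init__(self, n_buckets: int = 4, bucket_capacity: int = 2) -> None:
        self.n = n_buckets
        self.capacity = bucket_capacity
        # Home buckets (think: fixed disk blocks) plus one overflow chain per bucket.
        self.buckets: List[List[Tuple[int, str]]] = [[] for _ in range(n_buckets)]
        self.overflow: List[List[Tuple[int, str]]] = [[] for _ in range(n_buckets)]

    def _address(self, key: int) -> int:
        # Division-remainder hashing: key -> home bucket number.
        return key % self.n

    def insert(self, key: int, value: str) -> None:
        b = self._address(key)
        if len(self.buckets[b]) < self.capacity:
            self.buckets[b].append((key, value))
        else:
            # Collision with a full home bucket: chain the record into overflow.
            self.overflow[b].append((key, value))

    def search(self, key: int) -> Optional[str]:
        b = self._address(key)
        # Ideally one bucket read; the overflow chain matters only after collisions.
        for k, v in self.buckets[b] + self.overflow[b]:
            if k == key:
                return v
        return None

hf = HashFile(n_buckets=4, bucket_capacity=2)
for k, v in [(1, "a"), (5, "e"), (9, "i"), (2, "b")]:
    hf.insert(k, v)
print(hf.buckets[1], hf.overflow[1])   # [(1, 'a'), (5, 'e')] [(9, 'i')]
```

Bucket contents follow hash values rather than key order, so producing the records in key order would require a separate sort; this is why the organisation suits workloads where sequential processing is rare.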