Fake file systems are used in the field of cyber deception to bait intruders and fool forensic investigators. File system researchers also frequently generate their own synthetic document repositories, due to data privacy and copyright concerns associated with experimenting on real-world corpora. For both these fields, realism is critical. Unfortunately, after creating a set of files and folders, there are no current testing standards that can be applied to validate their authenticity, or conversely, reliably automate their detection. This paper reviews the previous 30 years of file system surveys on real world corpora, to identify a set of discrete measures for generating synthetic file systems. Statistical distributions, such as size, age and lifetime of files, common file types, compression and duplication ratios, directory distribution and depth (and its relationship with numbers of files and sub-directories) were identified and the respective merits discussed. Additionally, this paper highlights notable absences in these surveys, which could be beneficial, such as analysing, on mass, the text content distribution, file naming habits, and comparing file access times against traditional working hours.