Creating an example directory tree
With these commands I am creating a simple directory tree with some directories and files, which I will use further on:

    mkdir -p somedir/d1/b somedir/d1/c somedir/d2 somedir/d3   # Create some directories
    touch somedir/d1/b/file1 somedir/d3/file2                   # Create some files
    ln -s file2 somedir/d3/z                                    # Create symbolic links
    (cd somedir; ln -s d1 d4_link)
    find somedir

    somedir
    somedir/d1
    somedir/d1/b
    somedir/d1/b/file1
    somedir/d1/c
    somedir/d2
    somedir/d3
    somedir/d3/file2
    somedir/d3/z
    somedir/d4_link

The target of a symbolic link is resolved relative to the directory that contains the link, so somedir/d3/z points to file2 in the same directory. I want some code to show me that somedir contains 9 elements, somedir/d1 contains 3 elements, and so on.
List the complete directory tree with types of files
On Linux systems with GNU find one can easily retrieve a list of elements and their file types with the -printf option: %y prints the file type and %p the path. (Unfortunately this does not work on my Mac, whose BSD find has no -printf.) My example contains six directories (including the topmost somedir), two files and two symbolic links (one to a file and one to a directory).

    find somedir -printf '%y %p\n'

    d somedir
    d somedir/d1
    d somedir/d1/b
    f somedir/d1/b/file1
    d somedir/d1/c
    d somedir/d2
    d somedir/d3
    f somedir/d3/file2
    l somedir/d3/z
    l somedir/d4_link
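Since -printf is not available on a Mac, here is a sketch of a portable alternative that only uses POSIX find and the test operators of sh. The function name list_types is hypothetical, and only the common file types are mapped to the letters GNU find uses for %y; anything else is reported as U:

```shell
# list_types DIR: print "<type> <path>" for DIR and everything below it, like
# GNU `find DIR -printf '%y %p\n'`, but with POSIX find and test only.
# list_types is a hypothetical helper name; unmapped types are printed as U.
list_types() {
  find "$1" -exec sh -c '
    for p in "$@"; do
      if   [ -L "$p" ]; then t=l   # test for links first: -d and -f follow them
      elif [ -d "$p" ]; then t=d
      elif [ -f "$p" ]; then t=f
      elif [ -p "$p" ]; then t=p   # named pipe (FIFO)
      elif [ -S "$p" ]; then t=s   # socket
      elif [ -b "$p" ]; then t=b   # block device
      elif [ -c "$p" ]; then t=c   # character device
      else t=U
      fi
      printf "%s %s\n" "$t" "$p"
    done' sh {} +
}
```

Running list_types somedir | sort -k2,2 should give the same lines as the GNU find command above, although find may visit the entries in a different order before sorting.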
Calculating the number of entries per directory
First of all, the awk command consumes a slightly modified version of the find output from above. I am adding a slash after the type letter so that I can use the slash as the awk field delimiter:

    find somedir -printf '%y/%p\n'

    ...
    d/somedir/d1/b
    ...

The awk command:
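    awk -F/ '
    {
        x = $2; count[x]++
        for (i = 3; i <= NF; i++) { x = x FS $i; count[x]++ }
        type[x] = $1
    }
    END { for (i in count) printf "%s %2d %s\n", type[i], count[i]-1, i }'

What happens in each step: the first field is the type of the element (d for directory, f for file, l for symbolic link), and the second field is the topmost directory. For every line, the count array is incremented for each leading part of the path: first for somedir, then for somedir/d1, and so on down to the full path. Finally, the type of the full path is stored in the type array. Here is what happens for the first three lines of the input: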
    d/somedir
        x = "somedir"
        count["somedir"] = 1            # NF = 2: the for loop is not executed
        type["somedir"] = "d"
    d/somedir/d1
        x = "somedir"
        count["somedir"] = 2            # increase count by 1
                                        # NF = 3: the for loop is executed once
        x = x FS "d1" = "somedir/d1"
        count["somedir/d1"] = 1         # first time: 1
        type["somedir/d1"] = "d"
    d/somedir/d1/b
        x = "somedir"
        count["somedir"] = 3            # increase count by 1
                                        # NF = 4: the for loop is executed twice
        x = x FS "d1" = "somedir/d1"
        count["somedir/d1"] = 2         # increase count by 1
        x = x FS "b" = "somedir/d1/b"
        count["somedir/d1/b"] = 1       # first time: 1
        type["somedir/d1/b"] = "d"
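To see these steps for any single line, the following sketch prints, instead of counting, how awk splits the line and which count entries the program would increment. trace_record is a hypothetical helper name; its awk statements mirror the counting program:

```shell
# trace_record: for each "type/path" input line, show how awk -F/ splits it
# and which count[] keys the counting program increments, in order.
# trace_record is a hypothetical helper name used only for illustration.
trace_record() {
  awk -F/ '{
    print "type = " $1 ", NF = " NF
    x = $2; print "count[" x "]++"            # topmost directory
    for (i = 3; i <= NF; i++) {               # extend the path by one component
      x = x FS $i; print "count[" x "]++"
    }
    print "type[" x "] = " $1                 # only the full path gets a type
  }'
}

echo 'd/somedir/d1/b' | trace_record
```

For d/somedir/d1/b this prints NF = 4 followed by the three keys somedir, somedir/d1 and somedir/d1/b, matching the last step of the trace.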
Here is the combined command sequence, with a sort appended for better readability:

    find somedir -printf '%y/%p\n' |
      awk -F/ '{ x=$2; count[x]++; for(i=3;i<=NF;i++) { x=x FS $i; count[x]++ } type[x]=$1 }
               END { for(i in count) printf "%s %2d %s\n", type[i], count[i]-1, i }' |
      sort -k3,3

    d  9 somedir
    d  3 somedir/d1
    d  1 somedir/d1/b
    f  0 somedir/d1/b/file1
    d  0 somedir/d1/c
    d  0 somedir/d2
    d  2 somedir/d3
    f  0 somedir/d3/file2
    l  0 somedir/d3/z
    l  0 somedir/d4_link

So somedir contains 9 elements altogether: the 4 direct elements d1, d2, d3 and d4_link plus the elements nested inside them.
Note that the END block prints the count minus one: the count of a directory is already one when the directory itself appears in the find output, but I want to show only the number of elements below it, i.e. I need to exclude the directory itself.
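The count-minus-one logic can be cross-checked without awk by running one extra find per directory and counting everything below it with -mindepth 1 (supported by GNU and BSD find). This is a sketch with a hypothetical function name, dir_counts_slow; it is slower than the single awk pass because it walks the tree once per directory instead of once in total:

```shell
# dir_counts_slow DIR: for DIR and every directory below it, print
# "<entries below it> <path>", using one extra find per directory.
# dir_counts_slow is a hypothetical helper name; paths containing newlines
# are not handled.
dir_counts_slow() {
  find "$1" -type d | while IFS= read -r d; do
    n=$(find "$d" -mindepth 1 | wc -l | tr -d ' ')   # entries below $d only
    printf '%2d %s\n' "$n" "$d"
  done
}
```

For the example tree, dir_counts_slow somedir | sort -k2,2 should report 9 for somedir, 3 for somedir/d1, 1 for somedir/d1/b and 2 for somedir/d3, the same values the awk program prints. The symbolic link d4_link does not appear because find without -L reports it as a link, not as a directory.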
Note also that somedir/d4_link (the symbolic link to the directory somedir/d1) is not followed and is listed as having zero elements. If you follow symbolic links to directories with find -L somedir (or the deprecated find somedir -follow), the calculations will be misleading: in this example the elements of d1 would be counted twice, once below d1 and once below d4_link.
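A quick way to see the double counting is to count the entries below somedir with and without -L. The helper count_entries is hypothetical; it passes its arguments straight to find, so options such as -L must come before the directory:

```shell
# count_entries [-L] DIR: print the number of entries below DIR, not
# counting DIR itself. Options such as -L (follow symbolic links) must
# precede DIR, because find expects them before the starting point.
# count_entries is a hypothetical helper name.
count_entries() {
  find "$@" -mindepth 1 | wc -l | tr -d ' '   # tr strips the padding BSD wc adds
}

# For the example tree:
#   count_entries somedir       -> 9
#   count_entries -L somedir    -> 12 (d4_link/b, d4_link/b/file1 and
#                                      d4_link/c are counted a second time)
```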
The counts for non-directory file types are always zero, so they can be excluded from the output completely by printing only the entries of type d:
    find somedir -printf '%y/%p\n' |
      awk -F/ '{ x=$2; count[x]++; for(i=3;i<=NF;i++) { x=x FS $i; count[x]++ } type[x]=$1 }
               END { for(i in count) if (type[i]=="d") printf "%2d %s\n", count[i]-1, i }' |
      sort -k2,2

     9 somedir
     3 somedir/d1
     1 somedir/d1/b
     0 somedir/d1/c
     0 somedir/d2
     2 somedir/d3
Usages
Empty directories
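To list the empty directories, add grep -E '^d +0 ' to the pipeline. Because %2d pads single-digit counts with a space, the pattern has to allow more than one space after the type letter. Alternatively, adjust the awk code with an if clause such as count[i]==1.

    find somedir -printf '%y/%p\n' |
      awk -F/ '{ x=$2; count[x]++; for(i=3;i<=NF;i++) { x=x FS $i; count[x]++ } type[x]=$1 }
               END { for(i in count) printf "%s %2d %s\n", type[i], count[i]-1, i }' |
      grep -E '^d +0 '

    d  0 somedir/d1/c
    d  0 somedir/d2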
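The awk variant with the if clause mentioned above can be sketched as a small function that selects the directories whose count is still 1, so no grep on the padded output is needed. empty_dirs is a hypothetical name, and the find part still requires GNU find for -printf. find's own -empty test (available in GNU and BSD find) answers the same question directly and makes a convenient cross-check:

```shell
# empty_dirs DIR: print every directory (DIR included) that has no entries.
# Requires GNU find for -printf. empty_dirs is a hypothetical helper name.
empty_dirs() {
  find "$1" -printf '%y/%p\n' |
  awk -F/ '{ x = $2; count[x]++
             for (i = 3; i <= NF; i++) { x = x FS $i; count[x]++ }
             type[x] = $1 }
       END { for (i in count) if (type[i] == "d" && count[i] == 1) print i }' |
  LC_ALL=C sort
}

# Cross-check with find's own test for empty directories:
#   find somedir -type d -empty | LC_ALL=C sort
```

For the example tree both commands should list somedir/d1/c and somedir/d2.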
Non-empty directories
Keep the directory lines whose count starts with a non-zero digit, which also keeps counts such as 10 or 20:

    find somedir -printf '%y/%p\n' |
      awk -F/ '{ x=$2; count[x]++; for(i=3;i<=NF;i++) { x=x FS $i; count[x]++ } type[x]=$1 }
               END { for(i in count) printf "%s %2d %s\n", type[i], count[i]-1, i }' |
      grep -E '^d +[1-9]' |
      sort -k3,3

    d  9 somedir
    d  3 somedir/d1
    d  1 somedir/d1/b
    d  2 somedir/d3
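The non-empty directories can also be selected inside awk with the condition count[i] > 1, which does not depend on how printf pads the numbers. Again this is a sketch with a hypothetical function name, nonempty_dirs, and it needs GNU find for -printf:

```shell
# nonempty_dirs DIR: print "<entries> <path>" for every directory (DIR
# included) that has at least one entry, sorted by path.
# Requires GNU find for -printf. nonempty_dirs is a hypothetical helper name.
nonempty_dirs() {
  find "$1" -printf '%y/%p\n' |
  awk -F/ '{ x = $2; count[x]++
             for (i = 3; i <= NF; i++) { x = x FS $i; count[x]++ }
             type[x] = $1 }
       END { for (i in count)
               if (type[i] == "d" && count[i] > 1)
                 printf "%2d %s\n", count[i] - 1, i }' |
  LC_ALL=C sort -k2,2
}
```

For the example tree this should print the counts 9, 3, 1 and 2 for somedir, somedir/d1, somedir/d1/b and somedir/d3.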