Thursday, 11 March 2010

Directory to XML BASH Script

Ever wanted to get a directory into an XML structure?

Here is a quick, short and easily modifiable BASH script that works well.

To see how it works, you could run it like this:

scriptname dir | xmllint --format - | less

For dir you can use . .. ./ ../ or / as well as subdirectories and full directory paths.

You might also be interested in this project: xml-dir-listing

How it works

The script first normalises the special directory specifiers (. .. ./ ../ and /) so that they all work. It then calls doDir with the resulting directory.
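
As a rough sketch of that normalisation step, the helper below resolves any directory argument to an absolute path by changing into it inside a subshell and printing the result. Note that normalise_dir is a hypothetical name and this is a different technique from the script's own, which special-cases each specifier by hand.

```bash
#!/bin/bash

# Hypothetical helper, not part of the original script: resolve a directory
# argument such as . .. ./ ../ or a relative path to an absolute path.
normalise_dir() {
  # The subshell keeps the caller's working directory unchanged.
  # If the directory does not exist, cd fails and the function
  # returns a non-zero status without printing anything.
  ( cd -- "$1" 2>/dev/null && pwd )
}

normalise_dir .      # prints the current directory
normalise_dir /      # prints /
```

The script maps / to the empty string so that "${1}/${file}" does not start with a double slash; a caller of this helper would still need that special case.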

doDir uses ls to list every entry in the directory it is given. If an entry is a real directory and not a sym-link, doDir calls itself to process that subdirectory. Otherwise it outputs the entry's name as a file.
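
For comparison, here is a minimal sketch of the same traversal written with shell globs instead of ls. It is not the script's method: list_tree is a hypothetical name, and the entries come out in glob order (visible names first, then hidden ones), which can differ from the order ls produces.

```bash
#!/bin/bash

# Sketch only: same traversal as doDir, but using shell globs instead of ls.
# list_tree is a hypothetical name; the output format mirrors the script's.
list_tree() {
  local dir=$1 entry name
  echo "<dir>"
  echo "<dirname><![CDATA[${dir}]]></dirname>"
  # Together the three patterns match every entry except . and ..
  for entry in "$dir"/* "$dir"/.[!.]* "$dir"/..?*; do
    # An unmatched pattern stays literal; skip it (but keep broken sym-links)
    [ -e "$entry" ] || [ -h "$entry" ] || continue
    name=${entry##*/}
    if [ -d "$entry" ] && [ ! -h "$entry" ]; then
      list_tree "$entry"          # real directory: recurse
    else
      echo "<file><![CDATA[${name}]]></file>"
    fi
  done
  echo "</dir>"
}
```

Because the names never pass through a pipe, this variant does not need to worry about how ls or read treat spaces and backslashes.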

Certain special directories (. and ..) are ignored to avoid infinite loops.

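The script can use a lot of stack space because doDir is recursive, so I raise the stack limit with ulimit -s; the value (32768) is just a guess. It can also take a long time, so I renice the process to the lowest priority, which lets you do other things while it runs.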
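
A slightly more defensive version of those two steps might look like the sketch below. It assumes ulimit -s reports the limit in KiB (as Bash documents) and that renice may print a status line; the variable names are made up for illustration.

```bash
#!/bin/bash

# Sketch: raise the soft stack limit only if it is currently lower than the
# value the script uses (32768 KiB), and warn instead of aborting on failure.
wanted_kib=32768
current=$(ulimit -s)
if [ "$current" != "unlimited" ] && [ "$current" -lt "$wanted_kib" ]; then
  ulimit -s "$wanted_kib" 2>/dev/null ||
    echo "warning: could not raise stack limit above ${current} KiB" >&2
fi

# Lower our own priority. renice may report the change on stdout, which
# would end up in the XML stream, so its output is discarded here.
renice -n 19 -p $$ >/dev/null 2>&1 ||
  echo "warning: renice failed" >&2
```

Both warnings go to stderr, so they never mix with the XML written to stdout.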

To stop it, you may need to press Ctrl-C many times (10-20). I'm not sure why.

Sample Output (excerpt)


<dir>
<dirname><![CDATA[/usr/share/doc/distcc/example]]></dirname>
<file><![CDATA[init]]></file>
<file><![CDATA[init-suse]]></file>
<file><![CDATA[logrotate]]></file>
<file><![CDATA[xinetd]]></file>
</dir>
<file><![CDATA[protocol-1.txt]]></file>
<file><![CDATA[protocol-2.txt]]></file>
<file><![CDATA[reporting-bugs.txt]]></file>
<file><![CDATA[status-1.txt]]></file>
<file><![CDATA[survey.txt]]></file>
</dir>
<dir>
<dirname><![CDATA[/usr/share/doc/groff]]></dirname>
</dir>
<dir>
<dirname><![CDATA[/usr/share/emacs]]></dirname>
<dir>
<dirname><![CDATA[/usr/share/emacs/22.1]]></dirname>
<dir>
<dirname><![CDATA[/usr/share/emacs/22.1/etc]]></dirname>
</dir>
<dir>
<dirname><![CDATA[/usr/share/emacs/site-lisp]]></dirname>
</dir>
<dir>
<dirname><![CDATA[/usr/share/enscript]]></dirname>
<file><![CDATA[88591.enc]]></file>
<file><![CDATA[885910.enc]]></file>
<file><![CDATA[88592.enc]]></file>
</dir>



The Script
#!/bin/bash


# WARNING: To interrupt this, you may need to press Ctrl-C many times


# heavy recursion so allow a bigger stack
ulimit -s 32768


# run with low priority so you can do other stuff while it works
# (renice may print a status line on stdout; discard it to keep the XML clean)
renice -n 19 -p $$ > /dev/null


function doDir {
  # directory name may contain illegal XML characters so we won't use attributes
  #echo "<dir name=\"${1}\">"
  echo "<dir>"
  echo "<dirname><![CDATA[${1}]]></dirname>"
  # get all files and directories, one per line (-A leaves out . and ..);
  # IFS= and -r keep leading spaces and backslashes in names intact
  ls -A1 "$1/" | while IFS= read -r file; do
    # recursively process directories but not sym-links
    if [ -d "${1}/${file}" ] && [ ! -h "${1}/${file}" ]; then
      # don't do . and .. either
      if [ "$file" != "." ] && [ "$file" != ".." ]; then
        doDir "${1}/${file}"
      fi
    else
      # output the file
      echo "<file><![CDATA[$file]]></file>"
    fi
  done
  echo "</dir>"
}


# normalise initial directories so they all work
DIR=$1
if [ "."   == "$DIR" ]; then DIR="$(pwd)" ; fi
if [ "./"  == "$DIR" ]; then DIR="$(pwd)" ; fi
# .. works as it is; ../ only needs its trailing slash removed
if [ "../" == "$DIR" ]; then DIR=".."     ; fi
# the root becomes "" so that "${1}/${file}" yields /file rather than //file
if [ "/"   == "$DIR" ]; then DIR=""       ; fi


# quote the argument so directory names containing spaces survive
doDir "$DIR"
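
One edge case worth noting: CDATA protects against characters such as < and &, but a name that itself contains ]]> would end the CDATA section early. A minimal sketch of one way to guard against that is shown below; cdata is a hypothetical helper, not part of the script above.

```bash
#!/bin/bash

# Hypothetical helper: wrap a string in a CDATA section, splitting any
# embedded "]]>" across two sections so it cannot end the section early.
cdata() {
  local end=']]>'
  local split=']]]]><![CDATA[>'
  # Quoting "$end" makes Bash treat it as a literal string, not a pattern
  local text=${1//"$end"/$split}
  printf '<![CDATA[%s]]>\n' "$text"
}

cdata 'plain.txt'      # prints <![CDATA[plain.txt]]>
cdata 'odd]]>name'     # prints <![CDATA[odd]]]]><![CDATA[>name]]>
```

In doDir, the two echo lines that build dirname and file elements could call such a helper instead of writing the CDATA markers by hand.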