Converting the Erlang directory walker to multi-process design
Monday, June 29th, 2009
In my previous post I wrote a simple file system walker in Erlang to get my head around the syntax. To start thinking more in an Erlang mindset (though in a very contrived manner), in this post I convert the file system walker to use two processes. The first process (“walk”) performs the file system walking and the second (“visit”) processes each item found (i.e. prints the item's full path name).
Something along the lines of this…

Since walk and visit are the process entry points they need to be in the export list (by the way – forgetting to do this does not cause a compiler error in erl – why is that?) and we need a new entry function to create the new processes (start/1).
start(Path) ->
    Visit_PID = spawn(walkproc, visit, []),
    spawn(walkproc, walk, [Path, Visit_PID]).
On line 2 we create the process whose pid is assigned to Visit_PID. This process calls the visit() function with 0 parameters. At this point visit is running and waiting to receive a message.
On line 3 we spawn off another process. This process calls the walk/2 function as walk(Path, Visit_PID). Since the walk function is doing the file system walking it makes sense that it will be the one sending the first message. Because of this it needs to know the PID of the process to send the message to.
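A likely explanation for the missing compiler error: spawn/3 receives the module and function names as plain atoms, so the function is only looked up at run time. If it is not exported, the module still compiles (typically with an "unused function" warning) and the spawned process exits with an undef error. The sketch below contrasts spawn/3 with spawn/1, which takes a fun and therefore needs no export. The module name spawn_demo and all function names in it are hypothetical and not part of the walker.

```erlang
-module(spawn_demo).
-export([start_mfa/0, start_fun/0, echo/0]).

%% echo/0 has to be exported: spawn/3 resolves it by name at run time.
start_mfa() ->
    Pid = spawn(spawn_demo, echo, []),
    ask(Pid).

%% spawn/1 takes a fun, so private_echo/0 can stay unexported.
start_fun() ->
    Pid = spawn(fun() -> private_echo() end),
    ask(Pid).

%% Send a message that carries our own pid, then wait for the reply.
ask(Pid) ->
    Pid ! {hello, self()},
    receive
        {reply, Msg} -> Msg
    after 1000 ->
        timeout
    end.

echo() ->
    receive
        {Msg, From} -> From ! {reply, Msg}
    end.

private_echo() ->
    receive
        {Msg, From} -> From ! {reply, Msg}
    end.
```

Both spawn_demo:start_mfa() and spawn_demo:start_fun() should return the atom hello. Passing self() inside the message is the same trick walk uses so that visit knows where to send next.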
visit is a very straightforward function. It receives a message, processes it, sends a response and recurses (rinse and repeat).
visit() ->
    receive
        {Path, Walk_PID} ->
            io:format("~s~n", [Path]),
            Walk_PID ! next,
            visit()
    end.
walk is very similar (and has not changed substantially from our previous version). It starts by firing a message off to visit with the Path passed as a parameter. Next it waits for the “next” message. Once it receives that, it determines the type of the current entry and, if the entry is a directory, recurses into its children.
walk(Path, Visit_PID) ->
    Visit_PID ! {Path, self()},
    receive
        next ->
            FileType = file_type(Path),
            case FileType of
                file -> ok;
                symlink -> ok;
                directory ->
                    Children = filelib:wildcard(Path ++ "/*"),
                    lists:foreach(fun(P) -> walk(P, Visit_PID) end, Children)
            end
    end.
The other functions (file_type and is_symlink) have not changed.
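Put together, the module might look roughly like the sketch below. The module name walkproc comes from the spawn calls above. The export list and the include line are assumptions about how the file was assembled, and file_type/1 and is_symlink/1 are copied from the earlier post.

```erlang
-module(walkproc).
-include_lib("kernel/include/file.hrl").
%% walk/2 and visit/0 are exported because spawn/3 calls them by name.
-export([start/1, walk/2, visit/0]).

%% Entry point: start the printer first, then the walker that feeds it.
start(Path) ->
    Visit_PID = spawn(walkproc, visit, []),
    spawn(walkproc, walk, [Path, Visit_PID]).

%% Print each path received and acknowledge it with next.
visit() ->
    receive
        {Path, Walk_PID} ->
            io:format("~s~n", [Path]),
            Walk_PID ! next,
            visit()
    end.

%% Send the path, wait for the acknowledgement, then descend if needed.
walk(Path, Visit_PID) ->
    Visit_PID ! {Path, self()},
    receive
        next ->
            case file_type(Path) of
                file -> ok;
                symlink -> ok;
                directory ->
                    Children = filelib:wildcard(Path ++ "/*"),
                    lists:foreach(fun(P) -> walk(P, Visit_PID) end, Children)
            end
    end.

is_symlink(Path) ->
    case file:read_link_info(Path) of
        {ok, #file_info{type = symlink}} -> true;
        _ -> false
    end.

file_type(Path) ->
    case filelib:is_regular(Path) of
        true -> file;
        false ->
            case is_symlink(Path) of
                true -> symlink;
                false -> directory
            end
    end.
```

Calling walkproc:start("/home") returns the pid of the walk process straight away and the paths are printed asynchronously. Note that in this form the visit process stays blocked in receive after the walk has finished; sending it an explicit stop message would be one way to end it.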
I enjoyed how easy it was to convert to a multi-process approach and am looking forward to moving to a solution that uses RabbitMQ.
Walking the directory tree in Erlang
Monday, June 29th, 2009
I’m learning Erlang. I’ll get into “why” in some other post. The purpose here is to share my first sample program and solicit feedback. The program prints the contents of a file system from the indicated point downwards (ignoring symlinks).
The application has a single module, walker, which exports walk/1. The argument to walk/1 is the starting path. For example:
> walker:walk("/home").
This function prints the path name, determines the type of the current path (file, directory or symlink), and then, if and only if the path is a directory, calls filelib:wildcard to get the children of the path and repeats the process on them.
-module(walker).
-include_lib("kernel/include/file.hrl").
-export([walk/1]).

is_symlink(Path) ->
    case file:read_link_info(Path) of
        {ok, #file_info{type = symlink}} -> true;
        _ -> false
    end.

file_type(Path) ->
    IsRegular = filelib:is_regular(Path),
    case IsRegular of
        true -> file;
        false ->
            case is_symlink(Path) of
                true -> symlink;
                false -> directory
            end
    end.

walk(Path) ->
    io:format("~s~n", [Path]),
    FileType = file_type(Path),
    case FileType of
        file -> ok;
        symlink -> ok;
        directory ->
            Children = filelib:wildcard(Path ++ "/*"),
            lists:foreach(fun(P) -> walk(P) end, Children)
    end.
My questions about this module are:
1. Does calling walk(P) in a foreach prevent tail recursion optimizations?
2. Where I have “case FileType of” (in walk/1) is there a more succinct way to express that?
3. Why doesn’t read_file_info ever return file_info#type==symlink?
4. How should this have really been done?
I’ll be working on answering #1-3 on my way to learning #4 – but if you have any feedback I would love to hear it.
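One possible direction for the second and third questions: file:read_file_info/1 follows symbolic links and reports on the link's target, which would explain why it never reports symlink, whereas file:read_link_info/1 reports on the link itself. Under that assumption a single call can classify a path, and walk/1 only needs to single out directories. The module name walker2 is hypothetical, and the unknown atom for unreadable paths is an illustrative choice.

```erlang
-module(walker2).
-include_lib("kernel/include/file.hrl").
-export([walk/1, file_type/1]).

%% read_link_info/1 does not follow symlinks, so a link is reported as
%% symlink rather than as the type of whatever it points to.
%% The type field is one of: device, directory, other, regular,
%% symlink or undefined.
file_type(Path) ->
    case file:read_link_info(Path) of
        {ok, #file_info{type = Type}} -> Type;
        {error, _Reason} -> unknown
    end.

%% Only directories need special handling; everything else is a leaf.
walk(Path) ->
    io:format("~s~n", [Path]),
    case file_type(Path) of
        directory ->
            Children = filelib:wildcard(Path ++ "/*"),
            lists:foreach(fun walk/1, Children);
        _Other ->
            ok
    end.
```

Unlike the original file_type/1, this version returns regular rather than file for ordinary files, and a path that cannot be read is treated as a leaf instead of a directory.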
The next step is to make this message based and have the walk/1 method send messages to a consumer who will do the printing.
The next-next step is to bring in RabbitMQ and set up one producer and three consumers: one for files, one for directories and one for symlinks. The walk/1 function will no longer print the file info but rather send the appropriate message and let the consumers print the messages.
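Before RabbitMQ enters the picture, the one-producer, three-consumer layout can be tried out with plain Erlang processes. The sketch below is not RabbitMQ code. The module name router_demo, the consumer/1 function, and the {path, Path} and stop messages are all hypothetical, and the type lookup assumes file:read_link_info/1 as the way to tell the three kinds apart.

```erlang
-module(router_demo).
-include_lib("kernel/include/file.hrl").
-export([start/1]).

%% Start one consumer per entry type, walk the tree, then stop them.
start(Path) ->
    Consumers = #{regular   => spawn(fun() -> consumer("file") end),
                  directory => spawn(fun() -> consumer("dir") end),
                  symlink   => spawn(fun() -> consumer("link") end)},
    walk(Path, Consumers),
    lists:foreach(fun(Pid) -> Pid ! stop end, maps:values(Consumers)).

%% Each consumer prints what it is sent until it is told to stop.
consumer(Label) ->
    receive
        {path, Path} ->
            io:format("~s: ~s~n", [Label, Path]),
            consumer(Label);
        stop ->
            ok
    end.

%% The producer routes each path to the consumer for its type.
walk(Path, Consumers) ->
    Type = file_type(Path),
    case maps:find(Type, Consumers) of
        {ok, Pid} -> Pid ! {path, Path};
        error -> ok   % devices, unreadable paths: nobody to notify
    end,
    case Type of
        directory ->
            Children = filelib:wildcard(Path ++ "/*"),
            lists:foreach(fun(P) -> walk(P, Consumers) end, Children);
        _ ->
            ok
    end.

file_type(Path) ->
    case file:read_link_info(Path) of
        {ok, #file_info{type = Type}} -> Type;
        {error, _Reason} -> unknown
    end.
```

Here the producer does not wait for a next reply, so the three consumers print independently and their output may interleave in any order. Messages between any one pair of processes keep their order, so each consumer sees stop only after all of its paths.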
