Syntax Highlighing for Erlang in NotePad++
Wednesday, July 8th, 2009Update: The definition has been updated to include support for atoms, variables and function names as well as additional file extensions. Screen shot and downloadable content have been updated.
Thus far I’ve done all of my Erlang development on Fedora using vim or KWrite (which does a decent job in Ruby mode).
But today I found myself on a windows box and wanted a basic syntax highlighting editor for Erlang that was free and worked on Windows. Oh – and not Eclipse+Erlide. I wanted something small and fast.
I grabbed the “free as in beer” and “free as in speech” editor NotePad++ and created a simple syntax file that is a bit hokey but will serve my needs fine.
Here’s a screen shot …

NotePad++ has pretty weak syntax highlighting but was sufficent to do most of what I wanted. Some regex based rules would make this a more robust.
Highlighted entities include
- Erlang reserved words (and named operators)
- Variables
- Atoms
- function names (same coloring as atoms)
- Operators
- Comments
- Kernal, stdlib, mnesia and odbc modules.
- Support for *.erl, *.hrl and *.htp extentions
I’ve probably missed several things.
Looks a lot better than nothing and it took all of 10 15 minutes.
If you are using NotePad++ here is the file:
http://www.roberthorvick.com/wp-content/uploads/2009/07/erlangSyntaxDefinition.zip
And here are the instructions on how to install it:
http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Syntax_Highlighting_Sharing
And here’s the instructions on modifying or creating your own:
http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=User_Defined_Languages
Startup and Cleanup functions in Erlang EUnit tests
Monday, July 6th, 2009I’ve got the basic grasp on Erlang but before I can do anything meaningful I need to learn how to test my code EUnit is the de-facto test framework. There are a ton of great sites that talk about testing with EUnit – I have linked to several at the end of this post. I’m going to focus on the problem I ran into – needing to run some startup and cleanup code around the test suite.
I started by writing a very simple dictionary that lived in it’s own process. The interface is:
start() -> ok stop() -> ok write(Key, Element) -> ok delete(Key) -> ok read(Key) -> {ok, Value} | not_found
Calling start() spawns and registers the process the dictionary lives in. Calling stop ends the process. The rest do what you expect.
I started by writing some basic tests for each function but I didn’t really think it through. The start test ran somewhere in the middle so all the tests prior to start either failed or timed out (because I was waiting on a receive that would never come since the process wasn’t started). Very quickly I smacked my forehead and said “I need the test suite to have a pre and post function to start and stop the process”.
It took about 15 minutes of reading to realize that I needed to create a test generator with a setup tuple that defined the startup, cleanup and list of tests to run.
Conceptually it was easy to understand but getting the syntax right was a royal pain. Basically you do this (I’m not sure if giving the function arity is the preferred method but it saved some keystrokes so I went with it):
my_db_test_() -> {spawn, {setup, fun start/0, fun stop/1, [ fun write_new/0, fun write_existing/0, fun read_existing/0, fun read_missing/0, fun delete_existing/0, fun delete_missing/0 ] } }.
The test methods are exactly what you’d expect so I won’t repeat them. The syntax feels a little foreign and uncomfortable but it’s very easy to understand and I totally get why it’s written this way. I’m just so used to MSTest attributes and RSpec that this new way is going to take a night or two to feel like an old friend.
Some Eunit resources
Prag Dave – Test-First Word Wrap in Erlang
The last word frequency post – from dict to ets
Friday, July 3rd, 2009One last iteration through my learning exercise of building a word frequency list. In this last post I’m moving away from a dict and to an ets table. I was pleasantly surprised how easy the conversion was. For example printing the output was just converting from dict:fold to ets:foldl. The one parity fail was that dict:update can take an initial value when the key is missing but ets:update_counter (nor any other ets function) has this benefit. This required that I write a little wrapper function to call from the list:foldl (instead of having a multi-line inlined fun).
No point in getting too deep into this – here’s the code:
-module(wordets). -export([print_word_counts/1]). words(String) -> {match, Captures} = re:run(String, "\\b\\w+\\b", [global,{capture,first,list}]), lists:append(Captures). %% reads the next line from the file. If there is data then... %% split the data into a list of words and add to the word table process_each_line(IoDevice, Table) -> case io:get_line(IoDevice, "") of eof -> file:close(IoDevice), Table; {error, Reason} -> file:close(IoDevice), throw(Reason); Data -> NewTable = lists:foldl( fun(W, T) -> update_word_count(W, T) end, Table, words(Data)), process_each_line(IoDevice, NewTable) end. update_word_count(Word, Table) -> case ets:lookup(Table, Word) of [{Word, _}] -> ets:update_counter(Table, Word, 1); [] -> ets:insert(Table, {Word, 1}) end, Table. print_words(Words) -> ets:foldl(fun({W,C}, AccIn) -> io:format("~s: ~w~n", [W, C]), AccIn end, void, Words). %% opens the indicated file, processes the contents and prints %% out the word/count pairs to stdout print_word_counts(Filename) -> {ok, IoDevice} = file:open(Filename, read), Words = process_each_line(IoDevice, ets:new(words, [])), print_words(Words).
The ets implementation feels a bit forced (which it was – the point was to learn another module). I don’t think I’d have gone this way in practice unless I wanted to persist the frequency data to a file or if the word data were more complex (for example if I were storing information about where in the file the word was, word neighbors, etc).
Enough of this sample. On to something more substantial.
Adding a custom protocol to Firefox for easy Erlang docs
Friday, July 3rd, 2009To make getting to the Erlang docs a bit quicker I added a custom protocol handler to firefox so when I go to:
I get redirected to
http://www.erlang.org/doc/man/lists.html
To do this I first created a little shell script that would take the address and spawn firefox with the appropriate url. (insert caveat about this being the first bash script I’ve written in 10 years here …).
#!/bin/bash ERLANG_TOPIC=`echo $1 | sed 's/erlang:\/\///'` exec firefox http://www.erlang.org/doc/man/$ERLANG_TOPIC.html
Next I marked it as executable:
chmod +x erlang.protocolFinally I followed these directions and added the following config settings in firefox:

Since the Erlang/OTP doc pages match their module names this works pretty well.
For queries where I’m not sure what I want I added a Erlang specific google search to my Erlang Resources page.
Word frequency redux – Erlang list comprehension, regex and list folding
Thursday, July 2nd, 2009Sean Cribbs was nice enough to point out a pair of changes I could make to my word frequency counter from last time.
Based on his feedback I made three changes. First – the regular expression code has changed from this:
matches(H,{match,M}) -> matches(H,M,[]). matches(_,[],Acc) -> Acc; matches(H,[{I,L}|T],Acc) -> matches(H,T,[lists:sublist(H,I,L)|Acc]). words(String) -> matches(String,regexp:matches(String, "[A-Za-z0-1]+")).
to this:
words(String) -> {match, Captures} = re:run(String, "\\b\\w+\\b", [global,{capture,first,list}]), [hd(C) || C<-Captures].
That last line took me a bit to grok. It’s a list comprehension (if you are reading Joe Armstrong’s thesis it is section 3.3.13. In Erlang Programming it is chapter 9.3). Basically it’s saying “for each list in the list of matches take the head of the list” – a-gigga-wah?
Ok. Let’s go to erl.
7> re:run("foo foo bar", "\\b\\w+\\b", [global,{capture,first,list}]).
{match,[["foo"],["foo"],["bar"]]}
Observe that re:run returns a nested list (i.e. a list of lists) – and each list has exactly one element (the string [which is itself a list but I'll cal them strings]). What we want to do is take that list-of-lists-of-strings and turn it into a list-of-strings.
That’s what “[hd(C) || C<-Captures].” does – it pulls every capture (a word wrapped in a list) from the match list and runs it through erlang:hd which pulls the word from the list – then it gets added to the resulting list. So we end up with a list strings.
It’s un-nesting the list.
Next Sean suggested “Then I’d probably use some kind of key-value structure, like a proplist or dict, to count the words using a lists:foldX function.”
so I fired up “erl -man lists” to learn what foldX meant (actually “foldl” “foldr” depending on whether you want to fold from the left or right.
In a nut shell folding is iterates over a list calling a fun that takes the current value and an accumulator and which returns the new accumulator. An example from the man page is:
> lists:foldl(fun(X, Sum) -> X + Sum end, 0, [1,2,3,4,5]).
15
I spent some time thinking and after some trial and error came up with this:
lists:foldl(fun(W, Dict) -> dict:update(W, fun(C) -> C + 1 end, 1, Dict) end, dict:new(), ["foo", "foo", "bar"]). %% sample input
In a nutshell – for every word in the list update the dictionary by calling the fun which increments the count value, setting the initial count to 1 if the value does not already exist in the dictionary (and starting with an empty dictionary).
After these three changes the new program is about half the size of the previous and really only has a few interesting lines surround by nearly error and flow control.
Thanks Sean!
The new code …
-module(wordlist). -export([print_word_counts/1]). words(String) -> {match, Captures} = re:run(String, "\\b\\w+\\b", [global,{capture,first,list}]), [hd(C) || C<-Captures]. %% reads the next line from the file. If there is data then... %% split the data into a list of words and add those to the word dict process_each_line(IoDevice, Dict) -> case io:get_line(IoDevice, "") of eof -> file:close(IoDevice), Dict; {error, Reason} -> file:close(IoDevice), throw(Reason); Data -> NewDict = lists:foldl( fun(W, D) -> dict:update(W, fun(C) -> C + 1 end, 1, D) end, dict:new(), words(Data)), process_each_line(IoDevice, NewDict) end. print_dict(Dict) -> dict:fold(fun(Word, Count, AccIn) -> io:format("~s: ~w~n", [Word, Count]), AccIn end, void, Dict). %% opens the indicated file, processes the contents and prints %% out the word/count pairs to stdout print_word_counts(Filename) -> case file:open(Filename, read) of {ok, IoDevice} -> Dict = process_each_line(IoDevice, dict:new()), print_dict(Dict); end.
Finding word frequencies using Erlang
Thursday, July 2nd, 2009Here’s a follow-up post that provides a better implementation.
I wanted to try and bite off something a little larger today hitting a few areas that seem generally useful:
- Basic text file operations
- Basic string operations
- Using the gb_trees module
- Avoiding any usage of lists:foreach; instead using tail recursion (that whole “thinking in Erlang” thing).
The problem is to produce the word frequency of a specific text file and to print out the frequency information. For example if a file overdunn.txt contained the text:
“Dunn was over Unger and I was over Dunn.” (Capt. Oveur, Airplane II: The Sequel)
The output would be:
Dunn: 2
I: 1
Unger: 1
and: 1
over: 2
was: 2
Word frequencies have some practical uses (fuzzy text matching, building tag clouds, etc). So while it’s a bit contrived it is the basis for something useful.
The code doesn’t need a lot of explanation – so here you go …
-module(wordlist). -export([print_word_counts/1]). %% matches/* and words/1 From: http://www.trapexit.org/Matching_Words matches(H,{match,M}) -> matches(H,M,[]). matches(_,[],Acc) -> Acc; matches(H,[{I,L}|T],Acc) -> matches(H,T,[lists:sublist(H,I,L)|Acc]). words(String) -> matches(String,regexp:matches(String, "[A-Za-z0-1]+")). %% builds a tree of word/count pairs. If the word does not exist in %% the tree it is added with an initial value of 1. If the word does %% exist the count is retrieved and incremented build_word_tree([], Tree) -> Tree; build_word_tree([W|R], Tree) -> case gb_trees:is_defined(W, Tree) of true -> Count = gb_trees:get(W, Tree), NewTree = gb_trees:update(W, Count + 1, Tree), build_word_tree(R, NewTree); false -> NewTree = gb_trees:insert(W, 1, Tree), build_word_tree(R, NewTree) end. %% reads the next line from the file. If there is data then... %% split the data into a list of words and add those to the word tree process_each_line(IoDevice, Tree) -> case io:get_line(IoDevice, "") of eof -> file:close(IoDevice), Tree; {error, Reason} -> file:close(IoDevice), throw(Reason); Data -> NewTree = build_word_tree(words(Data), Tree), process_each_line(IoDevice, NewTree) end. %% walks the gb_tree and prints each word/count pair print_tree(Iter) -> case gb_trees:next(Iter) of none -> ok; {Key, Val, NewIter} -> io:format("~s: ~w~n", [Key,Val]), print_tree(NewIter) end. %% opens the indicated file, processes the contents and prints %% out the word/count pairs to stdout print_word_counts(Filename) -> case file:open(Filename, read) of {ok, IoDevice} -> Tree = process_each_line(IoDevice, gb_trees:empty()), print_tree(gb_trees:iterator(Tree)); {error, Reason} -> io:format("~s~n", [Reason]) end.
As usual – I’m just getting started with Erlang. What is the good, bad and ugly with this code?
Aha Moments – gb_trees:update and thinking in Erlang
Wednesday, July 1st, 2009The first episode of The X-Files I ever saw was the Home. I was hooked immediately – and at the same time somewhat disappointed. This was before on-demand TV, hulu or DVD compilations on NetFlix. I had missed more than 3 seasons of a what was clearly going to be one of my favorite shows and there the only way to catch up was the hope that the next rerun would be one I hadn’t seen. Then it got all weird and aliens were showing up too often, Mulder basically disappeared and The Lone Gunman got their own spinoff …
Learning Erlang is a little like seeing the first episode. I’m hooked immediately – and at the same time somewhat disappointed. I’m years behind again. At least this time I don’t need to fret over not being able to find the reruns I want and there is very little risk that Dean Haglund will make an appearance (I bet Langly knew Erlang).
A few minutes ago I had my first significant Aha! moment. In my effort to map what I do know (procedural/OO paradigms) to Erlang I wondered “how would a balance tree be implemented in Erlang”. I binged (yeah, I said it) for “Erlang balanced trees” and quickly found the gb_trees module.
What I wanted to understand was:
- How is the tree represented in Erlang?
- How is the tree (and it’s nodes) modified (insert, update, delete, etc)?
- Can processes share the same tree (and how is concurrency handled)?
How is the tree represented in Erlang?
This took me the longest to understand – but once it fell into place everything else made so much more sense. If you are curious go ahead and open up gb_trees.erl (on my machine it’s in /usr/lib/erlang/lib/stdlib-1.15.5/src) – now go to the definition of the node and tree types. Go ahead… I’ll wait.
Found them yet? No? Maybe it’s in some header file. Feel free to check … in the meantime I’ll continue.
Ready for this? There isn’t one. You don’t need to define a tree or node type – it’s just a tuple that gets passed around and interpreted at runtime. Not quite clear? OK – if we were going to define it, it would look something like this:
{Size, {Key, Value, Smaller, Bigger}}
- Size is the number of nodes in the tree (which is used for balancing).
- Key is the key of the current node.
- Value is the value associated with Key.
- Smaller is the nodes that are smaller (i.e. “to the left”).
- Bigger is nodes that are bigger (”to the right”).
These functions really helped make it much clearer:
%%-spec(empty/0 :: () -> gb_tree()). empty() -> {0, nil}. %%-spec(is_empty/1 :: (gb_tree()) -> bool()). is_empty({0, nil}) -> true; is_empty(_) -> false. %%-spec(size/1 :: (gb_tree()) -> non_neg_integer()). size({Size, _}) when is_integer(Size), Size >= 0 -> Size.
“size” says so much it’s wonderful. Besides the obvious of getting the size of the tree it also validates that the size is greater-than-or-equal-to zero otherwise there will be an exception. This is a great example of Erlang failing fast. Try it out:
2> gb_trees:size({-1, {[]}}).
** exception error: no function clause matching gb_trees:size({-1,{[]}})
See – the clause doesn’t match and an exception is thrown. Think of the comparable C++ code. It would look something like:
int size(Tree *tree) { if (tree == null) { throw ...; } if(tree->size < 0) { throw ...; } return tree->size; }

How is the tree modified?
I think gb_trees:update is the most approachable example of this. Besides being crazy small it is a very clear example of how the tree is walked to search for the key and how error handling is meant to be done.
%%-spec(update/3 :: (_, _, gb_tree()) -> gb_tree()). update(Key, Val, {S, T}) -> T1 = update_1(Key, Val, T), {S, T1}. %% See `lookup' for notes on the term comparison order. update_1(Key, Value, {Key1, V, Smaller, Bigger}) when Key < Key1 -> {Key1, V, update_1(Key, Value, Smaller), Bigger}; update_1(Key, Value, {Key1, V, Smaller, Bigger}) when Key > Key1 -> {Key1, V, Smaller, update_1(Key, Value, Bigger)}; update_1(Key, Value, {_, _, Smaller, Bigger}) -> {Key, Value, Smaller, Bigger}.
Notice first that since update is never changing the size it does not bother to pass it to update_1 – only the “node” portion of the tuple is passed.
Once passed there are four possible options – three of which need to be codified (in the same order as the code):
- The provided key is smaller than the current tree node. In this case build up a new result and call update_1 with update_1(Key, Value, Smaller) – “walk left”
- The provided key is larger than the current tree node. Walk right.
- The provided key is exactly the current node key (not too big … not to small … just right!). In this case build up a new node with the new value.
- (implicit) The provided key is not in the tree. Throw an exception.
In the case where the node is not found update_1 ends up being called like:
update_1(Key, Value, nil)
Since the match on #1-3 require a non-nil tuple this turns into a missing function clause error.
Can processes share the same tree?
No. No they can’t. I was having a little trouble groking this until I ran through some samples in erl. This is what made it clear for me:
1> Tree = gb_trees:empty().
{0,nil}
2> Tree2 = gb_trees:insert("Foo", "Bar", Tree).
{1,{"Foo","Bar",nil,nil}}
3> Tree3 = gb_trees:update("Foo", "NewBar", Tree2).
{1,{"Foo","NewBar",nil,nil}}
4> Tree4 = gb_trees:update("Missing", "value", Tree3).
** exception error: no function clause matching gb_trees:update_1("Missing","value",nil)
in function gb_trees:update_1/3
in call from gb_trees:update/3
Each time anything was done a new tree was created – the old tree was still valid and could be used but it was not the new tree. It’s the old tree. This boggled me for a few minutes until Justin Sheehy confirmed what the code was saying.

So no – multiple processes could not possibly share the tree because the tree is immutable.
Want proof? Go back to erl and run gb_trees:get on the key “Foo” on Tree, Tree2 and Tree3. Tree will error (it’s an empty tree). Tree2 will find “Bar” and Tree3 will find “NewBar”.
14> gb_trees:get("Foo", Tree).
** exception error: no function clause matching gb_trees:get_1("Foo",nil)
15> gb_trees:get("Foo", Tree2).
"Bar"
16> gb_trees:get("Foo", Tree3).
"NewBar"
Aha!
Converting the Erlang directory walker to multi-process design
Monday, June 29th, 2009In my previous post I wrote a simple file system walker using Erlang for the purposes of getting my head around the syntax. To start thinking more in an Erlang mindset (though in a very contrived manner) in this post I convert the file system walker to use two processes. The first process (”walk”) performs the file system walking and the second (”visit”) processes each found item (i.e. prints the full item path name).
Something along the lines of this…

Since walk and visit are the process entry points they need to be in the export list (by the way – forgetting to do this does not cause a compiler error in erl – why is that?) and we need a new entry function to create the new processes (start/1).
start(Path) -> Visit_PID = spawn(walkproc, visit, []), spawn(walkproc, walk, [Path, Visit_PID]).
On line 2 we create the process whose pid is assigned to Visit_PID. This process calls the visit() function with 0 parameters. At this point visit is running and waiting to receive a message.
On line 3 we spawn off another process. This process calls the walk/2 function as walk(Path, Visit_PID). Since the walk function is doing the file system walking it makes sense that it will be the one sending the first message. Because of this it needs to know the PID of the process to send the message to.
visit is a very straight forward function. It receives a message, processes it, sends a response and recurses (rinse and repeat).
visit() -> receive {Path, Walk_PID} -> io:format("~s~n", [Path]), Walk_PID ! next, visit() end.
walk is very similar (and has not changed substantially from our previous version. it starts by firing a message off to visit with the Path passed as a parameter. Next it waits for the “next” message. Once it receives that it gets the next file system entry and recurses.
walk(Path, Visit_PID) -> Visit_PID ! {Path, self()}, receive next -> FileType = file_type(Path), case FileType of file -> ok; symlink -> ok; directory -> Children = filelib:wildcard(Path ++ "/*"), lists:foreach(fun(P) -> walk(P, Visit_PID) end, Children) end end.
The other methods (file_type and is_symlink) have not changed.
I enjoyed how easy it was to convert to a multi-process approach and am looking forward to moving to a solution that uses RabbitMQ.
Walking the directory tree in Erlang
Monday, June 29th, 2009I’m learning Erlang. I’ll get into “why” in some other post – the purpose here is to share my first sample program and solicit feedback. The purpose of the program is to start print the contents of a file system from the indicated point downwards (ignoring symlinks).
The application has a single module, walker, which exports walk/1. The argument to walk/1 is the starting path. For example:
> walker:walk("/home").
This method prints the path name, determines the type of the current path (file, directory or symlink), and then IFF the path is a directory it calls filelib:wildcard to get the children of the path and repeats the process on them.
-module(walker). -include_lib("kernel/include/file.hrl"). -export([walk/1]). is_symlink(Path) -> case file:read_link_info(Path) of {ok, #file_info{type = symlink}} -> true; _ -> false end. file_type(Path) -> IsRegular = filelib:is_regular(Path), case IsRegular of true -> file; false -> case is_symlink(Path) of true -> symlink; false -> directory end end. walk(Path) -> io:format("~s~n", [Path]), FileType = file_type(Path), case FileType of file -> ok; symlink -> ok; directory -> Children = filelib:wildcard(Path ++ "/*"), lists:foreach(fun(P) -> walk(P) end, Children) end.
My questions about this module are:
- Does calling walk(P) in a foreach prevent tail recursion optimizations?
- Where I have “case FileType of” (in walk/1) is there a more succinct way to express that?
- Why doesn’t read_file_info ever return file_info#type==symlink?
- How should this have really been done?
I’ll be working on answering #1-3 on my way to learning #4 – but if you have any feedback I would love to hear it.
The next step is to make this message based and have the walk/1 method send messages to a consumer who will do the printing.
The next-next step is to RabbitMQ and setup one producer and three consumers – one for files, one for directories and one for symlinks. The walk/1 method will no longer print the file info but rather send the appropriate message and let the consumers print the messages.
