Syntax Highlighing for Erlang in NotePad++
Wednesday, July 8th, 2009Update: The definition has been updated to include support for atoms, variables and function names as well as additional file extensions. Screen shot and downloadable content have been updated.
Thus far I’ve done all of my Erlang development on Fedora using vim or KWrite (which does a decent job in Ruby mode).
But today I found myself on a windows box and wanted a basic syntax highlighting editor for Erlang that was free and worked on Windows. Oh – and not Eclipse+Erlide. I wanted something small and fast.
I grabbed the “free as in beer” and “free as in speech” editor NotePad++ and created a simple syntax file that is a bit hokey but will serve my needs fine.
Here’s a screen shot …

NotePad++ has pretty weak syntax highlighting but was sufficent to do most of what I wanted. Some regex based rules would make this a more robust.
Highlighted entities include
- Erlang reserved words (and named operators)
- Variables
- Atoms
- function names (same coloring as atoms)
- Operators
- Comments
- Kernal, stdlib, mnesia and odbc modules.
- Support for *.erl, *.hrl and *.htp extentions
I’ve probably missed several things.
Looks a lot better than nothing and it took all of 10 15 minutes.
If you are using NotePad++ here is the file:
http://www.roberthorvick.com/wp-content/uploads/2009/07/erlangSyntaxDefinition.zip
And here are the instructions on how to install it:
http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=Syntax_Highlighting_Sharing
And here’s the instructions on modifying or creating your own:
http://sourceforge.net/apps/mediawiki/notepad-plus/index.php?title=User_Defined_Languages
Startup and Cleanup functions in Erlang EUnit tests
Monday, July 6th, 2009I’ve got the basic grasp on Erlang but before I can do anything meaningful I need to learn how to test my code EUnit is the de-facto test framework. There are a ton of great sites that talk about testing with EUnit – I have linked to several at the end of this post. I’m going to focus on the problem I ran into – needing to run some startup and cleanup code around the test suite.
I started by writing a very simple dictionary that lived in it’s own process. The interface is:
start() -> ok stop() -> ok write(Key, Element) -> ok delete(Key) -> ok read(Key) -> {ok, Value} | not_found
Calling start() spawns and registers the process the dictionary lives in. Calling stop ends the process. The rest do what you expect.
I started by writing some basic tests for each function but I didn’t really think it through. The start test ran somewhere in the middle so all the tests prior to start either failed or timed out (because I was waiting on a receive that would never come since the process wasn’t started). Very quickly I smacked my forehead and said “I need the test suite to have a pre and post function to start and stop the process”.
It took about 15 minutes of reading to realize that I needed to create a test generator with a setup tuple that defined the startup, cleanup and list of tests to run.
Conceptually it was easy to understand but getting the syntax right was a royal pain. Basically you do this (I’m not sure if giving the function arity is the preferred method but it saved some keystrokes so I went with it):
my_db_test_() -> {spawn, {setup, fun start/0, fun stop/1, [ fun write_new/0, fun write_existing/0, fun read_existing/0, fun read_missing/0, fun delete_existing/0, fun delete_missing/0 ] } }.
The test methods are exactly what you’d expect so I won’t repeat them. The syntax feels a little foreign and uncomfortable but it’s very easy to understand and I totally get why it’s written this way. I’m just so used to MSTest attributes and RSpec that this new way is going to take a night or two to feel like an old friend.
Some Eunit resources
Prag Dave – Test-First Word Wrap in Erlang
The last word frequency post – from dict to ets
Friday, July 3rd, 2009One last iteration through my learning exercise of building a word frequency list. In this last post I’m moving away from a dict and to an ets table. I was pleasantly surprised how easy the conversion was. For example printing the output was just converting from dict:fold to ets:foldl. The one parity fail was that dict:update can take an initial value when the key is missing but ets:update_counter (nor any other ets function) has this benefit. This required that I write a little wrapper function to call from the list:foldl (instead of having a multi-line inlined fun).
No point in getting too deep into this – here’s the code:
-module(wordets). -export([print_word_counts/1]). words(String) -> {match, Captures} = re:run(String, "\\b\\w+\\b", [global,{capture,first,list}]), lists:append(Captures). %% reads the next line from the file. If there is data then... %% split the data into a list of words and add to the word table process_each_line(IoDevice, Table) -> case io:get_line(IoDevice, "") of eof -> file:close(IoDevice), Table; {error, Reason} -> file:close(IoDevice), throw(Reason); Data -> NewTable = lists:foldl( fun(W, T) -> update_word_count(W, T) end, Table, words(Data)), process_each_line(IoDevice, NewTable) end. update_word_count(Word, Table) -> case ets:lookup(Table, Word) of [{Word, _}] -> ets:update_counter(Table, Word, 1); [] -> ets:insert(Table, {Word, 1}) end, Table. print_words(Words) -> ets:foldl(fun({W,C}, AccIn) -> io:format("~s: ~w~n", [W, C]), AccIn end, void, Words). %% opens the indicated file, processes the contents and prints %% out the word/count pairs to stdout print_word_counts(Filename) -> {ok, IoDevice} = file:open(Filename, read), Words = process_each_line(IoDevice, ets:new(words, [])), print_words(Words).
The ets implementation feels a bit forced (which it was – the point was to learn another module). I don’t think I’d have gone this way in practice unless I wanted to persist the frequency data to a file or if the word data were more complex (for example if I were storing information about where in the file the word was, word neighbors, etc).
Enough of this sample. On to something more substantial.
Adding a custom protocol to Firefox for easy Erlang docs
Friday, July 3rd, 2009To make getting to the Erlang docs a bit quicker I added a custom protocol handler to firefox so when I go to:
I get redirected to
http://www.erlang.org/doc/man/lists.html
To do this I first created a little shell script that would take the address and spawn firefox with the appropriate url. (insert caveat about this being the first bash script I’ve written in 10 years here …).
#!/bin/bash ERLANG_TOPIC=`echo $1 | sed 's/erlang:\/\///'` exec firefox http://www.erlang.org/doc/man/$ERLANG_TOPIC.html
Next I marked it as executable:
chmod +x erlang.protocolFinally I followed these directions and added the following config settings in firefox:

Since the Erlang/OTP doc pages match their module names this works pretty well.
For queries where I’m not sure what I want I added a Erlang specific google search to my Erlang Resources page.
Word frequency redux – Erlang list comprehension, regex and list folding
Thursday, July 2nd, 2009Sean Cribbs was nice enough to point out a pair of changes I could make to my word frequency counter from last time.
Based on his feedback I made three changes. First – the regular expression code has changed from this:
matches(H,{match,M}) -> matches(H,M,[]). matches(_,[],Acc) -> Acc; matches(H,[{I,L}|T],Acc) -> matches(H,T,[lists:sublist(H,I,L)|Acc]). words(String) -> matches(String,regexp:matches(String, "[A-Za-z0-1]+")).
to this:
words(String) -> {match, Captures} = re:run(String, "\\b\\w+\\b", [global,{capture,first,list}]), [hd(C) || C<-Captures].
That last line took me a bit to grok. It’s a list comprehension (if you are reading Joe Armstrong’s thesis it is section 3.3.13. In Erlang Programming it is chapter 9.3). Basically it’s saying “for each list in the list of matches take the head of the list” – a-gigga-wah?
Ok. Let’s go to erl.
7> re:run("foo foo bar", "\\b\\w+\\b", [global,{capture,first,list}]).
{match,[["foo"],["foo"],["bar"]]}
Observe that re:run returns a nested list (i.e. a list of lists) – and each list has exactly one element (the string [which is itself a list but I'll cal them strings]). What we want to do is take that list-of-lists-of-strings and turn it into a list-of-strings.
That’s what “[hd(C) || C<-Captures].” does – it pulls every capture (a word wrapped in a list) from the match list and runs it through erlang:hd which pulls the word from the list – then it gets added to the resulting list. So we end up with a list strings.
It’s un-nesting the list.
Next Sean suggested “Then I’d probably use some kind of key-value structure, like a proplist or dict, to count the words using a lists:foldX function.”
so I fired up “erl -man lists” to learn what foldX meant (actually “foldl” “foldr” depending on whether you want to fold from the left or right.
In a nut shell folding is iterates over a list calling a fun that takes the current value and an accumulator and which returns the new accumulator. An example from the man page is:
> lists:foldl(fun(X, Sum) -> X + Sum end, 0, [1,2,3,4,5]).
15
I spent some time thinking and after some trial and error came up with this:
lists:foldl(fun(W, Dict) -> dict:update(W, fun(C) -> C + 1 end, 1, Dict) end, dict:new(), ["foo", "foo", "bar"]). %% sample input
In a nutshell – for every word in the list update the dictionary by calling the fun which increments the count value, setting the initial count to 1 if the value does not already exist in the dictionary (and starting with an empty dictionary).
After these three changes the new program is about half the size of the previous and really only has a few interesting lines surround by nearly error and flow control.
Thanks Sean!
The new code …
-module(wordlist). -export([print_word_counts/1]). words(String) -> {match, Captures} = re:run(String, "\\b\\w+\\b", [global,{capture,first,list}]), [hd(C) || C<-Captures]. %% reads the next line from the file. If there is data then... %% split the data into a list of words and add those to the word dict process_each_line(IoDevice, Dict) -> case io:get_line(IoDevice, "") of eof -> file:close(IoDevice), Dict; {error, Reason} -> file:close(IoDevice), throw(Reason); Data -> NewDict = lists:foldl( fun(W, D) -> dict:update(W, fun(C) -> C + 1 end, 1, D) end, dict:new(), words(Data)), process_each_line(IoDevice, NewDict) end. print_dict(Dict) -> dict:fold(fun(Word, Count, AccIn) -> io:format("~s: ~w~n", [Word, Count]), AccIn end, void, Dict). %% opens the indicated file, processes the contents and prints %% out the word/count pairs to stdout print_word_counts(Filename) -> case file:open(Filename, read) of {ok, IoDevice} -> Dict = process_each_line(IoDevice, dict:new()), print_dict(Dict); end.
Finding word frequencies using Erlang
Thursday, July 2nd, 2009Here’s a follow-up post that provides a better implementation.
I wanted to try and bite off something a little larger today hitting a few areas that seem generally useful:
- Basic text file operations
- Basic string operations
- Using the gb_trees module
- Avoiding any usage of lists:foreach; instead using tail recursion (that whole “thinking in Erlang” thing).
The problem is to produce the word frequency of a specific text file and to print out the frequency information. For example if a file overdunn.txt contained the text:
“Dunn was over Unger and I was over Dunn.” (Capt. Oveur, Airplane II: The Sequel)
The output would be:
Dunn: 2
I: 1
Unger: 1
and: 1
over: 2
was: 2
Word frequencies have some practical uses (fuzzy text matching, building tag clouds, etc). So while it’s a bit contrived it is the basis for something useful.
The code doesn’t need a lot of explanation – so here you go …
-module(wordlist). -export([print_word_counts/1]). %% matches/* and words/1 From: http://www.trapexit.org/Matching_Words matches(H,{match,M}) -> matches(H,M,[]). matches(_,[],Acc) -> Acc; matches(H,[{I,L}|T],Acc) -> matches(H,T,[lists:sublist(H,I,L)|Acc]). words(String) -> matches(String,regexp:matches(String, "[A-Za-z0-1]+")). %% builds a tree of word/count pairs. If the word does not exist in %% the tree it is added with an initial value of 1. If the word does %% exist the count is retrieved and incremented build_word_tree([], Tree) -> Tree; build_word_tree([W|R], Tree) -> case gb_trees:is_defined(W, Tree) of true -> Count = gb_trees:get(W, Tree), NewTree = gb_trees:update(W, Count + 1, Tree), build_word_tree(R, NewTree); false -> NewTree = gb_trees:insert(W, 1, Tree), build_word_tree(R, NewTree) end. %% reads the next line from the file. If there is data then... %% split the data into a list of words and add those to the word tree process_each_line(IoDevice, Tree) -> case io:get_line(IoDevice, "") of eof -> file:close(IoDevice), Tree; {error, Reason} -> file:close(IoDevice), throw(Reason); Data -> NewTree = build_word_tree(words(Data), Tree), process_each_line(IoDevice, NewTree) end. %% walks the gb_tree and prints each word/count pair print_tree(Iter) -> case gb_trees:next(Iter) of none -> ok; {Key, Val, NewIter} -> io:format("~s: ~w~n", [Key,Val]), print_tree(NewIter) end. %% opens the indicated file, processes the contents and prints %% out the word/count pairs to stdout print_word_counts(Filename) -> case file:open(Filename, read) of {ok, IoDevice} -> Tree = process_each_line(IoDevice, gb_trees:empty()), print_tree(gb_trees:iterator(Tree)); {error, Reason} -> io:format("~s~n", [Reason]) end.
As usual – I’m just getting started with Erlang. What is the good, bad and ugly with this code?
Aha Moments – gb_trees:update and thinking in Erlang
Wednesday, July 1st, 2009The first episode of The X-Files I ever saw was the Home. I was hooked immediately – and at the same time somewhat disappointed. This was before on-demand TV, hulu or DVD compilations on NetFlix. I had missed more than 3 seasons of a what was clearly going to be one of my favorite shows and there the only way to catch up was the hope that the next rerun would be one I hadn’t seen. Then it got all weird and aliens were showing up too often, Mulder basically disappeared and The Lone Gunman got their own spinoff …
Learning Erlang is a little like seeing the first episode. I’m hooked immediately – and at the same time somewhat disappointed. I’m years behind again. At least this time I don’t need to fret over not being able to find the reruns I want and there is very little risk that Dean Haglund will make an appearance (I bet Langly knew Erlang).
A few minutes ago I had my first significant Aha! moment. In my effort to map what I do know (procedural/OO paradigms) to Erlang I wondered “how would a balance tree be implemented in Erlang”. I binged (yeah, I said it) for “Erlang balanced trees” and quickly found the gb_trees module.
What I wanted to understand was:
- How is the tree represented in Erlang?
- How is the tree (and it’s nodes) modified (insert, update, delete, etc)?
- Can processes share the same tree (and how is concurrency handled)?
How is the tree represented in Erlang?
This took me the longest to understand – but once it fell into place everything else made so much more sense. If you are curious go ahead and open up gb_trees.erl (on my machine it’s in /usr/lib/erlang/lib/stdlib-1.15.5/src) – now go to the definition of the node and tree types. Go ahead… I’ll wait.
Found them yet? No? Maybe it’s in some header file. Feel free to check … in the meantime I’ll continue.
Ready for this? There isn’t one. You don’t need to define a tree or node type – it’s just a tuple that gets passed around and interpreted at runtime. Not quite clear? OK – if we were going to define it, it would look something like this:
{Size, {Key, Value, Smaller, Bigger}}
- Size is the number of nodes in the tree (which is used for balancing).
- Key is the key of the current node.
- Value is the value associated with Key.
- Smaller is the nodes that are smaller (i.e. “to the left”).
- Bigger is nodes that are bigger (”to the right”).
These functions really helped make it much clearer:
%%-spec(empty/0 :: () -> gb_tree()). empty() -> {0, nil}. %%-spec(is_empty/1 :: (gb_tree()) -> bool()). is_empty({0, nil}) -> true; is_empty(_) -> false. %%-spec(size/1 :: (gb_tree()) -> non_neg_integer()). size({Size, _}) when is_integer(Size), Size >= 0 -> Size.
“size” says so much it’s wonderful. Besides the obvious of getting the size of the tree it also validates that the size is greater-than-or-equal-to zero otherwise there will be an exception. This is a great example of Erlang failing fast. Try it out:
2> gb_trees:size({-1, {[]}}).
** exception error: no function clause matching gb_trees:size({-1,{[]}})
See – the clause doesn’t match and an exception is thrown. Think of the comparable C++ code. It would look something like:
int size(Tree *tree) { if (tree == null) { throw ...; } if(tree->size < 0) { throw ...; } return tree->size; }

How is the tree modified?
I think gb_trees:update is the most approachable example of this. Besides being crazy small it is a very clear example of how the tree is walked to search for the key and how error handling is meant to be done.
%%-spec(update/3 :: (_, _, gb_tree()) -> gb_tree()). update(Key, Val, {S, T}) -> T1 = update_1(Key, Val, T), {S, T1}. %% See `lookup' for notes on the term comparison order. update_1(Key, Value, {Key1, V, Smaller, Bigger}) when Key < Key1 -> {Key1, V, update_1(Key, Value, Smaller), Bigger}; update_1(Key, Value, {Key1, V, Smaller, Bigger}) when Key > Key1 -> {Key1, V, Smaller, update_1(Key, Value, Bigger)}; update_1(Key, Value, {_, _, Smaller, Bigger}) -> {Key, Value, Smaller, Bigger}.
Notice first that since update is never changing the size it does not bother to pass it to update_1 – only the “node” portion of the tuple is passed.
Once passed there are four possible options – three of which need to be codified (in the same order as the code):
- The provided key is smaller than the current tree node. In this case build up a new result and call update_1 with update_1(Key, Value, Smaller) – “walk left”
- The provided key is larger than the current tree node. Walk right.
- The provided key is exactly the current node key (not too big … not to small … just right!). In this case build up a new node with the new value.
- (implicit) The provided key is not in the tree. Throw an exception.
In the case where the node is not found update_1 ends up being called like:
update_1(Key, Value, nil)
Since the match on #1-3 require a non-nil tuple this turns into a missing function clause error.
Can processes share the same tree?
No. No they can’t. I was having a little trouble groking this until I ran through some samples in erl. This is what made it clear for me:
1> Tree = gb_trees:empty().
{0,nil}
2> Tree2 = gb_trees:insert("Foo", "Bar", Tree).
{1,{"Foo","Bar",nil,nil}}
3> Tree3 = gb_trees:update("Foo", "NewBar", Tree2).
{1,{"Foo","NewBar",nil,nil}}
4> Tree4 = gb_trees:update("Missing", "value", Tree3).
** exception error: no function clause matching gb_trees:update_1("Missing","value",nil)
in function gb_trees:update_1/3
in call from gb_trees:update/3
Each time anything was done a new tree was created – the old tree was still valid and could be used but it was not the new tree. It’s the old tree. This boggled me for a few minutes until Justin Sheehy confirmed what the code was saying.

So no – multiple processes could not possibly share the tree because the tree is immutable.
Want proof? Go back to erl and run gb_trees:get on the key “Foo” on Tree, Tree2 and Tree3. Tree will error (it’s an empty tree). Tree2 will find “Bar” and Tree3 will find “NewBar”.
14> gb_trees:get("Foo", Tree).
** exception error: no function clause matching gb_trees:get_1("Foo",nil)
15> gb_trees:get("Foo", Tree2).
"Bar"
16> gb_trees:get("Foo", Tree3).
"NewBar"
Aha!
