Introduction:

RBTree is a TCL extension which adds a new data type to TCL.

The new data type is a red-black tree.  Roughly speaking, this is a cross between an array and an ordered list.  It can efficiently find a value in the tree, like an array.  But it can also find the closest value to value that is not in the tree, or all the values between two values, like an ordered list.  Most operations are O(log(n)).  It can provide an alternative to lsort which more appropriate in interactive programs.

This data type is implemented as a new TCL data type.  It may be stored in a variable, passed to or from a procedure, converted to a string, etc.

New procedures are added to access this data type.  They can give the appearance of a map (i.e. an array, or a keyed list), a multimap, a set, or a multiset (i.e. a bag).

The functionality is similar to extended TCL's keyed lists, but this library offers two key advantages. This library is thread safe.  And the performance for this library is better, as shown below.

This package includes source code.  It has been tested with various versions of Linux and gcc.  It should also work with MS Windows with MS Visual C++ 5.0, however I have not tested that recently.

Cost:

This software is free under the same license as TCL.

System requirements:

This extension requires TCL version 8.0 or later.  Get the latest version of TCL from http://www.tcl.tk/.

Build instructions:

Unix (gcc):

Unzip the source files then type
cc -fPIC tcl_rbtree.c rbtree.c -Wall -shared -o rbtree.so -O2
on the command line.  Copy rbtree.so to an appropriate place.

MS Windows (MS Visual C++):

Unzip the source files then type
cl /DUSE_TCL_STUBS tcl_rbtree.c rbtree.c /I<tcl include dir> <tcl lib dir>\tcl81.lib /GD /LD -o rbtree.dll /O2 -DMS_VC /W3 /link /NODEFAULTLIB:MSVCRT
on the command line.  Copy rbtree.dll to an appropriate place, such as C:\winnt.
If you need to work with version 8.0 of TCL, try this, instead.
cl tcl_rbtree.c rbtree.c /I<tcl include dir> <tcl lib dir>\tcl80.lib /GD /LD -o rbtree.dll /O2 -DMS_VC /W3

Use instructions:

Start by loading this extension
load ./rbtree.so
or
load rbtree
or
package require RBTree
then use the commands listed below to create and manipulate trees.

Most commands take a variable name, where the variable contains the tree, rather than passing the tree directly as an argument.  This was done for efficiency.  If exactly one variable contains the tree, the tree will be modified in place.  If a command took a tree as an argument and returned a tree as an argument, but the tree was stored in a variable, a copy of the tree would be made unnecessarily.  This is analogous to saying incr x rather than set x [expr {$x + 1}], except that the penalty for copying a tree is worse than the penalty for copying an integer.  It is perfectly acceptable to copy trees into variables, procedure arguments, procedure return values, list elements, array elements, etc.

Except as noted below, all actions require O(log (n)) time where n is the number of elements in the tree.

Implicit commands:

Copy:

Copying a tree takes O(n) time.  TCL's normal copy-on-write semantics apply.

Convert to string:

Converting a tree to a string takes O(n log(n)) time.  The format may change in the future.  Currently the format is set up to match the arguments of array set and array get, but possibly with one extra item.  If the sort type is not ascii, then the sort type will be appended to the end of the list.

Convert from string:

Converting a string to a tree takes O(n log(n)) time.  The format might change in the future, but will always be the same as the format for convert to string.

Convert from list:

Converting from a list is slightly faster than converting from a string.

Convert to list:

Converting a tree to a list is not recommended.  It will cause shimmering.

Destruction:

When the reference count of the tree goes to 0, it will decrement the reference count of each item in the tree.  That takes O(n) time.

General commands:

The following commands apply to all trees, no matter how they are used.

tree::create ?sort_by?

The create command creates and returns a new tree.  sort_by must be "integer", "real", or "ascii".
"ascii" sorts the keys in case sensitive alphabetical order.  This is the default value for sort_by.  It sorts like strcmp, except that the null character is not treated as a special character.  ascii allows anything to be a key, so an ascii tree can replace a TCL array. 
"integer" and "real" sort the keys in numeric order.  If you try to insert a key which is not a valid integer or real number, that will cause the operation to fail.  In addition to changing the sort order, these options will improve the performance of the tree.
%set my_tree [tree::create ascii]

tree::find

This command searches the tree for a key matching the specified criteria.  This command only looks at keys.
%set tree [tree::create integer]
%tree::find - tree NONE
NONE
%tree::find + tree NONE
NONE
%tree::set insert tree 2
%tree::set insert tree 4
%tree::set insert tree 6
%tree::find - tree NONE
2
%tree::find + tree NONE
6
%foreach key {1 2 3 4 5 6 7} {
    foreach compare {< <= == >= >} {
        puts -nonewline "[tree::find $compare tree $key _] $compare $key     "
    }
    puts {}
}
_ < 1     _ <= 1     _ == 1     2 >= 1     2 > 1
_ < 2     2 <= 2     2 == 2     2 >= 2     4 > 2
2 < 3     2 <= 3     _ == 3     4 >= 3     4 > 3
2 < 4     4 <= 4     4 == 4     4 >= 4     6 > 4
4 < 5     4 <= 5     _ == 5     6 >= 5     6 > 5
4 < 6     6 <= 6     6 == 6     6 >= 6     _ > 6
6 < 7     6 <= 7     _ == 7     _ >= 7     _ > 7

tree::find - var ?error_value?

This command returns the smallest key in the tree.  If the tree is empty, error_value is returned.  If error_value is needed but not supplied, the operation will fail.

tree::find + var ?error_value?

This command returns the largest key in the tree.  If the tree is empty, error_value is returned.  If error_value is needed but not supplied, the operation will fail.

tree::find < var key ?error_value?

This command returns the largest key which is less than the given key.  If no such key exists in the tree, error_value is returned.  If error_value is needed but not supplied, the operation will fail.
Roughly speaking, this function returns the key which comes immediately before the given key.  But it works even if the given key does not exist in the tree.

tree::find > var key ?error_value?

This command returns the smallest key which is greater than the given key.  If no such key exists in the tree, error_value is returned.  If error_value is needed but not supplied, the operation will fail.
Roughly speaking, this function returns the key which comes immediately after the given key.  But it works even if the given key does not exist in the tree.

tree::find == var key ?error_value?

The command returns the key which is equal to the given key.  If no such key exists in the tree, error_value is returned.  If error_value is needed but not supplied, the operation will fail.
Note that, if the given key is found in the tree, the key in the tree is returned, not the key given on the command line.
%tree::find == tree 4
4
%tree::find == tree 04
4
%tree::find == tree 0x4
4

tree::find <= var key ?error_value?

This command returns the largest key which is no larger than the given key.  If no such key exists in the tree, error_value is returned.  If error_value is needed but not supplied, the operation will fail.
Roughly speaking, this command is the same as tree::find == tree key [tree::find < tree key error_value].  It tries to find an exact match, but settles for the next lower value if there is no exact match.

tree::find >= var key ?error_value?

This command returns the smallest key which is no smaller than the given key.  If no such key exists in the tree, error_value is returned.  If error_value is needed but not supplied, the operation will fail.
Roughly speaking, this command is the same as tree::find == tree key [tree::find > tree key error_value].  It tries to find an exact match, but settles for the next lower value if there is no exact match.

tree::previous var key ?error_value?

This command is obsolete.  Use tree::find < instead.

tree::next var key ?error_value?

This command is obsolete.  Use tree::find > instead.

tree::closest var position key ?error_value?

This command is obsolete.  Use tree::find <= or tree::find >= instead.

View commands:

There are four different ways to view a tree.  There is only one underlying tree data type, so you may mix and match these views on the same tree.  But be careful; not all cases are defined for all views.  One view may put a tree into an unexpected state which may confuse another view.  That may cause an immediate error, or that may cause results which vary from one version to the next.

tree::map

A map looks just like the underlying tree structure, and a lot like a TCL array.  A map can be used to simulate / inspect all the other views, but it may be slower in some cases.

tree::map exists var key

This function returns the number of instances of key in the map.  Like an array, each key can only exist 0 or 1 times in a map.
%set my_map [tree::create ascii]
%tree::map insert my_map 1 one
%tree::map insert my_map 2 two
%tree::map insert my_map 2 II
%tree::map exists my_map 1
1
%tree::map exists my_map 2
1
%tree::map exists my_map 3
0

tree::map insert var key data

This procedure inserts a mapping between the given key and the given data into the tree.  If there is already a mapping from the given key, that mapping is deleted before the new mapping is added.

tree::map delete var key

This procedure deletes the mapping from the given key, if one exists.  It returns the number of mappings deleted.  For a map, this will always be 0 or 1.
%tree::map delete my_map 2
1
%tree::map delete my_map 3
0
%tree::map exists my_map 1
1
%tree::map exists my_map 2
0
%tree::map exists my_map 3
0

tree::map foreach var key_var data_var ?from to? body

This procedure iterates over the elements of the mapping in order.  For each pair, it assigns the key to key_var and it assigns the data to data_var then it executes the body.  key_var and data_var may each be the null string, in which case it is ignored.
%set new_map [tree::create ascii]
%tree::map insert new_map 1 one
%tree::map insert new_map 2 two
%tree::map insert new_map 3 three
%tree::map insert new_map 5 five
%tree::map insert new_map 6 six
%tree::map foreach new_map key data {puts "$key -> $data"}
1 -> one
2 -> two
3 -> three
5 -> five
6 -> six
If from and to are specified, the loop will efficiently skip all keys before from and after to.  From and to do not have to be in the map.  If to comes before from, the body will not be executed at all.  If from is the same as to, and that element is a key, then the body will be executed exactly once.
%tree::map foreach new_map key data  4 6 {puts "$key -> $data"}
5 -> five
6 -> six
break, continue, error, and return act just like they would with the built in foreach command.
This command copies the tree before executing the body for the first time.  The body is free to modify the tree variable as it chooses.  That will not affect the number of times that the loop is executed or the values of the key_var or data_var.
%tree::map foreach new_map key data {puts "$key -> $data";tree::map insert new_map $data $key}
1 -> one
2 -> two
3 -> three
5 -> five
6 -> six
%tree::map foreach new_map key data {puts "$key -> $data";tree::map insert new_map $data $key}
1 -> one
2 -> two
3 -> three
5 -> five
6 -> six
five -> 5
one -> 1
six -> 6
three -> 3
two -> 2
foreach takes O(m log(n)) time where m is the number of iterations, and n is the total number of elements in the map.  If from and to are selected such that nothing is executed, there is still a cost of O(log(n)) time.  Note the performance penalty for modifying the tree while iterating over it.

tree::map keys var

This returns a list of all the keys in the map, in order.
%tree::map keys new_map
1 2 3 5 6 five one six three two

tree::map data var

This returns a list of all the data associate with the keys in the map.  The data items are ordered by their keys.
%tree::map data new_map
one two three five six 5 1 6 3 2

tree::map value var key ?error_value?

Given a key this looks up and returns the corresponding data.  If the key is not in the map, error_value is returned.  If error_value is required, but not specified, an error is signaled.
%tree::map value new_map one {I don't know!}
1
%tree::map value new_map five {I don't know!}
5
%tree::map value new_map {five and a half} {I don't know!}
I don't know!
 

tree::map empty var

This returns 1 if the map is empty, or 0 if the map is not empty.  This takes O(1) time to execute.

tree::map count var

This returns the number of keys in the map.  This takes O(n) time to execute.

tree::set

A set is a simplified version of a map.  Use a set when the data doesn't matter, only the keys.  A set is implemented as a map, where each key maps to itself.

tree::set exists var key

This returns the number of times that the given key exists in the given set, which will always be 0 or 1.

tree::set insert var key

This inserts the given key into the given set.

tree::set delete var key

This deletes the given key from the given set, if it exists.  It returns the number of elements deleted, which will always be 0 or 1.

tree::set foreach var key_var ?from to? body

This works just like tree::map foreach, except that the data_var is omitted because it is meaningless for a set.

tree::set keys var

This returns a list of all the keys in the set, in order.

tree::set empty var

This returns 1 if the set is empty, or 0 if the set is not empty.  This takes O(1) time to execute.

tree::map count var

This returns the number of items in the set.  This takes O(n) time to execute.

tree::multiset

A multiset is similar to a set, except that each key can appear more than once.  It is implemented by a map which maps members of the multiset to the number of times which they exist in the multiset.  The multiset type was introduced to this extension for performance reasons.  It is analogous to adding incr to TCL.  It is slightly better, because all unknown keys are assumed to be zero, whereas it is an error to use incr on an uninitialized variable.

tree::multiset exists var key

This returns the number of times that the given key exists in the given multiset, which will always be a natural number.

tree::multiset insert var key

This inserts the given key into the given multiset.

tree::multiset delete var key

This deletes one copy of the given key from the given set, if it exists.  It returns the number of elements deleted, which will always be 0 or 1.

tree::multiset foreach var key_var ?from to? body

This works just like tree::set foreach.
%set my_bag [tree::create ascii]
%tree::multiset insert my_bag 1
%tree::multiset insert my_bag 2
%tree::multiset insert my_bag 2
%tree::multiset insert my_bag 3
%tree::multiset insert my_bag 3
%tree::multiset insert my_bag 3
%tree::multiset foreach my_bag item {puts $item}
1
2
2
3
3
3

tree::multiset keys

This returns a list of all the keys in the multiset, in order.
%tree::multiset keys my_bag
1 2 2 3 3 3

tree::multiset empty var

This returns 1 if the multiset is empty, or 0 if the multiset is not empty.  This takes O(1) time to execute.

tree::multiset count var

This returns the number of items in the multiset.  This takes O(n log(n)) time to execute, where n is the number of unique elements.
% tree::multiset insert tree a
% tree::multiset insert tree a
% tree::multiset insert tree b
% tree::multiset insert tree a
% tree::multiset insert tree b
% tree::map count tree
2
% tree::multiset count tree
5

tree::multimap

A multimap is a set of mappings where keys may be repeated.  Just as a multiset is really a mapping from keys to integers, where the default value is 0 and insert does an incr, multimap is really a mapping from keys to lists, where the default value is the empty list, and insert does an lappend.  Again, this could be done directly with a map, but this is more efficient.
 

tree::multimap exists var key

This returns the number of mappings from the given key in the given multimap.  This will be a natural number.

tree::multimap insert var key data

This inserts a mapping from the given key to given data in the given multimap.  This will not remove any mappings.

tree::multimap delete var key

This will try to delete one mapping from the given key in the given multimap.  This will return the number of deletions made, which will be 0 if the key does not exist, or 1 if it does.  It is not specified which mapping will be deleted if more than one exists.

tree::multimap foreach var key_var ?from to? body

This works just like tree::map foreach.

tree::multimap keys var

This returns a list of all keys in the given mapping.  They are sorted.  Each key appears once for every mapping.

tree::multimap data var

This returns a list of the data in each mapping.  The mappings are processed in the same order as in tree::multimap.

tree::multimap value var key ?error_value?

This works like tree::map value.  If there is more than one mapping from the given key, one is chosen arbitrarily.
Multiple identical calls to this function will return the same value.  Furthermore, a call to delete will delete the value which this function returned. (So you can iterate by asking for a value then deleting that value.)  Once the tree has been modified, there is no guarantee of what value will be reported next.  Also, the selection of values may be implementation dependent and may change in the next version.

tree::multimap empty var

This returns 1 if the multimap is empty, or 0 if the multimap is not empty.  This takes O(1) time to execute.

tree::multimap count var

This returns the number of items in the multimap.  This takes O(n log(n)) time to execute, where n is the number of unique keys.
% set tree {}
% tree::multimap insert tree 2 a
% tree::multimap insert tree 2 b
% tree::multimap insert tree 3 c
% tree::multimap insert tree 3 d
% tree::multimap insert tree 3 e
% tree::multimap insert tree dup f
% tree::multimap insert tree dup f
% tree::multimap insert tree dup f
% tree::multimap count tree
8
% tree::multimap keys tree
2 2 3 3 3 dup dup dup
% tree::multimap data tree
a b c d e f f f
% llength [tree::multimap data tree]
8

Performance:

Motivation:

The primary goal of this extension was to make a simple hybrid of the array and list types.  It should never be significantly worse in performance than the better of these two.  When an array or list is the ideal structure, that data structure can be 2-3 times faster than the red-black tree (i.e. set roman(1) I or lappend my_sorted_list $new_last_value).  In other cases the red-black tree can more appropriate and is faster than an array or a sorted list.
Another goal of this extension is to provide a way to keep data sorted in an interactive program.  lsort will typically run faster than a red-black tree if all the values are known in advance and the list is sorted only once.  But if the data is constantly changing, and lsort must be called multiple times, then the red-black tree will do better.  Finally, a red-black tree can give better user response time than lsort because the work is spread out rather than all in one command.

O(log(n))

Big "O" notation describes the way that the performance of a function degrades as the input grows.  Roughly speaking, if you time an operation on a tree of 100 items, and the operation takes O(log(n)) time, then you'd expect the same operation to take about twice as long on a tree of 10,000 items, three times as long on a tree of 1,000,000 items, etc.  In fact, in the case of a red-black tree with 100 items, most of the time is spent in overhead tasks.  So the larger trees would take less time than described above.
The average person would take O(log(n)) time to find a phone number in a phone book with n names.  A reverse lookup--trying to find the name given a phone number--would take O(n) time.

Case study:

Recently someone asked the comp.lang.tcl newsgroup what was the best way to count the number of occurrences of an item in a list, without knowing in advance what items were in the list.  People posted a variety of responses, each attempting to be faster than the next.  An exact winner was not clear because so many factors could change the timing.
I solved the same problem with a multiset.  This was clearly faster than any of the standard TCL solutions.  This problem just looked more like a multiset than like any standard TCL data structure.  More importantly, my solution was simple.  The most obvious of the standard TCL solutions is by far the worst.  People spent a lot of time profiling and optimizing the other solutions.  The timing of the optimized solutions changes drastically if you change the number of distinct items in the example.
The code for this case study is listed below.  count_members1 is the obvious standard TCL solution.  count_members6 is the red-black tree solution.  The others are various optimized solutions that use only standard TCL.  All code except the red-black tree solution came from various contributors to comp.lang.tcl.
proc count_members1 list {
    foreach member $list {
        if [info exists count($member)] {
            incr count($member)
        } else {
            set count($member) 1
        }
    }
}
proc count_members2 list {
    foreach x $list {
        if {[catch {incr count($x)}]} {set count($x) 1}
    }
}
proc count_members3 list {
    foreach x $list {
        expr {[catch {incr count($x)}] && [set count($x) 1]}
    }
}
proc count_members4 list {
    foreach x $list {
        lappend ulist($x) {}
    }
    foreach name [array names ulist] {
        set count($name) [llength $ulist($x)]
    }
}
proc count_members5 list {
    foreach x $list {append ulist($x) .}
    foreach name [array names ulist] {
        set count($name) [string length $ulist($x)]
    }
}
proc count_members6 list {
    set count [tree::create ascii]
    foreach x $list {
        tree::multiset insert count $x
    }
}
# build a list of 10,000 items
set items [list john paul jones mary]
for {set i 0} {$i<10000} {incr i} {
    lappend data [lindex $items [expr {int(rand()*[llength $items])}]]
}
 
puts "[info patchlevel] over $tcl_platform(os) $tcl_platform(osVersion)."
foreach proc [info proc count_members*] {
    puts ""
    puts "$proc"
    puts [time {$proc $data} 10]
}

Compared to keyed lists:

While this library offers simial functionality to keyed lists, the performance is signficantly different on large lists.
% package require RBTree
1.2.0
% package require Tclx
8.4
% set itree [tree::create integer]
integer
% set tree {}
% set keyd {}
% time {for {set i 1} {$i < 10000} {incr i} {expr {$i*2}}}
5549 microseconds per iteration
% time {for {set i 1} {$i < 10000} {incr i} {tree::map insert itree $i [expr {$i*2}]}}
17671 microseconds per iteration
% time {for {set i 1} {$i < 10000} {incr i} {tree::map insert tree $i [expr {$i*2}]}}
29244 microseconds per iteration
% time {for {set i 1} {$i < 10000} {incr i} {keylset keyd $i [expr {$i*2}]}}
265642 microseconds per iteration
% time {for {set i 1} {$i < 100000} {incr i} {expr {$i*2}}}
51731 microseconds per iteration
% time {for {set i 1} {$i < 1000000} {incr i} {tree::map insert itree $i [expr {$i*2}]}}
2009698 microseconds per iteration
% time {for {set i 1} {$i < 100000} {incr i} {tree::map insert tree $i [expr {$i*2}]}}
327876 microseconds per iteration
% time {for {set i 1} {$i < 100000} {incr i} {keylset keyd $i [expr {$i*2}]}}
46953868 microseconds per iteration
%
etc.
Relative times (smaller is better):
10 items1,000 items10,000 items100,000 items
No storage0.00130.10251.00009.3226
Integer tree0.00210.40193.184559.0874
String tree0.00320.48845.2701362.1730
Keyed list0.00350.789947.87208461.6810

Limitations:

Revision history:

Version

Comments

1.0

Initial release.

1.0.1

The source code is now more portable.  The distribution now includes a precompiled .DLL for MS Windows.

1.1

Added the ability to sort by integers, real numbers, or strings.  The source code is now written in C, not C++.  The command tree::find was added.  The precompiled .DLL is now built with the stubs library, and should work for more versions of TCL.

1.1.1

Minor changes to the documentation.  No changes to the source or object code.

1.1.2

Changed to a less restrictive license.  Minor changes to the documentation and the comments in the source code.  No changes to the object code.

1.2

  • The sort type is now optional in both the tree::create command and the implicit create from string operation.
  • This now calls Tcl_PkgProvide so it can be loaded as a package.
  • Added the "empty" and "count" funcitons.
  • This version is thread safe.
  • This version no longer contains a precombiled DLL for Windows.  Download RBTree-1.1.2.zip if you need that.
  • Minor changes to avoid picky compiler warning messages.

Contact information:

Author:

This software was written by Philip Smolen.
Personal Info
Work Info
If you like and/or use this extension, please let me know.

Download:

http://www.trade-ideas.com/users/phil/rbtree/