No More Array Errors (Part II) - Array_alg(3C++)

Array_set insertion

The first Array_set operation whose implementation is interesting from an Array Algorithms perspective is Array_set insertion, which we will develop from the following outline:

        const T* insert(const T& t,int count=1){
           check();
           T* result = 0;
           if(count>0){
                grow b if necessary    if( t is not already present ){
                    insert t into b    }
           }
           check();
           return result;
       }

First note that insert() calls check() twice: once on entry and once again on exit. The entry call is made to protect insert() against errors made by other Array_set operations; the exit call is made to make sure that insert() hasn't introduced any errors of its own. As specified in Set(3C++), a successful insertion returns a pointer to the new element, while an unsuccessful insertion returns zero.

Before performing the actual insertion, we must make sure that b has at least one free cell. Stated another way, we must make sure that b[n] refers to a free cell. This is precisely what the Block reserve() function does:

        const T* insert(const T& t,int count=1){
           check();
           T* result = 0;
           if(count>0){
               b.reserve(n);
               if( t is not already present ){
                    insert t into b             }
           }
           check();
           return result;
       }

To determine whether an element equal to t is already present in b, we can use the Array Algorithm bin_loc(), one of two algorithms described in bin_loc(3C++):

        const T* insert(const T& t,int count=1){
           check();
           T* result = 0;
           if(count>0){
               b.reserve(n);
               const T* temp = bin_loc(t,&b[0],&b[n]);
               if(temp == &b[0] || *(temp-1) != t){
                    insert t into b             }
           }
           check();
           return result;
       }

Because bin_loc() does not modify its input array, its parameters are declared as pointers-to-const. Its result is also a pointer-to-const because bin_loc() would otherwise provide a way for clients to breach constness without generating a compiler warning. bin_loc() returns a pointer-to-const to the leftmost element (that is, the element with lowest array subscript) greater than t; stated another way, bin_loc() tells where t belongs if it is not currently present. The if-condition is true precisely when t is missing, so temp is the desired return value:

        const T* insert(const T& t,int count=1){
           check();
           T* result = 0;
           if(count>0){
               b.reserve(n);
               const T* temp = bin_loc(t,&b[0],&b[n]);
               if(temp == &b[0] || *(temp-1) != t){
                   result=(T*)temp;
                    insert t into b             }
           }
           check();
           return result;
       }

Notice that before we can assign to result, we must cast away the constness of temp. This will be necessary anyway, since the actions that follow must assign to *result.

Finally, t must be stored in the cell pointed to by result. To free this cell for assignment, we must first the elements in locations result, result+1, ... b+n-1 one cell to the right (that is, toward higher array indices). Rather than writing a loop to do this, we use the Array Algorithm copy(), described in copy(3C++):

       const T* insert(const T& t,int count=1){
           check();
           T* result = 0;
           if(count>0){
               b.reserve(n);
               const T* temp = bin_loc(t,&b[0],&b[n]);
               if(temp == &b[0] || *(temp-1) != t){
                   result = (T*)temp;
                   copy(result,&b[n],result+1);
                   n++;
                   *result = t;
               }
           }
           check();
           return result;
       }

copy() uses an algorithm that works even when arrays overlap (as they do in this example), or when arrays are empty (which would be the case if result==&b[n]). In fact, copy() is the only Array Algorithm that handles overlapping arrays correctly.

copy() illustrates another Array Algorithms convention, namely, that the second pointer identifying an array is omitted whenever its value can be deduced. In the case of copy(), the second array (result+1) must have the same size as the first array, so a pointer to the beginning of the second array is enough to identify it.

The version of insert() just developed is admittedly long and complicated. We developed this version intentionally in order to elucidate important points about Array Algorithms. Now that these points have been made, let's look at a much simpler version.

The simpler version uses set_insert(), one of several Array Algorithms that ``understand'' how to treat arrays as sets. set_insert() is described in set_insert(3C++). The other algorithms that know how to treat arrays like sets are described in set_diff(3C++), set_inter(3C++), set_sdiff(3C++), set_remove(3C++), and set_union(3C++).

        const T* insert(const T& t,int count=1){
           check();
           T* result=0;
           if(count>0){
               b.reserve(n);
               result=set_insert(t,&b[0],&b[n]);
               if(result)n++;
           }
           check();
           return result;
       }

set_insert() expects a sorted array containing no duplicates (we are assured of both these conditions by virtue of the invariant). It inserts t only if the element is not present and returns a pointer to the new element or zero if no insertion occurred. set_insert() uses a binary search to find the insertion point and then moves elements to the right (that is, toward higher array indices) to make room for the new element. In other words, it does everything our complicated implementation of insert() did. Since set_insert() changes the array, its parameter and result types are ordinary pointers (not pointers-to-const), so casting is unnecessary.

To convince yourself that set_insert() is no more expensive to use than the combination of bin_loc() and copy() used earlier, compare the COMPLEXITY sections of the respective manual entries. Each entry gives an order estimate that can be used for gross estimation, plus formulas that can be used to compute actual operation counts. First, consider the order estimates for our three algorithms: bin_loc() is O(lgN) and copy() is O(N); their combination is therefore O(N). Since set_insert() is also O(N), we can't make a choice based on order estimates alone. Probing deeper, we find that bin_loc() ``performs at most lgN tests of the ordering relation'' and copy() does ``exactly N assignments'' (where, in the first expression, N is the size of the original array and in the second expression, N is the number of elements copied to make room for the new element). set_insert(), on the other hand, does ``at most N assignments and at most lgN tests of the ordering relation'' (where N is interpreted as in the other specifications). We therefore conclude that the costs of the two implementations are equal. We choose set_insert() because it is easier to use.