# Data Structures and Algorithms

### Feedback from assignment 2

1. #### Toolbuilding

Many of you failed to take note of the suggestion about building tools for the timing routine in the feedback for the last assignment. You need to get into the habit of building transportable, re-usable tools, so that you can concentrate on the more difficult aspects of the next problem, rather than repeating machine-specific code in every program!

2. #### Post-Conditions

No-one seems to have noticed that the post-condition for AddToCollection should have been augmented with a "CollectionSorted" assertion! This is simply done with an auxillary routine:
```int CollectionSorted( collection c ) {
int i;
for( i=0;i<(c->item_cnt-1);i++)
{
/* item i should be less-than or equal item i+1 */
if( !(memcmp(ItemKey(c->items[i]),
ItemKey(c->items[i+1]), c->size) <= 0) )
return FALSE;
}
return TRUE;
}
```

3. #### Ordering Items

The memcmp( ItemKey(..), ItemKey(..), size ) approach to defining the order of items makes some assumptions about the structure of the item's key. A much more useful approach is to provide an ItemCmp function, which can provide a general ordering function. This function can order the items according to any rule - not just one which assumes lexicographical ordering of keys of fixed size.

Note that if you use one ordering function when adding, you must use exactly the same function when searching.

4. #### Adding on the key

When adding, quite a few of you compared items using the value of item which is the handle (C pointer) for the object. You must use ItemKey and memcmp as in bin_search (or better, an ItemCmp function).

5. #### Normalising

The computer can calculate log n too! You need to include the <math.h> header and compile with:
gcc ..... -lm
The -lm loads the math library. Some trivial modifications to your program enable one run to do the experiments and process the results for you too!

Even for a very simple insertion sort, some of you produced code with double loops, multiple compare calls, etc. Simple, proven code for standard problems like this can be found in almost all texts on data structures and algorithms! Spend some time reading, before hacking!
7. #### Designing experiments

• Think first

When searching, most of you searched an n-item collection n times. The best experiments searched each collection - no matter how big it was - a constant, m ( !=n ), times. Then all you have to do is find a value of m large enough to give a time greater than the timer resolution for the smallest collection. This enables you to find quite accurate times even for small collections (use m >> n). This helps when trying to verify log relationships as you need to have values of n spanning quite a wide range of n, because dlogn/dn decreases with n. Some of you were 'misled' when T(n)/logn appeared to become constant. In fact to verify a log relationship a set of logarithmically spaced values of n are best: n = 1000, 2000, 4000, 8000, 16000, 32000, ... would have given a clear result!

• Perturbing results

When designing any experiment, it's important to eliminate all sources of error possible. Here you are timing certain operations (adds, finds, ..): make sure that you are timing only those operations! Thus all unnecessary code should be removed.

• This includes function calls - put the timing code inside the AddAll, etc functions.
• Remove all the code that was originally there to verify the functions, eg the if ( ip == &list[i] ) after the FindInCollection call.
These things don't perturb your results by very much, but you can eliminate these sources of error trivially, so why not do it?

As many of you will have found, the cache on a modern processor causes your results to 'creep' slowly towards the predicted result. If you're doing the experiment carefully and want to ensure that you've actually proven your hypothesis, then you will need to eliminate all potential sources of error!

#### Users of DOS/Windows machines

Please make sure that the extra CRs needed by DOS, etc, are removed before you submit. Unix files should have LFs only at the end of a line.

The same applies to reports produced by your word processor: export them as plain text without the extra CRs.

Reports which are not plain text will not be accepted - there are a large number of word processors out there: it's much more productive (and therefore beneficial for you!) if the tutors spend time marking your report's content than trying to work out which WP will read it!