# Data Structures and Algorithms

### Feedback from assignment 1

1. #### Toolbuilding

If you are to be productive software engineers, you must get yourselves into the habit of building tools. Even the most trivial piece of code costs something to
• design,
• write and
• test and debug.
You must learn to write for re-use: the main benefit is in the time saved in test and debug - if it's been tested thoroughly before, you can re-use it with confidence!

In this assignment, you needed lists of random integers (or - as at least one of you worked out - any integers) for testing. So make a function which constructs such lists (in the simplest way possible of course!), e.g.

```int *ConsIntList( int n )
{
int *list = (int *)malloc( n*sizeof(int) );
int i;
for(i=0;i<n;i++)
list[i] = rand();
return list;
}
```

This function can be called as many times as needed to generate test lists of appropriate sizes. Once written and checked, it allows you to concentrate on other aspects of the problem.

Timing is a similar problem: you are presented with the problem that each machine has a slightly different method of telling the time. So build a generic timing function that returns a time to the best accuracy that the current machine will allow, e.g.

```double TimeNow()
{
double t;
#ifdef IRIX
/* Irix code to calculate t in secs */
#endif
#ifdef SUNOS
/* Sun code  to calculate t in secs */
#endif
return t;
}
```
Note that this function is double: it is designed to return seconds. This allows it to get the maximum resolution on any machine.

The #ifdef's are for serious programmers who want their code to run without alteration on any machine: each compiler will predefine some suitable string which will allow it to be identified. You will need to check the manuals for individual compilers to find out what the correct values for these strings are.

One of you made a starttime()/stoptime() pair and used a static variable to store the time when the starttime function was called. A good idea, which can be done effectively in a single function:

```/* timer.c */
static double last = 0.0;

double TimeSinceLastCall()
{
double t, diff;
#ifdef IRIX /* calculate t in secs as above */
..
#endif
diff = t - last;
last = t;
return diff;
}
```
This function can be called at the start of a timed section of code (and the return value thrown away) and again at the end of the timed section. The second return value is the running time of the timed section.

2. #### Efficiency

The assignment specification required you to run the same test for quite a few values of n. So you should have built a function which took n as an argument and called it from a simple loop in main so that one run would get you all the results - once everything was working!

This is akin to the "toolbuilding" approach above.

3. #### Errors

Unix is a time-sharing OS: even if you're apparently the only user on the machine, many things are usually happening. At the minimum, there's the clock interrupt that keeps the machine's time correct. Usually, there'll also be printer daemons (which wake up periodically to check whether you want to print anything), network daemons (which look to see if anyone else is trying to log in to this machine), etc, etc. Even the simplest operating system (eg DOS) will have timer interrupts to perturb your benchmark timing!

As a consequence, very accurate timing is impossible. If you're expecting a straight line (and you can usually transform your results so that they should be a straight line), a statistician would take the plot and determine the correlation coefficient, which would give a measure of the probability that the results deviate from a straight line (taking into account experimental error). However, you can just run a straight line through your results and if all the points lie close to it and appear randomly on both sides, then it's a straight line. With a little extra effort, you could use your calculator - or get a spreadsheet - to calculate the standard deviation from the mean of the normalised results. This will give you a simple measure of whether your results lie on a straight line or not.

Having determined the form of your results, it was essential that you report your final conclusions in the O notation. This is the standard way of describing the performance of an algorithm.

Reporting the slope of the straight line is a nice touch (especially if you add the correlation coefficient - which tells how good a straight line it is!), but your ultimate conclusion should still be O(1) or O(n).

The alert among you noted that the AddToCollection routine appeared to be O(n) because of the FindInCollection routine in the post-condition. Having noted that, it's a simple matter to remove it and re-run your experiment. You could either simply comment it out - or better - read the man page for assert and discover that if you recompile with:

gcc -DNDEBUG *.c

all the assert's will be automagically removed!

6. #### Designing experiments

Many people searched their collections for the same items as the ones in the list used to construct the collection in the same order! So the first item is found after 1 operation, the second after 2, etc (or the reverse for LIFO lists). Duplicates complicate the analysis and reduce the times. A better design either searched for an item that was guaranteed not to be in the list (to give worst case time complexity) or generated a new random sequence of items to search for. In the second case, it's important to ensure a low probability that the randomly generated item will be found, ie the range of the random numbers must be much greater than the number of items.

#### Users of DOS/Windows machines

Please make sure that the extra CRs needed by DOS, etc, are removed before you submit. Unix files should have LFs only at the end of a line.

The same applies to reports produced by your word processor: export them as plain text without the extra CRs.

Reports which are not plain text will not be accepted - there are a large number of word processors out there: it's much more productive (and therefore beneficial for you!) if the tutors spend time marking your report's content than trying to work out which WP will read it!