Any computer programming language is defined by various documents. Common languages such as Pascal, Fortran, and C are defined by official standards, agreed upon by international committees. Compilers will usually follow those standards fairly closely, but each compiler manufacturer will decide how to handle options left open by the international standards. Compilers usually include some additional features, not defined by the standard.
If you write your program so if follows the international standards, it should be very portable. That means it will run on many different computers with little or no change (although it will need to be compiled into different machine code for each machine). For example, your 'rocks' homework would probably run on PC's, Macintoshes, and Unix machines.
On the other hand, if you write your program so it uses the special features described in the manuals supplied by the compiler manufacturer, you can make the program more powerful, but it will only run on machines that are similar or identical to the one you developed it on. For example, your graphics homework will run only on PC's and PC clones.
In any language there will be statements that will compile and run, but whose results are not defined by the language standards or the manufacturer's manual. For example, the compiler may not force you to initialize a variable before adding things to it, or it may let you change the value of the iterating variable inside a for-loop. Testing the program may suggest what these officially undefined statements are doing in the program, but it is dangerous and bad programming to use them. Their behavior may depend on the hardware, or it may change with the next revision of the compiler or operating system, or it might even change depending on what other programs run before or at the same time as your program.
Even if you follow the standards closely, you still need to test your program. There will always be a few places where you misunderstand the standard, or the compiler doesn't follow the standard exactly, or you just make a mistake. Testing should include the ordinary cases (for example, searching for values that happen to be in the middle of the array) as well as all the special cases you can think of (such as values right at the beginning or end of the array, or values that it doesn't contain). A combination of careful adherence to the language standard and well-considered testing will ensure robustprograms -- programs that are not likely to break when the user runs them.
A pointer variable holds the memory address of other data. For example, a real number (a score, say) might be stored at memory address 32451, and the pointer variable scoreP could hold that address: 32451. Once you know where in memory something is, you can read or write to that address. (The Pascal syntax involves the special characters ^ and @ but you don't have to know the syntax for this class.)
Pointer variables are especially useful when you want the program to work with large data objects, such as graphic images or complicated records. The pointers can be held in arrays or passed to procedures instead of moving the actual data around. That makes the system run faster and it requires less memory. Note the Pascal compiler implements a procedure's var parameters as pointers.
Pointers can be declared and used in Pascal, but they are much more common in C programs. In a C program, the name of any array is a pointer to that array's location in memory. One problem with pointers, which makes C programming somewhat harder than Pascal programming, is that small errors involving pointers can cause data to be written into parts of memory that should be left alone, causing the program or machine to crash.
Linked lists are data structures for holding many records in memory. Unlike arrays, the number of records in the linked list does not need to be defined when the program is written and compiled. Instead, a single program variable holds a pointer (a memory address) to the first record. A pointer variable within the first record points to the second record, and so forth -- each record points to the next. Whenever a new record is added, it is added into the unused memory in the heap and the previous record is altered so it points to the new one. The last record in the list has a pointer to zero or a nonexistent memory address, indicating that there are no more records.
A large part of the work in computer science has involved the study of data structures and algorithms. A data structure is a way of organizing large amounts of data in memory or files. This class has introduced two common data structures: the array and the linked list (a single record isn't usually considered a data structure).
An algorithm is a detailed, step-by-step procedure for solving a problem. We looked at some algorithms in traditional arithmetic (addition by columns with carry left). We also looked at sort and search algorithms for arrays. These are simple, database oriented algorithms. Other areas of computer science have their own algorithms. For example, in graphics there are algorithms for filling in a shape with a color. In operating systems there are algorithms for checking passwords and avoiding conflicts on multiple user machines. And so forth.
Computer science algorithms are often associated with specific data structures. For example, you can do a binary search on a sorted array, but it would be difficult to do one on a linked list (because there's no easy way to find the midpoint between two records). However, there is a data structure called the Binary-Tree (B-Tree), which links records together in the heap, but which uses links that are specifically designed to support binary search.
There are many sorting algorithms used on computers. We looked at insertion sort, bubble sort, quicksort, and bucket sort. To decide which algorithm to use, you have to first investigate the data and the task: Is the data stored in an array or in some other data structure? (If it is in large files, there are other sorting routines to consider.) Is it likely to be already sorted? How often will you need to sort it? With these factors in mind, you can consider the tradeoffs between the algorithm's accuracy (bucket sort may only do a partial sort), its speed of execution on the expected data, and the time it will take you to program it.
To compare the speed of different algorithms, computer scientists use big-oh notation. For example, insertion sort is O(n-squared), bubble sort is also O(n-squared), and quicksort is O(n log-n). The big-oh notation doesn't exactly predict performance times -- insertion sort might actually be fractionally faster or slower than bubble sort. Instead, what the big-oh notation tells us is that both the bubble sort and the insertion sort algorithms will slow down at roughly the same rate as the number of items being sorted (that's n) gets larger. Quicksort will degrade much more slowly as n increases, so for very large data sets quicksort would usually be preferable to insertion or bubble sort.
Search algorithms also have big-oh values. Linear search is O(n), binary search is O(log n). Once again, these differences are most important for large data sets. If you have 1 million items in an array, a linear search will require about 1 million tests, but a binary search will only require 20 tests (because 2 to the 20 power is about 1 million -- the 'log' in big-oh notation is always log-base-2).
To display graphics on the screen, a computer treats the screen as a huge array of 'pixels'. Each pixel is a tiny square on the screen, and the video hardware can light up the pixel or leave it dark. On color screens, each pixel can be lit up in combinations of red, blue, and green -- varying the amounts of each allows any color to be generated. (Red + Blue is Purple, Red + Green is Yellow... really!).
Any language that supports graphics on a PC includes commands to light up specific pixels, which are identified by their horizontal (X) position and their vertical (Y) position. The top-left corner of the screen is position (0,0). The address of the bottom-right corner depends on how big the screen is. On the PC's in the lab, it's (639,479).
If the programmer had to specifically identify every pixel to be lit up in an image, programming graphics would be almost impossible. So programming languages also provide procedures that draw lines, circles, elipses, text, and various other shapes. The programmer supplies the endpoints or boundaries of the shape to be drawn, and the procedure figures out which pixels to turn on or off.
Sophisticated graphics programming, for 3D-animated graphics in movies or advertising, also uses procedures that decide which pixels to turn on or off. In addition to the shape of the object, these procedures consider factors such as the position of the observer, the texture of the object's surface, and the position of the light sources. They may even take into account the effects of light 'reflected' off other objects in the picture. This kind of graphics requires powerful floating point hardware (to calculate the where the rays of light are going) and large disk storage space.
Turbo Pascal is a compiled language. After you write the source code, you have to compile it before it will run. Compiling translates your Pascal source code into machine language. Once a program has been debugged, you can compile it to disk, producing an executable (.exe) file. The .exe version of the program can be run whenever you want, without recompiling.
Most major development languages are compiled (C, Fortran, Cobol). However, a few languages are interpreted. Instead of using a compiler to translate the entire program into machine language before running it, an interpreted language uses an interpreter, which translates it into machine language one line at a time while the program runs. If you save the program to disk, you simply save the source code -- the code you have written. Anyone who wants to run the program again has to have the interpreter.
Developing and editing programs is simpler if you don't need to compile them each time, especially on older personal computers where compiling a long program might take several minutes. However, because the interpreted language does the work of the compiler line-by-line at run time, the program runs considerably slower. Large loop bodies are especially problematic, because the same lines will have to be translated into machine code again and again, once every time through the loop.
The tradeoff in favor of usability and against speed and power makes interpreted languages popular for simple programs developed by end-users. Some common interpreted languages are Basic (on the old Apple II and the IBM PC), Visual Basic, and HyperCard. Batch or script files, like the autoexec.bat file on a PC, are also interpreted, as are macro languages for spreadsheets or other applications.
Another place where interpreted languages are popular is in artifical intelligence (AI) research. The Common Lisp language is interpreted, and this allows a 'smart' program to write or modify some of its own code and run it at run-time.
Although the compiled/interpreted distinction is usually associated with specific languages, it is possible to write compilers for interpreted languages, and it is sometimes possible to write interpreters for compiled languages. (It depends on what kind of cross references and dependancies there are within the program.) The Basic language, probably the best known example of an interpreted language, was originally developed as a compiled language.
Researchers in artificial intelligence (AI) use computers to simulate the thought processes of the human mind. There are two reasons for doing this. First, there are dangerous or boring or important jobs where we would like to replace or augment human intelligence with computers. But in many of those areas, people can still things better than computers: invest in stocks, manage a business, fly a fighter plane, explore a planet, invent new machines, diagnose illnesses, even clean a house. To improve the performance of computers, researchers look at how people do these jobs and try to build computers that do something similar. This is the power side of AI.
The second reason for doing AI is simple curiosity about human intelligence. Over the years, psychologists have proposed a lot of theories about problem solving, memory, vision, and other components of human thought. Computers make it possible to test some of those theories. This is done by building models -- computer programs -- that fit the descriptions of the theories, then testing those models to see if they behave the same way people behave. Even without the final testing phase, researchers have found that just building the models forces them to think more clearly about their research. This is the insight side of AI.
In AI research today, there are two major branches: symbolic and connectionist. In symbolic AI, programs represent things in the world by giving them a name and listing everything there is to know about them, much like records in Pascal or a database. The programmer has to include all the important information in the program code, for example:
(Plane (has 2 wings) (carries 200 people) (goes fast)) (Car (has 4 wheels) (carries 4 people)(goes medium)) (Horse (has 4 legs)(carries 1 people)(goes slow))Typically, each piece of information about a real-world object is true, false, or a simple number (2 wings, 4 wheels). Given this information, the program can then answer questions like, 'What carries at least 4 people and has wheels'? With a little more work it can answer things like, 'What mode of transportation should I use to go from Denver to New York City?'
Connectionist AI programs are also called Neural Networks. These programs use variables, but the variables represent the probability associated with 'features' of things in the world. For example, a connectionist program might have a wings feature, a legs feature, a number-carried feature, and a speed feature. The programmer would write a program that watched for these these features in data and kept track of how often they occurred in different combinations. Instead of telling that program exactly which features define planes, cars, etc., the programmer would give the program a data file that included a lot of examples:
Vehicle #1 has 4 wheels, carries 5 people, goes medium, called 'car'. Vehicle #2 has 2 wings, carries 200 people, goes fast, called 'plane'. Vehicle #3 has 2 wings, carries 250 people, goes fast, called 'plane'. Vehicle #4 has 4 legs, carries 1 person, goes slow, called 'horse'. Etc.The connectionist program would learn to recognize planes, cars, horses, etc., based on their features. The big advantage of connectionist programs is that they can recognize things they have never seen before. For example, the program would guess that something that goes fast and carries 300 people is, with 89.7% probability, a plane. A symbolic AI program would simply say it didn't know, because it had never seen anything exactly like that.
The history of programming language design is largely a history of attempts to find better ways to combine code and data into separate blocks that can be independantly maintained, modified, and reused. Pascal uses Procedures and Functions to declare a section of code as a separate block, which can be reused in the program wherever it is needed. Similarly, Pascal uses named Types to describe a specific kind of data (an array or a record), and those Type names can be used again and again to declare variables in the main program and in procedures.
Turbo Pascal also provides units and abstract data types. Units are files of procedures, without any program. The unit can be compiled as soon as the procedures are written. Then the procedures can be called in other programs, without including the procedure declaration itself in the program file.
An Abstract Data Type, or ADT, is a new data type and procedures to work with it, all defined by the programmer. They are called 'abstract' because all the code can be written by one programmer, and another programmer can use them as if they were originally part of the programming language, without ever looking at the first programmer's code -- the Type becomes an abstraction of the detail in that code.
For example, a programmer might define a Type called clockTimeT. The definition would include procedures for adding and subtracting hours, minutes, and seconds, so that 30 seconds plus 50 seconds would produce 1 minute 20 seconds (not 80 seconds). The code to do that would be hidden from other programmers who used the ADT. They would simply know that they could declare variables of Type clockTimeT, and that those variables could be added or subtracted or multiplied with correct results.
Defining Abstract Data Types in Turbo Pascal is described in Savitch, Chapter 8. That material is not required for this class.
Object Oriented Programming is usually implemented an extension of Pascal and similar languages, but it may be better to think of it as an entirely different approach to programming. In traditional programming, you write a program that describes, step-by-step, what you want to do. When the program runs, it starts at the beginning and works through to the end, with loops or branches depending on the data that's read in. In OOP, you define objects. These are like miniprograms, each with a while-true-do loop, and each with its own data types and private storage space in memory. The objects all run at the same time, sending messages back and forth among themselves, and responding to events (keystrokes and mouse clicks) generated by the user. Often, objects 'clone' themselves to create new objects at run-time.
In an object-oriented program, instead of storing student information in a record and running a procedure to calculate tuition for each student, you might have each student represented by a separate object. If the user clicked on 'calculate tuition' in the menu, then the menu object would broadcast a message to all of the student objects, saying: 'calculate your tuition'. Each student object would use its own procedure to calculate tuition, and it would store the result in its own private memory space. Then the user could click 'print tuition' in the menu, and the menu object would broadcast a message to the student objects, telling them to print their tuition. The student objects would have to exchange messages on their own to get themselves lined up in alphabetical order, then each would send its tuition to the printer.
The difficult part of object-oriented programming, once you learn the syntax, is deciding how to organize your program into objects. Should each student be an object, with tuition as part of its data? Or should tuition be an object, with student_name as part of its data? These questions tend to be easy to answer when some of the objects are real-world 'things', but they are much harder if you're doing OOP with abstractions like investment data or engineering analysis.