IC211 (Spring 2020)

Introduction

What is this course about? Is it "Object-Oriented Programming" or "Java"?

This course is titled Object-Oriented Programming so, not surprisingly, we're going to be learning about object-oriented programming (also referred to as "OOP"). But OOP is a programming paradigm, not a programming language. So to actually write programs we need to use a language that supports this programming paradigm. The particular language we'll use is Java. So you'll be learning this new language called Java, but also learning a bigger set of ideas and practices called OOP, which are present in many languages.

Object-Oriented Programming

There is one overarching idea in programming, from small programs to gigantic systems: Separation of Interface from Implementation, i.e., separating what you need to know in order to use a tool from how the tool actually works on the inside. This is behind good coding, it's behind good system design, it is behind the design of protocols like TCP, HTTP and DNS, it's behind almost anything that involves computing! This separation allows us to manage the complexity of building big programs. It allows us to reuse in a new program code that was written in some other time and place for some other purpose (and to avoid duplicating code within a program, which is a sin!). It allows us to make changes to some of our code without having to worry about other parts of the program breaking. It makes collaboration easier and less error-prone. It brings many, many other benefits. In short: separating interface from implementation is a really good thing.

In procedural programming, which is what we did in IC210, the language mechanism that supported separating interface from implementation was the function: a function's prototype (plus some documentation, if we're lucky) is the interface, its definition is the implementation. Scoping rules actually prohibit us from accessing the local variables and parameters inside the function's body, so this separation of interface from implementation isn't just conceptual — it's actually enforced by the compiler! However, the function mechanism on its own is by no means enough to achieve the nirvana of full separation of Interface from Implementation. Why?

1. It doesn't help with structs. To use someone's code, you have to understand how its designer intends structs to be manipulated to store data. That means understanding implementation details!
2. You can't modify or extend the behavior of existing functions or existing collections of functions and structs without messing with the function/struct definitions, which means you have to understand those implementation details!
3. You're limited to a single implementation for each interface (a single function definition for each prototype), which is a real problem. As a simple analogy, that would be like saying each kind of light bulb has to have a different socket, which would be ... difficult. Instead, we live in a world in which the same socket works for 40-watt or 60-watt bulbs, for incandescent, fluorescent, or LED bulbs, or (more interestingly) black-light bulbs. For many kinds of programming problems we want the same thing: many different implementations of the same interface.
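The first problem can be made concrete with a small sketch. It is written in Java only so that all examples in these notes share one language, but it deliberately imitates the procedural struct-plus-function style. The names IntList, data, count, and sum are hypothetical, invented for this illustration.

```java
// Entry point; the struct-like class it uses is defined below.
class StructProblemDemo {
    // Procedural-style function: it assumes entries [0, count) of data are valid.
    static int sum(IntList list) {
        int total = 0;
        for (int i = 0; i < list.count; i++) {
            total += list.data[i];
        }
        return total;
    }

    public static void main(String[] args) {
        IntList list = new IntList();

        // Correct use requires knowing the designer's convention:
        // store at data[count], then increase count.
        list.data[list.count] = 5;
        list.count++;
        list.data[list.count] = 7;
        list.count++;
        System.out.println(sum(list));   // prints 12

        // Nothing stops a caller who does not know the convention.
        list.data[2] = 100;              // count was not updated
        System.out.println(sum(list));   // still prints 12; the 100 is silently ignored
    }
}

// A struct-like class: every field is public and it has no behavior of its own.
class IntList {
    public int[] data = new int[10];
    public int count = 0;
}
```

The function sum has a clean interface, but the struct does not: every caller has to know how data and count are meant to be used together, which is an implementation detail.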

Object-oriented programming is a programming paradigm (that means not merely a language but a whole philosophy on what a program is) with the goal of complete separation of interface from implementation. Its basic ideas are:

1. Combine structs and the functions that manipulate those structs into a single entity (such entities are called objects) that offers true separation of the interface to your struct+functions from its implementation. This is said to provide encapsulation and data-hiding.
Note: this addresses Point 1 from above.
2. Use a mechanism called inheritance to create new kinds of objects that modify or extend old kinds of objects without having to open up the implementation (in some cases without even having access to the implementation!).
Note: this addresses Point 2 from above.
3. Use inheritance and a mechanism called polymorphic function calls to allow multiple definitions of the same interface.
Note: this addresses Point 3 from above.
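As a rough preview of where the course is headed, here is a minimal Java sketch of the three ideas, built on the light-bulb analogy. The class names (Bulb, LedBulb, BlackLightBulb) and their methods are hypothetical, chosen only for this illustration; Java calls member functions "methods". Don't worry about the syntax yet.

```java
// Entry point; the classes it uses are defined below.
class BulbDemo {
    public static void main(String[] args) {
        // One "socket" (the Bulb type) accepts many implementations.
        Bulb[] socket = { new Bulb(60), new LedBulb(9), new BlackLightBulb(40) };
        for (Bulb b : socket) {
            b.switchOn();
            // Idea 3: the same call runs a different definition for each object.
            System.out.println(b.describe() + " on=" + b.isOn());
        }
    }
}

// Idea 1: data plus the functions that manipulate it, with the data hidden.
class Bulb {
    private boolean on = false;   // private: code outside Bulb cannot touch this
    private final int watts;

    Bulb(int watts) { this.watts = watts; }

    public void switchOn()   { on = true; }
    public boolean isOn()    { return on; }
    public int watts()       { return watts; }
    public String describe() { return "incandescent, " + watts + " W"; }
}

// Idea 2: a new kind of object that reuses Bulb without editing Bulb's code.
class LedBulb extends Bulb {
    LedBulb(int watts) { super(watts); }

    @Override
    public String describe() { return "LED, " + watts() + " W"; }
}

class BlackLightBulb extends Bulb {
    BlackLightBulb(int watts) { super(watts); }

    @Override
    public String describe() { return "black light, " + watts() + " W"; }
}
```

The loop in main never asks what kind of bulb it is holding. It only uses the interface that every Bulb offers, and each object supplies its own implementation of describe.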

To sum up: The Procedural Programming paradigm sees a program as functions calling functions, with data being passed around as arguments to or return-values from function calls. The Object-Oriented Programming paradigm sees a program as a bunch of data+function bundles (the objects) that communicate by calling each other's member functions.

Introduction to Java: But we already know a language...

Some programming languages support object-oriented programming more than others. C++ supports OOP well (it was designed to), as do other major languages such as Java, C#, and Python.

In this course, we will be using Java. Why Java? Knowing multiple languages will help you better understand programming. Java is in many respects easier than C++, and it comes with many built-in features that we can explore. You should expect to learn or at least become familiar with 8-10 languages while you are an undergrad. Once you have learned the first 2-3, you will find it very easy to pick up new ones.

Why do we need more than one language? Different languages are good for solving different problems. Each has its strengths and weaknesses. The cost model of software development has driven the direction of language development for the last few decades:

Early computers were expensive, but programmers' time was cheap. In 1964, a Burroughs 205 mainframe cost $5,000,000 (worth $36,500,000 in 2012 dollars). This was a popular computer for universities. The median wage in 1964 was $6,000 (worth $43,000 in 2012 dollars).
Modern computers are cheaper than programmers' time. You can buy a high-end desktop for under $1,000. The median wage for a software engineer is currently $88,000.

High-level languages are fast to develop in because they hide the messy and time-consuming bits of programming, like tracking registers and allocating memory. The trade-off is that these languages take more clock cycles to run each command. Thanks to Moore's Law, we get faster machines each year, which makes high-level programming feasible. The graph below shows an artist's interpretation of the relative distribution of various languages between time to develop and time to run. (Note: this is just to give you a rough idea of the relative speeds; there is no data to back up the exact position of any language on the graph.)

Video games are often written in multiple languages. The graphics code in a first-person shooter must render a million polygons onscreen, at 30 frames per second. That code needs to be extremely efficient. It is probably written in C and/or assembly language. The high-level code that controls a character's actions is often written in a high-level scripting language such as Lua. In general, code that runs many times per second needs to be efficient; code that runs once can take as many clock cycles as it needs.

There is one other reason why we see multiple programming languages: personal preference. The computer industry is full of highly-opinionated people who insist that their favorite language/OS/editor/browser/etc. is better than all the others. We call these arguments "religious wars". If you show the above graph to any software professional, it will almost certainly spawn an argument as to where each language should appear on the chart.

Java History

Java is a programming language developed by Jim Gosling and Bill Joy at Sun Microsystems in the 1990s. Sun wanted to be the leader in "intelligent" consumer devices. They wanted every bit of electronics you have to be controlled by downloadable programs that would do things like model your behavior and do things for you (e.g., to have your refrigerator tell you that you are out of beer). They began to design a programming language that could be used for these goals. It had to:

Be robust. Your coffee maker should not crash.
Be portable. Code that runs on one microwave needs to be able to run on a microwave from another manufacturer, easily and without recompilation.
Be safe. It should be difficult to develop malicious code to take over your appliances like some B horror movie.

The smart appliance market did not work out as a business model for Sun. There were too many technical challenges and faulty assumptions about the market.

Then along came the World Wide Web. The Internet had these same requirements, and Java finally found its market.

How is it robust?

Java gets rid of explicit pointers and pointer arithmetic. Pointer errors are a major source of bugs in C/C++. (No more Seg Faults!)
Java uses garbage collection. Memory that is no longer reachable is freed automatically, so you don't get the memory leaks that come from forgetting to free it. Programmers do not have to deallocate memory themselves. (No more delete!)
Java streamlines programming. It's easy to mess up in C++ because it carries all sorts of baggage from C. (e.g. .c_str()) Java's simplicity and readability make development and maintenance cheaper.
Java has similar syntax to C++. This lets existing programmers learn it easily.
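To make the first two points a little more concrete, here is a small sketch (the class and method names are hypothetical). It assumes nothing beyond the standard library. Mistakes that could corrupt memory or crash with a segmentation fault in C/C++ show up in Java as exceptions that the program can catch, and temporary arrays are never explicitly freed.

```java
class RobustnessDemo {
    // Reading past the end of an array raises an exception instead of
    // silently reading whatever happens to be in memory.
    static String readPastEnd() {
        int[] a = new int[3];
        try {
            return "value: " + a[5];
        } catch (ArrayIndexOutOfBoundsException e) {
            return "caught: ArrayIndexOutOfBoundsException";
        }
    }

    // Java references can still be null, but using one raises an exception
    // rather than causing a segmentation fault.
    static String useNull() {
        String s = null;
        try {
            return "length: " + s.length();
        } catch (NullPointerException e) {
            return "caught: NullPointerException";
        }
    }

    public static void main(String[] args) {
        System.out.println(readPastEnd());   // prints "caught: ArrayIndexOutOfBoundsException"
        System.out.println(useNull());       // prints "caught: NullPointerException"

        // Each scratch array becomes unreachable after its loop iteration.
        // There is no delete: the garbage collector reclaims the memory.
        for (int i = 0; i < 1000; i++) {
            int[] scratch = new int[1000];
            scratch[0] = i;
        }
        System.out.println("done");
    }
}
```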

How is it portable? And safe?

Unlike C++, Java runs on a virtual machine called the JVM (Java Virtual Machine).

Remember, C++ is compiled and linked, which results in an executable file only appropriate for that computer's architecture. Java is compiled into bytecode, which is identical no matter which machine it is compiled on. This bytecode is then interpreted by the Java Virtual Machine and run. So, anybody with a JVM (like the one your computer is always telling you to update) can run the bytecode.
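A small experiment along these lines, sketched under the assumption that a JDK is installed: the hypothetical program below asks the JVM what it is running on. Compiling it with javac WhereAmI.java produces a bytecode file named WhereAmI.class, and java WhereAmI runs it. The same .class file can be copied to a machine with a different operating system and run there without recompiling. Only the printed values change.

```java
class WhereAmI {
    // Builds a one-line report from standard JVM system properties.
    static String report() {
        return "OS: " + System.getProperty("os.name")
             + ", architecture: " + System.getProperty("os.arch")
             + ", JVM: " + System.getProperty("java.vm.name");
    }

    public static void main(String[] args) {
        // The output depends on the machine that runs the bytecode,
        // not on the machine that compiled it.
        System.out.println(report());
    }
}
```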

But what is a virtual machine? A virtual machine is software that runs code while appearing, to that code, to be a specific physical computing environment. The diagrams below illustrate how this works:

The diagram above shows two different operating systems represented as puzzle pieces. A program will only run if its lower edge fits perfectly into the OS piece. The shape of the edge represents the libraries that must be linked at compile-time. For example, the "stdio" library that handles basic print and keyboard functions is very different between Windows and Linux.

The diagram above shows the same program (Firefox) compiled for the two different operating systems. This program is written in C++. The two versions seem the same to the user (each one displays web pages) but are very different under the hood. The version compiled on Windows will not run on Linux, and vice versa.

The diagram above shows how the Java Virtual Machine fits between the application and the Operating System. OpenOffice is written in Java. You can download one copy of this program that will run on either Windows or Linux. The Java Virtual Machine is a program written in C++. It must be compiled specifically for either Windows or Linux, the same as Firefox. Once it is running, however, it can run any Java program, regardless of the underlying OS. That makes Java code more "portable" than C/C++.

Life is full of tradeoffs... this portability makes Java generally a little slower than C/C++. However, advancements in the JVM have made the speed difference quite small, and it depends largely on the type of computation you are doing.