Sets

An unordered collection of an arbitrary number of distinct values of a common type. Pascal, first to implement this, did it with a characteristic array. Bit vectors represented a set; allowed for fast bitwise operations on sets.

Bit vectors are only suitable for base data types with few elements. Most languages that support sets switch to hash tables for large base types. For implementing ordered sets, some languages use skip lists, a probabilistic data structure for searching through sorted lists. For unordered sets, hash tables or associative arrays are commonly used.

Pointers and recursive types

Recursive types: objects may contain one or more references to other objects of the same type. These are used to build linked data structures.

Pointers are not addresses. A pointer is a high-level concept: reference to object. Address is low-level concept: location of word in memory. Pointers can be implemented as addresses, but can be implemented in other ways, e.g., segment + offset or address + access key. Also, smart pointers in C++.

In some languages, you can only allocate new pointers when you add an object to the heap (Java).

Operations on pointers

  • allocation and deallocation of object in heap
  • assignment
  • dereference
  • dereferencing can be explicit or implicit

The behavior depends on whether language is functional or imperative, and whether it uses reference or value model for variables.

Value model of variables

There is a reference model of variables and value model of variables. Two interpretations of a variable: A variable is a named container for a value, or a variable is a named reference to a value.

l-values are expressions that denote locations. r-values are expressions that denote values. These are context-dependent.

Functional languages usually employ a reference model for names. Java uses value model for primitive types and reference model for objects.

References and pointers

Note: Pointers only needed in languages with value model. In Java, reference types only support assignments and comparison. C# contains Java-style references and C, C++ style pointers (but must be marked unsafe).

C++ References vs Pointers

A reference variable is an alias, that is, another name for an already existing variable. References are often confused with pointers but three major differences between references and pointers are:

  • You cannot have NULL references. You must always be able to assume that a reference is connected to a legitimate piece of storage.
  • Once a reference is initialized to an object, it cannot be changed to refer to another object. Pointers can be pointed to another object at any time.
  • A reference must be initialized when it is created. Pointers can be initialized at any time.

Dangling pointers

Can be addressed by using tombstones. This way, a pointer whose object has been deallocated can be identified as being dangling, since pointers strictly point to intermediate objects called tombstones instead of the object itself. Can also “tag” the objects and pointers so when a pointer gets deleted, the object tag of the object that pointer pointed to gets changed to 0 so the other pointers pointing to this object can know this.

Lists

Recursively defined by pair consisting of head element and reference to rest of list. Important in functional programming (re: Lisp).