Array: Difference between revisions

(Clarify the contrast between numerically indexed and associative arrays)
Line 3: Line 3:


The implementation details of an array give rise to an important distinction between '''arrays''' and '''associative arrays'''.
:The implementation of '''arrays''' is based on setting the bounds of indices of the array, the ''size'' of the array, allocating a contiguous region of memory to hold the elements of the array, and using simple offset calculations of the indices from the origin of the memory to access memory elements. An extension is to allow such arrays to be resized, or re-shaped, in which the memory area is adjusted, but common elements are kept.
:The implementation of '''arrays''' is based on setting the bounds of the array's indices (the ''size'' of the array), normally allocating a contiguous region of memory to hold the elements of the array, and using simple offset calculations on the indices from the origin of that memory to access elements. Some languages support extensions that allow such arrays to be resized or re-shaped, in which case the memory area is adjusted but existing elements are retained.
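The offset calculation described above can be sketched in Python. This is only an illustration under stated assumptions: a flat list stands in for the contiguous memory region, the layout is assumed to be row-major, and the names <code>ROWS</code>, <code>COLS</code>, <code>storage</code>, <code>offset</code>, <code>store</code> and <code>fetch</code> are hypothetical.

```python
# Hypothetical illustration: a 2-D array kept in one contiguous block.
ROWS, COLS = 3, 4               # bounds fixed when the array is created
storage = [0] * (ROWS * COLS)   # stands in for a contiguous memory region


def offset(row, col):
    """Row-major offset of element (row, col) from the start of storage."""
    if not (0 <= row < ROWS and 0 <= col < COLS):
        raise IndexError("index out of bounds")
    return row * COLS + col


def store(row, col, value):
    """Write a value at the computed offset."""
    storage[offset(row, col)] = value


def fetch(row, col):
    """Read the value at the computed offset."""
    return storage[offset(row, col)]


store(1, 2, 42)
print(offset(1, 2))   # 6
print(fetch(1, 2))    # 42
```

Access costs one multiplication and one addition regardless of the array's size, which is where the speed advantage of this representation comes from.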


:By contrast, an '''[[associative array]]''' maps index "keys" to their associated values, generally using more complex [http://en.wikipedia.org/wiki/Hash_function hash functions] on the keys of the array to map them to their corresponding elements (by pointers, references or memory addresses of some sort). Associative arrays are referred to variously as "hashes" ([[Perl]]), "maps" or "mappings" ([[Lua]]), "dictionaries" ([[Python]]) as well as "associative arrays" ([[Awk]], [[ksh]] and others). The keys into associative arrays are normally not constrained to be integers (unlike arrays, which generally require contiguous integer ranges). Different languages may impose various constraints on these keys. For example, in [[Perl]] keys are evaluated as Perl "scalar" values such that keys of 1 (a literal integer), "1" (a string representing the same), 1.0 (a literal floating point which is numerically equivalent) and "1.0" (another string representing a numerically equivalent value) will all map to the same key/value in a hash. Other languages (such as Python) do not coerce keys in this way, so the integer 1 and the string "1" are distinct keys. (See [[associative array]] for further discussion).
:In an '''associative array''' more complex [http://en.wikipedia.org/wiki/Hash_function hash functions] are used to encode the indices of the array and sophisticated hash lookup algorithms are used to map indices to their corresponding elements. The number and range of indices are not pre-set and element storage is extensible, so the storage of associative arrays can grow as more indices are used. The hash functions of associative arrays usually allow types other than ranges of integers or fixed enumerations to be used as indices. A common feature is to allow arbitrary strings as indices.
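As a small illustration of non-integer keys and of how one language treats key types, here is a sketch using Python's built-in <code>dict</code>. The variable names and sample data are made up for the example. Note that Python keeps the integer 1 and the string "1" apart, while 1 and 1.0 compare equal and hash alike, so they share one entry.

```python
# Python dictionaries are hash-based associative arrays.
ages = {}
ages["alice"] = 30           # arbitrary strings as keys
ages["bob"] = 25
ages[(2, 3)] = "tuple key"   # any hashable object can be a key

# The integer 1 and the string "1" are distinct keys ...
d = {1: "int one", "1": "string one"}
# ... but 1 and 1.0 compare equal and hash alike, so they share an entry.
d[1.0] = "float one"

print(len(ages))   # 3
print(d[1])        # float one
print(d["1"])      # string one
print(len(d))      # 2
```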

:Non-associative arrays have speed and memory consumption advantages. Associative arrays have greater flexibility in types used for indexing and the range of indices.


:Non-associative arrays may have speed and memory consumption advantages. Associative arrays have greater flexibility in the types used for keys and generally obviate the need to implement searches through the collection (each field on which one would search can be implemented as a separate associative array of references to the corresponding values or records).
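The idea of one associative array per searchable field might look like the following Python sketch. The records and the names <code>by_id</code> and <code>by_name</code> are hypothetical; both dictionaries hold references to the same record objects rather than copies.

```python
# Hypothetical records; each searchable field gets its own dictionary.
records = [
    {"id": 1, "name": "alice", "lang": "Perl"},
    {"id": 2, "name": "bob", "lang": "Lua"},
    {"id": 3, "name": "carol", "lang": "Awk"},
]

# Both indexes refer to the same record objects, not copies.
by_id = {r["id"]: r for r in records}
by_name = {r["name"]: r for r in records}

# A lookup replaces a linear search through the list.
print(by_name["bob"]["id"])            # 2
print(by_id[3]["name"])                # carol
print(by_name["bob"] is by_id[2])      # True
```

The trade-off is that every index must be kept up to date when records are added or removed, and keys that are not unique need a list of records per key.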


Arrays with more than one index are called '''multidimensional''' arrays. For example, a matrix is a two-dimensional array.

Some languages (such as [[awk]]) do not support true arrays and merely emulate them through their associative arrays. Similarly, some languages emulate multidimensional arrays by concatenating the dimensional indices into a single key (perhaps a peculiarity of [[awk]]).
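The key-concatenation technique can be imitated in Python as a rough sketch. The separator below is the character awk uses by default for <code>SUBSEP</code> (octal 034); the helper names <code>key</code> and <code>split_key</code> are hypothetical and are not part of any awk or Python API.

```python
# Emulating a 2-D array with a flat dictionary, awk style.
SUBSEP = "\x1c"   # awk's default subscript separator (octal 034)


def key(*indices):
    """Join the indices into a single string key."""
    return SUBSEP.join(str(i) for i in indices)


def split_key(k):
    """Recover the individual indices (as strings) from a joined key."""
    return k.split(SUBSEP)


grid = {}
grid[key(1, 2)] = "a"
grid[key(10, 20)] = "b"

print(key(1, 2) in grid)         # True
print(key(2, 1) in grid)         # False
print(split_key(key(10, 20)))    # ['10', '20']
```

The separator must not occur in any index value, otherwise two different index tuples could produce the same key.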


Common operations defined on arrays include:
* Indexing: accessing an array element by its indices. (There is a one-to-one mapping between an index and its corresponding element).
* Slicing: producing a subarray by putting some constraint on the indices. For example, [[PL/1]] provides extracting of a row or a column of an array. In [[Ada]] any range of the index can be used in order to extract a subarray from a single-dimensional array.
* Slicing: producing a subarray by putting some constraint on the indices. For example, [[PL/1]] provides extracting of a row or a column of an array. In [[Ada]] any range of the index can be used in order to extract a subarray from a single-dimensional array. In [[Python]] slices can extract any contiguous subset of an array, and extended slice notation can extract elements in reversed order and/or by traversing in a given "stride". For example, on a list of 100 elements ''a[100:0:-2]'' would return every second element from the end toward the beginning of the list: a[99], a[97], ... a[1] (the out-of-range start index 100 is clamped to the last element).
* Iteration over the array's elements. Some languages have a [[Loop/Foreach|foreach loop]] construct for array iteration.
* Iteration over the array's elements. Some languages have a [[Loop/Foreach|foreach loop]] construct for array iteration; in others this must be done with conventional looping and arithmetic.
* Iteration over the indices of an associative array.
* Querying the bounds of array indices.
* Querying the bounds of array indices (determining the maximum element index or offset).
* Querying the indices of an associative array.
* Querying the indices of an associative array (determining if the collection contains a value for any given key).
* Operations on indices (next, previous, range etc)
* Array programming languages provide operations applied to entire arrays, so programs in such languages often lack specific index reference
* Array programming languages provide operations applied to entire arrays, so programs in such languages often lack explicit index references (for example [[APL]]).
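Several of the operations listed above can be tried out with Python lists and dictionaries. This is a short sketch with made-up variable names. It also shows that the result of ''a[100:0:-2]'' depends on the length of the list: with exactly 100 elements the start index is clamped to 99, while with 101 elements the slice starts at index 100.

```python
a = list(range(100))        # indices 0..99, each value equals its index

# Slicing: a contiguous subarray, then a reversed slice with stride 2.
first_ten = a[0:10]
odd_desc = a[100:0:-2]      # start 100 is out of range, clamped to 99

print(first_ten[-1])        # 9
print(odd_desc[:3])         # [99, 97, 95]
print(odd_desc[-1])         # 1
print(len(odd_desc))        # 50

# With 101 elements, index 100 exists, so the even indices are selected.
b = list(range(101))
print(b[100:0:-2][:3])      # [100, 98, 96]

# Querying the bounds: the maximum valid index of a non-empty list.
print(len(a) - 1)           # 99

# Querying and iterating over the keys of an associative array.
colours = {"red": 1, "green": 2}
print("red" in colours)     # True
print(sorted(colours))      # ['green', 'red']
```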


Multidimensional arrays in which the valid range of one index depends on the value of another are called '''ragged''' (also '''jagged'''). This term comes from a typical example of a ragged array, when a two-dimensional array is used to store strings of different lengths in its rows. When put on paper, the right margin of the output becomes ''ragged''.
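A ragged array of the kind described here could be sketched in Python as a list of lists, one per string. The sample words and variable names are made up for the example.

```python
# Each row holds the characters of one word, so row lengths differ.
words = ["array", "of", "strings"]
ragged = [list(w) for w in words]

# Printed row by row, the right margin is uneven ("ragged").
for row in ragged:
    print("".join(row))

# The valid range of the second index depends on the first index.
lengths = [len(row) for row in ragged]
print(lengths)          # [5, 2, 7]
print(ragged[2][6])     # s
```

Here ''ragged[2][6]'' is valid but ''ragged[1][6]'' would raise an <code>IndexError</code>, because the second row has only two elements.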


The lower bound of non-associative arrays in many [[:Category:Programming Languages|programming languages]] is commonly fixed at either 0 ([[C]] and relatives) or 1 (Old [[Fortran]] and relatives); or an arbitrary integer ([[Pascal]] and relatives, modern Fortran). In [[Ada]] any discrete type can be used as an index.
The lower bound of non-associative arrays in many [[:Category:Programming Languages|programming languages]] is commonly fixed at either 0 ([[C]] and relatives) or 1 (Old [[Fortran]] and relatives); or an arbitrary integer ([[Pascal]] and relatives, modern Fortran). In [[Ada]] any discrete type can be used as an index. Zero-based indexing is best thought of in terms of the index being an offset from the beginning of the array. Thus the first element is located zero elements from this starting point. The alternative can be thought of as ordinal indexes referring to the first, second, ... and ''n''th elements of the array.
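The two views of indexing can be compared in a few lines of Python, whose lists are zero-based. The helper <code>nth</code> is hypothetical and simply converts an ordinal position into an offset.

```python
items = ["a", "b", "c"]


def nth(seq, n):
    """Return the n-th element using 1-based (ordinal) numbering."""
    if not 1 <= n <= len(seq):
        raise IndexError("ordinal out of range")
    return seq[n - 1]   # ordinal n lies n - 1 elements from the start


print(items[0])         # a  (offset 0 from the beginning)
print(nth(items, 1))    # a  (the first element)
print(nth(items, 3))    # c  (the third element)
```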


In most programming languages, arrays are accessed by using the array brackets <code>[</code> and <code>]</code>, e.g. in <code>A[i]</code>, but exceptions exist, including [[Rexx]] which instead uses the dot operator <code>.</code>, such as in <code>A.i</code>; [[Fortran]], [[Ada]] and [[BASIC]] which use round parentheses <code>A(i)</code>, and in [[LISP|lisp]] dialects which use constructions like <code>(ELT A n)</code> for accessing and <code>(SETA A n new_val)</code> for setting (Interlisp) or <code>(vector-ref A n)</code> for accessing and <code>(vector-set! A n new_val)</code> for setting (Scheme). No bracket indexing occurs in [[J]], an array language; instead, the normal syntax of function creation and function calling applies.