Category:Smalltalk

From Rosetta Code
Language
Smalltalk
This programming language may be used to instruct a computer to perform a task.
Execution method: Compiled (bytecode)
Garbage collected: Yes
Parameter passing methods: By value
Type safety: Safe
Type strength: Strong
Type expression: Implicit
Type checking: Dynamic

Smalltalk-80 is an object-oriented, dynamically typed, reflective programming language. It was designed and created in part for educational use, in particular for constructivist teaching, at Xerox PARC by Alan Kay, Dan Ingalls, Ted Kaehler, Adele Goldberg, and others during the 1970s, and was influenced by Sketchpad and Simula.

The language was generally released as Smalltalk-80 and has been widely used since. Smalltalk-like languages are in continuing active development, and the language has gathered a loyal community of users around it.

Smalltalk-80 is a fully reflective system, implemented in itself. Smalltalk-80 provides both structural and computational reflection. Smalltalk is a structurally reflective system whose structure is defined by Smalltalk-80 objects. The classes and methods that define the system are themselves objects and fully part of the system that they help define. The Smalltalk compiler compiles textual source code into method objects, typically instances of CompiledMethod. These get added to classes by storing them in a class's method dictionary. The part of the class hierarchy that defines classes can add new classes to the system. The system is extended by running Smalltalk-80 code that creates or redefines classes and methods. In this way a Smalltalk-80 system is a "living" system, carrying around the ability to extend itself at run-time.

Implementations

Over time, various implementations ("dialects") of Smalltalk have appeared, some of which target different audiences and/or focus on particular applications.

Their internal implementation (evaluation mechanism) may also differ radically, ranging from bytecode interpretation through just-in-time compilation and dynamic optimizing recompilation to cross-language translators (Smalltalk-to-C, Smalltalk-to-JavaScript, Smalltalk-to-Java). There are also systems which use a mix of static precompilation and dynamic recompilation.

Dialects differ in their class libraries, although a common subset exists which is ANSI standardized. Some class libraries are huge, containing 100k+ methods in the base system alone, others are minimalistic.

Most differences are in their graphic and windowing interface libraries (which are typically wrappers to underlying OS facilities, the window system or a browser).

Is it an IDE, a Scripting Language, a Compiled Language or what?

Well, all of it. Most Smalltalks, when started, come up with their own built-in IDE for working inside the so-called 'image'. The image is the state of all objects (not only classes!), which can be dumped at any time and restarted later. It includes editors, compilers, apps and everything else.

But many Smalltalks can also be started without a UI in a scripting mode (usually by a command line argument), and will then behave like a classic REPL-based scripting language, reading and evaluating expressions (it should be emphasized that this is probably the least efficient use of Smalltalk, as you'll miss all the fancy IDE support...).

There is also (at least one) Smalltalk in which you can work in the image (with incremental change and just-in-time compilation of entered code), but which can also generate standalone pre-compiled executables for apps.

Spirit

Smalltalk consists of both the programming language and a more or less standardized library of classes. Most Smalltalks also include a sophisticated IDE, which supports editing, compiling, refactoring, debugging, tools, source code revisioning and packaging/deployment. With a few exceptions, working in a traditional editor (vi, emacs) or general purpose IDE (eclipse) is usually not recommended, as these cannot make use of all the reflective and dynamic coding features and will make the programmer less productive. For example, typical Smalltalk debuggers allow code to be changed while the program is running (or halted at a breakpoint), allowing for a "programming in the debugger" style of coding. Code is usually developed and tested incrementally, with breakpoints on unimplemented parts, which are filled in as they are encountered. This makes Smalltalk a very good environment for rapid prototyping and experimental development.

Smalltalk environments are open - almost every aspect of the language and IDE can be changed, enhanced and customized. This includes object representation, metaclass and reflection facilities, language syntax, exception handling and the IDE itself. For this reason, Smalltalk has traditionally been a testbed for new language features, mechanisms and patterns.

Smalltalk is a very high level language, which almost completely hides any underlying machine representation. Implementation details such as integer size, pointer size, byte order etc. are transparent. The numeric tower is highly polymorphic, allowing for transparent use of arbitrarily large integers, fractions, floats, complex numbers etc.
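
The transparency of the numeric tower can be tried out by evaluating a few expressions in a workspace. This is only a small sketch: the variable names are arbitrary, and the class names of the results (large integer, fraction, float) differ between dialects.

<lang smalltalk>| big third |
big := 2 raisedTo: 100.     "far beyond a machine word; silently becomes a large integer"
big printString.            "'1267650600228229401496703205376'"
third := 1 / 3.             "an inexact integer division answers a Fraction"
third + (2 / 3).            "1 - the sum is normalized back to an integer"
(third + 0.5) class.        "some Float class; mixing with a float answers a float"</lang>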

Smalltalk is a very late bound language. Object types, classes, references etc. are resolved at execution time, and can be dynamically changed.

Language Syntax

Smalltalk has a very simple and highly orthogonal syntax, which is based exclusively on message sending (aka virtual function calling in C++ parlance). The language syntax and semantics consist only of literal (compile-time) constants, variable bindings, message sends and block closures (lambda expression objects). Control flow, loop constructs, exception handling etc. are defined as messages to booleans, blocks or by passing blocks as arguments. This makes it trivial and common practice to add new control constructs.

Grammar

This is more of an informal description (1); a formal (and correct) description is found eg. in the ANSI spec. or via the link at the right side.

Note: typical Smalltalk systems do not compile source code from files, but instead have an IDE embedded (i.e. emphasis on "integrated") and provide browsers and other tools to manipulate source code in the running system.

Different dialects use different schemes for source code management, and various source code exchange formats exist (XML-based in VisualWorks, zip-archive Monticello, GitHub file-based Tonel, etc.).

One common interchange format which all systems support is the original bang-separated "chunk format", which consists of Smalltalk expressions (!) separated by "!" characters. These expressions, when read and evaluated, will reconstruct the original classes, methods or arbitrary objects.

Therefore, you will not find any syntax for class declarations in the BNF below: classes are constructed by evaluating expressions like "Object subclass:'NameOfNewClass'" etc. This is very similar to the way Lisp or Scheme code is loaded by a read-eval-print loop.
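
As a rough sketch of what such a file can look like, the fragment below uses the classic chunk layout as found in Squeak-style file-outs. The class Account, its category and its method are made up for illustration, and details such as the methodsFor: header vary between dialects.

<lang smalltalk>Object subclass: #Account
    instanceVariableNames: 'balance'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Demo'!

!Account methodsFor: 'accessing'!
balance
    "answer the current balance; 0 if it was never set"
    ^ balance isNil ifTrue: [0] ifFalse: [balance]! !</lang>

The first chunk is an ordinary expression which creates the class when evaluated. The chunk starting with a "!" switches the reader into method-definition mode: the chunks that follow are taken as method sources for that class, until an empty chunk ends the list.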

Lexical Tokens (should not or cannot be used as message names) (2)

               ":="           assignment
               "^"            return from method
               "(" ")"        parentheses for grouping in expressions
               "."            period; statement/expression separator
               "|"            vert. bar; var/arg declaration separator
               ";"            cascaded message (NOT a statement separator)

Comment

               "..."          any text in double quotes
               "/ ...         an EOL comment (not all dialects support this)
               "<<TOKEN       a token comment (not all dialects support this)
               ...
               TOKEN

Predefined (reserved) identifiers

               "self"         pseudo variable (receiver of current message)
               "super"        pseudo variable (for super sends)
               "thisContext"  current continuation (stack frame)
               "nil"          refers to the singleton instance of UndefinedObject
               "true"         singleton instance of True class
               "false"        singleton instance of False class

Syntax (in Pseudo BNF)

Text in double quotes are lexical tokens.
"[..]" means: optional.
"*" means: repeat (0..n)

method          ::= <methodSpec> <methodBody> 

methodSpec      ::= <unarySelector>
                    | <binarySelector> <argName>
                    | <keyword> <argName> [ <keyword> <argName> ] *

methodBody      ::= [ <localVarDecl> ] <statementList>

localVarDecl    ::= "|" [ <varName> ]* "|"

statementList   ::= [ <statement> [ "." <statement> ]* ]

statement       ::= <expression>
                    | "^" <expression>  // return from method (3)

//
// expressions
//
expression      ::= <assignment>
                    | <messageSend>

assignment      ::= <variable> ":=" <expression>

messageSend     ::= <singleMessage>
                    | <cascadedMessage>

singleMessage   ::= <unaryMessage>
                    | <binaryMessage>
                    | <keywordMessage>

unaryExpression ::= <unaryMessage> 
                    | <primary>

binaryExpression ::= <binaryMessage> 
                    | <unaryExpression>

keywordExpression ::= <keywordMessage> 
                    | <binaryExpression>

unaryMessage    ::= <unaryExpression> <unarySelector>   (4)

binaryMessage   ::= <binaryExpression> <binarySelector> <unaryExpression>  (4)

keywordMessage  ::= <binaryExpression> <keyword> <binaryExpression> [ <keyword> <binaryExpression> ]*

cascadedMessage ::= <messageSend> ";" <unarySelector> (5)
                    | <messageSend> ";" <binarySelector> <unaryExpression>
                    | <messageSend> ";" <keyword> <binaryExpression> [ <keyword> <binaryExpression> ]*                   

primary         ::= <variable>
                    | <block>
                    | <literal>
                    | <constructedArray>
                    | "(" <expression> ")"

variable        ::= (<underline>|<letter>)[<underline>|<letter>|<digit>]*

block           ::= "[" [ <blockArgs> "|" ] [ <localVarDecl> ] [ <statementList> ] "]"

blockArgs       ::= ":" <argName> [ ":" <argName> ]*

constructedArray ::= "{" [ <expression> [ "." <expression> ]* ] "}"

//
// selectors (message names)
//
selector         ::= <unarySelector> | <binarySelector> | <keywordSelector>

unarySelector    ::= (<underline>|<letter>)[<underline>|<letter>|<digit>]*

binarySelector   ::= <anyAllowedNonDigitNonLetterChar> [ <anyAllowedNonDigitNonLetterChar> ]*

keywordSelector  ::= <keyword> [ <keyword> ]*

keyword          ::=  (<underline>|<letter>)[<underline>|<letter>|<digit>]*":" 
                      // i.e. a "word" immediately followed by a colon

//
// constants
//
literal        ::= <number>
                    | <stringConst>
                    | <symbolConst>
                    | <characterConst>
                    | <arrayConst>
                    | <byteArrayConst>

number         ::= <integerConst>
                    | <floatConst>
                    | <fractionConst> (6)
                    | <scaledDecimalConst> 
               
stringConst          ::= "'" [ <anyChar> | <doubledQuote> ]* "'"

doubledQuote         ::= "'" "'"

symbolConst          ::= "#" <stringConst>
                        | "#" <selector>

characterConst       ::= "$" <anyChar>

floatConst           ::= [<sign>] <digits> [ "." <digits> ] ["e" [<sign>] <digits> ] (7)

integerConst         ::= [ <radix> "r" ] [<sign>] [ <digit> ]*

radix                ::= <digit>[<digit>]     // value in [2..36]

fractionConst        ::= [<sign>] "(" <integerConst> "/" <integerConst> ")"

scaledDecimalConst   ::= [<sign>] <digits> "s" <scaleDigits>

arrayConst           ::= "#" <arrayElementList>

arrayElementList     ::= "(" [ <arrayElement> ]* ")"

arrayElement         ::= <literal>
                       | <arrayElementList>  // i.e. you can omit the "#" for arrays of arrays

byteArrayConst       ::= "#[" [ <byteArrayElement> ]* "]"

byteArrayElement     ::= <integerConst>     // must be in 0..255

Notes:
(1) typed in from memory. No warranty whatsoever for correctness.

(2) some dialects allow e.g. "|", "^" or "#" to be used as a message selector or as part of a message selector. For portable code, these should not be used. For details, consult the specific dialect's documentation.

(3) a return always returns from the enclosing method, even if the return statement is inside a block (!).

(4) left to right

(5) a cascade message is "another message sent to the previous receiver".

(6) the syntax for fraction constants is the same as an expression to create one (i.e. a parenthesized message send of "/"). Thus, implementations which do not support fraction constants will evaluate the fraction at execution time (usually, the jitter will detect that constant expression and optimize it away)

(7) most dialects allow for multiple precision floats (i.e. single precision, double precision, extended precision IEEE) to be specified using different exponent characters (e.g. "1e5" vs. "1f5" vs. "1q5")
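
Note (3) is easiest to see in a method whose return sits inside a block. The method below is a hypothetical example (selector and argument name are invented here), written the way it would appear in a class browser; it could be added to any class.

<lang smalltalk>firstNegativeIn: aCollection
    "answer the first negative element, or nil if there is none"
    aCollection do: [:each |
        each < 0 ifTrue: [ ^ each ]   "leaves the do: loop AND this method"
    ].
    ^ nil</lang>

Sent with the argument #(3 -1 -5), the method answers -1; the remaining element is never visited, because the return inside the block terminates the whole method and not just the block.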

Language Semantics

Variables and Scope

Variables are used to name objects. Actually they are bindings of a name to an object. Objects can only be passed and used by reference, and access to an object's internals is not possible from outside the object's class methods, except through getters and setters. Most classes, however, inherit from Object, which provides some reflection protocol to access instance variables or to query for the class. Such inherited reflection mechanisms can be overridden in a class, or a class can choose not to inherit from Object at all, to prevent access even from debuggers and inspectors.

Smalltalk is lexically scoped, and outer variable bindings are closed over when a block (=lambda closure) is created.
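
A short workspace sketch of what closing over an outer binding means, assuming a dialect with full closures (which holds for practically all current ones). The names makeCounter, c1 and c2 are only illustrative.

<lang smalltalk>| makeCounter c1 c2 |
makeCounter := [ | count |
    count := 0.
    [ count := count + 1 ] ].   "the inner block closes over 'count'"
c1 := makeCounter value.
c2 := makeCounter value.        "a second, independent 'count' binding"
c1 value.
c1 value.                       "2"
c2 value.                       "1"</lang>

Each evaluation of makeCounter creates a fresh binding for count, which stays alive for as long as the inner block that refers to it.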

A rich scoping hierarchy exists, which consists of:

  • inner block locals
  • inner block arguments
  • outer block locals
  • outer block arguments
  • method locals
  • method arguments
  • instance variable slots
  • inherited instance variable slots
  • class variables (static, shared with subclasses; visible in the class-hierarchy only)
  • class instance variables (per-class-instance; private to the class)
  • pool variables (for constant pools; must be explicitly "imported" to be seen)
  • namespaces / globals (for classes to be known by name)

Objects

In Smalltalk, every object is an instance of some class. Objects are only ever referred to by reference. Everything is an object, including integers, booleans, nil (the UndefinedObject), classes, stack frames (continuations), exception handlers and code (block closures/lambda closures and methods). There are no built-in primitive types which are outside the class hierarchy or which cannot be changed or enhanced by the user.

Message Sending

Message sends are dynamically resolved, by letting the receiver of the message determine the method (aka code to run). The message consists of a selector (= name of the message) and optional arguments.
The message syntax is: <lang smalltalk>receiver selector</lang> a unary message, without arguments.
In a C-like language, this would be written as "receiver.selector()".

<lang smalltalk>receiver part1: arg1 part2: arg2 ... partN: argN</lang> a keyword message; the selector consists of the concatenation of the keyword parts: 'part1:part2:...partN:'.
In a C-like language (assuming that colons are allowed in an identifier), this would be written as "receiver.part1:part2:...partN:(arg1, arg2,... argN)".

<lang smalltalk>receiver op arg</lang> a so called binary message. The selector 'op' consists of one or more special characters, such as '+', '-', '@' etc. These are actually syntactic sugar, especially to make arithmetic look more familiar (i.e. instead of "rcvr add: foo" we can write "rcvr + foo").
These would look the same in a C-like language, although almost any non-letter-or-digit character is allowed as operator in Smalltalk.

The precedence rules are unary > binary > keyword, thus <lang smalltalk>a foo: b bar + c baz</lang> is equivalent to <lang smalltalk>a foo:( (b bar) + (c baz) )</lang>

As messages have no semantic meaning to the compiler (especially, the '+', '*' and other binary messages), the usual precedence rules for arithmetic expressions are not present in Smalltalk. Thus, complex expressions consisting of multiple binary messages usually need to be parenthesized (or are parenthesized for readability).
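
A few expressions that can be evaluated in a workspace to see these rules at work (the results in the comments follow directly from the unary > binary > keyword rule and from left-to-right evaluation):

<lang smalltalk>2 + 3 * 4.           "20 - strictly left to right, i.e. (2 + 3) * 4"
2 + (3 * 4).         "14 - parentheses give the conventional result"
3 factorial + 4.     "10 - the unary 'factorial' binds tighter than '+'"
2 raisedTo: 3 + 1.   "16 - the binary '+' binds tighter than the keyword message"</lang>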

Message lookup is done by traversing the superclass chain, looking for a class providing an implementation (method) for the message's selector. The standard defines single inheritance, with a lookup based on the receiver's class only. However, some Smalltalk implementations allow for that lookup to be redefined and provide more sophisticated mechanisms (selector namespaces, lookup objects, lookup based on argument types etc.).

If no implementation is found (i.e. no class along the superclass chain provides a corresponding method), the original selector and arguments are packed into a container and a doesNotUnderstand: message is sent instead. This is used for error handling, but can also be used for message forwarding (proxies), delegation or dynamic creation of new code. The default implementation of doesNotUnderstand: raises an exception, which can be caught and handled by the program. Unhandled exceptions typically open a debugger (unless an UnhandledException-exception handler was defined).
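
The default behaviour can be observed with a handler for the exception raised by doesNotUnderstand:. The sketch below assumes the ANSI class name MessageNotUnderstood and its message accessor, which the common dialects provide; zork is a deliberately undefined selector, and some IDEs may ask for confirmation when compiling a send of an unknown selector.

<lang smalltalk>| result |
result := [ 3 zork ]
    on: MessageNotUnderstood
    do: [:ex | ex return: ex message selector ].
result.     "#zork - the selector, as packed into the message object"</lang>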

Metaclass Hierarchy

Classes themselves are objects and as such instances of some Metaclass. As classes define and provide the protocol (=set of methods) for their instances, metaclasses define and provide the protocol for their instances, the corresponding classes. Every class has its own metaclass and as such can implement new class-side messages. Typically, instance creation and utility code is found on the class side. Most Smalltalk dialects allow for the metaclass to specify and return the type of compiler or other tools to be used when code is to be installed. This allows for DSLs or other programming language syntax to be implemented seamlessly by defining a metaclass which returns a compiler for a non-Smalltalk language. Typical examples for this are parser generators (tgen, ometa, petite parser), data representation specs (asn1, xml etc.) and languages (smallRuby, graphical languages in squeak etc.)

Being objects, classes and metaclasses can be created dynamically, by sending a #subclass:... message to another class, or by instantiating a new metaclass.
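
The class/metaclass relationship can be explored by simply asking objects for their class. A small sketch; the name of the class of 3 is dialect specific, while the last two results hold in the usual Smalltalk-80 style metaclass model.

<lang smalltalk>3 class.                  "SmallInteger in most dialects"
3 class class.            "SmallInteger class - its metaclass"
3 class class class.      "Metaclass"
Metaclass class class.    "Metaclass - here the hierarchy closes on itself"</lang>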

Exception Handling

Smalltalk protects itself completely from any invalid memory references, null-pointer, array bounds or unimplemented message situations, by raising a corresponding exception which can be caught by the program. An unhandled exception leads to the evaluation of a default handler, which usually opens a symbolic debugger, which is always part of the Smalltalk environment. Deployed end-user and production systems usually redefine this handler to show a warning, end the program or dump the state of the system.

In contrast to most other exception handling systems, Smalltalk exceptions are both restartable and proceedable. The exception handler is evaluated with the calling chain of continuations (stack frames) still being active and alive. The handler is free to repair the situation and proceed. Proceedability allows for queries and notifications to be implemented easily using exceptions; for example, a compiler's warnings or error correction queries can be implemented this way, without a need to pass in logger objects or other state. Situations like missing files (OpenError) can also be resolved in the handler, by opening a dialog, asking the user for a replacement and proceeding.
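
A minimal sketch of a proceedable exception, using the ANSI class Warning and the resume: message (the strings and the variable name are arbitrary). The value passed to resume: becomes the value of the signal: expression, and execution continues right after it:

<lang smalltalk>| answer |
answer := [ (Warning signal: 'disk almost full') , ' - carried on' ]
    on: Warning
    do: [:ex | ex resume: 'handled' ].
answer.     "'handled - carried on'"</lang>

Sending retry instead of resume: to the exception would re-evaluate the whole protected block, which is the restart case mentioned above.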

The exception framework is implemented as part of the class library, and open for change and enhancement.

Examples

As Smalltalk code seems to be hard to read for programmers with a C background, some examples are provided below.

Comments

<lang smalltalk>"a comment - everything in between double quotes"</lang>
<lang smalltalk>"/ an end-of-line comment can have comments here (dialect specific [1])</lang>
<lang smalltalk>"<<TOK a token comment (dialect specific [1])

 ... can have any comment
 ... or EOL comment here
 ... up to a line starting with TOK

TOK</lang>
1) these are not supported out-of-the-box by all dialects; however, since even the Parser is open for extension, it is trivial to add support to any system by changing the comment reader in the token scanner code.

Literals

<lang smalltalk>true false "the two Boolean singleton objects"

nil "the UndefinedObject singleton object"

1234 "integer constant; an instance of Integer"

1234567890123456789012345678901234567890 "a rather large integer constant"

16rFF00 "integer constant, base 16"

2r101010 "integer constant, binary base 2"

3r1210210 "integer constant, ternary base 3"

1.234 "float constant; an instance of Float"

1.23e4 "another float constant"

2r1010.110001111 "another float constant (base 2)"

(1/7) "fraction constant; an instance of Fraction"

123s3 "scaled decimal constant with precision (somewhat dialect specific [1])"

(1+5i) "a complex"

$a "character constant"

$界 "another character constant"

'hello' "string constant; an instance of String, a collection of Characters"

'öäü こんにちは世界' "string constant; unicode is supported by most implementations"

#'foo' "symbol constant; similar to symbols in lisp, two symbols are identical if they are equal"

#foo "symbol constant; quotes can be omitted, iff the symbol does not contain special chars"

#+ "symbol constant; quotes can also be omitted, iff the symbol represents a binary message name"

#(1 true $a 16rFF 1.0 (10 20)) "array literal constant; the last element being another array; an instance of Array"

#( (a 1) (b 2) (c 3) ) "array literal constant; 3 elements, each being a two element array;
                       inside an array constant, symbols can be written without the # prefix"

#[ 10 20 2r1000 16rFE ] "byte-array constant; an instance of ByteArray"

#f32( 10.0 20.0 ) "float32-array constant; an instance of FloatArray (dialect specific [1])"

[ ... some code ... ] "a block; an instance of BlockClosure (name is dialect specific);

                      the object represents the piece of code which can be passed around
                      and evaluated later (also known as 'lambda closure')"

[:a1 ... :aN | ... some code ...] "a block with arguments."</lang>

1) not supported by all dialects. If missing in a particular dialect, a Parser extension is required.

Variables

All variables are actually 'bindings' of a name to a value (technically: holding a pointer to an object). Thus, the lifetime of an object is not related to the scope or lifetime of such a binding; although the VM's garbage collector will eventually free the underlying object, when no more binding or other reference exists. This is similar to eg. Java or JavaScript and different from eg. C++, where the scope may or may not imply construction/destruction of an object.

Globals / Namespaces

By convention, global and namespace variables are to be used for classes only. As in every other language, it is considered extremely bad style to use globals for anything else. They are created by telling the namespace to add a binding, as in <lang smalltalk>Smalltalk at:#Foo put:<something></lang> The binding's value can be retrieved by asking the namespace: <lang smalltalk>x := Smalltalk at:#Foo</lang> or simply by referring to the binding by name (if the corresponding namespace is visible in the scope of the code): <lang smalltalk>x := Foo</lang> As seen above, the global named Smalltalk refers to the Smalltalk namespace, which is globally visible. Technically, there is a binding to itself inside the Smalltalk namespace, and the Smalltalk namespace is visible everywhere by default.

Class Variables (Statics)

These are visible inside a class and shared with all of its subclasses. They have the lifetime of the defining class. The same binding is shared with all subclasses, thus the value is seen in the defining class and all of its subclasses. Typically, these are used for constants. Class variables are defined in the class definition message when the class is instantiated or redefined by setters to the class.

Instance Variables

These are the slots where the private state of an object is held. Instances have the same structure (layout), but each has its own values. The instance layout is defined in the class definition message when the class is instantiated or redefined by setters to the class.

Class Instance Variables

This is something not known in most other languages: these are per-class instance variables. Be reminded that classes are also objects, and have their own private state. I.e. like with instance variables, the layout is shared with subclasses, but each subclass has its own values. This is typically used to hold singletons, private per-class caches, or private per-class redefinable constants. The class instance layout is defined in the class definition message when the class is instantiated or redefined by setters to the class's metaclass.

Method Arguments and Locals

These are visible inside a single method and all of its enclosed blocks (lambda closures). Args are specified in the method's first definition line, locals are defined by listing the names between vertical bars at the beginning of a method:<lang smalltalk>| varA varB ... |</lang>

Block Arguments and Locals

These are visible inside a block and all of its enclosed blocks. Arguments are specified by listing them, each prefixed by a colon, followed by a vertical bar and then by optional local variables: <lang smalltalk>[:arg | ... code here... ] "a one-arg block"

[:a1 :a2 :a3 | ... ] "a three-arg block"

[ |local1 local2| ... ] "a no-arg block with 2 locals"

[:a :b :c | | l1 l2 l3 | ... ] "3 args (a,b,c) and three locals (l1,l2,l3)"</lang> Blocks can be nested, and inner blocks can refer to any statically visible outer scope's variable.
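
The following sketch shows how such blocks are evaluated, using the standard value, value: and value:value: messages; the variable name adder is arbitrary.

<lang smalltalk>| adder |
[ 42 ] value.                          "42"
[:x | x * x ] value: 7.                "49"
[:a :b | a + b ] value: 3 value: 4.    "7"
[ | t | t := 5. t * 2 ] value.         "10"

"a nested block referring to the argument 'n' of its outer block"
adder := [:n | [:x | x + n ] ].
(adder value: 10) value: 5.            "15"</lang>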

Pool Dictionaries

These are additional namespaces (name-value bindings), which are explicitly "imported" to make all their bindings visible. They are used to share constants or singletons (or even classes) between cooperating classes which do not inherit from a common superclass.

Private Classes

Not supported by all dialects (but can be simulated with namespaces or pool dictionaries). These are bindings to classes which are only locally visible.

Be reminded that all those bindings are dynamic structures, which can be created or changed at run time. This makes it trivial to generate code on the fly, add domain specific languages, anonymous or super private classes etc.

Also, the mapping from method names (called "message selectors") to code (called "methods") is also held in a dictionary inside classes. This can also be changed dynamically or reflected upon by asking the class for a corresponding binding.
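
Some typical reflective queries against these method dictionaries are sketched below. The selectors used here (respondsTo:, canUnderstand:, includesSelector:, perform:with:) are widely available, but the reflection protocol is one of the areas where dialects differ, so treat the names as assumptions to be checked against the dialect at hand.

<lang smalltalk>3 respondsTo: #factorial.                   "true"
Integer canUnderstand: #factorial.          "true - the lookup includes superclasses"
True includesSelector: #ifTrue:ifFalse:.    "true - found in True's own dictionary"
True includesSelector: #printString.        "false - only inherited"
3 perform: #+ with: 4.                      "7 - send a message whose selector is computed at run time"</lang>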

Special "builtin" Pseudo Variables

<lang smalltalk>self "refers to the current receiver"

super "for super sends (to call the method in a superclass)"

thisContext "refers to the current context (stack frame/continuation) as an object"</lang>

Message Sends

<lang smalltalk>1000 factorial "send the 'factorial' message to the integer receiver"

a factorial even "send the 'factorial' message to whatever 'a' refers to,

                     then send 'even' to whatever that returned"

a + 1 "send a '+' message, passing the argument '1' to whatever 'a' refers to"

(a + 1) squared "send '+' to 'a', then send 'squared' to whatever we get from it"

a , b "send the (binary) ',' message,

                     which performs collection-concatenation (arrays, strings, etc)"

arr at:1 put:'foo' "send the 'at:put:' message to 'arr',

                     passing two arguments, the integer '1' and a string"

a > b ifTrue: [ a print ] "send the 'ifTrue:' message to whatever 'a > b' returned (a boolean, usually),

                          passing a block closure as argument. 
                          The implementation of boolean will either evaluate 
                          or not evaluate the passed block's code"

a > b ifTrue: [ a ] ifFalse: [b] "send 'ifTrue:ifFalse:' to the object returned by 'a > b',

                                 passing two block closures as arguments. 
                                 The 'ifTrue:ifFalse:' method will evaluate one of them 
                                 and return that block's return value as its own return value"

b := [ ... someCode ... ]. "assign a block to the variable 'b'"

...

b value "evaluate the block's code (call the lambda closure)"

b2 value:123 "evaluate another block, passing one argument"</lang>

Other

<lang smalltalk>expr1 . expr2 "expressions (statements) within a method or block are separated by a full stop."

'hello' print. 'world' print "expressions are separated by a full stop; just like in English"

foo := bar "assignment; let foo refer to the object to which bar refers

                    (at that particular point in time)"

foo := bar := 0. "assignment has a value"

^ a + 1 "return; the value of 'a+1' as the value of the current method invocation."

|a b c| "local variables; introduces 'a', 'b' and 'c' in the current scope

                    (let-like local bindings)"

r msg1; msg2 "so called cascade;

                    first send msg1 to r, ignoring the return value, 
                    then send msg2. 
                    The value of the expression is the result of the last message.
                    Syntactic sugar for (t := r) msg1. t msg2 
                    but an expression, not a statement (with an anonymous variable 't')"</lang>
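
A common use of the cascade is to build up a collection in a single expression. A minimal sketch (the variable name oc is arbitrary); it relies on add: answering its argument rather than the receiver, which is the behaviour in the common dialects:

<lang smalltalk>| oc |
oc := OrderedCollection new
        add: 1;
        add: 2;
        yourself.   "without 'yourself', oc would be 2, the value of the last add:"
oc size.            "2"</lang>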

Class Definition

Classes are not defined by syntactic constructs, but by sending a message to some class (to create a subclass), a metaclass (to create an instance) or to a namespace (to create a class and install it). The details vary slightly among dialects, but usually wrappers/forwarders are provided or easily added if code is to be ported and any is missing in the target system. <lang smalltalk>someClass
    subclass: #'nameOfNewClass'
    instanceVariableNames: '...list of private slots...'
    classVariableNames: '...list of class variables...'
    poolDictionaries: '..list of imported pool bindings...'
    category: 'for documentation only'.
</lang> Classes can be anonymous; in most systems, you can evaluate "Class new new" to create a new anonymous class and an instance of it (but be aware that the class should contain enough state to properly specify its instances' layout, so usually more information is needed: number and names of instance variables, inheritance etc.). Otherwise some tools (Inspector) may fail to reflect on it.

Be reminded that classes are themselves objects and therefore instances of some class; in this case a so called Metaclass. They follow the same inheritance and message send semantics as ordinary objects. Thus, you can define class-side methods and redefine them in subclasses. For example, you can redefine the "new" method to implement caches, singletons or any other special feature. Often multiple instance creation methods (with different parameters) are provided by a class.

The class also provides reflection protocol, eg. to retrieve all of its instances, to ask for the names of private slots, to ask for the set of supported messages etc.

Control Structures

As mentioned above, these are defined as messages and their implementation is found in the corresponding receiver's class.

For illustration, here is how conditional execution ('if-then-else') is implemented in Smalltalk. There are two boolean objects named "true" and "false", which are singletons of corresponding classes named "True" and "False" (both inherit from Boolean, which inherits from Object). It is essential, that these are singletons, and that typical relational operators like "<", ">" etc. return one of those two.

Then, in the True class, define: <lang smalltalk>ifYouAreTrueThenDo: arg1 ifNotThenDo: arg2

   ^ arg1 value</lang>

and in False, define: <lang smalltalk>ifYouAreTrueThenDo: arg1 ifNotThenDo: arg2

   ^ arg2 value</lang>

Now, we can send this message to a boolean, and pass the code to be executed conditionally as a lambda block: <lang smalltalk>(a > 0) ifYouAreTrueThenDo:[ 'positive' printCR ] ifNotThenDo:[ 'negative' printCR ]</lang> Because both methods return the value of the block they evaluated, the message can also be used for its value (like the ternary "?:" expression in C), as in: <lang smalltalk>outcome := (a > 0) ifYouAreTrueThenDo:['positive'] ifNotThenDo:['negative']. outcome printCR</lang> Finally, by adding a self-returning "value" method to the Object class, we can also write: <lang smalltalk>( (a > 0) ifYouAreTrueThenDo:'positive' ifNotThenDo:'negative' ) printCR</lang> (knowing that the "ifYouAreTrueThenDo:ifNotThenDo:" methods send "value" to the corresponding argument and return the result).
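
The self-returning "value" method mentioned above could be sketched as follows. It is shown here as an addition to the Object class; whether such a method already exists depends on the dialect, so treat it as an assumption: <lang smalltalk>value
   "possible addition to Object: a non-block object simply evaluates to itself,
    so it can be passed wherever a block argument is expected"

   ^ self</lang>

With this method in place, the string 'positive' and the block ['positive'] both answer 'positive' when sent "value".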

In this style, all of Smalltalk's control structures, loops, enumeration, stream readers and event- or exception handling constructs are built. And since every class is open for extension, you can easily add additional convenient control functions (this is one place where dialects differ, so one often has to carry some of them along when porting applications from one dialect to another).
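
As a sketch of such an extension, the following method adds a "repeat until" loop to the block class (named BlockClosure in Squeak and Pharo, Block in some other dialects). The selector repeatUntil: is a hypothetical name chosen for this example and is not part of any standard library: <lang smalltalk>repeatUntil: conditionBlock
   "hypothetical extension of the block class:
    evaluate the receiver at least once and stop
    as soon as conditionBlock answers true"

   [ self value. conditionBlock value ] whileFalse</lang>

It could then be used like any built-in control structure: <lang smalltalk>| n |
n := 0.
[ n := n + 1 ] repeatUntil:[ n >= 3 ].
n    "3"</lang>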

The following is only a tiny subset - there are virtually hundreds or thousands of uses of blocks for control structures in the system. Typical are: <lang smalltalk>boolean ifTrue:[ value if boolean is true ] ifFalse:[ value if boolean is false ]
boolean ifTrue:[ block providing value if boolean is true ]
boolean ifFalse:[ block providing value if boolean is false ]

[ block for condition ] whileTrue:[ block to be looped over ]
[ block for condition ] whileFalse:[ block to be looped over ]
[ loop code . condition expression ] whileTrue.
[ loop code ] doWhile:[ condition ].
[ loop code ] doUntil:[ condition ].

n timesRepeat:[ block to be looped over ]                           "n being an integer"
start to:stop do:[:i | block to be looped over with index ]         "start being a number"
start to:stop by:inc do:[:i | block to be looped over with index ]  "start being a number"

collection do:[:el | block to be evaluated for each element ]
collection reverseDo:[:el | block to be evaluated for each element ]
collection findFirst:[:el | condition on element]

[action] on:Error do:[handler]     "exception handling"
[action] ensure:[unwind action]    "unwind handler aka finally"

[some code] fork                   "starts a new thread"</lang>
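
To make the schematic list above more tangible, here is a small workspace sketch that uses a few of these messages with concrete values. The variable names are arbitrary: <lang smalltalk>| sum i |
sum := 0.
1 to: 5 do: [:k | sum := sum + k ].        "sum = 15"
#(10 20) do: [:el | sum := sum + el ].     "sum = 45"
3 timesRepeat: [ sum := sum + 1 ].         "sum = 48"

i := 0.
[ i < 4 ] whileTrue: [ i := i + 1 ].       "i = 4"</lang>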

Return from a Block

Note that a "^" (return) inside a block returns from the enclosing method, NOT only from the block. This is an essential semantic property of the return (technically, it may be a long return from a deeply nested call hierarchy, possibly involving unwind actions).

This makes it possible to pass a block to e.g. a collection in order to enumerate elements until some condition is met. For example, if we need a helper method which searches a dataset for the first element that meets some condition and evaluates an action on it, we can write: <lang smalltalk>findSomeElementWhichMeetsCondition:conditionBlock thenDo:actionBlock ifNone:failBlock

   dataSet do:[:eachElement |
       (conditionBlock value:eachElement) ifTrue:[
           ^ actionBlock value:eachElement 
       ]
   ].
   ^ failBlock value</lang>

Here, a block is passed to the dataSet's "do:" method; the "^" inside it (if executed during the "do:") returns from the containing findSomeElementWhichMeetsCondition:thenDo:ifNone: method. The above can be used as: <lang smalltalk>myDataSet

   findSomeElementWhichMeetsCondition:[:record | record name = 'John']
   thenDo:[:record | record print ]
   ifNone:[ 'nothing found' print ]</lang>

If a return inside a block only returned from the inner scope (as e.g. in JavaScript functions), this would not be possible, and ugly workarounds (like exceptions or long-jumps) would be needed.

There are rare situations where an explicit block return is needed (for example, to break out of a loop in the middle without returning from the method). For this, the block class provides a special "valueWithExit" method, so you can write: <lang smalltalk>1 to:10 do:[:outerLoopsI |

   [:exit |
       1 to:10 do:[:innerLoopsI |
          ...
           someCondition ifTrue:exit
       ]
   ] valueWithExit

]</lang>

Exceptions and Handlers

Originally, Smalltalk used an instance-based exception handling scheme, where instances of Signal were created and raised. Now, all implementations have moved to class-based exceptions, where the raised exception is an instance of a subclass of Exception. As instance-based exception handling is still useful in some situations (very lightweight, no need to create a class), some dialects continue to support both.

Smalltalk supports proceedable exceptions.

<lang smalltalk>[ try block to be evaluated ] on:exception do:[:ex | handler code ]</lang> where 'exception' is an Exception class or Signal instance and the 'ex' argument provides detailed information (where and why) to the handler and also allows control of how to continue afterwards (proceed, return, restart, reject).
The handler basically has the following options:

  • ex return - return out of the try block
  • ex restart - restart the try block
  • ex reject - handler cannot handle; rethrow the exception for an outer handler
  • ex proceedWith: value - proceed after where the exception was raised (after a repair)

Exceptions may be specified to be nonProceedable, to protect code from proceeding handlers, where proceeding is not possible.
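
A small sketch of two of these options with concrete exceptions. It uses the ANSI-style selectors return: and resume: as found in Squeak and Pharo (resume: plays the role of proceedWith: above); the selector names available in other dialects are an assumption: <lang smalltalk>| r1 r2 |

"return: leaves the protected block and provides a replacement value"
r1 := [ 10 / 0 ]
         on: ZeroDivide
         do: [:ex | ex return: -1 ].

"resume: continues right after the point where the (resumable) exception was signaled"
r2 := [ (Warning signal: 'value missing') + 1 ]
         on: Warning
         do: [:ex | ex resume: 41 ].

"r1 is -1, r2 is 42"</lang>

In the second case the value passed to resume: becomes the value of the signal: expression, so the addition inside the protected block is still performed.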

Exceptions form a hierarchy, so a handler will also catch any derived exceptions. If an exception is unhandled, the original exception info is packed up and an UnhandledException is raised (similar to the handling of doesNotUnderstand:). The default handler for UnhandledException opens a debugger for the misbehaving thread (while usually other threads continue to operate as usual).

Handlers can also be defined to handle a collection of non-related exceptions, by creating an exceptionSet:

<lang smalltalk>[ try block to be evaluated ] on:(ZeroDivide, DomainError) do:[:ex | handler ]</lang>

Finally, many dialects provide syntactic sugar for common situations: <lang smalltalk>exception catch: [ action ] "to return out of the action, without any particular handler action"

exception ignoreIn: [ action ] "to ignore the exception and proceed

                               (for example: a UserInterruptSignal, as generated by the CTRL-C key)"

</lang>

Unwinding

Ensure blocks, which make sure that cleanup is performed correctly even in exception situations, are defined as: <lang smalltalk>[ action to be performed ] ensure: [ action to cleanup] "will cleanup in any case (i.e. both in normal and in unwind situations)"

[ action to be performed ] ifCurtailed: [ action to cleanup] "will cleanup only in unwind situations"</lang>
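
The following sketch combines ensure: with an exception handler to show when the cleanup block runs. The log collection exists only for illustration; the order given in the comment is what Squeak and Pharo produce and is assumed to be the same in other dialects: <lang smalltalk>| log |
log := OrderedCollection new.

[
   [ log add: 'work'. 1 / 0 ]
      ensure: [ log add: 'cleanup' ]
] on: ZeroDivide do: [:ex |
   log add: 'handled'.
   ex return: nil
].

log    "contains 'work' 'handled' 'cleanup', in this order"</lang>

The handler runs first, while the failing context is still intact; the cleanup block is evaluated when the handler returns and the stack is unwound.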

Multithreading

New threads are started by sending 'fork' to a block; this will create a process instance ¹ which executes the block's code in a separate thread (within the same address space): <lang smalltalk>[ do something ] fork.

[ do something ] forkAt: priorityLevel</lang> 1) Notice that these are technically threads, not "unix processes". They execute in the same address (or object-) space. They are named "Process" and created with "fork" in Smalltalk for historic reasons.

The scheduling behavior is not standard among dialects/implementations. Some only switch manually on an explicit yield, most provide strict priority-based scheduling, and some even provide preemptive timeslicing and dynamic priorities. The details may also depend on the underlying operating system.
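
A minimal sketch of a forked thread that hands a result back to the thread which started it, using a Semaphore for synchronization. The variable names are arbitrary; Semaphore with its signal and wait messages is available in the common dialects: <lang smalltalk>| done result |
done := Semaphore new.

[
   result := 6 * 7.     "work performed in the new thread"
   done signal          "tell the waiting thread that the result is ready"
] fork.

done wait.              "suspend until the forked thread has signaled"
result                  "42"</lang>

Because the starting thread blocks in wait until signal has been sent, this works regardless of the scheduling policy of the particular implementation.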

Implementation

Most Smalltalk implementations are based on a bytecode execution engine. Bytecode is emitted and stored in method objects by a compiler which is part of both the development and runtime environment (in contrast to external tools, like javac). Thus new methods can be generated and installed dynamically by reading scripts or by a generator. Bytecodes are operations for a virtual stack-based machine; they are either interpreted by an interpreter (part of the runtime system) or dynamically compiled to machine code (by a JIT compiler). Bytecode is not standardized and usually not compatible among dialects.
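
The following workspace sketch illustrates installing a method dynamically at run time. It assumes the compile:, compiledMethodAt: and removeSelector: messages as found in Squeak and Pharo; the selector rosettaDouble is a made-up name, and other dialects may require a different compile message: <lang smalltalk>"compile source text and install the resulting method object in Integer"
Integer compile: 'rosettaDouble
   ^ self * 2'.

21 perform: #rosettaDouble.                      "42"
(Integer compiledMethodAt: #rosettaDouble) class. "the class of the method object, typically CompiledMethod"

"remove the method again to leave the system unchanged"
Integer removeSelector: #rosettaDouble.</lang>

The message is sent with perform: here only so that the snippet can be evaluated as a whole before the new selector exists.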

Some implementations support source-to-source compilation to C, JavaScript or Java. These may or may not show some limitations in the support for dynamic changes at execution time. Typically, the full dynamic bytecode is used for development, followed by a compilation phase for deployment/packaging.

All Smalltalks use and depend on garbage collection for automatic reclamation of unused objects, and most implementations use modern algorithms such as generation scavenging and incremental background collectors, and support weak references and finalization. Imprecise conservative collectors are typically not used. Reference counting was abandoned in the 1980s.

As message send performance is critical in Smalltalk, highly tuned cache mechanisms have been invented and are used: inline caches, polymorphic inline caches, dynamic recompilation based on receiver and/or argument types etc. Block (aka lambda) creation and evaluation, especially the treatment of closed-over variables, has also been a focus of implementors. Algorithms similar to those found in sophisticated Lisp systems, such as lambda lifting, inlining, stack allocation and heap migration, are also used in Smalltalk.

Influences

Smalltalk syntax is meant to be read like English sentences, and messages look like orders "someone doSomethingWith: someArgument".

The syntax is very compact and almost every semantic feature is implemented via a message send to some receiver object, instead of being a syntactic language feature of the compiler. As such, and because the compiler is part of the runtime environment, changes, fixes and enhancements of such features can be made easily (and are made). Smalltalk has therefore often been used as a testbed for research, and still is.

Smalltalk's symbols correspond to Lisp symbols; blocks are syntactic sugar for lambda closures.
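
Both points can be sketched in a workspace. The names makeCounter and counter are arbitrary, and block-local variable declarations as used here are assumed to be supported (they are in the current dialects): <lang smalltalk>| makeCounter counter |

"a block which answers another block that closes over the variable count"
makeCounter := [ | count |
   count := 0.
   [ count := count + 1 ] ].

counter := makeCounter value.
counter value.      "1"
counter value.      "2; the captured variable lives on between calls"

#foo == #foo        "true; equal symbols are the identical object, as in Lisp"</lang>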

Objective-C's message send syntax and keyword format are a direct subset of the corresponding Smalltalk message send syntax. The semantics of its classes and instances are also similar. However, the reflection and metaclass facilities of Objective-C are a small subset.

Java's container and stream class hierarchy has similarities to Smalltalk collection classes. MVC as used in many toolkits, unit testing frameworks, refactoring tools and many design patterns originated in Smalltalk.

Self, a descendant of Smalltalk, uses a similar syntax, blocks and exception facilities, but adds instance-based inheritance, dynamic slots and mirrors.

Slate and Newspeak use similar syntax and message send semantics.

Newspeak generalizes the scoping to include nested classes, namespace instantiation and abstracts variable access.

JavaScript's functions are syntactically different from, but semantically similar to, Smalltalk's blocks, including scoping rules (but they lack the capability of returning from their containing function).

Python, Ruby and other modern languages have implemented a number of semantic features which originated in Smalltalk.

Many modern VM technologies, dynamic compilation and garbage collection algorithms were originally implemented in Smalltalk runtime systems (JIT, hotspot, dynamic recompilation, inline and polymorphic caches etc.).

Implementations

  • Amber Smalltalk
  • Cuis Smalltalk
  • Dolphin Smalltalk; open source
  • GemStone/S; object-oriented Smalltalk database, free for private and commercial use
  • GNU Smalltalk; open source
  • Pharo; open source, mostly compatible with Squeak
  • S#
  • Smalltalk/X; free for private and commercial use
  • Smalltalk-MT; (still actively developed?)
  • Squeak; open source
  • VisualAge Smalltalk (Instantiations; formerly known as IBM Smalltalk); free for private use
  • VisualWorks (Cincom) Smalltalk; free for private use

All systems provide the full source code of all class libraries which can be modified and extended by users. The commercial systems will provide the runtime (VM) as binary only.

A Word About Code Snippets in Rosetta

(a word to non-Smalltalkers wanting to try the code)

Smalltalk code snippets found in Rosetta are usually in the form of expressions which can be copy-pasted into a so-called "workspace" (also called "playground" in other systems) which is a kind of REPL-like (read-eval-print-loop) evaluator. The details vary, but usually there is a tool window, into which code can be entered or pasted and evaluated with a menu function called "doIt" or "printIt" (i.e. select the text and apply "doIt").

Due to differences in how methods are defined in classes when code is to be imported ("fileIn" or "load code"), these snippets are often expressions which can be evaluated out of a class context, often in a functional style (using blocks).

For example, if asked to provide an example for a factorial function, a typical Smalltalk solution would be to define a method in the Integer class called "factorial", which might look like the following naïve version: <lang smalltalk>factorial

   ^ (self <= 1) 
       ifTrue:[1] 
       ifFalse:[ self * (self-1) factorial ]</lang>

(here 'self' is the Integer receiver object, and "^" returns a value from the message send).

To get the factorial value, we'd evaluate in a workspace: <lang smalltalk>10 factorial</lang>

However, to get this code to be executed in a concrete Smalltalk system, you'd have two ways to go:
a) find the Integer class in the class browser, enter the code and "accept" the code (which means: "compile and install the changes made"); assuming that your Smalltalk has a class browser and is not purely script based.
b) save the snippet to a file (in a fileIn format) and "fileIn" (aka. "load") the file.

In both cases, you'd end up with a system infected with many Rosetta methods, which you'd have to remove afterwards (using "undo" or "delete"). And because Smalltalk keeps track of your changes, it usually involves additional cleanup work in your change history (changeFile/changeSet/changeRepository).

a) is somewhat inconvenient if the code example consists of multiple methods, possibly in multiple classes.
b) comes with the additional trouble that fileIn formats are different (chunk file, vs. XML file, vs. Monticello, vs. GNU-ST etc.).

For example, for export/import, GNU-ST uses a private format, which has the advantage of not needing the chunk format's bangs and especially the ugly bang doubling inside code and the empty chunk at the end: <lang smalltalk>Number extend [

 my_factorial [
   ^ (self < 2) ifTrue:[1] ifFalse:[ self * (self-1) my_factorial]
 ]

]</lang> Cuis Smalltalk takes a similar approach; there it looks like: <lang smalltalk>Number >> my_factorial [

   ^ (self < 2) ifTrue:[1] ifFalse:[ self * (self-1) my_factorial]

]</lang>

However, both are incompatible and not supported by most other dialects, which use the historic Smalltalk-80 chunk format: <lang smalltalk>!Number methodsFor:'math'! my_factorial

   ^ (self < 2) ifTrue:[1] ifFalse:[ self * (self-1) my_factorial]

! !</lang> This chunk format is supported by all systems, but it is somewhat ugly to read and also needs exclamation marks to be doubled in the code (which looks especially bad in string literals).
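
To illustrate the doubling of exclamation marks, here is a sketch of a chunk-format method whose string literal contains one. The selector rosettaGreeting and the category name are made up for this example: <lang smalltalk>!Object methodsFor:'rosetta demo'!
rosettaGreeting

   ^ 'Hello, world!!'

! !</lang>

After the file has been loaded, the method's source contains the single exclamation mark again, so the string answered is 'Hello, world!'.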

Inside the Smalltalk IDE, you will never see any of the above, as these formats are only used for interchange.

So in which dialect's fileOut format should the example be presented to be most convenient, readable, and to be repeatable in case someone wants to try Smalltalk?


Expression-like snippets work more or less in all dialects, and such snippets are usually presented in a functional or expression style, which works outside any class.
Typically these define a function (here called "block") and then call it.
For the above, this might look like: <lang smalltalk>factorial := [:n |

  n <= 1
      ifTrue:[ 1 ]
      ifFalse:[ n * (factorial value:(n - 1)) ]

]. factorial value:10</lang> (here there is no self, and the code works anywhere outside any class context. A block (aka 'function') is assigned to the "factorial" variable; the value: message sent to the block calls the function. There is no "^" return statement, because the value returned from the block is the value of the last expression in it, and there actually is no method from which to return).

The advantage is that this code can be simply selected as a whole and evaluated. Any variables created will be local to the evaluation scope and not infect the system.

The disadvantage is that it might look somewhat non-Smalltalk-like to not have it in a class/method.

However, it shows that Smalltalk does have functional aspects in it, albeit being a pure OO-language.
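
A short sketch of this functional flavour, using blocks as first-class functions together with the standard collection messages select:, collect: and inject:into:. The variable name square is arbitrary: <lang smalltalk>| square |
square := [:x | x * x ].

"filter the even numbers, then map the square function over them"
((1 to: 10) select: [:x | x even ]) collect: square.    "the squares 4 16 36 64 100"

"fold: sum up the numbers from 1 to 10"
(1 to: 10) inject: 0 into: [:sum :x | sum + x ]          "55"</lang>

Each of the two expressions can be selected and evaluated with "printIt".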

