Sum data type

From Rosetta Code
Sum data type is a draft programming task. It is not yet considered ready to be promoted as a complete task, for reasons that should be found in its talk page.


Task

Create a sum data type:

A sum data type is a data structure used to hold a value that could take on several different, but fixed, types. Only one of the types can be in use at any one time.

Sum data types are a kind of algebraic data type and are also known as tagged unions, variants, variant records, choice types, discriminated unions, disjoint unions or coproducts.
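
As a quick illustration of the definition above, here is a minimal Rust sketch. The type `Value`, its three alternatives and the helper `describe` are hypothetical names chosen for this example only; they are not part of the task.

```rust
// Hypothetical sum type with three fixed alternatives (illustrative names).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Int(i64),
    Text(String),
    Flag(bool),
}

// Exactly one alternative is active at a time; `match` must cover all of them.
fn describe(v: &Value) -> String {
    match v {
        Value::Int(n) => format!("int {}", n),
        Value::Text(s) => format!("text {}", s),
        Value::Flag(b) => format!("flag {}", b),
    }
}

fn main() {
    let values = vec![
        Value::Int(42),
        Value::Text(String::from("hello")),
        Value::Flag(true),
    ];
    for v in &values {
        println!("{}", describe(v));
    }
}
```

The set of alternatives is fixed by the type definition, so adding a fourth alternative would force every `match` over `Value` to be updated.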




Ada

Ada allows the declaration of variant types with discriminant fields. The contents of the variant record can vary based upon the contents of the discriminant field.

The following example is a variant record type used to return a value from a function that returns the index position of the first occurrence of a value in an array. If the value is found the result returns the index of the first occurrence of the value. If the value is not found in the array it returns no index value.

type Ret_Val (Found : Boolean) is record
   case Found is
      when True =>
         Position : Positive;
      when False =>
         null;
   end case;
end record;

The record requires a boolean value for its discriminant field named Found. When Found is True the record contains a Position field of the Ada subtype Positive. When Found is False the record contains no additional fields.

A function specification using this variant record is:

type Array_Type is array (Positive range <>) of Integer;

function Find_First (List  : in Array_Type; Value : in Integer) return Ret_Val 
with
   Depends => (Find_First'Result => (List, Value)),
   Post    =>
   (if
      Find_First'Result.Found
    then
      Find_First'Result.Position in List'Range
      and then List (Find_First'Result.Position) = Value);

The function Find_First is written with two aspect specifications.

The Depends aspect specifies that the result of this function depends only on the List parameter and the Value parameter passed to the function.

The Post aspect specifies the postcondition for this function using predicate logic. The postcondition states that if the Found field of the returned value is True, then the Position field of the returned value is an index within the index range of the parameter List, and the element of List at that index equals the Value parameter.

Use of the variant record ensures that a logically correct value is always returned by the function Find_First. Not finding a specified value within an array is not an error. It is a valid result. In that case there is no Position field indicating the index value of the element in the array matching the value parameter. The function should not respond with an error condition when the value is not found. On the other hand, it must respond with a correct array index when the value is found.

Use of the variant record allows the function to return either one or two fields of information as is appropriate to the result.

The implementation of the Find_First function is:

function Find_First (List  : in Array_Type; Value : in Integer) return Ret_Val is
begin
   for I in List'Range loop
      if List (I) = Value then
         return (Found => True, Position => I);
      end if;
   end loop;
   return (Found => False);
end Find_First;

The function iterates through the List array comparing each array element with the Value parameter. If the array element equals the Value parameter the function returns an instance of Ret_Val with Found assigned the value True and Position assigned the value of the current array index. If the loop completes no array element has been found to match Value. The function then returns an instance of Ret_Val with the Found field assigned the value False.
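
For comparison, the same "found with a position, or not found" result can be sketched in Rust. This is only a rough analogue under stated assumptions: the names `RetVal` and `find_first` are borrowed from the Ada example for readability, and `position` here is a zero-based slice index rather than an Ada Positive index.

```rust
// Rough analogue of the Ada variant record: the `Found` alternative carries
// a position, the `NotFound` alternative carries nothing.
#[derive(Debug, PartialEq)]
enum RetVal {
    Found { position: usize },
    NotFound,
}

// Return the zero-based index of the first element equal to `value`, if any.
fn find_first(list: &[i32], value: i32) -> RetVal {
    for (i, &item) in list.iter().enumerate() {
        if item == value {
            return RetVal::Found { position: i };
        }
    }
    RetVal::NotFound
}

fn main() {
    let list = [7, 3, 9, 3];
    for value in [3, 5] {
        match find_first(&list, value) {
            RetVal::Found { position } => println!("{} found at index {}", value, position),
            RetVal::NotFound => println!("{} not found", value),
        }
    }
}
```

As in the Ada version, a caller cannot read a position from a "not found" result, because that alternative has no such field.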

ALGOL 68

Translation of: OCaml

Algol 68's UNION MODE allows the definition of items which can have different types.

MODE LEAF = INT;
MODE NODE = STRUCT( TREE left, TREE right );
MODE TREE = UNION( VOID, LEAF, REF NODE );

TREE t1   = LOC NODE := ( LEAF( 1 ), LOC NODE := ( LEAF( 2 ), LEAF( 3 ) ) );

Note that assignment/initialisation of UNION items is just a matter of specifying the assigned/initial value, as above; however, using the value requires a CASE clause, such as in the example below (which would print "node", given the above declarations).

CASE t1
  IN (REF NODE n): print( ( "node",     newline ) )
   , (    LEAF l): print( ( "leaf ", l, newline ) )
   , (    VOID  ): print( ( "empty",    newline ) )
ESAC

C

C has the union data type, which stores several members at the same memory location, so only one member holds a meaningful value at any time. This was a very handy feature when memory was a scarce commodity. It remains essential for low-level work such as hardware I/O access and word or bitfield sharing, which makes C especially suited to systems programming. Note that a C union does not record which member is currently in use; the program must keep track of that itself.

What follows are two example programs. In both, a union stores an integer, a floating-point number and a character at the same location. If all members are assigned one after another before any is read, only the last assignment survives and the earlier values are corrupted, as the first example shows. Proper use of a union requires reading a member only after that same member was the one most recently written.

Incorrect usage

#include<stdio.h>

typedef union data{
        int i;
        float f;
        char c;
}united;

int main()
{
        united udat;

        udat.i = 5;
        udat.f = 3.14159;
        udat.c = 'C';

        printf("Integer   i = %d , address of i = %p\n",udat.i,(void *)&udat.i);
        printf("Float     f = %f , address of f = %p\n",udat.f,(void *)&udat.f);
        printf("Character c = %c , address of c = %p\n",udat.c,(void *)&udat.c);

        return 0;
}

Output :

Integer   i = 1078529859 , address of i = 0x7ffc475e3c64
Float     f = 3.141557 , address of f = 0x7ffc475e3c64
Character c = C , address of c = 0x7ffc475e3c64

Correct usage

#include<stdio.h>

typedef union data{
        int i;
        float f;
        char c;
}united;

int main()
{
        united udat;

        udat.i = 5;

        printf("Integer   i = %d , address of i = %p\n",udat.i,(void *)&udat.i);

        udat.f = 3.14159;

        printf("Float     f = %f , address of f = %p\n",udat.f,(void *)&udat.f);

        udat.c = 'C';

        printf("Character c = %c , address of c = %p\n",udat.c,(void *)&udat.c);

        return 0;
}

Output:

Integer   i = 5 , address of i = 0x7ffd71122354
Float     f = 3.141590 , address of f = 0x7ffd71122354
Character c = C , address of c = 0x7ffd71122354
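
A bare union leaves it to the programmer to remember which member was last written. One common remedy is to pair the union with a tag. The following Rust sketch shows that idea under stated assumptions: `Tag`, `Payload`, `Tagged` and the constructor names are hypothetical, and the three fields simply mirror the members of the C `union data` above.

```rust
// Tag recording which union field is currently valid (hypothetical names).
#[derive(Clone, Copy)]
enum Tag {
    Int,
    Float,
    Char,
}

// Untagged storage: all fields share the same memory, like the C union.
#[derive(Clone, Copy)]
union Payload {
    i: i32,
    f: f32,
    c: u8,
}

// The tag and the payload travel together, so the active field is always known.
#[derive(Clone, Copy)]
struct Tagged {
    tag: Tag,
    payload: Payload,
}

impl Tagged {
    fn from_int(i: i32) -> Self {
        Tagged { tag: Tag::Int, payload: Payload { i } }
    }
    fn from_float(f: f32) -> Self {
        Tagged { tag: Tag::Float, payload: Payload { f } }
    }
    fn from_char(c: u8) -> Self {
        Tagged { tag: Tag::Char, payload: Payload { c } }
    }

    fn describe(&self) -> String {
        // SAFETY: each constructor sets `tag` to match the field it wrote.
        unsafe {
            match self.tag {
                Tag::Int => format!("int {}", self.payload.i),
                Tag::Float => format!("float {}", self.payload.f),
                Tag::Char => format!("char {}", self.payload.c as char),
            }
        }
    }
}

fn main() {
    for t in [Tagged::from_int(5), Tagged::from_float(2.5), Tagged::from_char(b'C')] {
        println!("{}", t.describe());
    }
}
```

Only the field named by the tag is ever read, which is the discipline the "Correct usage" program follows by hand.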

C++

#include <iostream>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// A variant is a sum type: it holds exactly one of its alternative types at a time
std::variant<int, std::string, bool, int> myVariant{"Ukraine"}; 

struct Tree
{
  // Variants can be used in recursive data types to define structures like
  // trees.  Here the node of a tree is represented by a variant of either
  // an int or a vector of sub-trees.
  std::variant<std::vector<Tree>, int> Nodes;
};

Tree tree1; // empty tree
Tree tree2{2}; // a tree with a single value

// a bigger tree
Tree tree3{std::vector{Tree{3}, Tree{std::vector{Tree{2}, Tree{7}}}, Tree{8}}};

// optional is a special case of a sum type between a value and nothing
std::optional<int> maybeInt1;    // empty optional
std::optional<int> maybeInt2{2}; // optional containing 2

// In practice pointers are often used as sum types between a valid value and null
int* intPtr1 = nullptr;  // a null int pointer

int value = 3;
int* intPtr2 = &value; // a pointer to a valid object 

// Print a tree
void PrintTree(const Tree& tree)
{
  std::cout << "(";
  if(std::holds_alternative<int>(tree.Nodes))
  {
    std::cout << std::get<1>(tree.Nodes);
  }
  else
  {
    for(const auto& subtree : std::get<0>(tree.Nodes)) PrintTree(subtree);
  }
  }
  std::cout <<")";
}

int main()
{
  std::cout << "myVariant: " << std::get<std::string>(myVariant) << "\n";

  PrintTree(tree1); std::cout << "\n";
  PrintTree(tree2); std::cout << "\n";
  PrintTree(tree3); std::cout << "\n";

  std::cout << "*maybeInt2: " << *maybeInt2 << "\n";
  std::cout << "intPtr1: " << intPtr1 << "\n";
  std::cout << "*intPtr2: " << *intPtr2 << "\n";
}
Output:
myVariant: Ukraine
()
(2)
((3)((2)(7))(8))
*maybeInt2: 2
intPtr1: 0
*intPtr2: 3

CLU

start_up = proc () 
    % A sum data type is called a `oneof' in CLU.
    % (There is also a mutable version called `variant' which works
    % the same way.)
    irc = oneof[i: int, r: real, c: char]
    
    % We can use the new type as a parameter to other types
    ircseq = sequence[irc]
    
    % E.g., fill an array with them
    ircs: ircseq := ircseq$[ 
        irc$make_i(20),
        irc$make_r(3.14),
        irc$make_i(42),
        irc$make_c('F'),
        irc$make_c('U'),
        irc$make_r(2.72)
    ]
    
    % 'tagcase' is used to discriminate between the various possibilities
    % e.g.: iterate over the elements in the array
    po: stream := stream$primary_output()
    for i: irc in ircseq$elements(ircs) do
        tagcase i
            tag i (v: int): 
                stream$putl(po, "int: " || int$unparse(v))
            tag r (v: real): 
                stream$putl(po, "real: " || f_form(v, 1, 2))
            tag c (v: char):
                stream$putl(po, "char: " || string$c2s(v))
        end
    end
end start_up
Output:
int: 20
real: 3.14
int: 42
char: F
char: U
real: 2.72

Delphi

See Pascal

EMal

^|EMal is a dynamic language: its var supertype can be used as the data type,
 |with access to the value controlled and validated by the program,
 |as shown in the IntOrText example.
 |
 |Alternatively, the type system can enforce the restriction via the "allows" keyword,
 |as in the NullableIntOrText example; being a user data type, it is nullable.
 |^
type NullableIntOrText allows int, text
type IntOrText
fun check = var by var value
  if generic!value != int and generic!value != text
    Event.error(0, "Value must be int or text.").raise()
  end
  return value
end
model
  fun getValue
  fun setValue
  new by var value
    check(value)
    me.getValue = var by block do return value end
    me.setValue = void by var newValue do value = check(newValue) end
  end
end
type Main
^|testing NullableIntOrText|^
# NullableIntOrText v1 = 3.14 ^|type mismatch|^
NullableIntOrText v2 = 42
watch(v2)
v2 = null
watch(v2)
v2 = "hello"
watch(v2)
# v2 = 3.14  ^|type mismatch|^

^|testing IntOrText|^
#IntOrText v3 = IntOrText(3.14) ^|Value must be int or text.|^
IntOrText v3 = IntOrText(42)
watch(v3.getValue())
#v3.setValue(null) ^|Value must be int or text.|^
v3.setValue("hello")
watch(v3.getValue())
Output:
Org:RosettaCode:NullableIntOrText, Integer: <42>
Org:RosettaCode:NullableIntOrText: ∅
Org:RosettaCode:NullableIntOrText, Text: <hello>
Variable, Integer: <42>
Variable, Text: <hello>

Factor

This is accomplished by defining a tuple with only one slot. The slot should have a class declaration that is a union class. This ensures that the slot may only contain an object of a class that is in the union. A convenient way to do this is with an anonymous union, as in the example below. An explicit UNION: definition may also be used. Note that as Factor is dynamically typed, this is only a runtime restriction.

In the example below, we define a pseudo-number tuple with one slot that can hold either a number (a built-in class) or a numeric-string — a class which we have defined to be any string that can parse as a number using the string>number word.

USING: accessors kernel math math.parser strings ;

PREDICATE: numeric-string < string string>number >boolean ;
TUPLE: pseudo-number { value union{ number numeric-string } } ;
C: <pseudo-number> pseudo-number   ! constructor

5.245 <pseudo-number>   ! ok
"-17"   >>value         ! ok
"abc42" >>value         ! error

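For comparison, a statically typed language can express the same restriction as an explicit sum type with a validating constructor. The Rust sketch below is an assumption-laden analogue, not a translation: `PseudoNumber` and `from_text` are hypothetical names, and Rust's `f64` parser does not accept exactly the same strings as Factor's string>number word.

```rust
// Either a number, or a string that has been checked to parse as a number.
#[derive(Debug, PartialEq)]
enum PseudoNumber {
    Number(f64),
    NumericString(String),
}

impl PseudoNumber {
    // Accept the text only if it parses as a number; otherwise return None.
    fn from_text(s: &str) -> Option<PseudoNumber> {
        if s.parse::<f64>().is_ok() {
            Some(PseudoNumber::NumericString(s.to_string()))
        } else {
            None
        }
    }
}

fn main() {
    let values = [
        Some(PseudoNumber::Number(5.245)),
        PseudoNumber::from_text("-17"),
        PseudoNumber::from_text("abc42"),
    ];
    for v in &values {
        println!("{:?}", v);
    }
}
```

Here the invalid string is rejected when the value is constructed, instead of raising an error on slot assignment.
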
FreeBASIC

These are called unions in FreeBASIC.

type p2d
    'a 2d point data type; used later to show unions can hold compound data types
    x as integer
    y as integer
end type

union foobar
    'a union
    small as ubyte
    medium as integer
    large as ulongint
end union

union thingo
    'a FreeBASIC union can hold various data types:
    text as string*8
    num1 as double
    num2 as ulongint
    posi as p2d             'structs
    union2 as foobar        'even another union!
end union

Free Pascal

See Pascal. The type variant is implemented as a variant record.

Go

Go doesn't natively support sum types, though it's not difficult to create one (albeit verbosely) as the following example shows.

Normally, the IpAddr type (and its associated types and methods) would be placed in a separate package so that its 'v' field couldn't be accessed directly by code outside that package. Here, for convenience, we place it in the 'main' package.

package main

import (
    "errors"
    "fmt"
)

type (
    IpAddr struct{ v interface{} }
    Ipv4   = [4]uint8
    Ipv6   = string
)

var zero = Ipv4{}

func NewIpAddr(v interface{}) (*IpAddr, error) {
    switch v.(type) {
    case Ipv4, Ipv6:
        return &IpAddr{v}, nil
    default:
        err := errors.New("Type of value must either be Ipv4 or Ipv6.")
        return nil, err
    }
}

func (ip *IpAddr) V4() (Ipv4, error) {
    switch ip.v.(type) {
    case Ipv4:
        return ip.v.(Ipv4), nil
    default:
        err := errors.New("IpAddr instance doesn't currently hold an Ipv4.")
        return zero, err
    }
}

func (ip *IpAddr) SetV4(v Ipv4) {
    ip.v = v
}

func (ip *IpAddr) V6() (Ipv6, error) {
    switch ip.v.(type) {
    case Ipv6:
        return ip.v.(Ipv6), nil
    default:
        err := errors.New("IpAddr instance doesn't currently hold an Ipv6.")
        return "", err
    }
}

func (ip *IpAddr) SetV6(v Ipv6) {
    ip.v = v
}

func check(err error) {
    if err != nil {
        fmt.Println(err)
    }
}

func main() {
    v4 := Ipv4{127, 0, 0, 1}
    ip, _ := NewIpAddr(v4)
    home, _ := ip.V4()
    fmt.Println(home)
    v6 := "::1"
    ip.SetV6(v6)
    loopback, _ := ip.V6()
    fmt.Println(loopback)
    _, err := ip.V4()
    check(err)
    rubbish := 6
    ip, err = NewIpAddr(rubbish)
    check(err)
}
Output:
[127 0 0 1]
::1
IpAddr instance doesn't currently hold an Ipv4.
Type of value must either be Ipv4 or Ipv6.

Java

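Java has no dedicated sum data type, although sealed classes and interfaces (finalized in Java 17) can model one. Generic data types are supported, and the example below uses a generic wrapper; note that it accepts any type rather than a fixed set of alternatives.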

import java.util.Arrays;

public class SumDataType {

    public static void main(String[] args) {
        for ( ObjectStore<?> e : Arrays.asList(new ObjectStore<String>("String"), new ObjectStore<Integer>(23), new ObjectStore<Float>(Float.valueOf(3.14159f))) ) {
            System.out.println("Object : " + e);
        }
    }
    
    public static class ObjectStore<T> {
        private T object;
        public ObjectStore(T object) {
            this.object = object;
        }
        @Override
        public String toString() {
            return "value [" + object.toString() + "], type = " + object.getClass();
        }
    }

}
Output:
Object : value [String], type = class java.lang.String
Object : value [23], type = class java.lang.Integer
Object : value [3.14159], type = class java.lang.Float

Julia

Julia allows the creation of union types.

    
    julia> using Sockets # for IP types

    julia> MyUnion = Union{Int64, String, Float64, IPv4, IPv6}
    Union{Float64, Int64, IPv4, IPv6, String}

    julia> arr = MyUnion[2, 4.8, ip"192.168.0.0", ip"::c01e:fc9a", "Hello"]
    5-element Array{Union{Float64, Int64, IPv4, IPv6, String},1}:
     2
     4.8
      ip"192.168.0.0"
      ip"::c01e:fc9a"
      "Hello"

Nim

"object variants" are a tagged union discriminated by an enumerated type

type
  UnionKind = enum nkInt,nkFloat,nkString
  Union = object
    case kind:UnionKind
    of nkInt:
      intval:int
    of nkFloat:
      floatval:float
    of nkString:
      stringval:string
proc `$`(u:Union):string =
  case u.kind
  of nkInt:
    $u.intval
  of nkFloat:
    $u.floatval
  of nkString:
    '"' & $u.stringval & '"'
when isMainModule:
  let 
    u = Union(kind:nkInt,intval:3)
    v = Union(kind:nkFloat,floatval:3.14)
    w = Union(kind:nkString,stringval:"pi")
  echo [u,v,w]
Output:
[3,3.14,"pi"]

OCaml

type tree = Empty
          | Leaf of int
          | Node of tree * tree

let t1 = Node (Leaf 1, Node (Leaf 2, Leaf 3))

Odin

package main

V4 :: distinct [4]u8
V6 :: distinct string

IpAddr :: union { V4, V6 }

ip1, ip2 : IpAddr

main :: proc() {
  ip1 = V4{127, 0, 0, 1}
  ip2 = V6("::1")
}

Pascal

type
	someOrdinalType = boolean;
	sumDataType = record
			case tag: someOrdinalType of
				false: (
					number: integer;
				);
				true: (
					character: char;
				);
		end;

Naming a tag can be omitted, but then introspection, i.e. retrieving which alternative is “active”, cannot be done. A record can have at most one variant part, which must appear at the end of the record definition.

Perl

Perl has no native type for this; use a filter to enforce the rules.

use strict;
use warnings;
use feature 'say';

sub filter {
    my($text) = @_;
    if (length($text)>1 and $text eq reverse $text) {
        return 1, 'Palindromic';
    } elsif (0 == length(($text =~ s/\B..*?\b ?//gr) =~ s/^(.)\1+//r)) {
        return 1, 'Alliterative';
    }
    return 0, 'Does not compute';
}

for my $text ('otto', 'ha ha', 'a', 'blue skies', 'tiptoe through the tulips', 12321) {
    my($status,$message) = filter($text);
    printf "%s $message\n", $status ? 'Yes' : 'No ';
}
Output:
Yes Palindromic
Yes Alliterative
No  Does not compute
No  Does not compute
Yes Alliterative
Yes Palindromic

Phix

Phix has the object type, which can hold an integer, float, string, (nested) sequence, or anything else you can think of.

User defined types can be used to enforce restrictions on the contents of variables.

Note however that JavaScript is a typeless language, so no error occurs under pwa/p2js on the assignment, but you can still explicitly check and crash, as shown.

with javascript_semantics
type ipv4(object o)
    if not sequence(o) or length(o)!=4 then
        return false
    end if
    for i=1 to 4 do
        if not integer(o[i]) then
            return false
        end if
    end for
    return true
end type
 
type ipv6(object o)
    return string(o)
end type
 
type ipaddr(object o)
    return ipv4(o) or ipv6(o)
end type
 
ipaddr x
x = {127,0,0,1}  -- fine
x = "::c01e:fc9a"  -- fine
x = -1  -- error (but no such error under p2js)
if not ipaddr(x) then crash("however this works/crashes properly under p2js") end if

Raku

(formerly Perl 6)

Raku doesn't really have Sum Types as a formal data structure but they can be emulated with enums and switches or multi-dispatch. Note that in this case, feeding the dispatcher an incorrect value results in a hard fault; it doesn't just dispatch to the default. Of course those rules can be relaxed or made more restrictive depending on your particular use case.

enum Traffic-Signal < Red Yellow Green Blue >;

sub message (Traffic-Signal $light) {
    with $light {
        when Red    { 'Stop!'                                       }
        when Yellow { 'Speed Up!'                                   }
        when Green  { 'Go! Go! Go!'                                 }
        when Blue   { 'Wait a minute, How did we end up in Japan?!' }
        default     { 'Whut?'                                       }
    }
}

my \Pink = 'A Happy Balloon';


for Red, Green, Blue, Pink -> $signal {
    say message $signal;
}
Output:
Stop!
Go! Go! Go!
Wait a minute, How did we end up in Japan?!
Type check failed in binding to parameter '$light'; expected Traffic-Signal but got Str ("A Happy Balloon")
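
For contrast, in a language with built-in sum types the set of members is closed at compile time. The Rust sketch below reuses the signal names and messages from the example above, with the hypothetical identifiers `TrafficSignal` and `message`. There is no default branch because the `match` is checked for exhaustiveness, and a value such as Pink cannot be passed at all.

```rust
// Closed set of alternatives, mirroring the Traffic-Signal enum above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum TrafficSignal {
    Red,
    Yellow,
    Green,
    Blue,
}

// Every member must be handled; omitting one is a compile-time error.
fn message(light: TrafficSignal) -> &'static str {
    match light {
        TrafficSignal::Red => "Stop!",
        TrafficSignal::Yellow => "Speed Up!",
        TrafficSignal::Green => "Go! Go! Go!",
        TrafficSignal::Blue => "Wait a minute, How did we end up in Japan?!",
    }
}

fn main() {
    for signal in [TrafficSignal::Red, TrafficSignal::Green, TrafficSignal::Blue] {
        println!("{}", message(signal));
    }
}
```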

REXX

The REXX language is untyped; it is up to the program to decide whether a data structure is valid and how to deal with an invalid one.

/*REXX pgm snippet validates a specific type of data structure, an IP v4 address (list) */
ip= 127 0 0 1
if val_ipv4(ip)  then say                'valid IPV4 type: '    ip
                 else say '***error***  invalid IPV4 type: '    ip
...

exit                                             /*stick a fork in it,  we're all done. */
/*──────────────────────────────────────────────────────────────────────────────────────*/
val_ipv4: procedure; parse arg $;          if words($)\==4  |  arg()\==1  then return 0
            do j=1  for 4;   _=word($, j);    #=datatype(_, 'W');    L= length(_)
            if verify(_, 0123456789)\==0  |  \#  | _<0  |  _>255  |  L>3  then return 0
            end   /*j*/
          return 1                               /*returns true (1) if valid, 0 if not. */

Rust

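enum IpAddr {
    V4(u8, u8, u8, u8),
    V6(String),
}

fn main() {
    let home = IpAddr::V4(127, 0, 0, 1);
    let loopback = IpAddr::V6(String::from("::1"));
    // Pattern matching recovers the active variant and its payload.
    for addr in [home, loopback] {
        match addr {
            IpAddr::V4(a, b, c, d) => println!("{}.{}.{}.{}", a, b, c, d),
            IpAddr::V6(s) => println!("{}", s),
        }
    }
}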

Scala

Works with: Scala version 2.13
case class Envelop[T](member: T)

val list = List(
  Envelop("a string"),
  Envelop(732), // an integer
  Envelop('☺'), // a character
  Envelop(true) // a boolean value
)

list.foreach { case Envelop(element) => println(element) }

Standard ML

datatype tree =
    Empty
  | Leaf of int
  | Node of tree * tree

val t1 = Node (Leaf 1, Node (Leaf 2, Leaf 3))

Wren

Wren is dynamically typed and doesn't support sum types as such.

However, we can simulate one by creating a class wrapper and restricting the kinds of values it can accept at runtime.

In the following example, the Variant type can only accept numbers or strings.

class Variant {
    construct new(v) {
        // restrict 'v' to numbers or strings
        if (v.type != Num && v.type != String) {
            Fiber.abort("Value must be a number or a string.")
        }
        _v = v
    }

    v { _v }

    kind { _v.type }

    toString { v.toString }
}

var v1 = Variant.new(6)
System.print([v1.v, v1.kind])
var v2 = Variant.new("six")
System.print([v2.v, v2.kind])
var v3 = Variant.new([6]) // will give an error as argument is a List
Output:
[6, Num]
[six, String]
Value must be a number or a string.
[./Sum_data_type line 5] in init new(_)
[./Sum_data_type line 8] in 
[./Sum_data_type line 21] in (script)
Library: Wren-dynamic

We can also automate the process using the Union class from the above module.

import "./dynamic" for Union

var Variant = Union.create("Variant", [Num, String])

var v1 = Variant.new(6)
System.print([v1.value, v1.kind])
var v2 = Variant.new("six")
System.print([v2.value, v2.kind])
var v3 = Variant.new([6]) // will give an error as argument is a List
Output:
[6, Num]
[six, String]
Invalid type.
[./dynamic line 4] in init new(_)
[./dynamic line 6] in 
[./Sum_data_type_2 line 9] in (script)

zkl

zkl is untyped - it is up to the container to decide if it wants to deal with a type or not.

ip:=List(127,0,0,1);
addrs:=Dictionary("ip",ip);
class Addr{
   fcn init(addr){
      var ip = addr;
      if(not List.isType(addr)) throw(Exception.TypeError);
   }
}
ip:=Addr(List(127,0,0,1));
Addr(127,0,0,1);	// TypeError : Invalid type
Addr(List("abc"));	// doesn't fail, would need more error checking
ip.ip=L(192,168,1,1);	// this doesn't type check either