Talk:Find first and last set bit of a long integer


More Clarification "set bit"

The example gives

INT: find first & last set bit  |LWB|UPB|Bits
                               0| 33| 32|0

How are there any set bits in an unsigned integer? --Dgamey 03:43, 8 December 2011 (UTC)

Can you clarify? It would be an interesting aside to see a host code sample that is not for two's complement. NevilleDNZ 04:39, 8 December 2011 (UTC)

Hrm, if it's not in two's complement, what motivates you to find bits in it to begin with? --Ledrug 04:49, 8 December 2011 (UTC)

Wikipedia has a few non-Two's complement ideas:

Sometimes it is just done for fun:

NevilleDNZ 05:52, 8 December 2011 (UTC)

Clarification

The task is unclear about what number should be computed. It mentions "first" and "last" set bits, but with bits written in most-significant to least-significant order, or the other way around? (i.e. which of "upb" and "lwb" should compute the least significant set bit, and which the most significant?) For a 32-bit integer should the least significant bit be numbered 0, 1, 31, or 32? Also, what values should be returned if there are no set bits? And for negative integers, should the most significant set bit be the sign bit?

The ALGOL example seems to indicate that the least significant bit is numbered 32 (and the most significant bit is numbered 1). This seems the reverse of most people's bit numberings (where the least significant bit should be numbered 0, and the most significant 31), and also wouldn't make sense for arbitrary-precision integers (where would you start counting?). Also, it seems that "lwb" computes the most significant set bit, while "upb" computes the least significant set bit. Is this the example we should follow for other languages? --208.80.119.67 23:38, 7 December 2011 (UTC)

The whole thing doesn't make sense for multi-precision integers. If you are willing to use a language/library bigint implementation, you basically forfeit control over how it's done behind the scenes; it doesn't make sense to "efficiently find a set bit" -- who says the numbers are internally represented by bits to begin with? --Ledrug 00:30, 8 December 2011 (UTC)

Re: Clarification: integers are used in the test case only to conveniently supply binary specimens for testing.

re: This seems the reverse of most people's bit numberings.

C also - correct me if I am wrong - prints the most significant bit/octal/hex digit on the left, and the least on the extreme right, hence the operators << and >>. Python and most other programming languages also display and (pretend to) store right to left. It's probably a human thing.

I was referring to the bit numbering, not how they are printed. From what I've seen, the least significant bit is usually numbered "0" and the most significant bit "31". --208.80.119.67 02:34, 8 December 2011 (UTC)
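For concreteness, a minimal C sketch of that conventional numbering (least significant bit numbered 0, most significant 31 for a 32-bit value); the function names here are purely illustrative, not taken from the task:
<lang c>#include <stdio.h>
#include <stdint.h>

/* Index of the lowest set bit, counting from 0 at the least significant end.
   Returns -1 when no bit is set. */
int lsb_index(uint32_t x)
{
    if (x == 0) return -1;
    int i = 0;
    while (((x >> i) & 1u) == 0) i++;
    return i;
}

/* Index of the highest set bit, same numbering; -1 when no bit is set. */
int msb_index(uint32_t x)
{
    int i = -1;
    while (x != 0) { i++; x >>= 1; }
    return i;
}

int main(void)
{
    printf("%d %d\n", lsb_index(0xc0ffee), msb_index(0xc0ffee)); /* 1 23 */
    printf("%d %d\n", lsb_index(0), msb_index(0));               /* -1 -1 */
    return 0;
}</lang>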

re: And for negative integers

The task is focused on binary/bits. Whether it is signed or unsigned is not so important. The test case requires only +ve. Beyond that just do the natural thing for the specific language & CPU.

re: wouldn't make sense for arbitrary-precision integers

Agreed. It does seem odd having the least significant bit at (algol68g's) 116 for long bits/long ints. Thus this could make Arbitrary Precision integers particularly weird. But this task is about bit manipulation. The test case happens to use long int. Maybe a particular language/CPU puts these in reverse. In such a case just find the location of first and last non zero bits on this platform. Try not to worry about what is being represented. Try not to worry about the significance of the actual bits.

Also: The task doesn't ask for Arbitrary Precision bits per se; rather it asks for an algorithm that works for 32 bits, and then another algorithm that works for any particular precision implementation. Hopefully this would not preclude an Arbitrary Width Bits algorithm. ¢ A NOT of 0 on Arbitrary Precision bits is going to be very interesting! :-) ¢

FYI: Algol68 permits the coercion/casting/widening of bits to a bool array, eg<lang algol68>[]BOOL bools a = 2r101,
       bools b = (TRUE,FALSE,TRUE);
BITS bits c = 2r101,
     bits d = bits pack((TRUE,FALSE,TRUE)); # Hence ... #
print((bits width ELEM bits c, []BOOL(bits c)[bits width])) # prints "TT" - the same last bit #</lang> Hence there are strong synergies between the bits and []bool types, but they are not the same type. Similarly the types int and bits are distinct and require explicit casting with the operators bin and abs. C - by contrast - does not have this distinction; in C an int also happens to be bits.
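To illustrate that contrast, a trivial C sketch: the same unsigned value serves as both a number and a bit pattern, with no conversion step.
<lang c>#include <stdio.h>

int main(void)
{
    unsigned x = 5;                 /* arithmetic value 5 ... */
    printf("%u\n", x + 1);          /* ... used as a number: prints 6 */
    printf("%u\n", (x >> 2) & 1u);  /* ... and as bits, no cast needed: bit 2 of 0b101 is 1 */
    return 0;
}</lang>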

Good luck, and Merry Xmas!

NevilleDNZ 02:02, 8 December 2011 (UTC)

Initial C code specimen requires a long int algorithm

The use of #define in the C code specimen works really well when unrolling a loop.

The code issues are:

  • the code specimen lacks a more general algorithm that would work - for example - with 128- or 256-bit widths.
  • the code confuses "lowest set bit" (a bit order concept) with "least significant bit" (an integer concept). These are actually almost opposites on most machines.
  • ditto for "upper set bit" and "most significant bit".
  • the results are in hex, eg. 0xc0ffee, not binary 0b110000001111111111101110, making it difficult to visually discover/confirm the results.
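On that last point, a small helper along these lines (the name print_bin is illustrative, not part of the specimen) makes the bit positions easier to eyeball:
<lang c>#include <stdio.h>
#include <stdint.h>

/* Print a 32-bit value as 32 binary digits, most significant bit first. */
void print_bin(uint32_t x)
{
    for (int i = 31; i >= 0; i--)
        putchar(((x >> i) & 1u) ? '1' : '0');
    putchar('\n');
}

int main(void)
{
    print_bin(0xc0ffee); /* 00000000110000001111111111101110 */
    return 0;
}</lang>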

Try using Duff's device to unroll wider bits types.
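In that spirit, a minimal sketch of what a Duff's-device-style unrolled scan might look like; the byte-oriented layout (most significant byte first), the function name, and the four-way unroll are assumptions made here purely for illustration:
<lang c>#include <stdint.h>

/* Find the index of the first non-zero byte of an n-byte value stored most
   significant byte first, unrolled four ways with Duff's device.
   Returns -1 if every byte is zero. */
int first_nonzero_byte(const uint8_t *p, int n)
{
    if (n <= 0) return -1;
    int i = 0;
    int passes = (n + 3) / 4;
    switch (n % 4) {
    case 0: do { if (p[i]) return i; i++;
    case 3:      if (p[i]) return i; i++;
    case 2:      if (p[i]) return i; i++;
    case 1:      if (p[i]) return i; i++;
            } while (--passes > 0);
    }
    return -1;
}</lang>
The bit within that byte would still have to be located by an ordinary 8-bit scan or table lookup.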

A thought: maybe (as C has given us << left shift and >> right shift) this bits task should be asking for the first left bit set, and the first right bit set? This would be "type agnostic", avoiding confusion with integer arithmetic properties.

NJoy NevilleDNZ 03:17, 8 December 2011 (UTC)

From a practical point of view (C is not likely your go-to language for proof-of-concept stuff), anything longer than the longest native type doesn't need its own individual method, because it has to be in a struct or array, and there are no atomic bit operations anyway -- we are worried about efficiency (are we?), so a generic "any length bit operation" is out of the question. Just scan whatever struct or array from one side to the other and deal with the first non-zero item (a sketch of this appears after this reply). You can hardly do better than that without special hardware. Unless we are talking about, say, OpenCL, where it's a whole different matter.
For the first/last nomenclature, an integer type has a well defined least/most significant bit concept, while first/last/left/right all depend on endianness. If the bits are transmitted on a wire, we also need to know the bit ordering in a byte, so these are hazy at best. The 32-bit number 257 can be stored in memory as either "01 01 00 00" or "00 00 01 01", so what's the "last" set bit? Or the "left most" one? Are we going to consider 32 bits as the smallest unit, or 8 bits? Unless you have a spec about using these bits, lsb/msb is a better way to refer to them.
As to the hex vs binary output, I personally find reading 8 hex digits less stressful than reading 32 1s and 0s. I did provide the positions of the related bits, though. --Ledrug 04:22, 8 December 2011 (UTC)
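A rough sketch of that "scan from one end" approach, assuming the wide value is stored as an array of 32-bit words with the least significant word first; the struct layout and names are illustrative assumptions, not code from the task:
<lang c>#include <stdio.h>
#include <stdint.h>

#define WORDS 4  /* e.g. 4 x 32 bits for a 128-bit value */

/* Wide value stored as 32-bit words, least significant word first. */
typedef struct { uint32_t w[WORDS]; } wide_t;

/* Index (lsb = 0) of the lowest set bit, or -1 if the value is zero. */
int wide_lsb(const wide_t *v)
{
    for (int i = 0; i < WORDS; i++) {
        if (v->w[i] != 0) {
            int b = 0;
            while (((v->w[i] >> b) & 1u) == 0) b++;
            return i * 32 + b;
        }
    }
    return -1;
}

/* Index (lsb = 0) of the highest set bit, or -1 if the value is zero. */
int wide_msb(const wide_t *v)
{
    for (int i = WORDS - 1; i >= 0; i--) {
        if (v->w[i] != 0) {
            int b = 31;
            while (((v->w[i] >> b) & 1u) == 0) b--;
            return i * 32 + b;
        }
    }
    return -1;
}

int main(void)
{
    wide_t v = { { 0, 0x80, 0, 0 } };              /* only bit 39 is set */
    printf("%d %d\n", wide_lsb(&v), wide_msb(&v)); /* 39 39 */
    return 0;
}</lang>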

C has given us << left shift and >> right shift operators for int, long int, and long long int. Clearly C knows the difference between left and right. That could be a good starting point. NevilleDNZ 04:31, 8 December 2011 (UTC)

Errors

The C example goes one power too far and runs into problems with losing the MSB. Minor correction needed though; it's not the core task itself that is wrong.

The Algol 68 example though, I cannot see how those figures (especially the LSB) could possibly be correct on a machine that uses binary digits. They are just wildly off. I don't know exactly what's wrong, but I truly don't believe them. Or are they counting from the other end? That'd just be weird given that virtually everyone numbers bits from the LSB, as that gives consistent answers for small integers irrespective of the size of machine integers being used. –Donal Fellows 09:01, 8 December 2011 (UTC)

Compare:

  • C's unsigned has the bits organised right to left, but - IIRC - not numbered in any way.
    • The numbering is "virtual" depending on whether << or >> is used to extract the bit.
    • Maybe C's closest "ordering" concept would be: <lang c>struct {signed int a:1, b:1, c:1, d:1, e:1, f:1, g:1, h:1;} ordered;</lang>
      • IIRC this won't work: <lang c>struct {signed int bits:1[32];} ordered;</lang>
    • In particular unsigned bits (or unsigned) cannot ever be coerced to a bool array.
    • There is no need for, or use of, an actual number or index for the bits.
  • Algol68 has two places where the bit number/index is used:
    • elem operator, eg x elem 1 gives the "first" bit of x, which is the bit on the extreme left (compare the C sketch after this list).
    • When a bits variable (eg x) is coerced to a bool array - []bool(x) - again the "first" bool - x[1] - is the extreme left bit.
    • One nice effect is x[LWB x:UPB x] would exactly extract the non-zero bits 2r11101101 of 2r0000000111011010000
  • Weirdly, when a short bits literal e.g. bits x:=2r11101101 is assigned into the bits variable x, it is padded on the left with 24 zeros and becomes 2r00000000000000000000000011101101! i.e. right justified, just like C!
  • shl and shr work the same as C's << and >>

NevilleDNZ 12:01, 8 December 2011 (UTC)
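For comparison with that numbering (index 1 being the extreme left bit), a small C sketch that reads a bit counted from either end of a 32-bit word; the helper names are illustrative only:
<lang c>#include <stdio.h>
#include <stdint.h>

/* Bit counted from the left, 1-based: index 1 is the extreme left (most significant) bit. */
unsigned bit_from_left(uint32_t x, int i)
{
    return (x >> (32 - i)) & 1u;
}

/* Bit counted from the right, 0-based, the usual C convention: index 0 is the least significant bit. */
unsigned bit_from_right(uint32_t x, int i)
{
    return (x >> i) & 1u;
}

int main(void)
{
    uint32_t x = 0xEDu;                   /* 2r11101101, right justified in 32 bits */
    printf("%u\n", bit_from_left(x, 32)); /* 1: the last (extreme right) position */
    printf("%u\n", bit_from_right(x, 0)); /* 1: the same bit, counted from the other end */
    return 0;
}</lang>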