Mission Vishwakarma

Internal IDs


Any complex engineering project comprises multiple elements. It could consist of billions of steel plates, bolts, wires and so on. When we want to represent these elements in a computer, and more importantly, mimic the real-world relationships between them, we need to identify each element with a unique ID. As you might be aware, computers have a “finite” amount of memory (called RAM). So the question arises: how long should the IDs be? Example: mobile numbers in India (excluding the country code) have 10 digits, i.e. they can represent ~10 billion unique numbers. That is well above the population of 1.4 billion, yet within roughly one order of magnitude of it. That is why every form has 10 boxes for a mobile number. Now let’s come back to creating IDs for things inside computer memory. How many digits do we need for everything?

Computer memory is measured in bits/bytes. One decimal digit carries approximately 3.32 bits (LogBase2(10)). So a 10-digit mobile number already needs ~33 bits. Fortunately, every time we add one bit, the number of unique elements we can identify doubles. So if our mega refinery had a total of 100 billion unique things, we would need LogBase2(100,000,000,000) ≈ 36.5, i.e. 37 bits. That’s how much we genuinely need.
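A quick way to check these numbers is to let the computer do the arithmetic. The sketch below (the function name `bits_needed` is just illustrative) computes the smallest number of bits that can give each of `n` things its own ID. It uses exact integer arithmetic rather than floating-point logarithms, to avoid rounding surprises near powers of two.

```python
import math


def bits_needed(n: int) -> int:
    """Smallest b such that 2**b >= n, i.e. enough bits for n distinct IDs (0 .. n-1)."""
    if n <= 1:
        return 0
    # (n - 1).bit_length() is the number of bits needed to write the largest ID, n - 1.
    return (n - 1).bit_length()


# Information carried by one decimal digit, in bits.
print(round(math.log2(10), 2))       # 3.32
# 10-digit mobile numbers: 10 * 3.32 ≈ 33.2 bits of information, so 34 whole bits.
print(bits_needed(10**10))           # 34
# 100 billion unique things in the mega refinery.
print(round(math.log2(10**11), 2))   # 36.54
print(bits_needed(10**11))           # 37
```

Note the small difference between "bits of information" (~33.2 for 10 digits) and whole bits a computer must actually set aside (34).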

Now let’s see the state of the art in computer science. Since computers prefer working in powers of 2, the choices are 32 bits, 64 bits, 128 bits, 256 bits and so on. 32 bits is obviously less than our minimum requirement. Perhaps 64 or 128. 256 is obviously super-duper over-designed. The field of computer science has mostly settled on 128 bits by default. The reason for choosing 128 is mostly to let everyone (every computer) assign its own unique IDs to the elements it generates, with a negligible chance of two different things on different computers being assigned the same ID. These are called UUIDs (Universally Unique Identifiers). UUIDs come in multiple versions: v1 / v2 / v3 / v4 / v5 / v6 / v7 / v8. Other variants include ULIDs and so on, all with various trade-offs. AWS uses 256-bit IDs for some reason.

Meanwhile, some smart people in computer science, such as those at Facebook / Instagram, manage their entire websites with 64-bit IDs. The downside? They need to maintain a loosely coupled central authority that assigns new IDs to every post/message/photo/comment/like and so on.
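To see why a UUID can be created on any computer without asking anyone, here is a small sketch using Python’s standard `uuid` module. A version-4 UUID spends 6 of its 128 bits on the version and variant fields and fills the remaining 122 bits randomly. The helper `collision_probability` (an illustrative name) applies the standard birthday-bound approximation to estimate the chance that any two of `n` such IDs clash.

```python
import math
import uuid

# Generated locally, offline, with no central authority involved.
u = uuid.uuid4()
print(u.version, len(u.bytes) * 8)  # 4 128

# Version 4: 4 version bits + 2 variant bits are fixed, the other 122 are random.
RANDOM_BITS = 128 - 4 - 2


def collision_probability(n: int, bits: int = RANDOM_BITS) -> float:
    """Birthday-bound approximation: P(any clash) ≈ 1 - exp(-n^2 / 2^(bits + 1))."""
    # expm1 keeps precision when the probability is astronomically small.
    return -math.expm1(-(n * n) / 2 ** (bits + 1))


# Even 100 billion independently generated UUIDs almost certainly never clash.
print(collision_probability(10**11))  # on the order of 1e-15
```

The price for this freedom is size: every UUID costs 16 bytes, twice as much as a 64-bit ID.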

As we saw before, we need a minimum of about 37 bits, and every additional bit doubles the number of IDs available, so even 64 bits is more than sufficient. We just have to do some upfront book-keeping engineering.

So I have made the decision to go ahead with 64 bits for Mission Vishwakarma. 64 bits = 8 bytes = approx. 1.8 × 10^19 unique IDs. Just to give some sense of scale, Fugaku, one of the world’s largest supercomputers, has roughly 5 petabytes of RAM, about 2^52 bytes. Even if every single byte of it got its own ID, that would still be well below 2^64. So here we go.
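The headroom can be checked with a few lines of integer arithmetic. This sketch reuses the figures from the text (37 bits needed, 64 bits chosen) plus, purely for scale, a hypothetical machine with 2^55 bytes (32 PiB) of RAM.

```python
ID_BITS = 64        # chosen ID width
NEEDED_BITS = 37    # enough for ~100 billion things

total_ids = 2 ** ID_BITS
print(total_ids)                    # 18446744073709551616
print(f"{total_ids:.2e}")           # 1.84e+19

# How many times over the 64-bit space covers our 37-bit requirement.
headroom = 2 ** (ID_BITS - NEEDED_BITS)
print(headroom)                     # 134217728 (= 2**27)

# Even if every byte of a hypothetical 32 PiB (2**55 bytes) machine got its own ID,
# only 1/512 of the 64-bit ID space would be used.
bytes_in_huge_machine = 2 ** 55
print(total_ids // bytes_in_huge_machine)  # 512
```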

Our design goal is that thousands of engineers should be able to work in parallel on a project, all creating new things (with new IDs) and so on. So who gets to assign which IDs? While we do want the IDs to be sequential, we don’t want every new ID to be generated by a central authority. If we did that, people would not be able to work when their internet connection drops. We want people to keep working on their laptops, even when they are offline. To address this, I have decided upon some conventions for book-keeping of IDs. We take inspiration from the excellent concept of IP-address management followed across the world, called CIDR (Classless Inter-Domain Routing). So we declare upfront how these 64 bits shall be used. Here we go.

  1. Out of the 64 bits, the top 16 bits are reserved and always zero. This is more of a temporary measure to future-proof ourselves. This leaves us with 64 - 16 = 48 bits, and 2^48 is still plenty. In CIDR lingo, this is 0:0/16. Our chosen 0:0/16 block (2^48 IDs) consists of 256 blocks of size /24 (2^40 IDs each).

  2. The first 2^40 IDs (0:0/24) are reserved for items assigned by the Mission Vishwakarma software developers, such as catalogue items.

  3. The next 2^40 IDs are set aside as local-use IDs, i.e. whenever a computer assigns new IDs, it assigns them in the range 2^40 to 2^41 - 1. However, when it saves / syncs its work to the central computer/server, the server assigns new IDs and tells the computer to update its memory. So multiple computers can hold duplicate IDs in this range until they save their work.

  4. The central computer/server assigns IDs starting at 2^42, increasing sequentially. That’s it. Initially we use a sequential increment; in future, we may improve our algorithms/implementations to reuse deleted IDs. Perhaps after 2035! Remember, these auto-incrementing IDs can never reach 2^48, since all IDs from 2^48 upwards are reserved. This gives us 2^48 - 2^42 ≈ 277,000 billion unique things / IDs. Plenty, huh!
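The four conventions above can be sketched in code. Everything here is a hypothetical illustration, not the Mission Vishwakarma implementation: the names (`cidr_block`, `classify`, `LocalAllocator`, `Server`) are made up, each /24 block is treated as half-open (so the local block is 2^40 to 2^41 - 1), and the range 2^41 to 2^42 - 1, which the conventions do not describe, is simply reported as unassigned. Syncing is reduced to its core: the server hands back a mapping from provisional local IDs to permanent ones.

```python
ID_BITS = 64
USABLE_BITS = 48  # top 16 bits are always zero (the 0:0/16 block)


def cidr_block(index: int, prefix_len: int = 24) -> tuple[int, int]:
    """Half-open [start, end) range of the index-th /prefix_len block of the 64-bit space."""
    size = 2 ** (ID_BITS - prefix_len)
    return index * size, (index + 1) * size


CATALOGUE = cidr_block(0)       # [0, 2**40): developer / catalogue items
LOCAL = cidr_block(1)           # [2**40, 2**41): provisional, per-computer IDs
SERVER_START = 2 ** 42          # server-assigned, sequential
LIMIT = 2 ** USABLE_BITS        # IDs at or above 2**48 are reserved


def classify(id_: int) -> str:
    """Name the range an ID falls into under the (hypothetically encoded) conventions."""
    if not 0 <= id_ < 2 ** ID_BITS:
        raise ValueError("not a 64-bit ID")
    if id_ >= LIMIT:
        return "reserved"
    if CATALOGUE[0] <= id_ < CATALOGUE[1]:
        return "catalogue"
    if LOCAL[0] <= id_ < LOCAL[1]:
        return "local"
    if id_ >= SERVER_START:
        return "server"
    return "unassigned"  # 2**41 .. 2**42 - 1 is not described by the conventions


class LocalAllocator:
    """Hands out IDs offline; they are unique only on this one computer."""

    def __init__(self) -> None:
        self._next = LOCAL[0]

    def new_id(self) -> int:
        if self._next >= LOCAL[1]:
            raise OverflowError("local ID range exhausted; sync first")
        id_ = self._next
        self._next += 1
        return id_


class Server:
    """Central authority: swaps provisional local IDs for permanent sequential ones."""

    def __init__(self) -> None:
        self._next = SERVER_START

    def sync(self, local_ids: list[int]) -> dict[int, int]:
        mapping: dict[int, int] = {}
        for lid in local_ids:
            if classify(lid) != "local":
                raise ValueError(f"{lid} is not a local ID")
            if self._next >= LIMIT:
                raise OverflowError("server ID range exhausted")
            mapping[lid] = self._next
            self._next += 1
        return mapping


# Two offline laptops create things with the same provisional IDs ...
laptop_a, laptop_b = LocalAllocator(), LocalAllocator()
a_ids = [laptop_a.new_id(), laptop_a.new_id()]
b_ids = [laptop_b.new_id()]
print(a_ids[0] == b_ids[0])  # True: duplicates are fine until sync

# ... and the server gives each one a distinct permanent ID on sync.
server = Server()
print(server.sync(a_ids))    # {1099511627776: 4398046511104, 1099511627777: 4398046511105}
print(server.sync(b_ids))    # {1099511627776: 4398046511106}
```

A real system would also have to rewrite every reference that points at a remapped local ID; that book-keeping is left out of this sketch.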

Now let’s calculate how many IDs a reasonable workstation computer can use simultaneously. All elements/entities are expected to carry some extra information in addition to their IDs. After all, IDs themselves are just dumb numbers. For example, consider a 3D coordinate. At minimum it will have: A) a 64-bit (= 8-byte) ID, B) a 4-byte element type identifying it as a coordinate, C) 3 coordinates of 8 bytes each, D) around 16 bytes for a name, E) the time it was created (say 8 bytes), and so on. That is 8 + 4 + 3 × 8 + 16 + 8 = 60 bytes. A line will have 2 coordinates. Conservatively, assume 64 bytes per entity on average. So a high-end computer with 64 GB of RAM will be able to hold 64 GB / 64 bytes = ~1 billion entities with unique IDs. You see, we run out of RAM much faster than we run out of possible unique IDs. Remember, a project is expected to have much more data than that, so a single computer will in general load only a subset of it.
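The per-entity estimate can be checked with Python’s standard `struct` module. The record layout below is an assumption made for illustration (an unsigned 64-bit ID, a 4-byte type code, three 8-byte floats, a 16-byte name, and the creation time as a signed 64-bit Unix timestamp), packed without padding; the type code and sample values are made up.

```python
import struct

# Hypothetical packed layout of a 3D coordinate entity ("<" = little-endian, no padding):
# Q = 64-bit ID, I = 4-byte element type, 3d = x, y, z, 16s = name, q = creation time.
COORDINATE = struct.Struct("<QI3d16sq")
print(COORDINATE.size)  # 60 = 8 + 4 + 3*8 + 16 + 8

record = COORDINATE.pack(2**42, 1, 12.5, -3.0, 7.25, b"Pump-101", 1_700_000_000)
print(len(record))      # 60 (short names are padded with zero bytes)

# Round up to a conservative 64 bytes per entity, then fill 64 GiB of RAM.
BYTES_PER_ENTITY = 64
ram_bytes = 64 * 2**30
entities = ram_bytes // BYTES_PER_ENTITY
print(entities)         # 1073741824, i.e. ~1 billion
```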

When one entity has a relationship to another, it refers to the other by its ID. So a shorter 8-byte ID takes up half the RAM of a 16-byte (= 128-bit) ID. We want RAM to store engineering information, not just IDs. Hence the emphasis on upfront engineering to go ahead with 64-bit IDs.
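The cost of references adds up quickly. As a sketch, consider a line that stores its own ID, a 4-byte type code and the IDs of its two end coordinates. The layouts below are hypothetical and packed without padding, once with 64-bit IDs and once with 128-bit IDs stored as 16 raw bytes.

```python
import struct

# Hypothetical "line" entity: own ID, element type, IDs of its two end coordinates.
LINE_64 = struct.Struct("<QIQQ")          # 64-bit IDs
LINE_128 = struct.Struct("<16sI16s16s")   # 128-bit IDs (e.g. UUIDs as raw bytes)

print(LINE_64.size, LINE_128.size)  # 28 52

# Memory for one billion such lines, in GiB.
n_lines = 10**9
for name, layout in (("64-bit", LINE_64), ("128-bit", LINE_128)):
    print(name, round(n_lines * layout.size / 2**30, 1), "GiB")
```

Every reference to another entity is one more ID, so the more connected the model, the more the ID width dominates the memory footprint.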

In this article we learned about the importance and planning of IDs in Mission Vishwakarma. This is also the first data structure we have designed! Let’s go deeper in subsequent articles.