HackPig520

HackPig520's Blog

I'm HackPig520, a front-end engineer who likes Web3 and Minecraft.

What is UUID? What is it used for?

UUID stands for Universally Unique Identifier.

0x01 Version

UUID comes in several versions, each suited to different use cases. Version 4, for example, generates all of the variable factors randomly, which is a very convenient approach in many scenarios. Version 1 combines a timestamp, a clock sequence, and node information (machine information) to ensure global uniqueness in distributed systems. Twitter's Snowflake can be seen as a simplified take on UUID Version 1. RFC 4122 defines five versions of UUID (the later RFC 9562 adds versions 6, 7, and 8, which this post does not cover):

  • Version 1: Follows the field definitions of UUID literally, using a timestamp, a clock sequence, and node information (the MAC address) as the variable factors.
  • Version 2: Basically the same as Version 1, but intended for use with DCE (the Distributed Computing Environment from the Open Software Foundation). The IETF specification does not describe this version in detail; it is defined in the document "DCE 1.1: Authentication and Security Services". As a result it is rarely used today, and many implementations omit it.
  • Version 3: Derives the variable factors from a hash of a name and a namespace, using the MD5 hash algorithm.
  • Version 4: Fills the variable factors with random or pseudo-random data.
  • Version 5: Derives the variable factors from a hash of a name and a namespace, using the SHA-1 hash algorithm.

Regardless of the version, the structure of a UUID is the same. The structure is defined in terms of Version 1; the other versions keep the layout but change what goes into Version 1's variable factors.
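As a rough illustration of the versions listed above, the sketch below uses Python's standard uuid module, which ships generators for versions 1, 3, 4, and 5 (it has no generator for version 2). The name "example.com" is only a placeholder chosen for this example.

```python
import uuid

# Version 1: timestamp + clock sequence + node.
v1 = uuid.uuid1()
# Version 3: MD5 hash of namespace + name ("example.com" is a placeholder).
v3 = uuid.uuid3(uuid.NAMESPACE_DNS, "example.com")
# Version 4: random or pseudo-random data.
v4 = uuid.uuid4()
# Version 5: SHA-1 hash of namespace + name.
v5 = uuid.uuid5(uuid.NAMESPACE_DNS, "example.com")

for u in (v1, v3, v4, v5):
    # Every UUID object reports the version encoded in its bits.
    print(u.version, u)
```

The v1 and v4 values differ on every run, while v3 and v5 stay the same for the same namespace and name.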

0x02 Basic Structure

A UUID is 128 bits (16 bytes) long and can be written as 32 hexadecimal digits (4 bits per digit). The digits are split into five groups of 8-4-4-4-12, separated by 4 hyphens, so the text form has 36 characters including the hyphens. For example: 3e350a5c-222a-11eb-abef-0242ac110002.

The format of UUID is as follows: xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx
The position of N encodes the variant. For the UUIDs described here its top two bits are 10, so N can only be 8, 9, a, or b.
The position of M is the version number. Since the five versions covered here are numbered 1 to 5, M can only be 1, 2, 3, 4, or 5.
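To make the layout concrete, here is a small sketch that takes the example UUID from this section apart with plain string operations and cross-checks the result against Python's uuid module. The variable names are made up for this illustration.

```python
import uuid

text = "3e350a5c-222a-11eb-abef-0242ac110002"

groups = text.split("-")
group_lengths = [len(g) for g in groups]  # expected layout: 8-4-4-4-12

m = groups[2][0]  # the M position: version digit
n = groups[3][0]  # the N position: variant digit

# Cross-check with the standard library's parser.
u = uuid.UUID(text)
print(group_lengths, m, n, u.version, u.variant)
```

For this example M is 1 and N is a, so it is a version 1 UUID of the RFC 4122 variant.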

One Timestamp

Timestamp is a 60-bit unsigned number. For a version 1 UUID, it counts the 100-nanosecond intervals from 1582-10-15 00:00:00.00 UTC to the current UTC time. On systems that cannot obtain UTC, local time can be used instead, as long as it is used consistently (in practice, the same system time zone is sufficient).
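A minimal sketch of how the timestamp can be turned back into a calendar date, assuming Python's uuid module, whose `time` attribute exposes the 60-bit value. The helper name `uuid1_datetime` is hypothetical, and the conversion truncates to microseconds because that is the resolution of Python's datetime.

```python
import uuid
from datetime import datetime, timedelta, timezone

# Start of the UUID timestamp scale: 1582-10-15 00:00:00 UTC.
GREGORIAN_EPOCH = datetime(1582, 10, 15, tzinfo=timezone.utc)

def uuid1_datetime(u):
    """Convert the 60-bit timestamp of a version 1 UUID to a UTC datetime."""
    # u.time counts 100-nanosecond intervals; 10 of them make one microsecond.
    return GREGORIAN_EPOCH + timedelta(microseconds=u.time // 10)

created = uuid1_datetime(uuid.UUID("3e350a5c-222a-11eb-abef-0242ac110002"))
print(created.isoformat())
```

For the example UUID used in this post, the result falls on 9 November 2020 (UTC).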

Once the timestamp is known, the time_low, time_mid, and time_hi fields of the structure follow from it.

time_low represents bits 0 to 31 of the 60-bit timestamp, a total of 32 bits.

time_mid represents bits 32 to 47 of the 60-bit timestamp, a total of 16 bits.

time_hi_and_version has two parts, version and time_hi. The version occupies the top 4 bits, which can represent at most 16 values (0 to 15). time_hi holds the remaining 12 bits of the timestamp (bits 48 to 59). Together the field is 16 bits.
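The split described above can be sketched as follows, assuming Python's uuid module, which exposes the raw fields as `time_low`, `time_mid`, and `time_hi_version`. The code separates the version from time_hi and then reassembles the 60-bit timestamp.

```python
import uuid

u = uuid.UUID("3e350a5c-222a-11eb-abef-0242ac110002")

time_low = u.time_low                 # timestamp bits 0..31  (first group)
time_mid = u.time_mid                 # timestamp bits 32..47 (second group)
version = u.time_hi_version >> 12     # top 4 bits of the third group
time_hi = u.time_hi_version & 0x0FFF  # timestamp bits 48..59

# Put the three pieces back together into the 60-bit timestamp.
timestamp = (time_hi << 48) | (time_mid << 32) | time_low
print(hex(time_low), hex(time_mid), version, hex(time_hi), timestamp)
```

Note that the text form stores the low bits first, so the timestamp does not read left to right as one number.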

Two Clock Sequence

If the machine generating the UUID has had its clock adjusted, or its nodeId has changed (for example, the host's network card was replaced), new UUIDs could collide with ones generated earlier or on other machines. In that case a variable factor needs to change to keep the generated UUIDs unique. That factor is the Clock Sequence.

The algorithm for changing the Clock Sequence is very simple: when the time is adjusted or the nodeId changes, either pick a new random number or increment the previous Clock Sequence value by one.
The Clock Sequence is 14 bits in total.

clock_seq_low represents bits 0 to 7 of the Clock Sequence, a total of 8 bits.
clock_seq_hi_and_reserved contains two parts, reserved and clock_seq_hi. clock_seq_hi represents bits 8 to 13 of the Clock Sequence, a total of 6 bits, and reserved is the top 2 bits. reserved is generally set to binary 10, which is why the N position in the format above can only be 8, 9, a, or b.
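Here is a hedged sketch of both ideas: extracting the 14-bit Clock Sequence from the example UUID, and choosing the next value after a clock adjustment. The field attributes come from Python's uuid module; `next_clock_seq` is a hypothetical helper written for this post, not part of any library.

```python
import secrets
import uuid

u = uuid.UUID("3e350a5c-222a-11eb-abef-0242ac110002")

reserved = u.clock_seq_hi_variant >> 6        # top 2 bits, binary 10
clock_seq_hi = u.clock_seq_hi_variant & 0x3F  # Clock Sequence bits 8..13
clock_seq_low = u.clock_seq_low               # Clock Sequence bits 0..7
clock_seq = (clock_seq_hi << 8) | clock_seq_low

def next_clock_seq(previous=None):
    """Hypothetical helper: pick the Clock Sequence to use after a change."""
    if previous is None:
        # No previous value known: start from a random 14-bit number.
        return secrets.randbelow(1 << 14)
    # Otherwise increment, wrapping around at 14 bits.
    return (previous + 1) % (1 << 14)

print(bin(reserved), hex(clock_seq), hex(next_clock_seq(clock_seq)))
```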

Three Node

Node is a 48-bit unsigned number. For a version 1 UUID it is an IEEE 802 MAC address, that is, the MAC address of a network card. When the system has multiple network cards, any valid one can be used as the Node data. On systems without a network card, a random number is used instead.
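As an illustration, the 48-bit Node of the example UUID can be printed in the familiar MAC address notation. This sketch assumes Python's uuid module; `uuid.getnode()` returns the node value Python would use on the current machine, which may itself be a random number when no hardware address can be found.

```python
import uuid

u = uuid.UUID("3e350a5c-222a-11eb-abef-0242ac110002")
node = u.node  # the last 12 hex digits as a 48-bit integer

# Format the six bytes, most significant first, as a MAC address.
mac = ":".join(f"{(node >> shift) & 0xFF:02x}" for shift in range(40, -1, -8))
print(mac)

# Node value for the current machine, for comparison.
local_node = uuid.getnode()
print(f"{local_node:012x}")
```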

0x03 Differences in Different Versions

The sections above explain the structure of UUID, which is essentially the definition of UUID version 1. Its variable factors are the timestamp, the clock sequence, and the node. In other versions the same bit positions carry different content.

In version 4, the timestamp, clock sequence, and node are all random or pseudo-random.
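A minimal sketch of how a version 4 UUID can be assembled by hand: take 16 random bytes, then overwrite only the version and variant bits. The function name `make_uuid4` is hypothetical; in practice `uuid.uuid4()` from the standard library does the same job.

```python
import os
import uuid

def make_uuid4(random_bytes):
    """Build a version 4 UUID from 16 bytes of random data."""
    if len(random_bytes) != 16:
        raise ValueError("need exactly 16 bytes")
    b = bytearray(random_bytes)
    b[6] = (b[6] & 0x0F) | 0x40  # set the version (M position) to 4
    b[8] = (b[8] & 0x3F) | 0x80  # set the reserved bits (N position) to binary 10
    return uuid.UUID(bytes=bytes(b))

u4 = make_uuid4(os.urandom(16))
print(u4, u4.version)
```

Everything except those 6 fixed bits is random, which leaves 122 random bits per UUID.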

In versions 3 and 5, however, they are generated by hashing a name together with a namespace.
Name and namespace work much like class names and namespaces in many programming languages. The basic requirement is that the combination of namespace and name determines the hash. In other words, the same namespace and name, fed through the same hash algorithm (such as MD5 in version 3), must always produce the same result, while the same name in a different namespace produces a different result.

The three variable factors in versions 3 and 5 are filled from the output of the hash algorithm: MD5 for version 3 and SHA-1 for version 5.
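The name-based scheme can be sketched by hand and compared with the standard library. In this sketch `name_based_uuid` is a hypothetical helper and "example.com" is a placeholder name; the hash input is the 16 namespace bytes followed by the UTF-8 encoded name, and only the first 16 bytes of the digest are kept.

```python
import hashlib
import uuid

def name_based_uuid(namespace, name, version):
    """Hypothetical helper: build a version 3 (MD5) or version 5 (SHA-1) UUID."""
    if version == 3:
        hasher = hashlib.md5
    elif version == 5:
        hasher = hashlib.sha1
    else:
        raise ValueError("version must be 3 or 5")
    digest = hasher(namespace.bytes + name.encode("utf-8")).digest()
    # Keep 16 bytes; the version argument overwrites the version and variant bits.
    return uuid.UUID(bytes=digest[:16], version=version)

a = name_based_uuid(uuid.NAMESPACE_DNS, "example.com", 5)
b = name_based_uuid(uuid.NAMESPACE_URL, "example.com", 5)
print(a)
print(b)
```

The same namespace and name always give the same UUID, while switching the namespace from DNS to URL changes the result.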


Alright, that's the end of this tutorial~ If I think the Cloud Shell experience is good, I will continue to update it in the future! If you think this article is helpful to you, consider sponsoring, okay? Thank you~
