diff --git a/encoding/README.md b/encoding/README.md index 2f6bd55..933ef2d 100644 --- a/encoding/README.md +++ b/encoding/README.md @@ -1,19 +1,53 @@ # Encoding -The purpose of encoding is to transform data so that it can be properly (and safely) consumed by a different type of system, e.g. binary data being sent over the response of an API calling, or viewing special characters on a debug console or unit test function. The goal is not to keep information secret, but rather to ensure that it’s able to be properly consumed. -Encoding transforms data into another format using a scheme that is publicly available so that it can easily be reversed. It does not require a key as the only thing required to decode it is the algorithm that was used to encode it. +## Table of contents -In cryptography, every object is converted to a byte array to be used as the input of a process and the result of process will be a byte array. But in action, we encounter to objects from different types. So the question is how should we convert them to a byte array? or vice versa, How can we encode the result byte array to a human-readable string for debugging, storable value in a database table or transmittable through a RESTful API response body? +- ### [Purpose](#purpose) +- ### [How it works](#how-it-works) +- ### [Sample](#sample) +- ### [Best practices](#best-practices) -Using `Hello, World` is the most popular example in developers world. So, let start with it and try to represent it as a byte array : +## Purpose + +The purpose of encoding is to transform data so that it can be properly (and safely) consumed by a different type of +system. The goal is not to keep information secret, but rather to ensure that it’s able to be properly consumed. +Encoding transforms data into another format using a scheme that is publicly available so that it can easily be +reversed. + +## How it works + +Encoding needs only a publicly known algorithm; no key is involved. 
The data to encode can be of any kind, such as binary +data sent over an API response or special characters shown on a debug console or in unit test functions. Likewise, +to decode the encoded data you only need the algorithm that was used to encode it; no key is required. + +But what is happening under the hood? In cryptography, every object will be converted to an array of bytes. This array +will be used as the input of a process and the result of the process will be another array of bytes. The last array of +bytes is our encoded data. + +In action, we encounter objects of different types as input and output. So the question is how we should convert +things to an array of bytes. Or, vice versa, how can we encode the resulting array of bytes as a human-readable string for +debugging, a storable value in a database or a value transmittable through a RESTful API response? You'll find the +solution [here](#sample). + +## Sample + +Using `Hello, World` is the most popular example in the developers' world. So, let's try to represent it as an array of +bytes. ``` echo -n 'Hello, World' | od -vt x1 -0000000 48 65 6c 6c 6f 2c 20 57 6f 72 6c 64 +0000000 48 65 6c 6c 6f 2c 20 57 6f 72 6c 64 ``` -Each represented byte in output, is equal to the `Hello, World` correspondent character in ASCII Table. e.g. `H` is equal to `0x48` and so on. -The `String` is a basic data type. The main question is what should we do if we'd like to represent complex data structures in byte array and vice versa. It seems easy at first glance, but I highly recommend to persuade the next sections : +Each byte in the output is the ASCII code of the corresponding character of `Hello, World`, e.g. `H` is equal +to `0x48` and so on. So every character in the string is converted into its ASCII code. The `String` is a basic +data type. The main question is what we should do if we'd like to represent complex data structures in an array of bytes +and vice versa. 
It seems easy at first glance, but it is not. So it's highly recommended to read the next +sections. + +- [Binary to Text Encoding](https://github.com/KeyvanArj/cryptography-in-use/tree/main/encoding/binary-to-text): This + method will be used as the primitive tool in binary and text data manipulation. +- [Data Structure Encoding](https://github.com/KeyvanArj/cryptography-in-use/tree/main/encoding/data-structure-encoding): + This method will be used to manipulate the complex data structures in the cryptography world. -- [Binary to Text Encoding](https://github.com/KeyvanArj/cryptography-in-use/tree/main/encoding/binary-to-text) : which will be used as the primitive tools in binary and text data manipulation. -- [Data Structure Encoding](https://github.com/KeyvanArj/cryptography-in-use/tree/main/encoding/data-structure-encoding): which will be used to manipulate the complex data structures in cryptography world. +## Best practices \ No newline at end of file diff --git a/encoding/binary-to-text/README.md b/encoding/binary-to-text/README.md index 6f949bc..9ccb37a 100644 --- a/encoding/binary-to-text/README.md +++ b/encoding/binary-to-text/README.md @@ -40,46 +40,71 @@ In [src/java/cryptography-in-use/cryptolib](https://github.com/KeyvanArj/cryptog # Binary-to-text Encoding -The purpose of encoding is to transform data so that it can be properly (and safely) consumed by a different type of system, e.g. binary data being sent over the response of an API calling, or viewing special characters on a debug console or unit test function. The goal is not to keep information secret, but rather to ensure that it’s able to be properly consumed. -Encoding transforms data into another format using a scheme that is publicly available so that it can easily be reversed. It does not require a key as the only thing required to decode it is the algorithm that was used to encode it. 
+## Table of contents + +- ### [Purpose](#purpose) +- ### [Hexadecimal (Base16)](#Hexadecimal-(Base16)) + - #### [Advantages](#advantages) + - #### [Disadvantages](#disadvantages) +- ### [Base64](#base64) + - ### [Examples](#examples) + - #### [Manual encoding](#manual-encoding) + - #### [Create a binary file](#create-a-binary-file) + - #### [Encode to standard Base64](#encode-to-standard-base64) + - #### [Decode from standard Base64](#decode-from-standard-base64) +- ### [Text-to-binary decoding](#text-to-binary-decoding) + +## Purpose + +To understand the purpose of Encoding, please check [here](../../README.md#purpose) ## Hexadecimal (Base16) -Base16 can also refer to a binary to text encoding belonging to the same family as Base32, Base58, and Base64. +The Hexadecimal is a numeral system made up of 16 symbols to write and share numerical values. Base16 can also refer to +a binary to text encoding belonging to the same family as Base32, Base58, and Base64. -In this case, data is broken into 4-bit sequences, and each value (between 0 and 15 inclusively) is encoded using 16 symbols from the ASCII character set. Although any 16 symbols from the ASCII character set can be used, in practice the ASCII digits '0'–'9' and the letters 'A'–'F' (or the lowercase 'a'–'f') are always chosen in order to align with standard written notation for hexadecimal numbers. +In this case, data is broken into 4-bit sequences, and each value (between 0 and 15 inclusively) is encoded using 16 +symbols from the ASCII character set. Although any 16 symbols from the ASCII character set can be used, in practice the +ASCII digits '0'–'9' and the letters 'A'–'F' (or the lowercase 'a'–'f') are always chosen in order to align with +standard written notation for hexadecimal numbers. 
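The 4-bit splitting described in this paragraph can be sketched in Python. This is only an illustration, not the repository's `cryptolib` implementation; the helper name `to_base16` and the choice of the lowercase alphabet are assumptions made for the example.

```python
HEX_ALPHABET = "0123456789abcdef"  # lowercase variant of the 16 symbols


def to_base16(data: bytes) -> str:
    """Encode bytes as hexadecimal text, one symbol per 4-bit group."""
    symbols = []
    for byte in data:
        symbols.append(HEX_ALPHABET[byte >> 4])    # high nibble (bits 8..5)
        symbols.append(HEX_ALPHABET[byte & 0x0F])  # low nibble (bits 4..1)
    return "".join(symbols)


encoded = to_base16(b"Hello, World")
print(encoded)

# The standard library produces the same text and can reverse it.
assert encoded == b"Hello, World".hex()
assert bytes.fromhex(encoded) == b"Hello, World"
```

Every input byte becomes two output characters, which is where the 50% space efficiency of Base16 comes from.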
+ +### Advantages There are several advantages of Base16 encoding: -- Most programming languages already have facilities to parse ASCII-encoded hexadecimal -- Being exactly half a byte, 4-bits is easier to process than the 5 or 6 bits of Base32 and Base64 respectively -The symbols 0-9 and A-F are universal in hexadecimal notation, so it is easily understood at a glance without needing to rely on a symbol lookup table -- Many CPU architectures have dedicated instructions that allow access to a half-byte (otherwise known as a "nibble"), making it more efficient in hardware than Base32 and Base64 +- Most programming languages already have facilities to parse ASCII-encoded hexadecimal. +- Being exactly half a byte (4-bits) is easier to process than the 5 or 6 bits of Base32 and Base64 respectively. The + symbols 0-9 and A-F are universal in hexadecimal notation, so it would be easily understood at a glance without + needing to rely on a symbol lookup table. +- Many CPU architectures have dedicated instructions that allow access to a half-byte (otherwise known as a "nibble"), + making Base16 more efficient in hardware than Base32 and Base64. + +### Disadvantages The main disadvantages of Base16 encoding are: -- Space efficiency is only 50%, since each 4-bit value from the original data will be encoded as an 8-bit byte. In contrast, Base32 and Base64 encodings have a space efficiency of 63% and 75% respectively. +- Space efficiency is only 50%, since each 4-bit value from the original data will be encoded as an 8-bit byte. In + contrast, Base32 and Base64 encodings have a space efficiency of 63% and 75% respectively. - Possible added complexity of having to accept both uppercase and lowercase letters. ## Base64 -Here, we are talking about the `Base64` encoding from [RFC4648 - The Base16, Base32, and Base64 Data Encodings](https://tools.ietf.org/html/rfc4648). 
+Here, we are talking about the `Base64` encoding +from [RFC4648 - The Base16, Base32, and Base64 Data Encodings](https://tools.ietf.org/html/rfc4648). There are two different versions defined in RFC 4648: * Standard * With URL and Filename Safe Alphabet -The encoding process represents 24-bit groups of input bits as output -strings of 4 encoded characters. Proceeding from left to right, a -24-bit input group is formed by concatenating 3 8-bit input groups. -These 24 bits are then treated as 4 concatenated 6-bit groups, each -of which is translated into a single character in the base 64 -alphabet. +The encoding process takes 24-bit groups as input and represents 4 encoded characters string as output. -Each 6-bit group is used as an index into an array of 64 printable -characters. The character referenced by the index is placed in the -output string. +The encoding process represents 24-bit groups of input bits as output strings of 4 encoded characters. Proceeding from +left to right, a 24-bit input group is formed by concatenating 3 8-bit input groups. These 24 bits are then treated as 4 +concatenated 6-bit groups, each of which is translated into a single character in the base 64 alphabet. + +Each 6-bit group is used as an index into an array of 64 printable characters. The character referenced by the index is +placed in the output string. The Base 64 Alphabet Table @@ -102,23 +127,22 @@ The Base 64 Alphabet Table 15 P 32 g 49 x 16 Q 33 h 50 y -Special processing is performed if fewer than 24 bits are available -at the end of the data being encoded. A full encoding quantum is -always completed at the end of a quantity. When fewer than 24 input -bits are available in an input group, bits with value zero are added -(on the right) to form an integral number of 6-bit groups. -Since it encodes by group of 3 bytes, when last group of 3 bytes miss one byte then = is used, when it miss 2 bytes then == is used for padding. 
+Special processing is performed if fewer than 24 bits are available at the end of the data being encoded. A full +encoding quantum is always completed at the end of a quantity. When fewer than 24 input bits are available in an input +group, bits with value zero are added (on the right) to form an integral number of 6-bit groups. Since it encodes by +group of 3 bytes, when last group of 3 bytes miss one byte then = is used, when it miss 2 bytes then == is used for +padding. -In `URL/Filename safe` version, the `-` is used for `62` instead of `+` , -and the `_` is used for `63` instead of `/` . This encoding may be referred to as "base64url". -This encoding should not be regarded as the same as the "base64" encoding and -should not be referred to as only "base64". +In `URL/Filename safe` version, the `-` is used for `62` instead of `+` , and the `_` is used for `63` instead of `/`. +This encoding may be referred to as "base64url". +This encoding should not be regarded as the same as the "base64" encoding and should not be referred to as only "base64" +. -In `OpenSSL` , the `Standard` version has been implemented since OpenSSL 1.1.1j 16 Feb 2021. +In `OpenSSL` , the `Standard` version has been implemented since OpenSSL 1.1.1j 16 Feb 2021. -### Example +### Examples -#### manual encoding +#### Manual encoding Suppose that the input byte array is [0xff, 0xe2]. 
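For this two-byte input, the grouping and padding rules can be followed step by step in a short Python sketch. The function name `to_base64` and the two alphabet constants are hypothetical names chosen for the example; the standard `base64` module is used only to cross-check the result.

```python
import base64

STANDARD = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
URL_SAFE = STANDARD[:62] + "-_"  # base64url: '-' for 62 and '_' for 63


def to_base64(data: bytes, alphabet: str = STANDARD) -> str:
    """Encode bytes in 24-bit groups, each mapped to four 6-bit indices."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        # Pad a short final group with zero bits on the right.
        quantum = int.from_bytes(chunk + b"\x00" * (3 - len(chunk)), "big")
        chars = [alphabet[(quantum >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        # 1 input byte keeps 2 characters, 2 bytes keep 3, 3 bytes keep 4;
        # the rest of the 4-character quantum is filled with '='.
        keep = len(chunk) + 1
        out.append("".join(chars[:keep]) + "=" * (4 - keep))
    return "".join(out)


data = bytes([0xFF, 0xE2])
print(to_base64(data))            # standard alphabet
print(to_base64(data, URL_SAFE))  # URL and filename safe alphabet

assert to_base64(data) == base64.b64encode(data).decode()
assert to_base64(data, URL_SAFE) == base64.urlsafe_b64encode(data).decode()
```

The two bytes give 16 bits, which are padded with two zero bits to form three 6-bit groups (63, 62, 8), so one `=` completes the four-character quantum.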
@@ -140,31 +164,30 @@ The output length is not the multiplier of 4, so add `=` as the padding characte `/` `+` `I` `=` -If we try to do same one for `base64url` : +If we try to do same one for `base64url`: `_` `-` `I` `=` -##### create a binary file +#### Create a binary file -You can use `echo` in command line interface : +You can use `echo` in command line interface: ``` $ echo -n -e \\xff\\xe2 > data_binary.bin ``` -To check the content of the binary file : +To check the content of the binary file: ``` $ hexdump data_binary.bin ``` -##### encode to standard Base64 +#### Encode to standard Base64 ``` $ openssl enc -base64 -e -in data_binary.bin ``` - -##### decode from standard Base64 +#### Decode from standard Base64 ``` $ openssl enc -base64 -d <<< /+I= | od -vt x1 @@ -177,3 +200,25 @@ In [src/python/cryptography-in-use/cryptolib](https://github.com/KeyvanArj/crypt ##### Java In [src/java/cryptography-in-use/cryptolib](https://github.com/KeyvanArj/cryptography-in-use/tree/main/src/java/cryptography-in-use/src/main/java/cryptolib) folder, you can find the `BinaryEncoder.java` source code contains the `hex` and `base64` encoder/decoders implementations. Their unit-tests also are available in [src/java/cryptography-in-use/test](https://github.com/KeyvanArj/cryptography-in-use/tree/main/src/java/cryptography-in-use/test/java/cryptolib) folder as the `BinaryEncoderTest.java` source code. +## Text-to-binary decoding +In many situations, we have some text values which should be decoded to equivalent byte arrays to use as the input of +a cryptographic process. For example, assume that we have a message for an authorized party in text and we need to encrypt +it before transmission. 
The encryption process accepts a byte array as the input, so we need to convert the message to a +byte array: + +``` +$ echo -n 'Hello, World' | od -t x1 +0000000 48 65 6c 6c 6f 2c 20 57 6f 72 6c 64 +``` + +or, in another representation: + + ``` + $ echo -n 'Hello, World' | xxd -ps + 48656c6c6f2c20576f726c64 + ``` + +But what does it mean really? It's very important for you to understand what happens exactly in this conversion. Take a +look at the `ASCII Table` again. `0x48` refers to the hexadecimal representation of `H` character, `0x65` refers to `e` +character and so on. So, every character in the `Hello, World` message is converted to a hexadecimal value +from `ASCII Table`. It means that we have done the `ASCII` decoding process. Did we have any other option? Yes, diff --git a/encoding/data-structure-encoding/README.md b/encoding/data-structure-encoding/README.md index 85d580c..e744ad9 100644 --- a/encoding/data-structure-encoding/README.md +++ b/encoding/data-structure-encoding/README.md @@ -1,33 +1,69 @@ # Data Structure Encoding -In computing, serialization is the process of translating a data structure or object state into a format that can be stored (for example, in a file or memory data buffer) or transmitted (for example, across a computer network) and reconstructed later (possibly in a different computer environment). When the resulting series of bits is reread according to the serialization format, it can be used to create a semantically identical clone of the original object. For many complex objects, such as those that make extensive use of references, this process is not straightforward. Serialization of object-oriented objects does not include any of their associated methods with which they were previously linked. 
- -This process of serializing an object is also called marshalling an object in some situations.The opposite operation, extracting a data structure from a series of bytes, is de-serialization, (also called un-serialization or un-marshalling). +## Table of contents + +- ### [Definition](#definition) +- ### [Text-based encoding formats](#text-based-encoding-formats) + - #### [PEM](#pem) + - ##### [Advantage and disadvantage](#advantage-and-disadvantage) + - ##### [File content](#file-content) + - ##### [Public key](#public-key) +- ### [Base64](#base64) + - ### [Examples](#examples) + - #### [Manual encoding](#manual-encoding) + - #### [Create a binary file](#create-a-binary-file) + - #### [Encode to standard Base64](#encode-to-standard-base64) + - #### [Decode from standard Base64](#decode-from-standard-base64) +- ### [Text-to-binary decoding](#text-to-binary-decoding) + +## Definition + +In computing, serialization is the process of translating a data structure or object state into a format that can be +stored (for example, in a file or memory data buffer) or transmitted (for example, across a computer network) and +reconstructed later (possibly in a different computer environment). When the resulting series of bits is reread +according to the serialization format, it can be used to create a semantically identical clone of the original object. +For many complex objects, such as those that make extensive use of references, this process is not straightforward. +Serialization of object-oriented objects does not include any of their associated methods with which they were +previously linked. + +This process of serializing an object is also called marshalling an object in some situations.The opposite operation, +extracting a data structure from a series of bytes, is de-serialization, (also called un-serialization or +un-marshalling). 
[Comparison of data-serialization formats](https://en.wikipedia.org/wiki/Comparison_of_data-serialization_formats) [Serialization](https://en.wikipedia.org/wiki/Serialization) ## Text-based encoding formats -### PEM [RFC 7468](https://tools.ietf.org/html/rfc7468) - -Several security-related standards used on the Internet define ASN.1 -data formats that are normally encoded using the Basic Encoding Rules -(BER) or Distinguished Encoding Rules (DER) [X.690](https://en.wikipedia.org/wiki/X.690), which are -binary, octet-oriented encodings. -A disadvantage of a binary data format is that it cannot be -interchanged in textual transports, such as email or text documents. -One advantage with text-based encodings is that they are easy to -modify using common text editors; for example, a user may concatenate -several certificates to form a certificate chain with copy-and-paste -operations. -The content of a PEM file begins with a header such as `-----BEGIN CERTIFICATE-----` in a stand-alone line and ends with a -footer like `-----END CERTIFICATE-----` in the same way. The contents between header and footer tags are base64 encoded string of the related object in DER-encoded format. Except the header, the last line of content and footer lines, each line has the length of 64 characters. So, to parse a PEM file, you need to know the exact definition of the encoded object in ASN.1 syntax. you can use this online tool to check the content of a PEM file [PEM Parser](https://8gwifi.org/PemParserFunctions.jsp) or [Decode PEM data](https://report-uri.com/home/pem_decoder). - - +### PEM + +[RFC 7468](https://tools.ietf.org/html/rfc7468) + +Several security-related standards used on the Internet define [ASN.1](https://en.wikipedia.org/wiki/ASN.1) data formats +that are normally encoded using the Basic Encoding Rules (BER) or Distinguished Encoding Rules ( +DER) [X.690](https://en.wikipedia.org/wiki/X.690), which are binary, octet-oriented encodings. 
+ +#### Advantage and disadvantage + +A disadvantage of a binary data format is that it cannot be interchanged in textual transports, such as email or text +documents. One advantage with text-based encodings is that they are easy to modify using common text editors; for +example, a user may concatenate several certificates to form a certificate chain with copy-and-paste operations. + +#### File content + +The content of a PEM file begins with a header such as `-----BEGIN CERTIFICATE-----` in a stand-alone line and ends with +a footer like `-----END CERTIFICATE-----` in the same way. The contents between header and footer tags are base64 +encoded string of the related object in DER-encoded format. Except the header, the last line of content and footer +lines, each line has the length of 64 characters. So, to parse a PEM file, you need to know the exact definition of the +encoded object in ASN.1 syntax. you can use this online tool to check the content of a PEM +file [PEM Parser](https://8gwifi.org/PemParserFunctions.jsp) +or [Decode PEM data](https://report-uri.com/home/pem_decoder). + #### Public Key -a PEM file which contains a public key begins with the line `-----BEGIN PUBLIC KEY-----` and ends with the line `-----END PUBLIC KEY-----`. Between these two tags is the base64 encoded string of `SubjectPublicKeyInfo` object in DER-encoded format: +A PEM file which contains a public key begins with the line `-----BEGIN PUBLIC KEY-----` and ends with the +line `-----END PUBLIC KEY-----`. 
Between these two tags is the base64 encoded string of `SubjectPublicKeyInfo` object in +DER-encoded format: ``` -----BEGIN PUBLIC KEY----- @@ -35,7 +71,8 @@ MFkwEwYHKoZIzj0CAQYIKoZIzj0DAQcDQgAEk1qnJZfju7Cs3mcFHkaNv30Y14EX wLpQUpi1k2W+KWVSb1dnBTkavBRZ8bp0Ip1NR59PwuN/9Nf1pKu77a3PaQ== -----END PUBLIC KEY----- ``` -To parse the `SubjectPublicKeyInfo` object, you need to follow these steps : + +To parse the `SubjectPublicKeyInfo` object, you need to follow these steps: - decode the base64 string (e.g. use this online tool [Cryptii](https://cryptii.com/pipes/base64-to-hex)): @@ -49,7 +86,8 @@ or you can use the following `OpenSSL` command : $ openssl ec -pubin -inform DER -in certificate.cer -outform PEM -out certificate.pem ``` -- parse the resulting byte array(`DER` formatted) according to the ASN.1 syntax of `SubjectPublicKeyInfo` [RFC5280](https://datatracker.ietf.org/doc/html/rfc5280#section-4.1.1.2): +- parse the resulting byte array(`DER` formatted) according to the ASN.1 syntax + of `SubjectPublicKeyInfo` [RFC5280](https://datatracker.ietf.org/doc/html/rfc5280#section-4.1.1.2): ``` SubjectPublicKeyInfo ::= SEQUENCE { @@ -63,7 +101,7 @@ AlgorithmIdentifier ::= SEQUENCE { parameters ANY DEFINED BY algorithm OPTIONAL } ``` -you can use the [ASN.1 Javascript decoder](https://lapo.it/asn1js/) online tool to check the parser result : +you can use the [ASN.1 Javascript decoder](https://lapo.it/asn1js/) online tool to check the parser result: ``` SEQUENCE (2 elem) @@ -73,18 +111,25 @@ SEQUENCE (2 elem) BIT STRING (520 bit) 0000010010010011010110101010011100100101100101111110001110111011101100… ``` -We know that the `SEQUENCE` tag is `0x30` so the byte array is started with this value. Here, `0x59` equals to the length of the `SEQUENCE` object in bytes. The next `0x30` means that there is another `SEQUENCE` as we expect from the `AlgorithmIdentifier` definition syntax. The `SubjectPublicKey` contains the public key bytes and included as a `BIT STRING` in the object. 
`BIT STRING` tag is `0x03` which you can find it in the byte array easily followed by `0x42`(it's length in bytes). +We know that the `SEQUENCE` tag is `0x30` so the byte array is started with this value. Here, `0x59` equals to the +length of the `SEQUENCE` object in bytes. The next `0x30` means that there is another `SEQUENCE` as we expect from +the `AlgorithmIdentifier` definition syntax. The `SubjectPublicKey` contains the public key bytes and included as +a `BIT STRING` in the object. `BIT STRING` tag is `0x03` which you can find it in the byte array easily followed +by `0x42`(it's length in bytes). -Sometimes, the cryptographic objects such as `Certificate`s, `Public Key`s, ... may be stored or transmitted in `DER` format (`.der`). -For example the following command will export the former public key (an EC Public Key) from `PEM` format to its equivalent `DER` one: +Sometimes, the cryptographic objects such as `Certificate`s, `Public Key`s, ... may be stored or transmitted in `DER` +format (`.der`). For example the following command will export the former public key (an EC Public Key) from `PEM` +format to its equivalent `DER` one: ``` $ openssl ec -pubin -inform PEM -in public-key.pem -outform DER -out public-key.der ``` -The resulting `.der` file contains the base64 decoded of `.pem` file. Please note that the `OpenSSL` command for a `RSA Public Key` is as the following one: +The resulting `.der` file contains the base64 decoded of `.pem` file. 
Please note that the `OpenSSL` command for +a `RSA Public Key` is as the following one: For `RSA Public Key` + ``` $ openssl rsa -pubin -inform PEM -in public-key.pem -outform DER -out public-key.der ``` @@ -92,6 +137,7 @@ $ openssl rsa -pubin -inform PEM -in public-key.pem -outform DER -out public-key #### Certificate For `Certificate` + ``` $ openssl x509 -inform PEM -in certificate.pem -outform DER -out certificate.der ``` @@ -102,9 +148,15 @@ Note: `Certificate` in `DER` format may be stored in `.der`, `.cer` or `.crt` fi ### [ASN.1](https://en.wikipedia.org/wiki/ASN.1) -Abstract Syntax Notation One (ASN.1) is a standard interface description language for defining data structures that can be serialized and deserialized in a cross-platform way. It is broadly used in telecommunications and computer networking, and especially in cryptography. +Abstract Syntax Notation One (ASN.1) is a standard interface description language for defining data structures that can +be serialized and deserialized in a cross-platform way. It is broadly used in telecommunications and computer +networking, and especially in cryptography. -Protocol developers define data structures in ASN.1 modules, which are generally a section of a broader standards document written in the ASN.1 language. The advantage is that the ASN.1 description of the data encoding is independent of a particular computer or programming language. Because ASN.1 is both human-readable and machine-readable, an ASN.1 compiler can compile modules into libraries of code, codecs, that decode or encode the data structures. Some ASN.1 compilers can produce code to encode or decode several encodings. +Protocol developers define data structures in ASN.1 modules, which are generally a section of a broader standards +document written in the ASN.1 language. The advantage is that the ASN.1 description of the data encoding is independent +of a particular computer or programming language. 
Because ASN.1 is both human-readable and machine-readable, an ASN.1 +compiler can compile modules into libraries of code, codecs, that decode or encode the data structures. Some ASN.1 +compilers can produce code to encode or decode several encodings. [X.690](https://en.wikipedia.org/wiki/X.690) is an ITU-T standard specifying several ASN.1 encoding formats: @@ -112,9 +164,12 @@ Protocol developers define data structures in ASN.1 modules, which are generally - Canonical Encoding Rules (CER) - Distinguished Encoding Rules (DER) -Any ASN.1 encoding begins with two common bytes (or octets, groupings of eight bits) that are universally applied regardless of the type. The first byte is the type indicator, which also includes some modification bits we shall briefly touch upon. The second byte is the length header. +Any ASN.1 encoding begins with two common bytes (or octets, groupings of eight bits) that are universally applied +regardless of the type. The first byte is the type indicator, which also includes some modification bits we shall +briefly touch upon. The second byte is the length header. -We will use the [asn1parse](https://www.openssl.org/docs/manmaster/man1/openssl-asn1parse.html) command of `OpenSSL` with [ASN1_generate_nconf](https://www.openssl.org/docs/manmaster/man3/ASN1_generate_nconf.html) formatted file. +We will use the [asn1parse](https://www.openssl.org/docs/manmaster/man1/openssl-asn1parse.html) command of `OpenSSL` +with [ASN1_generate_nconf](https://www.openssl.org/docs/manmaster/man3/ASN1_generate_nconf.html) formatted file. Some of the more applicable data types are: @@ -142,22 +197,24 @@ Some of the more applicable data types are: - SET, SET OF : Constructed, tag = 0x11 -The header byte is always placed at the start of any ASN.1 encoding and is divides into three parts: the classification, the constructed bit, and the primitive type. 
The header byte is broken as shown here : +The header byte is always placed at the start of any ASN.1 encoding and is divides into three parts: the classification, +the constructed bit, and the primitive type. The header byte is broken as shown here : - bits 8,7 : Classification -- bit 6 : Constructed +- bit 6 : Constructed - bits 5..1 : Primitive Type The classification bits refer to : -| Class | Bit 8 | Bit 7 | +| Class | Bit 8 | Bit 7 | | :---------------| :-----| :-----| -|universal | 0 | 0 | -|application | 0 | 1 | -|context-specific | 1 | 0 | -|private | 1 | 1 | +|universal | 0 | 0 | +|application | 0 | 1 | +|context-specific | 1 | 0 | +|private | 1 | 1 | -`Primitive` method applies to simple types and types derived from simple types by implicit tagging. It requires that the length of the value be known in advance. +`Primitive` method applies to simple types and types derived from simple types by implicit tagging. It requires that the +length of the value be known in advance. Simple Integer : put this lines as the content of `int.cnf` file: @@ -172,8 +229,8 @@ openssl asn1parse -genconf int.cnf -noout -out int.der && hexdump int.der 000000 02 01 04 ``` -As we expected, `0x02` refers to the `INTEGER` tag, `0x01` is the length of it and `0x04` is its value. -Now, change the value in `int.cnf` file to `65889` and run it again: +As we expected, `0x02` refers to the `INTEGER` tag, `0x01` is the length of it and `0x04` is its value. Now, change the +value in `int.cnf` file to `65889` and run it again: ``` openssl asn1parse -genconf int.cnf -noout -out int.der && hexdump int.der @@ -192,19 +249,21 @@ asn1=NULL openssl asn1parse -genconf int.cnf -noout -out int.der && hexdump int.der 000000 05 00 ``` + `0x05` is the corresponding tag to `NULL`. -Tagging is useful to distinguish types within an application; it is also commonly used to distinguish component types within a structured type. 
For instance, optional components of a SET or SEQUENCE type are typically given distinct context-specific tags to avoid ambiguity. -There are two ways to tag a type: implicitly and explicitly. +Tagging is useful to distinguish types within an application; it is also commonly used to distinguish component types +within a structured type. For instance, optional components of a SET or SEQUENCE type are typically given distinct +context-specific tags to avoid ambiguity. There are two ways to tag a type: implicitly and explicitly. -Implicitly tagged types are derived from other types by changing the tag of the underlying type. +Implicitly tagged types are derived from other types by changing the tag of the underlying type. [[class] number] IMPLICIT Type class = UNIVERSAL | APPLICATION | PRIVATE -where Type is a type, class is an optional class name, and number is the tag number within the class, a nonnegative integer. -If the class name is absent, then the tag is context-specific. +where Type is a type, class is an optional class name, and number is the tag number within the class, a nonnegative +integer. If the class name is absent, then the tag is context-specific. Keep going and put an `IMPLICIT` tag on it : @@ -217,7 +276,8 @@ openssl asn1parse -genconf int.cnf -noout -out int.der && hexdump int.der 000000 81 01 04 ``` -`8` octet shows that it has context-specific class and is a `primitive` not a `constructed`. and `1` is the tag number of it. +`8` octet shows that it has context-specific class and is a `primitive` not a `constructed`. and `1` is the tag number +of it. Now, let try this one : @@ -232,19 +292,21 @@ openssl asn1parse -genconf int.cnf -noout -out int.der && hexdump int.der `4` octet shows that it has application class. 
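The way the first byte splits into class, constructed bit and tag number can be checked with a small Python sketch. The helper name `parse_identifier` is hypothetical, and the sketch assumes the single-byte identifier form only: tag numbers of 31 and above use a multi-byte form that is not handled here.

```python
CLASS_NAMES = ("universal", "application", "context-specific", "private")


def parse_identifier(octet: int):
    """Split a single-byte ASN.1 identifier into (class, constructed, tag number)."""
    tag_class = CLASS_NAMES[octet >> 6]  # bits 8,7
    constructed = bool(octet & 0x20)     # bit 6
    tag_number = octet & 0x1F            # bits 5..1
    return tag_class, constructed, tag_number


# Identifier bytes for INTEGER, NULL, SEQUENCE and an implicit [1] tag.
for octet in (0x02, 0x05, 0x30, 0x81):
    print(f"0x{octet:02x} ->", parse_identifier(octet))
```

Reading the high hex digit alone, as the text does, works because it carries the two class bits and the constructed bit together with the top bit of the tag number.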
-A real example : KCS #8's `PrivateKeyInfo` type has an optional attributes component with an implicit, context-specific tag: +A real example: PKCS #8's `PrivateKeyInfo` type has an optional attributes component with an implicit, context-specific +tag: -PrivateKeyInfo ::= SEQUENCE { - version Version, - privateKeyAlgorithm PrivateKeyAlgorithmIdentifier, - privateKey PrivateKey, - attributes [0] IMPLICIT Attributes OPTIONAL } +PrivateKeyInfo ::= SEQUENCE { version Version, privateKeyAlgorithm PrivateKeyAlgorithmIdentifier, privateKey PrivateKey, +attributes [0] IMPLICIT Attributes OPTIONAL } -Here the underlying type is Attributes, the class is absent (i.e., context-specific), and the tag number within the class is 0. +Here the underlying type is Attributes, the class is absent (i.e., context-specific), and the tag number within the +class is 0. -`Constructed, definite-length` method applies to simple string types, structured types, types derived simple string types and structured types by implicit tagging, and types derived from anything by explicit tagging. It requires that the length of the value be known in advance. +`Constructed, definite-length` method applies to simple string types, structured types, types derived from simple string +types and structured types by implicit tagging, and types derived from anything by explicit tagging. It requires that +the length of the value be known in advance. -For example a `SEQUENCE` will be shown by `0x30` tag, because it's a constructed type so the `6`th bit will be `1` and makes the `0x10` tag to `0x30`. The same approach cause that a `SET` will be started by `0x31`. +For example, a `SEQUENCE` is shown by the `0x30` tag: because it's a constructed type, the `6`th bit is `1`, which +turns the `0x10` tag into `0x30`. In the same way, a `SET` (tag `0x11`) starts with `0x31`. Explicit tagging denotes a type derived from another type by adding an outer tag to the underlying type. 
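Adding an outer tag can be sketched in Python as wrapping an already encoded value in a new tag and length. The helper names `der_length`, `der_integer` and `explicit_tag` are hypothetical, and the sketch assumes non-negative integers and context-specific tag numbers below 31; it is not how `OpenSSL` implements the encoding.

```python
def der_length(n: int) -> bytes:
    """Definite-length octets: short form below 128, long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def der_integer(value: int) -> bytes:
    """Encode a non-negative INTEGER (tag 0x02) in two's complement."""
    assert value >= 0, "sketch covers non-negative values only"
    size = value.bit_length() // 8 + 1  # leaves room for the sign bit
    content = value.to_bytes(size, "big", signed=True)
    return b"\x02" + der_length(len(content)) + content


def explicit_tag(number: int, inner: bytes) -> bytes:
    """Wrap an encoded value in a context-specific, constructed outer tag."""
    assert 0 <= number < 31, "sketch covers low tag numbers only"
    return bytes([0xA0 | number]) + der_length(len(inner)) + inner


inner = der_integer(4)
print(inner.hex(" "))                   # the plain INTEGER
print(explicit_tag(1, inner).hex(" "))  # the same INTEGER inside tag [1]
```

The inner encoding is left untouched, which is the difference from implicit tagging, where the `0x02` tag byte itself would be replaced.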
@@ -252,7 +314,8 @@ Explicit tagging denotes a type derived from another type by adding an outer tag `class` = UNIVERSAL | APPLICATION | PRIVATE -where `Type` is a type, `class` is an optional class name, and `number` is the tag number within the class, a nonnegative integer. +where `Type` is a type, `class` is an optional class name, and `number` is the tag number within the class, a +nonnegative integer. If the `class` name is absent, then the tag is `context-specific`. @@ -269,10 +332,10 @@ openssl asn1parse -genconf int.cnf -noout -out int.der && hexdump int.der 000000 a1 03 02 01 04 ``` -We do not specified the class in `int.cnf` file, so its class is `context-specific` as the default : `1 0` in bits 8,7 and `constructed` `1` in bit 6. -The Tag number is also appeared in second octet of byte `1`. +We did not specify the class in the `int.cnf` file, so its class is `context-specific` by default: `1 0` in bits 8,7 +and `constructed` `1` in bit 6. The tag number also appears as the second hex digit of the byte: `1`. -No try to determine the class of object an set it to `Application` : +Now try to set the class of the object to `Application`: ``` asn1=EXPLICIT:1A, INTEGER:4