GSM standard and how it applies to SMS


The Global System for Mobile Communications (GSM; the acronym originally stood for Groupe Spécial Mobile, the French-named group that initially took care of its development) is a 2G mobile telephony standard approved by ETSI (the European Telecommunications Standards Institute). The abbreviation 2G is used in mobile telephony to indicate the second-generation technologies launched commercially since 1991.

The first GSM network was developed by Telenokia and Siemens (Nokia Networks). In 2017 it became the most widely used mobile phone standard globally, with more than 3 billion users in 200 different countries. The GSM standard relies on the algorithm developed by Andrea Giacomo Viterbi for decoding convolutionally coded digital transmissions. It is a very widespread algorithm in the field of error-correcting codes (Forward Error Correction, or FEC).

This algorithm is used in digital transmissions not only by GSM but also by systems such as:

  • UMTS (Universal Mobile Telecommunications System): a 3G mobile phone standard, an evolution of GSM;
  • DVB-T (Digital Video Broadcasting - Terrestrial): a standard for digital terrestrial television broadcasting;
  • LTE (Long Term Evolution): a newer generation of mobile broadband access systems;
  • WiMAX (Worldwide Interoperability for Microwave Access): a transmission standard that allows wireless access to broadband telecommunications networks.

The introduction of GSM represented a real revolution in the field of mobile telephone systems. Its main advantages over the previous systems were:

  • the possibility, through unified procedures, of interchange and interaction between different networks that adhere to a single international standard;
  • digital communications (i.e. information encoded as sequences of bits).

The introduction of digital transmission increased transmission speed, allowing the birth of new services (for example SMS) and guaranteeing higher security standards in terms of communication encryption. In fact, even though the GSM standard was initially developed with the main objective of handling voice telecommunications, it was soon exploited for data exchange as well.

The strength of this system was that users could access a whole series of new services at very low cost. For operators, one of the main advantages was the possibility of purchasing infrastructure and equipment at very low cost. The universal spread of the GSM standard also gave rise to so-called roaming, which mobile operators use to let their users connect, for a fee, through networks that the operators do not own.

Since 2006, the GSM network has supported DTM (Dual Transfer Mode), which allows a mobile terminal to handle voice communication and data transmission simultaneously. This has made possible, for example, video calls over the GSM network. Although the standard is constantly evolving, GSM systems have always maintained full backward compatibility with previous versions.

In mobile telephony, GSM 03.38 (or 3GPP 23.038) is a character encoding for SMS (Short Message Service). This standard defines the default 7-bit GSM alphabet, which is mandatory in GSM networks. An SMS message using this encoding can contain up to 160 characters, including spaces and line breaks, provided it contains no special characters. The table below lists all the characters that are part of the default alphabet (or “Basic set”). An “Extension set” of characters is also available; each of these, however, is counted as two characters, since it is preceded by an invisible escape character.
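To see why a 7-bit alphabet gives a 160-character limit, it helps to look at how the characters are packed. The sketch below is a minimal illustration, assuming the usual 140-octet SMS payload (a figure not stated in this article) and the least-significant-bit-first packing used for the default alphabet. The function name pack_septets is hypothetical.

```python
def pack_septets(septets):
    """Pack a sequence of 7-bit values into octets, least significant bits first."""
    out = bytearray()
    bits = 0    # bit accumulator
    nbits = 0   # number of valid bits currently in the accumulator
    for value in septets:
        bits |= (value & 0x7F) << nbits
        nbits += 7
        while nbits >= 8:
            out.append(bits & 0xFF)
            bits >>= 8
            nbits -= 8
    if nbits:
        out.append(bits & 0xFF)  # final partial octet, padded with zero bits
    return bytes(out)


# For plain Latin letters the GSM 7-bit codes coincide with ASCII.
packed = pack_septets([ord(c) for c in "hello"])
print(packed.hex())                    # 5 characters occupy 5 octets here
print(len(pack_septets([0] * 160)))    # 160 septets * 7 bits = 1120 bits = 140 octets
```

With 7 bits per character, 160 characters fill the 140 octets exactly, which is where the limit comes from.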

The default alphabet also includes non-printed characters:

  • space character;
  • line feed control, which indicates the end of one line of text and the beginning of another;
  • carriage return control, which moves to the beginning of a line of text (usually used together with a line feed character);
  • escape control, not visible in the text, which is automatically placed before each character that is part of the “Extension set”.

Set                      Characters
GSM 03.38 Basic set      A B C D E F G H I J K L M N O P Q R S T U V W X Y Z a b c d e f g h i j k l m n o p q r s t u v w x y z à Å å Ä ä Ç É é è ì Ñ ñ ò Ø ø Ö ö ù Ü ü Æ æ ß 0 1 2 3 4 5 6 7 8 9 & * @ : , ¤ $ = ! > # - ¡ ¿ ( < % . + £ ? " ) § ; ' / _ ¥ Δ Φ Γ Λ Ω Π Ψ Σ Θ Ξ
GSM 03.38 Extension set  ^ { } | [ ] ~ \ €
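
The counting rule described above (one unit per basic character, two per extension character) can be sketched as follows. This is a minimal illustration rather than a complete implementation: the names GSM7_BASIC, GSM7_EXTENSION and gsm7_length are hypothetical, and the sets simply transcribe the table.

```python
# Characters of the default alphabet, transcribed from the table above.
GSM7_BASIC = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789"
    "àÅåÄäÇÉéèìÑñòØøÖöùÜüÆæß"
    "@£$¥¤§¿¡_!\"#%&'()*+,-./:;<=>?"
    "ΔΦΓΛΩΠΨΣΘΞ"
    " \n\r"  # space, line feed, carriage return
)
GSM7_EXTENSION = set("^{}[]~\\|€")


def gsm7_length(text):
    """Return how many 7-bit units the text needs, or None if it cannot be
    written with the basic and extension sets."""
    total = 0
    for ch in text:
        if ch in GSM7_BASIC:
            total += 1
        elif ch in GSM7_EXTENSION:
            total += 2  # invisible escape character + the character itself
        else:
            return None
    return total


print(gsm7_length("Hello"))      # only basic characters
print(gsm7_length("Price: 5€"))  # the euro sign counts twice
print(gsm7_length("你好"))        # outside both sets
```

A result of None signals that the message would need a different encoding.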

Concatenated SMS are also supported: if the text exceeds 160 characters, multiple SMS are sent (up to a maximum of 5 or 7, depending on the operator). Depending on the operator and the recipient’s device, the messages may appear as a single SMS or as a sequence of separate SMS. Concatenated SMS require a continuation prefix and a sequence number, which is why the maximum number of characters allowed in each part drops to 153. The space of the remaining 7 characters is taken up by the header information that makes concatenation possible. The default alphabet described so far is suitable only for a limited number of languages, including English.
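
The part count for a 7-bit message follows directly from the 160 and 153 limits. The sketch below assumes the length has already been counted in 7-bit units (with extension characters counted twice); the function name gsm7_parts is hypothetical. It ignores two practical details: operator limits on the number of parts, and the fact that an escape character and the character it introduces should not be split across two parts.

```python
import math

SINGLE_SMS_UNITS = 160  # capacity of a stand-alone 7-bit message
CONCAT_SMS_UNITS = 153  # capacity of each part of a concatenated message


def gsm7_parts(units):
    """Return the number of SMS needed for a text of the given 7-bit length."""
    if units <= SINGLE_SMS_UNITS:
        return 1
    return math.ceil(units / CONCAT_SMS_UNITS)


for units in (160, 161, 306, 307):
    print(units, "->", gsm7_parts(units))
```

Note that the jump happens at 161: a 161-character text already needs two parts, each limited to 153 characters.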

Since version 8 of the 3GPP 23.038 standard, of March 2008, some languages can access an additional character set through the use of “National Language Shift Tables”. These tables allow different character sets to be used depending on the language being written. The choice is indicated in the “User Data Header” section of an SMS. A “Locking shift table” can be specified for the entire text, replacing the default 7-bit GSM basic table, or a “Single shift table” can be specified to replace the default 7-bit GSM extension table for individual escaped characters. Both the default basic table and the default extension table can be replaced in the same SMS. With the “National Language Shift Tables”, a message still uses 7-bit encoding, but a different set can be chosen so that the characters specific to each language are shown correctly.

Initially, the “National Language Shift Tables” were specified only for Turkish, Spanish and Portuguese. Version 9 introduced ten other languages used in India: nine that use Brahmic scripts (Bengali, Gujarati, Hindi, Kannada, Malayalam, Oriya, Punjabi, Tamil, Telugu), plus Urdu. There is currently no support for languages that have too many distinct characters, such as Japanese written in basic kana, Korean written in Hangul jamo, or Chinese written in Han characters.
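
As a rough illustration of how the table choice is carried in the User Data Header, the sketch below builds the header bytes for a given language. The element identifiers (0x24 for single shift, 0x25 for locking shift) and the language codes are assumptions based on my reading of 3GPP TS 23.040 and 23.038 and should be checked against the specifications before use. The function name build_shift_udh is hypothetical.

```python
# Assumed identifiers; verify against 3GPP TS 23.040 / 23.038 before relying on them.
IEI_SINGLE_SHIFT = 0x24
IEI_LOCKING_SHIFT = 0x25
LANGUAGE_IDS = {"turkish": 0x01, "spanish": 0x02, "portuguese": 0x03}


def build_shift_udh(locking=None, single=None):
    """Build a User Data Header selecting national language shift tables.

    Each information element is: identifier, length (1), language code.
    The first octet of the header is the total length of the elements.
    """
    elements = bytearray()
    if single is not None:
        elements += bytes([IEI_SINGLE_SHIFT, 0x01, LANGUAGE_IDS[single]])
    if locking is not None:
        elements += bytes([IEI_LOCKING_SHIFT, 0x01, LANGUAGE_IDS[locking]])
    return bytes([len(elements)]) + bytes(elements)


print(build_shift_udh(locking="turkish", single="turkish").hex())
print(build_shift_udh(single="portuguese").hex())
```

Because the header occupies part of the message payload, a message that uses shift tables has room for fewer than 160 characters.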

If you insert a “special” character in the message, that is, one not present in either the “Basic set” or the “Extension set”, the SMS will use 16-bit UCS-2 (Unicode) encoding. This encoding allows a much wider range of characters, and therefore supports more languages, at the cost of more space per character: a Unicode SMS can contain only 70 characters, so a single “special” character reduces the limit for the SMS from 160 to 70. Concatenation is still supported, but the length of each part then drops from 70 to 67 characters.
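
The 70 and 67 limits can be related to the same payload arithmetic as the 7-bit case. The sketch below assumes a 140-octet payload and a 6-octet concatenation header, neither of which is stated in this article, and counts length in 16-bit units. The function name ucs2_parts is hypothetical.

```python
import math

PAYLOAD_OCTETS = 140      # assumed size of the SMS user data
CONCAT_HEADER_OCTETS = 6  # assumed size of the concatenation header

SINGLE_UCS2_CHARS = PAYLOAD_OCTETS // 2                           # 70
CONCAT_UCS2_CHARS = (PAYLOAD_OCTETS - CONCAT_HEADER_OCTETS) // 2  # 67


def ucs2_parts(text):
    """Return the number of SMS needed to send the text with 16-bit encoding."""
    # Count 16-bit units; characters outside the Basic Multilingual Plane
    # (many emoji, for example) take two units each.
    units = len(text.encode("utf-16-be")) // 2
    if units <= SINGLE_UCS2_CHARS:
        return 1
    return math.ceil(units / CONCAT_UCS2_CHARS)


print(SINGLE_UCS2_CHARS, CONCAT_UCS2_CHARS)
print(ucs2_parts("你" * 70), ucs2_parts("你" * 71), ucs2_parts("你" * 135))
```

In practice this means that a 100-character text costs one SMS in the default alphabet but two as soon as it contains a single character that forces Unicode.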

Article written by

Nicola Valente