Assembly language is a low-level programming language that exposes the instructions that are used by the computer to operate its hardware. In this article, we explore assembly language as an introduction to programming with code examples and explanations. If you’ve ever wondered what a “pointer” or “stack” is and whether you should learn assembly in your pursuit of becoming a programmer, read on. This article explains what assembly language is, its advantages and disadvantages, examples of common use cases for assembly languages, and some resources if you want to learn more about this topic. Let’s begin!
Assembly Language Definition
Assembly language is a low-level programming language that is specific to a particular processor family. It is written in mnemonic form, a symbolic representation of the machine code that the assembler generates. In assembly language, the opcodes of machine instructions are replaced by mnemonics, and the addresses of instructions or operands are replaced by symbols or labels. Each family of devices has its own assembly language, matching its machine instruction set, and source code is converted into machine instructions by the assembly process. Each assembly statement corresponds closely, in most cases one-to-one, to a machine instruction of that instruction set, so assembly code is not directly portable between platforms.
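As a minimal sketch of mnemonics and labels, the short program below assumes NASM syntax on 32-bit x86 Linux (the same setup as the add.asm example later in this article). The names counter and count_down are made up for illustration.

```nasm
; labels.asm - mnemonics and labels in place of opcodes and addresses
; Assumes NASM syntax on 32-bit x86 Linux; label names are illustrative.
section .data
counter: dd 3                 ; "counter" stands for the address of this 32-bit value

section .text
global _start
_start:
    mov ecx, [counter]        ; mnemonic MOV: load the value stored at counter
count_down:                   ; a label naming the address of the next instruction
    dec ecx                   ; mnemonic DEC: subtract 1 from ECX
    jnz count_down            ; jump back to the label until ECX reaches 0
    mov ebx, ecx              ; exit status = ECX, which is 0 here
    mov eax, 1                ; system call number 1 = exit
    int 0x80                  ; ask the kernel to run the system call
```

The assembler replaces counter and count_down with numeric addresses and each mnemonic with its opcode, so the programmer never has to write either by hand.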
History of Assembly Language
The history of assembly language dates back to the early days of computing. Kathleen Booth is generally credited with the first assembly language, based on work she began in 1947 for the ARC2 computer at Birkbeck, University of London. In 1949 the EDSAC at Cambridge used a built-in assembler, David Wheeler's "initial orders", with one-letter mnemonics. Autocode, developed by Alick Glennie in 1952 for the Manchester Mark 1, is sometimes mentioned in this context, but it was an early compiled language rather than an assembly language.
Assemblers for commercial machines soon followed, including SOAP (Symbolic Optimal Assembly Program) for the IBM 650 and FAP (FORTRAN Assembly Program) for the IBM 709 and 7090. FAP let programmers write hand-coded routines that could be combined with FORTRAN programs.
Programming Languages
Since ENIAC, one of the first general-purpose electronic computers, was unveiled in 1946, the way humans communicate with machines has been a central concern of software engineers and computer practitioners. Engineers have consistently favored languages that are more efficient and simpler to use. As hardware has improved at an ever faster pace, the demands placed on programming languages have grown as well. Programming languages have come a long way over the past few decades and are commonly grouped into generations, the first three of which are described below. To meet the needs of different fields, a large number of languages have been modified, replaced, and extended, which has produced the variety of languages in use today. Despite many attempts to find a universal language suited to every programming environment, none has succeeded.
First Generation: Machine Language
Machine language is the first-generation programming language. In the earliest days of computing, the only way to make a computer carry out a task was to write strings of binary digits, "0" and "1". This is machine language. It is obscure and difficult to understand, and its meaning usually has to be looked up in tables or manuals. It is painful to use, especially when a finished program must be modified: the unstructured code gives little indication of where to start, and errors are hard to find. Moreover, different computers have different instruction sets and modes of operation, so a machine language program can only run on the specific type of computer it was written for. Moving to another machine means rewriting the program, which greatly limits reuse. On the other hand, because machine language is tailored to one type of computer, programs written in it can run very efficiently.
Second Generation: Assembly Language
Machine language, as a programming language, has poor flexibility and readability. To ease the burden on software engineers, machine language was improved by using short, easy-to-remember letters and words in place of specific instructions. This makes it much easier to read a finished program and understand what it does, and fixing bugs and maintaining existing programs become easier as well. This language is what we call assembly language, the second-generation computer language.
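To make the contrast concrete, here is a sketch of a complete program written directly as machine code bytes, with the equivalent mnemonic shown in each comment. It assumes NASM syntax on 32-bit x86 Linux. The byte values are standard IA-32 encodings of these three instructions, and the program simply exits with status 0.

```nasm
; bytes.asm - a program written as raw machine code, with mnemonics in comments
; Assumes NASM syntax on 32-bit x86 Linux.
section .text
global _start
_start:
    db 0x31, 0xDB                    ; machine code for: xor ebx, ebx  (exit status 0)
    db 0xB8, 0x01, 0x00, 0x00, 0x00  ; machine code for: mov eax, 1    (exit system call)
    db 0xCD, 0x80                    ; machine code for: int 0x80      (call the kernel)
```

Replacing each db line with the mnemonic in its comment gives an assembly program that does the same thing and is far easier to read.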
Compared with machine language, assembly language is just as machine-dependent, but it is easier to remember and write while retaining the speed and efficiency of machine language. It is still a machine-oriented language: the design intent of a program is hard to read from its code, and programs are not easy to port, so it is not as widely used as high-level languages. Today, with high-level languages highly developed, assembly is mostly used at the lowest level, typically for performance optimization or direct hardware control.
Third Generation: High-Level Language
As programming languages moved from machine language to assembly language, people identified a key factor limiting the general use of programs: portability. What was needed was a way to write programs that run on different machines, independent of the hardware. This avoids much repetitive programming work and improves efficiency. At the same time, the language should be close to mathematical notation or human natural language. The first high-level programming languages appeared in the 1950s, when computers were still scarce. Computers were expensive and the amount of computation they could perform each day was limited, so making effective use of that limited computing power was a pressing problem. Because resources were so scarce, execution efficiency was also a major goal for engineers of that era.
Assembly Language Composition
An assembly language is built around the instruction set of its processor, which is typically large, with many instructions, complex formats, and names that are hard to memorize. The most difficult part of an instruction is usually its addressing mode, which determines how the instruction obtains its operands, or, from the processor's point of view, how it finds the data it needs. Addressing touches on how data is laid out in memory and is closely tied to memory management, which makes it hard to understand. Finally, many instructions also affect the processor's status flags, and because the flag rules differ from instruction to instruction, this mechanism takes time to master.
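The sketch below illustrates a few common x86 addressing modes and one flag-setting instruction. It assumes NASM syntax on 32-bit x86 Linux, and the name table is made up for illustration.

```nasm
; addressing.asm - common x86 addressing modes and a flag-setting compare
; Assumes NASM syntax on 32-bit x86 Linux; "table" is an illustrative name.
section .data
table: dd 10, 20, 30, 40

section .text
global _start
_start:
    mov eax, 2                ; immediate: the operand 2 is part of the instruction
    mov ebx, eax              ; register: the operand is read from EAX
    mov ecx, [table]          ; direct: read memory at the address of table (10)
    mov esi, table            ; ESI now holds the address itself
    mov edx, [esi + 4]        ; base + displacement: second element (20)
    mov edi, [esi + ebx*4]    ; base + index*scale: element number 2 (30)
    cmp edi, 30               ; CMP computes EDI - 30 and only updates the flags
    je  equal                 ; ZF is set when the operands are equal
    mov ebx, 1                ; not reached here: would exit with status 1
    jmp done
equal:
    xor ebx, ebx              ; exit status 0
done:
    mov eax, 1                ; exit system call
    int 0x80
```

Each mov names the same kind of operation, but the operand is found in a different way, which is exactly what the addressing mode specifies.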
Data transfer
Data transfer instructions include:
- general data transfer instruction: MOV;
- conditional move instruction: CMOVcc;
- stack operation instructions: PUSH, PUSHA, PUSHAD, POP, POPA, POPAD;
- exchange and translate instructions: XCHG, BSWAP, XLAT;
- address or segment selector transfer instructions: LEA, LDS, LES, LFS, LGS, LSS, etc.
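A small sketch of several of these instructions working together, assuming NASM syntax on 32-bit x86 Linux (the label value is illustrative):

```nasm
; transfer.asm - MOV, PUSH/POP, XCHG and LEA
; Assumes NASM syntax on 32-bit x86 Linux.
section .data
value: dd 7

section .text
global _start
_start:
    mov eax, [value]          ; memory to register: EAX = 7
    mov ebx, 5                ; immediate to register: EBX = 5
    push eax                  ; save 7 on the stack
    push ebx                  ; save 5 on top of it
    pop eax                   ; last in, first out: EAX = 5
    pop ebx                   ; EBX = 7
    xchg eax, ebx             ; swap the registers: EAX = 7, EBX = 5
    lea ebx, [eax + ebx*2]    ; address arithmetic, no memory access: EBX = 7 + 5*2 = 17
    mov eax, 1                ; exit system call; the status is taken from EBX
    int 0x80
```

Run as a 32-bit Linux program, this should exit with status 17, which the shell shows with `echo $?`.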
Arithmetic and logic operations
These instructions perform arithmetic and logical operations, including:
- addition instructions: ADD, ADC;
- subtraction instructions: SUB, SBB;
- increment instruction: INC;
- decrement instruction: DEC;
- decimal (BCD) adjustment instructions: AAA, AAS, DAA, DAS;
- comparison instruction: CMP;
- sign extension instructions: CBW, CWDE, CDQE;
- multiplication instructions: MUL, IMUL;
- division instructions: DIV, IDIV;
- logical operation instructions: AND, NOT, OR, XOR, TEST.
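The following sketch chains a few of the arithmetic instructions, assuming NASM syntax on 32-bit x86 Linux. It covers only a subset of the list above.

```nasm
; arith.asm - MUL, ADD, DIV, SUB and INC on 32-bit registers
; Assumes NASM syntax on 32-bit x86 Linux.
section .text
global _start
_start:
    mov eax, 6
    mov ecx, 7
    mul ecx               ; unsigned multiply: EDX:EAX = EAX * ECX = 42
    add eax, 8            ; EAX = 50
    mov ecx, 4
    xor edx, edx          ; DIV divides EDX:EAX, so clear the high half first
    div ecx               ; EAX = 12 (quotient), EDX = 2 (remainder)
    sub eax, edx          ; EAX = 12 - 2 = 10
    inc eax               ; EAX = 11
    mov ebx, eax          ; exit status = 11
    mov eax, 1            ; exit system call
    int 0x80
```

MUL and DIV use EAX and EDX implicitly, so these registers must be prepared before the instruction and read afterwards.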
Shift and rotate instructions
These instructions shift or rotate the bits of a register or memory operand by a specified number of positions.
- logical left shift instruction: SHL;
- logical right shift instruction: SHR;
- arithmetic left shift instruction: SAL;
- arithmetic right shift instruction: SAR;
- rotate left instruction: ROL;
- rotate right instruction: ROR, etc.
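As a sketch of how these differ in practice, the program below (NASM syntax, 32-bit x86 Linux assumed) uses shifts as fast multiplication and division by powers of two and shows a rotate wrapping a bit around. On x86, SAL and SHL are two names for the same operation.

```nasm
; shift.asm - logical shift, arithmetic shift and rotate
; Assumes NASM syntax on 32-bit x86 Linux.
section .text
global _start
_start:
    mov eax, 5
    shl eax, 3            ; logical left shift by 3: 5 * 8 = 40
    shr eax, 1            ; logical right shift by 1: 40 / 2 = 20
    mov ecx, -16
    sar ecx, 2            ; arithmetic right shift keeps the sign: -16 / 4 = -4
    add eax, ecx          ; 20 + (-4) = 16
    mov bl, 0x81          ; 1000 0001 in binary
    rol bl, 1             ; rotate left: the top bit wraps around, BL = 0000 0011 = 3
    movzx ebx, bl         ; zero-extend BL into EBX: 3
    add ebx, eax          ; 3 + 16 = 19
    mov eax, 1            ; exit system call; status 19
    int 0x80
```

Using SHR instead of SAR on the negative value would shift in zero bits and produce a large positive number, which is why signed values need the arithmetic form.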
Bit manipulation
These instructions include:
- the bit test instruction: BT;
- the bit test and set instruction: BTS;
- the bit test and reset instruction: BTR;
- the bit test and complement instruction: BTC;
- the bit scan forward instruction: BSF;
- the bit scan reverse instruction: BSR.
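A brief sketch of the bit instructions, assuming NASM syntax on 32-bit x86 Linux. The bit test instructions copy the selected bit into the carry flag (CF) before any change is made.

```nasm
; bits.asm - set, complement, reset, test and scan individual bits
; Assumes NASM syntax on 32-bit x86 Linux.
section .text
global _start
_start:
    xor eax, eax          ; EAX = 0
    bts eax, 3            ; set bit 3: EAX = 8
    bts eax, 5            ; set bit 5: EAX = 40
    btc eax, 0            ; complement bit 0: EAX = 41
    btr eax, 5            ; reset bit 5: EAX = 9 (binary 1001)
    bt  eax, 3            ; copy bit 3 into CF: CF = 1, EAX unchanged
    jnc failed            ; would jump only if bit 3 were clear
    bsf ebx, eax          ; index of the lowest set bit: EBX = 0
    bsr ecx, eax          ; index of the highest set bit: ECX = 3
    add ebx, ecx          ; EBX = 3
    jmp done
failed:
    mov ebx, 255          ; not reached in this example
done:
    mov eax, 1            ; exit system call; status 3
    int 0x80
```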
Control transfer
This group includes:
- unconditional jump instruction: JMP;
- conditional jump instructions: Jcc, JCXZ;
- loop instructions: LOOP, LOOPE, LOOPNE;
- procedure call instruction: CALL;
- procedure return instruction: RET;
- interrupt instructions: INT n, INT3, INTO, IRET, and so on.
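The sketch below combines CALL, RET, LOOP and a conditional jump to sum the integers 1 to 5. It assumes NASM syntax on 32-bit x86 Linux, and sum_to_n is a hypothetical helper written for this example, with its argument passed in ECX by a convention chosen here.

```nasm
; control.asm - CALL/RET, a conditional jump and LOOP
; Assumes NASM syntax on 32-bit x86 Linux.
section .text
global _start
_start:
    mov ecx, 5            ; argument: sum the integers 1..5
    call sum_to_n         ; push the return address and jump to the procedure
    mov ebx, eax          ; exit status = result (15)
    mov eax, 1            ; exit system call
    int 0x80

; sum_to_n (hypothetical helper): in ECX = n, out EAX = 1 + 2 + ... + n
sum_to_n:
    xor eax, eax          ; running total = 0
    test ecx, ecx         ; set ZF if n is 0
    jz .done              ; conditional jump: skip the loop when n = 0
.next:
    add eax, ecx          ; add the current counter value
    loop .next            ; ECX = ECX - 1, jump back while ECX is not 0
.done:
    ret                   ; pop the return address and jump back to the caller
```

The guard before the loop matters because LOOP decrements ECX before testing it, so entering the loop with ECX = 0 would wrap around instead of stopping.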
String manipulation
These instructions operate on strings of data, including:
- the string move instruction: MOVS;
- the string compare instruction: CMPS;
- the string scan instruction: SCAS;
- the string store instruction: STOS;
- the string load instruction: LODS.
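String instructions are usually combined with a REP prefix, which repeats them using ECX as a counter. The sketch below copies a string with MOVS and then measures the copy with SCAS. It assumes NASM syntax on 32-bit x86 Linux, and the names src, dst and SRC_LEN are illustrative.

```nasm
; strings.asm - copy a string with MOVS and measure it with SCAS
; Assumes NASM syntax on 32-bit x86 Linux.
section .data
src: db "hello", 0        ; zero-terminated source string
SRC_LEN equ $ - src       ; 6 bytes, including the terminator

section .bss
dst: resb SRC_LEN         ; destination buffer

section .text
global _start
_start:
    cld                   ; clear the direction flag so pointers move forward
    mov esi, src          ; source pointer
    mov edi, dst          ; destination pointer
    mov ecx, SRC_LEN      ; number of bytes to copy
    rep movsb             ; copy ECX bytes from [ESI] to [EDI]

    mov edi, dst          ; scan the copy from its start
    xor eax, eax          ; AL = 0, the byte to search for
    mov ecx, -1           ; effectively no limit on the scan
    repne scasb           ; advance EDI until the byte at [EDI] equals AL
    not ecx               ; bytes scanned, terminator included: 6
    dec ecx               ; drop the terminator: length 5
    mov ebx, ecx          ; exit status = 5
    mov eax, 1            ; exit system call
    int 0x80
```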
Input/output
These instructions exchange data with peripheral devices. They include the port input instructions IN and INS and the port output instructions OUT and OUTS.
Assembly Language Example
Here is a simple assembly language program that will add two numbers together:
; add.asm
;
; This program adds two numbers together
; Build on Linux (assuming NASM and GNU ld are installed):
;   nasm -f elf32 add.asm -o add.o && ld -m elf_i386 add.o -o add
;
section .data
; These are the two numbers we will be adding
; We must store them in memory so the CPU can access them
; dd reserves 32 bits for each, matching the 32-bit registers used below
num1: dd 1234
num2: dd 5678
section .text
; This is the code section of the program
; The code is a set of instructions for the CPU to execute
global _start
_start:
; Load the two numbers into registers
mov eax, [num1]
mov ebx, [num2]
; Add the numbers together
add eax, ebx
; Store the result (6912) in the num1 memory location
mov [num1], eax
; Exit the program
mov eax, 1 ; system call number for exit
xor ebx, ebx ; exit status 0
int 0x80
Features of Assembly Language
1. Machine dependency
This is a low-level machine-oriented language, usually designed for a particular computer or family of computers. Because it is a symbolic representation of machine instructions, different machines have different assembly languages.
2. High speed and high efficiency
Assembly language keeps the advantages of machine language: it is direct and simple. It can access and control the computer's hardware, such as disks, memory, the CPU, and I/O ports, and well-written assembly programs take up little memory and execute quickly.
3. The complexity of writing and debugging
Because the hardware is controlled directly, even simple tasks require many assembly language statements, and the programmer has to think through every detail when designing a program. All possible problems need to be considered, and software and hardware resources must be allocated and used sensibly. This inevitably increases the burden on the programmer. Likewise, when a program misbehaves, the cause is hard to track down.
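To give a sense of how many statements a simple task takes, here is a sketch that prints one line of text. It assumes NASM syntax on 32-bit x86 Linux and the Linux int 0x80 system call interface, where call 4 is write and call 1 is exit. The names msg and MSG_LEN are illustrative.

```nasm
; hello.asm - print one line of text and exit
; Assumes NASM syntax on 32-bit x86 Linux.
section .data
msg: db "Hello", 10       ; the text followed by a newline character
MSG_LEN equ $ - msg       ; length of the message in bytes

section .text
global _start
_start:
    mov eax, 4            ; system call 4 = write
    mov ebx, 1            ; file descriptor 1 = standard output
    mov ecx, msg          ; address of the bytes to write
    mov edx, MSG_LEN      ; number of bytes to write
    int 0x80              ; call the kernel
    mov eax, 1            ; system call 1 = exit
    xor ebx, ebx          ; exit status 0
    int 0x80
```

A high-level language does the same job in a single statement. Here the programmer has to choose the registers, the system call numbers, and the buffer length by hand.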
Advantages
As a second-generation programming language one step above machine language, assembly language has several advantages:
- It can easily read the state of memory and of hardware I/O interfaces.
- The code runs exactly as written, because there are far fewer translation steps between the source and the machine instructions.
- As a low-level language, it can reach any instruction or hardware feature the processor offers, including ones a high-level language does not expose.
Disadvantages
- Because each instruction does very little and there are few higher-level constructs, the code is verbose and tedious to write.
- Because the programmer must manage memory and data storage by hand, bugs are easy to introduce and hard to debug.
- Even after a program is finished, maintaining it takes a lot of time.
- Because the code is tied to one type of machine, it is poorly compatible with other platforms.
Assembly Language vs Machine Language
The main difference between assembly language and machine language is that assembly language is a symbolic, low-level programming language that an assembler must translate into machine code, while machine language is the machine code itself.
Assembly language is more human-readable than machine language, but it is still fairly difficult to read and write. It is designed to stay close to the native machine code, which makes it easy to see exactly which instructions the processor will execute. However, this also makes it hard to write portable code, since the code is closely tied to the specific architecture of the machine.
Machine language, on the other hand, is the native code of the machine and is not practical for humans to read. It is executed directly by the processor, whereas assembly language must first be assembled. Once assembled, an assembly program is machine code, so the two run with the same efficiency. Machine language is much more difficult to program in, since the programmer must work with the numeric instruction encodings directly.
Resources for Learning Assembly Language
If you want to learn more about assembly language, there are many different resources available to you. You can find books about assembly language on Amazon and at local bookstores. You can also find online courses on websites like Coursera, edX, and Udemy. If you want to learn more about computer architecture, there are many books on this topic as well. These resources can help you understand what assembly language is and its implications on computer hardware and software design.