
Adler & Partners Group

Adrian Foster

Mastering Inline Assembly for ARM: A Review of the ARM GCC Inline Assembler Cookbook


ARM GCC Inline Assembler Cookbook




Are you a C or C++ programmer who wants to write fast and efficient code for ARM processors? Do you want to learn how to use inline assembly in GCC to access low-level features and optimize your applications? If so, this article is for you. In this article, you will learn:







  • What is inline assembly and why you should use it



  • How to use inline assembly in GCC with basic syntax and rules



  • How to write inline assembly for ARM with instruction set and registers



  • How to optimize inline assembly for ARM with tips and tricks



By the end of this article, you will have a solid understanding of how to write and use inline assembly in GCC for ARM processors. You will also have a handy reference of the most common and useful inline assembly statements, operands, functions, and instructions for ARM. So let's get started!


What is inline assembly?




Inline assembly is a feature of some compilers that allows you to embed assembly code directly into your C or C++ source code. This way, you can mix high-level language code with low-level machine code without having to write separate assembly files or use external tools.


There are many benefits of using inline assembly, such as:


  • You can access hardware-specific features that are not available or supported by the compiler or the standard library.



  • You can optimize critical sections of your code for performance, size, or power consumption.



  • You can implement custom algorithms or functions that are faster or more efficient than the compiler-generated ones.



  • You can learn more about the architecture and instruction set of the processor you are targeting.



However, there are also some drawbacks of using inline assembly, such as:


  • You have to deal with the syntax and rules of both the high-level language and the assembly language.



  • You have to ensure that your inline assembly code is compatible with the compiler's optimization level, calling conventions, and register usage.



  • You have to maintain your inline assembly code separately for different platforms, architectures, or compilers.



  • Inline assembly code is harder to debug than high-level language code.



Therefore, you should use inline assembly only when necessary and when you are confident that it will improve your code quality or performance. You should also follow some best practices when using inline assembly, such as:


  • Use inline assembly sparingly and only for small and simple code snippets.



  • Use comments and documentation to explain the purpose and logic of your inline assembly code.



  • Use macros or functions to encapsulate your inline assembly code and make it reusable.



  • Test and benchmark your inline assembly code thoroughly and compare it with the compiler-generated code.



How to use inline assembly in GCC?




GCC is one of the most popular and widely used compilers for the C and C++ languages. It supports many platforms and architectures, including ARM. GCC also supports inline assembly: the instructions inside an asm statement are written in the syntax of the GNU assembler (GAS), and GCC adds its own notation for connecting them to C variables.


The basic syntax of inline assembly in GCC is:


asm ("assembly code" : output operands : input operands : clobbered registers);


Let's break down this syntax and see what each part means.


Inline assembly operands




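The operands connect the inline assembly code to the surrounding C code. They follow the assembly string in up to three colon-separated sections: output operands, input operands, and clobbers. Within a section, entries are separated by commas, and each output or input operand consists of a constraint string followed by a C expression in parentheses.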


Output operands are the variables (C lvalues) that receive the results of the inline assembly code. Each one is written as a constraint string followed by the C expression in parentheses. The constraint begins with an equal sign (=) to mark the operand as write-only and then specifies where the operand may live, such as a register or memory. For example:


int result; asm ("add %0, %1, %2" : "=r" (result) : "r" (a), "r" (b));


This inline assembly code adds the values of variables a and b and stores the result in variable result. The output operand result is constrained to a register (=r) and is assigned to the first placeholder (%0) in the assembly code.


Input operands are the variables or constants that provide the inputs for the inline assembly code. They are not preceded by any sign, but they are also followed by a constraint that specifies the type and location of the operand. For example:


int a = 10, b = 20, result; asm ("add %0, %1, %2" : "=r" (result) : "r" (a), "r" (b));


This inline assembly code uses the same assembly instruction as before, but now the input operands a and b are constrained to registers (r) and are assigned to the second (%1) and third (%2) placeholders in the assembly code.
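The output and input operands described above can be sketched as a small wrapper function. This is a minimal illustration, not code from the cookbook: the name add_asm is hypothetical, the __arm__ check relies on the macro that GCC predefines for 32-bit ARM targets, and the plain C branch exists only so that the snippet also builds on other hosts. ARM or Thumb-2 state is assumed for the assembly branch.

```c
/* Sketch: add two ints with one ARM instruction, with a C fallback elsewhere. */
static inline int add_asm(int a, int b)
{
    int result;
#if defined(__arm__)
    /* %0 is the output (result); %1 and %2 are the inputs (a, b). */
    __asm__ ("add %0, %1, %2"
             : "=r" (result)        /* output: write-only, any core register */
             : "r" (a), "r" (b));   /* inputs: read-only, any core registers */
#else
    result = a + b;                 /* fallback when not compiling for 32-bit ARM */
#endif
    return result;
}
```

Calling add_asm(10, 20) yields 30 on either path. The spelling __asm__ is used instead of asm so that the snippet also compiles in strict ISO C modes such as -std=c99.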


Clobbers are not operands in the strict sense. The clobber list names the registers and other resources that the inline assembly code modifies in addition to its output operands. Each entry is a plain string: a register name such as "r0", "cc" for the condition flags, or "memory" if the code reads or writes memory that is not described by its operands. Clobbers take no constraint and no placeholder. For example:


asm ("adds %0, %1, %2" : "=r" (result) : "r" (a), "r" (b) : "cc");


This inline assembly code performs the same addition, but it uses adds, the flag-setting form of add, and therefore lists the condition codes (cc) as clobbered. This tells the compiler that the flags used for conditional execution may have changed, so it must not rely on flag values computed before the statement.
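A case where the "cc" clobber is genuinely required is an addition that also reports its carry-out. The sketch below is an illustration under assumptions: add_with_carry is a hypothetical name, ARM or Thumb-2 state is assumed for the assembly branch (16-bit-only Thumb has no adc with an immediate), and the C branch is a stand-in for other targets.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns a + b (mod 2^32) and stores the carry-out (0 or 1) in *carry_out. */
static inline uint32_t add_with_carry(uint32_t a, uint32_t b, uint32_t *carry_out)
{
    uint32_t sum;
    uint32_t carry = 0;

    assert(carry_out != NULL);
#if defined(__arm__) && (!defined(__thumb__) || defined(__thumb2__))
    __asm__ ("adds %0, %2, %3\n\t"   /* sum = a + b, updates the C flag   */
             "adc  %1, %1, #0"       /* carry = carry + 0 + C flag        */
             : "=r" (sum), "+r" (carry)
             : "r" (a), "r" (b)
             : "cc");                /* the flags are modified by adds    */
#else
    sum = a + b;                     /* unsigned arithmetic wraps by definition */
    carry = (sum < a) ? 1u : 0u;     /* wrap-around means a carry occurred      */
#endif
    *carry_out = carry;
    return sum;
}
```

The carry variable uses the + modifier because the adc instruction both reads and writes it.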


There are many types of constraints and modifiers that can be used for inline assembly operands. Some of the most common ones are:


Constraint or modifier   Description
r                        A general-purpose register
m                        A memory operand
i                        An immediate operand
+                        A read-write operand (modifier)
&                        An early-clobber operand (modifier)
%                        A commutative operand (modifier)
cc                       The condition codes register (clobber)
memory                   The memory clobber
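The memory clobber is often used on its own, with an empty instruction string, as a compiler-level barrier. The sketch below shows that idiom; the names compiler_barrier, publish, shared_data, and shared_flag are hypothetical. It constrains only the compiler: ordering between cores still needs a hardware barrier such as dmb.

```c
/* Compiler-only barrier: emits no instruction, but GCC must assume that
 * any memory may have been read or written by the statement. */
static inline void compiler_barrier(void)
{
    __asm__ __volatile__ ("" : : : "memory");
}

/* Hypothetical producer: write the data first, then raise the flag. */
static int shared_data;
static int shared_flag;

static inline void publish(int value)
{
    shared_data = value;
    compiler_barrier();   /* keeps the compiler from moving the flag store above the data store */
    shared_flag = 1;
}
```

Because the instruction string is empty, this particular snippet is not ARM-specific and builds with GCC on any target.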


Inline assembly statements




The statements are the actual assembly instructions that are executed by the processor. They are enclosed in double quotes and separated by newlines or semicolons. They can also contain placeholders that refer to the operands. For example:


asm ("mov r0, %1\n\t" "mov r1, %2\n\t" "add r0, r0, r1\n\t" "mov %0, r0" : "=r" (result) : "r" (a), "r" (b) : "r0", "r1");


This inline assembly code performs the same addition as before, but now it uses four ARM instructions instead of one. The placeholders %1 and %2 refer to the input operands a and b, while the placeholder %0 refers to the output operand result. Because the code overwrites r0 and r1 directly, both registers are listed as clobbers so that the compiler does not keep operands or other live values in them.
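An alternative to naming r0 and r1 directly is to let GCC choose a scratch register and to refer to operands by symbolic names. The following is a sketch under assumptions: twice_a_plus_b is a hypothetical function, and the C branch is only a stand-in for non-ARM builds.

```c
/* Computes 2 * a + b in two instructions, letting GCC pick every register. */
static inline int twice_a_plus_b(int a, int b)
{
    int out;
#if defined(__arm__)
    int tmp;   /* scratch register chosen by the compiler */
    __asm__ ("add %[tmp], %[a], %[a]\n\t"    /* tmp = a + a   */
             "add %[out], %[tmp], %[b]"      /* out = tmp + b */
             : [out] "=r" (out),
               [tmp] "=&r" (tmp)             /* & : written before b is read,
                                                so it must not share b's register */
             : [a] "r" (a), [b] "r" (b));
#else
    out = 2 * a + b;                         /* fallback for other targets */
#endif
    return out;
}
```

The early-clobber modifier matters here: without it, GCC would be free to place tmp in the same register as b, and the first instruction would destroy b before the second one reads it.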


Inline assembly functions




Sometimes, you may want to write a whole function in inline assembly instead of just a code snippet. This can be useful when you want to implement a custom function that is not provided by the compiler or the standard library, or when you want to optimize a function for a specific processor or platform.


To write an inline assembly function in GCC, you need to use some special attributes and directives that tell the compiler how to handle the function. For example:


__attribute__((naked)) int my_function (int a, int b) { asm ("push {lr}\n\t" "add r0, r0, r1\n\t" "pop {pc}"); }


This inline assembly function takes two integer parameters (a and b) and returns their sum. Under the ARM procedure call standard, a arrives in r0, b arrives in r1, and the result is returned in r0. The attribute naked tells the compiler not to generate any prologue or epilogue code for the function, such as saving and restoring registers or setting up the stack frame, so the body must consist only of basic asm statements without operands and must return on its own. The instruction push {lr} saves the link register (lr), which contains the return address of the function, on the stack. The instruction pop {pc} loads that saved value into the program counter (pc), which causes the function to return. Note that push and pop take a register list in braces.
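For a leaf function that calls nothing else, the link register does not need to be saved at all. The sketch below is a hedged variant that returns with bx lr instead of a push/pop pair; add_naked is a hypothetical name, and the second definition is only a portable stand-in for targets other than 32-bit ARM.

```c
#if defined(__arm__)
/* Naked: the body is basic asm only. Per the ARM procedure call standard,
 * a arrives in r0, b arrives in r1, and the result is returned in r0. */
__attribute__((naked)) int add_naked(int a, int b)
{
    __asm__ volatile ("add r0, r0, r1\n\t"   /* r0 = a + b            */
                      "bx  lr");             /* return to the caller  */
}
#else
/* Portable stand-in with the same behavior. */
int add_naked(int a, int b)
{
    return a + b;
}
#endif
```

Since the compiler emits no epilogue for a naked function, leaving out the bx lr would let execution run past the end of the function.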


There are many attributes that can be used for inline assembly functions, and a handful of instructions that appear in almost all of them. Some of the most common ones are:

Attribute        Description
naked            Suppresses prologue and epilogue code
always_inline    Forces the compiler to inline the function
noinline         Prevents the compiler from inlining the function
noreturn         Indicates that the function does not return
used             Makes the compiler emit the function even if nothing appears to reference it

Instruction          Description
push/pop             Saves and restores registers on/from the stack
b/bl/bx/blx          Branches to, calls, or returns from a function with or without changing the instruction set
stm/ldm              Stores and loads multiple registers to/from memory
mov/movs/movw/movt   Moves values into registers, with or without affecting flags, including 16-bit immediate values
bic/orr/eor/and      Performs bitwise operations on registers or immediate values
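As an illustration of the always_inline attribute, the sketch below wraps a single instruction in a function that GCC inlines at every call site. The name byte_swap32 is hypothetical; the assembly branch assumes ARMv6 or later (where rev exists) and is selected through the ACLE macro __ARM_ARCH, while other targets fall back to the GCC builtin __builtin_bswap32.

```c
#include <stdint.h>

/* Reverses the byte order of a 32-bit word. */
static inline __attribute__((always_inline)) uint32_t byte_swap32(uint32_t x)
{
#if defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 6)
    uint32_t y;
    __asm__ ("rev %0, %1" : "=r" (y) : "r" (x));   /* REV: reverse byte order */
    return y;
#else
    return __builtin_bswap32(x);                   /* compiler builtin fallback */
#endif
}
```

In practice the builtin is usually the better choice, because GCC can optimize around it; the assembly form is shown only to make the wrapper pattern concrete.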


How to write inline assembly for ARM?




ARM is a family of processors that are widely used in embedded systems, mobile devices, and low-power applications. ARM processors have a simple and elegant instruction set that is easy to learn and use. However, there are also some features and variations that you need to be aware of when writing inline assembly for ARM.


The most important thing to know about 32-bit ARM processors is that they have two instruction sets: ARM and Thumb. The ARM instruction set is the original one; it uses 32-bit instructions, and most of them can use any of the 16 core registers (r0-r15, where r13 is the stack pointer, r14 the link register, and r15 the program counter). The Thumb instruction set was added later as a more compact alternative; it uses 16-bit instructions, most of which can access only the low registers (r0-r7). Thumb-2 extends Thumb with 32-bit instructions that restore most of the functionality of the ARM instruction set. Not every processor implements both sets: the Cortex-M family, for example, executes Thumb code only.


The advantage of using Thumb instructions is that they take less space in memory and can improve code density and performance. The disadvantage is that they have less flexibility and functionality than ARM instructions. Therefore, you need to choose wisely which instruction set to use for your inline assembly code, depending on your target processor and optimization goals.
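Before writing instruction-set-specific assembly, it can help to check which instruction set the compiler is actually generating. The sketch below relies on the macros __arm__, __thumb__, and __thumb2__, which GCC predefines for ARM targets; target_isa_name is a hypothetical helper.

```c
/* Reports which instruction set this translation unit is compiled for. */
static inline const char *target_isa_name(void)
{
#if defined(__thumb2__)
    return "Thumb-2";          /* Thumb state on a core with Thumb-2   */
#elif defined(__thumb__)
    return "Thumb";            /* 16-bit Thumb only                    */
#elif defined(__arm__)
    return "ARM";              /* 32-bit ARM state                     */
#else
    return "not 32-bit ARM";   /* any other target, e.g. a build host  */
#endif
}
```

The same macros can guard individual asm statements, so that an instruction without a 16-bit Thumb encoding is only emitted where it can be assembled.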


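To assemble code for the ARM or the Thumb instruction set within inline assembly, you can use assembler directives that select the instruction encoding. For example:


asm (".arm\n\t" "add r0, r0, r1\n\t" ".thumb\n\t" "add r0, r0, #1" : : : "r0");


This inline assembly code uses the directive .arm to make the assembler emit ARM encodings and then performs an addition with two registers. Then it uses the directive .thumb to make the assembler emit Thumb encodings and performs an addition with a register and an immediate value. Note that these directives only select the encoding that the assembler produces; they do not change the state of the processor. The processor switches state only on an interworking branch such as bx or blx, so a snippet like this is safe only if the surrounding code performs that switch and the assembler is left in the mode that the compiler expects for the code that follows.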


To write an inline assembly function that uses a specific instruction set, you need to use the attribute target that specifies the instruction set as an argument. For example:


__attribute__((target("thumb"))) int my_function (int a, int b) { int sum; asm ("add %0, %1, %2" : "=r" (sum) : "r" (a), "r" (b)); return sum; }


This function uses the attribute target("thumb") to ask GCC to compile it for the Thumb instruction set. The inline assembly inside it is then assembled as Thumb code and performs an addition with two registers.


There are many directives and attributes that can be used to switch between ARM and Thumb instruction sets in GCC. Some of the most common ones are:


Directive      Description
.arm           Switches the assembler to the ARM instruction set
.thumb         Switches the assembler to the Thumb instruction set
.thumb_func    Marks the following symbol as the entry point of a Thumb function
.code 16       Equivalent to .thumb
.code 32       Equivalent to .arm

Attribute                Description
target("arm")            Indicates that the function uses the ARM instruction set
target("thumb")          Indicates that the function uses the Thumb instruction set
target("+thumb-mode")    Indicates that the function uses the Thumb-2 instruction set (this spelling does not appear to be among GCC's documented ARM target options; with GCC, target("thumb") produces Thumb-2 code whenever the selected architecture supports it)


ARM data processing instructions




The data processing instructions are the most basic and common instructions in the ARM instruction set. They perform arithmetic, logical, and comparison operations on registers or immediate values. They can also affect the condition codes register, which is used for conditional execution.


The general syntax of data processing instructions is:


opcode{cond}{s} Rd, Rn, Operand2


Let's break down this syntax and see what each part means.


  • The opcode is the operation code that specifies the type of operation to be performed, such as add, sub, cmp, orr, etc.



  • The cond is the condition code that specifies when the instruction will be executed, such as eq, ne, gt, le, etc. If no condition code is specified, the instruction will be executed unconditionally (al).



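  • The s is the suffix that indicates whether the instruction will affect the condition codes register or not. If s is present, the instruction will update the flags according to the result of the operation. If s is absent, the instruction will not affect the flags. The comparison instructions (cmp, cmn, tst, teq) are the exception: they always update the flags and take no s suffix. In the unified assembler syntax (UAL), s is written before the condition code, for example addseq rather than addeqs.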



  • The Rd is the destination register that will receive the result of the operation.



  • The Rn is the first source register that will provide one of the operands for the operation.



  • The Operand2 is the second operand for the operation. It can be either a register or an immediate value. It can also be shifted or rotated by a constant or a register.



Here are some examples of data processing instructions:


add r0, r1, r2          // r0 = r1 + r2
sub r0, r1, #10         // r0 = r1 - 10
cmp r0, r1              // compare r0 and r1 and update flags
orr r0, r1, r2, lsl #2  // r0 = r1 OR (r2 left shift by 2)
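A data processing instruction with a shifted second operand can be wrapped for use from C as sketched below. The function or_shifted is hypothetical; the assembly branch assumes ARM or Thumb-2 state, because 16-bit-only Thumb has no orr with an inline shift, and the C branch is a stand-in for other targets.

```c
#include <stdint.h>

/* Returns a | (b << 2) using the barrel shifter on Operand2. */
static inline uint32_t or_shifted(uint32_t a, uint32_t b)
{
    uint32_t out;
#if defined(__arm__) && (!defined(__thumb__) || defined(__thumb2__))
    __asm__ ("orr %0, %1, %2, lsl #2"   /* out = a OR (b shifted left by 2) */
             : "=r" (out)
             : "r" (a), "r" (b));
#else
    out = a | (b << 2);                 /* fallback for other targets */
#endif
    return out;
}
```

For example, or_shifted(0x1, 0x3) evaluates to 0xD, since 0x3 shifted left by two is 0xC.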


ARM load and store instructions




The load and store instructions are used to access memory and transfer data between registers and memory locations. They can also perform address calculations and multiple transfers.


The general syntax of load and store instructions is:


LDR/STR{cond}{B|H|SH|SB}{T} Rd, address


Let's break down this syntax and see what each part means.


  • The LDR/STR is the prefix that specifies whether the instruction will load from memory or store to memory.



  • The cond is the same as before.



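  • The B/H/SH/SB is the suffix that specifies the size and type of the data to be transferred. B means byte (8 bits), H means halfword (16 bits), SH means signed halfword, and SB means signed byte. The signed forms apply to loads only, since they sign-extend the loaded value to 32 bits. If no suffix is specified, the default size is word (32 bits).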


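  • The T is the suffix that requests an unprivileged access: the transfer is performed with user-mode permissions even when the processor is in a privileged mode. Whether the base register is updated does not depend on T but on the form of the address: [Rn, #offset] leaves the base register unchanged, [Rn, #offset]! is pre-indexed and writes the calculated address back before the transfer, and [Rn], #offset is post-indexed and updates the base register after the transfer.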



  • The Rd is the same as before.



  • The address is the memory address where the data will be loaded from or stored to. It can be specified in various ways, such as:

      • A register that contains the address.

      • A register that contains the base address and an immediate offset.

      • A register that contains the base address and a register offset.

      • A register that contains the base address and a register offset with a shift or a rotation.

      • A label, which the assembler turns into an address relative to the program counter.

      • The program counter (pc) or the stack pointer (sp) used as the base register. Both are ordinary core registers (r15 and r13), not pseudo-registers.



Here are some examples of load and store instructions:


ldr r0, [r1]               // r0 = memory[r1]
str r0, [r1, #4]           // memory[r1 + 4] = r0
ldr r0, [r1, r2]           // r0 = memory[r1 + r2]
strb r0, [r1, r2, lsl #1]  // memory[r1 + (r2 left shift by 1)] = lower byte of r0
ldr r0, =my_label          // r0 = address of my_label
str r0, [sp, #-8]!         // memory[sp - 8] = r0 and sp = sp - 8
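Load and store instructions can be wrapped for use from C as sketched below. The helpers load_word and store_byte are hypothetical, and the C branches are stand-ins for non-ARM builds. Since the pointers are passed as plain register operands, the "memory" clobber is what tells GCC that the statements read or write memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Loads a 32-bit word from *p with a register-addressed ldr. */
static inline uint32_t load_word(const uint32_t *p)
{
    uint32_t value;

    assert(p != NULL);
#if defined(__arm__)
    __asm__ volatile ("ldr %0, [%1]"          /* value = memory[p] */
                      : "=r" (value)
                      : "r" (p)
                      : "memory");
#else
    value = *p;
#endif
    return value;
}

/* Stores the low byte of v at base + index (register offset addressing). */
static inline void store_byte(uint8_t *base, uint32_t index, uint8_t v)
{
    assert(base != NULL);
#if defined(__arm__)
    __asm__ volatile ("strb %0, [%1, %2]"     /* memory[base + index] = v */
                      :
                      : "r" (v), "r" (base), "r" (index)
                      : "memory");
#else
    base[index] = v;
#endif
}
```

A narrower alternative to the "memory" clobber is to pass the accessed object itself as an "m" operand, which lets the compiler keep unrelated values cached in registers.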


ARM branch instructions




The branch instructions are used to alter the flow of execution and jump to different locations in the code. They can also call or return from subroutines.


The general syntax of branch instructions is:


B/BL/BX/BLX{cond} target


Let's break down this syntax and see what each part means.


  • The B/BL/BX/BLX is the prefix that specifies the type of branch to be performed. B means branch, BL means branch with link, BX means branch and exchange instruction set, and BLX means branch with link and exchange instruction set.



  • The cond is the same as before.



  • The target is the destination of the branch. For B and BL it is a label, which the assembler encodes as an offset relative to the program counter. For BX it is a register that contains the address. BLX accepts either a label or a register.



Here are some examples of branch instructions:


b loop          // branch to loop unconditionally
bl my_function  // branch to my_function and save return address in lr
bx r0           // branch to address in r0 and switch instruction set according to bit 0 of r0
blx r1          // branch to address in r1, save return address in lr, and switch instruction set according to bit 0 of r1
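Inside inline assembly, branches are usually written against numeric local labels, so that the same code can be inlined several times without duplicate symbol names. The sketch below is an illustration under assumptions: sum_down_from is a hypothetical function, ARM or Thumb-2 state is assumed for the assembly branch, and the C loop is a stand-in elsewhere.

```c
#include <stdint.h>

/* Sums the integers n, n-1, ..., 1 with a counted loop. */
static inline uint32_t sum_down_from(uint32_t n)
{
    uint32_t acc = 0;

    if (n == 0) {
        return 0;                            /* the asm loop below assumes n >= 1 */
    }
#if defined(__arm__) && (!defined(__thumb__) || defined(__thumb2__))
    __asm__ ("1:\n\t"                          /* numeric local label             */
             "add  %[acc], %[acc], %[n]\n\t"   /* acc += n                        */
             "subs %[n], %[n], #1\n\t"         /* n -= 1, sets Z when n hits 0    */
             "bne  1b"                         /* back to label 1 while n != 0    */
             : [acc] "+r" (acc), [n] "+r" (n)
             :
             : "cc");                          /* subs modifies the flags         */
#else
    while (n != 0) {
        acc += n;
        n -= 1;
    }
#endif
    return acc;
}
```

The suffix in 1b means "the nearest label 1 looking backward"; 1f would refer to the nearest one looking forward.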


ARM special instructions




The special instructions are used to perform some advanced or system-level operations that are not covered by the previous categories. They include coprocessor instructions, system instructions, and synchronization instructions.


The coprocessor instructions are used to communicate with and control coprocessors, which are special units that can perform some tasks faster or better than the main processor. For example, on many ARMv7-A processors floating-point arithmetic (VFP) and vector operations (NEON) are exposed as coprocessors 10 and 11, and system control, including the configuration of the security extensions (TrustZone), is exposed as coprocessor 15.


The system instructions are used to access system registers or modes, which control the configuration and status of the processor and its components. For example, some ARM processors have system registers for interrupt handling (CPSR), memory management (TTBR0), or performance monitoring (PMCR).


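The synchronization instructions are used to ensure data consistency and order in multiprocessor or multi-core systems. They can also be combined into atomic read-modify-write sequences: LDREX loads a value and marks the address for exclusive access, and STREX stores the new value only if that exclusivity still holds, so the sequence is retried until the store succeeds. For example, some ARM processors have synchronization instructions for barriers (DMB, DSB, ISB), exclusive load and store for semaphores (LDREX/STREX), and clearing the exclusive-access state (CLREX).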
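An atomic increment built from an exclusive load and store pair can be sketched as follows. This is an illustration under assumptions: atomic_increment is a hypothetical name, the assembly branch is enabled only for ARMv7 or later through the ACLE macro __ARM_ARCH, it implies no memory barrier, and other targets use the GCC builtin __atomic_add_fetch with relaxed ordering to match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Atomically increments *p and returns the new value (no barrier implied). */
static inline uint32_t atomic_increment(volatile uint32_t *p)
{
    assert(p != NULL);
#if defined(__arm__) && defined(__ARM_ARCH) && (__ARM_ARCH >= 7)
    uint32_t value;
    uint32_t failed;
    do {
        __asm__ volatile (
            "ldrex %[val], [%[ptr]]\n\t"        /* load and mark exclusive           */
            "add   %[val], %[val], #1\n\t"      /* compute the new value             */
            "strex %[fail], %[val], [%[ptr]]"   /* fail = 0 if the store succeeded   */
            : [val] "=&r" (value), [fail] "=&r" (failed)
            : [ptr] "r" (p)
            : "memory");
    } while (failed != 0);                      /* retry if exclusivity was lost     */
    return value;
#else
    return __atomic_add_fetch(p, 1u, __ATOMIC_RELAXED);
#endif
}
```

Both outputs are early-clobber operands so that neither can share a register with the pointer, which strex requires. If the increment must also order surrounding memory accesses, a dmb before and after the sequence would be needed.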


The general syntax of special instructions is:


CDP/MCR/MRC{cond} coprocessor, opcode1, Rd, CRn, CRm, opcode2

MRS{cond} Rd, source

MSR{cond} destination, Rn

PLD/PLI{cond} address

DSB/DMB/ISB{cond} option

LDREX{B|H|D}{cond} Rt, address

STREX{B|H|D}{cond} Rd, Rt, address

CLREX{cond}


Let's break down this syntax and see what each part means.


  • The CDP/MCR/MRC is the prefix that specifies the type of coprocessor instruction to be performed. CDP means coprocessor data processing, MCR means move to coprocessor register from ARM register, and MRC means move to ARM register from coprocessor register.



  • The cond is the same as before.



  • The coprocessor is the number of the coprocessor to be accessed, such as p10 for VFP or p15 for system control.



  • The opcode1 and opcode2 are the operation codes that specify the operation to be performed by the coprocessor.



  • The Rd is the ARM register that is used as a source or destination for the data transfer.



  • The CRn and CRm are the coprocessor registers that are used as sources or destinations for the data transfer or processing.



The MRS/MSR is the prefix that specifie

