
Basic Data Structures

From a machine's perspective, there is no distinction between instructions in memory and data in memory. A value in memory is considered to be an instruction when the processor loads that value from memory and uses it as an instruction. Similarly, a value in memory is considered to be data when the processor loads that value from memory and uses it as data. The contents of memory are simply values with no semantics attached. Semantics are applied only when the processor loads a value from memory and uses it in a certain way.

When programming the machine, however, it is useful and appropriate to abstract the values in memory into "instructions" and "data" by designing our programs such that the "instructions" are always treated as instructions by the processor and the "data" is always treated as data. In this way our "instructions" really are processor instructions and our data really is processor data.

To accomplish this, we design our programs using different syntax for instructions and data. For instance, in the C programming language the syntax + indicates the instruction "add". Similarly, the syntax int declares data that is typically of word size. You cannot use + to declare data, and you cannot use int as an instruction.

The same concept applies in assembly programming. Data is declared in your program using special syntax, called directives in assembly, which is distinct from the mnemonics used for instructions. The remainder of this tutorial discusses the basic data structures that you will use when programming in assembly.
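As a rough illustration of this separation, the sketch below places one word of data in the data segment and a few instructions in the text segment. The names counter and increment_counter are hypothetical, the .fill directive is the one described later under Declaring Arrays, and the return sequence assumes the usual MicroBlaze convention of a return address in r15.

```asm
    .data
    .align 4
counter:
    .fill 1, 4, 0            # data: one 4-byte location initialized to 0

    .text
    .align 4
increment_counter:
    lwi   r3, r0, counter    # instruction: load the word stored at counter
    addik r3, r3, 1          # instruction: add 1 to the loaded value
    swi   r3, r0, counter    # instruction: store the result back to counter
    rtsd  r15, 8             # return to the caller (return address assumed in r15)
    nop                      # delay slot of rtsd
```

Only the directives following .data describe data, and only the lines following .text are meant to be executed by the processor.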

Declaring Strings

Strings in assembly, as in most programming languages, are arrays of byte data which are declared using special syntax. When assembling the declared string into machine language, the assembler simply takes the string text and places its bytes in consecutive locations in memory. This forms an array of bytes. Strings, and arrays in general, do not have an explicit size associated with them. If the programmer needs to track the size of an array, and thus of a string, they must do so manually.

The GNU AS assembler used for MicroBlaze assembly programming supports two different directives for declaring strings, .ascii and .asciz. The former declares a string as a simple array of bytes, as described above. The latter forms the array in the same way but also null-terminates the string by placing the value 0 as the last byte of the string. For example, the following declares one string, named str1, as a simple array of bytes and another string, named str2, as a null-terminated array of bytes:

    .data
    .align 4
str1:
    .ascii "This string is not null-terminated."

    .data
    .align 4
str2:
    .asciz "This string is null-terminated."

Notice that the first string uses .ascii and the second one uses .asciz. These two directives actually declare the string. The other directives are important as well, and you should use all of the directives shown for every string that you declare. The first directive, .data, informs the assembler that any directives following it should be assembled into the data segment of the program. The details of program segments are beyond the scope of this tutorial; however, you should declare all of your strings in the data segment.

The second directive, .align, performs memory alignment for the data. The MicroBlaze processor only supports aligned reads and writes, so your data must be properly aligned or it will not work correctly. The number after the .align directive is the number of bytes to align to. Aligning to four bytes means that the address of the string will be 4-byte aligned, i.e. the address modulo 4 will be zero. Be aware that GNU AS interprets the .align argument differently depending on the target (a byte count on some targets, a power of two on others); the .balign directive always takes a byte count, and under either interpretation .align 4 yields an address that is at least 4-byte aligned. On the MicroBlaze, byte data does not need to be aligned, half-word data must be 2-byte aligned, and word data must be 4-byte aligned. The astute reader will notice that a string is byte data but is aligned to 4 bytes. While not necessary, aligning data to 4-byte boundaries provides greater flexibility in loading that data into the processor as byte, half-word, or word data.
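To make the flexibility argument concrete, here is a minimal sketch that loads the start of a 4-byte-aligned string at three different widths. The names packed and load_examples are hypothetical, .balign is used so that the alignment is unambiguously a byte count, and the return sequence assumes the usual convention of a return address in r15.

```asm
    .data
    .balign 4                # address of packed is a multiple of 4
packed:
    .ascii "ABCD"            # four bytes: 0x41 0x42 0x43 0x44

    .text
    .balign 4
load_examples:
    lbui  r3, r0, packed     # byte load: r3 = 0x41 ('A'), zero-extended
    lhui  r4, r0, packed     # half-word load: first two bytes, needs 2-byte alignment
    lwi   r5, r0, packed     # word load: all four bytes, needs 4-byte alignment
    rtsd  r15, 8             # return to the caller (return address assumed in r15)
    nop                      # delay slot of rtsd
```

Had packed been placed at an odd address, only the byte load would satisfy the alignment rules. The values produced by the half-word and word loads depend on the byte order the processor is configured for, so they are not shown.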

The third directive, either str1: or str2:, declares an assembly label. Labels in assembly are analogous to variable names in most programming languages. They are the names that you use to refer to something. In their simplest form, labels in the GNU AS assembler are a sequence of letters, digits, and underscores that does not begin with a digit, followed by a colon. The label itself can be used to refer to the data declared after it.
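Because strings carry no size of their own, one common way to track the size manually is to place a second label after the string and let the assembler subtract the two addresses. The sketch below shows that idea and also spells out the terminator that .asciz adds; all label and symbol names are hypothetical.

```asm
    .data
    .align 4
greeting:
    .ascii "Hello"           # 5 bytes, no terminator
greeting_end:                # marks the first byte after the string
    .set greeting_len, greeting_end - greeting   # constant: 5

    .data
    .align 4
greeting_z:
    .ascii "Hello"
    .byte 0                  # explicit terminator; same bytes as .asciz "Hello"
```

The symbol greeting_len can then be used wherever an immediate value is accepted, for example addik r5, r0, greeting_len.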

Declaring Arrays

Arrays in assembly are declared in much the same way as strings. The only difference is the directive used: whereas strings use the .ascii and .asciz directives, arrays are declared using the .fill directive. For instance, the following declares three arrays:

    .data
    .align 4
byte_array:
    .fill 64, 1, 0

    .data
    .align 4
halfword_array:
    .fill 64, 2, 0

    .data
    .align 4
word_array:
    .fill 64, 4, 0

In this example, the .fill directive is used to create an array. This directive takes three parameters. The first parameter is the number of locations in the array, in this case 64. The second parameter is the number of bytes in each location. In this case the byte array uses 1 byte per location, the half-word array uses 2 bytes per location, and the word array uses 4 bytes per location. Thus, the byte array consumes a total of 64 bytes of memory, the half-word array consumes a total of 128 bytes of memory, and the word array consumes a total of 256 bytes of memory. The last parameter is the value to which every element of the array is initialized. In this case all elements in all arrays are initialized to the value 0.
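Since each location occupies a fixed number of bytes, the address of element i is the array's label plus i times the location size. The following sketch reads element 3 of a word array under that rule. The names sample_array and load_element_3 are hypothetical, the index is multiplied by 4 with two additions so that no optional shifter hardware is assumed, and the return sequence assumes the usual convention of a return address in r15.

```asm
    .data
    .align 4
sample_array:
    .fill 64, 4, 0           # 64 locations of 4 bytes each, all initialized to 0

    .text
    .align 4
load_element_3:
    addik r5, r0, 3          # r5 = index of the element to read
    addk  r5, r5, r5         # r5 = index * 2
    addk  r5, r5, r5         # r5 = index * 4, the byte offset for word data
    lwi   r3, r5, sample_array   # r3 = word at sample_array + 12
    rtsd  r15, 8             # return to the caller (return address assumed in r15)
    nop                      # delay slot of rtsd
```

For a half-word array the index would be doubled once and loaded with lhui, and for a byte array the index is used directly with lbui.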
