
Wednesday, November 4, 2009

Introduction To 8086 Assembly Language For Beginners Part 1

Hello Folks!
All of you who are eager to learn about this often misunderstood programming
language are welcome here.

First of all, if you are new to computers, I suggest you read this whole article
without skipping a single paragraph.

Assembly language is a programming language used for writing programs for a
computer. But there is a big difference between it and other programming languages
like C, C++, Java and C#: assembly language is a low-level programming language,
while these languages are high-level languages.
So the question is, what do high level and low level mean?

In computer science, a low-level programming language is a language that provides
little or no abstraction from a computer's instruction set architecture. The word
"low" refers to the small or nonexistent amount of abstraction between the language
and machine language; because of this, low-level languages are sometimes described
as being "close to the hardware."

In computing, a high-level programming language is a programming language with
strong abstraction from the details of the computer. In comparison to low-level
programming languages, it may use natural language elements, be easier to use, or be
more portable across platforms. Such languages hide the details of CPU operations
such as memory access models and management of scope.

So assembly language is a low-level programming language, which is why it is
directly tied to the architecture of the computer.

Because we are studying 8086 assembly language, we should know what the 8086 is.
The 8086 is a 16-bit Intel processor, introduced in 1978. Its architecture is now
called the x86 architecture, because the later processors in the same family (80186,
80286, 80386, 80486) all have names ending in 86. By writing x86 we can refer to
every processor in this family. There is also a 64-bit extension of this architecture
called x64, but in this article we will study the 8086 (or x86) architecture.

Because assembly language is a low-level language and directly references the
processor hardware, we first need to understand the hardware and working of the
8086 (or x86) processor.

As we know, every piece of information in a computer is a number. Whether it is a
name or an address, a file or an image, everything is a collection of numbers. So
every computer needs a mechanism to store these numbers. There is a circuit called a
flip-flop, which stores a voltage state. A flip-flop can hold only two states of an
electric signal (high and low). These states are used to represent 0 and 1 in a
computer system: a high state represents 1 and a low state represents 0.
So we can picture memory as a large sequence of flip-flops. A single block of memory
consists of 8 flip-flops (each flip-flop represents one bit of the block).
This block is the smallest addressable element of memory, and it is called a byte in
programming terminology.
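
To make the idea of eight bits forming one byte concrete, here is a small Python sketch (Python is used only for illustration, it is not 8086 assembly). The function name bits_to_byte and the sample bit pattern are my own choices, not part of any standard.

```python
# Eight flip-flop states (1 = high, 0 = low), most significant bit first.
# The pattern below is just a sample chosen for this illustration.
bits = [0, 1, 0, 0, 0, 0, 0, 1]


def bits_to_byte(bits):
    """Combine eight 0/1 states into one byte value (0..255)."""
    if len(bits) != 8:
        raise ValueError("a byte has exactly 8 bits")
    value = 0
    for bit in bits:
        # Shift what we have one place to the left and add the next bit.
        value = (value << 1) | bit
    return value


byte_value = bits_to_byte(bits)
print(byte_value)        # prints 65
print(chr(byte_value))   # prints A (65 is the ASCII code of 'A')
```

The same eight states can be read as the number 65 or as the letter A. The meaning depends only on how the program interprets the byte.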

So bytes are the smallest units of memory. A RAM chip (RAM is the memory chip in a
computer that holds programs and data while they are in use) contains millions of
bytes. But there is a small problem with RAM: it is relatively slow in comparison to
the processor. To complete any operation, the processor needs to access memory for
the data to be processed. If RAM cannot work at the speed of the processor, the
processor will take a long time to execute even a small operation, and execution
will become very slow.
There is one more problem with using only RAM as memory. We all know that every
executable program in a computer is stored in some standard format like a bin file,
com file or exe file, and many more. All of these files contain programs to execute,
and all of these programs have some things in common: data to be used in operations
(variables), operations to be performed on the data (instructions for the processor)
and a data structure to keep track of procedures (function calls). The problem is
that when a program is loaded in RAM for execution, how will the processor know
where to get the data, and which section of the program to get the instructions
from, if the processor has only RAM for temporary storage?

To resolve the above problems, registers were introduced in processors.
Registers are nothing but small storage elements. They consist of flip-flops
just like RAM, but with some differences.
Registers are built into the processor chip, while RAM is a separate device in the
computer system. Because registers are built into the processor chip, they work at
the same speed as the processor itself.
The second difference is that RAM has a very large storage area, while a register
generally holds just one or two bytes.
Registers are used for short-term storage in the processor; after an operation is
complete, the result is moved from the register to RAM.
The processor performs most operations on values in registers instead of in RAM.
Whenever an operation is to be performed, the values related to that operation are
usually first transferred to registers, and then the operation is performed.
So that is the solution to our first problem, speed.
But what about the second one: how will the processor know where the data is in the
program and where the instructions to be executed are?
That is why the processor has many registers, not just one or two: a register for
each job that is needed to execute instructions.
Many registers are assigned a special job, and such a register can only do the job
assigned to it and nothing else.
But there are some general purpose registers which can be used to store any value we
need; we can change the values of these registers in a program.
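
The pattern described above (move values from RAM into registers, operate on the registers, move the result back to RAM) can be sketched in Python. This is only a toy model under my own assumptions: the list ram, the dictionary registers and the byte numbers 4, 5 and 6 are made up for the illustration.

```python
# Toy model: RAM is a list of bytes, registers are a small dictionary.
ram = [0] * 16
ram[4] = 7      # first value, stored at (hypothetical) byte number 4
ram[5] = 5      # second value, stored at byte number 5

registers = {"AX": 0, "BX": 0}

# 1. Transfer the values from RAM into registers.
registers["AX"] = ram[4]
registers["BX"] = ram[5]

# 2. Perform the operation on the registers (keep the result to 16 bits,
#    the size of an 8086 general purpose register).
registers["AX"] = (registers["AX"] + registers["BX"]) & 0xFFFF

# 3. Move the result (its low byte) from the register back to RAM.
ram[6] = registers["AX"] & 0xFF

print(ram[6])   # prints 12
```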

The 8086 processor has many registers, but we will discuss them after some time.
First we need to talk about one more important part of a processor, the Arithmetic
Logic Unit (ALU).
The ALU is responsible for arithmetic and logical operations.
The ALU consists of many logic circuits which accept some inputs and perform an
operation according to their design. Each circuit is designed to do a single
operation. There is a circuit for every operation, like adding two values,
comparing values, subtracting values or multiplying values.
Every circuit takes values from registers and saves its result in a register.

But now a question arises: how does the ALU decide which circuit to activate to
perform the desired operation?
The answer is that every circuit in the ALU has a specific number, and there is a
control unit that decodes this number and activates the circuit needed for the
operation.

So there is a number for every operation to be performed by the ALU: a number for
add, for subtract, for multiply, for compare and so on.
So we need to write these numbers in our program to perform operations.

Now let's get back to our discussion of registers. As we know, the 8086 (x86) has
certain general purpose registers and some special purpose registers. We can
assign any value to the general purpose registers according to our need.

So below is the list of general purpose registers of the 8086 (x86) processor.
Every register has a size of two bytes.

1. The accumulator register:- this register is used in many arithmetic operations;
for example, multiply and divide take one input from the accumulator and store the
result in it.

2. The base address register:- of these four registers, this is the only one that
can be used to hold a memory address (we will discuss this later). It can also be
used for computation.

3. The count register:- this is used as a counter in loops, repeated string
operations and shifts. It can also be used for computation.

4. The data register:- some input/output operations require it, and it can also be
used to hold computational values.

Following are some index registers. These are used for addressing sequences (like
arrays in C).

1. Source index register.
2. Destination index register.

There are many more registers in the 8086 (x86) processor, but we will discuss them
in a later chapter.

Now, finally, let's talk about assembly language. Did you wonder why we studied all
the processor hardware and architecture above?
The answer is that assembly language is nothing but a way to write instructions for
the processor in which every component (hardware device) or circuit has a special
and understandable name. As we know, in the ALU every circuit has a unique number,
and we have to mention that number in our program in order to perform that
particular operation. In machine language we have to remember all the numbers and
write them in our program, which leads to many difficulties in reading and
maintaining the program.
But in assembly language we don't need to remember all the numbers of the ALU.
Suppose that in the ALU the adder is assigned the number 122. In machine language we
have to write 122 for addition, but in assembly language we can write 'ADD' in
place of 122. This makes assembly language easy to understand in comparison with
machine language.
Similarly, in machine language you have to write numbers for registers in order to
use them.
But in assembly every register has a unique keyword, so we can use the keyword in
place of the register number.
For example, the accumulator register has the keyword AX in assembly language.
By writing AX we can refer to the accumulator register, which is again a hardware
device.
Similarly, DX stands for the data register, CX for the count register, BX for the
base address register, SI for the source index register and DI for the destination
index register.
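
The job of replacing keywords with numbers is what an assembler does. Here is a rough Python sketch of that idea. The operation numbers are hypothetical (122 for ADD is the made-up number from the paragraph above, and 123 and 124 are equally made up), and the output is a toy encoding, not real 8086 machine code. The register numbers follow the order the 8086 uses for its 16-bit registers.

```python
# Hypothetical operation numbers, for illustration only.
OPERATION_NUMBERS = {"ADD": 122, "SUB": 123, "MOV": 124}

# Register numbers in the order the 8086 uses for 16-bit registers.
REGISTER_NUMBERS = {"AX": 0, "CX": 1, "DX": 2, "BX": 3, "SI": 6, "DI": 7}


def assemble(line):
    """Translate a line such as 'ADD AX, BX' into a list of numbers.

    This is a toy encoding: the operation number followed by the
    number of each register operand.
    """
    mnemonic, _, operand_text = line.strip().partition(" ")
    operands = [op.strip().upper() for op in operand_text.split(",") if op.strip()]
    numbers = [OPERATION_NUMBERS[mnemonic.upper()]]
    numbers += [REGISTER_NUMBERS[op] for op in operands]
    return numbers


print(assemble("ADD AX, BX"))   # prints [122, 0, 3]
```

A human reads and writes the left side (ADD AX, BX), while the machine language programmer would have to write the numbers on the right side directly.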

So now it's time to do some programming, but before that we need to know about the
formats of program files.

There are many formats of programs, but for now we will discuss only the formats
used by DOS (Microsoft Disk Operating System). They are .com, .exe and .bin. In this
article we are mostly concerned with the exe and com file formats.

First we will talk about a very important topic: 'Addressing modes of the 8086 (x86)'.

The most frequent job in a program is accessing memory, because all values related
to the program's operations, the instructions to execute and all data, variables and
files related to the program reside in memory. So it is obvious that a program needs
to access memory all the time.

We discussed earlier that memory is a large sequence of bytes (collections of
flip-flops). Every byte is assigned a number. The numbering of bytes starts from 0
and goes up to n (0, 1, 2, 3, 4 ... n): the very first byte is assigned number 0,
the next byte number 1, the next 2, then 3, 4, 5, 6, 7 and so on.

The first bytes of memory are reserved for the system. On an 8086 PC, the area
starting from byte 0 holds the interrupt vector table, a table of addresses the
processor uses to find system routines.
The program which starts the operating system every time the computer starts is
called the boot loader. It is stored in the first sector of the disk, called the
boot sector, and is copied into memory when the computer starts.

So after the memory area of the operating system, a program can store its data.
Whenever a program stores a value in memory, the value gets stored in some free
bytes in memory.
The next time the program needs to access the previously stored value, it must know
the byte number where the value is stored.
But it is not the program which allocates bytes for storing a value. It is the
operating system which finds free bytes in memory and stores the value, so we have
to remember the byte number allocated by the operating system.
This byte number is called the address of the byte, so every value stored in memory
has an address. When an instruction is stored in memory it also has an address;
everything stored in memory has an address.
So for a program to execute we must know where it is stored in memory. Every program
has two important things: first, the instructions to execute, and second, the values
to be used in operations. So we must know where the values are stored and where the
instructions are stored.

So when a program is executed, we use several registers to keep track of the
addresses of values and instructions separately. One register stores the address of
the area of memory where the program's instructions are stored, and another register
stores the address of the memory area where the data values are stored.
An instruction does not necessarily fit in a single byte; instructions are often
several bytes long. In memory, the instructions of a single program are stored
together, in the sequence in which they should be executed.

So every instruction or value stored in memory occupies some space in memory. This
space can be one, two or many bytes long, depending on the size of the value or
instruction.
This memory space is called a memory location, and the address of the first byte of
a memory location is the address of that memory location.

Therefore data files and program files are collections of bytes, or collections of
variable-length memory locations.
There is also something called an array: a collection of same-size memory locations
in a continuous row.
To use an array we must have the address of the first byte of the array, and we must
know the size of a single memory location of the array. To access each memory
location of the array in turn, we increment the address of the first byte by the
size of one memory location, because every memory location of the array has the
same size.
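
The array rule above is simple arithmetic: address of an element = address of the first byte + (position of the element x size of one memory location). A small Python sketch, where the starting address 0100h and the element size of 2 bytes are assumptions picked for the example:

```python
def element_address(base_address, element_size, index):
    """Address of the element at position `index` (counting from 0)."""
    return base_address + index * element_size


base = 0x0100   # hypothetical address of the first byte of the array
size = 2        # each memory location of the array is 2 bytes long

addresses = [element_address(base, size, i) for i in range(4)]
print([hex(a) for a in addresses])   # prints ['0x100', '0x102', '0x104', '0x106']
```

Each address is exactly one element size (2 bytes) after the previous one, which is what incrementing the address by the size of a memory location means.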

To handle all these memory addresses and memory locations, there are 17
different ways to access memory in the 8086 (x86) processor. They are called memory
addressing modes.

Before studying addressing modes, let's have an introduction to the structure of
an assembly language instruction.
As we know, in assembly language every component of the processor is assigned a
unique keyword. For example, the adder circuit has the keyword 'ADD'. The adder
circuit takes two values and adds them, and the result is stored in a register. That
is why the ADD instruction in assembly takes two values. Some components need two
values and some need just one. There are also some components in the processor that
need no value at all. That is why in assembly language some instructions take two
operands, some need one operand and some don't need any operand. The keyword
associated with an operational component in assembly language is called the opcode
(more precisely, the keyword is a mnemonic, and the opcode is the number it stands
for).

Following is the general structure of an assembly language instruction.

[Opcode] first operand (if needed), second operand (if needed)

For example:

MOV destination, source

The MOV instruction transfers (copies) data from the source operand to the
destination operand.

MOV is a very useful instruction in assembly language, because it is used to
transfer data between memory and the processor's registers.
In assembly programs the MOV instruction is used very often.

Like the operational circuits (ALU circuits), every register of the processor has
a keyword in assembly language. Many registers are two bytes in size. For the four
general purpose registers AX, BX, CX and DX we can also access the single bytes, by
replacing the X in the register's name with H or L: L for the lower byte (the right
side byte) and H for the higher byte (the left side byte). For example, the
accumulator register has the name AX in assembly language. The lower byte of the
accumulator can be accessed by using the name AL and the higher byte by using AH.
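
To see how one two-byte register splits into a high and a low byte, here is a short Python sketch. The helper names split_word and join_bytes and the sample values are my own, chosen only for this illustration.

```python
def split_word(ax):
    """Split a 16-bit value into its (high byte, low byte), like AH and AL."""
    ax &= 0xFFFF                      # keep only 16 bits
    return (ax >> 8) & 0xFF, ax & 0xFF


def join_bytes(ah, al):
    """Combine a high byte and a low byte back into one 16-bit value."""
    return ((ah & 0xFF) << 8) | (al & 0xFF)


ah, al = split_word(0x1234)
print(hex(ah), hex(al))               # prints 0x12 0x34

# Changing only the high byte leaves the low byte untouched.
print(hex(join_bytes(0x56, al)))      # prints 0x5634
```

Written as the hexadecimal number 1234, the left two digits (12) are the high byte AH and the right two digits (34) are the low byte AL.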

Following is the list of keywords which are used for different registers.

* AX - the accumulator register (divided into AH / AL).
* BX - the base address register (divided into BH / BL).
* CX - the count register (divided into CH / CL).
* DX - the data register (divided into DH / DL).
* SI - source index register.
* DI - destination index register.
* BP - base pointer register.
* SP - stack pointer register.
* DS - data segment register.

Now let's have a short discussion of the execution cycle of the 8086 processor.
There are three steps involved in the execution cycle.
1. Fetch:- There is a register called the Program Counter or Instruction Pointer
in the 8086, which at first holds the address of the first byte of the code segment.
The code segment is the area of the program where all the instructions are
contained. So the Program Counter or Instruction Pointer points to the first
instruction of the program. To execute the program, an instruction is fetched from
the address stored in the instruction pointer and stored in a register. Then the
instruction pointer is advanced to the next instruction.

2. Decode:- The instruction is stored in memory in binary form. So before it is
executed, the instruction must be decoded into the actual operation to be performed.

3. Execute:- After an instruction is decoded, it is executed by the processor.

After executing the instruction, the process starts again from step one. This
cycle keeps going until all instructions are finished.
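
The fetch, decode and execute steps above can be imitated with a very small Python loop. This is only a sketch of the idea: the three opcodes (LOAD_AX, ADD_AX, HALT) and their numbers are invented for the example and are not real 8086 encodings.

```python
# Toy opcodes (hypothetical numbers, not real 8086 machine code).
HALT, LOAD_AX, ADD_AX = 0, 1, 2

# The "code segment": each instruction is an opcode, sometimes
# followed by one byte of data.
memory = [
    LOAD_AX, 5,    # AX = 5
    ADD_AX, 7,     # AX = AX + 7
    HALT,          # stop
]


def run(memory):
    """Run the toy program and return the final (AX, IP) values."""
    ip = 0   # instruction pointer: address of the next instruction
    ax = 0   # accumulator register
    while True:
        opcode = memory[ip]          # fetch the instruction
        ip += 1                      # advance the instruction pointer
        if opcode == LOAD_AX:        # decode, then execute
            ax = memory[ip]
            ip += 1
        elif opcode == ADD_AX:
            ax = (ax + memory[ip]) & 0xFFFF
            ip += 1
        elif opcode == HALT:
            return ax, ip
        else:
            raise ValueError("unknown opcode: %d" % opcode)


ax, ip = run(memory)
print(ax, ip)   # prints 12 5
```

Notice that the instruction pointer moves past both the opcode and its data byte, so it always points at the start of the next instruction.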

To be continued...

1 comment:

  1. thank u rahul sir, this will help beginners a lot..