Console Development – Week 1,2,3 Semester 2

Reminder

Console Development is a module that is running though both semesters this year. As a reminder, Console Development is a module in which we will be developing a program to run on a PSP, the main focus of the module is to increase our knowledge of developing with system limitations in mind and Optimisation will play a large part of the module.

In the previous semester as part of the module I had to learn about MIPS Assembly Language and write a few different programs using this Assembly Language. Learning this was a great help and increase my level of knowledge about computer architecture and how our code we write in higher level languages gets translated into assembly code and eventually into machine code. Also as part of last semester I learnt a lot about the inner workings of computers and mainly about how the Cache and memory can be a critical part of optimisation.

Week 1

At the start of this semester we had to take a sample of Sonys which runs on the PSP and improve its memory organisation in groups of 3 to 4. The main optimisations that we were to focus on were Preloading cache lines, changing the layout of structs and helping the compiler overcome aliasing issues. We picked a sample which was called Moonball, it was a game in which you had to shoot balls before the moon reach one end of the screen. At first glance this program was quite efficient and looked as though not a lot could be done to change this.

After searching through the source code for a while we came across a struct which look a little peculiar, the struct its self had a flag in it which was used to see if a Particle needed to be drawn, if the flag inside the struct was set to true then it would draw the particle, if false the it would bypass that particle and would not draw it. The flag and the data which needed to be accessed was all in the same struct, we came up with an idea which involved taking the Data out of the struct and putting it into another struct on its own, then each instance of the old struct has a pointer to another instance of the new struct. The reason behind doing this was to improve the number of structs which could fit onto one cache line, and hopefully improving and reducing the cache miss rate. We had to use some basic profiling tools to figure out if this changed worked and according to the Sonys kernel profiling code we reduced the Cache miss rate from 280 down to 220 misses per frame!

We had to present our findings to the rest of the class in the form of a powerpoint presentation.

Week 2 and 3

This week and the previous week we have been given a tool called Tuner which helps us with profiling, this tool can tell us a lot more information that we could find out ourselves and it gives it us a lot faster! Using this tool we had to use the same sample as before and give proof that it has improved the performance of the application.

Using Tuner we profiled the original sample and our changed version only to find out that the time it take to run each of the versions was the same, we may have improved the cache misses but the time it took to render each frame was still the same as before, so all our hard work from the previous week was undone.

We have been profiling the code ever since trying to improve some other parts of the code but the sample so far looks very clean and is hard to change! We have to present our findings this week to the class once again.

Console Development – The Last Month

So, Console Development lectures for the last few weeks have all been of the same format, we have been put into groups of 3 people and we have been told to go away and find out as much as possible about a certain topic. We then had to present the results of the findings to the rest of the teams and the lecturer via any means we wished. Here is what we have been up too for the last 4 weeks:

Team Presentations

Creating a Program

The first task as part of the group was a basic task which was to create a basic program that will run on the PSP Development Kits that we have scattered across the labs, the basic program had to link between 3 different files. I was self appointed team leader, which was quite a vague title but it just meant that as well as making one file for the program i also made the MakeFile, which is a file used to call the compiler with some special flags that we had set and with a name we had provided. The program that i decided we make is a basic program that inside a loop calls 2 functions, one being the Computer and the other being the PSP and they have a conversation between each other as the index of the loop is passed to each function.

As stated before i also had the task of creating the MakeFile, i took this quite far and started picking apart the MakeFile that was given to us in an example piece of code. I picked it apart finding out what each flag did, whether it was needed or not to actually make our program compile the makefile was also doing quite a bit which was not written down, its given for granted that it does it automatically, but for extra clarification i also wrote out the bits of the makefile so that i could explain it slightly better during the presentation. The final MakeFile is as follows:-

#
# Conversation Makefile
#

FLAGS = -W -Wall -g -c -o
TARGET = Conversation
OBJS = main.o hello.o world.o

all: $(TARGET).prx

clean:
rm -f $(TARGET).prx *.o *~ *.bak

$(TARGET).prx: $(OBJS)
psp-gcc $(OBJS) -startfiles -o $(TARGET).prx

main.o : main.c
psp-gcc $(FLAGS) $@ $^

hello.o : hello.c
psp-gcc $(FLAGS) $@ $^

world.o : world.c
psp-gcc $(FLAGS) $@ $^ ($@ and $^ refer to the file to be created and the dependencies)

The above is how our MakeFile looked, the declarations at the top are basically templates so that when that it said anywhere else in the file the right hand side will get put in place.

The all: and the clean: functions and all the other red coded parts can be called from the Cygwin Bash Shell, which is basically a command prompt that simulates a Linux environment on Windows, if you just type ‘Make’ it calls the make file which is set by default to all: if you type ‘Make clean’ in the command prompt then it calls the clean: function and so on.

If ‘Make all’ or just ‘Make’ is called it will try to create $(TARGET).prx, which is Conversation.PRX, to create this file it looks down to the $(TARGET).prx function and the $(OBJS) next to that are the dependencies for the .prx to be made, without the ($OBJ) files there it will not create the .prx, so before it creates the .prx it must first create the .o files, which it then looks down the rest of the MakeFile and it creates them as it goes, finally giving out the compiled Conversation.prx which can then be loaded onto the PSP.

The Programs Memory Allocations

We then had to run the program in the PSP De-bugger and find out about where the variables were stored in memory and how different types of variables refer to different types of memoy:

  • Static Memory Allocation – Global variables, they stick though the entire programs running and get deleted at the end of program execution – Therefore are stored in the program data.
  • Dynamic Memory Allocation – When ever memory on the heap is being allocated at run time, such as using the new keyword in C++ or Malloc in C.
  • Automatic Memory Allocation – Local variables, they are stored on the stack and pushed onto the stack when created and poped when the function it is local too is finnished.

 

We had to do a presentation on this to the other class members once again. This time we made a small powerpoint to explain what we found, the findings we made we as above.

We also had to modify our program slightly to get a better presentation to give, we had to create a global variable somewhere as we only had Local variables at the moment. We also tried to use the new Keyword, then found out it did not work and we needed to use malloc() in C.

Optimisations

Using the flags -O0, -O1, -O2 and -O3 we had to find out the differences between what each flag did, we set up some runtime profilers in our code and found out what they were. We also made a few comparisons with the assembly code that was created as the optimiser sometimes changed the code dramatically!

The main findings we came away from the presentation with were that the different levels of optimisation do not have lots of change on the small program we created, there was a visible but only slightly change to the speed at which is took to run the code but a slightly larger file size and longer compile times as a result.

The Lectures

The lectures that we have been having have been talking more about the inner workings of the computer, talking about the computer architecture and how this relates to the PSP that we will be working with for the next year.

Intro to PSP Development

We first went into an introduction to PSP Development, to get us ready for creating the program above, this took us though a tutorial in our teams that taught us how to create a basic program and how to navigate and call things through the Cygwin Bash Shell, which as i mentioned before is a program that tricks Windows into thinking its running on Linux, and the command prompt can be used to compile programs and view files etc.

Toolchains and Linking

The next thing we looked at were toolchains and linking, how the files get compiled and linked. They start of with a PreProcessor which moves all of the #incudes and macros and such into the correct positions into the file, for example if you #Include and header the whole contents of the Header file are placed into the .c. After the PreProcessing stage then comes the compilation stage, which is a stage which converts the .c into assembly language that is readable by the platform required, in our example we compile the program to be ran on the PSP so the compiler compiles MIPS Assembly code. After compilation stage then comes the assembler stage, which outputs machine code which can be read by the machine, at this point it can still have some unresolved externals that it is yet to know. The final stage is linking, which links all files, resolves all unresolved externals and outputs a loadable file.

Performance Fundamentals

We then went on to talk about Performance Fundamentals and how we define performance in computers. This can be effected by three factors, the Instruction Count, which can be changed in software by us, the programmers, by laying out our code better, using better algorithms and such. The second factor is the Clock rate of the processor, which is a hardware problem and for a single computer can not be changed (unless you want to risk overclocking and blowing it up). The final factor is the CPI, or Cycles per Instruction, which an average of the cycles it takes to execute a set Instruction Set. This can be changed by the programmer but is also affected by the Hardware.

Pipelines

Here we talked about the Pipeline Architecture and how it can be split up into 5 separate parts, the Instruction Fetch, Instruction Decode and Read Registers, Execution and Address Calculation, Data Memory Access and Write Back. To improve the time it takes to execute these parts of the Architecture each of the 5 parts can be worked on at the same time, passing 5 instructions through the pipeline at once. This is known as Pipelined Microarchitecture and it is used it most computers today. If each one of the 5 stages were to take 1 clock cycle there would be an instrction completed each clock cycle! There are a few hazards which can be created by working in this way though, you could be reading an out of date memory value or register value as the instruction before it is working on it at the time. Also if a branch or jump instruction is called then it takes 3 cycles before it knows the outcome of the jump, meaning if it did branch the instructions after the branch would be useless and need to be flushed.

Cache

In this weeks lecture we were discussing the cache and the speeds of the different memory area’s, as well as what is meant by a Cache Hit and a Cache Miss and what implications these can create.The purpose of a Cache is to copy values to a fast local store so that the processor can work on them much quicker as it takes 100 times less speed to work from cache as it does working from main memory. But what data gets fetched? Well this is where the Principle of Locality comes in, it will fetch either huge blocks of memory at a time by thinking that memory usually runs linearly so it would copy the needed value and the values from its neighbours in memory back into the Cache, this is called Spatial Locality. The other Principle of locality is Temporal Locality which would try to keep recently used items in the cache, such as loops, if a counter is used that would be stored as it is used each run though the loop.

After talking about the Principles of Locality we went on to talking about Cache hits and misses, there are two types of cache hit and cache miss, these are created when the cache tries to find data it needs in the cache while reading the data, if  it does this is a read data cache hit, if not its a miss. The other is when data is trying to be written, if the data is in the Cache then it is a cache hit, if not, a cache miss. The aim of all programmers is to try to get as many cache hits as possible, as if there is a cache hit reading data can be done without stalling the pipeline and writing can be done with minimal stall. If a Cache miss occurs then a major stall is created in the pipeline, depending on how it is set up the Cache will respond to misses in different ways.

That marks the end of this Blog post, and the end of Console Development for Semester 1, next week is used for the other 2 modules which i will blog about soon and keep you updated as the deadline approaches!

Console Dev Week 5-6 – More Mips!

I was rather busy last week as i went to Game City in Nottingham for a day, i had no time to blog! So this blog is for last week and this weeks work.

Week 5

Over the last 2 weeks we have been doing more MIPS Assembly programming, in week 5 we learnt about Subroutines and the use of pseudo instructions to make the program more readable by us! Once again we have been working through some chapters at this website: http://chortle.ccsu.edu/AssemblyTutorial/index.html. During week 5 we looked through Chapters 19 through 26 and work through the programs at the end of chapter 19 and 26. The exercises at the end of Chapter 19 were simple, and i worked through them quite quick. The programs at the end of Chapter 26 were a lot more challenging, it was requesting us to use all the things learnt from previous exercises and lots of things we had just learnt. After pondering over them for a long while they finaly got completed. After working through the harder chapters i feel that i am much better at programming in general now that i know what goes off behind the scenes.

Week 6

This week, Week 6, we had to look into The Call Stack and how to save addresses and information into the call stack. Although we were ment to have learnt about the Stack last year, we had been taught it poorly. Being taught about the stack now we know a lot more about programming and while learning MIPS made the learning a lot easier, and now i know how the stack is used in programming a lot more!

Once we learnt about The Call Stack and how to call it using MIPS we were told to do Chapters 27 and 28 on the usual page. So far i have completed all of the tutorials on Chapter 27 and am currently looking through Chapter 28. Chapter 27 was quite simple, it just took quite a while to get your head around how to write the program but once you had it in your head it was easy to write and very similar to the programs in the previous chapter.

Later in the week i will post another blog explaining in more detail what we did and what happens in chapter 28.

MIPS – Week 4 Console Dev

Once again in Console Development we have been using MIPS assembly language to program a few things, again we have been asked to use This Website to research about MIPS, read the exercises and do the programs on the end of the Chapters.

This week I have been doing chapters 17 and 18, i found them less of a chalange then i thought and am happy programming in MIPS. Chapters 17 and 18 consist of Jumps (loops), Branches (Ifs) and Set Instructions ( > and <).

The last program of Chapter 18 which we had to make was a program that worked out the Median of 3 numbers stored in memory, i coped well with this exercise and it works, wether or not its the best way to do it is yet to be found out. Here it is:

MIPS Program to work out the Median

MIPS Program to work out the Median

 

Thats all for this week, hope all is correct.

MIPS Programming

Here i am again talking about MIPS, as MIPS is used in the Sony PSP which we will be programming for later in the year, our task this week was to research more about MIPS and do the exercises on Chapter 13+15 from this page – http://chortle.ccsu.edu/AssemblyTutorial/index.html#part4. I read through all chapters from 1 to 15 to get my bearings, chapters 1 through 9 were quite simple where as 9 onwards i needed to read thoroughly. After finding out lots more about MIPS, reading up on loading and storing, arithmetic, the .data area and how all of this is arranged in memory I have become very fond of MIPS. Its a challenge writing so close to the machine code, a challenge i enjoyed very much!

So after reading up on these chapters I finally did the programs / exercises at the end of Chapter 13 and 15. The first set i found reletivly simple, it is just basic arithmetic. The second chapter exercises were much more challenging and drew on everything you should have learnt from that point. Here is my full program that i wrote for the last question on Chapter 15, i believe it is correct.

Chapter 15 Exercise 4 - MIPS Programming
Chapter 15 Exercise 4 – MIPS Programming

This program was writen by me using Horner’s Method, which is a method used to reduce code and speed up the workings out when finding the results to a Polynomial. I have done it using 3 registers as that is what i thought the exercise wanted, it could also be done by loading each value into different registers and working it out that way, i think that either way is acceptable but this way leaves more registers open for other uses if needed.

MIPS Architecture + Hello World

As you know if you have read any of my past blogs on Console Development we have recently been taking apart the Playstation (PSX) having a look inside it and researching into what parts are which and what they all do. After giving a presentation I was told to look into the ISA (Instruction Set Architecture) for my console, here are my findings.

The playstation uses a MIPS Architecture, which is also used in many other consoles such as the N64, PSP and PS2. MIPS Instruction set was quite confusing to look at at first, as it is a totaly new concept to code in such a low level language. I soon picked up a few things after looking at examples using MIPS instrcution set and on Wikipiedia about the instructions used. After looking at the language and examples for a while I was happy with what i had learned. I think i could confidently write a basic program that for example would write out a string to the screen.

You would start off with .data < this is where you declare all your variables used in the program -> Then name the variable you want after this – string: followed by the type of variable, to use the print_string system call later you need to use the variable type .asciiz and follow this by the string, “Hello World”.

After the .data section its the .text section, this section is the main portion of your code. start this off with .text followed by main: on the line down, the first thing to do is copy the string into the register that is read later using – la $a0, string – then set the $v0 register to 4 by typing – li $v0, 4 – followed by a syscall, which reads the value in $v0 and as this number is 4 it prints_string in $a0, which is the string “Hello World” specified above.

To exit the program use the service code 10 in $v0 and call syscall again.

Here is how the code looks running in MARS 3.5 MIPS Editor:

 Hello World in MIPS

Hello World in MIPS

(Read More)

Console Development Week 1 and 2

During our first 2 lessons of Console Development we have been looking at the architecture of old consoles, the group i am in were assigned the Playstation 1. First things first! Lets dismantle it!

The insides of a PS1

The insides of a PS1

 So we were inside of it, we had a search on google to find which parts were which and we had to present this to the rest of the people in our group. We presented to the group, telling them about what the CPU, GPU, Audio and Memory was, how it worked, how large or fast it was and where it was on the motherboard. We also compared it to the closest rival, which was the Nintendo 64.

After all of these presentations in week 2 we had a lecture about Alan Turing and the Turing Machine and how it gave rise to all modern computers, the Turing machine had one problem with it though, that the instructions were inbuilt into the board. John Von Neumann came up with a solution to this problem, with what we now call software. The basics to all computers is a Fetch and Execute cycle, in which the computer fetches and executes instructions.