Matt Zand is a programmer, businessman, IT Consultant, and writer. He is the founder and owner of WEG2G Group. He is also the founder of DC Web Makers. His hobbies are hiking, biking, outdoor activities, traveling, and mountain climbing.
Machine Language Overview
It’s amazing. You can type some words and numbers on a screen and, poof, you have just created a software program. That is essentially how all modern high-level programming languages work, because a great deal of abstraction is involved. But as any programmer knows, the computer does not actually read the words you write on screen; it can only execute machine language, or binary code. This means that high-level languages must be converted into lower-level languages, such as assembly, and finally into machine code. But how do words on a screen that make sense to us become thousands of ones and zeros?
Compiler and Assembly Language
As you can see from the previous image, the compiler is the program that takes the lines of high-level code and converts them into object code, which for our purposes is essentially assembly language. In practice, the compiler passes over the high-level statements several times before the object code is complete. It cannot simply look at each line once and convert it, because some high-level programs are too complex for that. For example, a variable may change value during the run of a program, or a line further down may affect a line further up, which means the compiler has to look at the earlier line again. So the object code that the compiler finally builds goes through many iterations. But how does a compiler actually “choose” how a line of code gets converted? How does it know to convert “int x = 3;” to “ldx r22, 3”?
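Before getting to that question, the "look at the code again" idea can be made concrete with a small sketch. It is only an illustration under stated assumptions: the toy goto/label syntax, the `jump` output, and the function name `two_pass` are invented here and do not come from any real compiler. The first pass records where each label lives; the second pass can then resolve a `goto` that appears before its label.

```python
def two_pass(lines):
    """Resolve forward references to labels in a toy program (hypothetical syntax)."""
    # Pass 1: record the instruction index each label points to.
    labels = {}
    instructions = []
    for line in lines:
        line = line.strip()
        if line.endswith(":"):
            labels[line[:-1]] = len(instructions)
        elif line:
            instructions.append(line)

    # Pass 2: replace label names in "goto" statements with indices.
    resolved = []
    for instr in instructions:
        op, _, arg = instr.partition(" ")
        if op == "goto":
            if arg not in labels:
                raise ValueError(f"undefined label: {arg}")
            resolved.append(f"jump {labels[arg]}")
        else:
            resolved.append(instr)
    return resolved


# "end" is used on the first line but defined two lines later.
program = ["goto end", "print 1", "end:", "print 2"]
print(two_pass(program))  # ['jump 2', 'print 1', 'print 2']
```

A single pass would reach `goto end` before it knew where `end` was, which is the kind of situation that forces a second look at earlier code.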
Well, essentially, the compiler breaks the code you typed into many small tokens. Then it looks for keywords that always lead to the same object code. For example, if a token contains “int,” the compiler may have a map that directly associates the word “int” with the command “ldx [register value], [integer value].” After doing this several times, it may organize the tokens into a parse tree for efficiency. Once the parse tree exists, it is possible to see all of the object code commands that can be generated from the original higher-level code. However, the compiler is not finished, because the code may still contain inefficiencies. Using the parse tree, the compiler can eliminate unused variables or unnecessary loops that have no effect on the lower-level code, which leads to a smaller parse tree and a more efficient program. At this point the object code has been created and optimized, and the linker can step in to combine all of the object code files and then convert the combined executable into machine language. It is worth noting that the best way to learn how compilers and assembly work is through projects and practice. For instance, DC Web Makers Company offers only project-based training, in which students learn concepts through real-world projects.
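The steps described above (tokenizing, mapping a keyword to an instruction, and removing unused variables) can be sketched in a few lines. This is a toy under loud assumptions: the `print` statement, the `out` mnemonic, the register numbering starting at r0, and every function name are hypothetical, and “ldx” is used only because the article uses it. A real compiler is far more involved.

```python
import re


def tokenize(source):
    """Break source text into identifier, number, and symbol tokens."""
    return re.findall(r"[A-Za-z_]\w*|\d+|\S", source)


def parse(tokens):
    """Group tokens into ('decl', name, value) or ('print', name) statements."""
    statements = []
    pos = 0
    while pos < len(tokens):
        chunk = tokens[pos:pos + 5]
        if len(chunk) == 5 and chunk[0] == "int" and chunk[2] == "=" and chunk[4] == ";":
            statements.append(("decl", chunk[1], int(chunk[3])))
            pos += 5
        elif len(chunk) >= 3 and chunk[0] == "print" and chunk[2] == ";":
            statements.append(("print", chunk[1]))
            pos += 3
        else:
            raise SyntaxError(f"unexpected token: {tokens[pos]!r}")
    return statements


def drop_unused(statements):
    """Remove declarations of variables that are never printed."""
    used = {stmt[1] for stmt in statements if stmt[0] == "print"}
    return [s for s in statements if s[0] != "decl" or s[1] in used]


def emit(statements):
    """Map each statement to a made-up assembly-like line."""
    registers = {}  # variable name -> hypothetical register name
    lines = []
    for stmt in statements:
        if stmt[0] == "decl":
            _, name, value = stmt
            registers[name] = f"r{len(registers)}"
            lines.append(f"ldx {registers[name]}, {value}")
        else:
            lines.append(f"out {registers[stmt[1]]}")
    return lines


source = "int x = 3; int unused = 7; print x;"
print(emit(drop_unused(parse(tokenize(source)))))  # ['ldx r0, 3', 'out r0']
```

The declaration of `unused` never reaches the output because nothing reads it, which mirrors the optimization step in miniature.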
How Data Are Executed in a Machine
Combining object code files into a single executable is not a simple process. The linker is essentially the error catcher. It looks through all of the libraries that the object code files reference to make sure every reference can be resolved. If a particular symbol in the object code does not match anything in those libraries, the linker reports an error, which can stop the program from being built. If none of the object code files produce errors, the linker looks at the memory addresses of the symbols in the referenced libraries. It then begins placing the object code at particular addresses based on where its symbols are located in memory. If identical constants are referenced in one object file or in several, the linker does not place copies in multiple memory locations; it stores the constant once, and that single address is the only one referenced. Once the linker has processed every object file and stored its contents efficiently at memory addresses, an executable is produced. Using a token table similar to the one a compiler uses, the linker can convert the object code to machine instructions, effectively producing a program that a computer can actually understand.
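The linker's three jobs described above (assigning addresses, catching unresolved symbols, and storing identical constants once) can be sketched as follows. Everything here is an assumption made for illustration: the dictionary layout of an "object file," the one-slot size of a constant, and the function name `link` are invented and do not reflect any real object file format.

```python
def link(object_files):
    """Combine toy object files into a symbol table and a shared constant pool."""
    symbols = {}      # symbol name -> assigned address
    constants = {}    # constant value -> assigned address
    next_address = 0

    # Give every defined symbol an address, one after another.
    for obj in object_files:
        for name, size in obj["defines"].items():
            if name in symbols:
                raise ValueError(f"duplicate symbol: {name}")
            symbols[name] = next_address
            next_address += size

    # Identical constants share a single slot (assumed to be one unit wide).
    for obj in object_files:
        for value in obj["constants"]:
            if value not in constants:
                constants[value] = next_address
                next_address += 1

    # Every referenced symbol must be defined somewhere.
    for obj in object_files:
        for name in obj["references"]:
            if name not in symbols:
                raise LookupError(f"undefined symbol: {name}")

    return symbols, constants


# Hypothetical object files: main calls square, and both use the constant 10.
main_o = {"defines": {"main": 4}, "references": ["square"], "constants": [3, 10]}
math_o = {"defines": {"square": 2}, "references": [], "constants": [10]}

symbols, constants = link([main_o, math_o])
print(symbols)    # {'main': 0, 'square': 4}
print(constants)  # {3: 6, 10: 7}
```

If `math_o` were left out, the reference to `square` would have no definition and `link` would raise an error instead of producing a result.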
And voilà! Now the program is an executable written in machine code! Ironically, it takes some abstraction to explain how a high-level language becomes machine code, because the technical details are complicated. There are serious university courses dedicated entirely to how language conversion and low-level languages work, so this was, of course, only a brief overview of the subject.
Summary
Software engineers and mobile app designers usually learn and work with both lower-level languages such as C and high-level programming languages such as Java. New learners and students should therefore study both high-level and low-level programming. There are many online resources for learning software engineering. For teenagers and high school students, High School Technology Services offers a variety of hands-on training. Coding Bootcamps institute offers many basic to advanced programming classes for adults and professionals, focusing on projects and algorithm design. For those who wish to learn more about coding and technology careers, I suggest reading the IT career roadmap article.