Building My Own Programming Language

Over the past few weeks, I've been working on a fascinating project: building a new programming language designed to make learning to code easier. My initial goal was a pseudocode interpreter with an integrated flowchart visualizer, a feature I may still add later. But as I dug into the research, the idea grew into a full programming language inspired by Karel, a robot simulator I used as a kid that was designed to teach Pascal-style programming.

With that in mind, I started the project with the goal of building a platform for learning logic, basic algorithms, loops, and more. To achieve that, I decided the language needed the following characteristics:

Simplicity: Since it's designed for beginners, it only includes simple data types like numbers, strings, and arrays — and possibly dictionaries.
Verbosity: To make code easier to understand and read, I chose to favor words over symbols.
In Spanish: Learning is easier in your native language, and this project is aimed at Spanish speakers, so Spanish was the natural choice. I'm even planning to support ñ and accented characters in the code itself.
Easy access: Getting started should require no setup, so I designed the language to run in the browser, with nothing to install and no internet connection needed once the page has loaded.

Once those goals were clear, it was time to start. But how? And in what language? To answer those questions, I researched how programming languages actually work — what parts they're made of, how code is executed — and discovered that an interpreter is typically composed of:

Tokens: The symbols or words that carry specific meaning in the code, such as the operators +, -, /, >, < or keywords like if, else, for, while. Each token has a type and a literal: the type is the category it belongs to, and the literal is the exact text taken from the source. For example, every = in the source code becomes the token (ASSIGN, '='), which the later stages interpret as a value assignment.
Lexer: The lexical analyzer reads the source code character by character and produces tokens. For example, the source 5; produces two tokens: (INT, '5') and (SEMICOLON, ';').
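AST: The Abstract Syntax Tree is a data structure that represents the grammatical structure of a program hierarchically. It is "abstract" because it leaves out details such as parentheses and semicolons that only matter while parsing, and it serves as the intermediate representation that the parser produces and the evaluator consumes.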
Parser: The parser reads the sequence of tokens, checks it against the language's grammar, and builds the AST for further processing.
Evaluator: The evaluator is the part of the interpreter that actually executes the program. It walks the AST and runs each expression and statement according to the language's rules and semantics.
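To make these five pieces more concrete, here is a minimal TypeScript sketch of how they might fit together for a toy subset: integer literals, variables, the operators +, -, * and /, parentheses, = and ;. It is an illustration under my own assumptions, not the language's actual design: the token names (INT, IDENT, ASSIGN, SEMICOLON, and so on), the grammar, and the functions lex, parse, evalExpr and run are all hypothetical, and keywords are left out entirely.

```typescript
// Hypothetical sketch: token names, grammar, and function names are illustrative only.
type TokenType = "INT" | "IDENT" | "ASSIGN" | "PLUS" | "MINUS" | "ASTERISK"
  | "SLASH" | "LPAREN" | "RPAREN" | "SEMICOLON" | "EOF";
interface Token { type: TokenType; literal: string; } // category + exact source text
const SINGLE_CHAR: Record<string, TokenType> = { "=": "ASSIGN", "+": "PLUS", "-": "MINUS",
  "*": "ASTERISK", "/": "SLASH", "(": "LPAREN", ")": "RPAREN", ";": "SEMICOLON" };
const DIGIT = /[0-9]/;
const LETTER = /[A-Za-z_ÁÉÍÓÚÜÑáéíóúüñ]/; // identifiers may use ñ and accented vowels
// Lexer: reads the source character by character and emits tokens.
function lex(source: string): Token[] {
  const tokens: Token[] = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i], start = i;
    if (/\s/.test(ch)) {
      i++; // whitespace separates tokens but is not a token itself
    } else if (DIGIT.test(ch)) {
      while (i < source.length && DIGIT.test(source[i])) i++;
      tokens.push({ type: "INT", literal: source.slice(start, i) });
    } else if (LETTER.test(ch)) {
      while (i < source.length && (LETTER.test(source[i]) || DIGIT.test(source[i]))) i++;
      tokens.push({ type: "IDENT", literal: source.slice(start, i) });
    } else if (SINGLE_CHAR[ch]) {
      tokens.push({ type: SINGLE_CHAR[ch], literal: ch });
      i++;
    } else {
      throw new Error(`Unexpected character '${ch}' at position ${i}`);
    }
  }
  tokens.push({ type: "EOF", literal: "" });
  return tokens;
}
// AST: parentheses and semicolons are gone; only the structure remains.
type Expr = { kind: "Int"; value: number } | { kind: "Ident"; name: string }
  | { kind: "Binary"; op: string; left: Expr; right: Expr };
type Stmt = { kind: "Assign"; name: string; value: Expr } | { kind: "ExprStmt"; expr: Expr };
// Parser: recursive descent; chain() builds left-associative operator chains,
// and * and / bind tighter than + and - because parseTerm sits below parseExpr.
function parse(tokens: Token[]): Stmt[] {
  let pos = 0;
  const peek = (): Token => tokens[pos], next = (): Token => tokens[pos++];
  const expect = (type: TokenType): void => {
    if (next().type !== type) throw new Error(`Expected ${type}, got '${tokens[pos - 1].literal}'`);
  };
  function parseFactor(): Expr {
    const tok = next();
    if (tok.type === "INT") return { kind: "Int", value: Number(tok.literal) };
    if (tok.type === "IDENT") return { kind: "Ident", name: tok.literal };
    if (tok.type !== "LPAREN") throw new Error(`Unexpected token '${tok.literal}'`);
    const inner = parseExpr(); // parentheses only group; they leave no AST node
    expect("RPAREN");
    return inner;
  }
  function chain(ops: TokenType[], operand: () => Expr): Expr {
    let left = operand();
    while (ops.indexOf(peek().type) !== -1) {
      const op = next().literal;
      left = { kind: "Binary", op, left, right: operand() };
    }
    return left;
  }
  function parseTerm(): Expr { return chain(["ASTERISK", "SLASH"], parseFactor); }
  function parseExpr(): Expr { return chain(["PLUS", "MINUS"], parseTerm); }
  const program: Stmt[] = [];
  while (peek().type !== "EOF") {
    if (peek().type === "IDENT" && tokens[pos + 1].type === "ASSIGN") {
      const name = next().literal;
      next(); // consume the '=' token
      program.push({ kind: "Assign", name, value: parseExpr() });
    } else {
      program.push({ kind: "ExprStmt", expr: parseExpr() });
    }
    expect("SEMICOLON");
  }
  return program;
}
// Evaluator: walks the AST and computes values, keeping variables in `env`.
function evalExpr(expr: Expr, env: Record<string, number>): number {
  if (expr.kind === "Int") return expr.value;
  if (expr.kind === "Ident") {
    if (!(expr.name in env)) throw new Error(`Undefined variable '${expr.name}'`);
    return env[expr.name];
  }
  const l = evalExpr(expr.left, env), r = evalExpr(expr.right, env);
  if (expr.op === "+") return l + r;
  if (expr.op === "-") return l - r;
  if (expr.op === "*") return l * r;
  return l / r;
}
// Runs a whole program and returns the value of its last statement.
function run(source: string): number | undefined {
  const env: Record<string, number> = Object.create(null); // no inherited keys
  let last: number | undefined = undefined;
  for (const stmt of parse(lex(source))) {
    if (stmt.kind === "Assign") last = env[stmt.name] = evalExpr(stmt.value, env);
    else last = evalExpr(stmt.expr, env);
  }
  return last;
}
```

With these definitions, lex("5;") yields (INT, '5') and (SEMICOLON, ';') followed by an EOF marker, run("x = 5; y = x * 2 + 1; (y - 1) / 2;") returns 5, and run("año = 2024; año + 1;") returns 2025. Giving each precedence level its own function is a common recursive-descent pattern; Pratt parsing is a popular alternative for the same job.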

With that knowledge in hand, it was time to get started. I chose TypeScript, primarily because I wanted the interpreter to run in the browser. There were alternatives, such as running it on a server, but that would require an internet connection. Compiling C or Rust to WebAssembly was another option, but I have far more experience with JavaScript and TypeScript than with either of those languages.

So here we are, at the beginning of this exciting journey. In the following posts, I'll share more details about the progress and challenges that come up as I build this language.

Conclusion

Building a programming language can seem like a daunting task, but with the right determination and a solid understanding of the core concepts, it's a feasible and rewarding project. My goal with this language is to make programming education more accessible — especially for people whose native language is Spanish. I hope that by sharing my experience and the technical details behind this project, I can inspire others to embark on their own language-building adventures. See you in the next post!