
Kasji's Tiny PHP Implementation.... for MUDs!
A Better MudProg Solution ;)

Post is unread #1 Dec 23, 2007, 6:42 pm

Kasji
Apprentice
GroupMembers
Posts62
JoinedDec 23, 2007

Hello hello, everyone. Some of you may remember me. Most of you, probably not. :P

Anyway, I've been needing to get around to this for months, but I want this to be available to everyone, especially in this community.

The next few paragraphs offer a background, and a slightly less steep learning curve for those who aren't familiar with how a scripting language works. If I'm wrong on anything, feel free to let me know so I can correct this post.

What is Tiny PHP?
Tiny PHP is a cut-down version of PHP. Tiny PHP does not exist yet, but the rules for this language have been thought out by Jonathan T Cotton (http://jtcotton.info) and you can find a copy of his Tiny PHP paper at http://www.jtcotton.info/papers/TPS.pdf . Note that his version includes SQL support. I do not plan on implementing that at the moment, as most MUDs are still using flat files AFAIK.

In order to create a scripting language of our own (or my own, if it turns out no one is interested :P ) we've got two paths to choose from: hand-craft a parser, or use a parser generator to make one. What is a parser generator, you ask? Well, thus far in my experience, I've come across three kinds of compilers. You've got your standard compiler that generates native code, then you've got a cross compiler. A cross compiler will take any old code, but instead of generating native code (at least, native for the system it's compiling on), it generates code for another system architecture, or byte code for a virtual machine. Then, you have a compiler compiler. A compiler that creates a compiler. Or in our case, creates an interpreter.

This is a bit long-winded, I know. There's nothing imperative here though, so feel free to skip down.
I've gone over the options semi-thoroughly, and decided that the route I'd like to take is to use a parser generator. The next step is to find a parser generator that meets the needs of the project. If I were a software engineer, then you'd probably be expecting me to say something along the lines of Portable, Simple, and Fast (in terms of processing speed). Those are good project goals, but unfortunately I wasn't in a position to use ideals such as those. Selecting a parser generator was difficult for one reason: documentation (especially the lack thereof).

I went through a few options: YACC, Bison, Lemon, and then finally ANTLR. Bison is supposedly improved/enhanced over YACC, and is also compatible with YACC, but when I tried building Bison on *nix, I couldn't even get it to configure/compile right out of the box, so I moved on to Lemon. Lemon amazed me right off the bat. Firstly, the entire package was in one relatively small .c file, and secondly it compiled right off. The problem I encountered with Lemon was a lack of documentation. I am no expert compiler writer, I will admit. In all my programming, I am heavily dependent on reference material.

So then I came to ANTLR. It claimed to be far better than other parser generators in many ways, but what really caught my eye was ANTLRWorks. It's an IDE for ANTLR, with error reporting, debugging, and the whole works. It still lacked a lot of documentation, at least in regards to what I wanted to build. But there was enough documentation to get me going, and after a couple of hours of scouring I had tasted success. See http://www.antlr.org for info on ANTLR.


  • Note: One of the many nice things about ANTLR is that it can generate code in multiple languages. I figure Java and C are the big ones, and both of them are supported, and thus portability is mostly solved right off the bat.


(continued next post)
       
Post is unread #2 Dec 23, 2007, 7:07 pm   Last edited Dec 23, 2007, 7:19 pm by Kasji

Kasji
Apprentice
GroupMembers
Posts62
JoinedDec 23, 2007

Down to Business
From here on in, it's on to the work. A parser generator takes a set of rules that we define and uses them to generate code that will 1) convert a stream of characters into a stream of tokens, 2) build an Abstract Syntax Tree, and 3) decide what to do with the tree.
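To make step 1 a little more concrete, here is a rough hand-written C++ sketch of the kind of work a generated lexer does. The names Token, TokenKind, and tokenize are made up for illustration and are not taken from tphp.g or from ANTLR's output; it only recognizes $variables, integers, and one-character symbols.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical token kinds; the real rule file defines its own set.
enum class TokenKind { Variable, Number, Symbol };

struct Token {
    TokenKind kind;
    std::string text;
};

// Step 1 only: turn a stream of characters into a stream of tokens.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }   // whitespace separates tokens
        std::size_t start = i;
        if (c == '$') {                           // PHP-style variable: $name
            ++i;
            while (i < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[i])) || src[i] == '_'))
                ++i;
            out.push_back({TokenKind::Variable, src.substr(start, i - start)});
        } else if (std::isdigit(c)) {             // integer literal
            while (i < src.size() && std::isdigit(static_cast<unsigned char>(src[i])))
                ++i;
            out.push_back({TokenKind::Number, src.substr(start, i - start)});
        } else {                                  // anything else: one-character symbol
            ++i;
            out.push_back({TokenKind::Symbol, src.substr(start, 1)});
        }
    }
    return out;
}
```

Feeding it "$a = 3;" yields four tokens: the variable $a, the symbol =, the number 3, and the symbol ;. Multi-character operators such as == would come out as two symbols here, which is one of the details a real rule file has to take care of.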

Get the project files here: http://www.shallonline.com/tphp

  • tphp.tar includes the generated source files and a backup rule file (tphp__.g)

  • tphp.g is our rule file. This file defines what the language is, how it works, etc.

  • sample.script is a sample PHP script.

  • sample_ast.bmp is a picture of the Abstract Syntax Tree that the rule file created from the sample.script. Shows tokens, and the structure of the language.


Also, I apologize for the gigantic size of the BMP. I would be extremely grateful if anyone would convert it to a JPG or PNG. My main (Windows) computer is broken, and I'm on Linux. So at the moment I'm not really able to do the conversion myself, if only out of ignorance.

This is a work in progress. Far from complete.
If you're interested in helping me create this new scripting language for MUDs, the best place to start is the file tphp.g. I've added various comments to it, to aid in your own learning process. Questions are certainly welcome.

Things that need done first:

  • Order of operations needs to be added. This may be tricky, but order of operations can be constructed through a set of well structured rules.

  • Rule optimization. I am aware that the rules I have put together are not exactly well structured. You will notice though, that our language only has 25-ish rules. It's pretty small right now. It will definitely grow as it matures. So, getting things right the first time is a high priority.
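As a rough illustration of how order of operations can fall out of rule structure alone, here is a small hand-written recursive-descent sketch in C++. The three rule names in the comment and the ExprParser class are assumptions made for this example and are not the contents of tphp.g. It also computes values directly to stay short; a real version would build tree nodes instead.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical hand-written equivalent of three layered rules:
//   addexpr  : multexpr (('+' | '-') multexpr)* ;
//   multexpr : atom (('*' | '/') atom)* ;
//   atom     : INT | '(' addexpr ')' ;
// Because addexpr is built out of multexpr, '*' and '/' bind tighter.
class ExprParser {
public:
    explicit ExprParser(const std::string& s) : src_(s), pos_(0) {}

    int parse() {
        int v = addexpr();
        skipSpaces();
        if (pos_ != src_.size()) throw std::runtime_error("trailing input");
        return v;
    }

private:
    std::string src_;
    std::size_t pos_;

    void skipSpaces() {
        while (pos_ < src_.size() &&
               std::isspace(static_cast<unsigned char>(src_[pos_]))) ++pos_;
    }
    bool accept(char c) {
        skipSpaces();
        if (pos_ < src_.size() && src_[pos_] == c) { ++pos_; return true; }
        return false;
    }
    int addexpr() {
        int v = multexpr();
        for (;;) {
            if (accept('+')) v += multexpr();
            else if (accept('-')) v -= multexpr();
            else return v;
        }
    }
    int multexpr() {
        int v = atom();
        for (;;) {
            if (accept('*')) v *= atom();
            else if (accept('/')) {
                int d = atom();
                if (d == 0) throw std::runtime_error("division by zero");
                v /= d;
            } else return v;
        }
    }
    int atom() {
        if (accept('(')) {
            int v = addexpr();
            if (!accept(')')) throw std::runtime_error("expected ')'");
            return v;
        }
        skipSpaces();
        if (pos_ >= src_.size() || !std::isdigit(static_cast<unsigned char>(src_[pos_])))
            throw std::runtime_error("expected number");
        int v = 0;
        while (pos_ < src_.size() && std::isdigit(static_cast<unsigned char>(src_[pos_])))
            v = v * 10 + (src_[pos_++] - '0');
        return v;
    }
};
```

With this layering, ExprParser("2 + 3 * 4").parse() gives 14 and ExprParser("(2 + 3) * 4").parse() gives 20, without any explicit precedence table.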



Things that need done next:

  • Well, we've gotta start adding functionality. Right now our little parser and lexer don't do anything. They just break a script down into parts and place it in a tree.

  • I'd like to see working variables. Probably through an STL map, such as this pseudocode: std::map<VARNAME, OurDataStruct> global_scope_vars;



Where, OurDataStruct might look like:
#include <string>

class OurDataStruct {
    enum DataType { STRING, CHAR, INT, FLOAT };
    // Caveat: a union can only hold a std::string in C++11 or later, and
    // the string must then be constructed and destroyed by hand.
    union MultiData {
      std::string str;
      int i;
      double f;
      char c;
    };
  private:
    DataType type;  // What kind of data is in the union right now?
    MultiData data; // Our data
  public:
    // Some functions for data access/construction/destruction.
};

I know it's C++, assuming I am giving valid code, but it's just an example.

Anyhow, I won't go any further until I see if anyone is interested in this or not.
       
Post is unread #3 Dec 25, 2007, 6:04 pm

David Haley
Sorcerer
GroupMembers
Posts903
JoinedJan 29, 2007

Interesting idea, but unless you are doing it for the learning experience (which is a perfectly fine reason), I would recommend not rolling your own scripting language. Writing a parser is not even half of the battle: you will need to implement some kind of interpreter (either bytecode, if you compile to bytecode, or some kind of syntax tree walker). In my experience, writing the parser is actually the really easy part. In particular, this statement isn't quite right:
A compiler that creates a compiler. Or in our case, creates an interpreter.

An interpreter really isn't a compiler. A compiler is something that takes one language and turns it into another language; the interpreter (or the processor+OS, if you compile to machine code) is what runs it. Bison etc. will not help you write an interpreter: you will have to implement the running environment yourself.

Here are some reasons to use somebody else's scripting language:
- the parser/compiler is likely to be more stable because more people have used it
- the language is maintained and worked on by other people (i.e., you have more time to do other things)
- there will be a community around the language whence you can get libraries, support, etc.
- a good scripting language has more or less plug and play embedding, and all you have to do is get data into its form and it does everything else.

Like I said, if you want to do it just to learn how to do it, I'd say great and more power to you. (I did the same a few years back.) But that said, I am also tossing the scripting engine I wrote because it's just not worth my time to improve it when I can embed a proper language that already has things like functions and complex variable scoping. In any case, it's very fun to see somebody interested in this stuff. :smile:
       
Post is unread #4 Dec 25, 2007, 6:56 pm

Kasji
Apprentice
GroupMembers
Posts62
JoinedDec 23, 2007

Right on. Thanks for the comments. I have reviewed a multitude of scripting languages, and I find PHP the most appeasing. I did look into implementing PHP. The Zend Engine, which PHP runs on, wasn't really designed to be embedded in anything except Apache, or another HTTP server. They did come out with the CLI SAPI, but running scripts from a command line isn't ideal. Your points are all valid, but I will still move forward with this. The learning experience is definitely a plus, but not really my main motive.

Just a small update on my progress....

I've coded a C++ class for the PHP style variables. Unfortunately, a union isn't allowed to have a member with a constructor, so I am using a struct instead, because of std::string. The class currently handles int, double, char, and string. I have several operator over-loaders coded as well. I did encounter a little issue with it. Apparently you can't call member functions on a 'const' class, even if the function doesn't modify the class, so my effort to keep data as 'protected' was thwarted.
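For anyone following along, a stripped-down sketch of what such a class might look like is below. This is not my actual class: the name Value, its members, and the rule that strings count as 0 in arithmetic are all simplifications made up for the example. It uses plain members instead of a union (so std::string is no problem) and shows one overloaded operator.

```cpp
#include <string>

// Hypothetical PHP-style variable: one object that can hold any of four types.
class Value {
public:
    enum Type { INT, FLOAT, CHAR, STRING };

    Value(int v)                : type_(INT),    i_(v), f_(0.0), c_('\0') {}
    Value(double v)             : type_(FLOAT),  i_(0), f_(v),   c_('\0') {}
    Value(char v)               : type_(CHAR),   i_(0), f_(0.0), c_(v)    {}
    Value(const std::string& v) : type_(STRING), i_(0), f_(0.0), c_('\0'), s_(v) {}

    // Read-only accessors; the data itself stays private.
    Type type() const { return type_; }
    int asInt() const { return i_; }
    const std::string& asString() const { return s_; }

    // Numeric view of the value. Simplification: strings count as 0 here.
    double toNumber() const {
        switch (type_) {
            case INT:   return i_;
            case FLOAT: return f_;
            case CHAR:  return c_;
            default:    return 0.0;
        }
    }

private:
    Type type_;
    int i_;
    double f_;
    char c_;
    std::string s_;
};

// int + int stays an int; any other mix is promoted to a float.
Value operator+(const Value& a, const Value& b) {
    if (a.type() == Value::INT && b.type() == Value::INT)
        return Value(a.asInt() + b.asInt());
    return Value(a.toNumber() + b.toNumber());
}
```

So Value(2) + Value(3) is an INT holding 5, while Value(2) + Value(0.5) is a FLOAT holding 2.5. Storing every field side by side wastes a few bytes per variable compared to a union, which seems like a fair trade for simplicity at this stage.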

Also, I'm about half-way done implementing order of operations in the rules, and once the above class is done I'll add the code needed to make expressions actually work.
       
Post is unread #5 Dec 26, 2007, 1:33 pm

David Haley
Sorcerer
GroupMembers
Posts903
JoinedJan 29, 2007

If you're implementing your own version of PHP, I think it's a little bit of a misnomer to say you're using PHP. Is PHP itself really that hard to embed? Running things from the command line is definitely bad, perhaps even a terrible idea... :wink: And out of curiosity, did you look at Lua? It was designed to be embedded and it is ridiculously easy to do so. Perl has (IMHO) terrible embedding, and supposedly Python is pretty good although I've not looked at that myself. I'm also not sure what you mean by an "appeasing" scripting language so I'm not sure if Lua would be good for you. :smile: My main issue is that I really don't like PHP all that much at all, especially if you're setting out not only to embed it but to reimplement the entire engine from ground up...

Kasji said:

Apparently you can't call member functions on a 'const' class, even if the function doesn't modify the class, so my effort to keep data as 'protected' was thwarted.

You can, you just need to declare the member functions const as well. E.g.
class Foo {
public:
  int getValue() const;
private:
  int f;
};
int Foo::getValue() const
{
  return f;
}


Kasji said:

I'll add the code needed to make expressions actually work.

What about function calls? e.g. return foo() + bar()
       
Post is unread #6 Dec 26, 2007, 4:33 pm

Kasji
Apprentice
GroupMembers
Posts62
JoinedDec 23, 2007

I suppose technically it's not PHP, but it's PHP syntax and behaviour, as closely as I can get it.

I have looked at Lua, yes. Yes it is very easy. Regrettably, I don't like the syntax. That's not a good reason to not use it, but that about sums it up. :P In all honesty, IF I did select a language already out there, it would be Python.

PHP may not be as hard as I think it is to embed, but the code is a bit obfuscated for me. Still, though, PHP wasn't designed for real-time execution (at least, I'm guessing it wasn't, given the conditions of its birth), and it may be too slow.

By appeasing, I mean, I find its syntax similar to C/C++, yet much easier, due to not having to mess with data types.

Also, thanks for the info on the 'const' issue. I've rarely used 'const', but it's necessary for operator overloading.

And, I'm not sure what you're asking by your last question, but I'll take a shot at it. In ANTLR, we'll do something like this example:

multiexpr returns [int value] :
    e=addexpr { $value = $e.value; }
    ('*' e=addexpr { $value *= $e.value; }
    |'/' e=addexpr { if ($e.value == 0) {
                /* Error -- Division by zero */ }
                else $value /= $e.value; } )?
    ;

// Here's the same rule, but without the functionality code
multiexpr :
    addexpr ('*' addexpr
                  | '/' addexpr)?
    ;


Obviously the above is using an int, and not the class I mentioned. The class should be able to handle all operations internally, so you could swap int with class foobar and it would work seamlessly. The $whatever isn't a PHP variable, FYI; it's a placeholder that ANTLR replaces with real code in the language you're generating the lexer/parser for. Not sure if that answered your question or not. Also note I might have the above backwards since I'm not looking at the code right now. The addexpr might be the parent of multiexpr. It depends on how the tree is built/walked.
       
Post is unread #7 Dec 26, 2007, 4:58 pm

David Haley
Sorcerer
GroupMembers
Posts903
JoinedJan 29, 2007

Well, I suppose disliking syntax is fair enough if the technical merits of the solutions being considered don't matter. :smile: (That's not meant as a swipe at PHP.)

Kasji said:

Still, though, PHP wasn't designed for real-time execution (at least, I'm guessing it wasn't, given the conditions of its birth), and it may be too slow.

Chances are good that it will still be faster than a home-grown implementation, though, if only because a lot more people have gone through it and worked on making it faster.

Also, PHP is basically as real-time as mudprogs in that you start a script, run through it, and terminate.

Kasji said:

By appeasing, I mean, I find its syntax similar to C/C++

You might want to look into JavaScript then because its syntax is also (generally) C++ish but it is also fairly easy to embed apparently (never done it myself).

Kasji said:

Not sure if that answered your question or not.

Not really, unfortunately. :smile: What I meant is how you are actually going to handle function calls that are part of an expression. The problem with your approach is that you are executing as you parse, which means that your language will not be able to support functions without awfully hackish workarounds.

I had thought you would be compiling to some kind of parse tree format that you would later run through an interpreter. I see now why you previously said that you were making an interpreter -- you were right about that, given how you are approaching the problem. But that approach severely limits what you will be able to do with your language (e.g. not being able to call functions).

Consider the following code:
function foo(a)
{
  return a*2;
}
// ... later on ...
return foo(2) * 2;

How will you handle this? Since you are executing as you parse, you will need to call the code contained in the function foo. But you can't do that without some kind of virtual machine lying around that can set up stack frames and so forth.

But here's an even more problematic example: in the following, you make a function call to something you haven't even parsed yet:

function foo(a)
{
  if (a == 0) return 0;
  else return bar(a-1);
}
function bar(a)
{
  return foo(a);
}


Never mind the details; the problem is: how are you going to handle the call to bar from within foo?

If you implement an execute-as-you-parse approach, you will be limiting your language to very, very simple semantics. Function calls are almost entirely knocked out; interesting control flow is also very difficult.

Consider even the following simple code:

a = 0;
b = -1;
c = 0;
if (a == 0)
{
  b = 0;
}
else
{
  b = 1;
  c = 1;
}


In this code, you expect c to be equal to zero after the if statement has executed. But under your approach, you will need to parse (and therefore execute) both branches of the if statement before you can then evaluate the condition on which to branch. As a result, you will have executed both branches of the if statement, and then the correct branch again. But wait -- it gets worse...

a = 0;
b = -1;
c = 0;
if (a == 0)
{
  b = 0;
}
else
{
  a = 1;
  c = 1;
}

This time, since you still need to execute the second branch before evaluating the condition, you will end up executing the wrong branch when you evaluate the conditional!

Just trying to give some general pointers here and get you thinking about what all is involved in this kind of task, which, given how I'm reading what you've written, is harder than you think... it's quite possible, though, that I've misunderstood you and you're already handling all of this, in which case you should tell me so and ignore the above. :smile:
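To illustrate the alternative, here is a minimal C++ sketch in which the whole script is first turned into a structure and only run afterwards, so the condition is evaluated before either branch is touched. Closures stand in for tree nodes here, and every name (Env, Stmt, Cond, assign, block, ifElse, equals, buildIfExample) is invented for the example; it encodes the second if/else snippet above.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

using Env = std::map<std::string, int>;

// A statement is something that can be run against the environment later.
using Stmt = std::function<void(Env&)>;
using Cond = std::function<bool(const Env&)>;

Stmt assign(const std::string& name, int value) {
    return [name, value](Env& env) { env[name] = value; };
}

Stmt block(std::vector<Stmt> body) {
    return [body](Env& env) { for (const Stmt& s : body) s(env); };
}

// The condition runs first; only the chosen branch is ever executed.
Stmt ifElse(Cond cond, Stmt thenBranch, Stmt elseBranch) {
    return [cond, thenBranch, elseBranch](Env& env) {
        if (cond(env)) thenBranch(env); else elseBranch(env);
    };
}

Cond equals(const std::string& name, int value) {
    return [name, value](const Env& env) {
        auto it = env.find(name);
        return it != env.end() && it->second == value;
    };
}

// "Parsing" builds the structure; nothing is executed at this point.
Stmt buildIfExample() {
    return block({
        assign("a", 0), assign("b", -1), assign("c", 0),
        ifElse(equals("a", 0),
               block({ assign("b", 0) }),
               block({ assign("a", 1), assign("c", 1) }))
    });
}
```

Calling buildIfExample() and then running the result on an empty Env leaves a, b, and c all at 0, because the else branch is built but never run.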
       
Post is unread #8 Dec 26, 2007, 7:22 pm   Last edited Dec 26, 2007, 7:32 pm by Kasji

Kasji
Apprentice
GroupMembers
Posts62
JoinedDec 23, 2007

I do have a virtual machine on the back burner. I didn't want to bring that into the scene yet, though. In fact, I will be using most (maybe all) of the instruction set of the PHP Zend Engine VM. Also, ANTLR generates thread safe code.

Assuming we're doing a single pass through the script, and that we are executing at the same time as building the tree, then I do see the problem with calling a function before it's in the tree. In PHP though, when you define a function:
function foo($a) {
 // Do something
}


The function is not executed. It would only be added to the tree. To actually execute it, it has to be called from the global scope.

I do not, however, see the problem with logic control.

$a = 3;
$b = 1;
if ($a == 0) {
$b = 2;
}
else {
$b = 0;
}


As it executes, $a will be 3, and $b will be 1;

Once it reaches the if statement, it will evaluate false, and the next token will be skipped, and will then see the else, and would execute that.

However I think it would be better to build the tree, then execute the tree, which I see no reason that can't be done. The rule code in my previous examples is only defining the behavior of the token, but that doesn't mean the script has to be executed as it's tokenized.

It was my initial belief that the tree would be fully built before execution began. One pass through the raw data (tokenize, add token to tree, repeat) and then the tree would be executed/walked.

Back to the VM issue. Using byte code is always an option, but I'd rather just skip compiling byte code. The main issues that the VM needs to handle are process management and memory management. Since the code is thread safe, we could just create a thread for each script executed, and let the OS handle it, though that is slightly slower, since the MUD will be stopped while the OS handles the call. Now with memory management, we've got a global scope stack, and then we've got temporary scope stacks. We'll also have certain system defined variables in the global scope stack, such as the location, and characters involved in triggering the script. That will be a more complicated matter to handle, as allowing a script access to a character for instance will require some sort of struct/class-like behavior. We know PHP has classes, so that's not a problem. But I haven't gotten to the point of adding that to the rule set.

With temporary scope stacks, only one rule is allowed to have a stack, and that is 'sub'.
The 'sub' rule looks like this:
'{' code* '}'

So it's about like C, in that regard, if you have a set of curly brackets, you can define variables, and those variables are only valid in between those curly brackets, and not before or after them.

Sorry for jumping around, but I just had a thought, on the issue of function calls inside of expressions. A function in PHP isn't required to return a value, but it may just be best to let it return zero by default (if a return call is never reached during its execution path, or the return call doesn't return a value.) As of this writing, I haven't tried adding a function with no return call to an expression to see what happens, but... Anyway, let me know what you think.
       
Post is unread #9 Dec 27, 2007, 7:37 am

David Haley
Sorcerer
GroupMembers
Posts903
JoinedJan 29, 2007

Kasji said:

Once it reaches the if statement, it will evaluate false, and the next token will be skipped, and will then see the else, and would execute that.

Yes, but a bottom-up parser would need to parse the entire if statement before running the semantic rules associated therewith. In other words, you've parsed the two branches by the time you evaluate the conditional. There's a difference between tokenizing and parsing.

Kasji said:

However I think it would be better to build the tree, then execute the tree, which I see no reason that can't be done. ... It was my initial belief that the tree would be fully built before execution began. One pass through the raw data (tokenize, add token to tree, repeat) and then the tree would be executed/walked.

This is indeed a better way of doing things and was what I was suggesting.

Kasji said:

Since the code is thread safe, we could just create a thread for each script executed,

I don't really see the point in this, since your scripts won't be executing concurrently anyhow. In fact, I'm not sure you even want your scripts to be executing concurrently -- it doesn't matter if the interpreter is thread safe if the rest of the program isn't.

Kasji said:

That will be a more complicated matter to handle, as allowing a script access to a character for instance will require some sort of struct/class-like behavior. We know PHP has classes, so that's not a problem.

I would just use associative arrays and not worry about classes. Even if the underlying MUD has characters defined as classes, you can still use global functions on the PHP side that when called do the right thing regarding calling the method on the object.

Kasji said:

So it's about like C, in that regard, if you have a set of curly brackets, you can define variables, and those variables are only valid in between those curly brackets, and not before or after them.

Again we have problems of bottom-up parsing here. By the time you execute the semantic rule associated with the first curly brace, you've parsed everything in between...

Kasji said:

A function in PHP isn't required to return a value, but it may just be best to let it return zero by default (if a return call is never reached during its execution path, or the return call doesn't return a value.) As of this writing, I haven't tried adding a function with no return call to an expression to see what happens, but... Anyway, let me know what you think.

Well, I wouldn't even start thinking about calling functions until I'm done parsing. As I've said, I think it makes things a good deal harder to execute as you parse, in fact so much harder that it can severely limit the semantics of your language. I would parse to a syntax tree, and then probably keep that syntax tree in memory and to execute the script I would give that tree to a simple virtual machine for walking i.e. execution.
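As a rough sketch of what "parse to a tree, then walk it" can look like, here is a tiny C++ tree walker. Everything in it (Node, FunctionTable, eval, the helper builders, the one-parameter restriction) is made up for illustration and has nothing to do with ANTLR's generated code. It encodes the mutually recursive foo/bar example from earlier in the thread.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

struct Node;
using NodePtr = std::shared_ptr<Node>;

// Function name -> body subtree. Every function takes one parameter here.
using FunctionTable = std::map<std::string, NodePtr>;

struct Node {
    enum Kind { Num, Param, Sub, IfZero, Call };
    Kind kind;
    int value;          // Num: the literal
    std::string name;   // Call: name of the function to call
    NodePtr a, b, c;    // children; meaning depends on kind
};

NodePtr makeNode(Node::Kind k) {
    auto n = std::make_shared<Node>();
    n->kind = k;
    n->value = 0;
    return n;
}
NodePtr num(int v) { auto n = makeNode(Node::Num); n->value = v; return n; }
NodePtr param() { return makeNode(Node::Param); }
NodePtr sub(NodePtr l, NodePtr r) {
    auto n = makeNode(Node::Sub); n->a = l; n->b = r; return n;
}
NodePtr ifZero(NodePtr cond, NodePtr thenV, NodePtr elseV) {
    auto n = makeNode(Node::IfZero); n->a = cond; n->b = thenV; n->c = elseV; return n;
}
NodePtr call(const std::string& fn, NodePtr arg) {
    auto n = makeNode(Node::Call); n->name = fn; n->a = arg; return n;
}

// Walk (execute) a finished tree. 'param' is the current function's argument.
int eval(const NodePtr& n, int param, const FunctionTable& fns) {
    switch (n->kind) {
        case Node::Num:   return n->value;
        case Node::Param: return param;
        case Node::Sub:   return eval(n->a, param, fns) - eval(n->b, param, fns);
        case Node::IfZero:
            return eval(n->a, param, fns) == 0 ? eval(n->b, param, fns)
                                               : eval(n->c, param, fns);
        case Node::Call: {
            auto it = fns.find(n->name);        // resolved at run time, not parse time
            if (it == fns.end())
                throw std::runtime_error("undefined function: " + n->name);
            int arg = eval(n->a, param, fns);
            return eval(it->second, arg, fns);  // the C++ call stack serves as the frame
        }
    }
    throw std::logic_error("unknown node kind");
}

FunctionTable buildFooBar() {
    FunctionTable fns;
    // function foo(a) { if (a == 0) return 0; else return bar(a-1); }
    fns["foo"] = ifZero(param(), num(0), call("bar", sub(param(), num(1))));
    // function bar(a) { return foo(a); }
    fns["bar"] = call("foo", param());
    return fns;
}
```

Note that foo's body mentions bar before bar exists in the table. That is harmless, because the name is only looked up when the Call node is actually evaluated, by which time the whole script has been parsed.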
       
Post is unread #10 Dec 27, 2007, 9:23 am   Last edited Dec 27, 2007, 9:27 am by Kasji

Kasji
Apprentice
GroupMembers
Posts62
JoinedDec 23, 2007

I see what you're getting at, but ANTLR generates top-down/recursive-descent parsers, or am I mistaken? This lexer/parser is LL(2) at present, and possibly LL(1) if we can get away with it. Furthermore, we have access to semantic predicates and arbitrary lookahead.
       
Post is unread #11 Dec 27, 2007, 6:54 pm

David Haley
Sorcerer
GroupMembers
Posts903
JoinedJan 29, 2007

Regardless, that was just one example problem. Even with a top-down parser you will need to parse the entire if-true block before you see the if-false block. Sure, you can set special flags to skip it, but you will be making things very hard on yourself for any kind of interesting control flow. Furthermore, you still can't handle function calls properly if you execute as you parse. Trust me, for any kind of interesting language it's really not generally done to execute as you parse. :smile:
       
Post is unread #12 Dec 28, 2007, 8:19 am

Kasji
Apprentice
GroupMembers
Posts62
JoinedDec 23, 2007

Well, like I said, that's not what I have in mind to do. I expect the tree to be fully built before it's traversed. Do you have anything else I should be on the lookout for? I'm sure there are plenty of pitfalls to beware of.

On the note of function calls, I do have an idea for a function register. Let's say we're building our tree, and encounter a function definition. We could add the function to an STL Map, and a reference to the node in the tree where the function begins. That way, later when the function is called, we look up the beginning node in the register (by hash of function name, or just function name.) Seems simple enough.
       
Post is unread #13 Dec 28, 2007, 5:55 pm

David Haley
Sorcerer
GroupMembers
Posts903
JoinedJan 29, 2007

Kasji said:

I expect the tree to be fully built before it's traversed

Ah... good. That's not what the ANTLR code you gave above does, which is why I was confused about what you were doing... :wink:

Kasji said:

Do you have anything else I should be on the look out for?

Well, handling functions correctly (part of which is setting up and destroying stack frames) is a tricky issue. You'll also need to handle memory management in your runtime engine (that will be fun...) which is a considerable task.

Kasji said:

On the note of function calls, I do have an idea for a function register. Let's say we're building our tree, and encounter a function definition. We could add the function to an STL Map, and a reference to the node in the tree where the function begins.

I would treat functions as entries of the global scope symbol table just like any other entry of the global scope. I wouldn't treat them specially as members of their own map.

Kasji said:

(by hash of function name, or just function name.)

I definitely wouldn't look things up by hash of the name. Let the container class (e.g. STL map) figure out how to store entries. Don't try to outsmart it; if you want hashing, use a container class that hashes.
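A minimal C++ sketch of that idea, with functions and variables living in one table keyed directly by name, might look like the following. Symbol, SymbolTable, AstNode, and the three helper functions are hypothetical names for this example only, and variables are limited to int to keep it short.

```cpp
#include <map>
#include <string>

struct AstNode {};   // hypothetical stand-in for a real tree node type

struct Symbol {
    enum Kind { Variable, Function };
    Kind kind;
    int value;             // used when kind == Variable
    const AstNode* body;   // used when kind == Function: where the body starts
};

// One table for the global scope; the container decides how keys are stored.
// Switching std::map to std::unordered_map (C++11) would change this to
// hashing without touching any of the code below.
using SymbolTable = std::map<std::string, Symbol>;

void defineVariable(SymbolTable& t, const std::string& name, int value) {
    t[name] = Symbol{Symbol::Variable, value, nullptr};
}

void defineFunction(SymbolTable& t, const std::string& name, const AstNode* body) {
    t[name] = Symbol{Symbol::Function, 0, body};
}

// Returns nullptr when the name is not defined.
const Symbol* lookup(const SymbolTable& t, const std::string& name) {
    auto it = t.find(name);
    return it == t.end() ? nullptr : &it->second;
}
```

The caller checks the kind field of whatever lookup returns, so a name can refer to either a variable or a function without the lookup code caring which.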
       
Post is unread #14 Dec 29, 2007, 11:21 am

Kasji
Apprentice
GroupMembers
Posts62
JoinedDec 23, 2007

I've been doing a little research. There'll be two separate grammars. The two grammars are essentially identical, but one is designed to build the AST, and the second is designed to evaluate the AST. So the code above would be for the evaluator.

To keep track of stack frames, I am thinking we may just use an STL container that holds a reference to the node each frame is currently at, as well as a temporary stack for that frame. The default behavior for finding a symbol in a stack will be to start at the current frame and work upward to the global frame until the symbol is found. Function definitions will of course not be allowed in any frame other than the top frame (in the current grammar you could easily add a function definition inside a function definition, or logic block).
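Roughly, the lookup rule could be sketched in C++ like this. The ScopeStack and Frame names are invented for the example, variables are plain ints, and the per-frame node reference is left out to keep the focus on the search order.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical frame: just the variables local to one scope.
using Frame = std::map<std::string, int>;

class ScopeStack {
public:
    ScopeStack() { frames_.push_back(Frame()); }   // frames_[0] is the global frame

    void push() { frames_.push_back(Frame()); }    // entering a '{' ... '}' block
    void pop()  { if (frames_.size() > 1) frames_.pop_back(); }  // never drops the global frame

    // Define (or overwrite) a variable in the current, innermost frame.
    void define(const std::string& name, int value) { frames_.back()[name] = value; }

    // Search from the current frame outward to the global frame.
    // Returns nullptr if the symbol is not visible from here.
    const int* find(const std::string& name) const {
        for (auto it = frames_.rbegin(); it != frames_.rend(); ++it) {
            auto hit = it->find(name);
            if (hit != it->end()) return &hit->second;
        }
        return nullptr;
    }

private:
    std::vector<Frame> frames_;
};
```

A variable defined in an inner frame shadows an outer one of the same name and disappears again on pop(). One thing to keep in mind: real PHP does not let a function body see global variables unless they are declared with the global keyword, so walking all the way up to the global frame is a deliberate departure from PHP if it is applied to function calls as well as to blocks.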

As for handling the memory of a script, I see two general paths.

Option 1, we can create our own virtual address space. Like an operating system would assign a process its own address space in RAM, we could do the same for scripts, and call a stack overflow on any script that attempts to exceed its allocated space. Just thinking of it makes it seem tricky. We'd have to have our own garbage collector, etc.

Option 2, is perhaps less tedious, by imposing a limit on the total number of symbols in all frames combined, and in the case of strings, imposing a size limit on strings, so we would know that the maximum memory a script could possibly use would be: Max_Vars * (Max_String_Length + Overhead). This is referring to variables only, and not functions.

Functions are in the symbol table, and have their own stack frames, but the function code itself exists in the AST only, and not in any stacks. At this point I am not sure if there will have to be any size restrictions on the AST. Operating systems create segments in the allocated address space of a process. The text segment contains the program's byte code for executing, another segment contains the stack, and another segment contains the heap. The text segment is allocated at runtime equaling the size of the program's byte code.

Now I ask, what are the merits of leaving function symbols in the global symbol table, versus being in their own table? From my logic (which may not be logical at all :) ) if we let functions have their own table, it will cost us a little more overhead, but lookups will be faster (depending on the script) because the tables will be smaller, and due to the way the language grammar is, it's auto-magically known whether a particular symbol is a function or variable, so there would be no confusion as to what table to look at.

And lastly, you are correct. I will let the container decide how to store/lookup keys.
       
Post is unread #15 Dec 29, 2007, 3:02 pm

David Haley
Sorcerer
GroupMembers
Posts903
JoinedJan 29, 2007

Kasji said:

So the code above would be for the evaluator.

I'm not sure how exactly you plan on making that work, but if you've got a plan then that's fine. I would have walked the AST without using any kind of grammar, because you don't need a grammar to execute. The grammar is just for parsing the syntax, not representing semantics.

Kasji said:

Function definitions will of course not be allowed in any frame other than the top frame

Shame; anonymous functions (and functions as first-class values in general) are a very nice feature.

Kasji said:

Option 1, we can create our own virtual address space

There's not a lot of point representing an address space and all that for a language that doesn't allow pointer arithmetic. It would suffice to track blocks of allocated memory and then use whatever garbage collection scheme you want. Of course, garbage collection is a rather difficult problem...

Kasji said:

Option 2, is perhaps less tedious, by imposing a limit on the total number of symbols in all frames combined

Option 2 is less tedious but also won't work. You don't necessarily have a symbol for every piece of allocated memory; think of e.g. linked lists.

Kasji said:

but the function code itself exists in the AST only, and not in any stacks.

This works only if you disallow closures. I don't remember if PHP lets you return functions from functions.
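
A sketch of why closures complicate the picture, in Python with hypothetical `Env` and `Closure` classes (plain Python functions stand in for AST nodes here). The body can still be a single shared piece of code, but each function value also has to carry the frame it was created in, and that frame has to outlive the call that created it, so it cannot simply be popped off a stack:

```python
class Env:
    """One frame of variables, chained to the frame it was created inside."""
    def __init__(self, parent=None):
        self.vars = {}
        self.parent = parent

    def lookup(self, name):
        env = self
        while env is not None:
            if name in env.vars:
                return env.vars[name]
            env = env.parent
        raise NameError(name)

class Closure:
    """A function value: shared code plus the captured defining frame."""
    def __init__(self, params, body, env):
        self.params = params
        self.body = body    # stands in for a pointer into the AST
        self.env = env      # captured frame; must stay alive after return

    def call(self, *args):
        frame = Env(parent=self.env)
        frame.vars.update(zip(self.params, args))
        return self.body(frame)

# Pseudo-script: make_adder(n) returns a function of x computing x + n.
def inner_body(frame):
    return frame.lookup("x") + frame.lookup("n")

def make_adder_body(frame):
    # The returned function captures this call's frame (which holds n).
    return Closure(["x"], inner_body, frame)

make_adder = Closure(["n"], make_adder_body, Env())
add5 = make_adder.call(5)
add10 = make_adder.call(10)
```

`add5` and `add10` share the same body but see different values of `n`, because each holds on to a different frame from a `make_adder` call that has already returned.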

Kasji said:

The text segment contains the program's byte code for executing, another segment contains the stack, and another segment contains the heap. The text segment is allocated at runtime equaling the size of the program's byte code.

Yes, but you're not implementing an operating system, and it isn't necessarily helpful to think of this in terms of an operating system. The OS is a very low-level systems application; you are implementing a high-level scripting language. The problems are of course similar but you shouldn't necessarily be using OS solutions.

Kasji said:

if we let functions have their own table, it will cost us a little more overhead, but lookups will be faster (depending on the script) because the tables will be smaller

Lookups will only be marginally faster if you have a decent container. Think of a balanced binary search tree. Imagine you have 256 functions. That's a maximum of about 8 hops. Now imagine you have 65,536 symbols, functions and all combined. That's still only about 16 hops. And of course, this is without hashing, which dramatically improves lookup when you have string keys. As usual, it makes sense to run empirical tests on these things before optimizing prematurely (which, as we all know, is the root of all evil).
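
The arithmetic behind those numbers, as a quick Python check. It assumes a balanced tree, so the depth is roughly log2 of the number of keys; `approx_hops` and the `sym...` key names are made up for this example:

```python
import math

def approx_hops(n):
    """Approximate depth of a balanced binary search tree holding n keys."""
    return math.log2(n)

functions_only = approx_hops(256)     # separate, smaller function table
everything = approx_hops(65536)       # one combined symbol table

# With a hash table the tree walk disappears: hash the string key once,
# then (on average) a constant number of probes regardless of table size.
symbols = {f"sym{i}": i for i in range(65536)}
found = symbols["sym40000"]
```

Making the table 256 times larger only doubles the number of hops, which is why splitting the tables buys so little. Actual timings would still need to be measured on the real container.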

Kasji said:

and due to the way the language grammar is, it's auto-magically known whether a particular symbol is a function or variable, so there would be no confusion as to what table to look at.

Yes, unless you want to pass around function pointers... :smile:


So do you still think it would be easier to implement a whole scripting language than it would be to plug in an existing one? :tongue:
       
#16 Dec 29, 2007, 9:17 pm

Kayle
Off the Edge of the Map
Group: Administrators
Posts: 1,195
Joined: Mar 21, 2006

I think... this is all over my head, and I'd much rather implement an existing one than try to write my own. And since I'll soon have formal training in Java, I'd probably embed that if I really wanted to.
       
#17 Dec 30, 2007, 11:50 am

Samson
Black Hand
Group: Administrators
Posts: 3,639
Joined: Jan 1, 2002

Most of this embedding stuff is over my head as well, but since PHP is fairly close to C/C++ and I'm already pretty good with PHP, I'd be all for embedding PHP as a scripting language. At least then there's no time lag in having to learn Lua or Python or something else in addition to the rest :)
       
#18 Dec 30, 2007, 11:52 am   Last edited Dec 30, 2007, 11:54 am by David Haley

David Haley
Sorcerer
Group: Members
Posts: 903
Joined: Jan 29, 2007

One doesn't really embed Java. You'd embed JavaScript (whose principal similarity to Java is in syntax, but anyhow), but embedding Java would be kind of like embedding C -- it's far too heavyweight to be a good choice for embedding.


EDIT:
Embedding PHP isn't a bad idea; what I'm arguing against is writing a PHP implementation from scratch. :) And incidentally, for the kind of programming that scripting would use, almost any language should be possible to learn in just an hour or two. (Of course, more advanced usage would take longer.)
       
#19 Dec 30, 2007, 5:14 pm

Samson
Black Hand
Group: Administrators
Posts: 3,639
Joined: Jan 1, 2002

Heh. I guess we're not all scripting geniuses though who can learn stuff like that in an hour or two. I know I'm not. I'd need considerably longer before it started to sink in for even the basic stuff. I don't even have simple mudprog scripting down yet :P

I guess I must also have glossed over the details. I didn't notice Kasji was planning to reinvent the wheel with his own version of PHP. That does indeed seem like a silly thing to do when you have the language already available. Unless its web-based nature makes using it as-is impractical.
       
#20 Dec 30, 2007, 6:16 pm

Kayle
Off the Edge of the Map
Group: Administrators
Posts: 1,195
Joined: Mar 21, 2006

I always seem to forget there's a language called Java and a language called JavaScript. I was actually referring to JavaScript, but well, like I said, I always forget about that grand language known as Java. :P
       