Scope Change: Impact Study

Discussion in 'CommandHelper' started by LadyCailin, Aug 25, 2014.

  1. LadyCailin

    LadyCailin Administrator Developer

    So, one thing that I've found to be quite annoying is the way that global scope works in CH. It would be nice to change up the way scope works, to more closely align with C type languages, that is, { } indicates a variable scope. So my question is, how much impact would this have on your scripts? In many cases, probably none, but it's possible, so I wanted to ask. What this change would entail is this:

    Code (PHP):

    if(someValue()){
        @a = 'string1';
    } else {
        @a = 'string2';
    }

    // @a is no longer in scope here
     
    Scoping would work regardless of whether or not you use brace syntax, though raw braces would be allowed to create inner scopes:

    Code (PHP):

    {
        @a = 'string';
    }
    // @a is no longer in scope here
     
    With the addition of type hinting (soon to be strong typing), having global scope leads to several problems. Generally speaking, scopes help to isolate potential problems to a more limited portion of code, and the compiler can help identify these problems, assuming scope is used.

    I also hesitate to make this vary based on strict mode being on or not, I think it should be all or nothing. Once I add strong typing, I can easily detect use of unassigned variables, so if your code would be affected by this, it would need to be rewritten, but the compiler would help you catch all errors, so you shouldn't have an issue with random bugs being introduced.

    Thoughts on this?

    Edit: Let me update this and list the pros of this change.

    1. With strong typing in place, changing the type of a variable mid scope will be a questionable programmer operation. Consider the following code:

    Code (PHP):

    void proc _myProc(string @value){
        msg(@value);
    }

    string @a = 'string';

    _myProc(@a);

    int @a = 5;

    _myProc(@a);
     
    If you merely skimmed this code, you might miss the type change of @a from string to int, and wonder why the _myProc(@a) (where @a is an int) isn't working.

    2. Accidentally unassigned variables will be caught:

    Code (PHP):

    @v = userInput();
    if(@v == 1){
        @msg = 'text 1';
    } else if(@v == 2){
        @msg = 'text 2';
    } else if(@v == 3){
        @msg = 'text 3';
    }
    // We forgot a catch all else condition here, so it's possible that @msg is undefined still.
    // In those cases, it is highly unlikely that we meant for @msg to be null, and if it was, it
    // wouldn't have been hard to assign @msg = null at the top.
    msg(@msg);
     
    3. Automatic variable destruction and memory optimizations. Once a variable goes out of scope, the garbage collector can clear it from memory faster, leading to better running scripts:

    Code (PHP):

    if(something()){
        array @a = reallyReallyBigArray();
        // Do something with the array
    }

    // @a can be destructed at this point, and all the memory used can now be freed.
    // This happens automatically, though previously you could still have done this by
    // setting @a to null or something. This prevents you from having to remember
    // to do that though. This is more easily demonstrated with the following code:

    for(int @i = 0, @i < 5, @i++){
        msg(@i);
    }

    // @i is no longer needed here, but will stick around anyways, taking up memory, albeit a small amount.
     
    4. Compiler optimizations. The compiler can more easily track variables, thus leading to more optimal execution:

    Code (PHP):

    if(something()){
        int @a = 5 ** 2;
        msg(@a); // This can be refacted to msg(25); by the compiler
    }
     
    While technically this optimization can be done already, it is far less likely that the compiler would be able to actually perform the optimization, as if the value of @a in this case was used elsewhere, then full static analyzation would need to be done to determine scope based on assignments... much harder to do, and in many cases still impossible.

    5. Better flow control and readability. It is much easier to read and determine the impact of various variables in a small scope, rather than having to grok the entire codebase.

    6. Accidental reassignment caught. If you accidentally reassign a variable in a lower scope, this is caught and made a compile error.

    Code (PHP):

    string @a = 'my message';

    if(something()){
        int @a = 4; // Compile error here, @a is already defined in this scope. Previous behavior would
        // be to overwrite @a with 4, in the global scope.
    }

    msg(@a)
     
    7. Eliminates type ambiguity.

    Code (PHP):

    if(something()){
        int @a = 5;
    } else {
        string @a = 'string';
    }

    _myProcThatAcceptsAnInt(@a);
     
    One of two things must happen here, either the string @a definition must be an error (type redeclaration) or type safety has to be lost on the proc call.

    8. Global scope is bad. This article covers many many good points: http://c2.com/cgi/wiki?GlobalVariablesAreBad

    Cons:
    There is one con, that is, you'll have to go through and update any code that doesn't use scoping properly, but as I said, the compiler will be able to catch this, and initially I can make it compiler warnings. (the compiler has been updated so that most compile errors "stack" now, and you won't have to do multiple reloads to catch errors.)
    Last edited: Aug 26, 2014
  2. kookster

    kookster New Member

    Just because C type languages does it doesn't make it right. This would completely and utterly break all of my code. Now for things like a proc/function or a closure i understand because there is scoping but in a if statement? thats just dumb.
  3. LadyCailin

    LadyCailin Administrator Developer

    Well, in order to "fix" it, you would need the following. Consider your possible old code:

    Code (PHP):

    if(someCondition()){
        @msg = 'You can\'t do that';
    } else {
        @msg = 'Done!';
    }

    msg(@msg);
     
    All you need to do to fix this is:

    Code (PHP):

    // Type declaration is optional, though preferred.
    // We are defining the value to be in this scope
    string @msg;

    if(someCondition()){
        @msg = 'You can\'t do that';
    } else {
        @msg = 'Done!';
    }

    msg(@msg);
     
    In the old code, it would detect use of the @msg in msg(@msg) and cause a compile error on that line.
  4. __import__

    __import__ Member

    I'm personally all for this. Scope is good.
  5. PseudoKnight

    PseudoKnight Well-Known Member

    I'm always willing to rewrite my code, but this is a sweeping change and will potentially introduce bugs that the compiler won't be able to pick up on. So that's a lot of code scanning and bug fixing. I can probably handle it, but can we expect everyone else to be able to? In my opinion breaking changes should happen infrequently and should be obvious, usually accompanied by a version change and maybe a temporary compatibility branch. Another idea is to start putting these breaking changes into the strict type mode, so that people can start converting over slowly. Then later require block scope. In any case, you don't want to unnecessarily hold back servers updating CH dependencies because they can't get a compatible version without updating their scripts.

    So I'd like this, but don't underestimate the impact this would have on the code that's being used out there. This would be hard on some of the more casual programmers or dependent server owners.
    Last edited: Aug 25, 2014
  6. PseudoKnight

    PseudoKnight Well-Known Member

    As an experiment, I went over a single script file with an unimportant set of tool aliases to see how often this occurred in my code. In ~300 lines, I had to fix 6 variables so that they initialized outside of block scope -- all if() blocks not unlike your latest example. In some cases, though, this actually got me to fix up the code a bit.

    So then I looked at an equivalently sized but more recent mini-game I wrote. In ~300 lines, there was nothing I needed to change.
  7. LadyCailin

    LadyCailin Administrator Developer

    So, that number, 6, is an interesting metric, but also important is, how many of those errors would the compiler have caught? I'm slightly less concerned with breaking backwards compatibility, if there's a compelling reason to do so, and the compiler (or some tool) can help you upgrade quickly. To be clear though, I'm not underestimating anything, hence why we're having this discussion, as opposed to me merely going ahead with the changes.

    I think what I'm going to do is implement this in another branch, and we can then run that branch on some real world code, and actually see what kind of impact we are looking at. If the impact is too large, we'll regroup, and see where to go from there. The changes shouldn't be all that difficult to add, once I get the rest of the changes I need to make to implement type safety.
  8. LadyCailin

    LadyCailin Administrator Developer

    Also, I hesitate to make this feature optional with strict mode, as that would significantly change the behaviour of code, depending on if you're in strict mode or not. Strict code should work the exact same in non-strict mode.