Emergency Copyover
#1 Apr 15, 2008, 7:39 pm

Banner
Magician
GroupMembers
Posts169
JoinedNov 29, 2005

I'm using emergency copyover in my MUD: when the MUD is about to crash, it performs a copyover instead. I'm aware of the problems with this approach and I don't care; I simply need insight into why moving to a new server has made the system catch far fewer crashes.

I was on Infolaunch, which runs Debian, and I've moved to a Genesismuds server, which runs Red Hat. If you need more information on the servers, or snippets of the emergency copyover code, tell me and I will supply them.

On Infolaunch, every crash except infinite-loop crashes was caught and rectified with a copyover. On Genesismuds, only the first crash is caught and the MUD copies over. On any crash after that, the emergency copyover system doesn't engage; the MUD simply crashes without a copyover. This is obviously a problem and I'd like help fixing it. If nothing else, what is a solution to the problem? Can the Genesismuds people fix something on their end to make it work, or can I add or remove something in my code to make it work? Or should I simply go to a startup script? If so, how would one of those be done?

Thanks in advance.
       
#2 Apr 15, 2008, 7:55 pm

Zeno
Sorcerer
GroupMembers
Posts723
JoinedMar 5, 2005

All I know is that it's possible. I know someone else on that host who uses it.
       
#3 Apr 15, 2008, 8:11 pm   Last edited Apr 15, 2008, 8:12 pm by Banner

Banner
Magician
GroupMembers
Posts169
JoinedNov 29, 2005

Yes, I saw a person on the host mention it in their helpfiles, so I emailed them for insight. Perhaps that would be the only way? And could you ask whoever you know what they did to fix theirs?
       
#4 Apr 15, 2008, 8:23 pm

Zeno
Sorcerer
GroupMembers
Posts723
JoinedMar 5, 2005

AFAIK, it always worked for them.
       
#5 Apr 15, 2008, 9:27 pm

Kayle
Off the Edge of the Map
GroupAdministrators
Posts1,195
JoinedMar 21, 2006

Easy solution, get off Genesismuds. :P
       
#6 Apr 15, 2008, 9:41 pm

Quixadhal
Conjurer
GroupMembers
Posts398
JoinedMar 8, 2005

This sounds like a case of signal handlers not resetting themselves after triggering.

The default behavior on some systems (older Red Hat, Solaris) is that when a signal handler fires, the default handler is restored before the custom handler is called. This is to prevent loops where a repeating signal can interrupt the handler that's supposed to be dealing with it. If your system does this (I believe there's a one-shot property), the correct way is to have the handler re-register itself before it exits, AFTER it has dealt with the issue.

You may also be able to set the properties on the handler itself so it isn't one-shot.

It's been quite a few years since I dealt with this. It was different between SunOS 4 and Linux 1.1.x, I can tell you that. :)
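
To make the one-shot behavior described above concrete, here is a minimal sketch in C. It asks for one-shot semantics explicitly with SA_RESETHAND, delivers one harmless SIGUSR1 with raise(), and then queries what the disposition is afterwards. The names on_usr1, hits, and handler_was_reset_after_delivery are made up for illustration and do not come from any MUD codebase.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical counter and handler, for illustration only. */
volatile sig_atomic_t hits = 0;

void on_usr1(int sig)
{
    (void)sig;
    hits++;
}

/* Returns 1 if the handler was reset to SIG_DFL after one delivery,
 * 0 if it is still installed, -1 on error. */
int handler_was_reset_after_delivery(void)
{
    struct sigaction act, cur;

    memset(&act, 0, sizeof act);
    act.sa_handler = on_usr1;
    sigemptyset(&act.sa_mask);
    act.sa_flags = SA_RESETHAND;      /* explicitly request one-shot behavior */

    if (sigaction(SIGUSR1, &act, NULL) == -1)
        return -1;
    if (raise(SIGUSR1) != 0)          /* the handler runs once here */
        return -1;
    if (sigaction(SIGUSR1, NULL, &cur) == -1)   /* query only, change nothing */
        return -1;
    return cur.sa_handler == SIG_DFL ? 1 : 0;
}
```

With a one-shot handler, a second SIGUSR1 at this point would take the default action and terminate the process, which is the same pattern as a second crash going uncaught.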
       
#7 Apr 16, 2008, 5:15 am

Banner
Magician
GroupMembers
Posts169
JoinedNov 29, 2005


Quixadhal said:

This sounds like a case of signal handlers not resetting themselves after triggering.

The default behavior on some systems (older Red Hat, Solaris) is that when a signal handler fires, the default handler is restored before the custom handler is called. This is to prevent loops where a repeating signal can interrupt the handler that's supposed to be dealing with it. If your system does this (I believe there's a one-shot property), the correct way is to have the handler re-register itself before it exits, AFTER it has dealt with the issue.

You may also be able to set the properties on the handler itself so it isn't one-shot.

It's been quite a few years since I dealt with this. It was different between SunOS 4 and Linux 1.1.x, I can tell you that. :)




That sounds like it could be a problem, so if that were the case, how would I fix it? Specific details if possible? :-p
       
#8 Apr 16, 2008, 11:26 am

David Haley
Sorcerer
GroupMembers
Posts903
JoinedJan 29, 2007

man 2 signal
man 2 sigaction

is what you want.
       
#9 Apr 16, 2008, 11:29 am

Banner
Magician
GroupMembers
Posts169
JoinedNov 29, 2005


DavidHaley said:

man 2 signal
man 2 sigaction

is what you want.


What do I do with that.. where do I put it?

Here is the function if that helps:
[code]
void emergancy_copyover( void )
{
   FILE *fp;
   DESCRIPTOR_DATA *d;
   char buf[100], buf2[100], buf3[100], buf4[100], buf5[100];

   log_string( "--- Engaging Emergency Copyover! ---" );

   fp = fopen( COPYOVER_FILE, "w" );

   if( !fp )
   {
      log_string( "Could not write to copyover file!" );
      perror( "emergancy_copyover:fopen" );
      return;
   }

   sprintf( buf, "\n\r [ALERT]: EMERGENCY COPYOVER - DIVERTING CRASH\n\r" );

   for( d = first_descriptor; d; d = d->next )
   {
      CHAR_DATA *och = CH( d );
      d_next = d->next;

      if( !och || !d->character || d->connected > CON_PLAYING )
      {
         write_to_descriptor( d->descriptor, "\n\rSorry, we are rebooting. Come back in a few minutes.\n\r", 0 );
         /*
          * close_socket (d, FALSE);
          */
      }
      else
      {
         fprintf( fp, "%d %s %s\n", d->descriptor, och->name, d->host );
         save_char_obj( och );
         write_to_descriptor( d->descriptor, buf, 0 );
      }
   }
   fprintf( fp, "-1\n" );
   fclose( fp );

   fclose( fpReserve );
   fclose( fpLOG );

   sprintf( buf, "%d", port );
   sprintf( buf2, "%d", control );
   sprintf( buf3, "%d", control2 );

   dlclose( sysdata.dlHandle );

   execl( EXE_FILE, "swreality", buf, "copyover", buf2, buf3, ( char * )NULL );

   perror( "emergancy_copyover: failed to copyover in 'execl'" );

   if( ( fpReserve = fopen( NULL_FILE, "r" ) ) == NULL )
   {
      perror( NULL_FILE );
      exit( 1 );
   }
   if( ( fpLOG = fopen( NULL_FILE, "r" ) ) == NULL )
   {
      perror( NULL_FILE );
      exit( 1 );
   }
}
[/code]
       
#10 Apr 16, 2008, 11:41 am

Quixadhal
Conjurer
GroupMembers
Posts398
JoinedMar 8, 2005

Banner said:


DavidHaley said:

man 2 signal
man 2 sigaction

is what you want.


What do I do with that.. where do I put it?


They're man pages, you read them. :)

Once you've read them, you go find where signal or sigaction is called in your code and figure out what options need to be set to make the signals persistent, or how to reschedule the handler... depending on which way you want to do it.

Remember, it's not the copyover function itself that's the problem, it's the signal handler which calls it when you get a signal (such as SIGSEGV, SIGBUS, etc).
       
#11 Apr 16, 2008, 11:59 am

Banner
Magician
GroupMembers
Posts169
JoinedNov 29, 2005

Okay, I looked at those, and there are no calls for sigaction. I found a call for signal, and these are the only ones:

   signal( SIGPIPE, SIG_IGN );
   signal( SIGALRM, caught_alarm );
   signal( SIGSEGV, SegVio );
   signal( SIGTERM, SigTerm );   /* Catch kill signals */



I'm guessing the offending one would be SIGSEGV, but how do I change that to make it "reset" or "reschedule" or whatever? It doesn't look like it accepts new arguments. Can you tell me what to do with it?
       
#12 Apr 16, 2008, 7:41 pm

Quixadhal
Conjurer
GroupMembers
Posts398
JoinedMar 8, 2005

If your OS defaults to having signals be one-shot (SA_ONESHOT or SA_RESETHAND is on), you can either switch to using sigaction() so you can pass flags and unset that... or... you can include a call to reset your handler at the end of your handler code itself, but make sure it was successful or you may get into infinite loops.

You could try resetting it right before the copyover, however that might get you into an infinite loop if whatever caused the crash would affect the copyover routine.

In short, your SegVio() routine must call the copyover handler (somehow, maybe indirectly) to cause the game to restart on a crash. I suspect the code that sets the signal handlers isn't being called on restore from a copyover *OR* processes in the same process group inherit signal handlers (which would be very strange to me).
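
One way to test the first suspicion (that the handlers are not being set again after a copyover) is to ask the system which handler is currently registered and log the answer early in startup. A rough sketch in C follows; handler_is_installed and seg_handler_stub are hypothetical names, and in the MUD the second argument would be whatever function is passed to signal(), for example SegVio.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for the real crash handler (e.g. SegVio). */
void seg_handler_stub(int sig)
{
    (void)sig;
}

/* Returns 1 if `expected` is the handler currently registered for `sig`,
 * 0 if something else (SIG_DFL, SIG_IGN, another function) is registered,
 * and -1 if the query itself failed. */
int handler_is_installed(int sig, void (*expected)(int))
{
    struct sigaction cur;

    /* A NULL second argument means "do not change anything, just report". */
    if (sigaction(sig, NULL, &cur) == -1)
        return -1;
    return cur.sa_handler == expected ? 1 : 0;
}
```

Logging the result of handler_is_installed( SIGSEGV, SegVio ) right after the signal() calls, both on a fresh boot and after a copyover, would show whether the registration code actually runs in both cases.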
       
#13 Apr 16, 2008, 7:51 pm

Banner
Magician
GroupMembers
Posts169
JoinedNov 29, 2005

.. So could you tell me the exact line to put, and where? Obviously I'm not familiar with this or I'd have fixed it myself already.. but this is a major problem and whatever help you're able to give me is appreciated.
       
#14 Apr 17, 2008, 12:27 am

David Haley
Sorcerer
GroupMembers
Posts903
JoinedJan 29, 2007

In the signal handler, try putting another call to register a signal handler. Signal handlers are registered with "signal".
       
#15 Apr 17, 2008, 5:22 am

Banner
Magician
GroupMembers
Posts169
JoinedNov 29, 2005

So put signal( SIGSEGV, SegVio ); at the bottom of those signal callers again? What will that do?
       
#16 Apr 17, 2008, 5:59 am

Banner
Magician
GroupMembers
Posts169
JoinedNov 29, 2005


Quixadhal said:

If your OS defaults to having signals be one-shot (SA_ONESHOT or SA_RESETHAND is on), you can either switch to using sigaction() so you can pass flags and unset that... or... you can include a call to reset your handler at the end of your handler code itself, but make sure it was successful or you may get into infinite loops.

You could try resetting it right before the copyover, however that might get you into an infinite loop if whatever caused the crash would affect the copyover routine.

In short, your SegVio() routine must call the copyover handler (somehow, maybe indirectly) to cause the game to restart on a crash. I suspect the code that sets the signal handlers isn't being called on restore from a copyover *OR* processes in the same process group inherit signal handlers (which would be very strange to me).


I understand what you mean, I just don't know the arguments to use, or where exactly to put them, so that'd help. I gather that I should either change signal to sigaction to include an argument to reset the signal, or put something else at the bottom of the SegVio function to manually reset it. Can you tell me exactly what to do?
       
#17 Apr 17, 2008, 8:03 am

David Haley
Sorcerer
GroupMembers
Posts903
JoinedJan 29, 2007

As the man page says, the use of "signal" registers a new signal handler for the signal given. Since the hypothesis we are using is that your new OS is unregistering the handler after a single invocation, what you want to do is to re-register the handler at the end of the signal handling. After handling the segmentation violation signal (SIGSEGV), you want to re-register it. After handling the termination signal, you re-register it. And so forth. You would use the same arguments that the registering functions already use. I don't think you need to change to sigaction unless you try the above and it fails to work.
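
As a sketch of the re-registration described above (written in C, with SIGUSR1 standing in for SIGSEGV so it is safe to run): the handler calls signal() on itself with the same arguments used at startup and records whether that call failed. The names crash_handler, caught, rearm_failed, and install_crash_handler are hypothetical.

```c
#include <signal.h>

/* Hypothetical bookkeeping, for illustration only. */
volatile sig_atomic_t caught = 0;
volatile sig_atomic_t rearm_failed = 0;

void crash_handler(int sig)
{
    caught++;

    /* ... whatever recovery work the handler does would go here ... */

    /* Re-register with the same arguments used at startup, and note
     * a failure instead of assuming the call worked. */
    if (signal(sig, crash_handler) == SIG_ERR)
        rearm_failed = 1;
}

/* Initial registration; returns 0 on success, -1 on failure. */
int install_crash_handler(int sig)
{
    return signal(sig, crash_handler) == SIG_ERR ? -1 : 0;
}
```

One caveat when applying this to a copyover handler: if the handler ends in a successful execl(), nothing placed after that call ever runs, so a re-registration at the very bottom would have no effect in that case.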
       
#18 Apr 17, 2008, 11:31 am

Banner
Magician
GroupMembers
Posts169
JoinedNov 29, 2005


DavidHaley said:

As the man page says, the use of "signal" registers a new signal handler for the signal given. Since the hypothesis we are using is that your new OS is unregistering the handler after a single invocation, what you want to do is to re-register the handler at the end of the signal handling. After handling the segmentation violation signal (SIGSEGV), you want to re-register it. After handling the termination signal, you re-register it. And so forth. You would use the same arguments that the registering functions already use. I don't think you need to change to sigaction unless you try the above and it fails to work.


I tried putting signal( SIGSEGV, SegVio ); at the end of the SegVio function, at the end of the emergency_copyover function, and at the bottom of where the other signals were registered. No change. Did I place them wrong, or should I try sigaction now? If so, what needs to be used exactly, and where do I put it?
       
#19 Apr 17, 2008, 11:36 am

Banner
Magician
GroupMembers
Posts169
JoinedNov 29, 2005

I tried sigaction( SIGSEGV, SegVio, SA_RESETHAND ); but apparently that's not correct for arguments 2 and 3:

[code]
Lines:
sigaction( SIGSEGV, SegVio, SA_RESETHAND );

Errors:
[shoie13@harbinger src]$ make
make -s swreality
Compiling o/comm.o
comm.c: In function `emergancy_copyover':
comm.c:514: warning: passing arg 2 of `sigaction' from incompatible pointer type
comm.c:514: warning: passing arg 3 of `sigaction' makes pointer from integer without a cast
comm.c: In function `SigTerm':
[/code]
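
The two warnings point at the argument types: sigaction() takes pointers to struct sigaction as its second and third arguments, and the handler and flags are stored inside that struct instead of being passed directly. A minimal sketch in C of a wrapper with the right types follows; install_handler, count_signal, and seen are hypothetical names used only for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical sample handler that just counts deliveries. */
volatile sig_atomic_t seen = 0;

void count_signal(int sig)
{
    (void)sig;
    seen++;
}

/* Registers `handler` for `sig`; returns 0 on success, -1 on failure. */
int install_handler(int sig, void (*handler)(int))
{
    struct sigaction act;

    memset(&act, 0, sizeof act);
    act.sa_handler = handler;      /* the handler goes in the struct */
    sigemptyset(&act.sa_mask);     /* block no extra signals while it runs */
    act.sa_flags = 0;              /* no SA_RESETHAND, so it stays installed */

    /* Second argument: new action. Third: where to store the old one, or NULL. */
    return sigaction(sig, &act, NULL);
}
```

In the MUD this would presumably be called as install_handler( SIGSEGV, SegVio ); in place of the signal( SIGSEGV, SegVio ); line.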
       
#20 Apr 17, 2008, 9:08 pm

Banner
Magician
GroupMembers
Posts169
JoinedNov 29, 2005

My host helped me with this much, but it still didn't work. I did determine that this handler was the one triggering the ecopy: I removed the handler and the MUD crashed without copying over. With the new sigaction one, it still copies over only once and then crashes the next time. What's wrong?

void game_loop(  )
{
   struct timeval last_time;
   char cmdline[MAX_INPUT_LENGTH];
   DESCRIPTOR_DATA *d;
/*  time_t      last_check = 0;  */
   struct sigaction act;

   {
   sigemptyset(&act.sa_mask);
   act.sa_flags = SA_RESETHAND;
   act.sa_handler = (void*) SegVio;

       if (sigaction(SIGSEGV, &act, NULL) == -1)
       {
         log_string( "Failed to install signal handler for SIGSEGV.\n" );
         return;
       }
  }


   signal( SIGPIPE, SIG_IGN );
   signal( SIGALRM, caught_alarm );
//   signal( SIGSEGV, SegVio );
   sigaction(SIGSEGV, &act, NULL);
   signal( SIGTERM, SigTerm );   /* Catch kill signals */
   gettimeofday( &last_time, NULL );
   current_time = ( time_t ) last_time.tv_sec;
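
Two hedged observations on the snippet above. First, SA_RESETHAND is the flag that requests one-shot behavior, so setting it asks for the handler to be removed after its first use; leaving sa_flags at 0 keeps the handler installed. Second, it may be worth ruling out a blocked signal: a signal is normally blocked while its own handler runs, and the signal mask is preserved across execl(), so a copyover started from inside the SIGSEGV handler can leave SIGSEGV blocked in the new process image. The C sketch below unblocks a signal and reports whether it is currently blocked. unblock_signal and signal_is_blocked are hypothetical helpers, and whether a blocked SIGSEGV is the actual cause here is an assumption to test, not a confirmed diagnosis.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>

/* Removes `sig` from this process's signal mask.
 * Returns 0 on success, -1 on failure. */
int unblock_signal(int sig)
{
    sigset_t set;

    sigemptyset(&set);
    sigaddset(&set, sig);
    return sigprocmask(SIG_UNBLOCK, &set, NULL);
}

/* Returns 1 if `sig` is currently blocked, 0 if not, -1 on error. */
int signal_is_blocked(int sig)
{
    sigset_t cur;

    /* A NULL second argument only reads the current mask into `cur`. */
    if (sigprocmask(SIG_SETMASK, NULL, &cur) == -1)
        return -1;
    return sigismember(&cur, sig);
}
```

Logging signal_is_blocked( SIGSEGV ) at the top of game_loop() after a copyover would confirm or rule this out; if it reports 1, calling unblock_signal( SIGSEGV ) before installing the handlers is the corresponding fix to try.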
       