Server Crashes (hard with no BSOD) upon initiating a Full Backup

Last post 07-23-2008 9:36 AM by StephenL. 6 replies.
Page 1 of 1 (7 items)
  • 05-21-2008 10:17 AM

    Server Crashes (hard with no BSOD) upon initiating a Full Backup

    I have a client running 4 servers, and we have been unable to add one of them to the imaging process.  As soon as the backup job starts, the server crashes.  The only warning is a freeze of the keyboard/mouse, and within a few seconds the server drops.  There is no BSOD and no memory dump file; the server simply drops and has to be powered back on manually.

    Specs:  HP Proliant ML330 G3 (there are 2 of these in the environment, one of them is not having any problems)

    Drives:  6 drive RAID 5 partitioned into C: (for system) and D: (for data)

    OS:  Windows Server 2003 Standard Edition SP2

    RAM:  3GB

    This error only occurs when adding the data volume (D:) to the backup process.  When only the C: drive is selected, the process works fine.  Other posts seem to point toward a possible hard drive issue, but chkdsk does not report any errors or bad sectors.  I have no other indication of a problem with the drives, and I have ruled out the SCSI controller: it also controls C:, and swapping it for one of the same make and model gave the same result.

    *Acronis, True Image or other software is not and has not ever been installed.  The only 'unusual' software is a database service running on the machine - "Advantage Database Server"

    Unfortunately, the way the server fails there are no error reports to review.  Does anyone have any suggestions on what might be causing this or what I might try to further troubleshoot this issue? 

    Thoughts and suggestions are greatly appreciated.

     

  • 05-21-2008 2:59 PM In reply to

    Re: Server Crashes (hard with no BSOD) upon initiating a Full Backup

    Does the failed backup appear in the Backup History in ShadowProtect?  If so, what is the last line shown for that job?  (Feel free to post the whole log for that job).  Also, you may want to look in your System event log.  Check for any errors or warnings originating from disk, ftdisk, ntfs, dmio, or your storage controller driver.  Also make note of any other errors in the System or Application log that may be related, particularly anything around the time of the crash.  Since the C: drive backs up just fine on its own, this is probably not a software conflict.  Sounds more like a hardware problem, but let me know what you find.
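A rough sketch of that event-log check, assuming the System log has been exported to CSV first. The column names (`Level`, `Time`, `Source`, `Message`), the date format, the sample rows, and the `examplectl` driver name are all made up for illustration; adjust them to whatever your export and your storage controller driver actually use.

```python
import csv
import io
from datetime import datetime, timedelta

# Event sources named above; add your storage controller driver via extra_sources.
STORAGE_SOURCES = {"disk", "ftdisk", "ntfs", "dmio"}
# Assumed timestamp layout of the hypothetical export.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def suspicious_events(csv_text, crash_time, window_minutes=10, extra_sources=()):
    """Return errors/warnings that come from a storage source or fall near the crash."""
    wanted = STORAGE_SOURCES | {s.lower() for s in extra_sources}
    window = timedelta(minutes=window_minutes)
    hits = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Level"].lower() not in ("error", "warning"):
            continue  # skip informational entries
        when = datetime.strptime(row["Time"], TIME_FORMAT)
        from_storage = row["Source"].lower() in wanted
        near_crash = abs(when - crash_time) <= window
        if from_storage or near_crash:
            hits.append((when, row["Source"], row["Level"], row["Message"]))
    return sorted(hits)


# Illustrative rows only; these are NOT entries from the server in this thread.
SAMPLE_EXPORT = """Level,Time,Source,Message
Information,2008-05-07 18:40:00,Service Control Manager,placeholder
Warning,2008-05-07 12:00:00,disk,placeholder
Error,2008-05-07 18:46:30,examplectl,placeholder
Error,2008-05-06 09:00:00,Print,placeholder
"""

crash = datetime(2008, 5, 7, 18, 46, 58)
hits = suspicious_events(SAMPLE_EXPORT, crash, extra_sources=("examplectl",))
for when, source, level, message in hits:
    print(when, source, level, message)
```

Storage-related entries are kept regardless of when they occurred, since a failing disk may log warnings hours before a crash, while everything else is kept only if it falls inside the time window.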
  • 05-22-2008 9:48 PM In reply to

    Re: Server Crashes (hard with no BSOD) upon initiating a Full Backup

    A complete server failure like that could be related to a faulty RAM module. Remove all but a single module and try it again. The faulty module might not be utilised until the data partition is being imaged. Stranger things have happened but it's a start when it comes to diagnosing hardware faults.

    There could also be a hard drive failure that is being masked by the RAID 5 parity, but when ShadowProtect tries to back up those sectors it poops itself and kills the server. Chkdsk may not pick up on it, because the RAID controller rebuilds the affected data from parity.

    Are you using the HP RAID Configuration Utility to check out the health etc of the RAID array?

  • 05-23-2008 8:52 AM In reply to

    Re: Server Crashes (hard with no BSOD) upon initiating a Full Backup

    There are no System Event log entries for the failure on the server that crashes.  Unfortunately, there are also no entries for possible failures or predictive failures of the drives.

     Here is the log from the backup:

    07-May-2008 18:44:49 service 100 service (build 39) started job  manually as incremental
    07-May-2008 18:44:49 service 104 3.0.0.4
    07-May-2008 18:44:49 service 100 backup volume E:\ (NTFS)
    07-May-2008 18:44:49 service 100 VDIFF was disabled and then enabled on E:\
    07-May-2008 18:44:49 service 102 creating snapshots for  \\?\Volume{bd1e4deb-2f8d-11da-97ac-001185c11e12} \\?\Volume{138854f5-2f28-11da-b167-806e6f6e6963}
    07-May-2008 18:44:49 service 199 try snapshot by  VSS API by STC provider
    07-May-2008 18:44:49 service 199 all possible methods:  VSNAP API directly VSS API by STC provider and no other alternate methods
    07-May-2008 18:45:36 service 150 retrieving snapshot names.
    07-May-2008 18:45:36 service 103 snapshot was created by  VSS API by STC provider. It took 47 seconds
    07-May-2008 18:45:36 service 104 image will be created by VDIFF
    07-May-2008 18:45:36 sptask 110 worker thread has started
    07-May-2008 18:45:36 sptask 111 sbrun -mdn ( sbvol -fi \\?\STC_SnapShot_Volume_21_1 \\?\Volume{bd1e4deb-2f8d-11da-97ac-001185c11e12} : sbcrypt -64 : sbfile -w smb://\\MCCAL5\D$\Backup\71D87AD3EAEC4C07-MCCAL1\E_VOL-b001.spf )
    07-May-2008 18:45:36 (loader) 112 corelogic version: 1.0.0.206
    07-May-2008 18:45:36 sbvol 109 free space exclusion on
    07-May-2008 18:45:36 sbvol 107 incremental tracking is on, generation count: 0
    07-May-2008 18:45:36 sbvol 117 incremental tracking is on, 507605800 of 507605800 (100%) updated sectors
    07-May-2008 18:45:48 sbvol 101 successfully opened volume \\?\Volume{bd1e4deb-2f8d-11da-97ac-001185c11e12}
    07-May-2008 18:45:48 sbvol 107 FAT system area sectors: 0
    07-May-2008 18:45:48 sbvol 112 file \\?\STC_SnapShot_Volume_21_1\pagefile.sys excluded OK
    07-May-2008 18:45:48 sbcrypt 109 filter started
    07-May-2008 18:45:48 sbfile 620 Enter the user name to access smb://\\MCCAL5\D$\Backup\71D87AD3EAEC4C07-MCCAL1\E_VOL-b001.spf
    07-May-2008 18:45:48 sptask 101 answer was sent as UNICODE string
    07-May-2008 18:45:48 sbvol 107 disk MBR sectors: 1
    07-May-2008 18:45:48 sbvol 107 first track sectors: 63
    07-May-2008 18:45:48 sbvol 109 disk CHS 35421/255/63 partition 2
    07-May-2008 18:45:48 sbfile 621 Enter the password to access smb://\\MCCAL5\D$\Backup\71D87AD3EAEC4C07-MCCAL1\E_VOL-b001.spf
    07-May-2008 18:45:48 sptask 101 answer was sent as UNICODE string
    07-May-2008 18:45:48 sbcrypt 107 compression mode: 6
    07-May-2008 18:45:48 sbcrypt 107 encryption mode: 4
    07-May-2008 18:45:48 sbcrypt 600 Please enter the encryption password
    07-May-2008 18:45:48 sptask 101 answer was sent as UNICODE string
    07-May-2008 18:45:49 sbfile 101 successfully opened file \\MCCAL5\D$\Backup\71D87AD3EAEC4C07-MCCAL1\E_VOL-b001.spf
    07-May-2008 18:46:58 service 101 job was cancelled while sbrun was running.   <<<<<<< this is about where the server crashed/turned off
    07-May-2008 18:46:58 sptask 116 cancellation signal was sent to engine
    07-May-2008 18:46:58 (loader) 111 cancellation signal caught
    07-May-2008 18:46:58 sbfile 109 deleting the incomplete output file(s)
    07-May-2008 18:46:58 sbvol 109 fini done
    07-May-2008 18:46:58 sbcrypt 109 fini done
    07-May-2008 18:46:59 sbfile 109 fini done
    07-May-2008 18:46:59 sptask 115 sbrun.exe was cancelled
    07-May-2008 18:47:00 service 105 snapshots were destroyed
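Since the crash leaves nothing in the event log, the longest silence in a job log like the one above is one way to narrow down when the server went away. This is a minimal sketch, assuming each line starts with a `DD-Mon-YYYY HH:MM:SS` timestamp as in the log above; the function names are my own, and the sample lines are copied from that log.

```python
from datetime import datetime

STAMP_FORMAT = "%d-%b-%Y %H:%M:%S"  # e.g. "07-May-2008 18:45:49"


def parse_line(line):
    """Split a log line into (timestamp, rest); return None if it does not match."""
    parts = line.strip().split(" ", 2)
    if len(parts) < 3:
        return None
    try:
        stamp = datetime.strptime(parts[0] + " " + parts[1], STAMP_FORMAT)
    except ValueError:
        return None
    return stamp, parts[2]


def largest_gap(lines):
    """Return (seconds, entry_before, entry_after) for the longest silence in the log."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    best = None
    for (t0, msg0), (t1, msg1) in zip(entries, entries[1:]):
        gap = (t1 - t0).total_seconds()
        if best is None or gap > best[0]:
            best = (gap, msg0, msg1)
    return best


# A few lines taken from the job log above.
sample = [
    "07-May-2008 18:44:49 service 199 try snapshot by  VSS API by STC provider",
    "07-May-2008 18:45:36 service 150 retrieving snapshot names.",
    "07-May-2008 18:45:48 sbcrypt 109 filter started",
    r"07-May-2008 18:45:49 sbfile 101 successfully opened file \\MCCAL5\D$\Backup\71D87AD3EAEC4C07-MCCAL1\E_VOL-b001.spf",
    "07-May-2008 18:46:58 service 101 job was cancelled while sbrun was running.",
]

gap = largest_gap(sample)
print(f"{gap[0]:.0f} s of silence between:\n  {gap[1]}\n  {gap[2]}")
```

On these lines the longest gap is the 69 seconds between the output file being opened and the job being cancelled, which matches the point marked in the log as roughly where the server dropped.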

     

  • 05-23-2008 9:16 AM In reply to

    Re: Server Crashes (hard with no BSOD) upon initiating a Full Backup

    You should probably update ShadowProtect to version 3.2 and see if that helps.  If this is a known issue or interop problem with some other software, it may have already been fixed.
  • 05-23-2008 10:03 AM In reply to

    Re: Server Crashes (hard with no BSOD) upon initiating a Full Backup

    I will follow up on the possible RAM problem.  With respect to the RAID controller, the HP RAID CU does not seem to read the cards used.  They are Adaptec SCSI cards.  The Adaptec Storage Manager utility does not report any issues.

    I was considering the hard drives as a possible failure point; thank you for the confirmation on this.

  • 07-23-2008 9:36 AM In reply to

    Re: Server Crashes (hard with no BSOD) upon initiating a Full Backup

    I have finally been able to complete my analysis.  RAM and the other hardware on the server all checked out as fine.  The problem was resolved when all of the data on the volume was moved to another, temporary, server.  After this cleanup I was able to successfully join the volume to the BDR without the server crashing.

    Since then, I have moved all data back onto the server with the backup system still operating and have not had a single problem with the server.  I have all volumes from the server added to the BDR and they are all backing up incrementally without issue.  Unfortunately, I do not know if there was a specific piece of malformed data that was causing the problems, but it would appear the removal of 'something' corrected the problem.

     Thank you for the support and suggestions, I hope this result assists someone else if a similar problem occurs.

(c) StorageCraft Technology Corporation 2008