Hello, I have a virtual host running ESXi 5.5. I opened vSphere to check on our alerts and notices and saw a message I've never seen before. On the Configuration tab there is the following message. And all powered-off machines (we have a few that are pending deletion) look like this: Is this something I should be concerned about? Should I reboot the server?

When I check the health status, everything has green check marks next to it (and I expanded every category possible). As I'm looking through this, I see some other notices that datastores haven't been configured. I'm not certain why it's suddenly asking about datastores. I replaced a hard drive in the array a week or two ago, everything looked fine, and the health notices subsided, so I'm not sure what happened. The powered-on machines seem to be functioning correctly; server-side AV management, VPN, SMTP relays, and other things on the guest machines are all looking fine. Anyone have experience with this type of thing, or any idea what might be going on here?

So, another update on this a couple of months later.
Looks like the same drive that died originally is showing similar behavior and alerts: I have a predictive failure on drive 1, bay 1. Dell support took a look at our RAID logs, which they acquired with a script, and determined there were high levels of corruption on drive 3. When I lost my datastore the first time, the array was apparently rebuilding drive 1 using the corrupted data on drive 3, which caused the datastore to disappear. Even though I rebuilt the array from scratch, the controller apparently still had metadata on it, or drive 3 physically had an issue that allowed the corruption to persist through the rebuild.

So two months later, drive 1 in bay 1 is again showing a predictive failure because of drive 3. Due to the corruption, drive 1 is marking a lot of blocks as bad, and that is why we see a predictive-failure state in vSphere. Mystery solved, hopefully.

The solution: Dell is sending me two drives. We will clear all metadata from the RAID controller, replace the drives, and rebuild the RAID array from scratch. I am unable to copy the .vmdk files from the datastore due to file operation errors, I'm assuming as a result of the corruption in the drive array. This could be a blessing in disguise, since there's the potential of copying corrupted files to the new array with the new drives.
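A minimal sketch of one thing worth trying when a plain file-level copy of a .vmdk fails: vmkfstools can clone a virtual disk block by block and sometimes gets further than a datastore-browser copy. The datastore names "datastore1" and "backup_ds" and the VM name "myvm" are placeholders, and a clone taken from a corrupt array can of course carry the corruption with it, as noted above.

    # From the ESXi Shell: attempt a block-level clone of the disk to another
    # datastore, writing the copy as a thin-provisioned disk
    vmkfstools -i /vmfs/volumes/datastore1/myvm/myvm.vmdk \
        /vmfs/volumes/backup_ds/myvm/myvm.vmdk -d thin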
EDIT: Meant to send this sooner; sorry if it seems some questions were answered ahead of time.

Thanks for the quick replies. Any suggestions as to how I should be more thorough in the event I do take it down? What can I look for that might indicate what has gone wrong? I have a little experience managing ESXi, but not much in troubleshooting. Should I simply start with a clean reboot to see if it comes back up clean? Does anyone suspect this has anything to do with the hard drive I replaced? It was a hot swap and I haven't taken it down since then (about a week or two ago).

BirdLaw wrote: Does anyone suspect this has anything to do with the hard drive I replaced? It was a hot swap and I haven't taken it down since then (about a week or two ago).

I think it might help to know what hardware you are running on (so we can know what tools you might use), how storage is configured (local vs. remote), and what level of protection you have (RAID and what level, or a single drive). Right now you have access to the ESXi console, so that is working, but obviously no VMs are running because your datastore is missing. One might try rescanning for datastores and see if it comes back.
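For reference, rescanning can also be done from the ESXi Shell rather than the vSphere Client; these are stock ESXi 5.x commands, with nothing host-specific assumed:

    # Rescan every storage adapter for devices that have appeared or returned
    esxcli storage core adapter rescan --all
    # Rescan the discovered devices for VMFS volumes
    vmkfstools -V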
VMs are off, so a restart can't kill anything, in my opinion. Make sure to kick it into maintenance mode before the reboot if you do that. Other than that: details, details.
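A minimal sketch of doing that from the ESXi Shell (vim-cmd ships with stock ESXi; nothing here is host-specific):

    # Enter maintenance mode before the reboot (VMs must already be off)
    vim-cmd hostsvc/maintenance_mode_enter
    reboot
    # ...and once the host is back up:
    vim-cmd hostsvc/maintenance_mode_exit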
Let me clarify. I mentioned in the OP that powered-down VMs look like the second screenshot, but the powered-on VMs are all working fine. I can RDP into our powered-on machines and use all of the guest machines' features and functions. The only thing is, the old machines that are pending deletion look like the second screenshot, and they can't be powered on or anything like that.

This server is a Dell PowerEdge R710. I don't know what kind of RAID controller is in it (a way to check from the shell is sketched below). I may need some more specific instruction because, while I can manage most simple tasks as they pertain to guest machines, I still haven't had to do much troubleshooting of host issues. I'm not sure I know iSCSI vs. NFS (I'd never heard the term NFS before). Does this screenshot tell us that we use iSCSI? Also, I rescanned the datastore but didn't see any changes in the errors.

Gary, first of all, the USB comment was more or less separate from this issue. We are not running ESXi from a USB drive.
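A hedged way to identify the controller from the ESXi Shell: on an R710 it is usually a Dell PERC of some kind, but these commands simply report whatever is installed.

    # List PCI devices and filter for the RAID controller
    lspci | grep -i raid
    # List the storage adapters ESXi itself has drivers loaded for
    esxcfg-scsidevs -a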
The passthrough was a completely separate issue altogether and probably had nothing to do with this issue at all; you may as well forget it was mentioned. Unfortunately, due to my lack of experience in this area, I'm not really sure I can answer all of those questions as mentioned above, but I will give it the old college try.
Let me know if these answers don't make sense; I'm answering to the best of my ability.

1. Storage System - This is all local storage. We use a RAID controller. It is expandable, but it should be on-board. It's RAID 5.
2. NFS vs. iSCSI - I don't really know; see the information above. Hopefully the screenshot answers the question.
3. Ping of Storage System - Well, I know some VMs are running just fine, and the storage is all onboard, so I would hope the host can reach the storage system. I'm not sure how I would check that, though.
4. Errors from the hot swap - I don't think so. The health status indicator lights were showing a predictive failure, but since the replacement and the rebuild of the data onto that drive, the health status has been green. As you know, there are many log files, and with everything green I'm not quite sure which log to look at for those types of errors (a couple of greps worth trying are sketched below).

In the meantime, I'm going to reboot the host and hope that nothing goes horribly wrong. I will post results here! Thanks for all of the input so far! I really appreciate the Spice Community!
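A minimal sketch, assuming a stock ESXi 5.x host: storage and SCSI errors land in the vmkernel log, so a grep there is the usual starting point. The search terms are only examples.

    # Show recent storage-related entries from the vmkernel log
    grep -iE "scsi|error|failed" /var/log/vmkernel.log | tail -n 20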
BirdLaw wrote:
1. Storage System - This is all local storage. We use a RAID controller. It is expandable, but it should be on-board. It's RAID 5.
2. NFS vs. iSCSI - I don't really know; see the information above. Hopefully the screenshot answers the question.

Ok, so it's all local storage and there is no storage array you're connecting to. That means there is no iSCSI or NFS to worry about. However, it does mean that either the RAID card has failed or you've had a double disk failure and lost the RAID 5 array. At this point you're best off trying a reboot and seeing if there are any error messages from the RAID card (I assume a PERC, as the hardware is Dell). Do you have backups?

BirdLaw wrote:
4. Errors from the hot swap - I don't think so. The health status indicator lights were showing a predictive failure, but since the replacement and the rebuild of the data onto that drive, the health status has been green.

A screenshot of the RAID config and status from the iDRAC may help work out what's going on.
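Before the reboot it may also be worth checking what the host can still see from the shell; these are standard esxcli namespaces on 5.x, with nothing host-specific assumed:

    # List every filesystem the host can mount, VMFS datastores included
    esxcli storage filesystem list
    # Show which physical devices back each VMFS extent
    esxcli storage vmfs extent list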
hutchingsp wrote: Gary D Williams wrote: At this point you're best off trying a reboot and seeing if there are any error messages from the RAID card (I assume a PERC, as the hardware is Dell). Do you have backups?

I wouldn't reboot it yet, but that's because I'm still not clear whether there are other VMs running on this host that are working.

Fair point. I was assuming that every VM on that host was down and showing as inaccessible. If they are on local storage, I'd like to see the storage configuration page of the iDRAC.

So, here's the situation. This is the older of two hosts we have on the network; it is basically being used for utility machines/servers. Many of the guest VMs were powered down and are going through a grace period, pending deletion once enough time has passed. However, we do have a couple of utility machines that are actively being used as SMTP relay servers, VPN servers, and AV management software. I did not show the powered-on VMs because their names displayed correctly and I didn't want to share too much information with the world. All the powered-off VMs look as they did in the second screenshot. I do not have the option to power these machines on, nor would I, because I can't see which one is which, and there's at least one server that could cause us issues if it were powered on with the NIC enabled. Again, everything that is powered on seems to be working correctly right now.

Also, we don't have backups for these machines, but they can be rebuilt without too much trouble. I'm not concerned about that, as there are no business-critical files/data on this host.

I don't have much experience with iDRAC just yet.
Give me a little time to look it up and I will drop a screenshot here when I have it.

BirdLaw wrote: I don't have much experience with iDRAC just yet. Give me a little time to look it up and I will drop a screenshot here when I have it.

Log on with your user name and password, then go to Storage > Physical Disks. A screenshot of that would help.

BirdLaw wrote: Again, everything that is powered on seems to be working correctly right now.

hutchingsp wrote: Wow, so running VMs but no datastores showing, no datastores can be added, and the host hardware shows as healthy? I'll bow out on the Dell specifics, but something sounds screwy!

This can happen. If a VM is memory-resident and there is an APD (all paths down) event, it'll stay running in memory. If the machine is very lightly used, it can be like this for some hours, but eventually it'll fall over.

Oh damn. Okay, thanks for the hints and the heads-up. I'm logged into the iDRAC but I don't see any options for storage. But even in this view we have Server Health with all green check marks for batteries, fans, intrusion, power supplies, temperatures, and voltages. There were some errors about a battery, but those are from 2013, well before I started here, and I'm not entirely sure whether it was ever addressed or the errors just stopped on their own.
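If an APD event is suspected, the vmkernel log is where ESXi 5.x records it; a minimal sketch (the search terms are examples, nothing host-specific):

    # Look for all-paths-down / device-loss events around the time the
    # datastore disappeared
    grep -iE "APD|all paths down|permanent device loss" /var/log/vmkernel.log | tail -n 20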
If you need information that's been redacted, please let me know. I covered up some version information, but overall it's ESXi 5.5. Also, I'm holding off on rebooting until I've finished gathering information.
Hi, I have been trying to configure the Teredo tunneling adapter, and I was able to route all my IPv6 traffic with the following netsh command: 'netsh interface ipv6 add route ::/0 interface=13', where 13 is the Teredo tunneling adapter's interface index. With that I am able to ping ipv6.google.com (the only server I know of that responds only to IPv6 traffic).

The thing is, when I reset the network adapter (or reboot), the route stops working even though it still exists (re-running 'netsh interface ipv6 add route ::/0 interface=13' reports that it is already there). I have to remove it and then add it again.

My Teredo configuration is:

Teredo Parameters
- Type: client (Group Policy)
- Server Name: teredo.ipv6.microsoft.com
- Client Refresh Interval: 60 seconds
- Client Port: 34567
- State: qualified
- Client Type: teredo host-specific relay
- Network: unmanaged
- NAT: restricted
- NAT Special Behaviour: UPNP: No, PortPreserving: Yes
- Local Mapping: 192.168.1.100:34567
- External NAT Mapping: xxx.xxx.xx.xxx:34567

It can be noted that I have forced the state to Qualified through local policy, so the Teredo adapter shouldn't go 'Dormant'.
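A minimal workaround sketch along the lines described, as a batch file; the file name is made up, the interface index 13 comes from the post above and will differ per machine, and re-running this after every adapter reset is an assumption, not a confirmed fix:

    rem refresh-teredo-route.cmd: drop and re-add the default IPv6 route
    rem on the Teredo interface after the adapter resets
    netsh interface ipv6 delete route ::/0 interface=13
    netsh interface ipv6 add route ::/0 interface=13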
Hi Federico, I'm experiencing the same problem. Have you found a way to solve it?

I tried 'netsh interface ipv6 add route ::/0 interface=15 validlifetime=infinite preferredlifetime=infinite store=persistent' and got 'Persistent aging routes are not supported. To create non-persistent aging routes, specify the active store.' But 'netsh interface ipv6 add route ::/0 interface=15 store=persistent' produced no error. The article Nicholas provided said that infinite is the default value. Confused. I don't quite understand the meaning of validlifetime and preferredlifetime. Does it mean that an infinite route can't be persistent? :-)
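A hedged guess from the error text above: passing the lifetime parameters explicitly may be what makes netsh treat the route as an "aging" one, which it refuses to put in the persistent store, whereas omitting them (they default to infinite) lets 'store=persistent' go through. A minimal way to verify, using the same interface index 15 from the post above:

    rem add the default route to the persistent store, leaving the
    rem lifetimes at their defaults
    netsh interface ipv6 add route ::/0 interface=15 store=persistent
    rem list routes and confirm the entry is still present after a reboot
    netsh interface ipv6 show route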