mnagel

datasource migration function


I have now run into too many cases where a new but slightly different DS is set up because of LM support actions, upgrades, etc., and the result is lost or discontinuous data.  A good example I recently encountered is NTP.  The standard DS was not working in all cases.  I was given a new DS that uses Groovy, and it works (which I appreciate!).  But the datapoint list and names have changed, and even if they had not, there is no way to carry data history over from the old DS to the new DS.  My recommendation is to add a migrate function that lets you specify how old datapoints map to new ones in such a situation, and thus avoid data loss.  Building a default migration ruleset into a new DS would be a bonus, since this could allow zero-touch data migrations in at least some cases.
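To make the idea concrete, here is a rough sketch of what such a mapping could look like. This is not an existing LM feature or API. The function name `migrate_history`, the in-memory data layout, and the datapoint names in the usage note are all made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

# One stored sample: (epoch seconds, value). Hypothetical layout.
Sample = Tuple[int, float]
History = Dict[str, List[Sample]]


def migrate_history(
    old_history: History,
    mapping: Dict[str, Optional[str]],
) -> Tuple[History, List[str]]:
    """Re-key old datapoint history under the new datapoint names.

    `mapping` maps old datapoint name -> new datapoint name.
    Old datapoints that are missing from the mapping (or mapped to None)
    are not migrated; they are reported back so nothing is dropped silently.
    """
    migrated: History = {}
    unmapped: List[str] = []

    for old_name, samples in old_history.items():
        new_name = mapping.get(old_name)
        if new_name is None:
            unmapped.append(old_name)
            continue
        # Several old datapoints may fold into one new datapoint.
        migrated.setdefault(new_name, []).extend(samples)

    # Keep each migrated series in time order.
    for samples in migrated.values():
        samples.sort(key=lambda sample: sample[0])

    return migrated, sorted(unmapped)
```

For example, with a hypothetical ruleset `{"offset_ms": "offset", "stratum": "stratum"}`, history stored under `offset_ms` would be carried over to `offset`, and any old datapoint without a rule would come back in the unmapped list for the administrator to review. A default ruleset shipped with the new DS would just be a prefilled `mapping`.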

Thanks,
Mark

 



Two things:

1.  I'd greatly appreciate it if you could share that datasource.  Is this the one in the official repository?

2.  I largely agree with your point that it's not always obvious when a datasource change or update is going to cause data loss, a pain I've experienced a few too many times.  Even when updating official datasources, it's a risk because of the custom applies-to functions we might be using.  It would be great if there were at least some logic that let the administrator import a datasource and choose whether or not to override the applies-to function.  Or maybe even (for advanced users) make manual changes to the XML doc before importing to prevent datapoint renaming.


I agree too that updating DataSources presents a data loss risk.  I've stopped updating DataSources now, unless we find a bug.


@Brandon I am not sure whether the fixed NTP DS is the one now in the repo.  Support gave it to me after I found the original one broken on various devices.

@Mosh Yeah, I have been bitten by the DS import many times.  A big problem is the difference display (see the recent FR I posted on that).  A simple first pass would be to refuse to import a replacement DS if any datapoints would be removed, unless a force indicator is given.  That is really a no-brainer and would avoid many problems.  My earlier recommendation covers how to handle transforming one DS into a new DS when the datapoints are basically the same from one to the other, but renamed/reshuffled.  In my experience, this sort of thing does not get nearly enough attention, which is unfortunate, as data loss is a really bad result in a monitoring solution.
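The "simple first pass" amounts to a set difference plus a flag. A minimal sketch, assuming the importer can see the datapoint names of both the installed DS and the incoming one. The names `check_import` and `DatapointRemovalError` are hypothetical and not part of any LM interface.

```python
from typing import Iterable, Set


class DatapointRemovalError(Exception):
    """Raised when an import would drop datapoints and force was not given."""


def check_import(
    existing: Iterable[str],
    incoming: Iterable[str],
    force: bool = False,
) -> Set[str]:
    """Return the datapoints the import would remove.

    Refuses the import (raises) if anything would be removed and the
    caller did not explicitly pass force=True.
    """
    removed = set(existing) - set(incoming)
    if removed and not force:
        names = ", ".join(sorted(removed))
        raise DatapointRemovalError(
            f"import would remove datapoints: {names} (use force to proceed)"
        )
    return removed
```

A renamed datapoint shows up here as one removal plus one addition, which is exactly the case where the mapping approach from my first post would be needed instead of a plain forced import.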


This is another area in LM long overdue for improvement, and it just came up again because of the new "Gen 3" VMware datasources.  In some cases, "Gen 2" datasources were split into multiple new datasources (a bit trickier to deal with), but in other cases the name was changed and either nothing else changed, or perhaps some datapoints changed.  Changing the name of a datasource causes all of the following issues, and probably others I am not thinking of:

* historical data loss
* reference breakage (widgets, alert rules, etc.)
* instance tuning loss (custom thresholds, instance descriptions, group tags, etc.)

It is understandable that the naming change clearly makes the DS set "go together", but the benefit of that is far lower than the problems created.  If there was a method to upgrade/migrate the existing datasources so the new ones take effect without breaking stuff as I asked for in Aug 2017, that would indicate the developers understand this system is for monitoring and should not be arbitrarily broken for aesthetic reasons like in this case.  I was advised by support that the solution is just "don't use the new datasources".  Surely this is not a suitable answer?  I have been told there is work being done on this front, but it is unclear what will come of it and when.


Hi, 

Monitoring Engineering team lead here.

I wrote these and would like to provide some insight into why the VMware Gen3 datasources are the way they are. We introduced a slew of history-breaking changes for reasons that run a lot deeper than aesthetics, including better alerting (e.g. ESX datasources only alert when a server is in standalone mode), new data points (status data points for everything from VMs to datastores, plus support for relaying vCenter alerts), and easing the collector's load on the VMware API. And yes, aesthetics played a part. I think these look nicer, but I'm biased.

In the end I made the call to split and rename the datasources for the following reasons.

  • vCenter and ESXi are different products. They expose similar APIs but react to them in different ways. We want the flexibility to extract different information from both these products independently.
  • Improved monitoring now > keeping history. It's always a balance but in this case new features justified it. 
  • We renamed them in such a way that customers could choose to keep the old ones without us removing some data points. When doing destructive operations it's always better to give you the power to decide if/when to do it.
  • These datasources pave the way for other features like topologies and such.
  • Some datasources change AD behavior completely (HW sensors) causing complete loss of history.

That being said, we do need better module migration tools to make these changes less painful. We are keenly aware of it and it’s actively being worked on.

Regards,

