Opened 9 months ago

Last modified 2 months ago

#85 reopened Service Improvement

Add error recovery to NightlySnapshots.

Reported by: D Delmar Davis
Owned by: D Delmar Davis
Priority: Priority
Milestone: Make Shit Happen / Own Your Shit.
Component: Development
Keywords: Python lxd backups
Cc: Joe Dumoulin

Description

Because LXD has a mind of its own (grumble grumble), it sometimes leaves ZFS in a funny state (for example, a container dataset LXD refuses to mount; see comment:1).
As the scripts are currently written, a single failure means none of the remaining containers get backed up.

A little

try:
    ...
except Exception:
    ...

is in order here.
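The per-container try/except suggested above could look roughly like the sketch below. It assumes the nightly job can be structured as a loop over container names, with a hypothetical `snapshot` callable doing the actual per-container work (the real NightlySnapshots code is not shown in this ticket). Failures are logged and collected instead of aborting the run.

```python
import logging
from typing import Callable, Dict, Iterable

log = logging.getLogger("nightly_snapshots")


def snapshot_all(
    names: Iterable[str], snapshot: Callable[[str], None]
) -> Dict[str, Exception]:
    """Snapshot every container, continuing past individual failures.

    `snapshot` is a hypothetical per-container callable standing in for the
    existing backup logic. Returns {container name: exception} for every
    container that failed, so the caller can report them at the end.
    """
    failures: Dict[str, Exception] = {}
    for name in names:
        try:
            snapshot(name)
        except Exception as exc:  # one bad container must not stop the rest
            log.exception("snapshot of %s failed", name)
            failures[name] = exc
    return failures
```

Returning the failures rather than swallowing them leaves room for the job to report them, for example by exiting non-zero when the dict is not empty so cron notices.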

Change History (8)

comment:1 Changed 9 months ago by D Delmar Davis

LXDAPIException: Create instance snapshot (mount source): Failed to run: zfs mount infra/containers/naomi: cannot mount 'infra/containers/naomi': filesystem already mounted

It is completely unintuitive that, despite the "already mounted" error, the way to resolve this is to mount the filesystem yourself and then take the snapshot again.

root@kb2018:/etc/ansible# zfs mount infra/containers/naomi
root@kb2018:/etc/ansible# lxc snapshot naomi
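The manual workaround above (run `zfs mount` on the dataset, then snapshot again) could be folded into the script as a single retry. This is a sketch under a few assumptions: the exception text contains the ZFS message `filesystem already mounted`, as in the log above; container datasets live under `infra/containers/<name>` as `naomi` does (the `dataset_prefix` parameter is hypothetical); and `snapshot` is a hypothetical callable, for instance one that runs `lxc snapshot <name>`.

```python
import subprocess
from typing import Callable

# Substring of the ZFS error that LXD surfaced in comment:1.
ALREADY_MOUNTED = "filesystem already mounted"


def zfs_mount(dataset: str) -> None:
    """Run `zfs mount <dataset>`, the manual fix from comment:1."""
    subprocess.run(["zfs", "mount", dataset], check=True)


def snapshot_with_remount(
    name: str,
    snapshot: Callable[[str], None],
    mount: Callable[[str], None] = zfs_mount,
    dataset_prefix: str = "infra/containers",  # hypothetical: layout seen in comment:1
) -> None:
    """Snapshot `name`; on the 'already mounted' error, mount the dataset and retry once.

    `snapshot` is a hypothetical per-container callable. Any other error is
    re-raised unchanged so the caller can log it.
    """
    try:
        snapshot(name)
    except Exception as exc:
        if ALREADY_MOUNTED not in str(exc):
            raise  # not the known ZFS glitch: let the caller handle it
        mount(f"{dataset_prefix}/{name}")
        snapshot(name)  # a second failure propagates to the caller
```

Only one retry is attempted, so a container that stays broken still fails loudly instead of looping.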

comment:2 Changed 9 months ago by D Delmar Davis

Fixed the ZFS/LXD container side.
When we have a successful nightly backup, I will pull the code changes from the repo.

comment:3 Changed 7 months ago by D Delmar Davis

Resolution: Some Other Time
Status: assigned → closed

The code is written, but it would be easier to debug if there were actual errors to test against.

comment:4 Changed 4 months ago by D Delmar Davis

Resolution: Some Other Time
Status: closed → reopened

Now that we have a pooched container, we can test this.

comment:5 Changed 4 months ago by D Delmar Davis

This is done, and any follow-up can be filed as an issue on the Bitbucket repository.

comment:6 Changed 4 months ago by D Delmar Davis

Resolution: Done
Status: reopened → closed

comment:7 Changed 2 months ago by D Delmar Davis

Resolution: Done
Status: closed → reopened

Given Atlassian's complete lack of responsibility toward its Confluence and cloud clients (down for weeks without any data recovery), I am thinking Bitbucket is not where we should put our issues.

I need to figure out something "on prem" and more useful than our current ticketing.

comment:8 Changed 2 months ago by D Delmar Davis

Also, the backup script needs to delete the Spares after ensuring that at least one copy is there.
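One way to read this is as a guard: the Spares are deleted only once at least one copy is confirmed to exist. The sketch below assumes copies and spares can be identified by name (e.g. file paths) and takes hypothetical `is_present` and `delete` callables; what exactly the Spares are in the backup layout is not stated in this ticket.

```python
from typing import Callable, List, Sequence


def prune_spares(
    copies: Sequence[str],
    spares: Sequence[str],
    is_present: Callable[[str], bool],
    delete: Callable[[str], None],
) -> List[str]:
    """Delete spares only if at least one copy is confirmed present.

    `is_present` and `delete` are hypothetical hooks into the backup storage.
    Returns the spares that were deleted (empty if no copy was confirmed).
    """
    if not any(is_present(c) for c in copies):
        return []  # no confirmed copy: keep every spare as a fallback
    deleted: List[str] = []
    for spare in spares:
        delete(spare)
        deleted.append(spare)
    return deleted
```

For plain files, `os.path.exists` would work as `is_present`, though it only proves a copy is there, not that it is intact.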
