OLPC:Volunteer Infrastructure Group/2008-08-19-internal

<hhardy> hello all
<hhardy> kimquirk: I don't see anyone here but us and a bunch of bots I think
<reubencaron> hhardy: im here
<hhardy> yay!
<kimquirk> i'm here
<isforinsects> Hello
<m_stone> likewise, though I'm hoping to stay quiet...
<dogi> hi hhardy
<kimquirk> hhardy: do you have some agenda items you can post here?
<hhardy> hi dogi please introduce yourself
<hhardy> from italy?
* adric (~adric@c-24-126-182-240.hsd1.ga.comcast.net) has joined #olpc-admin
* hhardy has changed the topic to: sysadmin meeting agenda
<dogi> i was at the mit museum 2 weeks ago
<dogi> yes i'm from italy
<hhardy> dev home directory is 95% full
<hhardy> status of weka, owl and swan
<hhardy> status of volunteer infrastructure group (infrastructure-gang)
<hhardy> how to host community XO resources such as large media files and file conversion utilities
<hhardy> new business
<hhardy> welcome 
<hhardy> this is a short meeting devoted to our internal IT issues
>dogi< welcome :)
<hhardy> the infrastructure group mtng is at 5:15
* adric is early. Should I duck out?
<hhardy> since this is now an open channel it's ok to lurk tho
<adric> kk, ty
<hhardy> we need to have a housecleaning on dev, /home is 95% full
<kimquirk> hhardy: what's the plan for dev?
<hhardy> a lot of it is backups of grinch which cjb and I made
<hhardy> also there are some individual user directories with millions of files
<hhardy> cjb and I will clean up the backups
<hhardy> I will email devel and ask people to clean up or tar and compress
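Before emailing people to clean up, it helps to know which directories hold the most files and bytes. The following is a minimal sketch, not the tooling actually used on dev: the function name `usage_by_subdir` is hypothetical, and it assumes the caller can read the directories under the chosen root.

```python
import os


def usage_by_subdir(root):
    """Return {name: (file_count, total_bytes)} for each directory directly under root."""
    report = {}
    for entry in os.scandir(root):
        # Only report real directories; do not follow symlinks out of the tree.
        if not entry.is_dir(follow_symlinks=False):
            continue
        count = 0
        size = 0
        for dirpath, _dirnames, filenames in os.walk(entry.path):
            for name in filenames:
                path = os.path.join(dirpath, name)
                try:
                    size += os.lstat(path).st_size
                    count += 1
                except OSError:
                    # The file vanished or is unreadable; skip it.
                    continue
        report[entry.name] = (count, size)
    return report


# Example call (the path is an assumption; adjust for the host in question):
# for name, (count, size) in sorted(usage_by_subdir("/home").items(),
#                                   key=lambda item: item[1][0], reverse=True):
#     print(f"{name}: {count} files, {size} bytes")
```

Sorting by file count rather than bytes surfaces the "millions of files" directories, which can be slow to back up even when they are small on disk.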
<hhardy> status of weka is unknown I've not received info from dgilmore
<hhardy> I will have to look into it, will open ticket
<hhardy> mstone said it isn't a blocker for now
<hhardy> http://rt.laptop.org/Ticket/Display.html?id=19059
<m_stone> (we'll use the same infrastructure we used for the last release. it's not as nice as what dgilmore wanted to do, but it will let us scrape by)
* hhardy has changed the topic to: status of weka, owl and swan
<hhardy> ok
* hhardy has changed the topic to: status of volunteer infrastructure group (infrastructure-gang)
* gregdek (~gdk@wireless-nat-pool-rdu.redhat.com) has joined #olpc-admin
<hhardy> we have a handful of people
<hhardy> adric has expressed interest in helping maintain rt
<hhardy> gregdek is looking for a few fedora stalwarts to assist
<CanoeBerry_> we too interested in maintaining RT, but yes i've been slow to get up to speed :)
<hhardy> christoph will run an article if we send him 300-500 words
<hhardy> caryl and culseg and some of the support gang volunteers are being supportive to me
<hhardy> noah would like to spearhead trac maintenance
<hhardy> we will get into details at the 5:15 volunteer infrastructure gang mtng
<hhardy> any questions or discussion?
<isforinsects> I updated the wiki page with henry's draft to the devel list(s)
<hhardy> yes thanks isfor
<hhardy> and also for meeting transcripts
<isforinsects> (http://wiki.laptop.org/go/OLPC:Volunteer_Infrastructure_Group)
* hhardy has changed the topic to: status of volunteer infrastructure group (infrastructure-gang)
* hhardy has changed the topic to: how to host community XO resources such as large media files and file conversion utilities
<hhardy> sj was again pressing me that we need in his opinion a big fast machine purely dedicated to hosting community content
<hhardy> the push back is this is a thing that maybe could be provided by the community if we ask
<hhardy> disussion?
<hhardy> discussion, rather?
<isforinsects> This would be helpful for a few ongoing projects of mine and in the community to scrape / process public databases of medical information
<isforinsects> cjl's medline plus project would likely be helped by it, he has been slowed down by the need to create a *nix box.
<cjl> true
<isforinsects> And I've found a repository of ~51GB of medical images that I can migrate over to wiki commons, and would likely provide a great medical image database for future health content of any kind.
<hhardy> we do seem to have a number of people using our computers to produce videos 
<hhardy> and doing format conversion of video does use a lot of processor
<hhardy> doing chuck's video took me an hour on my workstation
<hhardy> which is a dual core intel box
<hhardy> that was for 4 minutes of video
<isforinsects> I have ~1500 audio books that could be processed that I can't get archive.org to do for us.
<hhardy> we can get a workstation like the one that I got for cjb for < $1000
<hhardy> I asked sj to submit a ticket and explain what the business requirements are
<hhardy> although to be fair he had a trac ticket about this way before I was hired
<hhardy> we will probably have more requirements for making multimedia presentations as we run up to G1G1 2.0
<hhardy> does SJ have a budget for community content?
<hhardy> I have owl booted up using the crank files and os by hand-remounting and chrooting everything
<hhardy> I could use some help with fixing things so we could automagically fail over to it
<hhardy> I will see if we can find a community person to assist
<hhardy> it would be ideal if they could come onsite since we will need to be rebooting and if things are wrong it wont come back on the net necessarily
<hhardy> swan is up with the shadow of pedal on it but I havent been messing with auto-remounting and booting from it
<hhardy> regarding weka I am questioning if we should reinstall it as a fedora 9 box, this would be pretty straight-forward but we need to determine if we would be losing any useful work which dgilmore may have done
<hhardy> I dont see the logic of doing fedora9 builds in ubuntu
* hhardy has changed the topic to: any new business?
<nteon> hhardy: status of mud.laptop.org? :)
<cjl> hhardy  What do you have in mind for failover cluster? Heartbeat?
<hhardy> I had in mind a manual failover
<hhardy> but doing it automatically would be nice
<cjl> maybe look at http://fedoraproject.org/wiki/Heartbeat
<hhardy> lets pursue this in the 5:15
<hhardy> bookmarking that
<cjl> I have an oracle failsafe cluster on two very impt machines, helps me sleep easier.
<gregdek> What meeting is this, btw?
<hhardy> fedora has a lot of infrastructure like fas we could leverage if we move to fedora as our base server os over time
<cjl> sorry
<hhardy> this here is the internal regular sysadmin meeting
<gregdek> :)
<gregdek> Okeydoke.
<hhardy> internal but open
<hhardy> as of last week
<gregdek> Heh.
<m_stone> hhardy: it has always been open.
<m_stone> hhardy: we've just never invited people before.
<hhardy> yes
* hhardy has changed the topic to: new business if any
<hhardy> heartbeat looks cool I like it
<hhardy> any new discussion?
* cjb (~cjb@pullcord.laptop.org) has joined #olpc-admin
<hhardy> I'm not sure if we can use arp to switch from the w91 subnet to 1cc tho will have to investigate
<hhardy> cjb discussing http://fedoraproject.org/wiki/Heartbeat
* cjl thought it would be of interest, will hold further comment for 45 min :-)
<hhardy> if they are over at w91 as we have discussed should be ok
<hhardy> he  ok we are about done here isfor can you send/post transcripts? perchance?
<cjb> cjl: I think it works for when you have primary machines and backup machines and want to fail over between them.
<cjb> cjl: We don't have any backup machines.
<hhardy> in theory we have owl for crank and swan for pedal
<kimquirk> so... what is the plan for owl and swan for getting them booting properly?
<cjb> hhardy: We have backup hardware, yes.
<kimquirk> do you need people help?
<hhardy> I was saying I have owl running chrooted to the dev files which are there
<hhardy> I will see if cjl and others want to pitch in
<hhardy> to make this a real high availability solution rather than my bubblegum and duct tape solution
<kimquirk> sounds good. i like the fail-over, heartbeat solution
<hhardy> cjl are you near boston?
<cjb> hhardy: are you talking about heartbeat, or getting the new hardware up?
<cjb> (with the "to make this a.." part)
<cjl> no, I would suggest reaching out to http://www.linux-ha.org/ContactUs
<hhardy> bookmarked and cool
<hhardy> we can do it with me as remote hands
<hhardy> saturday?
<cjb> I'm really confused.
<cjb> What is being set up?  Why would we set up a HA infrastructure when we don't have any spare machines?  Shouldn't we instead set up the spare machines?
<hhardy> owl is a spare machine for crank and swan is a spare machine for pedal
<cjb> Right, and they're not set up yet.
<hhardy> in what respect?
<cjb> Are you talking about setting them up on saturday, or setting heartbeat up on saturday?
<cjl> hhardy, I must admit I don't know Heartbeat in detail, I run an active:active cluster using other tools (not linux), it works really nicely.  Two machines with their own jobs prepared to keep things running (slower) if one fails.
<cjb> hhardy: In the respect that they don't boot?
<hhardy> cjb: the former first
<hhardy> swan boots fine, I munged the boot loader on owl however
<hhardy> I can certainly reinstall ubuntu on it in the root partition I made without losing the mirror of crank
<kimquirk> is the requirement that swan is a fail-over mirror of pedal?
<kimquirk> that everything that is written to pedal is also written to swan?
<hhardy> crank aka dev -> owl
<kimquirk> and if pedal dies, then swan takes over?
<hhardy> pedal aka mail -> swan
<cjb> cool, okay.  I'd like us to spend a few weeks understanding how to keep the machines synchronized and if there's anything to be aware of before we start switching over hardware on the fly.
<hhardy> thats the suggestion of the HA solution
<hhardy> cjb: agree
<m_stone> ...
<cjb> HA isn't something you just install; you first design your servers and database such that they *can* be swapped over.  We haven't started that yet.
<m_stone> can someone please summarize why we aren't sticking to the original plan here?
<m_stone> cjb++
<hhardy> m_stone in what respect?
<m_stone> the original plan is a simple single snapshot backup so that we feel comfortable applying some of the numerous package updates that have been queued up for months on crank and pedal
<hhardy> that has been done
<m_stone> no automated failover
<hhardy> we are talking about next steps now
<m_stone> and have crank and pedal been updated?
<m_stone> hhardy: you said that your backups don't boot yet.
<m_stone> hhardy: (that one boots with some manual jiggery-pokery)
<m_stone> that's not a finished backup
<hhardy> that's true because boot loader, fstab, netconfig needs to change
<cjb> I think the next steps are:
<cjb>  * make owl and swan able to boot reliably
<hhardy> right now it is a rsync image of them
<cjb>  * make them run on their own IPs
<cjb>  * sync up the rsync again
<cjb>  * run something to replicate the databases on the primary machines with the secondary
<cjb>  * perform some manual failovers during a preannounced outage period and test the result very carefully
<cjb>  * begin to consider a HA solution to automate the same.
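The "sync up the rsync again" step in the list above could be scripted so that it is always previewed before it changes anything. This is a sketch under stated assumptions, not the command actually used for owl and swan: the helper name `rsync_command`, the host name, and the paths are all hypothetical. The flags themselves are standard rsync options (`-a` archive, `-H` preserve hard links, `-x` stay on one filesystem, `--numeric-ids`, `--delete`, `--dry-run`).

```python
def rsync_command(source_dir, dest_host, dest_dir, dry_run=True):
    """Build an rsync argument list that mirrors source_dir onto dest_host:dest_dir."""
    cmd = ["rsync", "-aHx", "--numeric-ids", "--delete"]
    if dry_run:
        # Report what would be transferred or deleted without changing anything.
        cmd.append("--dry-run")
    # A trailing slash on the source copies its contents rather than the directory itself.
    cmd.append(source_dir.rstrip("/") + "/")
    cmd.append(f"{dest_host}:{dest_dir}")
    return cmd


# Example (hypothetical host and paths); pass the list to subprocess.run to execute it:
# print(" ".join(rsync_command("/home", "owl.example.org", "/mnt/mirror/home")))
```

Defaulting to a dry run is a deliberate choice here: `--delete` removes files on the mirror that no longer exist on the primary, so a preview is worth the extra step.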
<m_stone> that sounds close to right to me, but it doesn't address the package updates which were one of the major points of this exercise.
<m_stone> also, we don't seem to be proceeding according to that plan
<m_stone> (or rather, we're seriously discussing diverging from it)
<hhardy> updating http://rt.laptop.org/Ticket/Display.html?id=19035 per cjb's list
<cjb> m_stone: yes, I don't understand why either.
<hhardy> in what respect are we not following plan?
<m_stone> work on database replication and HA are not needed to attempt to perform some package updates.
<hhardy> correct
<m_stone> they are simply interesting future work.
<hhardy> right
<m_stone> hhardy: does that answer your question?
<hhardy> no
<hhardy> seems like we are talking about finishing the plan then taking next steps
<m_stone> hhardy: what you described sounded (and continues to sound) like an entirely different plan to me.
<m_stone> (and I think also to cjb?)
<hhardy> no, it isn't
<m_stone> ah. perhaps you can adjust your description so that I understand how it encompasses the original plan I described?
<hhardy> apt-get -s update on pedal shows 0 packages needing update now
<cjb> hhardy: It doesn't here.
<hhardy> hmmm
<cjb> Are you running update, or upgrade?  Update doesn't do anything except refresh the package list.
<hhardy> ah yes
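As cjb points out, `apt-get update` only refreshes the package list, so simulating it says nothing about pending upgrades; the simulation that answers the question is `apt-get -s upgrade`. A minimal sketch of reading the result follows. It assumes the summary line apt normally prints at the end of a simulated upgrade ("N upgraded, N newly installed, N to remove and N not upgraded."), and the function name `pending_upgrades` is hypothetical.

```python
import re

# Matches the summary line apt prints for an upgrade run (real or simulated).
_SUMMARY = re.compile(
    r"(\d+) upgraded, (\d+) newly installed, (\d+) to remove and (\d+) not upgraded"
)


def pending_upgrades(simulated_output):
    """Return the 'upgraded' count from `apt-get -s upgrade` output, or None if absent."""
    match = _SUMMARY.search(simulated_output)
    return int(match.group(1)) if match else None


# Example: feed it the captured output of the simulation, e.g.
# out = subprocess.run(["apt-get", "-s", "upgrade"],
#                      capture_output=True, text=True).stdout
# print(pending_upgrades(out))
```

Returning `None` when no summary line is found keeps "nothing to upgrade" (0) distinct from "the output was not what we expected".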
<hhardy> well we can upgrade on pedal any time according to the "original plan" as stated above then
<hhardy> since swan boots and it wont take long to re-rsync it
<m_stone> hhardy: schedule downtime first.
<hhardy> if I reinstall the boot and root on owl it will be at the same state
<m_stone> hhardy: and we need to test the manual failover first as well.
<m_stone> as cjb stated.
<hhardy> having the rsync image is one thing, making the stuff bootable is where I could use some help
<m_stone> okay. we'll see what we can do.
<m_stone> and perhaps someone who visits us in 20 minutes will also be able to assist, so long as we can clearly state what we want.
<hhardy> yes
<cjb> ooh, who's that?
<m_stone> (or who's present already?)
<m_stone> cjb: it's a wish, not a promise!
<hhardy> maybe cjl by the sound of it
<cjb> ah, Blake, perhaps?
<hhardy> ah blake yay!
<m_stone> cjb: he probably good if we asked him...
<m_stone> *could
<hhardy> splendid
<cjb> yes, I'd be very happy with having Blake help
<m_stone> I hadn't thought of it.
<m_stone> do we mean the same Blake?
<hhardy> do we?
<m_stone> i.e. Blaketh?
<cjb> yes
<hhardy> your friend from nyc?
* cjb knows no other Blakes.
<hhardy> yes
<m_stone> ah. well, we could certainly ask. I wasn't thinking of him in particular.
<cjb> m_stone: is someone else visiting us in 20 minutes
<cjb> ?
<hhardy> more like he could help if its of interest
<cjb> It was a guess based on the likelihood of who might be visiting us.
<hhardy> we have the volunteer infra group mtng in 20 min
<m_stone> cjb: no, I was expressing hope that someone _might_ visit us.
<cjb> ohh :)
<cjb> okay.  I think this requires 1cc presence.
<hhardy> I dont want to hijack your friends
<cjb> at least until the machine boots.
<hhardy> or at least me here as remote hands yes
<m_stone> cjb: sure.
<m_stone> cjb: I'll pass along your interest, though. :)
<hhardy> cool ok break till 5:15 then for those returning else thanks for coming!