OLPC:Volunteer Infrastructure Group/2008-08-19-internal
<hhardy> hello all
<hhardy> kimquirk: I don't see anyone here but us and a bunch of bots I think
<reubencaron> hhardy: I'm here
<hhardy> yay!
<kimquirk> i'm here
<isforinsects> Hello
<m_stone> likewise, though I'm hoping to stay quiet...
<dogi> hi hhardy
<kimquirk> hhardy: do you have some agenda items you can post here?
<hhardy> hi dogi please introduce yourself
<hhardy> from italy?
* adric (~adric@c-24-126-182-240.hsd1.ga.comcast.net) has joined #olpc-admin
* hhardy has changed the topic to: sysadmin meeting agenda
<dogi> i was at the mit museum 2 weeks ago
<dogi> yes i'm from italy
<hhardy> dev home directory is 95% full
<hhardy> status of weka, owl and swan
<hhardy> status of volunteer infrastructure group (infrastructure-gang)
<hhardy> how to host community XO resources such as large media files and file conversion utilities
<hhardy> new business
<hhardy> welcome
<hhardy> this is a short meeting devoted to our internal IT issues
>dogi< welcome :)
<hhardy> the infrastructure group mtng is at 5:15
* adric is early. Should I duck out?
<hhardy> since this is now an open channel it's ok to lurk tho
<adric> kk, ty
<hhardy> we need to have a housecleaning on dev, /home is 95% full
<kimquirk> hhardy: what's the plan for dev?
<hhardy> a lot of it is backups of grinch which cjb and I made
<hhardy> also there are some individual user directories with millions of files
<hhardy> cjb and I will clean up the backups
<hhardy> I will email devel and ask people to clean up or tar and compress
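A housecleaning pass of the sort hhardy describes might look like this; it is only a sketch, and the directory and archive names are illustrative, not dev's actual layout:

 # see how close the /home filesystem is to full
 df -h /home
 # rank home directories by size to find the biggest consumers
 du -sk /home/* | sort -rn | head -20
 # a user can pack a finished tree into a single compressed archive...
 tar czf ~/old-builds.tar.gz -C ~ old-builds
 # ...and remove the original only after the archive lists cleanly
 tar tzf ~/old-builds.tar.gz >/dev/null && rm -rf ~/old-builds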
<hhardy> status of weka is unknown, I've not received info from dgilmore
<hhardy> I will have to look into it, will open a ticket
<hhardy> m_stone said it isn't a blocker for now
<hhardy> http://rt.laptop.org/Ticket/Display.html?id=19059
<m_stone> (we'll use the same infrastructure we used for the last release. it's not as nice as what dgilmore wanted to do, but it will let us scrape by)
* hhardy has changed the topic to: status of weka, owl and swan
<hhardy> ok
* hhardy has changed the topic to: status of volunteer infrastructure group (infrastructure-gang)
* gregdek (~gdk@wireless-nat-pool-rdu.redhat.com) has joined #olpc-admin
<hhardy> we have a handful of people
<hhardy> adric has expressed interest in helping maintain rt
<hhardy> gregdek is looking for a few fedora stalwarts to assist
<CanoeBerry_> we too are interested in maintaining RT, but yes i've been slow to get up to speed :)
<hhardy> christoph will run an article if we send him 300-500 words
<hhardy> caryl and culseg and some of the support gang volunteers are being supportive to me
<hhardy> noah would like to spearhead trac maintenance
<hhardy> we will get into details at the 5:15 volunteer infrastructure gang mtng
<hhardy> any questions or discussion?
<isforinsects> I updated the wiki page with henry's draft to the devel list(s)
<hhardy> yes thanks isfor
<hhardy> and also for meeting transcripts
<isforinsects> (http://wiki.laptop.org/go/OLPC:Volunteer_Infrastructure_Group)
* hhardy has changed the topic to: how to host community XO resources such as large media files and file conversion utilities
<hhardy> sj was again pressing me that we need in his opinion a big fast machine purely dedicated to hosting community content
<hhardy> the push back is this is a thing that maybe could be provided by the community if we ask
<hhardy> discussion?
<isforinsects> This would be helpful for a few ongoing projects of mine and in the community to scrape / process public databases of medical information
<isforinsects> cjl's medline plus project would likely be helped by it, he has been slowed down by the need to create a *nix box.
<cjl> true
<isforinsects> And I've found a repository of ~51GB of medical images that I can migrate over to wiki commons, and would likely provide a great medical image database for future health content of any kind.
<hhardy> we do seem to have a number of people using our computers to produce videos
<hhardy> and doing format conversion of video does use a lot of processor
<hhardy> doing chuck's video took me an hour on my workstation
<hhardy> which is a dual core intel box
<hhardy> that was for 4 minutes of video
<isforinsects> I have ~1500 audio books that could be processed that I can't get archive.org to do for us.
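The kind of CPU-bound transcode hhardy is describing, sketched with ffmpeg; the filenames are made up, and Ogg Theora/Vorbis is just one plausible target since those are the free formats the XO plays out of the box:

 # re-encode a source video as Ogg Theora video with Vorbis audio;
 # -q:v and -q:a set encoder quality (higher is better), and the
 # encode is what keeps a CPU busy for an hour on a 4-minute clip
 ffmpeg -i interview-master.avi -c:v libtheora -q:v 6 \
        -c:a libvorbis -q:a 4 interview.ogv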
<hhardy> we can get a workstation like the one that I got for cjb for < $1000
<hhardy> I asked sj to submit a ticket and explain what the business requirements are
<hhardy> although to be fair he had a trac ticket about this way before I was hired
<hhardy> we will probably have more requirements for making multimedia presentations as we run up to G1G1 2.0
<hhardy> does SJ have a budget for community content?
<hhardy> I have owl booted up using the crank files and os by hand-remounting and chrooting everything
<hhardy> I could use some help with fixing things so we could automagically fail over to it
<hhardy> I will see if we can find a community person to assist
<hhardy> it would be ideal if they could come onsite since we will need to be rebooting, and if things are wrong it won't necessarily come back on the net
<hhardy> swan is up with the shadow of pedal on it but I haven't been messing with auto-remounting and booting from it
<hhardy> regarding weka I am questioning if we should reinstall it as a fedora 9 box, this would be pretty straightforward but we need to determine if we would be losing any useful work which dgilmore may have done
<hhardy> I don't see the logic of doing fedora 9 builds in ubuntu
* hhardy has changed the topic to: any new business?
<nteon> hhardy: status of mud.laptop.org? :)
<cjl> hhardy: What do you have in mind for failover cluster? Heartbeat?
<hhardy> I had in mind a manual failover
<hhardy> but doing it automatically would be nice
<cjl> maybe look at http://fedoraproject.org/wiki/Heartbeat
<hhardy> let's pursue this in the 5:15
<hhardy> bookmarking that
<cjl> I have an oracle failsafe cluster on two very impt machines, helps me sleep easier.
<gregdek> What meeting is this, btw?
<hhardy> fedora has a lot of infrastructure like fas we could leverage if we move to fedora as our base server os over time
<cjl> sorry
<hhardy> this here is the regular internal sysadmin meeting
<gregdek> :)
<gregdek> Okeydoke.
<hhardy> internal but open
<hhardy> as of last week
<gregdek> Heh.
<m_stone> hhardy: it has always been open.
<m_stone> hhardy: we've just never invited people before.
<hhardy> yes
* hhardy has changed the topic to: new business if any
<hhardy> heartbeat looks cool I like it
<hhardy> any new discussion?
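For context, a minimal Heartbeat v1 pairing of the kind cjl points to, using the crank/owl pair as the example; the shared IP and the service list are placeholders, not OLPC's real values:

 # /etc/ha.d/ha.cf (identical on both nodes)
 keepalive 2        # seconds between heartbeats
 deadtime 30        # declare the peer dead after 30s of silence
 bcast eth0         # interface carrying the heartbeats
 auto_failback on   # hand services back when the primary recovers
 node crank owl     # uname -n of both cluster members

 # /etc/ha.d/haresources (preferred owner, floating IP, services to move)
 crank IPaddr::10.0.0.50 apache2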
* cjb (~cjb@pullcord.laptop.org) has joined #olpc-admin
<hhardy> I'm not sure if we can use arp to switch from the w91 subnet to 1cc tho, will have to investigate
<hhardy> cjb discussing http://fedoraproject.org/wiki/Heartbeat
* cjl thought it would be of interest, will hold further comment for 45 min :-)
<hhardy> if they are over at w91 as we have discussed should be ok
<hhardy> ok we are about done here, isfor can you send/post transcripts perchance?
<cjb> cjl: I think it works for when you have primary machines and backup machines and want to fail over between them.
<cjb> cjl: We don't have any backup machines.
<hhardy> in theory we have owl for crank and swan for pedal
<kimquirk> so... what is the plan for owl and swan for getting them booting properly?
<cjb> hhardy: We have backup hardware, yes.
<kimquirk> do you need people help?
<hhardy> I was saying I have owl running chrooted to the dev files which are there
<hhardy> I will see if cjl and others want to pitch in
<hhardy> to make this a real high availability solution rather than my bubblegum and duct tape solution
<kimquirk> sounds good. i like the fail-over, heartbeat solution
<hhardy> cjl are you near boston?
<cjb> hhardy: are you talking about heartbeat, or getting the new hardware up?
<cjb> (with the "to make this a.." part)
<cjl> no, I would suggest reaching out to http://www.linux-ha.org/ContactUs
<hhardy> bookmarked and cool
<hhardy> we can do it with me as remote hands
<hhardy> saturday?
<cjb> I'm really confused.
<cjb> What is being set up? Why would we set up a HA infrastructure when we don't have any spare machines? Shouldn't we instead set up the spare machines?
<hhardy> owl is a spare machine for crank and swan is a spare machine for pedal
<cjb> Right, and they're not set up yet.
<hhardy> in what respect?
<cjb> Are you talking about setting them up on saturday, or setting heartbeat up on saturday?
<cjl> hhardy, I must admit I don't know Heartbeat in detail, I run an active:active cluster using other tools (not linux), it works really nicely. Two machines with their own jobs, prepared to keep things running (slower) if one fails.
<cjb> hhardy: In the respect that they don't boot?
<hhardy> cjb: the former first
<hhardy> swan boots fine, I munged the boot loader on owl however
<hhardy> I can certainly reinstall ubuntu on it in the root partition I made without losing the mirror of crank
<kimquirk> is the requirement that swan is a fail-over mirror of pedal?
<kimquirk> that everything that is written to pedal is also written to swan?
<hhardy> crank aka dev -> owl
<kimquirk> and if pedal dies, then swan takes over?
<hhardy> pedal aka mail -> swan
<cjb> cool, okay. I'd like us to spend a few weeks understanding how to keep the machines synchronized and if there's anything to be aware of before we start switching over hardware on the fly.
<hhardy> that's the suggestion of the HA solution
<hhardy> cjb: agree
<m_stone> ...
<cjb> HA isn't something you just install; you first design your servers and database such that they *can* be swapped over. We haven't started that yet.
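Keeping a standby synchronized in the way being discussed could start with a periodic pass like this, run on the standby; the hostname, destination path, and exclude list are assumptions, and the real ones would need care:

 # mirror crank's live root onto local disk: -a preserves ownership and
 # permissions, -x stays on one filesystem, --delete keeps the copy
 # exact, --numeric-ids avoids uid/gid remapping between machines
 rsync -ax --delete --numeric-ids \
       --exclude=/proc --exclude=/sys --exclude=/tmp \
       root@crank.laptop.org:/ /srv/crank-mirror/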
<m_stone> can someone please summarize why we aren't sticking to the original plan here?
<m_stone> cjb++
<hhardy> m_stone: in what respect?
<m_stone> the original plan is a simple single snapshot backup so that we feel comfortable applying some of the numerous package updates that have been queued up for months on crank and pedal
<hhardy> that has been done
<m_stone> no automated failover
<hhardy> we are talking about next steps now
<m_stone> and have crank and pedal been updated?
<m_stone> hhardy: you said that your backups don't boot yet.
<m_stone> hhardy: (that one boots with some manual jiggery-pokery)
<m_stone> that's not a finished backup
<hhardy> that's true because boot loader, fstab, netconfig need to change
<cjb> I think the next steps are:
<cjb> * make owl and swan able to boot reliably
<hhardy> right now it is an rsync image of them
<cjb> * make them run on their own IPs
<cjb> * sync up the rsync again
<cjb> * run something to replicate the databases on the primary machines with the secondary
<cjb> * perform some manual failovers during a preannounced outage period and test the result very carefully
<cjb> * begin to consider a HA solution to automate the same.
<m_stone> that sounds close to right to me, but it doesn't address the package updates which were one of the major points of this exercise.
<m_stone> also, we don't seem to be proceeding according to that plan
<m_stone> (or rather, we're seriously discussing diverging from it)
<hhardy> updating http://rt.laptop.org/Ticket/Display.html?id=19035 per cjb's list
<cjb> m_stone: yes, I don't understand why either.
<hhardy> in what respect are we not following the plan?
<m_stone> work on database replication and HA is not needed to attempt to perform some package updates.
<hhardy> correct
<m_stone> they are simply interesting future work.
<hhardy> right
<m_stone> hhardy: does that answer your question?
<hhardy> no
<hhardy> seems like we are talking about finishing the plan then taking next steps
<m_stone> hhardy: what you described sounded (and continues to sound) like an entirely different plan to me.
<m_stone> (and I think also to cjb?)
<hhardy> no, it isn't
<m_stone> ah. perhaps you can adjust your description so that I understand how it encompasses the original plan I described?
<hhardy> apt-get -s update on pedal shows 0 packages needing update now
<cjb> hhardy: It doesn't here.
<hhardy> hmmm
<cjb> Are you running update, or upgrade? Update doesn't do anything except refresh the package list.
<hhardy> ah yes
<hhardy> well we can upgrade on pedal any time according to the "original plan" as stated above then
<hhardy> since swan boots and it won't take long to re-rsync it
<m_stone> hhardy: schedule downtime first.
<hhardy> if I reinstall the boot and root on owl it will be at the same state
<m_stone> hhardy: and we need to test the manual failover first as well.
<m_stone> as cjb stated.
<hhardy> having the rsync image is one thing, making the stuff bootable is where I could use some help
<m_stone> okay. we'll see what we can do.
<m_stone> and perhaps someone who visits us in 20 minutes will also be able to assist, so long as we can clearly state what we want.
<hhardy> yes
<cjb> ooh, who's that?
<m_stone> (or who's present already?)
<m_stone> cjb: it's a wish, not a promise!
<hhardy> maybe cjl by the sound of it
<cjb> ah, Blake, perhaps?
<hhardy> ah blake yay!
<m_stone> cjb: he probably could if we asked him...
<hhardy> splendid
<cjb> yes, I'd be very happy with having Blake help
<m_stone> I hadn't thought of it.
<m_stone> do we mean the same Blake?
<hhardy> do we?
<m_stone> i.e. Blaketh?
<cjb> yes
<hhardy> your friend from nyc?
* cjb knows no other Blakes.
<hhardy> yes
<m_stone> ah. well, we could certainly ask. I wasn't thinking of him in particular.
<cjb> m_stone: is someone else visiting us in 20 minutes?
<hhardy> more like he could help if it's of interest
<cjb> It was a guess based on the likelihood of who might be visiting us.
<hhardy> we have the volunteer infra group mtng in 20 min
<m_stone> cjb: no, I was expressing hope that someone _might_ visit us.
<cjb> ohh :)
<cjb> okay. I think this requires 1cc presence.
<hhardy> I don't want to hijack your friends
<cjb> at least until the machine boots.
<hhardy> or at least me here as remote hands, yes
<m_stone> cjb: sure.
<m_stone> cjb: I'll pass along your interest, though. :)
<hhardy> cool ok, break till 5:15 then for those returning, else thanks for coming!
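For reference, the update/upgrade distinction cjb draws during the meeting, as a minimal illustration; the simulate flag is the safe first step before a scheduled upgrade window:

 # refresh the package index only; nothing is installed or changed
 apt-get update
 # simulate: list what an upgrade would do, without doing any of it
 apt-get -s upgrade
 # the actual upgrade, reserved for a scheduled downtime window
 apt-get upgrade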