Minutes
DRAFT minutes from the LXCERT meeting 04.05.04
Agenda:
- ORACLE on Linux (J.Shiers, IT-DB)
- Status updates and outstanding problems
- end date for the certification
- AOB
Present:
- Jamie Shiers
- Nicholas de Metz-Noblat
- Alastair Bland
- Benigno Gobbo (video)
- Thorsten Kleinwort
- Marco Cattaneo
- Helge Meinhard
- Jarek Polok
- Massimo Lammana
- Alberto Aimar
- Marc Dobson (for Bruce Barnett)
- Peter Sherwood
- Eric Cano
Excused:
- Stephan Wynhoff: represented also by Eric Cano.
"ORACLE on Linux" presentation by J.Shiers (slides)
1st slide:
----------
gcc-3.2.3 ORACLE client available on RHE10 (not yet on AFS)
* regular procedure for releasing client libraries (previous freq: 18 months, sync with server == useless). ORACLE can now rebuild rapidly.
* should include all header files and complete development environment
* should be available in ORACLE CERN tree (even for outside institutes), as RPM (but could go into SPI tree as well)
old AFS tree is not flexible enough (compiler/OS), possibly need something different
Too little time since Friday (release date) to evaluate.
Q: RPM is standard solution for everybody, would this be OK for IT-DB?
A: Yes, but needs testing (premature)
10g should interoperate with old servers, need testing, but ORACLE claim OK.
"Thin client", expect little trouble (no tons of NLS support etc)
Outlook:
-------
* good contacts since 99:
* get immediate and longer term results ("silent install" improvements)
* no more 16bit limits (Exabyte)
* IEEE float and double without additional storage blowup in DB
* "easy install"
10g1 not in production
10g release2 to be announced in October (avail Q3 2005). Will probably use that for the initial data exports in LHC (no other release foreseen in that timeframe)
* next meeting June, will stress ORA for timely client releases.
CERN: Unusual customer (continuous development, don't freeze & ship)
3rd party products
* no support for "tora", perl-DBD-ORACLE (neither expertise nor staff to support). 10% cut in staff, will not take up new things now. Still recognize this as a support issue.
-+-+-+-
Benigno: basically need the client libraries. If this works with the servers, can start certification. Good news. Need to wait and see. When would these libs be available?
A: Now.
NMN: need to have them available on 1 machine only, not generally. Using Pro-C, is it available?
A: Pro-C not there yet
NMN: linking works, but doesn't run (gcc296, gcc323 both link, first embedded SQL call gets error after getting proper read).
A: should be available immediately for testing on AFS, can check whether this Pro-C release works.
Q: needs to be very clear what is available on which platform (review directory structure, or README.) Sysadmin point of view (support requests bouncing etc)
Q(JI): can we install ORACLE via RPMs?
A: discussing now (should be over this week)
Agreement: certification time, don't need to do it "right" now. Can standardize in 2nd stage.
Q(Jarek): license, redistribution for other sites?
A: technet (OTN) general purpose license: otn.oracle.com/tech/oci/occi/occidownloads.html
- not useable for production
- condition: "not being a terrorist"
A: instant client license: useable for production
A: separate license agreement, prefer to use that one (CERN research, can use anywhere). Don't ask ORACLE to redistribute.
A: depends where CERN distro is going to, collab institutes
JI: needs to go to separate directory.
(NMN: go to PostgreSQL :-))
ORACLE may do GOOGLE search to validate license terms (like no benchmarks publishing..)
current AFS installation is not protected (only thing preventing copying is CERN computing rules). Should not be worse..
Propose: quick install now, set out
NMN happy to start testing now even if full development kit is not available yet.
Expect both RHE3 and CEL3 to be available.
NMN: ORACLE Application servers with CEL3 from IT? -> taper off.
Status updates and ongoing issues
Benigno Gobbo (non-LHC experiments):
Comment: received no answers for status requests (no interest?) from some LEP experiments, even if they should show an interest (ORACLE).
COMPASS: require ORACLE client as hard dependency. If this issue is now resolved, they can start to test, everything else should be there (ROOT). No forecast on time (Data Acquisition started, long DA period, means little manpower (= internal problem)). No additional blocking dependencies from COMPASS known for now.
DELPHI: need CERNLIB (now there). Should start to test (for production SW, testing method is unclear).
Peter Sherwood (ATLAS-offline)
(fresh on the job, tries to find out issues → Offline discussion.)
Alastair Bland, Nicolas de Metz-Noblat (AB-CO)
issues with RH AS 2.1 + RHE3:
- installation: QUATTOR is not end-user-friendly, difficulties in deployment (need effort). No problems except missing docs.
- porting:
- ORACLE, embedded SQL: runtime broken. May be solved now
C++ specific: gcc-3 definition, OSstring handling different (open FD reassignment disappears). Need C++ expert, may need to rewrite parts of the software. (3.3.1 in Fedora is still more unfriendly), 3.2.3 has at least --no-deprecated. Basic tests (networking) successful. Modification means losing backward compatibility... but could move as soon as ORACLE stuff is cleared. Most of the code from 7.3.4 seems to work
db1, db1-devel are missing (NDBM format), required for development. May be solved by importing + recompiling sources into a static lib.
- Showstoppers: cannot deploy large-scale before accelerators are stopped == November
- can actually run 7.3.4 binaries (warning about errno). Can run AB/CO "Console" on CEL3 already now.
- AB/CO development environment is more problematic (lots of developers, dependencies are hard to find out)
- Alastair: switching from Windows to Linux consoles, (sh|c)ould go directly to CEL3.
- no BLOCKing issues for now.
Thorsten Kleinwort (IT-FIO)
- minor things:
- time sync (NTP instead of AFS), could use more generally. NMN is interested, discrepancy between AFS and NTP (AFS needs to be kept in sync). Option is already there, and more usable (no jumps in time, no backward setting that can screw "make" and "cron")
FVWM2 gone: LXPLUS is used via Exceed, X-terminal. Concerned about network traffic (JP: explain DTF decision). NMN: mwm as "default" default?
Jarek: user education, don't run full window manager
MC: concerned about new helpdesk being able to answer 2000 calls, after switching. Train. Compare with amount of work to keep fvwm2.
MC: helpdesk needs to be aware, common agreement.
NMN: would need UCO -RESET. People are unable to change user environments copied over from somebody else.
JP: reminder: DTF decision, don't go back too often.
NMN: (example:) still recompiling old editor, less work than to re-educate users.
- CASTOR server (Thorsten): not responsible!!!
- LSF 5.1 client works
- Kickstart/automatic install work.
expect no major problems, just porting SUE → NCM. Should be ready Mid-May (but may be slipping). No request from experiment yet to run CEL3.
JP: faster machine for new LXPLUS? A: Not planned.
Marco Cattaneo (LHCb)
- analysis + reconstruction software: nothing certified yet, DataChallenge starts tomorrow, so little manpower available. LHCb is already using gcc-3.2.3, so they are not foreseeing any application software problems once the LCG libraries are there.
- production software, mostly python code. Not being looked at (for next month), could have some surprises (which would be late). Only J.Closier is in a position to validate this, and he will be busy (manpower issue). Need to move to LCG2, whatever they support. At the PEB, the problem of binary compat was raised, some confusion since CERN is part of LCG2 but runtime requirements may differ.
[short discussion on CEL3, RHE3.]
MC summarizes: software will have to work on the GRID (across platforms), various releases may need to be supported.
Timelines:
"rubberstamp" for Application: end of May.
Production: end of June.
Deployment for apps: autumn (not in middle of a DC)
Helge Meinhard
nothing from CLUG
Eric Cano (CMS-online and also representing CMS-offline)
CMS-offline: require 1 month (after SEAL).
JP/AA: SEAL came out 3 weeks ago, Stephan was informed..
Q: can we have a newer user environment (KDE, GNOME,
mozilla-1.6) (minor issue only)?
discussion: would like to keep as close to RHEL as possible, so
anything with dependencies to it (like KDE/GNOME) should stay.
JI: reject KDE/GNOME. Mozilla is already newer than RHEL.
Q: Can JDK be included in distribution?
JP: will deploy JDK, even without formal recommendation from the Java
working group.
AB: JAWS contains JDK, could have been easier to take their version?
JP: need (probably) 1.5 for the 64bit architectures.
MC: Xerces available? AA: is in the SPI tree, version 2.3.0
CMS-online:
bigphys is in kernel, OK.
kernel API instability (thanks to Red Hat backports), but can cope
Want 2.6 as soon as possible. (JP: trying. will be provided)
timeline: 1 more month, but not fixed... will come back with more details (also from Stephan).
Marc Dobson (ATLAS-online, for Bruce Barnett)
Issues:
- Java: want at least JRE (dependency is noted)
- gcc: alternative compilers, (Not showstoppers). gcc-3.3 is bad, but
gcc-3.4 would be nice.. eventually available via LCG, but ATLAS-online
is not using LCG environment. Would prefer compiler on machine
(non-AFS), prefer gcc-alt RPM.
JI: manpower issue, who will be doing this.. (P.Defert did gcc for 7.3 as ASIS "legacy")
- bigphys and other things: tested, no showstoppers found. Will recompile all software on certification machine next week (nothing bad expected, using gcc-3.2.3 already). Could be done end of next week.
Alberto Aimar (LCG-SPI)
As requested, the LCG/SPI external software has been recompiled on gcc-3.2.3, don't plan to recompile again for newer versions unless problems appear.
MC: will you keep the SW up-to-date, i.e. recompile newer
versions? (will build current GAUDI on test system, need production POOL
version, and POOL was moving quickly due to bugs).
AA: no, unless explicitly requested. Not top priority, nobody used
this immediately (despite the strong requests during the last
certification meeting). Will wait for explicit requests.
MC: will build current GAUDI on test system, need production POOL version (and POOL was moving quickly due to bugs).
AA: why spend effort in a hurry if nobody using these releases? Will wait for requests.
Jarek Polok (Desktops)
issues:
- Printing needs to be packaged
- installer changes
NMN: moving into which direction for printing?
JP: tool to extract full list of printers. same data as CERN
printing wizard for Windows.
NMN/MC: non-existing printers. consistency check to LANDB required
Jan Iven (general IT things)
- CASTOR: seems OK, relocation to /usr under way (with symlinks)
- PARC/engineering application: trouble with legacy environment, still being tested. Initial tests on Mathematica, CASTEM, Axalant, Ansys, HPGLview OK
- SDT (software devel tools): no results yet. Insure seems to work, Together only in RH6 version (vendor contacted)..
- VRVS: should be included as RPMs
- EGEE uses RHE3 as certification platform, will need to re-test on CEL3 for deployment at CERN. [update: EGEE will develop on CEL3]
certification end date discussion
Summary: no end date?
LHCb+CMS: No. At least >= end of May.
MC: told you before certification: not before May.
NMN: unrealistic while things are running. Cannot certify, this needs "validation" by running in production.
MC: compiling is not proving anything, need to get tools running and test in real environment. Could certify until summer holidays, then start deploying. Analysis environment is not going to change until September..
JP: question certification model, takes always too long
MC: compare to 7.3 certification (Nov 2002). Considered it "production", 2 months (January): moved default.
NMN: AB deploys 6 months - 1 year ahead. (compare AIX).
NMN: cannot declare end of certification the day we have the ORACLE stuff on the table (which was a hard requirement)
MC: IT wants "rubberstamp" from individual SW providers. Experiment is much less structured, can only wait for problems to be reported. Need more visible "beta" environment (more machines?), explicit beta. "this is what LXPLUS will look like in 3 months", then ask experiment people to test there.
NMN: can install own test machines, but QUATTOR is worse than beta, "unusable" for non-IT (documentation bad)
Thorsten: mid-May/end-May is goal to have "LXPLUS" setup.
MC: similar timescale to have analysis SW for LHCb, then ask users to test.
JI: will still have problems found after release.
MC: should do the switch like last time, following demand
JP: afraid of continuous "1 more month"
MC: 2 weeks on next LXPLUS would be enough.
[ NMN: careful about newer versions of GNOME/KDE screwing older environment on AFS. JI: no big deal last time. ]
Marc: version after CEL3?? (experiments are getting worried). Installation + maintenance costs, not foreseen in budget.
JI: details at HEPix, but will be cheap enough.
JP: propose: site license or nothing for Red Hat.