Minutes

DRAFT minutes from the LXCERT meeting 04.05.04

Agenda:

  • ORACLE on Linux (J.Shiers, IT-DB)
  • Status updates and outstanding problems
  • end date for the certification
  • AOB

Present:

  • Jamie Shiers
  • Nicolas de Metz-Noblat
  • Alastair Bland
  • Benigno Gobbo (video)
  • Thorsten Kleinwort
  • Marco Cattaneo
  • Helge Meinhard
  • Jarek Polok
  • Massimo Lamanna
  • Alberto Aimar
  • Marc Dobson (for Bruce Barnett)
  • Peter Sherwood
  • Eric Cano

Excused

  • Stephan Wynhoff: represented also by Eric Cano.

"ORACLE on Linux" presentation by J.Shiers (slides)

1st slide:
----------
gcc-3.2.3 ORACLE client available on RHE10
(not yet on AFS)

* regular procedure for releasing client libraries (previous freq: 18 months,
sync with server == useless). ORACLE can now rebuild rapidly.

* should include all header files and complete development environment

* should be available in ORACLE CERN tree (even for outside institutes),
as RPM (but could go into SPI tree as well)

old AFS tree is not flexible enough (compiler/OS), possibly need
something different

Too little time since Friday (release date) to evaluate.

Q: RPM is standard solution for everybody, would this be OK for IT-DB?
A: Yes, but needs testing (premature)

10g should interoperate with old servers, need testing, but ORACLE claim OK.
"Thin client", expect little trouble (no tons of NLS support etc)


Outlook:
-------
* good contacts since 99:
  * get immediate and longer term results ("silent install" improvements)
  * no more 16bit limits (Exabyte)
  * IEEE float and double without additional storage blowup in DB
  * "easy install"
10g1 not in production
 10g release2 to be announced in October (avail Q3 2005). Will probably
 use that for the initial data exports in LHC (no other release foreseen
 in that timeframe)

* next meeting June, will stress ORA for timely client releases.

CERN: Unusual customer (continuous development, don't freeze & ship)

3rd party products

* no support for "tora", perl-DBD-ORACLE (neither expertise nor staff
  to support). 10% cut in staff, will not take up new things now.
  still recognize this as a support issue.


   -+-+-+-
Benigno:
  basically need the client libraries. If this works with the servers,
  can start certification. Good news. Need to wait and see.

  When would these libs be available?

   A: Now.

NMN: need to have them available on 1 machine only, not
generally. Using Pro-C, is it available?

A: Pro-C not there yet

NMN: linking works, but doesn't run (gcc296, gcc323 both link, first
   embedded SQL call gets error after getting proper read).

A: should be available immediately for testing on AFS, can check whether this
   Pro-C release works.

Q: needs to be very clear what is available on which platform (review directory
  structure, or README.) Sysadmin point of view (support requests
  bouncing etc)

Q(JI): can we install ORACLE via RPMs?

A: discussing now (should be over this week)

Agreement: certification time, don't need to do it "right" now. Can standardize
   in 2nd stage.


Q(Jarek): license, redistribution for other sites?

A: technet (OTN) general purpose license: otn.oracle.com/tech/oci/occi/occidownloads.html
  - not usable for production
  - condition: "not being a terrorist"

A: instant client license: usable for production
A: separate license agreement, prefer to use that one (CERN research,
  can use anywhere). Don't ask ORACLE to redistribute.

A: depends where CERN distro is going to, collaborating institutes
JI: needs to go to separate directory.

 (NMN: go to PostgreSQL :-))

ORACLE may do a GOOGLE search to check compliance with license terms (like no publishing of benchmarks).

current AFS installation is not protected (only thing preventing copying is
CERN computing rules). Should not be worse.


Propose:
quick install now, set out 

NMN happy to start testing now even if full development kit is not
available yet. Expect both RHE3 and CEL3 to be available.

NMN: ORACLE Application servers with CEL3 from IT? -> taper off.

Status updates and ongoing issues

Benigno Gobbo (non-LHC experiments):

Comment: received no answers for status requests (no interest?) from some LEP experiments, even if they should show an interest (ORACLE).

COMPASS: require ORACLE client as hard dependency. If this issue is now resolved, they can start to test, everything else should be there (ROOT). No forecast on time (Data Acquisition started, long DA period, means little manpower (= internal problem)). No additional blocking dependencies from COMPASS known for now.

DELPHI: need CERNLIB (now there). Should start to test (for production SW, testing method is unclear).

Peter Sherwood (ATLAS-offline)

(fresh on the job, tries to find out issues → Offline discussion.)

Alastair Bland, Nicolas de Metz-Noblat (AB-CO)

issues with RH AS 2.1 + RHE3:

  • installation: QUATTOR is not end-user-friendly, difficulties in deployment (need effort). No problems except missing docs.
  • porting:
    • ORACLE, embedded SQL: runtime broken. May be solved now.
    • C++ specific: gcc-3 definition, OSstring handling different (open FD reassignment disappears). Need C++ expert, may need to rewrite parts of the software. (3.3.1 in Fedora is even more unfriendly; 3.2.3 has at least --no-deprecated.) Basic tests (networking) successful. Modification means losing backward compatibility... but could move as soon as ORACLE stuff is cleared. Most of the code from 7.3.4 seems to work.

      db1, db1-devel are missing (NDBM format), required for development. May be solved by importing + recompiling sources into a static lib.

    • Showstoppers: cannot deploy large-scale before accelerators are stopped == November
    • can actually run 7.3.4 binaries (warning about errno). Can run AB/CO "Console" on CEL3 already now.
    • AB/CO development environment is more problematic (lots of developers, dependencies are hard to find out)
    • Alastair: switching from Windows to Linux consoles, (sh|c)ould go directly to CEL3.
  • no BLOCKing issues for now.

Thorsten Kleinwort (IT-FIO)

  • minor things:
    • time sync (NTP instead of AFS), could use more generally. NMN is interested, discrepancy between AFS and NTP (AFS needs to be kept in sync). Option is already there, and more usable (no jumps in time, no backward setting that can screw "make" and "cron")
    • FVWM2 gone: LXPLUS is used via Exceed, X-terminal. Concerned about network traffic (JP: explain DTF decision). NMN: mwm as "default" default?

      Jarek: user education, don't run full window manager

      MC: concerned about new helpdesk being able to answer 2000 calls after switching. Train. Compare with amount of work to keep fvwm2.

      MC: helpdesk needs to be aware, common agreement.

      NMN: would need UCO -RESET. People are unable to change user environments copied over from somebody else.

      JP: reminder: DTF decision, don't go back too often.

      NMN: (example:) still recompiling old editor, less work than to re-educate users.

  • CASTOR server (Thorsten): not responsible!!!
  • LSF 5.1 client works
  • Kickstart/automatic install work.

Expect no major problems, just porting SUE → NCM. Should be ready mid-May (but may be slipping). No request from experiments yet to run CEL3.

JP: faster machine for new LXPLUS? A: Not planned.

Marco Cattaneo (LHCb)

  • analysis + reconstruction software: nothing certified yet, DataChallenge starts tomorrow, so little manpower available. LHCb is already using gcc-3.2.3, so they are not foreseeing any application software problems once the LCG libraries are there.
  • production software, mostly python code. Not being looked at (for next month), could have some surprises (which would be late). Only J.Closier is in a position to validate this, and he will be busy (manpower issue). Need to move to LCG2, whatever they support. At the PEB, the problem of binary compatibility was raised, some confusion since CERN is part of LCG2 but runtime requirements may differ.

[short discussion on CEL3, RHE3.]

MC summarizes: software will have to work on the GRID (across platforms), various releases may need to be supported.

Timelines:
"rubberstamp" for Application: end of May.
Production: end of June.
Deployment for apps: autumn (not in middle of a DC)

Helge Meinhard

nothing from CLUG

Eric Cano (CMS-online and also representing CMS-offline)

CMS-offline: require 1 month (after SEAL).
JP/AA: SEAL came out 3 weeks ago, Stephan was informed.

Q: can we have a newer user environment (KDE, GNOME, mozilla-1.6) (minor issue only)?
discussion: would like to keep as close to RHEL as possible, so anything with dependencies to it (like KDE/GNOME) should stay.
JI: reject KDE/GNOME. Mozilla is already newer than RHEL.

Q: Can JDK be included in distribution?
JP: will deploy JDK, even without formal recommendation from the Java working group.
AB: JAWS contains JDK, could have been easier to take their version?
JP: need (probably) 1.5 for the 64bit architectures.

MC: Xerces available?
AA: is in the SPI tree, version 2.3.0

CMS-online: bigphys is in kernel, OK.
kernel API instability (thanks to Red Hat backports), but can cope
Want 2.6 as soon as possible. (JP: trying. will be provided)

timeline: 1 more month, but not fixed... will come back with more details (also from Stephan).

Marc Dobson (ATLAS-online, for Bruce Barnett)

Issues:

  • Java: want at least JRE (dependency is noted)
  • gcc: alternative compilers (not showstoppers). gcc-3.3 is bad, but gcc-3.4 would be nice; eventually available via LCG, but ATLAS-online is not using LCG environment. Would prefer compiler on machine (non-AFS), prefer gcc-alt RPM.
    JI: manpower issue, who will be doing this? (P.Defert did gcc for 7.3 as ASIS "legacy")
  • bigphys and other things: tested, no showstoppers found. Will recompile all software on certification machine next week (nothing bad expected, using gcc-3.2.3 already). Could be done end of next week.

Alberto Aimar (LCG-SPI)

As requested, the LCG/SPI external software has been recompiled on gcc-3.2.3, don't plan to recompile again for newer versions unless problems appear.

MC: will you keep the SW up-to-date, i.e. recompile newer versions? (will build current GAUDI on test system, need production POOL version, and POOL was moving quickly due to bugs).
AA: no, unless explicitly requested. Not top priority, nobody used this immediately (despite the strong requests during the last certification meeting). Will wait for explicit requests.

MC: will build current GAUDI on test system, need production POOL version (and POOL was moving quickly due to bugs).

AA: why spend effort in a hurry if nobody using these releases? Will wait for requests.

Jarek Polok (Desktops)

issues:

  • Printing needs to be packaged
  • installer changes

NMN: moving into which direction for printing?
JP: tool to extract full list of printers. same data as CERN printing wizard for Windows.
NMN/MC: non-existing printers. consistency check to LANDB required

Jan Iven (general IT things)

  • CASTOR: seems OK, relocation to /usr under way (with symlinks)
  • PARC/engineering application: trouble with legacy environment, still being tested. Initial tests on Mathematica, CASTEM, Axalant, Ansys, HPGLview OK
  • SDT (software devel tools): no results yet. Insure seems to work, Together only in RH6 version (vendor contacted).
  • VRVS: should be included as RPMs
  • EGEE uses RHE3 as certification platform, will need to re-test on CEL3 for deployment at CERN. [update: EGEE will develop on CEL3]

certification end date discussion

Summary: no end date? LHCb+CMS: No. At least >= end of May.


MC: told you before certification: not before May.
NMN: unrealistic while things are running.
    cannot certify, this needs "validation" by running in production.
MC: compiling is not proving anything, need to get tools running and
     test in real environment. could certify until summer holidays, then
     start deploying. Analysis environment is not going to change until September..



JP: question certification model, takes always too long

MC: compare to 7.3 certification (Nov 2002). Considered it
"production", 2 months (January): moved default.

NMN: AB deploys 6 months - 1 year ahead. (compare AIX).
NMN: cannot declare end of certification the day we have the ORACLE
stuff on the table (which was a hard requirement)

MC: IT wants "rubberstamp" from individual SW providers. Experiment
is much less structured, can only wait for problems to be reported.
Need more visible "beta" environment (more machines?), explicit
beta. "this is what LXPLUS will look like in 3 months", then ask
experiment people to test there.


NMN: can install own test machines, but QUATTOR is worse than beta, "unusable" for non-IT (documentation bad)

Thorsten: mid-May/end-May is goal to have "LXPLUS" setup.

MC: similar timescale to have analysis SW for LHCb, then ask users
to test.

JI: will still have problems found after release.

MC: should do the switch like last time, following demand

JP: afraid of continuous "1 more month"

MC: 2 weeks on next LXPLUS would be enough.

[ NMN: careful about newer versions of GNOME/KDE screwing older
  environment on AFS.
  JI: no big deal last time.
 ]


Marc: version after CEL3?? (experiments are getting
worried). installation + maintenance costs, not foreseen in budget.
JI: details at HEPiX, but will be cheap enough.
JP: propose: site license or nothing for Red Hat.