Linux certification coordination meeting, 14.11.02
--------------------------------------------------

present:
  ATLAS:         Marc Dobson, Guenter Duckeck
  CMS:           Eric Cano
  Desktop:       Jarek Polok
  IT-FIO:        Tim Smith
  IT-catchall:   Jan Iven
  LHCb:          Marco Cattaneo
  non-LHC exp.:  Benigno Gobbo
  PS-CO:         Nicolas de Metz-Noblat
  SL-CO:         Alastair Bland

excused:
  CMS:           Stephan Wynhoff [gave 'OK' by mail]

missing:
  ALICE:         Fons Rademakers [gave 'OK' by mail]

1. Round-the-table status review and certification recommendation:
------------------------------------------------------------------

Benigno: [OK]
 - COMPASS: (OK) tests depend on the port from Objectivity to ORACLE,
   nothing critical seen so far, will be able to resolve upcoming
   problems
 - NA48: (OK) no complaints, but added new dependencies
 - DELPHI: (OK) tests done, now verifying using production, nothing
   critical found until now
 - OPAL: (OK) dependencies listed, small problems, but not really
   blocking

Eric: [OK]
 hardware drivers tested, previous crash seems to have gone

Marco: [DELAY]
 - Dependencies were not rebuilt on 7.3, using 7.2 versions (CLHEP,
   CERNLIB, BOOST etc.). Would like to test native version
 - need more time to test, started compilations only this week
   (required compiler only recently installed). Need until Friday for
   compilation, then ~1 week to test. No problems expected as outside
   institutes seem happy with the standard RH7.3 release and the
   patched gcc 2.95.2 compiler.
 - Production tools not tested yet (this would be nonblocking)
 - online: PVSS is not blocking anymore, but need this for switching
   off 6.1.1 (no native PVSS version for 7.3 available, seems to run
   on 7.3)
   [confirmed by Alastair: STMA(SPS)-PVSS application seems to run]

Marc: [OK]
 - status of their dependencies (e.g. CMT, Java) is UNKNOWN. Not a
   sticking point, they were brought up fairly late.
   [Discussion on IT Java support]

Guenter: [OK]
 - no real tests on 7.3.1 done, outside sites run 7.2 and work fine.
   Expecting no problems. No block or delay.

Jarek: [OK]

Nicolas: [OK]
 [aside: no dependencies on 2.95.2]
 - system part ok, validated on the following kinds of systems:
    - Olivetti M133 (64Mb)
    - Vobis PII 233MHz
    - Vobis PII 300MHz
    - Vobis PII 400MHz
    - Vobis PII 600MHz
    - Vobis PIII 600MHz
    - Elonex SMP PIII 600MHz
    - Vobis PIII 800MHz (either with standard graphic R128 or with
      Matrox G400, G450 or G550).
    - Elonex PIII 850MHz
 - Validated a critical part: using 7.3.1 as boot server for NCD
   xterminal (using dhcpcd + tftpd). Seems to work as on 7.2.1
 - X+mwm seems to freeze on HP Omnibook 6000 more often than on 7.2.1
 - Xfs problem from 7.2.1 has disappeared

 development:
 - gcc-2.91 -> gcc-2.96 change on Monday:
    - everything needs to be recompiled on 7.3, mixing may cause
      problems:
      "..I swapped default compilation environment for applications
      to use 2.96 instead of 2.91 on last Monday for all PS Linux
      developers. This did not yet lead to noticeable problems except
      from people still using 7.2.1 systems: code resulting from a
      mixup of stuff compiled on 7.3 resulted in non-running code on
      7.2. Problem was solved in recompiling everything on 7.3 for
      execution on 7.3. (This is the kind of reason why I have to
      migrate almost all systems in 1 shot)..."
 - Forms that were produced OK on 7.2 with dev6ip7 did not
   systematically rebuild OK on 7.3. But apparently Jan Cuperus was
   able to work around the problem (in collaboration with IT/DB).

Alastair: [OK]
 - (LHC+SPS controls group, unified at Xmas, so SL/CO and PS/CO
   dependencies will be merged)
 - ADSM is used, but not too important.
 - would like to be independent from AFS, but still get security
   fixes [Jarek: looking at this]
 - known 7.2 problems: ipchains and legacy applications. FAQ seems to
   help.
 - benchmarking: 7.2 floating point is awful on 2.96 unless "-O3" is
   used (this is for gcc-2.96)
 - automount has a problem mounting SunOS, HP servers without explicit
   map (manually mounting works)
 - mounting NFS from HP (messages "Jumbo packet in reply" for very
   large directories, could be HPUX 10.20) --> investigating.
 - SUE: should run early in the morning, not at midnight (bad time
   for calls), not at boot (high nuisance to user); something looks
   for "adm"?

 worried about other Linux systems at CERN: ST, access system, random
 developers. Most of these should be tied into the certification
 through Jarek (Desktop) or Tim (LXPLUS). Should make this clear at
 next CLUG.

Stephan: [OK] [by mail]
 testing gcc-3.2 now, looking forward to gcc-3.2 CERN libraries.

Jan: [DELAY]
 - native 7.3.1 CERNLIB only available next week, some OpenMotif
   problem [Nicolas: had color problems while mixing ICS+OpenMotif]
 - CASTOR only available next week
 - no status update for GEANT3/4
 - ADSM: not tested yet, may have trouble with 'ext3' not known to
   ADSM, graphical interface may be broken (text version should
   work). New version available, but could require server update
   Alastair: is this the future backup client from IT?
   Tim: not announced yet.
 - IT services reported no reason for delay
 - given the delay for IT-provided packages, recommend delaying one
   week.

Tim: [OK]
 - moving tools from 7.2 to 7.3 took more effort than expected
   (config file format changes). Now have 7.3.1 internally ready.
 - of course, waiting for basically everybody else before
   mass-migrating
 - LXPLUS7 will grow quickly after certification at the expense of
   LXPLUS6
 - would like quick transfer of LXBATCH to 7.3.1 for non-6.1.1-only
   contingents
 - alias change LXPLUS -> LXPLUS7 will be announced separately (DTF)
 - reminder: FOCUS announcement: 6.1.1 services will go in May.
   [Jan: would like similar time for completely freezing 6.1.1]

Decision:
=========
 * accept reasons for delay
 * certify on Thursday 21.11.02 by default
 * no further meeting unless critical errors appear

Process changes, things that went wrong
---------------------------------------
 * missing local sysadmins in the certifications? Will need
   iterations to get in touch with everybody concerned, task of the
   members of this group.
 * some dependencies should be on IT services rather than on
   individual products, example: "backup service" rather than ADSM
   client. Up to IT to fulfil the dependency with a product.
 * need more active role of IT/LCG-provided packages (e.g. CERNLIB,
   ORACLE), these need to be available early and in the "standard"
   location (on test machines) even if not fully certified yet
   themselves. Should be addressed by the dependency list, but did
   not work too well this round.
 * add a new status for "SEEMS to work" for packages that may already
   be used by others, but where e.g. no vendor-supported release is
   available yet
 * how to tie Joe Random User and other service providers (e.g.
   ST/LHC/TIS, SL-AP monte carlo) into the certification? Either per
   Desktop/LXPLUS, or through more active CLUG (e.g. chairperson), or
   by adding people to the certification coordination group in case
   we have completely missed somebody. Should take a look at the
   composition of the Nice2k migration task force (which is reporting
   to DTF, another way to reach people not represented in the
   certification)

Timeline until next certification
---------------------------------
CLUG: RedHat 8.X certification could start in spring, expect to take
6 months, means certification in autumn and deployment until the end
of 2003. Basically still valid.

LHCb/CMS are primarily interested in gcc-3.2 and CERN libraries for
future production runs.

Discussion: should we have a "7.3.1+gcc-3.2" certification or
roadmap?

Pro:
 + something formal gives more clout towards package providers than
   individual experiments/groups have
 + roadmap for compilers != OS
 + in the interest of all, will speed up an 8.X certification if
   compiler-related problems have been solved earlier.

Contra:
 - still would need 7.3.1+gcc-2.95.2 for production, a 7.3.1+gcc-3.2
   would not give an end-of-life for 7.3.1+gcc-2.95.2
 - could cause trouble for package deliveries, would need ability to
   have multiple versions per platform
   [ should not be a problem for experiments, already supporting
     multiple versions per platform, ASIS has gotten the ability
     recently; may impact some of the AFS-based delivery forms ]
 - still sufficient time for informal approach, formal certification
   would force maintainers to raise priorities although this may
   de facto not be needed (if the maintainers are interested in
   gcc-3.2 themselves)
 - not everybody would be interested in this, e.g. PS/CO+SL/CO are
   "Java fanatics" and don't care too much about the gcc-alt things

Decision: try informal approach (experiment -> package provider)
until end of year, review, start formal compiler certification
process for gcc-3.2 afterwards if informal approach is too slow.

Should we have minor releases for 7.3 (7.3.2, 7.3.3,..)? (Idea was to
have ~6-month release cycle for minor releases). Yes, but not easily
planned. Minor release could be triggered by:
 * lots of updates from RedHat (security, bug fixes)
 * need to support new hardware
 * new RedHat 7.x release (unlikely).

Agreement that formal certification would not be needed if
 * updates would be applied anyway to current 7.3.1 systems (as for
   the security fixes)
 * public test machines would be available for a short period of time
   (2 weeks)