LIGO Support Ticket 15287
Ticket Information
Number: admin 15287
User: anderson@ligo.caltech.edu
Email: espinoza_e__AT__ligo.caltech.edu,duncan__AT__gravity.phys.uwm.edu
Status: new
Assigned To: psilord
Date: Mon, 9 Apr 2007 15:42:02 -0700
From: Stuart Anderson <anderson__AT__ligo.caltech.edu>
To: condor-admin__AT__cs.wisc.edu
CC: Todd Tannenbaum <tannenba__AT__cs.wisc.edu>, Erik Espinoza
<espinoza_e__AT__ligo.caltech.edu>, Brown Duncan
<duncan__AT__gravity.phys.uwm.edu>
Subject: LIGO X509 certificate management enhancement request
This is a request for Condor to be robust against GSI sshd removing a
proxy cert that a submitted job points to before that job has a chance
to execute.
The example scenario is that a user:
1) Uses their X509 certificate to ssh to a Condor submit machine.
2) Submits a Condor job that delegates (or copies) their credentials to
an execute machine.
3) Logs out while the job is still Idle in the queue.
4) Expects the executed job to have their credentials available.
The proposed solution is for condor_submit (and condor_submit_dag?) to
cache a copy of the current user certificate at submit time (delegated proxy
from sshd in this case) to a spool directory. At execute time, Condor
should continue to examine the original submit time cert location for a
possibly refreshed certificate that should be delegated. However, the
enhancement is for Condor to fall back to the cached copy if the
certificate is no longer available at the original location.
Thanks.
--
Stuart Anderson anderson__AT__ligo.caltech.edu
http://www.ligo.caltech.edu/~anderson
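
The cache-then-fall-back behavior requested above can be sketched roughly as
follows. This is only an illustration of the request, not Condor code: the
function names, the spool file naming, and the job_id parameter are all
hypothetical.

```python
import os
import shutil


def cache_proxy_at_submit(original_path, spool_dir, job_id):
    """Copy the submit-time proxy to a per-job file in the spool directory.

    The "job_<id>.x509_proxy" name is a made-up convention for this sketch.
    """
    cached_path = os.path.join(spool_dir, f"job_{job_id}.x509_proxy")
    # Mode 0600 applies when the file is newly created.
    fd = os.open(cached_path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "wb") as dst, open(original_path, "rb") as src:
        shutil.copyfileobj(src, dst)
    return cached_path


def select_proxy_at_execute(original_path, cached_path):
    """Prefer the original location, which may hold a refreshed proxy.

    Fall back to the cached copy only if the original file is gone.
    """
    if os.path.isfile(original_path):
        return original_path
    return cached_path
```

This sketch only checks whether the original file still exists. It does not
check that the file there is a valid or trustworthy proxy.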
===========================================================================
Date of creation: Mon Apr 9 17:42:19 2007 (1176158543)
Subject: Actions
Assigned to adesmet by adesmet
===========================================================================
Date of actions: Mon Apr 16 10:45:18 2007 (1176738319)
Subject: Actions
Assigned to psilord by tannenba
===========================================================================
Date of actions: Mon Jun 18 16:41:45 2007 (1182202906)
Subject: Actions
Assigned to adesmet by psilord
===========================================================================
Date of actions: Wed Jun 27 15:23:53 2007 (1182975833)
Subject: Comments added
ligo would like to see this at about the same priority as spooling
up lots o files and multicore support. for this particular one, perhaps
scott k needs to chime in and say if ligo is indeed going to officially
support MyProxy everywhere?
Comments added by tannenba
===========================================================================
Date comments were added: Fri Jul 18 13:50:18 2008 (1216407018)
Subject: Comments added
should be resolved by the end of july
Comments added by tannenba
===========================================================================
Date comments were added: Fri Jul 18 13:58:31 2008 (1216407512)
Subject: Comments added
Open questions:
- How does this fit into the general design? The current defaults for
Condor are to spool/cache _nothing_ unless requested. Making the proxy
a special case seems likely to cause confusion and surprises. It
means copies of your proxy can be floating around that you're not
aware of.
- The proxy is, in a way, an input file. The current general answer to
input files possibly going away is to spool all of your input files.
This should work. On the down side 1. I'm not sure the proxy actually
does spool (but it probably should) and 2. We should probably
continue checking the original location for updates, which is a weird
special case (and potentially a security hole if the original location
is, say, /tmp/publically_writeable, and the submitter expected spooling
to make a copy and no longer need it).
Comments added by adesmet
===========================================================================
Date comments were added: Fri Jul 18 14:13:43 2008 (1216408423)
Subject: Comments added
Further context, primarily for psilord, who isn't necessarily
as familiar with the proxy stuff:
- Proxies are just short-lived certificates used for identification
and authorization. You can copy them around (or "delegate" them around
which is just a specialized copy that puts limits on the copy), so you
can carry your authorization around. Grid jobs, especially Globus jobs
(any version) require a proxy to get access to remote resources.
- gsi_ssh uses a proxy to log into a remote machine. It also places a
proxy on the remote machine for use. This is the use case. The problem
is that currently Condor expects to be able to find the proxy in the
place it was in during submission when it gets around to sending your
job off to a Globus site (a gatekeeper). But gsi_ssh deletes the proxy
on the remote machine when you log out. Condor finally tries to start the
job, the proxy is gone, the job fails to authenticate, the job stops.
- The requested feature is to copy the proxy somewhere, presumably into
the SPOOL. Problems with that are noted in the previous comment.
- Another complication is that proxies often "refresh." A proxy by default has
a short lifespan, 12 hours is typical. Longer proxies are discouraged,
since if they're compromised (someone else gets a copy) they have a longer
window to abuse it. So the solution is to frequently update your proxy,
either by hand, or using software like MyProxy. Currently when a job is
running, Condor (specifically the gridmanager? gahp? I forget which) will
regularly check for an updated proxy. If it finds one in the same spot, it
will switch to the newer one. The problem is that proxies are typically
something like /tmp/x509_u1234 (where 1234 is your uid). If it's deleted,
anyone else can copy a new one in, potentially replacing your credentials.
(There may be ownership checks; I don't know).
Comments added by adesmet
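
For reference, an ownership check of the kind mentioned above might look
something like the sketch below. This does not describe what the gridmanager
or gahp actually do (that is unknown here). The function name and the
expected_uid parameter are hypothetical.

```python
import os
import stat


def proxy_file_looks_safe(path, expected_uid):
    """Return True if path is a regular file owned by expected_uid
    with no group or other permission bits set."""
    try:
        st = os.lstat(path)  # lstat: do not follow a symlink at path
    except FileNotFoundError:
        return False
    if not stat.S_ISREG(st.st_mode):
        return False  # symlink, directory, device, etc.
    if st.st_uid != expected_uid:
        return False  # someone else dropped a file at this path
    if st.st_mode & (stat.S_IRWXG | stat.S_IRWXO):
        return False  # readable or writable beyond the owner
    return True
```

A check like this would reject a file that another user copied into
/tmp/x509_u1234 after the original was deleted, since the owner uid would
not match.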
===========================================================================
Date comments were added: Fri Jul 18 15:04:09 2008 (1216411449)
Subject: Actions
Assigned to psilord by adesmet
===========================================================================
Date of actions: Fri Jul 18 15:10:09 2008 (1216411809)
Subject: Comments added
Key insight from Jaime:
The proxy provided by gsi-ssh has the ssh session's PID attached
to the file name. So _it is not possible to refresh a proxy
by simply logging back in, or even using grid-proxy-init_. You'd
need to recreate a file with the old session's PID, or the gridmanager
will need to start watching for any gsi-ssh file in /tmp. Both seem
like a mess.
Comments added by adesmet
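
To illustrate the "watch for any gsi-ssh file in /tmp" option, here is a
small sketch that groups candidate file names by session PID. The file name
pattern used here is an assumption for illustration only; the real naming
used by gsi-ssh should be checked before relying on it.

```python
import re

# ASSUMED pattern for illustration: "x509up_p<PID>.<anything>".
# The real gsi-ssh proxy file name may differ.
GSISSH_PROXY_RE = re.compile(r"^x509up_p(?P<pid>\d+)\.")


def find_session_proxies(filenames):
    """Group names matching the assumed pattern by the PID in the name."""
    found = {}
    for name in filenames:
        m = GSISSH_PROXY_RE.match(name)
        if m:
            found.setdefault(int(m.group("pid")), []).append(name)
    return found
```

Since each login gets a new PID, a job that recorded the old session's file
name never sees the new session's file unless something scans like this.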
===========================================================================
Date comments were added: Thu Jul 24 13:27:46 2008 (1216924066)
Subject: Comments added
More thoughts from Jaime and Alan.
- The gridmanager does load the certificate, to check the date, before
it tells the gahp to update it. So an attacker putting garbage in
place once gsi-sshd deletes the proxy won't accomplish anything.
A logical next step would be, "scan /tmp for new proxies from gsi-ssh and
use them." Complications: a user might log in using gsi-ssh but with
two different certificates (mapping to the same user). The gridmanager
will need to confirm that the distinguished name is identical.
Also, there is a race condition. The gridmanager could see the proxy
and decide it was good; sshd could delete it; an attacker could put their
own proxy in place; the gahp will load the attacker's proxy. Fail.
Attackers can potentially hard link to one of your gsi-sshd proxies in /tmp.
If we're scanning /tmp for proxies, what ramifications will this have? It
will appear to be a valid proxy for all intents and purposes: right owner,
right permissions, right name. This may be okay, but it's worth more thought.
Comments added by adesmet
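
One common way to narrow a check-then-load race like the one above is to open
the file once and do both the checks and the read on that same open file, so
the path cannot be swapped in between. The sketch below assumes a single
process does both steps, which is not how the gridmanager and gahp are split
today. The function name and the expected_uid parameter are hypothetical. It
also rejects files with more than one hard link.

```python
import os
import stat


def load_proxy_checked(path, expected_uid):
    """Open path once, validate the open file, and return its bytes.

    Returns None if the file is missing or fails any check.
    """
    # O_NOFOLLOW (where available) refuses to open a symlink at path.
    flags = os.O_RDONLY | getattr(os, "O_NOFOLLOW", 0)
    try:
        fd = os.open(path, flags)
    except OSError:
        return None
    with os.fdopen(fd, "rb") as f:
        # fstat inspects the file we actually opened, not whatever
        # currently sits at the path.
        st = os.fstat(f.fileno())
        ok = (
            stat.S_ISREG(st.st_mode)
            and st.st_uid == expected_uid
            and st.st_nlink == 1  # reject files with extra hard links
            and not st.st_mode & (stat.S_IRWXG | stat.S_IRWXO)
        )
        return f.read() if ok else None
```

The distinguished name comparison discussed above would still be needed on
the returned bytes; it is left out here.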
===========================================================================
Date comments were added: Fri Jul 25 16:22:50 2008 (1217020971)