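I want to run a nohup job and then run specific new commands (e.g. Kerberos authentication) inside the already-running process. The ideal solution would actually be to run the reauth command first and then the actual task, all under the same nohup process ID, so that the nohup process never loses its Kerberos ticket. In other words, I want the re-authentication to run alongside my Python script so the ticket is not lost, all without screen or tmux or anything that requires me to interact with the script. Basically, I want to implement a daemon that runs a Python task (or any task) and does not lose its ticket until the run finishes. How do I do this?
This is my current attempt, but I don't know whether it works.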
# - set up this main sh script
source ~/.bashrc
source ~/.bash_profile
source ~/.bashrc.user
echo HOME = $HOME
source cuda11.1
conda init bash
conda activate metalearning_gpu
# - get a job id for this tmux session
export SLURM_JOBID=$(python -c "import random;print(random.randint(0, 1_000_000))")
echo SLURM_JOBID = $SLURM_JOBID
export OUT_FILE=$PWD/main.sh.o$SLURM_JOBID
export ERR_FILE=$PWD/main.sh.e$SLURM_JOBID
export WANDB_DIR=$HOME/wandb_dir
export CUDA_VISIBLE_DEVICES=4
echo CUDA_VISIBLE_DEVICES = $CUDA_VISIBLE_DEVICES
python -c "import torch; print(torch.cuda.get_device_name(0));"
# - CAREFUL, if a job is already running it could do damage to it, rm reauth process, qian doesn't do it so skip it
# top -u brando9
# pkill -9 reauth -u brando9
# - expt python script then inside that python pid attach a reauth process
# should I run reauth within python with subprocess or package both the nohup command and the reauth together in bash somehow
#python -u ~/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_sl_with_ddp.py --manual_loads_name sl_hdb1_5cnn_adam_cl_filter_size --filter_size 4 > $OUT_FILE 2> $ERR_FILE &
#nohup (echo $SU_PASSWORD | /afs/cs/software/bin/reauth; python -u ~/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_sl_with_ddp.py --manual_loads_name sl_hdb1_5cnn_adam_cl_filter_size --filter_size 4 > $OUT_FILE 2> $ERR_FILE) &
nohup (echo $SU_PASSWORD | /afs/cs/software/bin/reauth; python -u ~/main.py > $OUT_FILE 2> $ERR_FILE) &
# other option is to run `echo $SU_PASSWORD | /afs/cs/software/bin/reauth` inside of python, right?
export JOB_PID=$!
echo JOB_PID = $JOB_PID
# - Done
echo "Done with the dispatching (daemon) sh script"
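Maybe that is because it has not been running long enough? I don't know.
Another option is to run the re-authentication inside the Python script itself. This is untested, but might it work? The main point is that the nohup process must not die because of a missing Kerberos ticket.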
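A minimal sketch of that second option, assuming the command reads the password from standard input and that the password comes from an environment variable such as `SU_PASSWORD` in the script above. The helper name `run_with_password` and the stand-in child command are hypothetical; the stand-in only exists so the sketch runs anywhere. Output is deliberately inherited rather than captured, because a command that forks a long-lived background child would keep a capture pipe open.

```python
import subprocess
import sys


def run_with_password(cmd: list[str], password: str) -> int:
    """Run cmd, write the password to its stdin, and return its exit code."""
    # Only stdin is a pipe; stdout/stderr are inherited from this process.
    completed = subprocess.run(cmd, input=password + "\n", text=True)
    return completed.returncode


# Stand-in for the real command: exits 0 only if it reads the line "secret".
fake_reauth = [
    sys.executable,
    "-c",
    "import sys; sys.exit(0 if sys.stdin.readline().strip() == 'secret' else 1)",
]
status = run_with_password(fake_reauth, "secret")
print(f"exit status: {status}")
```

In real use the list would hold the path to reauth and the password would be read with `os.environ["SU_PASSWORD"]`, so it never appears on a command line.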
The contents of reauth are as follows:
(metalearning_gpu)~ $ cat /afs/cs/software/bin/reauth
#!/usr/bin/perl
# $Id: reauth 2737 2011-06-20 18:14:05Z miles $
#
# Original version (C) Martin Schulz, 2'2002
# University Karlsruhe
#
# Modifications by Miles Davis <[email protected]>
# Super minimal -- call programs rather than functions to reduce dependence
# on extra perl modules.
#
# Heimdal patches thanks to Georgios Asimenos <[email protected]>
#
# General:
##########
# This little script aims at maintaining a valid AFS token from a
# users password for long running jobs.
# As everybody knows (or should know) Kerberos tickets and AFS tokens
# only have a limited lifetime. This is so by design and is usually
# reasonable. After 12 hours, it is no more obvious that it is really
# that user sitting in front of the computer that once typed the
# correct password in. Furthermore the damage caused by compromized
# AFS tokens is limited to the lifetime of that ticket.
# However, there are situations when users want to use long running
# jobs that will write to AFS filespace for several days. Renewable
# tickets are not so much of help here, since they can only be renewed
# if ....
# Therefore the secret has somehow deposited on the local computer
# that will run the long time job. This can be eiter done by storing a
# keytab on the local disk, maybe with a cron(*) principal with
# reduces priviledges. The approach taken here is to work with the
# original password and keep it in RAM only.
# When starting this program, the user is asked for his principal and
# the corresponding password. Then the TGT and AFS token is obtained
# and displayed, afterwards, a background process is forked and the
# main process will return to the system prompt. The workload program
# can now be started.
# The background process will periodically attempt to obtain krb
# tickets and AFS tokens. If this fails for some reason (Kerberos
# server not available or anything, the program aborts.
# aklog does not create a new pag if not told so. If you want your
# background process have a separate pag, create it beforehand.
# The reauth.pl program will work until eternity if is not stopped
# somehow. The canonical way is kill it by "kill $pid", where $pid is
# the process id printed before the return of the initial call to
# reauth.pl or found in the output of "ps".
# (*) Cron jobs are another issue. Our institute introduced
# user.cron-style principals to enable cron to obtain a token and then
# work on restricted parts of the users home directories.
# Security issues:
##################
# reauth.pl will run forever if you do not stop it, so don't forget that!
# The password is kept in RAM (of the child process). AFAIK, this can
# only be recovered by local root (who you need to trust anyway). It
# will not survive a reboot of the local machine.
# The password is not kept on any disk. Therefore any bootfloppy
# (reboot to single user mode..) or screwdriver (take disk away..)
# attacks are not promising.
# Be aware that your NSA-, FBI-, MI5-, KGB-, ElQaida-, or (*insert
# your favorite opponent or competitor here*)-sponsored cleaning
# personnel or coworkers might have even more elaborate means... :-)
# BUGS:
#######
# Only mildly tested only on Linux and Solaris.
# Uses kinit, aklog, klist and tokens programs for a KerberosV/ Ken
# Hornstein's migration kit centered AFS setup. Please adjust to your
# config.
###########################################################################
# Configs:
# kinit program, add path if necessary
if ( -e "/usr/kerberos/bin/kinit" ) {
$kinit="/usr/kerberos/bin/kinit";
} elsif ( -e "/usr/lib/heimdal/bin/kinit" ) {
$kinit = "/usr/lib/heimdal/bin/kinit";
} elsif ( -e "/usr/bin/kinit" ) {
$kinit="/usr/bin/kinit";
} else {
die("Couln't find kinit.\n");
}
# aklog program, add path if necessary
if ( -e "/usr/bin/aklog" ) {
$aklog="/usr/bin/aklog";
} elsif ( -e "/usr/lib/heimdal/bin/afslog" ) {
# or, afslog, for heimdal weirdos
$aklog="/usr/lib/heimdal/bin/afslog";
} else {
die("Couln't find aklog or afslog.\n");
}
# klist program, add path if necessary
$klist="/usr/kerberos/bin/klist";
# tokens program, add path if necessary
$tokens="/usr/bin/tokens";
#################################################################
# Program:
use Getopt::Long;
use POSIX qw(setuid);
use POSIX qw(setgid);
use POSIX qw(setsid);
# Defaults for command line options.
my $keytab = '';
my $command = '';
my $username = '';
my $debug = 0;
my $verbose = 0;
my $interval=15000; # time interval in seconds: 4+ hours:
my %opts = (
# Keytab
'k=s' => sub {
$keytab = @_[1];
$kinit_opts .= "-k -t $keytab ";
},
# Run command
'c=s' => sub {
$command = @_[1];
},
# Run command as user
'u=s' => sub {
$username = @_[1];
},
# Time interval to sleep
'i=i' => sub {
$interval = @_[1];
},
# Debug
'd' => sub {
$debug++;
},
# Be versbose
'v' => sub {
$verbose++;
},
);
GetOptions(%opts) or die "Usage: reauth [ -k=keytab ] [ -u user ] [ -i <sleep_interval ] [ -v ] [ -c <command> ]\n";
if(@ARGV) {
$princ = $ARGV[0];
debug_print(2, "Principal name provided by argument = $princ");
} else {
# Assume we want the login name as the principal name
$princ = getpwuid($<);
debug_print(2, "Principal name provided by argument = $princ");
}
if ($keytab) {
# Don't ask for password, a keytab was provided.
debug_print(1, "Keytab provided = $keytab");
} else {
# read password, but turn off echo before:
print "Password for $princ: ";
system "stty -echo";
$passwd = <STDIN>;
system "stty echo";
printf "\n";
chomp $passwd;
# Actually get the tickets/tokens
if(obtain_tokens()!=0) {
die "Can't obtain kerberos tickets\n";
}
if ($verbose) {
show_tokens();
}
}
# fork to go into background:
# a) the parent will exit
# b) the child will work on
$pid = fork();
if ($pid) {
# I am the parent.
printf "Background process pid is: $pid\n";
if ($command) {
debug_print(1,"Waiting for child to die.");
wait;
debug_print(1,"Child is dead.");
}
exit 0;
} else {
# I am the child.
debug_print(2,"I am process $$");
print "Can't set session id\n" unless setsid();
debug_print(2,"KRB5CCNAME: " . $ENV{KRB5CCNAME});
#if ($ENV{KRB5CCNAME}) {
#$ENV{KRB5CCNAME} = $ENV{KRB5CCNAME} . "_reauth_$$";
#} else {
#$ENV{KRB5CCNAME} = "/tmp/krb5cc_reauth_$$";
#}
#debug_print(2,"Creating " . $ENV{KRB5CCNAME});
#system "touch $ENV{KRB5CCNAME}";
if ($username) {
debug_print(1, "Looking up UID for $username");
($name,$passwd,$UID,$GID, @junk) = getpwnam($username);
debug_print(1, "Changing to UID $UID, GID $GID");
print "Can't set group id\n" unless setgid($GID);
print "Can't set user id\n" unless setuid($UID);
if ($ENV{KRB5CCNAME}) {
$ENV{KRB5CCNAME} = $ENV{KRB5CCNAME} . "_reauth_$$";
} else {
$ENV{KRB5CCNAME} = "/tmp/krb5cc_reauth_$$";
}
}
debug_print(2, "Running as uid " . $<);
# Actually get the tickets/tokens
if(obtain_tokens()!=0) {
die "Can't obtain kerberos tickets\n";
}
if ($verbose) {
show_tokens();
}
# If I was told to run a command, do it.
if ($command) {
debug_print(1,"About to exec $command");
exec($command) or die "Can't execute '$command'.\n";
exit
}
debug_print(2,"Going into auth loop (interval is $interval).");
#close(STDOUT);
#close(STDERR);
# Otherwise, work until killed:
while (1) {
debug_print(2,"Waking up to obtain new tokens.");
obtain_tokens();
if ($verbose) {
show_tokens();
}
sleep $interval;
};
}
#################################################################
sub obtain_tokens() {
# ignore sigpipes' (according to perlopentut)
$SIG{PIPE} = 'IGNORE';
#debug_print(1,"Running: | $kinit -f $kinit_opts -p $princ 1>/dev/null 2>&1");
# run kinit
open(KINIT, "| $kinit -f $kinit_opts -p $princ 1>/dev/null 2>&1");
# pass password to stdin, password does not show up on command line
if (! $keytab) {
print(KINIT "${passwd}\n");
}
# close pipe and get status
close(KINIT); $status=$?;
debug_print(1,"kinit exited with status $status\n");
# act on status..
if($status == 256) {
if ($verbose) {
print "WARNING: kinit is not able to obtain Kerberos ticket ($status).\n";
print " Possible DNS or network problem. Continuing anyway...\n";
}
return 1;
} elsif($status!=0) {
print "kinit is not able to obtain Kerberos ticket: $status\n";
return 2;
};
debug_print(1,"Running $aklog...\n");
$status = system "$aklog >/dev/null" ;
debug_print(1,"aklog exited with status $status\n");
if($status!=0) {
print "aklog is not able to obtain AFS token: $status\n";
return 3;
};
return 0;
};
##################################################################
sub show_tokens() {
system $klist ;
system $tokens ;
};
##################################################################
sub debug_print($$) {
my $level = shift;
my $message = shift;
if ($debug >= $level) {
print "DEBUG$debug: $message\n";
}
}
##################################################################
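The core of the script above is the loop at the end: obtain tickets, sleep for the interval (15000 seconds by default), and repeat until killed. As a rough Python sketch of that pattern, with a hypothetical `renew` callable standing in for the kinit/aklog calls (the names `start_renewal_loop` and `renew` are illustrative and not part of reauth):

```python
import threading
import time
from typing import Callable


def start_renewal_loop(renew: Callable[[], None], interval: float) -> threading.Event:
    """Call renew() right away and then every `interval` seconds in a daemon thread.

    Returns an Event; setting it stops the loop.
    """
    stop = threading.Event()

    def loop() -> None:
        while not stop.is_set():
            renew()
            # wait() sleeps for up to `interval` seconds but returns early once stop is set
            stop.wait(interval)

    threading.Thread(target=loop, daemon=True).start()
    return stop


# Usage with a stand-in renew function that only counts how often it is called.
calls: list[int] = []
stop = start_renewal_loop(lambda: calls.append(1), interval=0.01)
time.sleep(0.2)
stop.set()
print(f"renew was called {len(calls)} times")
```

Unlike reauth, which forks a separate process, this thread ends together with the Python process, so nothing is left behind to kill afterwards.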
My attempt at re-authenticating from Python:
import subprocess
from typing import Any


def run_bash_command(cmd: str) -> Any:
    # shell=True because cmd contains a pipe and $SU_PASSWORD, which only a shell interprets.
    # Output is inherited rather than piped: reauth forks a background child that
    # would keep a pipe open and make communicate() block.
    process = subprocess.run(cmd, shell=True)
    if process.returncode != 0:
        raise Exception(f'{cmd!r} exited with status {process.returncode}')
    return process.returncode


def stanford_reauth():
    # def stanford_rauth(password: Optional[str] = None):
    # password: str = os.environ['SU_PASSWORD'] if password is None else None
    # assert password is not None, f'Err: {password=}'
    reauth_cmd: str = 'echo $SU_PASSWORD | /afs/cs/software/bin/reauth'
    out = run_bash_command(reauth_cmd)
    print('Exit status of reauth (/afs/cs/software/bin/reauth with password)')
    print(f'{out=}')
I am currently running both to see whether they go through or die.
New error:
This is what I am trying:
nohup sh -c "echo $SU_PASSWORD | /afs/cs/software/bin/reauth; python -u ~/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_diversity_with_task2vec.py --manual_loads_name diversity_ala_task2vec_hdb1_mio > $OUT_FILE 2> $ERR_FILE" > $PWD/main.sh.nohup.out$SLURM_JOBID &
But I can't tell whether it is still working. I get the following error:
Password for brando9: stty: 'standard input': Inappropriate ioctl for device
stty: 'standard input': Inappropriate ioctl for device
Can't obtain kerberos tickets
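I think I understand the problem. The new command expects a terminal to send the password, so echo cannot send it. I tried:
- expect (but I have no sudo and no way to install it)
- most of what is suggested at https://serverfault.com/questions/241588/how-to-automate-ssh-login-with-password, but I can't run sudo to try the best options.
I am connected over SSH and need to send the password. Is there a way to send the password over ssh?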
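The two `stty` lines can be reproduced without reauth. In the sketch below the child is a hypothetical stand-in, not reauth: its stdin is a pipe, so it reports that it has no terminal, yet it still reads the line written into the pipe. On its own, the `stty` message only says that stdin is not a terminal; the reauth script above still reads the password from STDIN after calling `stty`.

```python
import subprocess
import sys

# Child: report whether stdin is a terminal, then echo back the first line it reads.
child_code = "import sys; print(sys.stdin.isatty(), sys.stdin.readline().strip())"

result = subprocess.run(
    [sys.executable, "-c", child_code],
    input="not-a-real-password\n",  # delivered through a pipe, like `echo ... |`
    capture_output=True,
    text=True,
)
print(result.stdout.strip())  # False not-a-real-password
```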
Related:
- Cross-posted on Reddit: https://www.reddit.com/r/unix/comments/yvn5t3/how_does_one_send_new_commands_to_run_to_an/
- Cross-posted on Quora: https://www.quora.com/unanswered/How-does-one-send-new-commands-to-run-to-an-already-running-nohup-process-or-run-two-commands-together-
- Linux Reddit: https://www.reddit.com/r/linuxquestions/comments/yvo8ti/how_does_one_send_new_commands_to_run_to_an/
- Related, but with tmux (screen would be fine too, I think): How does one use tmux to run a job in the background without interacting with tmux (e.g. dispatching it like a nohup job)?
- Related: How does one wait for a process to finish, without blocking the main terminal, and then run a set of tasks (e.g. killing the tmux session)?
- Related: How does one authenticate in Linux with a command that requires a password?
- Related: How does one send a password to a command when there is no tty/stty (terminal), without expect or sshpass?
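Best answer
I am not sure why, but the alternatives from here do not seem to work: How does one send a password to a command when there is no tty/stty (terminal), without expect or sshpass?
All three of my attempts seem to have succeeded:
- running reauth in the sh script shown below
- running the re-authentication inside Python
- doing both of the above.
Sample script I use to run jobs: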
# https://unix.stackexchange.com/questions/724902/how-does-one-send-new-commands-to-run-to-an-already-running-nohup-process-e-g-r
# sh ~/diversity-for-predictive-success-of-meta-learning/main_nohup_snap.sh
# - set up this main sh script
export RUN_PWD=$(pwd)
source ~/.bashrc
source ~/.bash_profile
source ~/.bashrc.user
echo HOME = $HOME
# since snap .bash.user cd's me into HOME at dfs
cd $RUN_PWD
echo RUN_PWD = $RUN_PWD
realpath .
source cuda11.1
conda init bash
conda activate metalearning_gpu
# - get a job id for this tmux session
export SLURM_JOBID=$(python -c "import random;print(random.randint(0, 1_000_000))")
echo SLURM_JOBID = $SLURM_JOBID
export OUT_FILE=$PWD/main.sh.o$SLURM_JOBID
export ERR_FILE=$PWD/main.sh.e$SLURM_JOBID
export WANDB_DIR=$HOME/wandb_dir
echo $OUT_FILE
echo $ERR_FILE
export CUDA_VISIBLE_DEVICES=5
echo CUDA_VISIBLE_DEVICES = $CUDA_VISIBLE_DEVICES
python -c "import torch; print(torch.cuda.get_device_name(0));"
# - CAREFUL, if a job is already running it could do damage to it, rm reauth process, qian doesn't do it so skip it
# top -u brando9
# pkill -9 reauth -u brando9
# - expt python script then inside that python pid attach a reauth process
# should I run reauth within python with subprocess or package both the nohup command and the reauth together in bash somehow
#python -u ~/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_sl_with_ddp.py --manual_loads_name sl_hdb1_5cnn_adam_cl_filter_size --filter_size 4 > $OUT_FILE 2> $ERR_FILE &
nohup sh -c 'echo $SU_PASSWORD | /afs/cs/software/bin/reauth; python -u ~/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_sl_with_ddp.py --manual_loads_name sl_hdb1_5cnn_adam_cl_filter_size --filter_size 4 > $OUT_FILE 2> $ERR_FILE' > $PWD/nohup.out$SLURM_JOBID &
#nohup python -u ~/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_sl_with_ddp.py --manual_loads_name sl_hdb1_5cnn_adam_cl_filter_size --filter_size 4 > $OUT_FILE 2> $ERR_FILE &
# other option is to run `echo $SU_PASSWORD | /afs/cs/software/bin/reauth` inside of python, right?
export JOB_PID=$!
echo JOB_PID = $JOB_PID
echo SLURM_JOBID = $SLURM_JOBID
# - Done
echo "Done with the dispatching (daemon) sh script"
The main command:
nohup sh -c 'echo $SU_PASSWORD | /afs/cs/software/bin/reauth; python -u ~/diversity-for-predictive-success-of-meta-learning/div_src/diversity_src/experiment_mains/main_sl_with_ddp.py --manual_loads_name sl_hdb1_5cnn_adam_cl_filter_size --filter_size 4 > $OUT_FILE 2> $ERR_FILE' > $PWD/nohup.out$SLURM_JOBID &
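For comparison, a hedged Python sketch of the same dispatch idea: start the job in its own session with stdout and stderr redirected to files, then return immediately with the PID. The function name `launch_detached`, the temporary directory, and the stand-in job are assumptions for illustration; this is not a drop-in replacement for `nohup`, only a similar way of detaching a job from the launching terminal.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def launch_detached(cmd: list[str], out_file: Path, err_file: Path) -> subprocess.Popen:
    """Start cmd in its own session with stdout/stderr redirected to files."""
    with open(out_file, "wb") as out, open(err_file, "wb") as err:
        # start_new_session=True calls setsid() in the child, detaching it
        # from the controlling terminal of the launching shell.
        return subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=out,
            stderr=err,
            start_new_session=True,
        )


# Usage with a stand-in job instead of the real training script.
workdir = Path(tempfile.mkdtemp())
job = launch_detached(
    [sys.executable, "-u", "-c", "print('job started')"],
    workdir / "main.sh.o0",
    workdir / "main.sh.e0",
)
print("JOB_PID =", job.pid)
```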
I think I will keep using option 3, i.e. doing both, because even when I run distributed jobs I do not want the process to be killed at random without my say-so. To do that, the Python script does the following:
import subprocess
from typing import Any


def run_bash_command(cmd: str) -> Any:
    # shell=True because cmd contains a pipe and $SU_PASSWORD, which only a shell interprets.
    # Output is inherited rather than piped: reauth forks a background child that
    # would keep a pipe open and make communicate() block.
    process = subprocess.run(cmd, shell=True)
    if process.returncode != 0:
        raise Exception(f'{cmd!r} exited with status {process.returncode}')
    return process.returncode


def stanford_reauth():
    """
    re-authenticates the python process in the kerberos system so that the
    python process is not killed randomly.
    ref: https://unix.stackexchange.com/questions/724902/how-does-one-send-new-commands-to-run-to-an-already-running-nohup-process-or-run
    """
    reauth_cmd: str = 'echo $SU_PASSWORD | /afs/cs/software/bin/reauth'
    out = run_bash_command(reauth_cmd)
    print('Exit status of reauth (/afs/cs/software/bin/reauth with password): ')
    print(f'{out=}')
Remember to check on the reauth processes frequently.
# - CAREFUL, if a job is already running it could do damage to it, rm reauth process
# pkill -9 reauth -u brando9
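One way to do such a check from Python is sketched below. It assumes the MIT Kerberos `klist`, whose `-s` flag runs silently and reports through the exit status whether the credential cache is readable and unexpired; other Kerberos implementations may need a different flag. The function name `has_valid_ticket` is hypothetical, and the check covers only the Kerberos ticket, not the AFS token.

```python
import subprocess
from typing import Sequence


def has_valid_ticket(check_cmd: Sequence[str] = ("klist", "-s")) -> bool:
    """Return True if check_cmd exits with status 0."""
    try:
        return subprocess.run(list(check_cmd)).returncode == 0
    except FileNotFoundError:
        # The command is not installed or not on PATH.
        return False
```

A long-running job could call this periodically and log a warning, or re-run the re-authentication, when it returns `False`.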