3 Answers

Contributed 1777 experience points · received 10+ upvotes
The answer to "Algorithmic details of the UNIX sort command" says that Unix sort uses an external R-way merge sort algorithm. The link goes into more detail, but in essence it splits the input into smaller parts (each small enough to fit in memory), sorts them, and then merges the parts together at the end.
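For illustration only (this is not sort's internal code, just the same split/sort/merge idea expressed with standard tools; big.txt and the chunk size are placeholders):

split -l 1000000 big.txt chunk.               # split the input into memory-sized chunks
for f in chunk.*; do sort "$f" -o "$f"; done  # sort each chunk independently
sort -m chunk.* > big.sorted                  # R-way merge of the already-sorted chunks
rm -f chunk.*                                 # remove the intermediate chunks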

Contributed 1942 experience points · received 3+ upvotes
The sort command stores its working data in temporary disk files (usually under /tmp).
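If /tmp is too small for those temporary files, GNU sort lets you point them elsewhere and enlarge the in-memory buffer before it spills to disk; the path and size below are just examples:

sort -T /var/tmp -S 1G huge.txt > huge.sorted   # -T: temp directory, -S: memory buffer size
TMPDIR=/var/tmp sort huge.txt > huge.sorted     # or via the TMPDIR environment variable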

Contributed 1772 experience points · received 5+ upvotes
Warning: this script starts one shell (a background sort process) per chunk; for very large files, that can be hundreds of processes.
Here is a script I wrote for this purpose. On a 4-processor machine it improved sort performance by 100%!
#!/bin/ksh

MAX_LINES_PER_CHUNK=1000000
ORIGINAL_FILE=$1
SORTED_FILE=$2
CHUNK_FILE_PREFIX=$ORIGINAL_FILE.split.
SORTED_CHUNK_FILES=$CHUNK_FILE_PREFIX*.sorted

usage ()
{
    echo "Parallel sort"
    echo "usage: psort file1 file2"
    echo "Sorts text file file1 and stores the output in file2"
    echo "Note: file1 will be split in chunks of up to $MAX_LINES_PER_CHUNK lines"
    echo "and each chunk will be sorted in parallel"
}

# Test that we have exactly two arguments on the command line
if [ $# -ne 2 ]
then
    usage
    exit 1
fi

# Clean up any leftover files from a previous run
rm -f $SORTED_CHUNK_FILES > /dev/null
rm -f "$CHUNK_FILE_PREFIX"* > /dev/null
rm -f "$SORTED_FILE"

# Split $ORIGINAL_FILE into chunks of at most $MAX_LINES_PER_CHUNK lines
split -l $MAX_LINES_PER_CHUNK "$ORIGINAL_FILE" "$CHUNK_FILE_PREFIX"

# Sort each chunk in its own background process, then wait for all of them
for file in "$CHUNK_FILE_PREFIX"*
do
    sort "$file" > "$file.sorted" &
done
wait

# Merge the sorted chunks into $SORTED_FILE
sort -m $SORTED_CHUNK_FILES > "$SORTED_FILE"

# Clean up the intermediate files
rm -f $SORTED_CHUNK_FILES > /dev/null
rm -f "$CHUNK_FILE_PREFIX"* > /dev/null
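A hypothetical invocation, assuming the script above is saved as psort (the file names are placeholders):

chmod +x psort
./psort access.log access.sorted   # splits, sorts the chunks in parallel, merges into access.sorted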