Hexo

2020-08-20

sort()和sorted()

区别介绍

sort 是应用在 list 上的方法，针对原list排序无返回值
sorted 是内建函数，可对所有可迭代的对象进行排序操作，会返回新的list

内部实现探究

（知乎）python sort 函数采用的排序算法：其中一个回答提到了 python 中的 sorted 排序内部实现是 timsort，并没有说 sort 。
（GitHub）python的sorted排序分析：同样只提到了 python 中的 sorted 排序内部实现是 timsort，并没有说 sort （知乎回答的一个链接）。
（CSDN）C++，java,Python的内部实现sort怎么实现的：内容提到 python内部的sort采用的是混合（hybrid）排序，规模小的时候采用 binary insertion，规模大的时候采用 sample sort 。

###结论：实现机制—Timsort 时间复杂度O(nlogn) 空间复杂度O（n）

Timsort

Timsort是结合了合并排序（merge sort）和插入排序（insertion sort）的排序算法，由Tim Peters在2002年设计了该算法并在Python中使用（TimSort 是 Python 中 list.sort 的默认实现）。Pyhton自从2.3版以来一直采用Timsort算法排序，现在Java SE7和Android也采用Timsort算法

基本流程

基本工作过程是：

扫描数组，确定其中的单调上升段和严格单调下降段，将严格下降段反转。我们将这样的段称之为run。
定义最小run长度，短于此的run通过插入排序合并为长度高于最小run长度；
反复归并一些相邻run，过程中需要避免归并长度相差很大的run，直至整个排序完成；
如何避免归并长度相差很大run呢，依次将run压入栈中，若栈顶run X，run Y，run Z 的长度违反了X>Y+Z 或 Y>Z 则Y run与较小长度的run合并，并再次放入栈中。依据这个法则，能够尽量使得大小相同的run合并，以提高性能。注意Timsort是稳定排序故只有相邻的run才能归并。
Merge操作还可以辅之以galloping模型（记录归并时run的界值，如runA完全小于或大于runB直接合并，提升归并效率

总结

timsort是工业级算法，其混用插入排序与归并排序，二分搜索等算法

充分利用待排序数据可能部分有序的事实，并且依据待排序数据内容动态改变排序策略——选择性进行归并以及galloping

Timsort是稳定的算法，当待排序的数组中已经有排序好的数，它的时间复杂度会小于n logn。Timesrot稳定（最坏和平均时间一致），时间复杂度是O（n log n）。在最坏情况下，Timsort算法需要的临时空间是n/2，在最好情况下，它只需要一个很小的临时存储空间

TimSort–python实现

import time
import random
"""
二分搜索用于插入排序寻找插入位置
"""
def binary_search(the_array, item, start, end):
    if start == end:
        if the_array[start] > item:
            return start
        else:
            return start + 1
    if start > end:
        return start

    mid = round((start + end)/ 2)

    if the_array[mid] < item:
        return binary_search(the_array, item, mid + 1, end)

    elif the_array[mid] > item:
        return binary_search(the_array, item, start, mid - 1)

    else:
        return mid

"""
插入排序用于生成mini run
"""
def insertion_sort(the_array):
   
    l = len(the_array)
    
    for index in range(1, l):
        value = the_array[index]
        pos = binary_search(the_array, value, 0, index - 1)
        the_array[pos+1:index+1] = the_array[pos:index]
        the_array[pos] = value
   
    return the_array
"""
归并，将两个有序的list合并成新的有序list
"""
def merge(left, right):
    
    if not left:
        return right
    if not right:
        return left
    l_len = len(left)
    r_len = len(right)
    result = [None]*(l_len+r_len)
    i, j, k= 0,0,0
    while i < l_len and j< r_len:
        if left[i] <= right[j]:
            result[k] = left[i]
            i+=1
        else:
            result[k] = right[j]
            j+=1
        k+=1
    while i<l_len:
        result[k]=left[i];
        k+=1
        i+=1
    while j<r_len:
        result[k]=right[j]
        k+=1
        j+=1
   
    return result
     
    

def timsort(the_array):
    runs = []
    length = len(the_array)
    new_run = [the_array[0]]
    new_run_reverse = False
    # 将the_array拆分成多了(递增或严格递减)list并将严格递减的list反转后存入runs。
    for i in range(1, length):
       
        if len(new_run) == 1:
            if the_array[i] < the_array[i-1]:
                new_run_reverse = True
            else:
                new_run_reverse = False
            new_run.append(the_array[i])
                
        elif new_run_reverse:
            if the_array[i] < the_array[i-1]:
                new_run.append(the_array[i])
            else:
                new_run.reverse()
                runs.append(new_run)
                #print(new_run)
                new_run=[]
                new_run.append(the_array[i])
        else:
            if the_array[i] >= the_array[i-1]:
                new_run.append(the_array[i])
            else:
                runs.append(new_run)
                #print(new_run)
                new_run=[]
                new_run.append(the_array[i])
       
        if i == length - 1:
            runs.append(new_run)
            #print(new_run)

    mini_run = 32
    sorted_runs=[]
    cur_run=[]
    # 对runs中的每一项list长度不足mini_run用插入排序进行扩充，存入sorted_runs
    for item in runs:
        if len(cur_run) > mini_run:
            sorted_runs.append(insertion_sort(cur_run))
            cur_run = item
        else:
            cur_run.extend(item)
      
    sorted_runs.append(insertion_sort(cur_run))
    
   
    # 依次将run压入栈中，若栈顶run X，Y，Z。
    # 违反了X>Y+Z 或 Y>Z 则Y与较小长度的run合并，并再次放入栈中。
    # 依据这个法则，能够尽量使得大小相同的run合并，以提高性能。
    # Timsort是稳定排序故只有相邻的run才能归并。
    run_stack = []
    sorted_array = []
    
    for run in sorted_runs:
        run_stack.append(run)
        stop = False
        while len(run_stack) >= 3 and not stop:
            
            X = run_stack[len(run_stack)-1]
            Y = run_stack[len(run_stack)-2]
            Z = run_stack[len(run_stack)-3]
            if (not len(X)>len(Y)+len(Z)) or (not len(Y)>len(Z)):
                run_stack.pop()
                run_stack.pop()
                run_stack.pop()
                if len(X) < len(Z):
                    YX = merge(Y,X)
                    run_stack.append(Z)
                    run_stack.append(YX)
                else:
                    ZY = merge(Z,Y)
                    run_stack.append(ZY)
                    run_stack.append(X)
            else:
                stop =True
            
        
        
    #将栈中剩余的run归并
    for run in run_stack:
        sorted_array = merge(sorted_array, run)
        

    #print(sorted_array)

data = []
for x in range(0,100):
    data.append(random.randint(0,10000))

start = time.process_time()
timsort(data)
end = time.process_time()
print(end-start)