Cheat sheet

Sorting methods with linear complexity: bucket sort, radix sort, and counting sort.




Bucket sort, or bin sort, is a sorting algorithm that works by partitioning an array into a finite number of buckets. Each bucket is then sorted individually, either using a different sorting algorithm or by recursively applying bucket sort. Bucket sort is a generalization of pigeonhole sort. It runs in linear expected time (Θ(n)) when the input is drawn from a uniform distribution. Since bucket sort is not a comparison sort, it is not subject to the Ω(n log n) lower bound.



Bucket sort works as follows:

1. Set up an array of initially empty "buckets" the size of the range.

2. Go over the original array, putting each object in its bucket.

3. Sort each non-empty bucket.

4. Put the elements from the non-empty buckets back into the original array.



Pseudocode

function bucket-sort(array, n) is
    buckets = new array of n empty lists
    for i = 0 to length(array) - 1 do
        insert array[i] into buckets[msbits(array[i], k)]
    for i = 0 to n - 1 do
        next-sort(buckets[i])
    return the concatenation of buckets[0], ..., buckets[n-1]

Here msbits(x, k) returns the k most significant bits of x, so keys are scattered into buckets by their high-order bits, and next-sort is any auxiliary sorting routine (or bucket sort applied recursively).
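As a concrete illustration of the pseudocode, here is a minimal Python sketch (the function name and choice of keys are illustrative, not taken from the text above). It assumes the keys are floats uniformly distributed in [0, 1), scatters them by value rather than by most significant bits, and uses Python's built-in sort in place of next-sort:

def bucket_sort(array, n_buckets=None):
    """Bucket sort for floats assumed to lie in [0, 1)."""
    if not array:
        return []
    n = n_buckets or len(array)
    buckets = [[] for _ in range(n)]          # step 1: n empty buckets
    for x in array:                           # step 2: scatter each key into its bucket
        buckets[min(int(x * n), n - 1)].append(x)
    result = []
    for b in buckets:                         # step 3: sort each non-empty bucket
        result.extend(sorted(b))              # step 4: concatenate in bucket order
    return result

print(bucket_sort([0.42, 0.32, 0.33, 0.52, 0.37, 0.47, 0.51]))

With uniformly distributed input each bucket receives O(1) elements on average, which is where the expected linear running time comes from.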




(Figure: elements are first distributed among the bins; the elements within each bin are then sorted.)

Least Significant Digit Radix Sorts

A least significant digit (LSD) radix sort is a fast stable sorting algorithm which can be used to sort keys in lexicographic order. Keys may be strings of characters or numerical digits in a given radix. The processing of the keys begins at the least significant digit, the rightmost digit, and proceeds to the most significant digit, the leftmost digit. The sequence in which digits are processed by a least significant digit (LSD) radix sort is the opposite of the sequence in which digits are processed by a most significant digit (MSD) radix sort.

A least significant digit (LSD) radix sort operates in O(nk) time, where n is the number of keys and k is the average key length. This performance can be obtained for variable-length keys by grouping all of the keys that have the same length together and separately performing an LSD radix sort on each group, from shortest to longest, which avoids processing the whole list of keys on every sorting pass.

A radix sorting algorithm was originally used to sort punched cards in several passes. A computer algorithm for radix sort was invented in 1954 at MIT by Harold H. Seward. In newer applications that demand high speed and handle massive amounts of data, the computer radix sort remains an improvement over (slower) comparison sorts.

LSD radix sorts have resurfaced as an alternative to other high performance comparison-based sorting algorithms (like heapsort and mergesort) that require Ω(n log n) comparisons, where n is the number of items to be sorted. Comparison sorts can do no better than Ω(n log n) execution time but offer the flexibility of being able to sort with respect to more complicated orderings than a lexicographic one; however, this ability is of little importance in many practical applications.

Each key is first figuratively dropped into one level of buckets corresponding to the value of the rightmost digit of each key. Each bucket preserves the original order of the keys as the keys are dropped into each bucket. There is a one-to-one correspondence between the number of buckets and the number of values that can be represented by a digit. Then, the process repeats with the next neighboring digit until there are no more digits to process. In other words:

1. Take the least significant digit (or group of bits, both being examples of radices) of each key.

2. Group the keys based on that digit, but otherwise keep the original order of keys. (This is what makes the LSD radix sort a stable sort).

3. Repeat the grouping process with each more significant digit.

The sort in step 2 is usually done using bucket sort or counting sort, which are efficient in this case since there are usually only a small number of digits.
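The following Python sketch (the function names are illustrative) follows exactly this scheme for non-negative integers: a stable counting sort is applied to one decimal digit per pass, starting from the least significant digit.

def counting_sort_by_digit(keys, exp):
    """Stable counting sort of keys by the decimal digit selected by exp (1, 10, 100, ...)."""
    counts = [0] * 10
    for k in keys:
        counts[(k // exp) % 10] += 1
    for d in range(1, 10):                    # prefix sums: counts[d] = index just past the last key with digit d
        counts[d] += counts[d - 1]
    output = [0] * len(keys)
    for k in reversed(keys):                  # scanning in reverse keeps the sort stable
        digit = (k // exp) % 10
        counts[digit] -= 1
        output[counts[digit]] = k
    return output

def radix_sort_lsd(keys):
    """LSD radix sort for non-negative integers, one pass per decimal digit."""
    if not keys:
        return []
    exp = 1
    while max(keys) // exp > 0:
        keys = counting_sort_by_digit(keys, exp)
        exp *= 10
    return keys

print(radix_sort_lsd([170, 45, 75, 90, 2, 24, 802, 66]))   # [2, 24, 45, 66, 75, 90, 170, 802]

The intermediate lists produced after each pass reproduce the orderings shown in the worked example below.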

When an LSD radix sort processes keys which all have the same fixed length, then the upper bound of the execution time is O(n), where n is the number of keys to be sorted. When processing fixed-length keys, some implementations of LSD radix sorts are slower than O(n log n) comparison-based sorting algorithms, such as heap sort, unless a sufficiently large amount of input data is processed. What a sufficiently large amount of input data precisely is will vary from computer system to computer system and from implementation to implementation. If the keys used are short 2-byte integers, it is practical to complete the sorting with only two passes, processing 1 byte of each key per pass. In this case, and even when sorting 32-bit floating point numbers, radix sort can in practice be significantly faster than any other sorting algorithm. In one possible implementation of an LSD radix sort, variable length keys are padded with zero digits at the beginning of the keys to make all of the keys a fixed length, which requires O(NL) processing time, where N is the number of keys to be sorted and L is the length of the longest key.
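As a concrete illustration of the two-pass case mentioned above, the following sketch (a hypothetical helper, not from the text) sorts unsigned 16-bit integers with exactly two byte-wise counting-sort passes:

def sort_u16(keys):
    """Sort unsigned 16-bit integers using two stable counting-sort passes, one per byte."""
    for shift in (0, 8):                      # pass 1: low byte, pass 2: high byte
        counts = [0] * 256
        for k in keys:
            counts[(k >> shift) & 0xFF] += 1
        for b in range(1, 256):               # prefix sums give the final positions
            counts[b] += counts[b - 1]
        output = [0] * len(keys)
        for k in reversed(keys):              # reverse scan preserves stability
            b = (k >> shift) & 0xFF
            counts[b] -= 1
            output[counts[b]] = k
        keys = output
    return keys

print(sort_u16([50000, 3, 1024, 65535, 3, 256]))   # [3, 3, 256, 1024, 50000, 65535]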

One disadvantage of an LSD radix sort is that it does not run in place, which means that the keys or pointers to the keys must be temporarily stored outside of their original memory space during processing. O(n) additional memory space is needed to hold the keys or pointers to the keys. For fixed-length keys, another disadvantage of an LSD radix sort is that it requires one sorting pass over the input list for each digit in a key.

An example



Sort the list:

170, 45, 75, 90, 2, 24, 802, 66

1. Sorting by least significant digit (1s place) gives:

170, 90, 2, 802, 24, 45, 75, 66

Notice that we keep 2 before 802, because 2 occurred before 802 in the original list, and similarly for 170 and 90.

2. Sorting by next digit (10s place) gives:

2, 802, 24, 45, 66, 170, 75, 90

3. Sorting by most significant digit (100s place) gives:

2, 24, 45, 66, 75, 90, 170, 802

It is important to realize that each of the above steps requires just a single sorting pass over the data, since each item can be placed in its correct bucket without having to be compared with other items.

Some LSD radix sort implementations allocate space for buckets by first counting the number of keys that belong in each bucket before moving keys into those buckets. The number of times that each digit occurs is stored in an array. Consider the previous list of keys viewed in a different way:

170, 045, 075, 090, 002, 024, 802, 066

The first counting pass starts on the least significant digit of each key, producing an array of bucket sizes:

2 (bucket size for digits of 0: 170, 090)

2 (bucket size for digits of 2: 002, 802)

1 (bucket size for digits of 4: 024)

2 (bucket size for digits of 5: 045, 075)

1 (bucket size for digits of 6: 066)

A second counting pass on the next more significant digit of each key will produce an array of bucket sizes:

2 (bucket size for digits of 0: 002, 802)

1 (bucket size for digits of 2: 024)

1 (bucket size for digits of 4: 045)

1 (bucket size for digits of 6: 066)

2 (bucket size for digits of 7: 170, 075)

1 (bucket size for digits of 9: 090)

A third and final counting pass on the most significant digit of each key will produce an array of bucket sizes:

6 (bucket size for digits of 0: 002, 024, 045, 066, 075, 090)

1 (bucket size for digits of 1: 170)

1 (bucket size for digits of 8: 802)

At least one LSD radix sort implementation now counts the number of times that each digit occurs in each column for all columns in a single counting pass. Other LSD radix sort implementations allocate space for buckets dynamically as the space is needed.
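A minimal sketch of that single-pass counting idea (the function is illustrative, not the implementation referred to above): one scan over the keys fills a digit histogram for every column at once, and each histogram can later be turned into bucket offsets by a prefix sum.

def digit_histograms(keys, n_digits):
    """In one pass over the keys, count how often each decimal digit occurs
    in each of the n_digits columns (column 0 is the least significant)."""
    counts = [[0] * 10 for _ in range(n_digits)]
    for k in keys:
        for col in range(n_digits):
            counts[col][(k // 10 ** col) % 10] += 1
    return counts

# counts[0] tallies the 1s place, counts[1] the 10s place, counts[2] the 100s place,
# matching the three counting passes listed above.
print(digit_histograms([170, 45, 75, 90, 2, 24, 802, 66], 3))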





Counting sort is a sorting algorithm which (like bucket sort) takes advantage of knowing the range of the numbers in the array to be sorted (array A). It uses this range to create an array C of that length. Each index i in array C is used to count how many elements of A have the value i; accumulating these counts then gives, for each value, the position at which its elements belong in the sorted output. The counts stored in C can therefore be used to put the elements of A into their correct positions in the resulting sorted array. It is less efficient than pigeonhole sort.

Characteristics of counting sort

Counting sort is stable (see sorting algorithm) and has a running time of Θ(n+k), where n and k are the lengths of the arrays A (the input array) and C (the counting array), respectively. In order for this algorithm to be efficient, k must not be too large compared to n. As long as k is O(n) the running time of the algorithm is Θ(n).

The indices of C must run from the minimum to the maximum value in A to be able to index C directly with the values of A. Otherwise, the values of A will need to be translated (shifted), so that the minimum value of A matches the smallest index of C. (Translation by subtracting the minimum value of A from each element to get an index into C therefore gives a counting sort. If a more complex function is used to relate values in A to indices into C, it is a bucket sort.) If the minimum and maximum values of A are not known, an initial pass of the data will be necessary to find these (this pass will take time Θ(n); see selection algorithm).

The length of the counting array C must be at least equal to the range of the numbers to be sorted (that is, the maximum value minus the minimum value plus 1). This makes counting sort impractical for large ranges in terms of time and memory needed. Counting sort may for example be the best algorithm for sorting numbers whose range is between 0 and 100, but it is probably unsuitable for sorting a list of names alphabetically (again, see bucket sort and pigeonhole sort). However counting sort can be used in radix sort to sort a list of numbers whose range is too large for counting sort to be suitable alone.

Because counting sort uses key values as indexes into an array, it is not a comparison sort, allowing it to break the Ω(n log n) lower-bound on those sorts.

The algorithm

Informal



1. Count the occurrences of each distinct value in the array (after finding the minimum and the maximum, which determine the size of the counting array).

2. Accumulate the counts.

3. Fill the destination array from the back: place each element at the position given by its accumulated count, and decrease that count each time an element is placed. (See the sketch after this list.)
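A minimal Python sketch of these three steps (the names are illustrative), including the translation by the minimum value described earlier:

def counting_sort(a):
    """Stable counting sort for integers; keys are shifted by the minimum value."""
    if not a:
        return []
    lo, hi = min(a), max(a)                   # initial pass to find the range
    counts = [0] * (hi - lo + 1)              # the counting array C
    for x in a:                               # step 1: count each distinct value
        counts[x - lo] += 1
    for i in range(1, len(counts)):           # step 2: accumulate the counts
        counts[i] += counts[i - 1]
    out = [0] * len(a)
    for x in reversed(a):                     # step 3: fill the destination from the back
        counts[x - lo] -= 1                   # decrease the count each time an element is placed
        out[counts[x - lo]] = x
    return out

print(counting_sort([4, -1, 2, 2, 0, 4]))     # [-1, 0, 2, 2, 4, 4]

The reverse scan in step 3 is what makes the sort stable, which is exactly the property the LSD radix sort above relies on.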






