Posted: November 4th, 2017
The median is the value separating the lower half of a data sample from the upper half when the data is sorted. For a dataset, it may be thought of as the “middle” value. For example, in the data set {1, 3, 3, 6, 7, 8, 9}, the median is 6, is the fourth largest value, and also the fourth smallest value, in the sample of 7 elements.
When the number of data elements in the data set is even, the median is the average of the middle two values. For example, in the data set {1, 3, 3, 6, 8, 8, 9, 9}, the median is (6 + 8)/2 = 7, since 6 is the fourth smallest value and 8 is the fourth largest value, in the sample of 8 elements.
Note that the data may not always be presented in sorted order. Order of data does not change the median. For example, the median of {10, 1, 20, 3, 5} is 5. Since 5 is the third largest and also the third smallest element in the data set of 5 elements.
Median is a good central measure of data when the data set has abnormal outliers which are too high or too low compared to other data elements. Such outliers are common in weather-related data where short spans of calamities like cyclones, cloudbursts etc., can result in large parameter values for a short period of the years. However, the median would be robust to such extreme data points.
Goals:
The goals of this assignment are to:
Imagine that you have been recruited in the weather department. The weather department gets the temperature measurements of your city every day when the sun is at its highest point in the sky. Every day, the weather department wants to publish the median temperature of the city since the beginning of time (or the year so far).
Write a program which has two modes, interactive mode and file mode.
Interactive mode is entered when the user does not pass in filenames as command-line arguments. In interactive mode, the user is prompted for a value (of type double) and then the current median over all previously entered values is displayed. This loop repeats until the user enters q to quit. Actually, any non-double value will end the program if running in interactive mode. See sample runs for more examples.
The program runs in File mode if the user includes command-line arguments (CLAs). The program reads the name of input files (ex. somefile.txt) from command-line arguments array and opens an output file whose name is derived from the name of input file name by appending “_out” to the input file name before the “.txt” extension (ex. somefile_out.txt).
For each value read from the file, the program will write the median of the data values read so far for that file to a new line of an output file. If the filename is not a valid file, then no values are read, and a file not found message is displayed. See sample runs.
For every temperature value Ti, the program writes the median temperature of the values read from the first temperature value of the file to the current temperature (Median of T1, T2, T3, … Ti). Each median value is written to its own line of the output file. See sample output files.
All input/output files are guaranteed to be text files with extension “.txt”. The input filenames (without the extension “.txt”) are guaranteed to have no period “.” character.
we can run the program as follows:
java MedianStream file1.txt file2.txt ... filen.txt
input filename | input file contents | output filename | PQ images view in sequence |
file0.txt |
1 3 4 |
file0_out.txt | file0 images |
file1.txt |
10 1 20 3 5 |
file1_out.txt | file1 images |
file2.txt |
9 12 1 |
file2_out.txt | file2 images |
file3.txt |
-50 -30 0 10 60 80 10 -80 -40 85 -15 |
file3_out.txt | file3 images |
You are given the following five classes:
EmptyQueueException.java PriorityQueueADT.java MaxPQ.java MinPQ.java MedianStream.java
As the input stream arrives, store the numbers in two Priority Queues such that at any point of time, about half of the numbers (smaller than the median) are in a MaxPQ and the other half of the numbers (greater than the median) are in a MinPQ. For correct execution of your algorithm, you must maintain the sizes of the two priority queues to be same or differ by 1 at most.
When a new element arrives, check if it greater than or less than the current median. If it is less than the current median, then the element has to be inserted in MaxPQ else it has to be inserted in MinPQ. Let’s assume we need to insert the element into MinPQ.
Case 1: If size(MinPQ) = size(MaxPQ), then the new element can be inserted into MinPQ. The new median is the top of MinPQ.
Case 2: If size(MinPQ) = size(MaxPQ) – 1, then the new element can be inserted into MinPQ. The new median is the average of the highest priority element of MinPQ and MaxPQ respectively.
Case 3: If size(MinPQ) = size(MaxPQ) + 1, then remove the highest priority element of MinPQ and insert it into MaxPQ. Insert the new element into MinPQ. The new median is the average of the highest priority element of MinPQ and MaxPQ respectively.
Similarly, think what would happen when we insert the new element into MaxPQ.
Notes:
Place an order in 3 easy steps. Takes less than 5 mins.