In undergraduate classes we had assignments to print “*”s in different formations, which I never liked because they seemed pointless. Now I have a somewhat justifiable application: plotting a histogram in the terminal. In the last post, Generating random numbers from Normal distribution in C, I posted the C code to generate random numbers from the Normal distribution using the Polar method. In this post I am posting simple code to plot the histogram of random numbers generated from this or any other distribution. Let me first post the code and then explain what is going on.
#include <stdio.h>
#include <math.h>
#include <stdlib.h>
#include <time.h>
#include <limits.h>

#define MAX_WIDTH 240

double randn (double mu, double sigma);

/* Arguments: ./a.out mu sigma samples bins [min] [max] */
int main (int argc, char *argv[])
{
  int samples, bins;
  double *range, mu = 0.0, std = 1.0, min, max, current_random_no;
  int i, j, *bin_count, count = 0, bin, max_bin_count = INT_MIN, flag = 0;

  if (argc < 5) {
    printf ("Usage: %s mu sigma samples bins [min] [max]\n", argv[0]);
    return 0;
  }

  srand ((unsigned int) time (NULL));
  mu = atof (argv[1]);
  std = atof (argv[2]);
  samples = atoi (argv[3]);
  bins = atoi (argv[4]);

  /* A non-positive sigma or bin count would break the scaling below */
  if (bins <= 0 || std <= 0) {
    printf ("bins and sigma must be positive\n");
    return 1;
  }

  /* Allocate memory */
  bin_count = calloc (bins, sizeof (int));
  range = malloc (sizeof (double) * (bins + 1));
  if (bin_count == NULL || range == NULL) {
    printf ("Out of memory\n");
    free (bin_count);
    free (range);
    return 1;
  }

  /* Automatically set a min and max range.
     fabs is used because abs takes an int and would truncate mu. */
  min = -(fabs (mu) + 5 * std);
  max = fabs (mu) + 5 * std;

  /* If min and max are specified, use them */
  if (argc >= 6) {
    min = atof (argv[5]);
  }
  if (argc >= 7) {
    max = atof (argv[6]);
  }

  /* Generate the bin ranges */
  range[0] = min;
  for (i = 1; i <= bins; i++) {
    range[i] = range[0] + (max - min) * (1.0 / bins) * i;
  }

  for (count = 0; count < samples; count++) {
    /* Generate random numbers from a distribution, Normal for this */
    current_random_no = randn (mu, std);

    /* Check which range the current sample falls in */
    for (i = 0, bin = -1; i < bins; i++) {
      if ((current_random_no >= range[i]) && (current_random_no < range[i + 1])) {
        bin = i;
        break;
      }
    }

    /* In case no bin matched and the number is exactly equal to
       range[bins], count it in the last bin */
    if ((bin < 0) && (current_random_no == range[bins])) {
      bin = bins - 1;
    }

    if ((bin >= 0) && (bin < bins)) {
      bin_count[bin]++;
    }
    /* Not all the random numbers were generated within the [min,max] range. */
    else {
      flag = 1;
    }
  }

  /* Find the max value of the bin counters.
     This is used to scale the histogram */
  for (i = 0; i < bins; i++) {
    if (bin_count[i] > max_bin_count) {
      max_bin_count = bin_count[i];
    }
  }

  /* Print histogram and ranges */
  printf ("[bin_low, bin_high), count\n");
  for (i = 0; i < bins; i++) {
    printf ("[%+7.1f,%+7.1f), %6d:", range[i], range[i + 1], bin_count[i]);
    for (j = 0; j < (bin_count[i] / (double) max_bin_count) * (MAX_WIDTH / std); j++) {
      printf ("*");
    }
    printf ("\n");
  }

  /* Not all random numbers were in the range of the histogram */
  if (flag == 1) {
    printf ("\nWARNING: Random numbers generated outside range (%f, %f). These will not be shown in the above histogram.\n", min, max);
  }

  free (bin_count);
  free (range);
  printf ("\n");
  return 0;
}

/* Normal random numbers by the Polar method, two values per round */
double randn (double mu, double sigma)
{
  double U1, U2, W, mult;
  static double X1, X2;
  static int call = 0;

  /* Return the second value saved by the previous call */
  if (call) {
    call = !call;
    return (mu + sigma * (double) X2);
  }

  /* Pick a point inside the unit circle, excluding the origin */
  do {
    U1 = -1 + ((double) rand () / RAND_MAX) * 2;
    U2 = -1 + ((double) rand () / RAND_MAX) * 2;
    W = pow (U1, 2) + pow (U2, 2);
  } while (W >= 1 || W == 0);

  mult = sqrt ((-2 * log (W)) / W);
  X1 = U1 * mult;
  X2 = U2 * mult;
  call = !call;
  return (mu + sigma * (double) X1);
}
The code to plot the histogram is implemented in the main function. I have used the randn function, which returns random numbers from the normal distribution, to generate the data for which I will plot histograms. The code for the randn function is from the last post. The program first reads the mean mu and the standard deviation sigma, which are passed to the randn function, followed by the number of samples and the number of bins, all from the command line arguments. Optionally, it also accepts the min and max of the histogram range, in that order.
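The program reads samples and bins with atoi, which returns 0 when the text cannot be parsed as a number, so a typo on the command line goes unnoticed. A minimal sketch of a stricter parser based on strtol is given here. The helper name parse_positive_int is hypothetical and not part of the original code.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper (not in the original post): parse a strictly
   positive int from text. Returns 1 on success and stores the value
   in *out, returns 0 on any error and leaves *out untouched. */
int parse_positive_int (const char *text, int *out)
{
  char *end;
  long value;

  assert (text != NULL && out != NULL);

  errno = 0;
  value = strtol (text, &end, 10);

  /* No digits at all, or trailing characters after the number */
  if (end == text || *end != '\0')
    return 0;

  /* Overflow, non-positive value, or value too large for an int */
  if (errno == ERANGE || value <= 0 || value > INT_MAX)
    return 0;

  *out = (int) value;
  return 1;
}
```

In main this could replace the atoi calls, for example by printing the usage message whenever parse_positive_int (argv[4], &bins) returns 0.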
The basic outline of the code is to first get a [min,max] range, which is either computed from the mu and sigma provided, or stated explicitly through the min and max command line arguments. Once we have the [min,max] range, we calculate the range of each bin, depending on the number of bins, as follows
range[0] = min;
for (i = 1; i <= bins; i++) {
  range[i] = range[0] + (max - min) * (1.0 / bins) * i;
}
The starting value of the first bin is the min value. We know the number of bins, so we know that each bin covers a fraction 1.0 / bins of the whole range, which makes every bin (max - min) / bins wide. The above code takes the full width (max - min), multiplies it by (1.0 / bins) * i to get the distance from the start of the first bin to the ith boundary, and adds this to the first bin's start value. The result, range[i], is the boundary where bin i - 1 ends and bin i begins, so the array holds bins + 1 values in total.
The next loop generates the random numbers, checks which bin each one falls in, and increments that bin's counter. The variable samples holds the number of samples to be taken. Inside this loop a random number, current_random_no, is generated first. This code generates the random number from the Normal distribution with mean mu and standard deviation sigma. The inner for loop then checks which bin current_random_no falls in, and saves the bin number in bin. The ith bin has the range range[i] to range[i+1]. If a value x falls in the ith bin then range[i] <= x < range[i+1] holds. For the last bin range[bins-1] <= x <= range[bins] is true, so a value exactly equal to range[bins] is handled specially. If the generated random number is outside the range and does not fall inside any bin, the number is not counted and the flag is set. This can happen when the range set manually is too narrow for the distribution. The bin_count array holds the count of the ith bin in location bin_count[i]. Therefore, once we have the bin index of current_random_no, we just increment bin_count[bin].
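Because all bins have the same width, the bin index can also be computed directly instead of scanning the edges one by one. The following is a sketch of that alternative, not the method used in this post. The function name find_bin is hypothetical, and values that sit almost exactly on a bin edge may land in a neighbouring bin compared to the scan, due to floating-point rounding.

```c
#include <assert.h>

/* Hypothetical helper: index of the equally wide bin that contains x,
   or -1 if x lies outside [min, max]. Every bin is half-open
   [low, high) except the last one, which also includes max. */
int find_bin (double x, double min, double max, int bins)
{
  int bin;

  assert (bins > 0 && max > min);

  /* Written this way so that NaN is rejected as well */
  if (!(x >= min && x <= max))
    return -1;

  /* Fraction of the range covered up to x, scaled to a bin index */
  bin = (int) ((x - min) / (max - min) * bins);

  /* x == max, or rounding at the top edge, belongs to the last bin */
  if (bin >= bins)
    bin = bins - 1;

  return bin;
}
```

A return value of -1 plays the same role as the flag in the main code: the sample is not counted and a warning can be printed at the end.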
The next for loop finds the maximum bin count. This is used to scale the histogram. Now we are ready to plot the histogram. Each line will contain one bin. First we print the range of the bin, then the bin count, and then the scaled bar. The loop which prints the histogram is pretty straightforward. The loop bound (bin_count[i] / (double) max_bin_count) * (MAX_WIDTH / std) is just a way to scale the bar so that the drawn bar of “*”s does not wrap to the next line. The fullest bin gets a bar of MAX_WIDTH / std characters, which is exactly MAX_WIDTH characters when sigma is 1. The MAX_WIDTH macro is therefore the maximum number of characters the longest bar in the histogram can have without wrapping. It needs to be set depending on the width of the terminal.
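The bar scaling can be isolated into a small function. This is a sketch under assumptions: the name bar_length is hypothetical, it uses integer arithmetic, and it leaves out the division by std, so the fullest bin always gets exactly max_width characters. Like the printing loop, it rounds up, so every non-empty bin shows at least one “*”.

```c
#include <assert.h>

/* Hypothetical helper: number of '*' characters for a bin holding
   count samples, scaled so that the fullest bin (max_count samples)
   gets exactly max_width characters. The result is rounded up. */
int bar_length (int count, int max_count, int max_width)
{
  assert (count >= 0 && max_width >= 0 && count <= max_count);

  /* Empty histogram: nothing to draw, and avoid dividing by zero */
  if (max_count == 0)
    return 0;

  /* Integer ceiling of count * max_width / max_count */
  return (int) (((long long) count * max_width + max_count - 1) / max_count);
}
```

Handling max_count == 0 explicitly covers the case where no sample fell inside the range at all.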
We set the value of flag when a generated random number was not in range. A warning message is shown in such a case. Then we free the allocated memory and we are done.
Here are some executions and outputs. I have decreased the font size of the plots for better viewing.
First I made the character size in my terminal smaller, so that it now fits 266 characters per line. The MAX_WIDTH is set to 240. The plot below was done using mu = 0, sigma = 1, samples = 50000 and bins = 70. The min and max were not provided and were calculated automatically as described above.
[bin_low, bin_high), count [ -5.0, -4.9), 0: [ -4.9, -4.7), 0: [ -4.7, -4.6), 0: [ -4.6, -4.4), 0: [ -4.4, -4.3), 0: [ -4.3, -4.1), 1:* [ -4.1, -4.0), 2:* [ -4.0, -3.9), 0: [ -3.9, -3.7), 5:* [ -3.7, -3.6), 4:* [ -3.6, -3.4), 4:* [ -3.4, -3.3), 10:* [ -3.3, -3.1), 17:** [ -3.1, -3.0), 22:** [ -3.0, -2.9), 27:*** [ -2.9, -2.7), 47:**** [ -2.7, -2.6), 80:******* [ -2.6, -2.4), 137:************ [ -2.4, -2.3), 156:************** [ -2.3, -2.1), 268:*********************** [ -2.1, -2.0), 359:******************************* [ -2.0, -1.9), 441:************************************** [ -1.9, -1.7), 586:************************************************** [ -1.7, -1.6), 758:***************************************************************** [ -1.6, -1.4), 970:*********************************************************************************** [ -1.4, -1.3), 1157:************************************************************************************************** [ -1.3, -1.1), 1351:******************************************************************************************************************* [ -1.1, -1.0), 1605:**************************************************************************************************************************************** [ -1.0, -0.9), 1874:*************************************************************************************************************************************************************** [ -0.9, -0.7), 2124:************************************************************************************************************************************************************************************ [ -0.7, -0.6), 2305:**************************************************************************************************************************************************************************************************** [ -0.6, -0.4), 
2579:*************************************************************************************************************************************************************************************************************************** [ -0.4, -0.3), 2735:**************************************************************************************************************************************************************************************************************************************** [ -0.3, -0.1), 2836:************************************************************************************************************************************************************************************************************************************************ [ -0.1, +0.0), 2711:************************************************************************************************************************************************************************************************************************************** [ +0.0, +0.1), 2716:************************************************************************************************************************************************************************************************************************************** [ +0.1, +0.3), 2777:******************************************************************************************************************************************************************************************************************************************** [ +0.3, +0.4), 2778:******************************************************************************************************************************************************************************************************************************************** [ +0.4, +0.6), 2391:*********************************************************************************************************************************************************************************************************** [ +0.6, +0.7), 
2303:*************************************************************************************************************************************************************************************************** [ +0.7, +0.9), 2037:***************************************************************************************************************************************************************************** [ +0.9, +1.0), 1904:****************************************************************************************************************************************************************** [ +1.0, +1.1), 1587:*************************************************************************************************************************************** [ +1.1, +1.3), 1347:****************************************************************************************************************** [ +1.3, +1.4), 1116:*********************************************************************************************** [ +1.4, +1.6), 926:******************************************************************************* [ +1.6, +1.7), 726:************************************************************** [ +1.7, +1.9), 592:*************************************************** [ +1.9, +2.0), 477:***************************************** [ +2.0, +2.1), 318:*************************** [ +2.1, +2.3), 261:*********************** [ +2.3, +2.4), 174:*************** [ +2.4, +2.6), 137:************ [ +2.6, +2.7), 95:********* [ +2.7, +2.9), 62:****** [ +2.9, +3.0), 48:***** [ +3.0, +3.1), 30:*** [ +3.1, +3.3), 14:** [ +3.3, +3.4), 6:* [ +3.4, +3.6), 3:* [ +3.6, +3.7), 1:* [ +3.7, +3.9), 2:* [ +3.9, +4.0), 1:* [ +4.0, +4.1), 0: [ +4.1, +4.3), 0: [ +4.3, +4.4), 0: [ +4.4, +4.6), 0: [ +4.6, +4.7), 0: [ +4.7, +4.9), 0: [ +4.9, +5.0), 0:
The plot below was done using mu = 10, sigma = 4, samples = 50000, bins = 70 and automatic min and max. Note that the range is now larger than in the previous plot, because the distribution is more spread out. The bars are also shorter: the longest bar is now MAX_WIDTH / std = 60 characters. The distribution is not skewed, but it is shifted downwards in the plot, with its peak around the mean of 10.
[bin_low, bin_high), count [ -30.0, -29.1), 0: [ -29.1, -28.3), 0: [ -28.3, -27.4), 0: [ -27.4, -26.6), 0: [ -26.6, -25.7), 0: [ -25.7, -24.9), 0: [ -24.9, -24.0), 0: [ -24.0, -23.1), 0: [ -23.1, -22.3), 0: [ -22.3, -21.4), 0: [ -21.4, -20.6), 0: [ -20.6, -19.7), 0: [ -19.7, -18.9), 0: [ -18.9, -18.0), 0: [ -18.0, -17.1), 0: [ -17.1, -16.3), 0: [ -16.3, -15.4), 0: [ -15.4, -14.6), 0: [ -14.6, -13.7), 0: [ -13.7, -12.9), 0: [ -12.9, -12.0), 0: [ -12.0, -11.1), 0: [ -11.1, -10.3), 0: [ -10.3, -9.4), 0: [ -9.4, -8.6), 0: [ -8.6, -7.7), 0: [ -7.7, -6.9), 0: [ -6.9, -6.0), 2:* [ -6.0, -5.1), 0: [ -5.1, -4.3), 5:* [ -4.3, -3.4), 20:* [ -3.4, -2.6), 20:* [ -2.6, -1.7), 40:* [ -1.7, -0.9), 76:** [ -0.9, +0.0), 157:*** [ +0.0, +0.9), 256:**** [ +0.9, +1.7), 446:******* [ +1.7, +2.6), 633:********* [ +2.6, +3.4), 925:************** [ +3.4, +4.3), 1326:******************* [ +4.3, +5.1), 1745:************************* [ +5.1, +6.0), 2354:********************************** [ +6.0, +6.9), 2812:**************************************** [ +6.9, +7.7), 3413:************************************************* [ +7.7, +8.6), 3779:****************************************************** [ +8.6, +9.4), 4075:********************************************************** [ +9.4, +10.3), 4237:************************************************************ [ +10.3, +11.1), 4206:************************************************************ [ +11.1, +12.0), 3891:******************************************************** [ +12.0, +12.9), 3563:*************************************************** [ +12.9, +13.7), 3105:******************************************** [ +13.7, +14.6), 2441:*********************************** [ +14.6, +15.4), 1984:***************************** [ +15.4, +16.3), 1495:********************** [ +16.3, +17.1), 1084:**************** [ +17.1, +18.0), 764:*********** [ +18.0, +18.9), 458:******* [ +18.9, +19.7), 327:***** [ +19.7, +20.6), 165:*** [ +20.6, +21.4), 91:** [ +21.4, +22.3), 50:* [ 
+22.3, +23.1), 34:* [ +23.1, +24.0), 11:* [ +24.0, +24.9), 4:* [ +24.9, +25.7), 3:* [ +25.7, +26.6), 1:* [ +26.6, +27.4), 2:* [ +27.4, +28.3), 0: [ +28.3, +29.1), 0: [ +29.1, +30.0), 0:
The same distribution plotted with the manual range [-5, +5], which is the range of the first plot, is shown below. This shows that the distribution is more spread out and shifted downwards. Also, not all the generated random numbers were in range, therefore the plot is cut off and the warning is shown.
[bin_low, bin_high), count [ -5.0, -4.9), 2:* [ -4.9, -4.7), 0: [ -4.7, -4.6), 1:* [ -4.6, -4.4), 0: [ -4.4, -4.3), 0: [ -4.3, -4.1), 0: [ -4.1, -4.0), 1:* [ -4.0, -3.9), 0: [ -3.9, -3.7), 2:* [ -3.7, -3.6), 2:* [ -3.6, -3.4), 2:* [ -3.4, -3.3), 4:* [ -3.3, -3.1), 2:* [ -3.1, -3.0), 6:** [ -3.0, -2.9), 6:** [ -2.9, -2.7), 2:* [ -2.7, -2.6), 7:** [ -2.6, -2.4), 3:* [ -2.4, -2.3), 6:** [ -2.3, -2.1), 7:** [ -2.1, -2.0), 6:** [ -2.0, -1.9), 10:** [ -1.9, -1.7), 7:** [ -1.7, -1.6), 12:*** [ -1.6, -1.4), 14:*** [ -1.4, -1.3), 12:*** [ -1.3, -1.1), 21:**** [ -1.1, -1.0), 21:**** [ -1.0, -0.9), 15:*** [ -0.9, -0.7), 17:**** [ -0.7, -0.6), 20:**** [ -0.6, -0.4), 21:**** [ -0.4, -0.3), 25:***** [ -0.3, -0.1), 35:******* [ -0.1, +0.0), 24:***** [ +0.0, +0.1), 31:****** [ +0.1, +0.3), 35:******* [ +0.3, +0.4), 41:******** [ +0.4, +0.6), 47:********* [ +0.6, +0.7), 30:****** [ +0.7, +0.9), 44:******** [ +0.9, +1.0), 47:********* [ +1.0, +1.1), 60:*********** [ +1.1, +1.3), 56:*********** [ +1.3, +1.4), 56:*********** [ +1.4, +1.6), 64:************ [ +1.6, +1.7), 86:**************** [ +1.7, +1.9), 78:************** [ +1.9, +2.0), 101:******************* [ +2.0, +2.1), 111:******************** [ +2.1, +2.3), 103:******************* [ +2.3, +2.4), 118:********************** [ +2.4, +2.6), 129:************************ [ +2.6, +2.7), 145:************************** [ +2.7, +2.9), 141:************************** [ +2.9, +3.0), 166:****************************** [ +3.0, +3.1), 150:*************************** [ +3.1, +3.3), 176:******************************** [ +3.3, +3.4), 179:********************************* [ +3.4, +3.6), 205:************************************* [ +3.6, +3.7), 198:************************************ [ +3.7, +3.9), 225:***************************************** [ +3.9, +4.0), 231:****************************************** [ +4.0, +4.1), 228:***************************************** [ +4.1, +4.3), 251:********************************************* [ +4.3, +4.4), 
310:******************************************************** [ +4.4, +4.6), 275:************************************************** [ +4.6, +4.7), 305:******************************************************* [ +4.7, +4.9), 284:*************************************************** [ +4.9, +5.0), 335:************************************************************ WARNING: Random numbers generated outside range (-5.000000, 5.000000). These will not be shown in the above histogram.
This code can be modified to plot other distributions or any data set. Before plotting, make sure to adjust the MAX_WIDTH macro so that the histogram bars do not wrap lines.
Hello, I was wondering if you could help me out with this specific problem regarding normal distribution but WITHOUT duplicate values.
http://stackoverflow.com/questions/26213431/how-to-produce-normal-distribution-without-duplicates-in-c/26213873?noredirect=1#comment41324572_26213873
Thanks in advance!