[CSC 204] Programming project

Thu Nov 27 11:07:15 EST 2014

On 11/27/2014 09:13 AM, Cade Buhmann wrote:
> I have a question about the Word distribution, line numbers : α, Is 
> that the individual lines that the word occurs on or the total number 
> of different lines that the word occurs on?
Maybe an example will help.  Referring back to the equation in the PDF 
program description will also help.

In the gettysburg.txt file the word "are" appears three times.  It 
appears on lines 3, 5, and 6.

Let's call these $\alpha_0$, $\alpha_1$, and $\alpha_2$.  If I sum these 
up I get 14.

I then divide this sum by the total number of occurrences ($\gamma$, 
which is 3), and get 4.6667.

I then divide this by the total number of lines in the document ($N$, 
which is 25).  That gives me the number 0.18667.  Multiply that by 100 
to get the final word distribution index (18.667).
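
Written out as a single expression, the steps above look roughly like 
this.  This is reconstructed from the arithmetic in this message, so the 
notation in the PDF may differ; $W$ is only a label introduced here for 
the index.

```latex
W = \frac{100}{N} \cdot \frac{1}{\gamma} \sum_{i=0}^{\gamma-1} \alpha_i
  = \frac{100}{25} \cdot \frac{3 + 5 + 6}{3}
  \approx 18.667
```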

That is telling you that the occurrences of that word fall, on average, 
about 18.667% of the way into the document.
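
The same calculation can be sketched in a few lines of code.  This is 
only an illustration in Python, not the required form of the project; the 
function name word_distribution_index is made up for this example, and it 
assumes you already have one line number per occurrence of the word (so a 
line that contains the word twice would be listed twice).

```python
def word_distribution_index(line_numbers, total_lines):
    """Average position of a word, as a percentage of the document length.

    line_numbers: one entry per occurrence of the word (the alpha values).
    total_lines:  number of lines in the document (N).
    """
    gamma = len(line_numbers)  # total number of occurrences
    if gamma == 0 or total_lines <= 0:
        raise ValueError("need at least one occurrence and a positive line count")

    mean_line = sum(line_numbers) / gamma    # (3 + 5 + 6) / 3 = 4.6667
    return 100.0 * mean_line / total_lines   # 4.6667 / 25 * 100


# The word "are" in gettysburg.txt: lines 3, 5, and 6 of a 25-line file.
index = word_distribution_index([3, 5, 6], 25)
print(f"{index:.3f}")  # 18.667
```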

Does that clear things up?

-- 
Andrew J. Pounds, Ph.D.  (pounds at theochem.mercer.edu)
Professor of Chemistry and Computer Science
Mercer University,  Macon, GA 31207   (478) 301-5627