[CSC 204] Programming project
Andrew Pounds
pounds at sandbox.mercer.edu
Thu Nov 27 11:07:15 EST 2014
On 11/27/2014 09:13 AM, Cade Buhmann wrote:
> I have a question about the Word distribution, line numbers : α, Is
> that the individual lines that the word occurs on or the total number
> of different lines that the word occurs on?
Maybe an example will help. Referring back to the equation in the PDF
program description will also help.
In the gettysburg.txt file, the word "are" appears three times. It
appears on lines 3, 5, and 6.
Let's call these $\alpha_0$, $\alpha_1$, and $\alpha_2$. If I sum them
up, I get 14 (3 + 5 + 6).
I then divide this sum by the total number of occurrences ($\gamma$,
which is 3) and get 4.6667.
Next, I divide that by the total number of lines in the document ($N$,
which is 25), which gives 0.18667. Multiply that by 100 to get the
final word distribution index (18.667).
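Collecting those steps into a single formula (a reconstruction from the
arithmetic above, using $D$ as a hypothetical name for the index; the
equation in the PDF may be written differently):

$$D = 100 \cdot \frac{1}{N} \cdot \frac{1}{\gamma} \sum_{i=0}^{\gamma-1} \alpha_i$$

$$D = 100 \cdot \frac{1}{25} \cdot \frac{3 + 5 + 6}{3} = 100 \cdot \frac{4.6667}{25} \approx 18.667$$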
That tells you that, on average, the occurrences of the word fall about
18.667% of the way into the document.
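If you want to check your own numbers, here is a minimal Python sketch
of the same arithmetic (the course language may differ; this is only
meant to show the calculation). The function names occurrence_lines and
distribution_index are hypothetical, not part of the assignment, and the
matching rules are assumptions: lines are numbered from 1, matching is
case-insensitive, surrounding punctuation is ignored, and a word that
appears twice on one line contributes that line number twice. Check
those choices against the PDF.

```python
import string


def occurrence_lines(lines, word):
    """Return the 1-based line number of every occurrence of `word`.

    Hypothetical helper. ASSUMPTIONS: case-insensitive matching,
    surrounding punctuation stripped from each token, and each
    occurrence counted separately (two hits on one line -> two entries).
    """
    target = word.lower()
    return [n
            for n, line in enumerate(lines, start=1)
            for token in line.split()
            if token.strip(string.punctuation).lower() == target]


def distribution_index(alphas, n_lines):
    """Word distribution index: 100 * (mean of the alphas) / N.

    alphas  -- line numbers of the occurrences (alpha_0 ... alpha_{gamma-1})
    n_lines -- total number of lines in the document (N)
    """
    gamma = len(alphas)              # total number of occurrences
    if gamma == 0:
        raise ValueError("word does not occur in the document")
    mean_line = sum(alphas) / gamma  # e.g. 14 / 3 = 4.6667
    return 100.0 * mean_line / n_lines


if __name__ == "__main__":
    # The "are" example from gettysburg.txt: lines 3, 5, 6 of 25.
    print(round(distribution_index([3, 5, 6], 25), 3))  # prints 18.667
```

Keeping the line-finding step separate from the arithmetic makes it easy
to swap in whatever tokenizing rules the assignment actually specifies.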
Does that clear things up?
--
Andrew J. Pounds, Ph.D. (pounds at theochem.mercer.edu)
Professor of Chemistry and Computer Science
Mercer University, Macon, GA 31207 (478) 301-5627