<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body bgcolor="#FFFFFF" text="#000000">
On 11/27/2014 09:13 AM, wrote:
<br>
<blockquote type="cite" style="color: #000000;">I have a question
about the word distribution line numbers, α. Is that the
individual lines that the word occurs on, or the total number of
different lines that the word occurs on?
<br>
</blockquote>
Maybe an example will help. Referring back to the equation in the
PDF program description will also help.
<br>
<br>
In the gettysburg.txt file the word "are" appears three times. It
appears on lines 3, 5, and 6.
<br>
<br>
Let's call these <img style="vertical-align: middle"
src="cid:part1.05060909.01080005@sandbox.mercer.edu"
alt="$\alpha_0$">, <img style="vertical-align: middle"
src="cid:part2.08080108.07080402@sandbox.mercer.edu"
alt="$\alpha_1$">, and <img style="vertical-align: middle"
src="cid:part3.01030408.02010803@sandbox.mercer.edu"
alt="$\alpha_2$">. If I sum these up (3 + 5 + 6) I get 14.
<br>
<br>
I then divide this number by the total number of occurrences (<img
style="vertical-align: middle"
src="cid:part4.03000004.00060702@sandbox.mercer.edu"
alt="$\gamma$">, which is 3), and get 4.6667.
<br>
<br>
I then divide this by the total number of lines in the document (<img
style="vertical-align: middle"
src="cid:part5.01020304.05040001@sandbox.mercer.edu" alt="$N$">,
which is 25). That gives me the number 0.18667. Multiply that by 100
to get the final word distribution index (18.667).
<br>
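<br>
Written as a single expression, the steps above amount to the
following. This is only a sketch: it assumes the occurrences are
numbered starting from 0, as with the α's above, so check it against
the equation in the PDF.
<br>
<pre>
```latex
\[
  \text{index}
  = \frac{100}{\gamma N} \sum_{i=0}^{\gamma-1} \alpha_i
  = \frac{100\,(3 + 5 + 6)}{3 \cdot 25}
  \approx 18.667
\]
```
</pre>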
<br>
<br>
That is now telling you that the occurrences of that word fell, on
average, about 18.667% of the way into the document.
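<br>
<br>
If it helps to see the arithmetic as code, here is a minimal Python
sketch of the same calculation. The function name and its arguments
are made up for illustration and are not part of the assignment. It
also assumes you record one line number per occurrence, so a word that
shows up twice on one line would contribute that line twice.
<br>
<pre>
```python
def word_distribution_index(line_numbers, total_lines):
    """Average line position of a word, as a percentage of the document.

    line_numbers: one entry per occurrence of the word (the alpha values)
    total_lines:  number of lines in the document (N)
    Both names are hypothetical, chosen only for this example.
    """
    occurrences = len(line_numbers)  # gamma
    if occurrences == 0:
        raise ValueError("the word does not occur in the document")
    mean_line = sum(line_numbers) / occurrences  # sum of alphas over gamma
    return 100.0 * mean_line / total_lines


# "are" in gettysburg.txt: lines 3, 5, and 6 of a 25-line file
index = word_distribution_index([3, 5, 6], 25)
print(round(index, 3))  # 18.667
```
</pre>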
<br>
<br>
Does that clear things up?
<pre class="moz-signature" cols="72">--
Andrew J. Pounds, Ph.D. (<a class="moz-txt-link-abbreviated" href="mailto:pounds@theochem.mercer.edu">pounds@theochem.mercer.edu</a>)
Professor of Chemistry and Computer Science
Mercer University, Macon, GA 31207 (478) 301-5627
</pre>
</body>
</html>