Talk:Random sentence from book: Difference between revisions
==Markov chain?==
:In this task, words that are more likely to follow a given word should be more likely to occur next. These weights need accumulating and applying in this task, whereas that task does not require it.
: --[[User:Paddy3118|Paddy3118]] ([[User talk:Paddy3118|talk]]) 10:56, 15 February 2021 (UTC)

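As a rough illustration of what "accumulating and applying" the weights could look like, here is a minimal sketch on a toy word list. The name <code>word2next</code> follows the Python example; the helper functions <code>build_word2next</code> and <code>weighted_next</code> and the toy text are assumptions made up for this sketch, not the task's actual code.

```python
import random
from collections import Counter, defaultdict

def build_word2next(words):
    """Map each word to a Counter of the words seen directly after it."""
    word2next = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        word2next[current][following] += 1   # accumulate the weight
    return word2next

def weighted_next(word2next, word):
    """Pick a follower of `word` with probability proportional to its count.

    Assumes `word` has at least one recorded follower.
    """
    followers = word2next[word]
    return random.choices(list(followers), weights=list(followers.values()))[0]

# Hypothetical toy text: 'cat' follows 'the' twice, 'dog' once.
toy_words = "the cat sat . the cat ran . the dog sat .".split()
table = build_word2next(toy_words)
print(dict(table['the']))
print(weighted_next(table, 'the'))
```

With these counts, 'cat' should be picked after 'the' about twice as often as 'dog'.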
==Some stats==
I was thinking of extending the Python example to weight words by the two, three, ... words that precede them, but after a while I realised that this would constrain things so much that you would only generate sentences that are actually in the book!<br>
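To illustrate the point about longer contexts, the sketch below counts, for a toy word list, what fraction of n-word contexts are followed by only one distinct word. The function <code>single_choice_fraction</code> and the toy text are assumptions for illustration only; the figures for the real book would differ.

```python
from collections import defaultdict

def single_choice_fraction(words, order):
    """Fraction of `order`-word contexts followed by only one distinct word."""
    followers = defaultdict(set)
    for i in range(len(words) - order):
        context = tuple(words[i:i + order])
        followers[context].add(words[i + order])
    forced = sum(len(nxt) == 1 for nxt in followers.values())
    return forced / len(followers)

# Hypothetical toy text standing in for the book.
toy_words = ("the cat sat on the mat . the cat ran to the dog . "
             "the dog sat on the cat .").split()

for order in (1, 2, 3):
    print(order, round(single_choice_fraction(toy_words, order), 2))
```

The more words of context are used, the more often the next word is forced, so the generator increasingly just replays runs of the source text.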
Instead, I decided to find out how many of the random sentences generated by the current Python code actually exist in the book, by appending this snippet:
<lang python>#%% Sentence counts
# Appended to the task's Python example, which already imports random and
# defines word2next, word2next2, sentence_ending, sentence_pausing and
# txt_with_pauses_and_endings.

def gen_simple_sentence(word2next=word2next, word2next2=word2next2) -> str:
    "Generate a sentence; no tidying up of the generated word sequence"
    s = ['.']
    # First word: weighted by what follows a sentence end.
    s += random.choices(*zip(*word2next[s[-1]].items()))
    while True:
        # Later words: weighted by what follows the previous two words.
        s += random.choices(*zip(*word2next2[(s[-2], s[-1])].items()))
        if s[-1] in sentence_ending:
            break
    return ' '.join(s[1:])

if 1:
    N = 1_000
    words = ['.'] + txt_with_pauses_and_endings.strip().split()
    # Subtract 1 for the '.' prepended above.
    sent_count = sum(words.count(punct) for punct in sentence_ending) - 1
    pause_count = sum(words.count(punct) for punct in sentence_pausing)
    avg_words_in_sent = (len(words) - 1 - pause_count
                         - words.count('re') - words.count('s')) / sent_count
    print(f'\nSentences in the book have ~{avg_words_in_sent:.1f} words')
    book = ' '.join(words)  # Now sanitised
    # A generated sentence counts as a copy if it is a substring of the book.
    copies = sum(gen_simple_sentence() in book for _ in range(N))
    print(f"Generating {N:_} random sentences produced {copies:_}"
          " that are actually in the book")</lang>
;The average sentence length is approx. 19 words and around 15% of the generated sentences actually occur in the book. |
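
The snippet relies on the compact idiom <code>random.choices(*zip(*d.items()))</code>. The sketch below unpacks it step by step on a made-up follower dictionary; the name <code>followers</code> and its counts are assumptions for illustration.

```python
import random

followers = {'cat': 2, 'dog': 1}   # hypothetical follower counts for one word

# zip(*d.items()) transposes the (word, count) pairs into
# one tuple of words and one tuple of counts.
population, weights = zip(*followers.items())
print(population, weights)

# So choices(*zip(*d.items())) is the same call as choices(population, weights),
# which returns a one-item list by default (k=1).
picked = random.choices(population, weights)
print(picked)

# Because the result is a list, `s += picked` appends the chosen word to s.
s = ['the']
s += picked
print(s)
```

This is why the snippet can extend the sentence list with <code>s += random.choices(...)</code> without indexing the result.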