A project-based approach to learning Python programming for beginners. Intriguing projects teach you how to tackle challenging problems with code.
You've mastered the basics. Now you're ready to explore some of Python's more powerful tools. Real-World Python will show you how.
Through a series of hands-on projects, you'll investigate and solve real-world problems using sophisticated computer vision, machine learning, data analysis, and language processing tools. You'll be introduced to important modules like OpenCV, NumPy, Pandas, NLTK, Bokeh, Beautiful Soup, Requests, HoloViews, Tkinter, turtle, matplotlib, and more. You'll create complete, working programs and think through intriguing projects that show you how to:
Lee Vaughan is a programmer, pop culture enthusiast, educator, and author of Impractical Python Projects (No Starch Press). As a former executive-level scientist at ExxonMobil, he spent decades constructing and reviewing complex computer models, developed and tested software, and trained geoscientists and engineers.
ATTRIBUTING AUTHORSHIP WITH STYLOMETRY
Stylometry is the quantitative study of literary style through computational text analysis. It’s based on the idea that we all have a unique, consistent, and recognizable style to our writing. This includes our vocabulary, our use of punctuation, the average length of our sentences and words, and so on.
A common application of stylometry is authorship attribution. Do you ever wonder if Shakespeare really wrote all his plays? Or if John Lennon or Paul McCartney wrote the song “In My Life”? Could Robert Galbraith, author of The Cuckoo’s Calling, really be J. K. Rowling in disguise? Stylometry can find the answer!
Stylometry has been used to overturn murder convictions and even helped identify the Unabomber, who was arrested in 1996. Other uses include detecting plagiarism and determining the emotional tone behind words, such as in social media posts. Stylometry can even be used to detect signs of depression and suicidal tendencies.
In this chapter, you’ll use multiple stylometric techniques to determine whether Sir Arthur Conan Doyle or H. G. Wells wrote the novel The Lost World.
Project #2: The Hound, The War, and The Lost World
Sir Arthur Conan Doyle (1859–1930) is best known for the Sherlock Holmes stories, considered milestones in the field of crime fiction. H. G. Wells (1866–1946) is famous for several groundbreaking science fiction novels, including The War of the Worlds, The Time Machine, The Invisible Man, and The Island of Dr. Moreau.
In 1912, the Strand Magazine published The Lost World, a science fiction novel, in serialized form. It told the story of an Amazon basin expedition, led by zoology professor George Edward Challenger, that encountered living dinosaurs and a vicious tribe of ape-like creatures.
Although the author of the novel is known, for this project, let’s pretend it’s in dispute and it’s your job to solve the mystery. Experts have narrowed the field down to two authors, Doyle and Wells. Wells is slightly favored because The Lost World is a work of science fiction, which is his purview. It also includes brutish troglodytes redolent of the Morlocks in his 1895 work The Time Machine. Doyle, on the other hand, is known for detective stories and historical fiction.
THE OBJECTIVE
Write a Python program that uses stylometry to determine whether Sir Arthur Conan Doyle or H. G. Wells wrote the novel The Lost World.
THE STRATEGY
The science of natural language processing (NLP) deals with the interactions between the precise and structured language of computers and the nuanced, frequently ambiguous “natural” language used by humans. Example uses for NLP include machine translation, spam detection, comprehension of search engine questions, and predictive text for cell phone users.
The most common NLP tests for authorship analyze the following features of a text:
• Word length: A frequency distribution plot of the length of words in a document
• Stop words: A frequency distribution plot of stop words (short, noncontextual function words like the, but, and if)
• Parts of speech: A frequency distribution plot of words based on their syntactic functions (such as nouns, pronouns, verbs, adverbs, adjectives, and so on)
• Most common words: A comparison of the most commonly used words in a text
• Jaccard similarity: A statistic used for gauging the similarity and diversity of a sample set
If Doyle and Wells have distinctive writing styles, these five tests should be enough to distinguish between them. We’ll talk about each test in more detail in the coding section.
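As a rough illustration of the first two tests, the sketch below builds a word-length distribution and a stop-word count using only Python's standard library. It is not the chapter's NLTK code: the tokenize() helper, the tiny STOP_WORDS set, and the sample sentence are all assumptions invented for this example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set (an assumption; real lists are much longer).
STOP_WORDS = {"the", "but", "if", "and", "a", "of", "to", "in"}


def tokenize(text):
    """Lowercase the text and return its runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def word_length_distribution(words):
    """Map each word length to the fraction of words with that length."""
    counts = Counter(len(word) for word in words)
    total = sum(counts.values())
    return {length: count / total for length, count in sorted(counts.items())}


def stop_word_distribution(words):
    """Count how often each stop word occurs in the word list."""
    return Counter(word for word in words if word in STOP_WORDS)


# Made-up sample sentence, not a quotation from any of the novels.
sample = "The hound ran to the moor, but the fog hid it."
words = tokenize(sample)
lengths = word_length_distribution(words)
stops = stop_word_distribution(words)

print(lengths)
print(stops)
```

Normalizing the word-length counts to fractions lets you compare texts of different sizes, which matters when the known and unknown samples don't contain the same number of words.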
To capture and analyze each author’s style, you’ll need a representative corpus, or a body of text. For Doyle, use the famous Sherlock Holmes novel The Hound of the Baskervilles, published in 1902. For Wells, use The War of the Worlds, published in 1898. Both these novels contain more than 50,000 words, more than enough for a sound statistical sampling. You’ll then compare each author’s sample to The Lost World to determine how closely the writing styles match.
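The comparison step can be sketched with the Jaccard similarity, which is the size of the intersection of two sets divided by the size of their union. This is a minimal standard-library sketch, not the chapter's implementation: the function names, the neutral labels author_a and author_b, and the toy word sets are all hypothetical and were not measured from any of the novels.

```python
import re
from collections import Counter


def most_common_words(text, n):
    """Return the set of the n most frequent lowercase words in text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {word for word, _ in Counter(words).most_common(n)}


def jaccard_similarity(set_a, set_b):
    """Intersection size divided by union size (0.0 if both sets are empty)."""
    union = set_a | set_b
    if not union:
        return 0.0
    return len(set_a & set_b) / len(union)


# Toy word sets invented for illustration only.
author_a = {"the", "of", "and", "upon", "which"}
author_b = {"the", "of", "and", "a", "was"}
unknown = {"the", "of", "and", "upon", "it"}

scores = {
    "author_a": jaccard_similarity(author_a, unknown),
    "author_b": jaccard_similarity(author_b, unknown),
}
# The known sample with the highest score is the closest match.
best = max(scores, key=scores.get)
print(scores, best)
```

In practice, you would build each set with something like most_common_words() applied to a full corpus, and you would treat the score as one piece of evidence alongside the other tests rather than as a verdict on its own.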
To perform stylometry, you’ll use the Natural Language Toolkit (NLTK), a popular suite of programs and libraries for working with human language data in Python. It’s free and works on Windows, macOS, and Linux. Created in 2001 as part of a computational linguistics course at the University of Pennsylvania, NLTK has continued to develop and expand with the help of dozens of contributors.
From: Bellwetherbooks, McKeesport, PA, U.S.A.
paperback. Condition: As New. LIKE NEW!!! Has a red or black remainder mark on bottom/exterior edge of pages. Item number NS-PB-LN-1718500629
Quantity: 2 available
From: Bookmans, Tucson, AZ, U.S.A.
paperback. Condition: Good. Satisfaction 100% guaranteed. Item number mon0002576746
Quantity: 1 available
From: PBShop.store US, Wood Dale, IL, U.S.A.
PAP. Condition: New. New Book. Shipped from UK. Established seller since 2000. Item number DB-9781718500624
Quantity: 3 available
From: PBShop.store UK, Fairford, GLOS, United Kingdom
PAP. Condition: New. New Book. Shipped from UK. Established seller since 2000. Item number DB-9781718500624
Quantity: 3 available
From: Books Puddle, New York, NY, U.S.A.
Condition: New. pp. 370. Item number 26376886741
Quantity: 3 available
From: moluna, Greven, Germany
Condition: New. Item number 377137020
Quantity: 3 available
From: Kennys Bookshop and Art Galleries Ltd., Galway, GY, Ireland
Condition: New. 2020. Paperback. Item number V9781718500624
Quantity: 15 available
From: Speedyhen, London, United Kingdom
Condition: NEW. Item number NW9781718500624
Quantity: 2 available
From: THE SAINT BOOKSTORE, Southport, United Kingdom
Paperback / softback. Condition: New. New copy - Usually dispatched within 4 working days. 526. Item number B9781718500624
Quantity: 3 available
From: GreatBookPrices, Columbia, MD, U.S.A.
Condition: New. Item number 40972676-n
Quantity: More than 20 available