Spidering Hacks: 100 Industrial-Strength Tips & Tools

Average rating: 3.68 (based on 44 ratings from GoodReads)
 
9780596005771: Spidering Hacks: 100 Industrial-Strength Tips & Tools

The Internet, with its profusion of information, has made us hungry for ever more, ever better data. Out of necessity, many of us have become pretty adept with search engine queries, but there are times when even the most powerful search engines aren't enough. If you've ever wanted your data in a different form than it's presented, or wanted to collect data from several sites and see it side-by-side without the constraints of a browser, then Spidering Hacks is for you.

Spidering Hacks takes you to the next level in Internet data retrieval, beyond search engines, by showing you how to create spiders and bots to retrieve information from your favorite sites and data sources. You'll no longer feel constrained by the way host sites think you want to see their data presented; you'll learn how to scrape and repurpose raw data so you can view it in a way that's meaningful to you.

Written for developers, researchers, technical assistants, librarians, and power users, Spidering Hacks provides expert tips on spidering and scraping methodologies. You'll begin with a crash course in spidering concepts, tools (Perl, LWP, out-of-the-box utilities), and ethics (how to know when you've gone too far: what's acceptable and unacceptable). Next, you'll collect media files and data from databases. Then you'll learn how to interpret and understand the data, repurpose it for use in other applications, and even build authorized interfaces to integrate the data into your own content. By the time you finish Spidering Hacks, you'll be able to:

  • Aggregate and associate data from disparate locations, then store and manipulate the data as you like
  • Gain a competitive edge in business by knowing when competitors' products are on sale, and comparing sales ranks and product placement on e-commerce sites
  • Integrate third-party data into your own applications or web sites
  • Make your own site easier to scrape and more usable to others
  • Keep up-to-date with your favorite comic strips, news stories, stock tips, and more without visiting the sites every day

Like the other books in O'Reilly's popular Hacks series, Spidering Hacks brings you 100 industrial-strength tips and tools from the experts to help you master this technology. If you're interested in data retrieval of any type, this book provides a wealth of data for finding a wealth of data.

The information in the "Synopsis" section may refer to other editions of this title.

About the Authors:

Kevin Hemenway, coauthor of Mac OS X Hacks, is better known as Morbus Iff, the creator of disobey.com, which bills itself as "content for the discontented." Publisher and developer of more home cooking than you could ever imagine, he'd love to give you a Fry Pan of Intellect upside the head. Politely, of course. And with love.

Tara Calishain is the creator of the site ResearchBuzz. She is an expert on Internet search engines and how they can be used effectively in business situations.

Contents:

Credits
About the Authors
Contributors
Preface
  Why Spidering Hacks?
  How This Book Is Organized
  How to Use This Book
  Conventions Used in This Book
  How to Contact Us
  Got a Hack?
Chapter 1: Walking Softly
  1.1 Hacks #1-7
  1. A Crash Course in Spidering and Scraping
  2. Best Practices for You and Your Spider
  3. Anatomy of an HTML Page
  4. Registering Your Spider
  5. Preempting Discovery
  6. Keeping Your Spider Out of Sticky Situations
  7. Finding the Patterns of Identifiers
Chapter 2: Assembling a Toolbox
  2.1 Hacks #8-32
  2.2 Perl Modules
  2.3 Resources You May Find Helpful
  8. Installing Perl Modules
  9. Simply Fetching with LWP::Simple
  10. More Involved Requests with LWP::UserAgent
  11. Adding HTTP Headers to Your Request
  12. Posting Form Data with LWP
  13. Authentication, Cookies, and Proxies
  14. Handling Relative and Absolute URLs
  15. Secured Access and Browser Attributes
  16. Respecting Your Scrapee's Bandwidth
  17. Respecting robots.txt
  18. Adding Progress Bars to Your Scripts
  19. Scraping with HTML::TreeBuilder
  20. Parsing with HTML::TokeParser
  21. WWW::Mechanize 101
  22. Scraping with WWW::Mechanize
  23. In Praise of Regular Expressions
  24. Painless RSS with Template::Extract
  25. A Quick Introduction to XPath
  26. Downloading with curl and wget
  27. More Advanced wget Techniques
  28. Using Pipes to Chain Commands
  29. Running Multiple Utilities at Once
  30. Utilizing the Web Scraping Proxy
  31. Being Warned When Things Go Wrong
  32. Being Adaptive to Site Redesigns
Chapter 3: Collecting Media Files
  3.1 Hacks #33-42
  33. Detective Case Study: Newgrounds
  34. Detective Case Study: iFilm
  35. Downloading Movies from the Library of Congress
  36. Downloading Images from Webshots
  37. Downloading Comics with dailystrips
  38. Archiving Your Favorite Webcams
  39. News Wallpaper for Your Site
  40. Saving Only POP3 Email Attachments
  41. Downloading MP3s from a Playlist
  42. Downloading from Usenet with nget
Chapter 4: Gleaning Data from Databases
  4.1 Hacks #43-89
  43. Archiving Yahoo! Groups Messages with yahoo2mbox
  44. Archiving Yahoo! Groups Messages with WWW::Yahoo::Groups
  45. Gleaning Buzz from Yahoo!
  46. Spidering the Yahoo! Catalog
  47. Tracking Additions to Yahoo!
  48. Scattersearch with Yahoo! and Google
  49. Yahoo! Directory Mindshare in Google
  50. Weblog-Free Google Results
  51. Spidering, Google, and Multiple Domains
  52. Scraping Amazon.com Product Reviews
  53. Receive an Email Alert for Newly Added Amazon.com Reviews
  54. Scraping Amazon.com Customer Advice
  55. Publishing Amazon.com Associates Statistics
  56. Sorting Amazon.com Recommendations by Rating
  57. Related Amazon.com Products with Alexa
  58. Scraping Alexa's Competitive Data with Java
  59. Finding Album Information with FreeDB and Amazon.com
  60. Expanding Your Musical Tastes
  61. Saving Daily Horoscopes to Your iPod
  62. Graphing Data with RRDTOOL
  63. Stocking Up on Financial Quotes
  64. Super Author Searching
  65. Mapping O'Reilly Best Sellers to Library Popularity
  66. Using All Consuming to Get Book Lists
  67. Tracking Packages with FedEx
  68. Checking Blogs for New Comments
  69. Aggregating RSS and Posting Changes
  70. Using the Link Cosmos of Technorati
  71. Finding Related RSS Feeds
  72. Automatically Finding Blogs of Interest
  73. Scraping TV Listings
  74. What's Your Visitor's Weather Like?
  75. Trendspotting with Geotargeting
  76. Getting the Best Travel Route by Train
  77. Geographic Distance and Back Again
  78. Super Word Lookup
  79. Word Associations with Lexical Freenet
  80. Reformatting Bugtraq Reports
  81. Keeping Tabs on the Web via Email
  82. Publish IE's Favorites to Your Web Site
  83. Spidering GameStop.com Game Prices
  84. Bargain Hunting with PHP
  85. Aggregating Multiple Search Engine Results
  86. Robot Karaoke
  87. Searching the Better Business Bureau
  88. Searching for Health Inspections
  89. Filtering for the Naughties
Chapter 5: Maintaining Your Collections
  5.1 Hacks #90-93
  90. Using cron to Automate Tasks
  91. Scheduling Tasks Without cron
  92. Mirroring Web Sites with wget and rsync
  93. Accumulating Search Results Over Time
Chapter 6: Giving Back to the World
  6.1 Hacks #94-100
  94. Using XML::RSS to Repurpose Data
  95. Placing RSS Headlines on Your Site
  96. Making Your Resources Scrapable with Regular Expressions
  97. Making Your Resources Scrapable with a REST Interface
  98. Making Your Resources Scrapable with XML-RPC
  99. Creating an IM Interface
  100. Going Beyond the Book
Colophon

The information in the "About this book" section may refer to other editions of this title.

Top search results on AbeBooks

1.

Hemenway, Kevin; Calishain, Tara
Publisher: O'Reilly Media
ISBN 10: 0596005776 ISBN 13: 9780596005771
New, Paperback, Quantity: 1
Seller: Your Online Bookstore (Houston, TX, U.S.A.)

Book description: O'Reilly Media. Paperback. Condition: New. 0596005776 Ships promptly from Texas. Bookseller item code: CUD1339ANLC051216H0216A

Buy new: EUR 7.23
Shipping: FREE, within U.S.A.

2.

Hemenway, Kevin; Calishain, Tara
Publisher: O'Reilly Media
ISBN 10: 0596005776 ISBN 13: 9780596005771
New, Paperback, Quantity: > 20
International Edition
Seller: Sunshine Book Store (Wilmington, DE, U.S.A.)

Book description: O'Reilly Media. Condition: New. 0596005776 This is an International Edition. Brand new, paperback, delivery within 6-14 business days. Similar contents to the U.S. edition; the ISBN and cover design may differ; printed in black and white. Choose expedited shipping for delivery within 3-8 business days. We do not ship to PO Box, APO, or FPO addresses. In some instances, subjects such as Management, Accounting, and Finance may have different end-of-chapter case studies and exercises. International Edition textbooks may bear a label "Not for sale in the U.S. or Canada" and "Content may different from U.S. Edition", printed only to discourage U.S. students from obtaining an affordable copy. The U.S. Supreme Court has ruled on this issue and affirmed your right to purchase international editions. Access codes and CDs are not provided with these editions unless specified. We may ship the books from multiple warehouses across the globe, including India, depending on inventory availability. Customer satisfaction guaranteed. Bookseller item code: NU9780596005771

Buy new: EUR 13.01
Shipping: FREE, within U.S.A.

3.

Morbus Iff, Kevin Hemenway and Tara Calishain
ISBN 10: 0596005776 ISBN 13: 9780596005771
New, Quantity: 1
Seller: Castle Rock (Pittsford, NY, U.S.A.)

Book description: Condition: Brand New. Bookseller item code: 97805960057711.0

Buy new: EUR 14.27
Shipping: EUR 3.70, within U.S.A.

4.

Kevin Hemenway, Tara Calishain
Publisher: O'Reilly Media (2003)
ISBN 10: 0596005776 ISBN 13: 9780596005771
New, Paperback, First Edition, Quantity: 1
Seller: Ergodebooks (Richmond, TX, U.S.A.)

Book description: O'Reilly Media, 2003. Paperback. Condition: New. 1st edition. Bookseller item code: DADAX0596005776

Buy new: EUR 14.57
Shipping: EUR 3.70, within U.S.A.

5.

Hemenway, Kevin; Calishain, Tara
Publisher: O'Reilly Media
ISBN 10: 0596005776 ISBN 13: 9780596005771
New, Paperback, Quantity: > 20
Seller: Mediaoutlet12345 (Springfield, VA, U.S.A.)

Book description: O'Reilly Media. Paperback. Condition: New. 0596005776 *BRAND NEW* Ships same day or next! Bookseller item code: SWATI2122347173

Buy new: EUR 14.61
Shipping: EUR 3.70, within U.S.A.

6.

Iff, Morbus
ISBN 10: 0596005776 ISBN 13: 9780596005771
New, Paperback, Quantity: > 20
Print on Demand
Seller: BargainBookStores (Grand Rapids, MI, U.S.A.)

Book description: Paperback. Condition: New. This item is printed on demand. Item doesn't include CD/DVD. Bookseller item code: 975547

Buy new: EUR 15.11
Shipping: EUR 3.70, within U.S.A.

7.

Kevin Hemenway, Tara Calishain
Publisher: O'Reilly Media, Inc., USA, United States (2003)
ISBN 10: 0596005776 ISBN 13: 9780596005771
New, Paperback, Quantity: 1
Seller: The Book Depository US (London, United Kingdom)

Book description: O'Reilly Media, Inc., USA, United States, 2003. Paperback. Condition: New. 226 x 152 mm. Language: English. Brand New Book. Bookseller item code: AAH9780596005771

Buy new: EUR 19.82
Shipping: FREE, from United Kingdom to U.S.A.

8.

Kevin Hemenway, Tara Calishain
Publisher: O'Reilly Media, Inc., USA, United States (2003)
ISBN 10: 0596005776 ISBN 13: 9780596005771
New, Paperback, Quantity: 1
Seller: The Book Depository (London, United Kingdom)

Book description: O'Reilly Media, Inc., USA, United States, 2003. Paperback. Condition: New. 226 x 152 mm. Language: English. Brand New Book. Bookseller item code: AAH9780596005771

Buy new: EUR 19.82
Shipping: FREE, from United Kingdom to U.S.A.

9.

Tara Calishain Joint
ISBN 10: 0596005776 ISBN 13: 9780596005771
New, Soft cover, Quantity: > 20
International Edition
Seller: University Bookstore (Delhi, Delhi, India)

Book description: 2003. Soft cover. Condition: New. This book is a BRAND NEW soft cover International Edition with black-and-white printing. The ISBN and cover page may differ, but the contents are identical, word for word, to the US edition. The book is in English. Bookseller item code: UN-SHRO-457

Buy new: EUR 10.17
Shipping: EUR 10.00, from India to U.S.A.

10.

Kevin Hemenway
Publisher: O'Reilly Media (2017)
ISBN 10: 0596005776 ISBN 13: 9780596005771
New, Paperback, Quantity: 20
Print on Demand
Seller: Murray Media (North Miami Beach, FL, U.S.A.)

Book description: O'Reilly Media, 2017. Paperback. Condition: New. This item is printed on demand. Bookseller item code: 0596005776

Buy new: EUR 20.47
Shipping: EUR 2.77, within U.S.A.
