Text Processing in Python
Text Processing in Python describes techniques for manipulating text using the Python programming language. At the broadest level, text processing is simply taking textual information and doing something with it. This might be restructuring or reformatting it, extracting smaller bits of information from it, or performing calculations that depend on the text. Text processing is arguably what most programmers spend most of their time doing. Because Python is clear, expressive, and object-oriented, it is a perfect language for doing text processing, even better than Perl. As the amount of data everywhere continues to increase, this is more and more of a challenge for programmers. This book is not a tutorial on Python. It has two other goals: helping the programmer get the job done pragmatically and efficiently; and giving the reader an understanding - both theoretically and conceptually - of why what works works and what doesn't work doesn't work. Mertz provides practical pointers and tips that emphasize efficient, flexible, and maintainable approaches to the text-processing tasks that working programmers face daily.
Reader Reviews 1 - 16 of 16 (sorted by newest first)
| Review Date | Review Rating (5 High) | Review Helpful to |
| 12-19-07 | 4 | (NA) |
This book is interesting; the field it covers is not one with many texts, so it's hard to do comparative analysis.
On its strengths, this book is probably best suited for programmers that aren't afraid to learn advanced material. It covers in great detail everything you ever wanted to know about Python string processing (and honestly probably a bit more). It has a very readable style, and overall is exceptionally informative. Examples are clear, pointed, and useful. On its weaknesses, some material (i.e., parsers) might be extremely dense and hard to understand if you don't have a CS or Linguistics degree. On the other hand, if you do understand it (and the explanation is pretty good), you will end up a much better programmer for it. Overall, I'd recommend this book for professionals with theory background that need to do advanced Python work. I'd also recommend it to people without theory background, but only if they're not afraid of getting their feet wet. People who are afraid of learning should probably avoid this book. 4 stars mostly because I'm not really sure how to evaluate this book. (Review Data Last Updated: 2008-08-22 06:40:39 EST)
| 08-23-07 | 5 | (NA) |
TPIP is an instant classic in that all you need to do is add a solid understanding of Python and you can instantly appreciate its classic nature. Text processing is more fundamental to programming than programming itself. For instance, most of the programs a programmer will write will be written with text. So gaining proficiency in dealing with text is key to not only programming but probably every facet of one's experience with a computer.
In TPIP, David Mertz provides the reader with a set of tools for manipulating text in Python. The book is organized by type of text processing activity. For example, filters are presented from a functional perspective, searching text is presented in terms of regular expressions, etc. Relevant modules are presented with each type of processing task in a reference format. The greatest value in the book is that it approaches a fundamental and important programming topic that most books would treat sparingly or dismiss outright. TPIP might be in league with Friedl's Mastering Regular Expressions in that it takes outwardly uninspiring topics, makes them interesting, and teaches them with pedagogical finesse. Somehow, Mertz inspires the reader to feel intelligent while presenting the topics in an accessible way. Even mxTextTools becomes comprehensible in TPIP. TPIP, though, is not without its shortcomings, especially in organization. The reviews of Python and functional programming are put in appendices and the reference material is interleaved with the text, giving the reader a somewhat disjointed feeling as he makes his way through the book. Better would have been to build the book up from a solid review of the Python language, proceeding to a thorough treatment of functional programming in Python, to then present the meat of the book, text processing, as a well-organized whole with sensible segues between the chapters. The reference material should be moved to the appendices for easy access. Even if these organization problems are never fixed, one would be well served to study this fine volume. (Review Data Last Updated: 2007-12-19 23:54:38 EST)
| 04-11-07 | 3 | 2\2 |
There is a lot of good stuff in this book, but the presentation is lousy.
The first chapter dives into functional programming using obscure and terse higher-order functions including nested lambda expressions. He never does provide a "mere mortal" explanation for how these functions work. I was able to figure it out, but then I've been programming for 35 years in 20+ languages. As a learning experience it was a valuable debugging exercise for me, but as something for a programmer who was just getting to know Python, I can't think of a greater turn-off. Python as a rule is easy to read and easy to write. This book manages to make it unnecessarily hard. Start with another Python book (or two, or three) then come back to this one when you have a lot of time and patience to spend. As I said there *is* some worthwhile information in there. (Review Data Last Updated: 2007-08-24 02:33:30 EST)
| 04-10-07 | 3 | (NA) |
There is a lot of good stuff in this book, but the presentation is lousy.
The first chapter dives into functional programming using obscure and terse higher-order functions including nested lambda expressions. He never does provide a "mere mortal" explanation for how they work. I was able to figure it out, but then I've been programming for 35 years in 20+ languages. Of course, once I deciphered his examples I found a bug in the first set of functions he presents. There may well have been other bugs that I didn't find. As a learning experience it was a valuable debugging exercise for me, but as something to present to an average to above-average programmer who was just getting to know Python, I can't think of a greater turn-off. Python as a rule is easy to read and easy to write. This book manages to make it unnecessarily hard. Start with another Python book (or two, or three) then come back to this one when you have a lot of time and patience to spend. As I said there *is* some worthwhile information in there. (Review Data Last Updated: 2007-04-11 11:53:02 EST)
| 09-22-05 | 5 | 1\3 |
I'd second most of the positive statements given by other reviewers. To boot - the author's voice is clear and pleasant. He shares his knowledge as it is, without dumbing it down or condescending. The index is very useful when you want to get in, get the information, and get back to work. This book is a great read for anyone learning or using Python seriously.
(Review Data Last Updated: 2007-07-06 18:56:16 EST)
| 08-02-05 | 5 | 3\3 |
This book is not for everyone, but for "text processing", I know of nothing else that comes close; this book merits careful study. Note that "text processing" would include many web applications -- HTTP is a text-driven protocol. Do not be put off by the first chapter! It is the most abstract of any book I have read in decades. As the book says, you can skip it if it is a problem for you. As an illustration of how good this book is, I am now using regular expressions (selectively), and this was only possible with the help of this book! (If you do not even know what regular expressions are, you have not completed Text Processing 1.01.)
(Review Data Last Updated: 2007-07-06 18:56:16 EST)
| 12-30-04 | 5 | 0\2 |
This book is not for novice programmers. However, if you are a reasonably experienced programmer in Python, or any other language for that matter, this book will serve you very well. Text processing is probably the most common use for Python.
Mertz is an exceptionally smart guy. A few of the things in this book were over my head, but most of it was not. He offers terrific insights into programming in general, and probably the best Python overview / tutorial I have ever seen (in one of the Appendices). (Review Data Last Updated: 2007-07-06 18:56:16 EST)
| 08-01-04 | 4 | 6\7 |
In any bookstore you'd probably find hundreds of titles of "Web programming in XXX language". What about "Text processing in XXX language"? From Amazon's similar items, the only book that remotely resembles this is "Python & XML". This is indeed a unique title.
Why do we care about text processing? The author says it is arguably what most programmers spend most of their time doing. Text is the basis of most Internet communications protocols. Most computer "data" often comes down to "text". And Python is a powerful language for this task. What can you find in this book?
Advanced Python programming technique
Python is a simple language on the surface. As an expert programmer, the author really shows you great techniques for solving actual problems. This is in contrast to most other Python books, which cover only the language basics. It is also one of the few books that cover functional programming style. The other book that you may be interested in is the "Python Cookbook".
Algorithms
You'll find a number of useful algorithms here and there. There is only a brief introduction to computer language theory. On the other hand, some third-party parser tools are well covered.
Internet protocols and data format
Several standard Python modules as well as useful third-party tools for handling Internet protocols and data formats are introduced. (But not much about the protocols themselves.)
There are places where the author jumps into detail without giving enough introduction. On p.5 a number of higher-order functions are defined without explanation. The "smart ASCII" format is used to illustrate several different programming tools but the format itself is never defined. What could be very interesting topics instead leave readers bewildered. I find this book a bag of tricks where the author's experience and opinion are most valuable. Despite some shortcomings, this is one advanced book that helps improve your programming technique to a higher level. (Review Data Last Updated: 2007-07-06 18:56:16 EST)
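Editor's illustration: for readers unsure what the reviewer means by "higher-order functions", the following is a minimal sketch of the general idea, namely functions that take other functions as arguments and return new ones. The names `compose`, `conjoin`, and `disjoin`, and the sample data, are assumptions chosen for illustration; they are not claimed to match the definitions on p.5 of the book.

```python
from functools import reduce

def compose(*funcs):
    """Return a function that applies funcs right to left: compose(f, g)(x) == f(g(x))."""
    return lambda x: reduce(lambda acc, f: f(acc), reversed(funcs), x)

def conjoin(*preds):
    """Return a predicate that holds only when every predicate in preds holds."""
    return lambda x: all(p(x) for p in preds)

def disjoin(*preds):
    """Return a predicate that holds when at least one predicate in preds holds."""
    return lambda x: any(p(x) for p in preds)

# Hypothetical line-level tests for a small text filter.
is_short = lambda line: len(line) < 20
has_digit = lambda line: any(ch.isdigit() for ch in line)

lines = [
    "order 66",
    "a much longer line without any numbers in it",
    "hello",
    "room 101 is down the hall on the left",
]

# Keep only the lines that are both short and contain a digit.
short_with_digit = conjoin(is_short, has_digit)
selected = list(filter(short_with_digit, lines))

# Build a normalizer that strips whitespace first, then upper-cases.
normalize = compose(str.upper, str.strip)
```

Combining small predicates this way keeps each test trivial to read on its own, which is one reason the functional style is often suggested for text filters.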
| 06-10-04 | 5 | 19\20 |
Added note: The review by phrodod was quite nice, IMO. One little thing: s/he mentions my little re_show() utility that I use in the regex tutorial. While phrodod grants leniency in allowing that it is fine if the utility is not of use beyond the examples... my little wrapper proves better than that even. Specifically, the rather nice Python library PEAK chose to incorporate the exact 3-line utility function (with acknowledgments to me) to help users explore regexen. PEAK, of course, is vastly more interesting than my minuscule and accidental contribution to it :-).
-----
I felt the review from "A reader from Germany" below rather missed the point of my book. Don't buy it if the book is not right for you, most certainly. But I wonder how that reader got such an entirely inaccurate set of expectations about the book. For example, s/he wrote:
> This book is not for people that are new to programming.
However, if you buy the book from Amazon (here), the very first thing you see in the Editorial Reviews section is:
> Written for experienced programmers...
So it seems odd for the above reader to be negatively surprised by the target audience. If s/he bought it at a brick-and-mortar store, the back cover pretty much says the same thing. Moreover, the reader mentioned also seems not to understand the meaning of the title:
> The title of the book is very much misleading too.
> If you think that the book has anything to do with
> text processing in the sense of linguistics, you are
> mistaken.
I'm not sure how anyone would think that the phrase "text processing" is supposed to mean "computational linguistics." There are a couple of overlaps between the fields. And you probably need a bit of text processing to extract useful corpora for computational linguistics. But the ordinary meaning of the words makes it clear that these are quite different areas of specialty. Actually, I just wrote an IBM developerWorks article introducing the Natural Language Toolkit (for Python). Anyone who is interested in computational linguistics and Python should take a look at that library. I wonder if the "from Germany" description of the reader indicates that s/he translated the English title slightly wrongly. I can kinda imagine that across languages the field distinction could get lost. And finally:
> ...really mad to find out that the book is available online too.
This one rather upsets me. I provide a service for readers--both those who pay for the printed copy and those who read it online for free (or make a voluntary donation online).
The reader is *mad* that I give away something for free that I was under no obligation to (and even wrestled slightly with my publisher to make sure I could do so)!? If s/he doesn't want the free copy, I don't see how s/he's forced to download it! (Review Data Last Updated: 2007-07-06 18:56:16 EST)
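Editor's illustration: the review above describes re_show() only as a tiny wrapper used to explore regular expressions. The following is a minimal sketch of what such a wrapper might look like, assuming it marks each match with curly braces; the signature, the brace markers, and the choice to return the string rather than print it are assumptions, not a reproduction of the book's or PEAK's code.

```python
import re

def re_show(pattern, text):
    """Return text with every match of pattern wrapped in curly braces.

    Assumed behavior: multiline matching, trailing whitespace stripped.
    """
    return re.compile(pattern, re.M).sub(r"{\g<0>}", text.rstrip())

sample = "Mary had a little lamb\nAnd everywhere that Mary went"

# Mark every occurrence of a literal word.
marked = re_show(r"Mary", sample)

# With re.M, ^ matches at the start of each line, not just the string.
line_start = re_show(r"^A", sample)

print(marked)
print(line_start)
```

A wrapper like this hides the `re` calls themselves, which is the trade-off other reviewers on this page point out: it makes the matches easy to see, but the examples do not show the library being called directly.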
| 05-09-04 | 5 | 6\6 |
This book covers many of the details of processing text files to extract and/or generate more textual information from them. The author covers quite a number of advanced techniques, starting with an introduction to functional programming paradigms in the first few pages.
Nothing in this book is truly new, but the author finds many ways of applying the things advanced Pythonistas already know to domains that they might otherwise not consider. This book is very dense, so don't expect to read it and get it in a weekend. And certainly, don't pick this up as a first programming in Python book. If you're just learning to program, consider "Learning Python" by Lutz or "How to Think Like a Computer Scientist" (free web book -- search Google for it). If you already know programming, but need to learn Python, consider the Quick Python Book (old, but great) or "Dive into Python" (another free web book) for learning those. When the time comes that you want to write a small language of your own for your users (or when you need to support an existing small language), this book will be invaluable for telling you how to do this using Python. If you don't understand functional programming, but think you might like to learn it, use this book to teach you that. If you want to understand the regular expression library, use this book for that as well. However, the RE library examples do wrap the output in a special macro that's only likely to be useful for the purposes of the book. Understand that you're not likely to use the macro, and you'll be fine. The tremendous breadth and depth of this book makes me recommend it highly. But even if you're advanced, don't expect it to be light reading. Enjoy. (Review Data Last Updated: 2006-07-07 12:41:01 EST)
| 04-24-04 | 2 | 6\19 |
This book is not for people that are new to programming. It is also not for people new to Python. I have difficulties figuring out whom it was intended for. If you are an experienced programmer and want to learn something about Python, read the tutorial that comes with Python or any other online tutorial that is out there.
The title of the book is very much misleading too. If you think that the book has anything to do with text processing in the sense of linguistics, you are mistaken. It just means that it deals with "few" questions related to doing things with text files: hence text processing. Only 17 pages are dedicated to XML, which is too short and another big shortcoming of the book. It is overpriced in my opinion for the small amount of information that it transfers. And it made me really mad to find out that the book is available online too. Don't get me wrong. It is OK to buy books if they are available in print. But they have to offer something. This book is absolutely worth it, maybe if you can scan over some pages once in a while. But you are throwing out your money if you pay for it. (Review Data Last Updated: 2006-07-07 12:41:01 EST)
| 02-29-04 | 4 | 12\12 |
This is the only book that really attacks the issue of string processing using Python. Unfortunately it didn't attack the text processing problems that I wanted discussed.
Also, in the area of regular expressions the examples didn't directly use the Python library; instead, a wrapper function was used for the many examples, and that detracted from using the book as a reference book for this purpose. I found that Python has several different ways to do string processing. Also, some of those ways come up with conflicting results. At the time of this writing the authors of Python are re-organizing and improving this area. What is truly great about the book is the discussion of state machines, parsers, and functional programming. Although these topics detract from the focus on string processing somewhat, this book is perhaps the only popular Python book out there that does these topics justice. I thought they were very well written. My overall complaint is that this book includes too many things outside of text processing using the core Python language. But other readers may appreciate this aspect more than I did. If you want coverage on handling email specifically, the author covers that. Same with HTML processing and other specialized topics. I just wanted the lowdown on using the full string processing capabilities of the core Python language -- not necessarily all the specialized libraries. I found string processing to be messy with Python but found Ruby to be much easier. That is perhaps because Ruby is a newer language and it has some features of Perl built in. Ruby, however, does not have the extent of libraries available like Python, nor does it have as nice a Windows GUI. Overall, if you are looking for a book on text processing this is the only book out there, and a big plus with this book is what you will learn about functional programming, state machines and parsers. The author worked hard to produce a book in this specialized area. He has lots of code examples. Highly recommended for Python programmers. John Dunbar (Review Data Last Updated: 2006-07-07 12:41:01 EST)
| 02-14-04 | 5 | 10\11 |
Confessedly, writing this little note was inspired by an article on authors who pseudonymously or anonymously review their own books. I find that a bit dishonest, so you should see my name here.
Maybe this is a quibble, but I found the review by jluc_ from Vancouver, BC Canada to miss one point. S/he has some criticisms of the organization and focus that I won't argue with (obviously, my own take is different). But there is also a remark that: "For example, he uses smart ASCII formatting examples several times. Does he _clearly_ define that format somewhere? Nope, the reader is assumed to magically know/guess its syntax, though several entries for "smart ASCII" in the index will first lead him to believe a definition does exist in the book." It is quite true--and quite deliberate--that I omit any formal description of smart ASCII from the book. For a real world example of the format, you can look at the online version of the whole book itself at [http://gnosis.cx/TPiP/], which is in this format. But there is no official grammar of the markup (though the parser chapters do this partially, as a way of explaining parsers, not the format). In my real work experience, text formats I encounter are RARELY completely and accurately documented. Rather, the tools I have written in my life have treated formats in a heuristic and seat-of-the-pants way, typically based only on some examples of a format. In using the smart ASCII format for various examples, I want to put readers in a mind to exactly this sort of real-world experience. Don't sweat the lack of formal documentation, just get to work with the programming! (Review Data Last Updated: 2006-07-07 12:41:01 EST)
| 01-19-04 | 5 | 3\4 |
This book is extremely well-written and, above all, USEFUL! Dr. Mertz approaches the problem of handling text from a fresh perspective. With illustrative examples in different styles (from procedural, functional and object-oriented programming perspectives; and occasionally even metaphysical!), this book actually causes you to think about the types of problems you encounter and provides criteria for deciding the best approach. The examples are short but by no means simple. Aside from a slight stylistic quibble I have with the second chapter, I heartily recommend this book for those who wish to expand their knowledge of Python and enhance their repertoire of techniques. Try it - it's fun and enlightening!
(Review Data Last Updated: 2006-07-07 12:41:01 EST)