Python Tips
Various news, tutorials, and other stuff about Python programming.
Curated by Mathieu D.
Scooped by Mathieu D.

Decoding CAPTCHA's with Python

Most people don’t know this, but my honours thesis was about using a computer program to read text out of web images. My theory was that if you could get a high level of successful extraction, you could use it as another source of data to improve search engine results. I was even quite successful in doing it, but never really followed my experiments up. My honours advisor, Dr Junbin Gao (http://csusap.csu.edu.au/~jbgao/), had suggested that after writing my thesis I should write some form of article on what I had learnt. Well, I finally got around to doing it. While what follows is not exactly what I was studying, it is something I wish had existed when I started looking around.

So, as I mentioned, what I essentially attempted to do was take standard images on the web and extract the text out of them as a way of improving search results. Interestingly, I based most of my research and ideas on methods of cracking CAPTCHAs. A CAPTCHA, as you may well know, is one of those annoying "Type in the letters you see in the image above" things you see on many website signup pages or comment sections.

A CAPTCHA image is designed so that a human can read it without difficulty while a computer is unable to. In practice this has never really worked, with pretty much every CAPTCHA that is published on the web getting cracked within a matter of months. Knowing this, my theory was that since people can get a computer to read something that it shouldn’t be able to, normal images such as website logos should be much easier to read using the same methods.
I was actually surprisingly successful in my goal, with recognition rates of over 60% for most of the images in my sample set. That is rather high considering the variety of different images on the web.

What I did find while doing my research, however, was a lack of sample code or applications that show you how to crack CAPTCHAs. While there are some excellent tutorials and many published papers on the subject, they are very light on algorithms or sample code. In fact I didn't find any beyond some non-working PHP scripts and some Perl fragments which strung together a few unrelated programs and gave some reasonable results when presented with very simple CAPTCHAs. None of them helped me very much. I found that what I needed was some detailed code with examples I could run and tweak to see how it worked. I think I am just one of those people who can read the theory and follow along, but without something to prod and poke I never really understand it. Most of the papers and articles said they would not publish code due to the potential for misuse. Personally I think that is a waste of time, since in reality building a CAPTCHA breaker is quite easy once you know how.

So because of the lack of examples, and the problems I had initially getting started, I thought I would put together this article with full, detailed explanations and working code showing how to go about breaking a CAPTCHA.

Armin Ronacher: SQLAlchemy and You

Without doubt, most new Python web programmers these days are choosing the Django framework as their gateway drug to Python web development. As such, many people's first experience with a Python ORM (or maybe an ORM altogether) is the Django one. When they later switch to something else they often find SQLAlchemy unnecessarily complex and hard to use. Why is that the case?
I made a quick poll on Twitter about why people prefer the Django ORM over SQLAlchemy and I got back a few interesting results. First of all that question was obviously asked with the intent to attract answers from people that do prefer Django over SQLAlchemy or at least have some issues with SQLAlchemy that they don't seem to have with Django. Without a doubt there is a large fanbase behind SQLAlchemy, myself included.
SQLAlchemy in general just has a much larger featureset and it's the only ORM for Python which allows you to take full advantage of your database and does not stand in your way. It exposes all features of your underlying database if you want and can be heavily fine tuned.
This article assumes that you have some basic Django knowledge and want to give SQLAlchemy a try. Step by step it walks through the differences and common idioms.

Django tips: the difference between 'blank' and 'null'

New users of Django, even people who have lots of experience writing database-driven applications, often run into a seemingly simple problem: how do you set up a model with “optional” fields that don’t always have to be filled in? Django’s validation system assumes by default that all fields are required, so obviously you have to tell it which fields it’s OK to leave blank.
But therein lies the problem: there are two different ways you can “leave it blank”.

Porting to Python 3: An in-depth guide

Porting to Python 3 doesn’t have to be daunting. This book guides you through the process of porting your Python 2 code to Python 3, from choosing a porting strategy to solving your distribution issues. Using plenty of code examples, it takes you across the hurdles and shows you the new Python features.

Ned Batchelder: Caged python

For a presentation, I wanted to produce samples of Python interactive sessions. I could have opened a terminal window and typed my input, and copied the resulting session and pasted it into a text file, but that's not repeatable, and is labor intensive and error-prone.

Python and the Principle of Least Astonishment

When you use something for a long time you develop a sense of what goes together and what does not fit the common pattern. The Python community seems to have given this effect a name: if something matches the common patterns it's “pythonic”; if it does not, it's deemed “unpythonic”. Most aspects of the language itself are designed not to surprise you in cases where more than one behavior would be possible. This is what many people refer to as the Principle of Least Astonishment. In my mind there are only a handful of exceptions to that rule in the language design, which I will cover here as well.

Django + JQuery Mobile Quick Start Tutorial

I've uploaded a tutorial and a very minimal library for creating JQuery Mobile sites with Django. It's actually the first time I've used PyPI's documentation hosting system, mostly because Launchpad doesn't have a web-page hosting mechanism. The library (django-jqm) is not intended to be a platform for creating "real" sites; it's just a bunch of sample code for getting started quickly with very basic sites built from your existing Django sites.

Django sessions – part I: Cookies

HTTP is a stateless protocol – the server is not required to retain information or status about each user for the duration of multiple requests.

For smart web applications, however, this isn’t good enough. You want to log in to an application and have it remember you across requests. A good example is maintaining a "shopping cart" at some merchandise website, which you gradually fill as you browse through the products that interest you.

To solve this problem, HTTP cookies were invented by Netscape back in the 1990s. Cookies are formally defined in RFC2965, but to spare you all that jabber, cookies can be described very simply.

A cookie is just an arbitrary string sent by the server to the client as part of the HTTP response. The client will then return this cookie back to the server in subsequent requests. The information stored in the cookie is opaque to the client – it’s only for the server’s own use. This scheme allows the client to identify itself back to the server with some state the server has assigned it.
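
The round trip described above can be sketched with Python's standard `http.cookies` module. This is only an illustration of the idea, not how Django does it: the cookie name `sessionid` and the value `abc123` are made up, and the "client" is simulated by copying the name/value pair out of the header.

```python
from http.cookies import SimpleCookie

# Server side: build the value of a Set-Cookie response header.
# The name "sessionid" and the value "abc123" are invented for this sketch.
response_cookie = SimpleCookie()
response_cookie["sessionid"] = "abc123"
response_cookie["sessionid"]["path"] = "/"
set_cookie_header = response_cookie["sessionid"].OutputString()

# Client side (simulated): keep the opaque name=value pair and send it
# back unchanged in the Cookie header of the next request.
cookie_header = set_cookie_header.split(";")[0]

# Server side, next request: parse the Cookie header to recover the state.
request_cookie = SimpleCookie()
request_cookie.load(cookie_header)
session_id = request_cookie["sessionid"].value
```

The client never interprets the value; it only stores it and echoes it back, which is what lets the server recognise the same visitor across requests.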

Anatomy of a Web Service, part 2 – “RedBarrel” « Fetchez le Python

I was talking about web services the other day: read it back as an introduction to this post.

I am pursuing this DSL experiment as I have now finished a working prototype of a micro-framework. I’ve called it RedBarrel (Monty reference). I’ve called the DSL files “RBR files”.

RedBarrel is a pure Python implementation of the DSL I’ve described in the previous post and does the following:

* loads the DSL file and runs a WSGI web application (via rb-run)
* allows you to check the syntax of an RBR file (via rb-check)
* generates a documentation page for the APIs at /__doc__. Note that description fields can be in reStructuredText and are rendered in HTML
* publishes the DSL file at /__api__
* runs the code pointed to in the DSL and does the post- and pre-processing as described

The Python Standard Library: Data Structures

Python includes several standard programming data structures, such as list, tuple, dict, and set, as part of its built-in types. Many applications do not require other structures, but when they do, the standard library provides powerful and well-tested versions that are ready to use.
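As a small illustration of what "other structures" can look like, here is a sketch using three standard-library tools: `collections.deque`, `heapq`, and `collections.Counter`. The selection is an assumption made for this example; the excerpt itself does not say which structures the article covers.

```python
from collections import Counter, deque
import heapq

# deque: a double-ended queue with fast appends and pops at both ends.
queue = deque([1, 2, 3])
queue.appendleft(0)
first = queue.popleft()          # removes the 0 that was just added

# heapq: min-heap operations on top of an ordinary list.
heap = [5, 1, 4]
heapq.heapify(heap)
smallest = heapq.heappop(heap)   # the smallest item comes out first

# Counter: a dict subclass for counting hashable items.
counts = Counter("abracadabra")
most_common_letter, occurrences = counts.most_common(1)[0]
```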

Anatomy of a Web Service

This has been cycling in my head for a while now, and I think it’s close to becoming something concrete.

Let me summarize the idea: web services are most of the time doing the same post- and pre-processing tasks over and over and there should be a way to describe them via a DSL.

Nothing revolutionary here, but what if Nginx could handle all the boring parts for you and let you just handle the meat of your services? Having a DSL to describe web services potentially allows such delegation.

Terry Jones: Graceful shutdown of a Twisted service with outstanding deferreds

I’ve been spending a bit of time thinking again about queues and services. I wrote a Twisted class in 2009 to maintain a resizable dispatch queue (code in Launchpad, description on the Twisted mailing list). For this post I’ve pulled out (and simplified slightly) one of its helper classes, a DeferredPool.

This simple class maintains a set of deferreds and gives you a mechanism to get a deferred that will fire when (if!) the size of the set ever drops to zero. This is useful because it can be used to gracefully shut down a service that has a bunch of outstanding requests in flight. For each incoming request (that’s handled via a deferred), you add the deferred to the pool. When a signal arrives to tell the service to stop, you stop taking new requests and ask the pool for a deferred that will fire when all the outstanding deferreds are done, then you exit. This can all be done elegantly in Twisted, the last part by having the stopService method return the deferred you get back from the pool (perhaps after you add more cleanup callbacks to it).
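
The pool idea can be sketched without Twisted. The example below swaps in `asyncio` futures from the standard library in place of Twisted deferreds, so it is an analogy and not the author's DeferredPool; the names `FuturePool`, `drained`, and `serve_and_stop` are hypothetical.

```python
import asyncio


class FuturePool:
    """Track in-flight futures and signal when none remain."""

    def __init__(self):
        self._pending = set()
        self._waiters = []

    def add(self, future):
        # Remember the future until it completes, then drop it.
        self._pending.add(future)
        future.add_done_callback(self._on_done)
        return future

    def _on_done(self, future):
        self._pending.discard(future)
        if not self._pending:
            # The pool is empty: wake everyone waiting for that moment.
            waiters, self._waiters = self._waiters, []
            for waiter in waiters:
                if not waiter.done():
                    waiter.set_result(None)

    def drained(self):
        # Returns a future that completes once the pool is empty.
        # Must be called while an event loop is running.
        waiter = asyncio.get_running_loop().create_future()
        if self._pending:
            self._waiters.append(waiter)
        else:
            waiter.set_result(None)
        return waiter

    def __len__(self):
        return len(self._pending)


async def serve_and_stop():
    pool = FuturePool()
    handled = []

    async def handle(request_id):
        await asyncio.sleep(0.01)    # stand-in for real request handling
        handled.append(request_id)

    # Each incoming request is added to the pool.
    for request_id in range(3):
        pool.add(asyncio.create_task(handle(request_id)))

    # "Stop" signal: take no new requests, wait for outstanding ones.
    await pool.drained()
    return sorted(handled), len(pool)


handled, still_pending = asyncio.run(serve_and_stop())
```

In the Twisted version described above, the future returned by the pool would be what `stopService` hands back, so the service only finishes stopping once every outstanding request is done.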

Eli Bendersky: Django sessions – part III: User authentication

In the previous two articles of this series we learned how Django implements sessions, thus allowing the abstraction of persistent state in a web application. The session framework can be employed by developers to implement all kinds of interesting features for their application, but Django also uses it for its own needs. Specifically, Django’s user authentication system relies on the session framework to do its job.
The user authentication system allows users to log in and out of the application, and act based on a set of permissions. Borrowing from the Django Book:
This system is often referred to as an auth/auth (authentication and authorization) system. That name recognizes that dealing with users is often a two-step process. We need to
1. Verify (authenticate) that a user is who he or she claims to be (usually by checking a username and password against a database of users)
2. Verify that the user is authorized to perform some given operation (usually by checking against a table of permissions)
In this, the final part of the series, I want to explain how Django’s user authentication is implemented. I will focus on item 1 in the list above – authentication, which makes actual use of sessions.

Just a little Python: Zarkov is a Lightweight Map-Reduce Framework

Over the past few weeks I've been working on a service in Python that I'm calling, in the tradition of naming projects after characters in Flash Gordon, Zarkov. So what exactly is Zarkov? Well, Zarkov is many things (and may grow to more):
* Zarkov is an event logger
* Zarkov is a lightweight map-reduce framework
* Zarkov is an aggregation service
* Zarkov is a webservice
In my previous post, I discussed Zarkov as an event logger. While this may be useful (say for logging to a central location from several different servers), there's a bit more to Zarkov. Today I'll focus on the map-reduce framework provided by Zarkov. If you want instructions on setting up Zarkov or getting events into it, please see my previous post.

Basic intro to Python Meta programming

A refrain I’ve often read on developer forums is that if one must ask about how to do meta-programming (in Python) then they probably shouldn’t attempt it. It’s true that meta-programming is an advanced concept and it can really warp your mind if you dive into it. So while most programmers may not need it, aspiring hackers won’t be able to avoid it. What the former consider a program, is merely data for the latter.

In Python, the journey of meta programming starts by understanding that classes are objects. And that ‘type’ is a very special object. Not only is it used to introspect other objects but it is also used to create new ones.
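
Both roles of `type` can be shown in a few lines. This is a minimal sketch; the class names `Point` and `Point2D` and the `describe` function are made up for the illustration.

```python
# Role 1: with one argument, type() introspects an object.
class Point:
    pass


instance_type = type(Point())   # the class of an instance is Point
class_type = type(Point)        # the class of a class is type itself


# Role 2: with three arguments (name, bases, namespace), type() creates
# a new class at runtime, equivalent to writing a class statement.
def describe(self):
    return f"{type(self).__name__}({self.x}, {self.y})"


Point2D = type("Point2D", (object,), {"x": 0, "y": 0, "describe": describe})
description = Point2D().describe()
```

Because `Point2D` is just an object produced by a call, a program can build classes from data, which is the starting point for metaclasses.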

“Eppur si muove!” – Dealing with Timezones in Python

As a result of our world not being a flat disc but a rotating geoid, and our solar system having only one sun, it is a different time of day in different places at precisely the same moment. Everybody learns that in school these days and is well aware of the effects on human life (“Call your aunt overseas and she will pick up at an odd time”, jetlag etc.). But unfortunately that whole timezone thing is only partially based on constraints our world gave us, and in computing we have to deal with these oddities as well.
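The "same moment, different local time" effect can be sketched with the standard `datetime` module. The fixed UTC offsets below are an assumption chosen for illustration (summer offsets for Berlin and New York); real timezones also carry daylight-saving rules, which fixed offsets do not model.

```python
from datetime import datetime, timedelta, timezone

# One unambiguous moment, expressed in UTC.
utc_moment = datetime(2011, 7, 1, 12, 0, tzinfo=timezone.utc)

# Fixed offsets used as stand-ins for two local zones in summer.
berlin_summer = timezone(timedelta(hours=2), "CEST")
new_york_summer = timezone(timedelta(hours=-4), "EDT")

in_berlin = utc_moment.astimezone(berlin_summer)
in_new_york = utc_moment.astimezone(new_york_summer)

# The wall clocks differ, but aware datetimes compare by the instant
# they represent, so these are still the same moment.
same_instant = in_berlin == in_new_york
```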

Alex Gaynor: So you want to write a fast Python?

Thinking about writing your own Python implementation? Congrats, there are plenty out there, but perhaps you have something new to bring to the table. Writing a fast Python is a pretty hard task, and there's a lot of stuff you need to keep in mind, but if you're interested in forging ahead, keep reading!

Python persistence management

Persistence is all about keeping objects around, even between executions of a program. In this article you'll get a general understanding of various persistence mechanisms for Python objects, from relational databases to Python pickles and beyond.
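A minimal sketch of the pickle end of that spectrum follows. The `state` dictionary and the file name `state.pickle` are invented for the example, and a temporary directory stands in for wherever a real program would keep its data between runs.

```python
import pickle
import tempfile
from pathlib import Path

# Some in-memory state we want to survive past the end of the program.
state = {"user": "alice", "visits": 3, "tags": ["python", "persistence"]}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "state.pickle"

    # "First run": serialize the object graph to bytes and write it out.
    path.write_bytes(pickle.dumps(state))

    # "Later run": read the bytes back and rebuild an equal object.
    restored = pickle.loads(path.read_bytes())
```

Unpickling can execute arbitrary code, so `pickle.loads` should only be used on data from a trusted source.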

Django sessions – part II: How sessions work

Sessions are Django’s high-level tool for keeping a persistent state for users on the server. Sessions allow you to store arbitrary data per visitor, and have this data available the next time the visitor visits the site. As we’ll learn in this article, sessions are still based on cookies, but cookie management is abstracted away, handling a lot of issues on the way – sessions provide a more convenient, robust and safe way to store the data.

Laurent Luce: Python’s string objects implementation

This article describes how string objects are managed by Python internally and how string search is done.

Help us ironing Packaging

packaging has landed in the standard library, but the road to Python 3.3 is still filled with a lot of work. We pushed the documentation to the tip yesterday, and it now appears here: http://docs.python.org/dev/packaging/

There is a lot of stuff you can do to help us improve packaging. If you wish to help out, read on.

PyCharm: PyCharm 1.5 Released! Documentation, SQL/Database, Django templates debugging and more

PyCharm 1.5 final release is now available for download.
It’s been an interesting version to work on. We’ve added a number of useful features that should make you more productive and your code better.

Ruslan Spivak: Power set generation – a joy of Python

When I worked on name mangling in SlimIt I needed a function that would generate subsets of a character string in lexicographic order.
For example, if I had a character sequence ‘abc’ I would expect to call my function and get items in the following order: ‘a’, ‘b’, ‘c’, ‘ab’, ‘ac’, ‘bc’, ‘abc’. The produced sequence is called a power set, and binary counting is a well-known algorithm for generating such power sets.

The idea is that we represent our character sequence as a binary string of n bits, then start counting from zero to 2**n – 1, and if bit k is set to 1 then we put the k-th element from the character sequence into an output set.
Let’s say we have a sequence ‘abc’, then counting to 2**3 – 1 would produce binary numbers 000, 001, 010, 011, 100, …, which correspond to ‘’, ‘c’, ‘b’, ‘bc’, ‘a’, …, in the character sequence.
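The binary counting idea can be sketched as follows. The function name `power_set_binary` is made up for this example, and this is a plain reading of the description above, not the author's SlimIt code. Note that it yields subsets in counting order, which is not the ‘a’, ‘b’, ‘c’, ‘ab’, … order asked for earlier.

```python
def power_set_binary(chars):
    """Yield every subset of chars, in binary counting order."""
    n = len(chars)
    for number in range(2 ** n):
        # Zero-padded binary form of the counter, e.g. 5 -> "101" for n = 3.
        bits = format(number, f"0{n}b")
        # Keep the k-th character whenever the k-th bit (from the left) is 1.
        yield "".join(char for char, bit in zip(chars, bits) if bit == "1")


subsets = list(power_set_binary("abc"))
# subsets == ['', 'c', 'b', 'bc', 'a', 'ac', 'ab', 'abc']
```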