cgi -- Common Gateway Interface support

Source code: Lib/cgi.py


Support module for Common Gateway Interface (CGI) scripts.

This module defines a number of utilities for use by CGI scripts written in Python.

Introduction

A CGI script is invoked by an HTTP server, usually to process user input submitted through an HTML <FORM> or <ISINDEX> element.

Most often, CGI scripts live in the server's special cgi-bin directory. The HTTP server places all sorts of information about the request (such as the client's hostname, the requested URL, the query string, and lots of other goodies) in the script's shell environment, executes the script, and sends the script's output back to the client.

The script's input is connected to the client too, and sometimes the form data is read this way; at other times the form data is passed via the "query string" part of the URL. This module is intended to take care of the different cases and provide a simpler interface to the Python script. It also provides a number of utilities that help in debugging scripts, as well as support for file uploads from a form (if your browser supports it).
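To make the two input paths concrete, here is a minimal sketch of how a script could fetch the raw form data itself. It assumes the standard CGI variables REQUEST_METHOD, QUERY_STRING and CONTENT_LENGTH and a URL-encoded request body; the function name read_raw_form_data is hypothetical and is not part of this module.

```python
import io
import os
import sys


def read_raw_form_data(environ=os.environ, stdin=None):
    """Return the undecoded form data for the current request as a string.

    GET-style requests carry the data in QUERY_STRING; POST requests send
    CONTENT_LENGTH bytes on standard input (assumed URL-encoded here).
    """
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        stream = stdin if stdin is not None else sys.stdin.buffer
        return stream.read(length).decode("ascii", errors="replace")
    return environ.get("QUERY_STRING", "")


if __name__ == "__main__":
    # A fake environment and input stream stand in for a real request.
    fake_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "8"}
    print(read_raw_form_data(fake_env, io.BytesIO(b"name=Joe&ignored")))
```

Passing the environment and the input stream as arguments keeps the sketch testable outside a web server.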

The output of a CGI script should consist of two sections, separated by a blank line. The first section contains a number of headers, telling the client what kind of data is following. Python code to generate a minimal header section looks like this:

    print("Content-Type: text/html")    # HTML is following
    print()                             # blank line, end of headers

The second section is usually HTML, which allows the client software to display nicely formatted text with headers, in-line images, etc. Here's Python code that prints a simple piece of HTML:

    print("<TITLE>CGI script output</TITLE>")
    print("<H1>This is my first CGI script</H1>")
    print("Hello, world!")
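The two sections can also be assembled in one place. The following is only a sketch; build_response is a hypothetical helper, not something this module provides.

```python
def build_response(body_html, content_type="text/html"):
    """Return a complete CGI response: headers, a blank line, then the body."""
    headers = [f"Content-Type: {content_type}"]
    # An empty line separates the header section from the body.
    return "\n".join(headers) + "\n\n" + body_html


if __name__ == "__main__":
    print(build_response("<TITLE>CGI script output</TITLE>\n<H1>Hello</H1>"))
```

Building the response as a single string makes it harder to forget the blank line that ends the headers.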

Using the cgi module

Begin by writing import cgi.

When you write a new script, consider adding these lines:

    import cgitb
    cgitb.enable()

This activates a special exception handler that will display detailed reports in the Web browser if any errors occur. If you'd rather not show the guts of your program to users of your script, you can have the reports saved to files instead, with code like this:

    import cgitb
    cgitb.enable(display=0, logdir="/path/to/logdir")

It's very helpful to use this feature during script development. The reports produced by cgitb provide information that can save you a lot of time in tracking down bugs. You can always remove the cgitb line later when you have tested your script and are confident that it works correctly.

To get at submitted form data, use the FieldStorage class. If the form contains non-ASCII characters, use the encoding keyword parameter set to the value of the encoding defined for the document. It is usually contained in the META tag in the HEAD section of the HTML document or in the Content-Type header. FieldStorage reads the form contents from the standard input or the environment (depending on the value of various environment variables set according to the CGI standard). Since it may consume standard input, it should be instantiated only once.

The FieldStorage instance can be indexed like a Python dictionary. It allows membership testing with the in operator, and also supports the standard dictionary method keys() and the built-in function len(). Form fields containing empty strings are ignored and do not appear in the dictionary; to keep such values, provide a true value for the optional keep_blank_values keyword parameter when creating the FieldStorage instance.

For instance, the following code (which assumes that the Content-Type header and blank line have already been printed) checks that the fields name and addr are both set to a non-empty string:

    import cgi

    form = cgi.FieldStorage()
    if "name" not in form or "addr" not in form:
        print("<H1>Error</H1>")
        print("Please fill in the name and addr fields.")
    else:
        print("<p>name:", form["name"].value)
        print("<p>addr:", form["addr"].value)
        # ...further form processing here...

Here the fields, accessed through form[key], are themselves instances of FieldStorage (or MiniFieldStorage, depending on the form encoding). The value attribute of the instance yields the string value of the field. The getvalue() method returns this string value directly; it also accepts an optional second argument as a default to return if the requested key is not present.
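Returning to the keep_blank_values parameter described above: its effect can be tried without a CGI environment by using urllib.parse.parse_qs(), which accepts a parameter of the same name. This is only a sketch of the blank-value behaviour, not of FieldStorage itself, and the query string is made up.

```python
from urllib.parse import parse_qs

query = "name=&addr=At+Home"

# By default, fields with empty values are dropped from the result.
without_blanks = parse_qs(query)

# With keep_blank_values=True, the empty field is kept as an empty string.
with_blanks = parse_qs(query, keep_blank_values=True)

print("name" in without_blanks)
print(with_blanks["name"])
```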

If the submitted form data contains more than one field with the same name, the object retrieved by form[key] is not a FieldStorage or MiniFieldStorage instance but a list of such instances. Similarly, in this situation, form.getvalue(key) would return a list of strings. If you expect this possibility (when your HTML form contains multiple fields with the same name), use the getlist() method, which always returns a list of values (so that you do not need to special-case the single item case). For example, this code concatenates any number of username fields, separated by commas:

    value = form.getlist("username")
    usernames = ",".join(value)

If a field represents an uploaded file, accessing the value via the value attribute or the getvalue() method reads the entire file into memory as bytes. This may not be what you want. You can test for an uploaded file by testing either the filename attribute or the file attribute. You can then read the data from the file attribute before it is automatically closed as part of the garbage collection of the FieldStorage instance (the read() and readline() methods will return bytes):

    fileitem = form["userfile"]
    if fileitem.file:
        # It's an uploaded file; count lines
        linecount = 0
        while True:
            line = fileitem.file.readline()
            if not line:
                break
            linecount = linecount + 1
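The same idea, reading from the file attribute instead of loading everything through value, applies when an upload has to be stored. In the sketch below an io.BytesIO object stands in for fileitem.file, and save_upload is a hypothetical helper name.

```python
import io
import shutil


def save_upload(source, destination, chunk_size=64 * 1024):
    """Copy an uploaded file to destination in fixed-size chunks.

    Both arguments are binary file objects; only about chunk_size bytes
    are held in memory at a time.
    """
    shutil.copyfileobj(source, destination, chunk_size)


if __name__ == "__main__":
    # io.BytesIO stands in for fileitem.file in this sketch.
    upload = io.BytesIO(b"line one\nline two\n")
    saved = io.BytesIO()
    save_upload(upload, saved)
    print(len(saved.getvalue()), "bytes copied")
```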

FieldStorage objects also support being used in a with statement, which will automatically close them when done.

If an error is encountered when obtaining the contents of an uploaded file (for example, when the user interrupts the form submission by clicking on a Back or Cancel button) the done attribute of the object for the field will be set to the value -1.

The file upload draft standard entertains the possibility of uploading multiple files from one field (using a recursive multipart/* encoding). When this occurs, the item will be a dictionary-like FieldStorage item. This can be determined by testing its type attribute, which should be multipart/form-data (or perhaps another MIME type matching multipart/*). In this case, it can be iterated over recursively just like the top-level form object.

When a form is submitted in the "old" format (as the query string or as a single data part of type application/x-www-form-urlencoded), the items will actually be instances of the class MiniFieldStorage. In this case, the list, file, and filename attributes are always None.

A form submitted via POST that also has a query string will contain both FieldStorage and MiniFieldStorage items.

Changed in version 3.4: The file attribute is automatically closed upon the garbage collection of the creating FieldStorage instance.

Changed in version 3.5: Added support for the context management protocol to the FieldStorage class.

Higher Level Interface

The previous section explains how to read CGI form data using the FieldStorage class. This section describes a higher level interface which was added to this class to allow one to do it in a more readable and intuitive way. The interface doesn't make the techniques described in previous sections obsolete; they are still useful to process file uploads efficiently, for example.

The interface consists of two simple methods. Using the methods you can process form data in a generic way, without the need to worry whether only one or more values were posted under one name.

In the previous section, you learned to write the following code anytime you expected a user to post more than one value under one name:

    item = form.getvalue("item")
    if isinstance(item, list):
        # The user is requesting more than one item.
        pass
    else:
        # The user is requesting only one item.
        pass

This situation is common, for example, when a form contains a group of multiple checkboxes with the same name:

    <input type="checkbox" name="item" value="1" />
    <input type="checkbox" name="item" value="2" />

In most situations, however, there's only one form control with a particular name in a form and then you expect and need only one value associated with this name. So you write a script containing for example this code:

    user = form.getvalue("user").upper()

The problem with the code is that you should never expect that a client will provide valid input to your scripts. For example, if a curious user appends another user=foo pair to the query string, then the script would crash, because in this situation the getvalue("user") method call returns a list instead of a string. Calling the upper() method on a list is not valid (since lists do not have a method of this name) and results in an AttributeError exception.

Therefore, the appropriate way to read form data values was to always use the code which checks whether the obtained value is a single value or a list of values. That's annoying and leads to less readable scripts.

A more convenient approach is to use the methods getfirst() and getlist() provided by this higher level interface.

  • FieldStorage.getfirst(name, default=None)
  • This method always returns only one value associated with form field name. The method returns only the first value in case that more values were posted under such name. Please note that the order in which the values are received may vary from browser to browser and should not be counted on. [1] If no such form field or value exists then the method returns the value specified by the optional parameter default. This parameter defaults to None if not specified.
  • FieldStorage.getlist(name)
  • This method always returns a list of values associated with form field name. The method returns an empty list if no such form field or value exists for name. It returns a list consisting of one item if only one such value exists.

Using these methods you can write nice compact code:

    import cgi
    form = cgi.FieldStorage()
    user = form.getfirst("user", "").upper()    # This way it's safe.
    for item in form.getlist("item"):
        do_something(item)
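The semantics of the two methods can be imitated on top of the dictionary returned by urllib.parse.parse_qs(), which maps every field name to a list of values. The helpers get_first and get_list below are hypothetical stand-ins written for illustration; they are not the FieldStorage implementation.

```python
from urllib.parse import parse_qs


def get_list(fields, name):
    """Return every value posted under name, or an empty list."""
    return list(fields.get(name, []))


def get_first(fields, name, default=None):
    """Return the first value posted under name, or default if absent."""
    values = fields.get(name)
    return values[0] if values else default


if __name__ == "__main__":
    # A client appended a second user=foo pair to the query string.
    fields = parse_qs("user=joe&user=foo&item=1&item=2")
    print(get_first(fields, "user", "").upper())
    print(get_list(fields, "item"))
    print(get_list(fields, "missing"))
```

Because get_first always yields a single value or the default, calling upper() on its result cannot fail the way it does on a list.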

Functions

These are useful if you want more control, or if you want to employ some of the algorithms implemented in this module in other circumstances.

  • cgi.parse(fp=None, environ=os.environ, keep_blank_values=False, strict_parsing=False)
  • Parse a query in the environment or from a file (the file defaults to sys.stdin). The keep_blank_values and strict_parsing parameters are passed to urllib.parse.parse_qs() unchanged.
  • cgi.parse_qs(qs, keep_blank_values=False, strict_parsing=False)
  • This function is deprecated in this module. Use urllib.parse.parse_qs() instead. It is maintained here only for backward compatibility.
  • cgi.parse_qsl(qs, keep_blank_values=False, strict_parsing=False)
  • This function is deprecated in this module. Use urllib.parse.parse_qsl() instead. It is maintained here only for backward compatibility.
  • cgi.parse_multipart(fp, pdict, encoding="utf-8", errors="replace")
  • Parse input of type multipart/form-data (for file uploads). Arguments are fp for the input file, pdict for a dictionary containing other parameters in the Content-Type header, and encoding, the request encoding.

Returns a dictionary just like urllib.parse.parse_qs(): keys are the field names, each value is a list of values for that field. For non-file fields, the value is a list of strings.

This is easy to use but not much good if you are expecting megabytes to be uploaded; in that case, use the FieldStorage class instead, which is much more flexible.

Changed in version 3.7: Added the encoding and errors parameters. For non-file fields, the value is now a list of strings, not bytes.

  • cgi.parse_header(string)
  • Parse a MIME header (such as Content-Type) into a main value and a dictionary of parameters.
  • cgi.test()
  • Robust test CGI script, usable as main program. Writes minimal HTTP headers and formats all information provided to the script in HTML form.
  • cgi.print_environ()
  • Format the shell environment in HTML.
  • cgi.print_form(form)
  • Format a form in HTML.
  • cgi.print_directory()
  • Format the current directory in HTML.
  • cgi.print_environ_usage()
  • Print a list of useful (used by CGI) environment variables in HTML.
  • cgi.escape(s, quote=False)
  • Convert the characters '&', '<' and '>' in string s to HTML-safe sequences. Use this if you need to display text that might contain such characters in HTML. If the optional flag quote is true, the quotation mark character (") is also translated; this helps for inclusion in an HTML attribute value delimited by double quotes, as in <a href="…">. Note that single quotes are never translated.

Deprecated since version 3.2: This function is unsafe because quote is false by default, and therefore deprecated. Use html.escape() instead.
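The replacements recommended in the list above can be used directly. This is a brief sketch with made-up sample strings, showing urllib.parse.parse_qs(), urllib.parse.parse_qsl() and html.escape().

```python
import html
from urllib.parse import parse_qs, parse_qsl

query = "name=Joe+Blow&addr=At+Home"

# parse_qs returns a dict mapping each field name to a list of values.
as_dict = parse_qs(query)

# parse_qsl returns a list of (name, value) pairs in the order received.
as_pairs = parse_qsl(query)

# html.escape also translates quotation marks by default (quote=True).
escaped = html.escape('<a href="x">Tom & Jerry</a>')

print(as_dict)
print(as_pairs)
print(escaped)
```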

Caring about security

There's one important rule: if you invoke an external program (via the os.system() or os.popen() functions, or others with similar functionality), make very sure you don't pass arbitrary strings received from the client to the shell. This is a well-known security hole whereby clever hackers anywhere on the Web can exploit a gullible CGI script to invoke arbitrary shell commands. Even parts of the URL or field names cannot be trusted, since the request doesn't have to come from your form!

To be on the safe side, if you must pass a string gotten from a form to a shell command, you should make sure the string contains only alphanumeric characters, dashes, underscores, and periods.
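One way to apply this rule is sketched below. The names SAFE_PATTERN, is_safe_argument and run_with_argument are hypothetical, and using subprocess.run() with an argument list (so that no shell is involved at all) is an additional precaution chosen for this sketch, not something the text above prescribes.

```python
import re
import subprocess

# Only alphanumerics, dashes, underscores, and periods are accepted.
SAFE_PATTERN = re.compile(r"[A-Za-z0-9._-]+")


def is_safe_argument(value):
    """Return True if value contains only the allowed characters."""
    return SAFE_PATTERN.fullmatch(value) is not None


def run_with_argument(program, value):
    """Run program with value as a single argument, without using a shell."""
    if not is_safe_argument(value):
        raise ValueError("unsafe characters in argument")
    # Passing a list (and no shell=True) avoids shell interpretation entirely.
    return subprocess.run([program, value], capture_output=True, text=True, check=False)


if __name__ == "__main__":
    print(is_safe_argument("report-2024_v1.txt"))
    print(is_safe_argument("x; rm -rf /"))
```

A value such as --help still passes this check, so a program that treats leading dashes as options needs extra care.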

Installing your CGI script on a Unix system

Read the documentation for your HTTP server and check with your local system administrator to find the directory where CGI scripts should be installed; usually this is in a directory cgi-bin in the server tree.

Make sure that your script is readable and executable by "others"; the Unix file mode should be 0o755 octal (use chmod 0755 filename). Make sure that the first line of the script contains #! starting in column 1 followed by the pathname of the Python interpreter, for instance:

    #!/usr/local/bin/python

Make sure the Python interpreter exists and is executable by "others".

Make sure that any files your script needs to read or write are readable or writable, respectively, by "others": their mode should be 0o644 for readable and 0o666 for writable. This is because, for security reasons, the HTTP server executes your script as user "nobody", without any special privileges. It can only read (write, execute) files that everybody can read (write, execute). The current directory at execution time is also different (it is usually the server's cgi-bin directory) and the set of environment variables is also different from what you get when you log in. In particular, don't count on the shell's search path for executables (PATH) or the Python module search path (PYTHONPATH) to be set to anything interesting.
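The modes mentioned above can be checked programmatically. The sketch below decodes the permission bits that apply to "others"; other_permissions is a hypothetical helper name.

```python
import stat


def other_permissions(mode):
    """Return (readable, writable, executable) for the "others" class."""
    return (
        bool(mode & stat.S_IROTH),
        bool(mode & stat.S_IWOTH),
        bool(mode & stat.S_IXOTH),
    )


if __name__ == "__main__":
    for mode in (0o755, 0o644, 0o666, 0o600):
        print(oct(mode), other_permissions(mode))
```

For a real file, pass os.stat(path).st_mode as the mode argument.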

If you need to load modules from a directory which is not on Python's default module search path, you can change the path in your script, before importing other modules. For example:

    import sys
    sys.path.insert(0, "/usr/home/joe/lib/python")
    sys.path.insert(0, "/usr/local/lib/python")

(This way, the directory inserted last will be searched first!)

Instructions for non-Unix systems will vary; check your HTTP server's documentation (it will usually have a section on CGI scripts).

Testing your CGI script

Unfortunately, a CGI script will generally not run when you try it from the command line, and a script that works perfectly from the command line may fail mysteriously when run from the server. There's one reason why you should still test your script from the command line: if it contains a syntax error, the Python interpreter won't execute it at all, and the HTTP server will most likely send a cryptic error to the client.
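A syntax check does not require running the script. The following sketch uses the built-in compile() function on the script's source text; find_syntax_error is a hypothetical helper name.

```python
def find_syntax_error(source, filename="<script>"):
    """Return a short description of the first syntax error, or None."""
    try:
        # compile() parses the source without executing it.
        compile(source, filename, "exec")
    except SyntaxError as exc:
        return f"{filename}:{exc.lineno}: {exc.msg}"
    return None


if __name__ == "__main__":
    print(find_syntax_error('print("Content-Type: text/html")\nprint()\n'))
    print(find_syntax_error('print("unterminated\n'))
```

From the command line, python -m py_compile script.py performs a comparable check.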

Assuming your script has no syntax errors, yet it does not work, you have no choice but to read the next section.

Debugging CGI scripts

First of all, check for trivial installation errors: reading the section above on installing your CGI script carefully can save you a lot of time. If you wonder whether you have understood the installation procedure correctly, try installing a copy of this module file (cgi.py) as a CGI script. When invoked as a script, the file will dump its environment and the contents of the form in HTML form. Give it the right mode etc., and send it a request. If it's installed in the standard cgi-bin directory, it should be possible to send it a request by entering a URL into your browser of the form:

    http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home

If this gives an error of type 404, the server cannot find the script; perhaps you need to install it in a different directory. If it gives another error, there's an installation problem that you should fix before trying to go any further. If you get a nicely formatted listing of the environment and form content (in this example, the fields should be listed as "addr" with value "At Home" and "name" with value "Joe Blow"), the cgi.py script has been installed correctly. If you follow the same procedure for your own script, you should now be able to debug it.
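A test URL like the one above can be generated rather than typed by hand. This sketch uses urllib.parse.urlencode(); the host name is the same placeholder as in the example URL.

```python
from urllib.parse import urlencode

# Placeholder host name; replace it with your server's.
base_url = "http://yourhostname/cgi-bin/cgi.py"
fields = {"name": "Joe Blow", "addr": "At Home"}

# urlencode() quotes each value, turning spaces into "+".
test_url = base_url + "?" + urlencode(fields)
print(test_url)
```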

The next step could be to call the cgi module's test() function from your script: replace its main code with the single statement

    cgi.test()

This should produce the same results as those gotten from installing the cgi.py file itself.

When an ordinary Python script raises an unhandled exception (for whatever reason: a typo in a module name, a file that can't be opened, etc.), the Python interpreter prints a nice traceback and exits. While the Python interpreter will still do this when your CGI script raises an exception, most likely the traceback will end up in one of the HTTP server's log files, or be discarded altogether.

Fortunately, once you have managed to get your script to execute some code, you can easily send tracebacks to the Web browser using the cgitb module. If you haven't done so already, just add the lines:

    import cgitb
    cgitb.enable()

to the top of your script. Then try running it again; when a problem occurs, you should see a detailed report that will likely make apparent the cause of the crash.

If you suspect that there may be a problem in importing the cgitb module, you can use an even more robust approach (which only uses built-in modules):

    import sys
    sys.stderr = sys.stdout
    print("Content-Type: text/plain")
    print()
    # ...your code here...

This relies on the Python interpreter to print the traceback. The content type of the output is set to plain text, which disables all HTML processing. If your script works, the raw HTML will be displayed by your client. If it raises an exception, most likely after the first two lines have been printed, a traceback will be displayed. Because no HTML interpretation is going on, the traceback will be readable.
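A variation on this idea, shown here only as a sketch, catches the exception explicitly and writes the traceback into the plain-text response with the traceback module, instead of redirecting sys.stderr. The name run_reporting_errors is hypothetical.

```python
import io
import sys
import traceback


def run_reporting_errors(main, out=sys.stdout):
    """Call main(); on an exception, write the traceback to out as plain text."""
    out.write("Content-Type: text/plain\n\n")
    try:
        main()
    except Exception:
        # Send the traceback to the response body instead of stderr.
        traceback.print_exc(file=out)


if __name__ == "__main__":
    def broken_main():
        raise ValueError("something went wrong")

    buffer = io.StringIO()
    run_reporting_errors(broken_main, out=buffer)
    print(buffer.getvalue())
```

Unlike the stderr redirection above, this only reports exceptions raised inside main(); errors at import time still go to the server's log.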

Common problems and solutions

  • Most HTTP servers buffer the output from CGI scripts until the script is completed. This means that it is not possible to display a progress report on the client's display while the script is running.

  • Check the installation instructions above.

  • Check the HTTP server's log files. (tail -f logfile in a separate window may be useful!)

  • Always check a script for syntax errors first, by doing something like python script.py.

  • If your script does not have any syntax errors, try adding import cgitb; cgitb.enable() to the top of the script.

  • When invoking external programs, make sure they can be found. Usually, this means using absolute path names, since PATH is usually not set to a very useful value in a CGI script.

  • When reading or writing external files, make sure they can be read or written by the userid under which your CGI script will be running: this is typically the userid under which the web server is running, or some explicitly specified userid for a web server's suexec feature.

  • Don't try to give a CGI script a set-uid mode. This doesn't work on most systems, and is a security liability as well.

Footnotes

  • [1] Note that some recent versions of the HTML specification do state what order the field values should be supplied in, but knowing whether a request was received from a conforming browser, or even from a browser at all, is tedious and error-prone.