From RNWiki
Revision as of 20:04, 27 October 2009 by Fkelly

This is where I (fkelly) propose that we write our filtering standards for RavenNuke™ 2.5. This is all draft-level material for now.


Filtering of data is a fundamental aspect of any web-based system. It affects security, performance and the acceptability of the system to users, and it underlies every other facet of the system. In generic *nuke systems (those based on PHPNUKE(tm)), as well as in RavenNuke™ specifically, filtering has traditionally been scattershot or "fractured"; in other words, it has not been based on any set of design principles or standards. The topic has been discussed ad nauseam in forum threads without arriving at any resolution. For RavenNuke™ 2.5 we aim to change that.

The purpose of this document is to lay out a consistent set of standards that all programmers working on RavenNuke™ software will use. The document will reference specific functions and programs that are contained (or will be contained) in RavenNuke™. It is not intended as a general reference or to be used outside of the RavenNuke™ context.

The Flow of things

Within RavenNuke™ (and content management systems generally) the flow is relatively simple. Content for the system is stored in a MySQL database. Data gets into that database through HTML forms. The forms are presented to a user who fills them in. Optionally, JavaScript can validate the form on the client side before it is submitted. Upon submission, the form is processed by another PHP program, which should validate the "posted" data and prepare it for the database. Database insert or update statements are issued, the database is updated, and the user is presented with another form or report.

Within this flow you can see several places where filtering or validation takes place. JavaScript is generally used for client-side validation, and RavenNuke™ is moving toward jQuery-based validation classes as the foundation for it. When the form is submitted, a layer of software sitting between the form and its processing program ensures that the form comes from within the system, thus preventing cross-site request forgery. A PHP program then receives the posted form and validates all of the data. Before any data is sent to the database, a specific set of MySQL-related "steps" must be carried out to prevent SQL injection attacks; basically, this means adding slashes (escape characters) before certain special characters.
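To make the CSRF layer concrete, here is a minimal sketch of the generic "synchronizer token" pattern. It is not RavenNuke™ 2.4's actual mechanism, which may work differently; the function names and the `form_token` field are hypothetical, and the `$session`/`$post` arrays stand in for `$_SESSION` and `$_POST`.

```php
<?php
// Hypothetical sketch of a synchronizer-token check; NOT RavenNuke's actual
// 2.4 CSRF code. $session and $post stand in for $_SESSION and $_POST.

/** Create a random token, store it in the session, and return it for the form. */
function issue_form_token(array &$session): string
{
    $token = bin2hex(random_bytes(32)); // 64 hex characters
    $session['form_token'] = $token;    // hypothetical session key
    return $token;                      // echo this into a hidden form field
}

/** True only if the posted token matches the one stored in the session. */
function form_token_is_valid(array $session, array $post): bool
{
    if (!isset($session['form_token'], $post['form_token'])) {
        return false; // no token was issued, or none was submitted
    }
    if (!is_string($post['form_token'])) {
        return false; // e.g. a forged form sent form_token[]=...
    }
    // hash_equals() compares in constant time to avoid timing leaks
    return hash_equals($session['form_token'], $post['form_token']);
}

$session = [];
$token = issue_form_token($session);
var_dump(form_token_is_valid($session, ['form_token' => $token]));   // bool(true)
var_dump(form_token_is_valid($session, ['form_token' => 'forged'])); // bool(false)
```

The token only proves that the form was issued by our site; it says nothing about whether the field values are valid, which is why the receiving program still has to validate everything.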

Let's start with a Form

Ideally, filtering would be built into the design of a form and would essentially be "declarative". That is, the form designer would specify the type of each input element and/or its validation or edit criteria. If an input element is supposed to be an email address, then a standard email validation routine would be run both on the client side (JavaScript; the jQuery validation library provides such a routine) and on the server side when the form is submitted. If the element is a checkbox, then the only value it can carry after submission is "on" (the browser default) or whatever value you have assigned in its value attribute; an unchecked checkbox is not submitted at all. In the case of text fields, the designer needs to specify whether to allow any HTML and, if so, which tags and attributes. In the case of textareas, the allowablehtml settings will determine which HTML features can be used, and the receiving program will have to run the posted data through standard validation routines.
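As an illustration of what "declarative" could mean in plain PHP, here is a minimal sketch in which each field's rule is declared once in an array and a single generic routine enforces it. The field names, rule keys (`type`, `value`, `options`, `html`) and the `declarative_errors()` function are all hypothetical; nothing like this exists in RavenNuke™ today.

```php
<?php
// Hypothetical sketch of declarative form filtering. The spec format and the
// function name are illustrative only, not existing RavenNuke code.

$profileForm = [
    'email'      => ['type' => 'email'],
    'newsletter' => ['type' => 'checkbox', 'value' => 'on'],
    'state'      => ['type' => 'select', 'options' => ['AL', 'AK', 'AZ']],
    'nickname'   => ['type' => 'text', 'html' => false],
];

/** Returns the names of fields whose posted values break their declared rule. */
function declarative_errors(array $spec, array $post): array
{
    $errors = [];
    foreach ($spec as $name => $rule) {
        $value = $post[$name] ?? null;
        if ($value !== null && !is_string($value)) {
            $errors[] = $name; // an array arrived where a scalar was declared
            continue;
        }
        switch ($rule['type']) {
            case 'checkbox':
                // Unchecked boxes are absent; checked ones must carry the declared value.
                $ok = $value === null || $value === $rule['value'];
                break;
            case 'email':
                $ok = $value !== null && filter_var($value, FILTER_VALIDATE_EMAIL) !== false;
                break;
            case 'select':
                $ok = $value !== null && in_array($value, $rule['options'], true);
                break;
            case 'text':
                // html => true would hand the value to check_html() instead (not shown);
                // html => false rejects any markup characters outright.
                $ok = $value !== null && ($rule['html'] || strpbrk($value, '<>') === false);
                break;
            default:
                $ok = false; // unknown rule type: fail closed
        }
        if (!$ok) {
            $errors[] = $name;
        }
    }
    return $errors;
}

// Lists 'state' (not in the option list) and 'nickname' (contains markup).
print_r(declarative_errors($profileForm, [
    'email'    => 'user@example.com',
    'state'    => 'TX',
    'nickname' => '<b>me</b>',
]));
```

The design point is that the form's rules live in one place, so the same spec could in principle also drive the client-side jQuery rules.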

In the RavenNuke™ context we do not have the libraries, frameworks or capabilities to implement such declarative filtering of forms. In addition, there is a large amount of legacy code that would essentially have to be rewritten to fit such a framework. So such an approach is not practical in the short run. Instead, we need to look at our forms one by one and refit them to implement the standards specified here. One open question (I am not sure about this) is whether we want JavaScript validation of all forms and fields on the client side before the form is ever submitted, or whether we should make it a standard to add it whenever we go in to modify a legacy program. Whatever JavaScript validation we do should be based, as in the current RNYA module, on the jQuery validation library. We should also have a standard for presenting errors to users, including putting messages in a standard location on the form, and for deciding whether to validate field by field as the user moves focus off a field, only upon a submission attempt, or some combination of the two for fields that have dependencies and need to be validated together.

Upon submission

So, the user fills out the form and hits submit. Even though we have taken steps in RavenNuke™ 2.4 to stop cross-site request forgery, we still cannot trust that the submitted form is one from within our system. So even if we have exhaustive JavaScript validation, we can't be sure that the fields are properly validated. A forged form could contain a script command in a checkbox field, just for instance. So what does our receiving PHP program need to do? To eliminate the possibility of PHP warnings and errors, we first need to check whether the posted field is set. So we will have syntax like:

if (isset($_POST['field1'])) {
    // do some validation
}
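One gap in a bare isset() check: PHP turns a posted name like `field1[]` into an array, so a forged form can hand us an array where a string was expected, and string functions then raise warnings of their own. A minimal sketch of a helper that covers both cases follows; `posted_string()` is a hypothetical name, not an existing RavenNuke™ function, and in real use `$_POST` would be passed in.

```php
<?php
// Hypothetical helper (not an existing RavenNuke function): fetch one posted
// field as a string, or null when it is missing or not a plain string.
function posted_string(array $post, string $key): ?string
{
    if (!isset($post[$key])) {
        return null; // field absent
    }
    if (!is_string($post[$key])) {
        return null; // e.g. a forged "field1[]=..." arrives as an array
    }
    return $post[$key];
}

$post = ['field1' => 'hello', 'field2' => ['injected']];
var_dump(posted_string($post, 'field1')); // string(5) "hello"
var_dump(posted_string($post, 'field2')); // NULL
var_dump(posted_string($post, 'field3')); // NULL
```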

Now I have a question. If we know which form we are receiving the POST data from and which fields are on it, shouldn't the absence of one of those fields from the POST data be considered evidence of a security violation (bearing in mind that browsers do not submit unchecked checkboxes, so those can legitimately be absent)? Should we have some way to pass this to NukeSentinel™ to ban the user?
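If we do go this way, the check itself is simple. The sketch below compares the POST data against the fields a form is known to contain, skipping checkboxes because browsers omit unchecked ones. The `missing_fields()` function and the name => type map are hypothetical, and what to do with a non-empty result (log it, or hand it to NukeSentinel™) is left open, as in the question above.

```php
<?php
// Hypothetical sketch: report expected fields that are absent from the POST
// data. Checkboxes are skipped because unchecked boxes are never submitted
// (unselected radio groups behave the same way and could be skipped too).
function missing_fields(array $post, array $expected): array
{
    $missing = [];
    foreach ($expected as $name => $type) {
        if ($type === 'checkbox') {
            continue; // absence proves nothing for a checkbox
        }
        if (!array_key_exists($name, $post)) {
            $missing[] = $name;
        }
    }
    return $missing;
}

$expected = ['username' => 'text', 'email' => 'text', 'remember_me' => 'checkbox'];
print_r(missing_fields(['username' => 'fk'], $expected)); // lists only 'email'
```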

But in any event, assuming the field is present, we then want to filter it as specifically as we can. In other words, if we know the values a field can have, we should check that it has one of them. If the field is a State field, it should have one of the 50 state values. In fact, anything that comes from an option list should have one of those option values. A numeric field that is supposed to be an integer should be an integer, and if the values have to be less than, say, 150, then we should check that they are. By taking this approach for fields whose possible values we know, we eliminate the need to send them through more extensive filtering libraries such as KSES or HTML Purifier. No?
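Both kinds of check can be done with stock PHP: a strict in_array() against the option list, and filter_var() with FILTER_VALIDATE_INT plus its `min_range`/`max_range` options. The sketch below wraps them in two hypothetical helpers (the names are illustrative, not existing RavenNuke™ functions); "less than 150" becomes a maximum of 149.

```php
<?php
// Hypothetical helpers for fields whose legal values are known in advance.

/** Accept only a string that appears in the option list (strict comparison). */
function in_option_list($value, array $options): bool
{
    return is_string($value) && in_array($value, $options, true);
}

/** Return the value as an int within [$min, $max], or null if it is not one. */
function int_in_range($value, int $min, int $max): ?int
{
    $int = filter_var($value, FILTER_VALIDATE_INT, [
        'options' => ['min_range' => $min, 'max_range' => $max],
    ]);
    return $int === false ? null : $int;
}

$states = ['AL', 'AK', 'AZ']; // ...and the remaining state codes
var_dump(in_option_list('AK', $states));       // bool(true)
var_dump(in_option_list('<script>', $states)); // bool(false)
var_dump(int_in_range('42', 0, 149));          // int(42)
var_dump(int_in_range('150', 0, 149));         // NULL
var_dump(int_in_range('12abc', 0, 149));       // NULL
```

A value that passes either check can only be one of a small, known set, which is what makes the heavier HTML filters unnecessary for these fields.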

If we are going to do these validations in a standard way throughout RavenNuke™, shouldn't we have a library of common validation logic that can be called? In fact, shouldn't we insist that it be called instead of doing "one-off" coding?

Likewise, there are some fields where pre-written logic can be used for validation. These include email addresses, phone numbers, ZIP codes, and URLs. Again, we should have a standard approach to these, and the logic should be fully consistent with what we use on the JavaScript side.
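Here is a sketch of what such shared validators might look like. Email and URL lean on PHP's built-in filters; the ZIP and phone patterns are illustrative US-only assumptions, and all four function names are hypothetical. One caveat on consistency: PHP's email filter and the jQuery validation library's email rule are separate implementations and can disagree on edge cases, so exact agreement would require sharing one pattern between the two sides.

```php
<?php
// Hypothetical shared validators; names and the ZIP/phone patterns are
// illustrative assumptions, not existing RavenNuke code.

function valid_email(string $s): bool
{
    return filter_var($s, FILTER_VALIDATE_EMAIL) !== false;
}

function valid_url(string $s): bool
{
    if (filter_var($s, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // FILTER_VALIDATE_URL does not restrict the scheme, so whitelist http(s).
    $scheme = strtolower((string) parse_url($s, PHP_URL_SCHEME));
    return in_array($scheme, ['http', 'https'], true);
}

function valid_zip(string $s): bool
{
    // US ZIP or ZIP+4. The D modifier stops "$" from matching before a trailing newline.
    return preg_match('/^\d{5}(?:-\d{4})?$/D', $s) === 1;
}

function valid_phone(string $s): bool
{
    // US 10-digit number with optional parentheses and "-", "." or " " separators.
    return preg_match('/^\(?\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}$/D', $s) === 1;
}

var_dump(valid_url('javascript:alert(1)')); // bool(false)
var_dump(valid_zip('12345-6789'));          // bool(true)
```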

Finally we come to the more complex case of text fields and textarea input where we want to allow some HTML. In these cases we need to pass the field through a standard library. Currently we do it by passing the field to the check_html function in mainfile. This in turn sees if stripslashes is needed (more on that later) and passes the field to kses.php which in its turn "normalizes" and validates the html in the data and checks for security violations.
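Why route HTML-bearing fields through kses (or HTML Purifier) at all, rather than something built into PHP such as strip_tags()? The short demonstration below shows the problem: strip_tags() removes disallowed tags but keeps every attribute on the tags it allows, so event-handler attributes survive. The sample input is made up for illustration.

```php
<?php
// strip_tags() keeps attributes on allowed tags, so it cannot stand in for
// an HTML filter like kses when some HTML is permitted.
$posted = '<b onmouseover="steal()">bold</b><script>evil()</script>';

$stripped = strip_tags($posted, '<b>');
echo $stripped, "\n"; // prints: <b onmouseover="steal()">bold</b>evil()

// For text fields where no HTML is allowed at all, escaping on output is
// the simpler route: every tag is rendered as harmless text.
echo htmlspecialchars($posted, ENT_QUOTES, 'UTF-8'), "\n";
```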

Note to self: I need to mention that NukeSentinel™ gets a shot at the form data before our receiving program ever sees it, and that NukeSentinel™'s POST filtering logic has to go. :)