Programming blog by Ezekiel Victor

Beware of Lucene DAO when constructing queries

Jun 02, 15

When using Lucene in Java or Scala, you may be tempted to skip the QueryParser and use the “DAO” (for lack of a better term) to construct queries using the classes provided. It is generally a best practice to use DAOs and such abstractions when available over raw query compilation for a variety of reasons, foremost being security (implicit injection protection) and query syntax integrity.

However, you may experience perplexing, incorrect result sets with your Lucene query if the following circumstances are true:

  • Your index is written with an analyzer other than the default StandardAnalyzer (e.g. EnglishAnalyzer or any of the plethora of others).
  • Your query is a boolean query with n OR (aka SHOULD) clauses, where n ≥ 2.
  • Your query requires that a minimum number m of those clauses match, where m ≤ n.

Ordinary query, incorrect results

Here is a simple example of a query that exhibits the latter two circumstances above as built entirely with the DAO (code examples henceforth using Scala for brevity):

This query in plain English means “find documents that contain at least 2 of the terms thanks, obama, and barack.”

Now imagine an index of documents as follows written with EnglishAnalyzer (or some other non-StandardAnalyzer):

Running the query on the above documents should yield 2 hits: documents #1 and #2. However, you will receive 0 results in Lucene 3.5.0 (and possibly other versions; I did not check).

Unfortunately for me, I was stuck with Lucene 3.5.0 in this particular codebase. Luckily I found a way to sidestep the bug by avoiding the DAO for at least part of the query construction.

Same query, but without DAO (and working now!)

Surprise, surprise, this works! Documents #1 and #2 from before will match as expected.

Note on protecting against query injection

If you must use QueryParser.parse as in the case above, you should also make it a habit to use QueryParser.escape (a static method) on the string you pass to the parse method (e.g. myQueryParser.parse(QueryParser.escape("potentially dangerous user input"))). The reasons are beyond the scope of this post; just Google “query injection” and pick one of the endless writings on that.

Fixing autocomplete (autofill) on AngularJS form submit

Many browsers support autocomplete but do not trigger “change” events on inputs when the user initiates autocomplete. This is a problem for many libraries and code in the wild that rely on performing some action (e.g. input validation) when input data change.

With respect to AngularJS forms, the problem becomes obvious if you are using AngularJS form validation. If you autofill an empty form then press Submit, AngularJS will think the inputs are still empty:

AngularJS unaware of autofilled inputs

Here is a simple login form subject to this issue:

Unfortunately, the underlying issue of not having an appropriate event related to autofill must be addressed by browser vendors. In the meantime, however, we can use a custom directive to ensure AngularJS knows about form changes made by autofill at the time of form submit.

Patch directive

In CoffeeScript

In JavaScript
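
A minimal sketch of what such a patch directive could look like. The directive name `autofillSync`, the module name `autofillDemo`, and the helper `syncAutofilledInputs` are hypothetical names chosen for illustration. Binding in the pre-link phase, so that the handler is registered before `ngSubmit`'s own submit handler, is an assumption worth verifying against your AngularJS version. The idea is to re-fire the `input` and `change` handlers on every field when the form is submitted, so that each `ngModel` re-reads its field's current value.

```javascript
// Hypothetical helper: ask every field in the form to re-announce its value.
// `wrap` turns a raw DOM node into a jqLite/jQuery object (angular.element in
// real use); it is a parameter so the helper can be exercised without a browser.
function syncAutofilledInputs(formNode, wrap) {
  var fields = formNode.querySelectorAll('input, textarea, select');
  for (var i = 0; i < fields.length; i++) {
    var field = wrap(fields[i]);
    // Wrap nodes one at a time: jQuery's triggerHandler() only affects the
    // first matched element of a collection.
    field.triggerHandler('input');
    field.triggerHandler('change');
  }
  return fields.length;
}

// Register the directive only when AngularJS is actually loaded.
if (typeof angular !== 'undefined') {
  angular.module('autofillDemo', []).directive('autofillSync', function () {
    return {
      restrict: 'A',
      link: {
        // Assumption: binding in pre-link registers this handler before
        // ngSubmit's, so models are refreshed before the submit expression runs.
        pre: function (scope, element) {
          element.on('submit', function () {
            syncAutofilledInputs(element[0], angular.element);
          });
        }
      }
    };
  });
}
```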

Directive in use

The directive is simple to use; just apply it to your form and away you go:

How to use rich objects and typed objects with Bootstrap Typeahead

The JS components of Twitter Bootstrap are somewhat lacking in functionality when compared to well established plugins that do the same things (autocompletes, tooltips, etc.). However, I still use them on Bootstrap projects just because it is nice to have homogeneous interfaces to things—and, of course, it is easy then to jump on other Bootstrap projects and be in familiar territory right off the bat.

One of the continuing problems I face is with Bootstrap Typeahead, in particular its lack of native support for non-String items. I constantly come across this; rare is it that one can identify an item in a list by that item’s user-friendly string representation, i.e. more often than not you need to be working with a fully typed thing, not just a string.

Other autocompletes and typeaheads support rich objects out of the box but Typeahead needs a little massaging—fortunately not much, and not anything requiring changes to the core. Here is a sample showing a Typeahead that uses a rich UsState class (a natural extension to the Twitter demo using string state names):

With rich object sources, Bootstrap Typeahead gets tripped up mainly in the updater method. The key component here is ensuring your class has a toString() method that serializes instances correctly, as well as a fromString() static method to deserialize. For my purposes I include the popular json2.js for JSON.stringify() and JSON.parse():

Then, in your Typeahead updater:

Note that Typeahead passes the stringified object, which we deserialize, and expects a string return to stuff into the text input.
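
A minimal sketch of the serialize/deserialize pattern described above. The `UsState` fields `name` and `abbreviation` and the function name `stateUpdater` are assumptions made for illustration, and a native `JSON` object is assumed (json2.js supplies one on older browsers). Only the updater is shown; other Typeahead callbacks may need similar treatment depending on your Bootstrap version.

```javascript
// Rich item type. The field names are illustrative assumptions.
function UsState(name, abbreviation) {
  this.name = name;
  this.abbreviation = abbreviation;
}

// Serialize an instance; only its own properties end up in the JSON.
UsState.prototype.toString = function () {
  return JSON.stringify(this);
};

// Static counterpart: rebuild a typed instance from the serialized form.
UsState.fromString = function (serialized) {
  var data = JSON.parse(serialized);
  return new UsState(data.name, data.abbreviation);
};

// Hypothetical updater: receives the stringified item, restores the typed
// object, and returns the plain string to show in the text input.
function stateUpdater(serialized) {
  var state = UsState.fromString(serialized);
  // The typed object is available here, e.g. to remember the selection.
  return state.name;
}
```

A function like `stateUpdater` would be passed as the `updater` option when initializing the Typeahead.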

And because I love CoffeeScript so much, here is the same demo in CS:

PHP error_reporting E_STRICT at runtime

Oftentimes PHP applications configure error_reporting at runtime, perhaps because it appears to be more bootstrappable:

In dev environments, it is often desirable to log errors of level E_STRICT to be warned about practices that are “deprecated or not future proof.” Unfortunately, enabling E_STRICT at runtime using one of the methods above will never work—giving you a false sense of security with respect to strict-level errors! This is because PHP performs strict checks at compile time before your runtime error_reporting configs are set. In other words, this will not work:

To get E_STRICT errors, you must take care to set the value of error_reporting before compile time, such as in php.ini, or httpd.conf or .htaccess if you are running Apache:

Setting error_reporting in php.ini:

Setting error_reporting in httpd.conf or .htaccess:

The manual does mention this topic, though it may be easily overlooked.

Last but not least, always remember to turn display_errors off in production!

display_errors Off in php.ini

display_errors Off in httpd.conf or .htaccess

Decoupling JS and HTML with AngularJS

The primary reason for the influx of interest in JavaScript MVC frameworks is improving separation of concerns between HTML and JS. I’m going to demonstrate why decoupling of HTML and JS is so important and how AngularJS helps solve the problem.

Why decoupling is so important

Traditional DOM selection, traversal, and manipulation is the root of tightly coupled HTML and JS—and the bane of a front-end developer’s existence. It’s not that it is difficult to write or understand, but rather that it is exceedingly difficult to maintain. Even the most seemingly innocuous JS containing node names, CSS classes, or event bindings causes a tightly coupled architecture to quickly emerge. Consider a simple jQuery app that lets the user arbitrarily add items to a list:

I took it a step further than many jQuery apps in the wild by introducing a rudimentary template for the list item rather than compositing the item entirely in JS. However, the app still naturally suffers from some serious coupling. Here are a few realistic scenarios the app might face in the future:

  1. Multiple vehicle entry methods needed.
    • Will need to change element selection by ID to selection by class.
    • Will need to adjust DOM selection in JS.
    • What is conceptually a change only to HTML clearly will also require changes to JS.
  2. List item template changes.
    • Will need to change template DOM selection and manipulation in JS.
    • Same as above; conceptually HTML-only, actually not.
  3. Multiple displays of garage needed.
    • Similar problems as #1 and #2.

The code changes required to implement these features are significant and involve disparate contexts and thus are costly and prone to error. Here is a simple diagram to visualize the coupling:

Separation of concerns

Surely there must be a better way. Consider how the diagram above might change if we introduce MVC-inspired separation of concerns:

Enter AngularJS

Here’s the same app using basic AngularJS instead of jQuery. (If you’re keen on UX, you’ll notice I left out a few usability features from the previous example. We’ll add those back momentarily using more advanced AngularJS techniques.)

The DOM selection, traversal, and manipulation disappear! What you see is effectively the diagram of the three circles—the vehicle data is now abstract and the DOM and JS coupling is virtually eliminated. The three “what ifs” mentioned before which are conceptually HTML-only changes now actually are HTML-only changes since the underlying data model doesn’t change (and the JS is only referencing the abstract data model).
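
To make the point concrete, here is a rough sketch of what the JS side of such an app might look like. The names `GarageController`, `garageDemo`, `vehicles`, `newVehicle`, and `addVehicle` are hypothetical, not taken from the original listing. Note that nothing in it selects or manipulates DOM nodes; the view would bind to the model with markup along the lines of `ng-repeat="vehicle in vehicles"` and `ng-model="newVehicle.name"`.

```javascript
// Hypothetical controller: it only reads and writes the data model on $scope.
function GarageController($scope) {
  $scope.vehicles = [];
  $scope.newVehicle = { type: 'car', name: '' };

  // Add the pending vehicle to the garage; refuse when the name is empty.
  $scope.addVehicle = function () {
    if (!$scope.newVehicle.name) {
      return false;
    }
    $scope.vehicles.push({
      type: $scope.newVehicle.type,
      name: $scope.newVehicle.name
    });
    $scope.newVehicle.name = ''; // reset the input model for the next entry
    return true;
  };
}

// Register with AngularJS only when the framework is loaded.
if (typeof angular !== 'undefined') {
  angular.module('garageDemo', [])
    .controller('GarageController', ['$scope', GarageController]);
}
```

Because the controller is just a function over a scope object, it can be exercised with a plain object in place of `$scope`, with no HTML involved.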

If I add only a little bit of HTML on the view layer, I get cool new features that operate on the same data set—no changes to JS required (conceptual view-only change now a reality):

Usability sugar as promised

So you may have noticed a few UX niceties in the jQuery version of the app that we lost later on:

  • Vehicle name input focused after changing vehicle type.
  • Vehicle name input focused after clicking submit button with empty name and getting “name required” error.
  • Vehicle name input focused after clicking submit button and successfully adding a vehicle.

Doing this in AngularJS is to some extent another topic. It still falls along the lines of separation of concerns, but it is not exactly MVC; it is very specific to the problem domain of the DOM. The way AngularJS solves this particular problem is clever, unique among JS frameworks, and extremely robust: the solution lies in what are called directives, “a way to teach HTML new tricks.” What I am doing here is essentially inventing a couple of new HTML element attributes that abstract the concept of focusing an input on various cues:

Note the new directives focusAfterChange and focusIf defined in the JS along with associated usage in HTML as <input ... focus-after-change="..." focus-if="..." />, which elegantly encapsulates the behavior and still manages to abstract away DOM selection, etc.
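
As an illustration, a directive like `focusIf` could be sketched roughly as follows. This is one possible implementation under stated assumptions, not necessarily the one used in the demo: it assumes the attribute holds a scope expression, and the names `focusIfLink` and `focusDemo` are hypothetical.

```javascript
// Link function for a focus-if="expression" attribute: focus the element
// whenever the watched expression changes to a truthy value.
function focusIfLink(scope, element, attrs) {
  scope.$watch(attrs.focusIf, function (shouldFocus) {
    if (shouldFocus) {
      element[0].focus(); // element[0] is the raw DOM node
    }
  });
}

// Register with AngularJS only when the framework is loaded.
if (typeof angular !== 'undefined') {
  angular.module('focusDemo', []).directive('focusIf', function () {
    return { restrict: 'A', link: focusIfLink };
  });
}
```

Since a watch listener only fires when the watched value changes, the flag behind the expression would have to be reset before it can trigger focus again.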

Comments or questions are always welcome!
