A &bittersweet& Lesson On Copyright


Recently I upgraded some older Rails applications to Rails 3.1 and Ruby 1.9.2 (from 2.3 and 1.8.7, respectively). One post-upgrade issue was that text content was full of garbage such as â€“, â€™, and â€œ where characters like –, ’, and “ should have been. Here’s an actual example from a comment in one of the applications:

One of my â€œthings to do before Iâ€™m 50â€ is

This should read:

One of my “things to do before I’m 50” is

It turns out these are just special characters (curly quotes, dashes, and the like) that were improperly encoded for UTF-8. The fix is simple enough: loop through your content and replace the garbled sequences where needed.
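As a minimal sketch of that replace step, the snippet below maps a few garbled sequences back to the characters they stand for. The `MOJIBAKE` table and the `fix_mojibake` helper are names made up for illustration, and the table assumes the garbage came from UTF-8 bytes being misread as Windows-1252; your data may need other entries.

```ruby
# encoding: utf-8

# Assumed mapping: each key is what a UTF-8 character turns into when its
# bytes are misread as Windows-1252; each value is the intended character.
MOJIBAKE = {
  "\u00E2\u20AC\u201C" => "\u2013", # garbled en dash
  "\u00E2\u20AC\u2122" => "\u2019", # garbled right single quote (apostrophe)
  "\u00E2\u20AC\u0153" => "\u201C"  # garbled left double quote
}.freeze

# Return a copy of +text+ with every known garbled sequence replaced.
def fix_mojibake(text)
  MOJIBAKE.reduce(text) { |fixed, (bad, good)| fixed.gsub(bad, good) }
end

puts fix_mojibake("I\u00E2\u20AC\u2122m")
```

Writing the sequences as `\u` escapes keeps the source file plain ASCII, which avoids reintroducing the same encoding problem in the script itself.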

If your database is big, this could take a long time unless you disable callbacks. The script below shows both how to replace the characters using Ruby and how to disable your Rails callbacks, so that the script runs in seconds instead of hours (depending on the complexity of your callbacks).
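Here is one way the callback-free update could look; treat it as a sketch under assumptions rather than the exact script. It relies on `update_column`, which Rails 3.1 added to ActiveRecord and which writes a single attribute without running validations or callbacks. The `clean_attribute` helper and the `FakeComment` stand-in are hypothetical names; the stand-in exists only so the snippet runs outside a Rails app.

```ruby
# Hypothetical helper: cleans one attribute on each record and writes it
# back with update_column, which skips validations and callbacks.
# The block receives the current text and returns the cleaned text.
def clean_attribute(records, attribute)
  records.each do |record|
    original = record.send(attribute)
    next if original.nil?

    cleaned = yield(original)
    # Only touch the database when something actually changed.
    record.update_column(attribute, cleaned) unless cleaned == original
  end
end

# Stand-in for an ActiveRecord model so the sketch runs without Rails.
FakeComment = Struct.new(:body) do
  def update_column(name, value)
    self[name] = value
  end
end

comments = [FakeComment.new("a--b"), FakeComment.new(nil)]
clean_attribute(comments, :body) { |text| text.gsub("--", "-") }
puts comments.first.body
```

In a real application you would pass your own records (for example the rows of a `Comment` model, assuming that is what the table is called) and do the character replacement inside the block. Note that `update_column` also leaves `updated_at` untouched.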

Note that I used a regular expression for the curly close quote. Its garbled form contains an invisible control character that is not easily copied and pasted into your code, so matching it with [[:cntrl:]] is just an easier way to catch it.
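A small sketch of that pattern follows. It assumes the invisible character is U+009D, which is what the stray byte 0x9D typically becomes when UTF-8 is misread as Windows-1252; `CLOSE_QUOTE` and `fix_close_quote` are names made up for this example.

```ruby
# encoding: utf-8

# "a-circumflex, euro sign" followed by any control character is treated
# as a garbled right double quote. [[:cntrl:]] saves us from having to
# paste the invisible character into the source.
CLOSE_QUOTE = /\u00E2\u20AC[[:cntrl:]]/

# Return a copy of +text+ with garbled close quotes replaced.
def fix_close_quote(text)
  text.gsub(CLOSE_QUOTE, "\u201D")
end

puts fix_close_quote("50\u00E2\u20AC\u009D is")
```

Because the pattern accepts any control character after the first two, it is slightly broader than an exact match; that seems a reasonable trade for not having to handle the invisible character directly.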




