[jruby] "invalid byte sequence in UTF-8" from shell command with binary output [JRuby-1.7.20.1/JRuby-9000]

Lenny Marks lenny at aps.org
Wed Jul 29 11:54:21 JST 2015


We recently updated some deployments from Java 6 to Java 7 and suddenly started seeing "invalid byte sequence in UTF-8" errors from some code we have that shells out to pdftk. I did some digging and discovered that the behavior changed between Java 6 and 7 because ProcessManager is only used on Java > 1.6.

https://github.com/jruby/jruby/blob/1_7_20/core/src/main/ruby/jruby/kernel.rb#L17

The process_manager.rb code runs a gsub on the process output, which can raise when the process outputs binary data (e.g. a PDF). This affects both JRuby 1.7.20 and master.

https://github.com/jruby/jruby/blob/9adaca4eec9d66bdfddd45aa2631c8953c0c1e3f/core/src/main/ruby/jruby/kernel/jruby/process_manager.rb#L48
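To illustrate the failure mode, here is a minimal sketch of a regex-based gsub applied to a string that is tagged UTF-8 but holds invalid bytes. The helper names and the line-ending pattern are assumptions made up for this example; the actual substitution in process_manager.rb may differ. The second helper shows one possible binary-safe variant, not a proposed patch.

```ruby
# Hypothetical stand-in for the substitution done on process output.
# The real pattern in process_manager.rb may be different.
def normalize_newlines(output)
  output.gsub(/\r\n/, "\n")
end

# Possible binary-safe variant (an assumption, not JRuby's code):
# treat the bytes as ASCII-8BIT before matching so no UTF-8 validation occurs.
def normalize_newlines_binary(output)
  output.dup.force_encoding(Encoding::BINARY).gsub(/\r\n/, "\n")
end

# One invalid UTF-8 byte followed by CRLF, explicitly tagged as UTF-8.
raw = "\x92\r\n".dup.force_encoding(Encoding::UTF_8)

begin
  normalize_newlines(raw)
rescue ArgumentError => e
  puts "raised: #{e.message}"
end

puts normalize_newlines_binary(raw).bytes.inspect
```

The point is only that the regex match validates the string's encoding, so any substitution on raw process output has to either skip that step or work on the bytes as binary.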

This seems incorrect to me and differs from MRI behavior (see below). I wouldn’t expect JRuby to munge process output. Should I file a bug?

MRI:

1.9.3-p547 :001 > File.open('foo', 'w') { |f| f.write("\x92") }
 => 1 
1.9.3-p547 :002 > `cat foo`
 => "\x92" 

JRuby:

jruby-1.7.20.1 :010 > File.open('foo', 'w') { |f| f.write("\x92") }
 => 1 
jruby-1.7.20.1 :011 > `cat foo`
ArgumentError: invalid byte sequence in UTF-8
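
A possible caller-side way to read command output as raw bytes is IO.popen in binary mode instead of backticks. This is only a sketch: the helper name read_command_bytes is made up for the example, and whether this path avoids the substitution on the affected JRuby versions is an assumption I have not verified.

```ruby
require "tempfile"

# Hypothetical helper: run a command (argv form, no shell) and return
# its stdout as a binary (ASCII-8BIT) string.
def read_command_bytes(*argv)
  IO.popen(argv, "rb") { |io| io.read }
end

# Write the same single byte as in the transcripts above, then read it
# back through `cat`.
Tempfile.create("foo") do |f|
  f.binmode
  f.write("\x92".b)
  f.flush
  out = read_command_bytes("cat", f.path)
  puts out.bytes.inspect
end
```

Because the result is tagged as binary, later string operations on it do not attempt UTF-8 validation.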

-lenny

