[jruby] "invalid byte sequence in UTF-8" from shell command with binary output [JRuby-1.7.20.1/JRuby-9000]
Lenny Marks
lenny at aps.org
Wed Jul 29 11:54:21 JST 2015
We recently updated some deployments from Java 6 to Java 7 and suddenly started seeing "invalid byte sequence in UTF-8" errors from some of our code that shells out to pdftk. After some digging, I found that the behavior changed between Java 6 and Java 7 because ProcessManager is only used on Java > 1.6:
https://github.com/jruby/jruby/blob/1_7_20/core/src/main/ruby/jruby/kernel.rb#L17
The process_manager.rb code runs a gsub on the process output, which raises an error when the process writes binary data (e.g. a PDF). This affects both JRuby 1.7.20 and master:
https://github.com/jruby/jruby/blob/9adaca4eec9d66bdfddd45aa2631c8953c0c1e3f/core/src/main/ruby/jruby/kernel/jruby/process_manager.rb#L48
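To illustrate the failure mode, here is a minimal Ruby sketch. It assumes the post-processing is a regex gsub over output tagged as UTF-8; the /\r\n/ pattern and both helper names (normalize_newlines, normalize_newlines_binary_safe) are hypothetical and are not copied from process_manager.rb. The second helper shows one possible binary-safe alternative, not JRuby's actual fix.

```ruby
# Hypothetical post-processing step (assumption, not JRuby's code):
# normalize CRLF to LF with a regex gsub. Regex matching raises ArgumentError
# when the receiver is tagged UTF-8 but contains invalid bytes.
def normalize_newlines(output)
  output.gsub(/\r\n/, "\n")
end

# Hypothetical binary-safe variant: do the substitution on a byte-level
# (ASCII-8BIT) copy, where every byte sequence is valid, then restore the
# original encoding tag so callers still get the encoding they expect.
def normalize_newlines_binary_safe(output)
  original_encoding = output.encoding
  bytes = output.dup.force_encoding(Encoding::BINARY)
  bytes.gsub(/\r\n/, "\n").force_encoding(original_encoding)
end

# Simulated command output: byte 0x92 (not valid UTF-8) followed by CRLF,
# tagged as UTF-8 as it would be under a UTF-8 default external encoding.
raw = [0x92, 0x0d, 0x0a].pack("C*").force_encoding(Encoding::UTF_8)

begin
  normalize_newlines(raw)
rescue ArgumentError => e
  puts "plain gsub: #{e.message}"
end

fixed = normalize_newlines_binary_safe(raw)
puts "binary-safe gsub: #{fixed.bytes.inspect} (#{fixed.encoding})"
```

With the binary-safe variant, the 0x92 byte passes through unchanged instead of raising, which is closer to MRI's behavior of handing the command output back as-is.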
This seems incorrect to me, and it differs from MRI behavior (see below). I wouldn’t expect JRuby to munge process output. Should I file a bug?
MRI:
1.9.3-p547 :001 > File.open('foo', 'w') { |f| f.write("\x92") }
=> 1
1.9.3-p547 :002 > `cat foo`
=> "\x92"
JRuby:
jruby-1.7.20.1 :010 > File.open('foo', 'w') { |f| f.write("\x92") }
=> 1
jruby-1.7.20.1 :011 > `cat foo`
ArgumentError: invalid byte sequence in UTF-8
-lenny