'%include ***.sas /source2' crashes the session when the included file has a different encoding from the session encoding #266

Closed
kjnh10 opened this issue Oct 16, 2019 · 19 comments

Comments

@kjnh10
Contributor

kjnh10 commented Oct 16, 2019

Describe the bug
'%include ***.sas /source2' crashes the session when the included file's encoding differs from the session encoding.

reproduce.py

import saspy
sess = saspy.SASsession(
    cfgname="winiomlinux",
    omruser='****',
    omrpw='****',
    )

print(sess)
# ....
# SAS Version           = 9.04.01M6P11072018
# SASPy Version         = 3.1.6
# Teach me SAS          = False
# Batch                 = False
# Results               = Pandas
# SAS Session Encoding  = shift-jis
# Python Encoding value = shift_jis

sess.submit('%include "/****/utf8.sas" /SOURCE2;')
# -> this crashes the session with the message:
# Connection Reset: SAS process has terminated unexpectedly. Pid State= 1

utf8.sas

/* comment only file. saved with utf8. even does not include special characters */

If I change the encoding of the included file to shift-jis, this code works.
Note: SAS Enterprise Guide can handle '%include "/****/utf8.sas" /SOURCE2;' even though the session encoding is shift-jis (same SAS server).

To Reproduce
See above

Expected behavior
The %include runs without crashing the session.

Screenshots
None

Desktop (please complete the following information):

  • OS: [jupyter on windows, sas server on linux]
  • SAS Version: 9.04.01M6P11072018
  • SASPy Version: 3.1.6
@tomweber-sas
Contributor

so, that file contains only the following line?

/* comment only file. saved with utf8. even does not include special characters */

And how are you 'changing the encoding of the file'?

I'd like to try to reproduce this, but there's nothing in those characters that would be anything other than their single-byte ASCII values in latin1, wlatin1, utf-8, or shift-jis, so I'm confused about how you say the file is encoded, and how you change the encoding.

I have seen that the IOM client will crash when it gets 0x00 bytes streamed to it from SAS for things it expects to come through as strings. I'm disappointed by that, and I have changed my use of it in some cases to do my own binary streaming where I can. I don't know if this is what is happening or not; that's why I'd like to reproduce it here. I should be able to start up an IOM server running shift-jis, but I'm not really sure about this file you're including.
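A minimal sketch of that point, assuming Python's built-in utf-8 and shift_jis codecs: the ASCII-only comment line encodes to identical bytes in both, while Japanese text does not. The Japanese sample string is only an illustration, not the content of the uploaded files.

```python
# The comment line from utf8.sas: plain ASCII only.
ascii_line = "/* comment only file. saved with utf8. even does not include special characters */"
# Illustrative Japanese text (an assumption; not taken from the uploaded files).
japanese = "データ"

# Compare the byte sequences produced by the two encodings.
same_ascii = ascii_line.encode("utf-8") == ascii_line.encode("shift_jis")
same_japanese = japanese.encode("utf-8") == japanese.encode("shift_jis")

print(same_ascii)     # whether the ASCII line has the same bytes in both encodings
print(same_japanese)  # whether the Japanese text has the same bytes in both encodings
print(japanese.encode("utf-8").hex(" "))      # bytes as saved by a utf-8 editor
print(japanese.encode("shift_jis").hex(" "))  # bytes as saved by a shift-jis editor
```

So a file that really contains only that comment line should look the same to SAS whichever of these encodings the editor claims to have used.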

Thanks,
Tom

@tomweber-sas
Contributor

So, I think I'm running as you are, though I can't be sure about that. I am running SAS in shift-jis, and I created a file with that line in it. Here's what I get:

image

Are you able to view your file in binary (hex), so we can see what it really contains?
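If no hex viewer is at hand, a small Python helper along these lines can show the raw bytes. This is only a sketch; the function name `hexdump` and the file name in the usage comment are assumptions for illustration.

```python
from pathlib import Path


def hexdump(data: bytes, width: int = 16) -> str:
    """Return an offset / hex / printable-ASCII dump of data, width bytes per row."""
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        # Show printable ASCII as-is and everything else as '.'.
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{off:08x}  {hexpart:<{width * 3}} {text}")
    return "\n".join(lines)


# Hypothetical usage on the file in question (path is a placeholder):
#   print(hexdump(Path("utf8.sas").read_bytes()))
print(hexdump(b"/* a */"))
```

Bytes of 0x80 and above in the dump are the ones whose meaning depends on the file's encoding.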

image

Thanks!
Tom

@kjnh10
Contributor Author

kjnh10 commented Oct 18, 2019

Hi @tomweber-sas,

I misunderstood what led to the error.
The important point is whether the file includes Japanese characters or not.

I'm really sorry.

I will upload sample files somewhere.

kjnh10 added a commit to kjnh10/saspy that referenced this issue Oct 18, 2019
Both files include Japanese characters

utf8.sas -> fail
sjis.sas  -> work
@kjnh10
Contributor Author

kjnh10 commented Oct 18, 2019

Hi, I uploaded 2 files.

kjnh10@6d0e1db

Also note that I set the encoding of each file using VS Code.

@kjnh10
Contributor Author

kjnh10 commented Oct 18, 2019

image

@tomweber-sas
Contributor

Ok, that certainly makes more sense. I'll see if I can reproduce this on my side. I expect I will be able to, and it may even be the issue I mentioned with the IOM client, but I'll need to prove that either way and see what I can do; that's just speculation on my part so far. Thanks for the updated data!
Tom

@tomweber-sas
Contributor

Just a quick update. I've been busy on these other issues. But I was just able to reproduce this here, so I'll be debugging it tomorrow to see what I see. Hopefully there's a way for me to fix this in some code of mine somehow; that would be ideal :). More tomorrow ...
Tom

@tomweber-sas
Contributor

@kjnh10 I have a fix for this! It's at master. It's in the saspyiom.jar file, so if you can pull from master, just be sure your classpath is pointing to this new jar. I was able to catch the transcoding error down in the orb stuff and recover from it. I can't get the log itself, as that failed, but I get an error message and am able to continue working with saspy with no problems after that. See how it works for you.

Here's a view of it on my system. I was getting the same connection reset, SAS terminated error when submitting it before the fix.

issu266

@kjnh10
Contributor Author

kjnh10 commented Oct 24, 2019

@tomweber-sas ,

Thanks, I confirmed that the session no longer crashes when including the utf-8 file!
This is huge progress for me, since a crash loses all my work.

But I still can't include the file because of the transcoding error, even though SAS Enterprise Guide can handle it.

Do you have any thoughts on why SAS Enterprise Guide can handle a utf-8 file?

@tomweber-sas
Contributor

Great, thanks for confirming that.
As for EG, I tried running this in EG too, but when I submit the utf8 one, I get no output or anything, and it doesn't crash :) But I don't see the lines of code that follow it run either. It produces nothing.
When I submit the shift-jis one I get the log with the error and some other notes, like in saspy. And the rest of the code runs too.

What are you actually getting when submitting the utf8 file?

shift-jis:
issu266

utf-8:
issu266
issu266

@kjnh10
Contributor Author

kjnh10 commented Oct 28, 2019

Hi @tomweber-sas ,

Thanks for confirming.

In my environment, the behavior is the same as in yours.
But the lines of code in utf8.sas (like a libname statement) do run even though there is no output. That is what I meant earlier by 'SAS EG can handle including a utf8 file'.

I'm getting the feeling that saspy has the same behavior, except that it reports the errors from reading the log.
Is SAS EG just ignoring the log errors?

@kjnh10
Contributor Author

kjnh10 commented Oct 28, 2019

Sorry,
My comment may have been wrong.
I am rechecking the behavior.

Please wait a while.

@tomweber-sas
Contributor

Actually, I believe I'm seeing what you said. I added a couple data steps so I could see if they were executed or not. They were. On the SAS server, it's not causing any real problem; the other things being submitted still run. But the log, with whatever that was, never shows up as it's getting transcode errors in the IOM client side. At least that's what I'm seeing in saspy. Can only assume it's the same issue for EG.

Running the following only shows the first data step in the log, though both tables a, b were created. Nothing else shows up in the log.

EG:

data a;x=1;run;
libname sashelp list;
%include 'C:\Users\sastpw\Documents\issue266u.sas' /source2;
libname work list;
data b;x=2;run;

Log:

1                                                           SAS システム                          2019年10月28日 月曜日 07時51分49秒

1          ;*';*";*/;quit;run;
2          OPTIONS PAGENO=MIN;
3          %LET _CLIENTTASKLABEL='Program (2)';
4          %LET _CLIENTPROCESSFLOWNAME='Process Flow';
5          %LET _CLIENTPROJECTPATH='C:\Users\sastpw\Documents\EG1.egp';
6          %LET _CLIENTPROJECTPATHHOST='d10a626';
7          %LET _CLIENTPROJECTNAME='EG1.egp';
8          %LET _SASPROGRAMFILE='';
9          %LET _SASPROGRAMFILEHOST='';
10         
11         ODS _ALL_ CLOSE;
12         OPTIONS DEV=PNG;
13         GOPTIONS XPIXELS=0 YPIXELS=0;
14         FILENAME EGSR TEMP;
15         ODS tagsets.sasreport13(ID=EGSR) FILE=EGSR
16             STYLE=HtmlBlue
17             STYLESHEET=(URL="file:///C:/Program%20Files/SASHome/SASEnterpriseGuide/7.1/Styles/HtmlBlue.css")
18             NOGTITLE
19             NOGFOOTNOTE
20             GPATH=&sasworklocation
21             ENCODING=UTF8
22             options(rolap="on")
23         ;
NOTE: TAGSETS.SASREPORT13(EGSR) Bodyファイルの書き込み先: EGSR
24         
25         GOPTIONS ACCESSIBLE;
26         data a;x=1;run;

NOTE: データセットWORK.Aは1オブザベーション、1変数です。
NOTE: DATAステートメント処理(合計処理時間):
      処理時間           0.00 秒
      CPU時間            0.01 秒
      

32         
33         GOPTIONS NOACCESSIBLE;
34         %LET _CLIENTTASKLABEL=;
35         %LET _CLIENTPROCESSFLOWNAME=;
36         %LET _CLIENTPROJECTPATH=;
37         %LET _CLIENTPROJECTPATHHOST=;
38         %LET _CLIENTPROJECTNAME=;
39         %LET _SASPROGRAMFILE=;
40         %LET _SASPROGRAMFILEHOST=;
41         
6                                                           SAS システム                          2019年10月28日 月曜日 07時51分49秒

42         ;*';*";*/;quit;run;
43         ODS _ALL_ CLOSE;
44         
45         
46         QUIT; RUN;
47         

And, with saspy, the same:

sas.list_tables('work')

[]

print(sas.submit("""
data a;x=1;run;
libname sashelp list;
%include 'C:\\Users\\sastpw\\Documents\\issue266u.sas' /source2;
libname work list;
data b;x=2;run;
""")['LOG'])

We failed in reading the Log
U_SHIFT_JIS_CE から U_UTF8_CE エンコーディングへのデータのトランスコードに失敗しました。 SAS セッションのエンコーディングでサポートされない文字が含まれています。SAS システムオプションの encoding= と locale= を調べて、処理データの受け入れが可能かどうかを確認してください。16 進表現によるソース文字列:12                                                          SAS システム                          2019年10月28日 月曜日 08時11分58秒

sas.list_tables('work')

[('A', 'DATA'), ('B', 'DATA')]

So, I think the behavior is the same, except I return the error about the log being messed up while EG just ignores it and you don't notice part of the log is missing. As the transcoding is taking place in the IOM client code, I don't have the buffer with the log in it to even try to return or the option to replace non-translatable characters with the replacement char or anything else. I just don't get that part of the log.

Thanks,
Tom

@tomweber-sas
Contributor

So, I've been playing with this a little more, and what is different is that in the EG log, you actually see the first data step (along with the other boilerplate generated code), but none of the rest of the log past the error.

I've been able to get saspy to return most of the log before and after the error, though I've done this by reducing the size of the buffer I use to read the log. I was reading the whole log in one read, which is why you got nothing except for the error. I've shrunk the buffer down to 32 chars, smaller than I would really want to go but just to see, and I get all of the log except the 32 chars' worth that contained the bytes that couldn't be transcoded.

So, I need to pick a buffer size small enough not to lose a significant amount of information in this particular situation, and large enough not to be a performance issue for every other usual case where this never happens.
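The tradeoff can be sketched in Python. This is an illustration only, not the actual fix (which lives in the Java code in saspyiom.jar and is not shown in this thread); the function name `read_log`, the marker text, and the use of a single-byte ASCII decode with a stray 0xFF byte are all simplifying assumptions.

```python
FAILED = "[log chunk unreadable]"


def read_log(raw: bytes, bufsz: int, encoding: str = "ascii") -> str:
    """Decode raw in bufsz-sized chunks; a chunk that fails to decode
    is replaced by a marker instead of failing the whole read."""
    out = []
    for i in range(0, len(raw), bufsz):
        chunk = raw[i:i + bufsz]
        try:
            out.append(chunk.decode(encoding))
        except UnicodeDecodeError:
            out.append(FAILED)
    return "".join(out)


# A fake 201-byte "log" with one byte in the middle that cannot be decoded.
raw = b"A" * 100 + b"\xff" + b"B" * 100

whole = read_log(raw, 4096)  # one big read: the single bad byte costs the entire log
small = read_log(raw, 32)    # 32-byte reads: only the chunk holding the bad byte is lost

print(whole)
print(small)
```

With the large buffer the result is just the marker; with the 32-byte buffer everything outside the one failing chunk survives, at the cost of seven reads instead of one.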

Here's the parts of the log where the missing section is, with the 32 byte buffer; everything before and after is complete in both cases:

No error; shift-jis data

      物理名 = C:\Program Files\SASHome\SASFoundation\9.4\whouse\sashelp
      ファイル名= C:\Program Files\SASHome\SASFoundation\9.4\whouse\sashelp
      所有者名= BUILTIN\Administrators
      ファイルサイズ=              8KB
      ファイルサイズ (バイト)= 8192
104        %include 'C:\Users\sastpw\Documents\issue266j.sas' /source2;
NOTE: %INCLUDE (レベル1)ファイルC:\Users\sastpw\Documents\issue266j.sasはファイルC:\Users\sastpw\Documents\issue266j.sasです。
105       +NOTE: HTML5(SASPY_INTERNAL) Bodyファイルの書き込み先: _TOMODS1
                 _____
                 180
ERROR 180-322: ステートメントが有効でないか、適切な順序で使用されていません。

NOTE: %INCLUDE (レベル1)を終了します。
106        libname work list;

107        data b;x=2;run;

NOTE: データセットWORK.Bは1オブザベーション、1変数です。
NOTE: DATAステートメント処理(合計処理時間):
      処理時間           0.00 秒
      CPU時間            0.00 秒

error; utf8 data

      物理名 = C:\Program Files\SASHome\SASFoundation\9.4\whouse\sashelp
      ファイル名= C:\Program Files\SASHome\SASFoundation\9.4\whouse\sashelp
      所有者名= BUILTIN\Administrators
      ファイルサイズ=              8KB
      ファイルサイズ (バイト)= 8192
88         %include 'C:\Users\sastpw\Documents\issue266u.sas' /source2;
NOTE: %INCLUDE (レベル1)ファイルC:\Users\sastpw\Documents\issue266u.sasはファイルC:\Users\sastpw\Documents\issue266u.sasです。
89        +NOTE: HTML5(SASPY_INTERNAL) Body繝輔ぃ繧、We failed in reading the Log
U_SHIFT_JIS_CE から U_UTF8_CE エンコーディングへのデータのトランスコードに失敗しました。 SAS セッションのエンコーディングでサポートされない文字が含まれています。SAS システムオプションの encoding= と locale= を調べて、処理データの受け入れが可能かどうかを確認してください。16 進表現によるソース文字列:                 _____
                 180
ERROR 180-322: ステートメントが有効でないか、適切な順序で使用されていません。

NOTE: %INCLUDE (レベル1)を終了します。
90         libname work list;

91         data b;x=2;run;

NOTE: データセットWORK.Bは1オブザベーション、1変数です。
NOTE: DATAステートメント処理(合計処理時間):
      処理時間           0.00 秒
      CPU時間            0.01 秒

What are your thoughts?
Tom

@kjnh10
Contributor Author

kjnh10 commented Oct 30, 2019

@tomweber-sas
This looks very good to me and I want this to be merged!!

Thanks,

@tomweber-sas
Contributor

Cool. I've pushed it to master. This changed saspyiom.jar, so be sure when you pull that your classpath for this jar is pointing to the new one from master. If you're pointing to the install, then you're good.

Also, I can't bring myself to set the buffer, for getting the log, to 32 bytes; I want to get the log in 1 call, not 10s, 100s, Ks ... So, I still have a large default size so as to not cause any performance issue. Again, this isn't the 99% use case.

So, first is that this no longer abends, you see the error message about the transcoding error:

We failed in reading the Log
U_SHIFT_JIS_CE から U_UTF8_CE エンコーディングへのデータのトランスコード ... whatever comes out

and it keeps running fine. Depending upon what was submitted, you may get some amount of log before and after the error message (context), though maybe none. I've added a configuration definition option for this access method so you can control the size of the log buffer: logbufsz=
You can set this in your config def or specify it dynamically on the SASsession call:

import saspy

# configuration definition; cp is the classpath string defined elsewhere in the config file
iomj     = {'java'      : '/usr/bin/java',
            'authkey'   : 'saspy1',
            'iomhost'   : 'localhost',
            'iomport'   : 8591,
            'timeout'   : 0,
            'classpath' : cp,
            'logbufsz'  : 32   # set this if you want it all the time
            }
# or don't put it in the definition, but only set it for a session you need it for on the fly
sas = saspy.SASsession(cfgname='iomj', logbufsz=32)   # cfgname matches the definition name above

32 is the smallest you can set, and you'll get all the context I can give you. Like in the post a few above.

You can set it low and run like that all the time; generally, the log isn't so large that you would actually notice. Or, if you ever see this error, you can then just try again after setting it low on a new SASsession.
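That second approach could be scripted roughly as follows. This is a sketch under assumptions: the helper name `log_read_failed` is hypothetical, the marker text is the message shown earlier in this thread, and the saspy calls are left as comments because they need a live SAS server and a config name that exists on your system.

```python
# Message that appears in the returned log when it could not be transcoded
# (text taken from the output shown earlier in this thread).
LOG_FAILURE_MARKER = "We failed in reading the Log"


def log_read_failed(log: str) -> bool:
    """Return True if the returned log text contains the transcoding-failure message."""
    return LOG_FAILURE_MARKER in log


# Hypothetical usage (not run here; 'iomj' stands in for your own config name):
#   import saspy
#   sas = saspy.SASsession(cfgname='iomj')
#   log = sas.submit(code)['LOG']
#   if log_read_failed(log):
#       sas = saspy.SASsession(cfgname='iomj', logbufsz=32)
#       log = sas.submit(code)['LOG']
```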

I'm glad you posted this problem here, since now it's fixed and is working better than I would have expected; you can now get (almost) all of the log, and an error message so you know something happened. So that's better than EG? :) :)

I'll get this documented too, for the next release when I create it. But you should be good with it from master.

Let me know if it works as you expect at your site with your cases!

Thanks,
Tom

@kjnh10
Contributor Author

kjnh10 commented Oct 31, 2019

@tomweber-sas
Wow, this is more than I expected.
I hope this will be released :)

Thanks as always.

@tomweber-sas
Contributor

I aim to please :) And, actually, I think I just got the last issue I've been working on, along with this and others from the past 2 weeks, resolved. So, once I get that finalized, the doc updated, and everything tested, I will be building a new release: V3.1.7. That will contain this fix and the others, so then you'll have this in the latest production version.

Thanks again!
Tom

@tomweber-sas
Contributor

This is now available in the current production release - 3.1.7. I'll close this now, but let me know if there's anything else!

Thanks!
Tom
