- 苏州马小云
IPython Configuration

This installation workflow loosely follows the one contributed by Fernando Perez. It should be performed on the machine where the IPython Notebook will be executed, typically one of the Hadoop nodes.

First create an IPython profile for use with PySpark.

    ipython profile create pyspark

This should have created the profile directory ~/.ipython/profile_pyspark/. Edit the file ~/.ipython/profile_pyspark/ipython_notebook_config.py to have:

    c = get_config()

    c.NotebookApp.ip = "*"
    c.NotebookApp.open_browser = False
    c.NotebookApp.port = 8880  # or whatever you want; be aware of conflicts with CDH

If you want a password prompt as well, first generate a password for the notebook app:

    python -c "from IPython.lib import passwd; print(passwd())" > ~/.ipython/profile_pyspark/nbpasswd.txt

and set the following in the same ipython_notebook_config.py file you just edited:

    import os
    # open() does not expand "~", so expand it explicitly
    PWDFILE = os.path.expanduser("~/.ipython/profile_pyspark/nbpasswd.txt")
    c.NotebookApp.password = open(PWDFILE).read().strip()

Finally, create the file ~/.ipython/profile_pyspark/startup/00-pyspark-setup.py with the following contents:

    import os
    import sys

    spark_home = os.environ.get("SPARK_HOME", None)
    if not spark_home:
        raise ValueError("SPARK_HOME environment variable is not set")
    sys.path.insert(0, os.path.join(spark_home, "python"))
    sys.path.insert(0, os.path.join(spark_home, "python/lib/py4j-0.8.1-src.zip"))
    execfile(os.path.join(spark_home, "python/pyspark/shell.py"))

Starting IPython Notebook with PySpark

IPython Notebook should be run on a machine from which PySpark can be run, typically one of the Hadoop nodes.

First, make sure the following environment variables are set:

    # for the CDH-installed Spark
    export SPARK_HOME="/opt/cloudera/parcels/CDH/lib/spark"

    # this is where you specify all the options you would normally add after bin/pyspark
    export PYSPARK_SUBMIT_ARGS="--master yarn --deploy-mode client --num-executors 24 --executor-memory 10g --executor-cores 5"

Note that you must set whatever other environment variables you need to get Spark running the way you desire. For example, the settings above are consistent with running the CDH-installed Spark in YARN-client mode. If you wanted to run your own custom Spark, you could build it, put the JAR
on HDFS, set the SPARK_JAR environment variable, along with any other necessary parameters. For example, see here for running a custom Spark on YARN.

Finally, decide from what directory to run the IPython Notebook. This directory will contain the .ipynb files that represent the different notebooks that can be served. See the IPython docs for more information. From this directory, execute:

    ipython notebook --profile=pyspark

Note that if you just want to serve the notebooks without initializing Spark, you can start IPython Notebook using a profile that does not execute the shell.py script in the startup file.

Example Session

At this point, the IPython Notebook server should be running. Point your browser to :8880/ on that host, which should open up the main access point to the available notebooks.

[Screenshot: the IPython Notebook dashboard]

This will show the list of possible .ipynb files to serve. If it is empty (because this is the first time you're running it) you can create a new notebook, which will also create a new .ipynb file.

As an example, here is a screenshot from a session that uses PySpark to analyze the GDELT event dataset:

[Screenshot: PySpark session analyzing the GDELT event dataset]

The full .ipynb file can be obtained as a GitHub gist.
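One fragile spot in the setup above is the startup file, which hard-codes py4j-0.8.1-src.zip; other Spark builds may bundle a different py4j version. The following is a minimal sketch, not part of Spark or IPython, of how the paths could be discovered instead. The helper name pyspark_paths is hypothetical, and the sketch assumes the zip still lives under python/lib/ inside SPARK_HOME.

```python
import glob
import os


def pyspark_paths(spark_home):
    """Return the two sys.path entries PySpark needs under spark_home.

    Hypothetical helper for illustration; not part of Spark or IPython.
    """
    python_dir = os.path.join(spark_home, "python")
    # Match whatever py4j version is bundled instead of hard-coding 0.8.1.
    pattern = os.path.join(python_dir, "lib", "py4j-*-src.zip")
    py4j_zips = sorted(glob.glob(pattern))
    if not py4j_zips:
        raise ValueError("no py4j-*-src.zip found under %s" % python_dir)
    # If several zips match, take the lexicographically last one
    # (a simplification; this is not a true version comparison).
    return [python_dir, py4j_zips[-1]]
```

In 00-pyspark-setup.py, the two sys.path.insert lines could then become a loop such as `for p in pyspark_paths(spark_home): sys.path.insert(0, p)`, which keeps the same ordering on sys.path as the original script.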