
[SPARK] Installing PySpark on Windows

탱젤 2021. 1. 14. 16:27

※ These steps assume Python (version 3 or later) is already installed.

 

  1. Install Java
  2. Install Spark
  3. Install winutils
  4. Install pyspark
  5. Verify the installation

1. Install Java

Spark 3.0.1 supports Java 11, so open the URL below and choose the JDK 11 download in the middle of the page.

An Oracle account is required, so create one first.

www.oracle.com/java/technologies/javase-downloads.html

 

์œˆ๋„์šฐ ๋ฒ„์ „ ํด๋ฆญ ํ›„ next, next ๋ˆ„๋ฅด๋ฉด์„œ ์„ค์น˜.

 

Go to Control Panel - System and Security - System, then click Advanced system settings and Environment Variables.

In the Environment Variables dialog, edit the system variables:

  • Edit Path and add %JAVA_HOME%\bin
  • Add a JAVA_HOME system variable with the value C:\Program Files\Java\jdk-11.0.9

After adding the variables, check in cmd that Java installed correctly: java -version

Installation complete
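The same check can be scripted from Python. The following is only a minimal sketch, not part of the original steps: the helper names `parse_java_major` and `installed_java_major` are hypothetical, and it assumes `java` is reachable through Path as configured above.

```python
import re
import subprocess


def parse_java_major(version_output):
    """Extract the major Java version from `java -version` output, or None."""
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    # Java 8 and older report themselves as "1.8", "1.7", and so on.
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))
    return major


def installed_java_major():
    """Run `java -version`; return the major version, or None if java is not found."""
    try:
        result = subprocess.run(
            ["java", "-version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        # java is not on Path, so the Path / JAVA_HOME setup needs another look.
        return None
    # The JVM prints its version banner to stderr rather than stdout.
    return parse_java_major(result.stderr + result.stdout)
```

With the JDK 11 setup described here, `installed_java_major()` would be expected to return 11; a result of None suggests the Path entry is missing or misspelled.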

 

2. ์ŠคํŒŒํฌ ๋‹ค์šด๋กœ๋“œ (ํ•˜๋‘ก 2.7)

spark.apache.org/downloads.html

 


Clicking the tgz download link opens a page listing download sites (HTTP and others); click one of them to start the download.

 

Create a Spark folder on the C: drive and move the tgz file into it.

Then extract the tgz archive.
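If no archive tool that handles tgz is installed, the extraction can also be done with Python's standard `tarfile` module. This is a rough sketch under that assumption; the helper name `extract_tgz` is hypothetical.

```python
import tarfile
from pathlib import Path


def extract_tgz(archive, dest_dir):
    """Extract a .tgz archive into dest_dir and return the folder names found there."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Python versions can reject unsafe archive members.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    # List the top-level folders now present in the destination.
    return sorted(p.name for p in dest.iterdir() if p.is_dir())
```

For the layout used in this post, a call might look like `extract_tgz(r"C:\Spark\spark-3.0.1-bin-hadoop2.7.tgz", r"C:\Spark")`, where the archive file name is assumed from the folder name used later for SPARK_HOME.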

 

3. Install winutils

Under This PC - Windows (C:), create a Hadoop folder, then create a bin folder inside it.

Download the winutils.exe file into that bin folder.

http://github.com/cdarlint/winutils

 


Because the Hadoop 2.7 build was selected when downloading Spark, open the Hadoop 2.7 folder in the repository and download winutils.exe from there.

Now go back to the environment variable editor (Advanced system settings - Environment Variables) and edit the system variables.

Create a new system variable named SPARK_HOME with the value C:\Spark\spark-3.0.1-bin-hadoop2.7

Create a new system variable named HADOOP_HOME with the value C:\Hadoop

Then add %SPARK_HOME%\bin and %HADOOP_HOME%\bin to the Path system variable.

 

All the environment variables added so far can be checked in this dialog.

 

4. Install pyspark with pip in cmd: 'pip install pyspark'

5. Verify the installation in cmd: 'pyspark'
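Besides launching the interactive shell, the install can be checked from a regular Python script. What follows is a minimal sketch, assuming Java and the environment variables are set up as above; the helper names, the `install-check` application name, and the `local[1]` master setting are illustrative choices rather than anything from the original steps.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def count_rows_locally(n=5):
    """Start a local Spark session, build an n-row DataFrame, and return its row count."""
    # Imported lazily so installed_version() still works before pyspark is installed.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[1]")         # run Spark inside this process on one core
        .appName("install-check")   # hypothetical application name
        .getOrCreate()
    )
    try:
        # spark.range(n) creates a single-column DataFrame with n rows.
        return spark.range(n).count()
    finally:
        spark.stop()
```

`installed_version("pyspark")` shows which version pip installed, and `count_rows_locally()` should return 5 once Java, Spark, and winutils are all working together.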

 

 

 
