winutil模拟器设置
-
选择对应版本配置配置
HADOOP_HOME
、Path
(null) entry in command string: null chmod 0644
- 将对应
hadoop.dll
拷贝到C:\windows\system32
目录中
- 将对应
- 模拟器执行Linux命令:
winutils.exe [Linux shell]
Spark本地环境
-
下载spark安装包
-
配置
SPARK_HOME
、Path
环境变量,路径中不要有空格
-
命令行窗口输入
spark-shell
启动spark服务
Saprk开发环境
windows下需要winutils.exe、hadoop.dll等文件,不是强依赖Hadoop环境、Spark环境
- 安装Scala本地环境
- 下载安装包
- 配置
SCALA_HOME
、Path
环境变量
-
安装IDEA的scala插件,选择与IDEA版本匹配
- 新建Maven项目工程
-
增加scala框架支持,选择
scala
-
设置SDK库
-
配置Maven依赖
-
<properties>
<spark.version>2.4.3</spark.version>
<scala.version>2.12</scala.version>
</properties>
<dependencies>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.12</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.spark</groupId>
<!-- scala版本与spark版本要匹配-->
<artifactId>spark-core_${scala.version}</artifactId>
<version>${spark.version}</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-assembly-plugin</artifactId>
<version>3.1.1</version>
<configuration>
<classifier>dist</classifier>
<appendAssemblyId>true</appendAssemblyId>
<descriptorRefs>
<descriptor>jar-with-dependencies</descriptor>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
<repositories>
<repository>
<id>nexus-aliyun</id>
<name>Nexus aliyun</name>
<url>http://maven.aliyun.com/nexus/content/groups/public</url>
</repository>
</repositories>
WordCount
package com.timebusker
import org.apache.spark.{SparkConf, SparkContext}
/*
* @DESC:WordCount:单词统计
* @author:timebusker
* @date:2019 /5/13
*/
object WordCount {
def main(args: Array[String]): Unit = {
val inputFile = "D:\\allfiles.txt"
val conf = new SparkConf().setAppName("WordCount").setMaster("local")
val sc = new SparkContext(conf)
val textFile = sc.textFile(inputFile)
val wordCount = textFile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey((a, b) => a + b)
// 控制台输出
wordCount.foreach(println)
// 保存为文件
wordCount.saveAsTextFile("D:\\result")
}
}