制造业数据分析师面试题库：MES/ERP/SCADA系统解析与OEE建模

本文来源于数据从业者全栈知识库，更多体系化内容请访问知识库。

概述

制造业数据工作特点

工业4.0转型：传统制造向智能制造升级
IoT设备普及：大量传感器和设备数据
质量管控严格：零缺陷生产要求
供应链复杂：多层级供应商管理
成本敏感：精益生产和成本控制
合规要求：安全生产和环保标准

核心技术栈

数据采集：SCADA、MES、ERP系统
时序数据库：InfluxDB、TimescaleDB
实时处理：Apache Kafka、Spark Streaming
机器学习：预测性维护、异常检测
可视化：Grafana、工业大屏

1. 数据分析师 - 制造业

基础能力考察

1.1 制造业务理解

问题：请解释制造业中的MES、ERP、SCADA系统分别负责什么功能？它们之间的数据流关系如何？

参考答案：

ERP（企业资源计划）：负责企业层面的资源规划，包括订单管理、财务、人力资源、供应链管理
MES（制造执行系统）：连接ERP和车间层，负责生产计划执行、工序管理、质量管控、设备管理
SCADA（数据采集与监控）：负责实时数据采集、设备监控、过程控制

数据流关系：

ERP (计划层)
    ↓ 生产订单、物料需求
MES (执行层)
    ↓ 生产指令、质量标准
SCADA (控制层)
    ↑ 实时数据、设备状态
    ↑ 生产进度、质量数据

1.2 生产效率分析

问题：请设计一个分析生产线OEE（整体设备效率）的数据模型。

参考答案：

class OEEAnalyzer:
    def __init__(self):
        self.availability_threshold = 0.85
        self.performance_threshold = 0.95
        self.quality_threshold = 0.99

    def calculate_oee(self, production_data):
        """计算OEE = 可用率 × 性能率 × 质量率"""
        # 可用率 = 实际运行时间 / 计划生产时间
        availability = production_data['actual_runtime'] / production_data['planned_runtime']

        # 性能率 = 实际产量 / (实际运行时间 × 理论产能)
        performance = (production_data['actual_output'] /
                      (production_data['actual_runtime'] * production_data['theoretical_speed']))

        # 质量率 = 合格品数量 / 总产量
        quality = production_data['good_output'] / production_data['actual_output']

        oee = availability * performance * quality

        return {
            'oee': oee,
            'availability': availability,
            'performance': performance,
            'quality': quality,
            'improvement_priorities': self.identify_bottlenecks(availability, performance, quality)
        }

    def identify_bottlenecks(self, availability, performance, quality):
        """识别改进重点"""
        priorities = []
        if availability < self.availability_threshold:
            priorities.append('设备可用率')
        if performance < self.performance_threshold:
            priorities.append('生产效率')
        if quality < self.quality_threshold:
            priorities.append('质量控制')
        return priorities

高级应用场景

1.3 供应链风险分析

问题：如何构建供应商风险评估模型？

参考答案：

import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier

class SupplierRiskAnalyzer:
    def __init__(self):
        self.risk_factors = [
            'delivery_performance',  # 交付表现
            'quality_score',        # 质量评分
            'financial_health',     # 财务健康度
            'capacity_utilization', # 产能利用率
            'geographic_risk',      # 地理风险
            'compliance_score'      # 合规评分
        ]

    def calculate_risk_score(self, supplier_data):
        """计算供应商风险评分"""
        # 标准化处理
        scaler = StandardScaler()
        normalized_data = scaler.fit_transform(supplier_data[self.risk_factors])

        # 权重设置
        weights = {
            'delivery_performance': 0.25,
            'quality_score': 0.25,
            'financial_health': 0.20,
            'capacity_utilization': 0.15,
            'geographic_risk': 0.10,
            'compliance_score': 0.05
        }

        # 计算加权风险评分
        risk_scores = []
        for i, supplier in enumerate(normalized_data):
            weighted_score = sum(supplier[j] * weights[factor]
                               for j, factor in enumerate(self.risk_factors))
            risk_scores.append(weighted_score)

        supplier_data['risk_score'] = risk_scores
        supplier_data['risk_level'] = pd.cut(risk_scores,
                                           bins=[0, 0.3, 0.6, 1.0],
                                           labels=['低风险', '中风险', '高风险'])

        return supplier_data

    def recommend_actions(self, supplier_data):
        """推荐风险应对措施"""
        recommendations = []
        for _, supplier in supplier_data.iterrows():
            if supplier['risk_level'] == '高风险':
                recommendations.append({
                    'supplier_id': supplier['supplier_id'],
                    'actions': ['寻找备用供应商', '增加库存缓冲', '加强监控'],
                    'priority': 'High'
                })
            elif supplier['risk_level'] == '中风险':
                recommendations.append({
                    'supplier_id': supplier['supplier_id'],
                    'actions': ['定期评估', '改进计划'],
                    'priority': 'Medium'
                })

        return recommendations

1.4 质量异常根因分析

问题：请设计一个自动化的质量异常根因分析系统。

参考答案：

class QualityRootCauseAnalyzer:
    def __init__(self):
        self.process_parameters = [
            'temperature', 'pressure', 'humidity', 'speed',
            'material_batch', 'operator_id', 'equipment_id'
        ]

    def analyze_defect_patterns(self, quality_data, process_data):
        """分析缺陷模式"""
        import scipy.stats as stats

        # 合并质量和工艺数据
        merged_data = pd.merge(quality_data, process_data, on='timestamp')

        # 按缺陷类型分组分析
        defect_analysis = {}
        for defect_type in merged_data['defect_type'].unique():
            if defect_type != 'normal':
                defect_data = merged_data[merged_data['defect_type'] == defect_type]
                normal_data = merged_data[merged_data['defect_type'] == 'normal']

                significant_factors = []
                for param in self.process_parameters:
                    if param in merged_data.columns:
                        # 进行t检验
                        t_stat, p_value = stats.ttest_ind(
                            defect_data[param].dropna(),
                            normal_data[param].dropna()
                        )

                        if p_value < 0.05:  # 显著性水平
                            significant_factors.append({
                                'parameter': param,
                                'p_value': p_value,
                                'defect_mean': defect_data[param].mean(),
                                'normal_mean': normal_data[param].mean(),
                                'impact_direction': 'higher' if defect_data[param].mean() > normal_data[param].mean() else 'lower'
                            })

                defect_analysis[defect_type] = significant_factors

        return defect_analysis

    def generate_improvement_suggestions(self, root_cause_analysis):
        """生成改进建议"""
        suggestions = {}

        for defect_type, factors in root_cause_analysis.items():
            defect_suggestions = []

            for factor in factors:
                param = factor['parameter']
                direction = factor['impact_direction']

                if param == 'temperature':
                    if direction == 'higher':
                        defect_suggestions.append('降低工艺温度，加强冷却控制')
                    else:
                        defect_suggestions.append('提高工艺温度，确保充分反应')

                elif param == 'pressure':
                    if direction == 'higher':
                        defect_suggestions.append('降低工艺压力，检查压力控制系统')
                    else:
                        defect_suggestions.append('增加工艺压力，提高压实效果')

                # 可以继续添加其他参数的建议逻辑

            suggestions[defect_type] = defect_suggestions

        return suggestions

2. 数据科学家 - 制造业

机器学习应用

2.1 预测性维护建模

问题：请设计一个设备故障预测模型，包括特征工程和模型选择策略。

参考答案：

import pandas as pd
import numpy as np
from sklearn.ensemble import IsolationForest, RandomForestClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import TimeSeriesSplit
import warnings
warnings.filterwarnings('ignore')

class PredictiveMaintenanceModel:
    def __init__(self):
        self.feature_window = 24  # 24小时特征窗口
        self.prediction_horizon = 72  # 提前72小时预警

    def engineer_features(self, sensor_data):
        """构造预测性维护特征"""
        features = []

        # 按设备分组处理
        for equipment_id in sensor_data['equipment_id'].unique():
            equipment_data = sensor_data[sensor_data['equipment_id'] == equipment_id].copy()
            equipment_data = equipment_data.sort_values('timestamp')

            # 时间序列特征
            for col in ['temperature', 'vibration', 'pressure', 'current']:
                if col in equipment_data.columns:
                    # 滑动窗口统计特征
                    equipment_data[f'{col}_mean_{self.feature_window}h'] = equipment_data[col].rolling(
                        window=self.feature_window).mean()
                    equipment_data[f'{col}_std_{self.feature_window}h'] = equipment_data[col].rolling(
                        window=self.feature_window).std()
                    equipment_data[f'{col}_max_{self.feature_window}h'] = equipment_data[col].rolling(
                        window=self.feature_window).max()
                    equipment_data[f'{col}_min_{self.feature_window}h'] = equipment_data[col].rolling(
                        window=self.feature_window).min()

                    # 趋势特征
                    equipment_data[f'{col}_trend'] = equipment_data[col].diff().rolling(
                        window=12).mean()

                    # 异常检测特征
                    isolation_forest = IsolationForest(contamination=0.1)
                    equipment_data[f'{col}_anomaly_score'] = isolation_forest.fit_predict(
                        equipment_datacol.fillna(method='ffill'))

            # 运行时间特征
            equipment_data['runtime_hours'] = (equipment_data['timestamp'] -
                                             equipment_data['timestamp'].iloc[0]).dt.total_seconds() / 3600

            # 维护历史特征
            if 'last_maintenance' in equipment_data.columns:
                equipment_data['days_since_maintenance'] = (
                    equipment_data['timestamp'] - equipment_data['last_maintenance']).dt.days

            features.append(equipment_data)

        return pd.concat(features, ignore_index=True)

    def create_failure_labels(self, equipment_data):
        """创建故障预测标签"""
        # 基于未来故障时间创建标签
        equipment_data['failure_in_next_72h'] = 0

        for equipment_id in equipment_data['equipment_id'].unique():
            equipment_mask = equipment_data['equipment_id'] == equipment_id
            equipment_subset = equipment_data[equipment_mask].copy()

            # 找到故障时间点
            failure_times = equipment_subset[equipment_subset['failure_occurred'] == 1]['timestamp']

            for failure_time in failure_times:
                # 在故障前72小时内的数据点标记为正样本
                prediction_window = pd.Timedelta(hours=self.prediction_horizon)
                prediction_mask = (
                    (equipment_subset['timestamp'] >= failure_time - prediction_window) &
                    (equipment_subset['timestamp'] <= failure_time)
                )
                equipment_data.loc[equipment_mask & prediction_mask, 'failure_in_next_72h'] = 1

        return equipment_data

    def train_model(self, feature_data):
        """训练预测模型"""
        # 准备特征和标签
        feature_columns = [col for col in feature_data.columns
                          if col not in ['timestamp', 'equipment_id', 'failure_occurred', 'failure_in_next_72h']]

        X = feature_data[feature_columns].fillna(method='ffill').fillna(0)
        y = feature_data['failure_in_next_72h']

        # 时间序列交叉验证
        tscv = TimeSeriesSplit(n_splits=5)

        # 训练随机森林模型
        model = RandomForestClassifier(
            n_estimators=100,
            max_depth=10,
            min_samples_split=20,
            class_weight='balanced',  # 处理不平衡数据
            random_state=42
        )

        # 标准化特征
        scaler = StandardScaler()
        X_scaled = scaler.fit_transform(X)

        model.fit(X_scaled, y)

        # 特征重要性分析
        feature_importance = pd.DataFrame({
            'feature': feature_columns,
            'importance': model.feature_importances_
        }).sort_values('importance', ascending=False)

        return {
            'model': model,
            'scaler': scaler,
            'feature_columns': feature_columns,
            'feature_importance': feature_importance
        }

    def predict_failures(self, model_dict, new_data):
        """预测设备故障"""
        model = model_dict['model']
        scaler = model_dict['scaler']
        feature_columns = model_dict['feature_columns']

        # 特征工程
        engineered_data = self.engineer_features(new_data)

        # 预测
        X_new = engineered_data[feature_columns].fillna(method='ffill').fillna(0)
        X_new_scaled = scaler.transform(X_new)

        failure_probability = model.predict_proba(X_new_scaled)[:, 1]
        failure_prediction = model.predict(X_new_scaled)

        # 添加预测结果
        engineered_data['failure_probability'] = failure_probability
        engineered_data['failure_prediction'] = failure_prediction
        engineered_data['risk_level'] = pd.cut(failure_probability,
                                              bins=[0, 0.3, 0.7, 1.0],
                                              labels=['低风险', '中风险', '高风险'])

        return engineered_data

2.2 工艺参数优化

问题：如何使用机器学习优化生产工艺参数以提高产品质量？

参考答案：

from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV
from scipy.optimize import minimize
import numpy as np

class ProcessOptimizer:
    def __init__(self):
        self.process_params = [
            'temperature', 'pressure', 'flow_rate',
            'catalyst_concentration', 'reaction_time'
        ]
        self.quality_metrics = ['yield', 'purity', 'viscosity']

    def build_process_model(self, historical_data):
        """构建工艺参数-质量关系模型"""
        models = {}

        for quality_metric in self.quality_metrics:
            if quality_metric in historical_data.columns:
                X = historical_data[self.process_params]
                y = historical_data[quality_metric]

                # 网格搜索优化超参数
                param_grid = {
                    'n_estimators': [100, 200],
                    'max_depth': [5, 10, 15],
                    'learning_rate': [0.01, 0.1, 0.2]
                }

                gbr = GradientBoostingRegressor(random_state=42)
                grid_search = GridSearchCV(gbr, param_grid, cv=5, scoring='r2')
                grid_search.fit(X, y)

                models[quality_metric] = {
                    'model': grid_search.best_estimator_,
                    'score': grid_search.best_score_,
                    'params': grid_search.best_params_
                }

        return models

    def optimize_parameters(self, models, constraints, objectives):
        """多目标工艺参数优化"""
        def objective_function(params):
            """目标函数：最大化质量指标加权和"""
            param_dict = dict(zip(self.process_params, params))
            param_array = np.array([params])

            total_score = 0
            for metric, weight in objectives.items():
                if metric in models:
                    predicted_quality = models[metric]['model'].predict(param_array)[0]
                    total_score += weight * predicted_quality

            return -total_score  # 最小化负值等于最大化

        # 参数约束
        bounds = []
        for param in self.process_params:
            if param in constraints:
                bounds.append((constraints[param]['min'], constraints[param]['max']))
            else:
                bounds.append((0, 100))  # 默认约束

        # 优化求解
        result = minimize(
            objective_function,
            x0=[np.mean([bound[0], bound[1]]) for bound in bounds],  # 初始值
            bounds=bounds,
            method='L-BFGS-B'
        )

        optimal_params = dict(zip(self.process_params, result.x))

        # 预测优化后的质量指标
        predicted_qualities = {}
        param_array = np.array([result.x])
        for metric in self.quality_metrics:
            if metric in models:
                predicted_qualities[metric] = models[metric]['model'].predict(param_array)[0]

        return {
            'optimal_parameters': optimal_params,
            'predicted_qualities': predicted_qualities,
            'optimization_success': result.success,
            'improvement_potential': -result.fun
        }

    def sensitivity_analysis(self, models, base_params):
        """参数敏感性分析"""
        sensitivity_results = {}

        for param in self.process_params:
            param_effects = {}
            base_array = np.array([list(base_params.values())])
            base_predictions = {}

            # 基准预测
            for metric in self.quality_metrics:
                if metric in models:
                    base_predictions[metric] = models[metric]['model'].predict(base_array)[0]

            # 参数变化影响分析
            param_index = self.process_params.index(param)
            change_percentages = [-20, -10, -5, 5, 10, 20]

            for change_pct in change_percentages:
                modified_params = base_array.copy()
                modified_params[0, param_index] *= (1 + change_pct / 100)

                effects = {}
                for metric in self.quality_metrics:
                    if metric in models:
                        new_prediction = models[metric]['model'].predict(modified_params)[0]
                        effect = ((new_prediction - base_predictions[metric]) /
                                base_predictions[metric] * 100)
                        effects[metric] = effect

                param_effects[f'{change_pct}%'] = effects

            sensitivity_results[param] = param_effects

        return sensitivity_results

3. 数据工程师 - 制造业

工业数据架构

3.1 IoT数据采集架构

问题：请设计一个制造业IoT数据采集和处理架构，支持百万级传感器的实时数据处理。

参考答案：

import asyncio
import json
from datetime import datetime
from typing import Dict, List
import kafka
from influxdb_client import InfluxDBClient
import redis

class IoTDataPipeline:
    def __init__(self, config):
        self.config = config
        self.kafka_producer = kafka.KafkaProducer(
            bootstrap_servers=config['kafka']['servers'],
            value_serializer=lambda v: json.dumps(v).encode('utf-8')
        )
        self.influx_client = InfluxDBClient(
            url=config['influxdb']['url'],
            token=config['influxdb']['token'],
            org=config['influxdb']['org']
        )
        self.redis_client = redis.Redis(
            host=config['redis']['host'],
            port=config['redis']['port']
        )

    def validate_sensor_data(self, data):
        """传感器数据验证"""
        required_fields = ['device_id', 'timestamp', 'value', 'sensor_type']

        # 基础字段检查
        if not all(field in data for field in required_fields):
            return False, "缺少必需字段"

        # 数据类型检查
        if not isinstance(data['value'], (int, float)):
            return False, "数值类型错误"

        # 时间戳检查
        try:
            timestamp = datetime.fromisoformat(data['timestamp'])
            now = datetime.now()
            if abs((timestamp - now).total_seconds()) > 300:  # 5分钟容忍度
                return False, "时间戳异常"
        except:
            return False, "时间戳格式错误"

        # 数值范围检查
        sensor_limits = {
            'temperature': (-50, 1000),
            'pressure': (0, 1000),
            'vibration': (0, 100),
            'flow_rate': (0, 1000)
        }

        if data['sensor_type'] in sensor_limits:
            min_val, max_val = sensor_limits[data['sensor_type']]
            if not (min_val <= data['value'] <= max_val):
                return False, f"数值超出范围 [{min_val}, {max_val}]"

        return True, "验证通过"

    async def process_sensor_data(self, raw_data):
        """处理传感器数据"""
        # 数据验证
        is_valid, message = self.validate_sensor_data(raw_data)
        if not is_valid:
            await self.handle_invalid_data(raw_data, message)
            return

        # 数据清洗和转换
        cleaned_data = self.clean_data(raw_data)

        # 异常检测
        anomaly_score = await self.detect_anomaly(cleaned_data)
        cleaned_data['anomaly_score'] = anomaly_score

        # 数据分发
        await asyncio.gather(
            self.send_to_real_time_processing(cleaned_data),
            self.store_to_time_series_db(cleaned_data),
            self.update_device_status(cleaned_data)
        )

    def clean_data(self, data):
        """数据清洗"""
        cleaned = data.copy()

        # 数值平滑（移动平均）
        device_id = data['device_id']
        sensor_type = data['sensor_type']

        # 从Redis获取历史数据
        history_key = f"sensor_history:{device_id}:{sensor_type}"
        history = self.redis_client.lrange(history_key, 0, 4)  # 获取最近5个值

        if history:
            history_values = [float(val) for val in history]
            history_values.append(data['value'])
            smoothed_value = sum(history_values) / len(history_values)
            cleaned['smoothed_value'] = smoothed_value
        else:
            cleaned['smoothed_value'] = data['value']

        # 更新历史数据
        self.redis_client.lpush(history_key, data['value'])
        self.redis_client.ltrim(history_key, 0, 9)  # 保留最近10个值
        self.redis_client.expire(history_key, 3600)  # 1小时过期

        return cleaned

    async def detect_anomaly(self, data):
        """异常检测"""
        device_id = data['device_id']
        sensor_type = data['sensor_type']
        current_value = data['value']

        # 从Redis获取统计信息
        stats_key = f"sensor_stats:{device_id}:{sensor_type}"
        stats = self.redis_client.hgetall(stats_key)

        if stats:
            mean = float(stats.get(b'mean', current_value))
            std = float(stats.get(b'std', 0))
            count = int(stats.get(b'count', 1))

            # 更新统计信息（在线算法）
            new_count = count + 1
            new_mean = (mean * count + current_value) / new_count

            if count > 1:
                # 在线方差更新
                old_variance = std ** 2
                new_variance = ((count - 1) * old_variance +
                              (current_value - mean) * (current_value - new_mean)) / count
                new_std = new_variance ** 0.5
            else:
                new_std = 0

            # 异常评分（基于z-score）
            if new_std > 0:
                z_score = abs(current_value - new_mean) / new_std
                anomaly_score = min(z_score / 3.0, 1.0)  # 标准化到[0,1]
            else:
                anomaly_score = 0

            # 更新Redis统计信息
            self.redis_client.hset(stats_key, mapping={
                'mean': new_mean,
                'std': new_std,
                'count': new_count
            })
            self.redis_client.expire(stats_key, 86400)  # 24小时过期

        else:
            # 初始化统计信息
            self.redis_client.hset(stats_key, mapping={
                'mean': current_value,
                'std': 0,
                'count': 1
            })
            anomaly_score = 0

        return anomaly_score

    async def send_to_real_time_processing(self, data):
        """发送到实时处理系统"""
        topic_mapping = {
            'temperature': 'sensor_temperature',
            'pressure': 'sensor_pressure',
            'vibration': 'sensor_vibration',
            'flow_rate': 'sensor_flow'
        }

        topic = topic_mapping.get(data['sensor_type'], 'sensor_general')

        # 添加分区键（按设备ID分区）
        partition_key = data['device_id']

        self.kafka_producer.send(
            topic,
            value=data,
            key=partition_key.encode('utf-8')
        )

    async def store_to_time_series_db(self, data):
        """存储到时序数据库"""
        write_api = self.influx_client.write_api()

        point = {
            "measurement": f"sensor_{data['sensor_type']}",
            "tags": {
                "device_id": data['device_id'],
                "factory": data.get('factory', 'unknown'),
                "line": data.get('production_line', 'unknown')
            },
            "fields": {
                "value": data['value'],
                "smoothed_value": data['smoothed_value'],
                "anomaly_score": data['anomaly_score']
            },
            "time": data['timestamp']
        }

        write_api.write(
            bucket=self.config['influxdb']['bucket'],
            record=point
        )

    async def update_device_status(self, data):
        """更新设备状态"""
        device_id = data['device_id']

        # 设备状态逻辑
        status = "normal"
        if data['anomaly_score'] > 0.8:
            status = "warning"
        elif data['anomaly_score'] > 0.95:
            status = "critical"

        # 更新Redis设备状态
        device_status = {
            'last_update': data['timestamp'],
            'status': status,
            'anomaly_score': data['anomaly_score']
        }

        self.redis_client.hset(
            f"device_status:{device_id}",
            mapping=device_status
        )

        # 如果是告警状态，发送告警消息
        if status in ['warning', 'critical']:
            alert_data = {
                'device_id': device_id,
                'alert_type': status,
                'timestamp': data['timestamp'],
                'anomaly_score': data['anomaly_score'],
                'sensor_type': data['sensor_type'],
                'value': data['value']
            }

            self.kafka_producer.send('alerts', value=alert_data)

    async def handle_invalid_data(self, data, error_message):
        """处理无效数据"""
        error_record = {
            'original_data': data,
            'error_message': error_message,
            'timestamp': datetime.now().isoformat(),
            'error_type': 'validation_failed'
        }

        # 发送到错误处理队列
        self.kafka_producer.send('data_errors', value=error_record)

        # 记录错误统计
        error_key = f"error_count:{data.get('device_id', 'unknown')}"
        self.redis_client.incr(error_key)
        self.redis_client.expire(error_key, 86400)

3.2 数据仓库设计

问题：设计制造业数据仓库的主题域和数据模型。

参考答案：

-- 制造业数据仓库设计

-- 1. 时间维度表
CREATE TABLE dim_time (
    time_key INT PRIMARY KEY,
    date_value DATE,
    year_value INT,
    quarter_value INT,
    month_value INT,
    week_value INT,
    day_value INT,
    hour_value INT,
    minute_value INT,
    is_working_day BOOLEAN,
    shift_code VARCHAR(10),
    INDEX idx_date (date_value),
    INDEX idx_shift (shift_code)
);

-- 2. 设备维度表
CREATE TABLE dim_equipment (
    equipment_key INT PRIMARY KEY AUTO_INCREMENT,
    equipment_id VARCHAR(50) UNIQUE NOT NULL,
    equipment_name VARCHAR(200),
    equipment_type VARCHAR(100),
    manufacturer VARCHAR(100),
    model VARCHAR(100),
    production_line_id VARCHAR(50),
    factory_id VARCHAR(50),
    installation_date DATE,
    capacity_per_hour DECIMAL(10,2),
    status VARCHAR(20),
    effective_date DATE,
    expiry_date DATE,
    INDEX idx_equipment_id (equipment_id),
    INDEX idx_line (production_line_id),
    INDEX idx_factory (factory_id)
);

-- 3. 产品维度表
CREATE TABLE dim_product (
    product_key INT PRIMARY KEY AUTO_INCREMENT,
    product_id VARCHAR(50) UNIQUE NOT NULL,
    product_name VARCHAR(200),
    product_category VARCHAR(100),
    product_family VARCHAR(100),
    standard_cost DECIMAL(10,2),
    target_quality_score DECIMAL(5,2),
    effective_date DATE,
    expiry_date DATE,
    INDEX idx_product_id (product_id),
    INDEX idx_category (product_category)
);

-- 4. 工厂维度表
CREATE TABLE dim_factory (
    factory_key INT PRIMARY KEY AUTO_INCREMENT,
    factory_id VARCHAR(50) UNIQUE NOT NULL,
    factory_name VARCHAR(200),
    region VARCHAR(100),
    country VARCHAR(100),
    manager_name VARCHAR(100),
    capacity_rating VARCHAR(50),
    certification_level VARCHAR(50),
    INDEX idx_factory_id (factory_id),
    INDEX idx_region (region)
);

-- 5. 生产事实表
CREATE TABLE fact_production (
    production_key BIGINT PRIMARY KEY AUTO_INCREMENT,
    time_key INT,
    equipment_key INT,
    product_key INT,
    factory_key INT,
    batch_number VARCHAR(100),
    planned_quantity DECIMAL(12,2),
    actual_quantity DECIMAL(12,2),
    defect_quantity DECIMAL(12,2),
    scrap_quantity DECIMAL(12,2),
    production_time_minutes INT,
    setup_time_minutes INT,
    downtime_minutes INT,
    material_cost DECIMAL(12,2),
    labor_cost DECIMAL(12,2),
    overhead_cost DECIMAL(12,2),
    quality_score DECIMAL(5,2),
    efficiency_rate DECIMAL(5,4),
    created_timestamp TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    FOREIGN KEY (time_key) REFERENCES dim_time(time_key),
    FOREIGN KEY (equipment_key) REFERENCES dim_equipment(equipment_key),
    FOREIGN KEY (product_key) REFERENCES dim_product(product_key),
    FOREIGN KEY (factory_key) REFERENCES dim_factory(factory_key),
    INDEX idx_time (time_key),
    INDEX idx_equipment (equipment_key),
    INDEX idx_product (product_key),
    INDEX idx_batch (batch_number)
);

-- 6. 质量检测事实表
CREATE TABLE fact_quality (
    quality_key BIGINT PRIMARY KEY AUTO_INCREMENT,
    time_key INT,
    equipment_key INT,
    product_key INT,
    batch_number VARCHAR(100),
    inspection_type VARCHAR(50),
    defect_type VARCHAR(100),
    defect_severity VARCHAR(20),
    inspector_id VARCHAR(50),
    test_parameter VARCHAR(100),
    measured_value DECIMAL(12,4),
    specification_min DECIMAL(12,4),
    specification_max DECIMAL(12,4),
    is_conforming BOOLEAN,
    corrective_action VARCHAR(500),
    FOREIGN KEY (time_key) REFERENCES dim_time(time_key),
    FOREIGN KEY (equipment_key) REFERENCES dim_equipment(equipment_key),
    FOREIGN KEY (product_key) REFERENCES dim_product(product_key),
    INDEX idx_time_quality (time_key),
    INDEX idx_batch_quality (batch_number),
    INDEX idx_defect (defect_type)
);

-- 7. 设备监控事实表
CREATE TABLE fact_equipment_monitoring (
    monitoring_key BIGINT PRIMARY KEY AUTO_INCREMENT,
    time_key INT,
    equipment_key INT,
    sensor_type VARCHAR(50),
    measured_value DECIMAL(12,4),
    normal_range_min DECIMAL(12,4),
    normal_range_max DECIMAL(12,4),
    anomaly_score DECIMAL(5,4),
    alert_level VARCHAR(20),
    maintenance_due_days INT,
    FOREIGN KEY (time_key) REFERENCES dim_time(time_key),
    FOREIGN KEY (equipment_key) REFERENCES dim_equipment(equipment_key),
    INDEX idx_time_monitoring (time_key),
    INDEX idx_equipment_monitoring (equipment_key),
    INDEX idx_sensor (sensor_type),
    INDEX idx_alert (alert_level)
);

-- 8. 创建生产效率分析视图
CREATE VIEW view_production_efficiency AS
SELECT
    f.factory_name,
    e.production_line_id,
    e.equipment_name,
    p.product_category,
    t.date_value,
    t.shift_code,
    SUM(fp.actual_quantity) as total_output,
    SUM(fp.planned_quantity) as total_planned,
    SUM(fp.actual_quantity) / SUM(fp.planned_quantity) as output_efficiency,
    SUM(fp.production_time_minutes) as total_production_time,
    SUM(fp.downtime_minutes) as total_downtime,
    (SUM(fp.production_time_minutes) - SUM(fp.downtime_minutes)) /
    SUM(fp.production_time_minutes) as availability_rate,
    AVG(fp.quality_score) as avg_quality_score,
    SUM(fp.defect_quantity) / SUM(fp.actual_quantity) as defect_rate
FROM fact_production fp
JOIN dim_time t ON fp.time_key = t.time_key
JOIN dim_equipment e ON fp.equipment_key = e.equipment_key
JOIN dim_product p ON fp.product_key = p.product_key
JOIN dim_factory f ON fp.factory_key = f.factory_key
GROUP BY
    f.factory_name, e.production_line_id, e.equipment_name,
    p.product_category, t.date_value, t.shift_code;

-- 9. 创建设备健康度分析视图
CREATE VIEW view_equipment_health AS
SELECT
    e.equipment_id,
    e.equipment_name,
    e.production_line_id,
    f.factory_name,
    AVG(fem.anomaly_score) as avg_anomaly_score,
    COUNT(CASE WHEN fem.alert_level = 'critical' THEN 1 END) as critical_alerts,
    COUNT(CASE WHEN fem.alert_level = 'warning' THEN 1 END) as warning_alerts,
    MIN(fem.maintenance_due_days) as days_to_maintenance,
    CASE
        WHEN AVG(fem.anomaly_score) > 0.8 THEN 'Poor'
        WHEN AVG(fem.anomaly_score) > 0.5 THEN 'Fair'
        WHEN AVG(fem.anomaly_score) > 0.2 THEN 'Good'
        ELSE 'Excellent'
    END as health_status
FROM dim_equipment e
JOIN fact_equipment_monitoring fem ON e.equipment_key = fem.equipment_key
JOIN dim_factory f ON e.factory_id = f.factory_id
JOIN dim_time t ON fem.time_key = t.time_key
WHERE t.date_value >= DATE_SUB(CURDATE(), INTERVAL 7 DAY)
GROUP BY e.equipment_id, e.equipment_name, e.production_line_id, f.factory_name;

4. BI分析师 - 制造业

制造业BI解决方案

4.1 生产监控仪表板设计

问题：设计一个制造业实时生产监控仪表板，包括关键指标和可视化方案。

参考答案：

import plotly.graph_objects as go
import plotly.express as px
from plotly.subplots import make_subplots
import pandas as pd
import numpy as np
from datetime import datetime, timedelta

class ManufacturingDashboard:
    def __init__(self):
        self.colors = {
            'primary': '#1f77b4',
            'success': '#2ca02c',
            'warning': '#ff7f0e',
            'danger': '#d62728',
            'info': '#17a2b8'
        }

    def create_oee_gauge(self, current_oee, target_oee=0.85):
        """创建OEE仪表盘"""
        fig = go.Figure(go.Indicator(
            mode = "gauge+number+delta",
            value = current_oee,
            domain = {'x': [0, 1], 'y': [0, 1]},
            title = {'text': "整体设备效率 (OEE)"},
            delta = {'reference': target_oee, 'valueformat': ".1%"},
            gauge = {
                'axis': {'range': [None, 1], 'tickformat': '.0%'},
                'bar': {'color': self.colors['primary']},
                'steps': [
                    {'range': [0, 0.5], 'color': self.colors['danger']},
                    {'range': [0.5, 0.75], 'color': self.colors['warning']},
                    {'range': [0.75, 0.85], 'color': self.colors['info']},
                    {'range': [0.85, 1], 'color': self.colors['success']}
                ],
                'threshold': {
                    'line': {'color': "red", 'width': 4},
                    'thickness': 0.75,
                    'value': target_oee
                }
            }
        ))

        fig.update_layout(
            height=300,
            font={'color': "darkblue", 'family': "Arial"}
        )

        return fig

    def create_production_timeline(self, production_data):
        """创建生产时间线图"""
        fig = make_subplots(
            rows=3, cols=1,
            subplot_titles=['产量趋势', '质量评分', '设备利用率'],
            vertical_spacing=0.08,
            shared_xaxes=True
        )

        # 产量趋势
        fig.add_trace(
            go.Scatter(
                x=production_data['timestamp'],
                y=production_data['hourly_output'],
                mode='lines+markers',
                name='实际产量',
                line=dict(color=self.colors['primary'], width=2)
            ),
            row=1, col=1
        )

        fig.add_trace(
            go.Scatter(
                x=production_data['timestamp'],
                y=production_data['target_output'],
                mode='lines',
                name='目标产量',
                line=dict(color=self.colors['warning'], dash='dash')
            ),
            row=1, col=1
        )

        # 质量评分
        fig.add_trace(
            go.Scatter(
                x=production_data['timestamp'],
                y=production_data['quality_score'],
                mode='lines+markers',
                name='质量评分',
                line=dict(color=self.colors['success'], width=2)
            ),
            row=2, col=1
        )

        # 设备利用率
        fig.add_trace(
            go.Scatter(
                x=production_data['timestamp'],
                y=production_data['utilization_rate'],
                mode='lines+markers',
                name='设备利用率',
                line=dict(color=self.colors['info'], width=2),
                fill='tonexty'
            ),
            row=3, col=1
        )

        fig.update_layout(
            height=600,
            title_text="生产监控时间线",
            showlegend=True
        )

        fig.update_xaxes(title_text="时间", row=3, col=1)
        fig.update_yaxes(title_text="产量 (件/小时)", row=1, col=1)
        fig.update_yaxes(title_text="质量评分", row=2, col=1)
        fig.update_yaxes(title_text="利用率 (%)", row=3, col=1)

        return fig

    def create_defect_analysis(self, defect_data):
        """创建缺陷分析图表"""
        fig = make_subplots(
            rows=1, cols=2,
            subplot_titles=['缺陷类型分布', '缺陷趋势分析'],
            specs={"type": "pie"}, {"type": "bar"}
        )

        # 缺陷类型饼图
        defect_counts = defect_data.groupby('defect_type')['count'].sum()

        fig.add_trace(
            go.Pie(
                labels=defect_counts.index,
                values=defect_counts.values,
                name="缺陷分布",
                marker_colors=px.colors.qualitative.Set3
            ),
            row=1, col=1
        )

        # 缺陷趋势柱状图
        daily_defects = defect_data.groupby(['date', 'defect_type'])['count'].sum().reset_index()

        for defect_type in daily_defects['defect_type'].unique():
            type_data = daily_defects[daily_defects['defect_type'] == defect_type]
            fig.add_trace(
                go.Bar(
                    x=type_data['date'],
                    y=type_data['count'],
                    name=defect_type
                ),
                row=1, col=2
            )

        fig.update_layout(
            height=400,
            title_text="质量缺陷分析"
        )

        return fig

    def create_equipment_heatmap(self, equipment_data):
        """创建设备状态热力图"""
        # 准备热力图数据
        pivot_data = equipment_data.pivot_table(
            index='equipment_id',
            columns='hour',
            values='efficiency',
            aggfunc='mean'
        )

        fig = go.Figure(data=go.Heatmap(
            z=pivot_data.values,
            x=pivot_data.columns,
            y=pivot_data.index,
            colorscale='RdYlGn',
            text=pivot_data.values,
            texttemplate="%{text:.1%}",
            textfont={"size": 10},
            colorbar=dict(
                title="设备效率",
                tickformat=".0%"
            )
        ))

        fig.update_layout(
            title='24小时设备效率热力图',
            xaxis_title='小时',
            yaxis_title='设备ID',
            height=500
        )

        return fig

    def create_kpi_cards(self, kpi_data):
        """创建KPI卡片"""
        kpi_cards = []

        kpi_configs = [
            {
                'title': '当日产量',
                'value': kpi_data['daily_output'],
                'unit': '件',
                'target': kpi_data['daily_target'],
                'format': '{:,.0f}',
                'color': self.colors['primary']
            },
            {
                'title': 'OEE',
                'value': kpi_data['current_oee'],
                'unit': '%',
                'target': 0.85,
                'format': '{:.1%}',
                'color': self.colors['success']
            },
            {
                'title': '缺陷率',
                'value': kpi_data['defect_rate'],
                'unit': '%',
                'target': 0.02,
                'format': '{:.2%}',
                'color': self.colors['warning'],
                'reverse': True  # 越低越好
            },
            {
                'title': '设备可用率',
                'value': kpi_data['availability'],
                'unit': '%',
                'target': 0.95,
                'format': '{:.1%}',
                'color': self.colors['info']
            }
        ]

        for config in kpi_configs:
            # 计算趋势
            is_good = (config['value'] >= config['target']) if not config.get('reverse') else (config['value'] <= config['target'])
            trend_color = self.colors['success'] if is_good else self.colors['danger']
            trend_icon = '↑' if is_good else '↓'

            card_html = f"""
            <div style="
                background-color: white;
                border-left: 4px solid {config['color']};
                padding: 20px;
                margin: 10px;
                border-radius: 8px;
                box-shadow: 0 2px 4px rgba(0,0,0,0.1);
                min-width: 200px;
            ">
                <h3 style="margin: 0; color: #666; font-size: 14px;">{config['title']}</h3>
                <div style="display: flex; align-items: center; margin: 10px 0;">
                    <span style="font-size: 28px; font-weight: bold; color: {config['color']};">
                        {config['format'].format(config['value'])}
                    </span>
                    <span style="margin-left: 10px; color: {trend_color}; font-size: 20px;">
                        {trend_icon}
                    </span>
                </div>
                <div style="font-size: 12px; color: #999;">
                    目标: {config['format'].format(config['target'])}
                </div>
            </div>
            """
            kpi_cards.append(card_html)

        return kpi_cards

    def create_alert_panel(self, alert_data):
        """创建告警面板"""
        # 按严重程度分类告警
        critical_alerts = alert_data[alert_data['severity'] == 'critical']
        warning_alerts = alert_data[alert_data['severity'] == 'warning']

        alert_html = f"""
        <div style="background-color: white; padding: 20px; border-radius: 8px; box-shadow: 0 2px 4px rgba(0,0,0,0.1);">
            <h3 style="margin: 0 0 15px 0; color: #333;">实时告警</h3>

            <div style="margin-bottom: 15px;">
                <span style="background-color: {self.colors['danger']}; color: white; padding: 4px 8px; border-radius: 4px; font-size: 12px;">
                    严重告警: {len(critical_alerts)}
                </span>
                <span style="background-color: {self.colors['warning']}; color: white; padding: 4px 8px; border-radius: 4px; font-size: 12px; margin-left: 10px;">
                    警告: {len(warning_alerts)}
                </span>
            </div>
        """

        # 显示最近的告警
        recent_alerts = alert_data.head(5)
        for _, alert in recent_alerts.iterrows():
            severity_color = self.colors['danger'] if alert['severity'] == 'critical' else self.colors['warning']
            alert_html += f"""
            <div style="border-left: 3px solid {severity_color}; padding: 8px 12px; margin: 8px 0; background-color: #f8f9fa;">
                <div style="font-weight: bold; color: {severity_color};">{alert['equipment_id']}</div>
                <div style="font-size: 12px; color: #666;">{alert['message']}</div>
                <div style="font-size: 10px; color: #999;">{alert['timestamp']}</div>
            </div>
            """

        alert_html += "</div>"
        return alert_html

4.2 成本分析报表

问题：设计制造成本分析报表，包括直接成本、间接成本和成本动因分析。

参考答案：

class ManufacturingCostAnalyzer:
    def __init__(self):
        self.cost_categories = {
            'direct_material': '直接材料',
            'direct_labor': '直接人工',
            'manufacturing_overhead': '制造费用',
            'quality_cost': '质量成本',
            'maintenance_cost': '维护成本'
        }

    def analyze_cost_structure(self, cost_data):
        """分析成本结构"""
        # 按产品和成本类别汇总
        cost_summary = cost_data.groupby(['product_id', 'cost_category']).agg({
            'cost_amount': 'sum',
            'quantity': 'sum'
        }).reset_index()

        # 计算单位成本
        cost_summary['unit_cost'] = cost_summary['cost_amount'] / cost_summary['quantity']

        # 成本结构分析
        total_cost_by_product = cost_summary.groupby('product_id')['cost_amount'].sum()
        cost_structure = cost_summary.merge(
            total_cost_by_product.to_frame('total_cost'),
            left_on='product_id',
            right_index=True
        )
        cost_structure['cost_percentage'] = (
            cost_structure['cost_amount'] / cost_structure['total_cost'] * 100
        )

        return cost_structure

    def create_cost_waterfall(self, cost_breakdown):
        """创建成本瀑布图"""
        categories = list(cost_breakdown.keys())
        values = list(cost_breakdown.values())

        # 计算累积值
        cumulative = np.cumsum([0] + values[:-1])

        fig = go.Figure()

        # 添加起始柱
        fig.add_trace(go.Bar(
            name='成本组成',
            x=categories,
            y=values,
            base=cumulative,
            marker_color=['lightblue' if v > 0 else 'lightcoral' for v in values]
        ))

        # 添加连接线
        for i in range(len(categories)-1):
            fig.add_shape(
                type="line",
                x0=i+0.4, y0=cumulative[i+1],
                x1=i+0.6, y1=cumulative[i+1],
                line=dict(color="gray", width=1, dash="dash")
            )

        fig.update_layout(
            title='产品成本结构瀑布图',
            xaxis_title='成本类别',
            yaxis_title='成本金额 (元)',
            showlegend=False,
            height=500
        )

        return fig

    def analyze_cost_drivers(self, production_data, cost_data):
        """成本动因分析"""
        from scipy.stats import pearsonr

        # 合并生产和成本数据
        merged_data = pd.merge(production_data, cost_data, on=['batch_id', 'date'])

        # 计算相关性
        cost_drivers = {
            'production_volume': '生产量',
            'machine_hours': '机器工时',
            'labor_hours': '人工工时',
            'defect_rate': '缺陷率',
            'setup_time': '调机时间',
            'material_waste': '材料浪费'
        }

        correlation_results = {}

        for driver, driver_name in cost_drivers.items():
            if driver in merged_data.columns:
                correlation, p_value = pearsonr(
                    merged_data[driver],
                    merged_data['total_cost']
                )

                correlation_results[driver_name] = {
                    'correlation': correlation,
                    'p_value': p_value,
                    'significance': 'significant' if p_value < 0.05 else 'not_significant'
                }

        return correlation_results

    def create_cost_trend_analysis(self, historical_cost_data):
        """成本趋势分析"""
        fig = make_subplots(
            rows=2, cols=2,
            subplot_titles=['总成本趋势', '单位成本趋势', '成本构成变化', '成本波动分析'],
            specs=[[{"secondary_y": True}, {"secondary_y": False}],
                   [{"type": "pie"}, {"type": "box"}]]
        )

        # 总成本趋势
        monthly_cost = historical_cost_data.groupby('month').agg({
            'total_cost': 'sum',
            'production_volume': 'sum'
        }).reset_index()

        fig.add_trace(
            go.Scatter(
                x=monthly_cost['month'],
                y=monthly_cost['total_cost'],
                mode='lines+markers',
                name='总成本',
                line=dict(color='blue', width=2)
            ),
            row=1, col=1
        )

        # 添加生产量到次坐标轴
        fig.add_trace(
            go.Scatter(
                x=monthly_cost['month'],
                y=monthly_cost['production_volume'],
                mode='lines+markers',
                name='生产量',
                line=dict(color='red', width=2),
                yaxis='y2'
            ),
            row=1, col=1
        )

        # 单位成本趋势
        monthly_cost['unit_cost'] = monthly_cost['total_cost'] / monthly_cost['production_volume']
        fig.add_trace(
            go.Scatter(
                x=monthly_cost['month'],
                y=monthly_cost['unit_cost'],
                mode='lines+markers',
                name='单位成本',
                line=dict(color='green', width=2)
            ),
            row=1, col=2
        )

        # 成本构成饼图（最新月份）
        latest_month_data = historical_cost_data[
            historical_cost_data['month'] == historical_cost_data['month'].max()
        ]
        cost_composition = latest_month_data.groupby('cost_category')['cost_amount'].sum()

        fig.add_trace(
            go.Pie(
                labels=cost_composition.index,
                values=cost_composition.values,
                name="成本构成"
            ),
            row=2, col=1
        )

        # 成本波动箱线图
        for category in historical_cost_data['cost_category'].unique():
            category_data = historical_cost_data[
                historical_cost_data['cost_category'] == category
            ]
            fig.add_trace(
                go.Box(
                    y=category_data['cost_amount'],
                    name=category,
                    boxpoints='outliers'
                ),
                row=2, col=2
            )

        fig.update_layout(
            height=800,
            title_text="制造成本综合分析",
            showlegend=True
        )

        return fig

5. 数据产品经理 - 制造业

工业4.0产品设计

5.1 智能制造平台产品设计

问题：设计一个智能制造数据平台的产品架构和核心功能模块。

参考答案：

class SmartManufacturingPlatform:
    def __init__(self):
        self.modules = {
            'data_collection': '数据采集模块',
            'real_time_monitoring': '实时监控模块',
            'predictive_analytics': '预测分析模块',
            'quality_management': '质量管理模块',
            'production_planning': '生产计划模块',
            'maintenance_management': '维护管理模块',
            'energy_management': '能源管理模块',
            'supply_chain': '供应链模块'
        }

    def define_product_requirements(self):
        """定义产品需求"""
        requirements = {
            'functional_requirements': {
                'real_time_data_processing': {
                    'description': '实时处理来自生产线的传感器数据',
                    'performance_criteria': {
                        'latency': '< 100ms',
                        'throughput': '> 10万条/秒',
                        'availability': '99.9%'
                    },
                    'user_stories': [
                        '作为生产经理，我希望实时看到所有设备的运行状态',
                        '作为质量工程师，我希望及时发现质量异常'
                    ]
                },
                'predictive_maintenance': {
                    'description': '基于设备数据预测维护需求',
                    'performance_criteria': {
                        'prediction_accuracy': '> 85%',
                        'false_positive_rate': '< 10%',
                        'prediction_horizon': '7-30天'
                    },
                    'user_stories': [
                        '作为维护工程师，我希望提前知道哪些设备需要维护',
                        '作为成本管理人员，我希望优化维护成本'
                    ]
                },
                'production_optimization': {
                    'description': '优化生产计划和工艺参数',
                    'performance_criteria': {
                        'oee_improvement': '> 5%',
                        'cost_reduction': '> 3%',
                        'optimization_time': '< 1小时'
                    },
                    'user_stories': [
                        '作为生产计划员，我希望系统推荐最优的生产计划',
                        '作为工艺工程师，我希望找到最佳工艺参数'
                    ]
                }
            },
            'non_functional_requirements': {
                'scalability': '支持1000+设备并发接入',
                'security': '符合工业网络安全标准',
                'usability': '普通操作员30分钟内可掌握基本操作',
                'integration': '支持主流MES/ERP系统集成'
            }
        }

        return requirements

    def design_data_architecture(self):
        """设计数据架构"""
        architecture = {
            'data_sources': {
                'real_time_sensors': {
                    'types': ['温度', '压力', '振动', '电流', '流量'],
                    'frequency': '1-10秒',
                    'protocols': ['OPC UA', 'Modbus', 'MQTT']
                },
                'manufacturing_systems': {
                    'mes': '制造执行系统',
                    'erp': '企业资源计划',
                    'scada': '数据采集与监控',
                    'qms': '质量管理系统'
                },
                'external_data': {
                    'weather': '天气数据',
                    'supply_chain': '供应链数据',
                    'market': '市场需求数据'
                }
            },
            'data_processing_layers': {
                'edge_computing': {
                    'purpose': '边缘设备数据预处理',
                    'technologies': ['EdgeX Foundry', 'Azure IoT Edge'],
                    'functions': ['数据过滤', '本地存储', '初步分析']
                },
                'stream_processing': {
                    'purpose': '实时数据流处理',
                    'technologies': ['Apache Kafka', 'Apache Flink'],
                    'functions': ['数据清洗', '实时计算', '异常检测']
                },
                'batch_processing': {
                    'purpose': '历史数据批量处理',
                    'technologies': ['Apache Spark', 'Hadoop'],
                    'functions': ['复杂分析', '机器学习训练', '报表生成']
                }
            },
            'data_storage': {
                'time_series_db': {
                    'technology': 'InfluxDB',
                    'use_case': '传感器时序数据'
                },
                'relational_db': {
                    'technology': 'PostgreSQL',
                    'use_case': '业务主数据'
                },
                'document_db': {
                    'technology': 'MongoDB',
                    'use_case': '非结构化数据'
                },
                'data_lake': {
                    'technology': 'Hadoop HDFS',
                    'use_case': '原始数据存档'
                }
            }
        }

        return architecture

    def design_user_interface(self):
        """设计用户界面"""
        ui_design = {
            'dashboard_layout': {
                'executive_dashboard': {
                    'target_users': ['工厂经理', '生产总监'],
                    'key_metrics': ['整体OEE', '日产量', '质量指标', '成本指标'],
                    'update_frequency': '15分钟',
                    'visualizations': ['KPI卡片', '趋势图', '状态指示器']
                },
                'operator_dashboard': {
                    'target_users': ['生产操作员', '班组长'],
                    'key_metrics': ['设备状态', '当前产量', '质量状态', '告警信息'],
                    'update_frequency': '实时',
                    'visualizations': ['设备状态图', '实时曲线', '告警列表']
                },
                'maintenance_dashboard': {
                    'target_users': ['维护工程师', '设备管理员'],
                    'key_metrics': ['设备健康度', '维护计划', '故障预测', '备件库存'],
                    'update_frequency': '1小时',
                    'visualizations': ['设备健康热力图', '维护甘特图', '预测曲线']
                }
            },
            'mobile_interface': {
                'features': ['移动告警', '现场数据录入', '设备状态查询', '工单管理'],
                'supported_platforms': ['iOS', 'Android', 'Web App'],
                'offline_capabilities': ['基础数据查看', '离线数据录入']
            },
            'customization_options': {
                'dashboard_personalization': '用户可自定义仪表板布局',
                'alert_preferences': '个性化告警设置',
                'report_templates': '自定义报表模板',
                'role_based_access': '基于角色的权限控制'
            }
        }

        return ui_design

    def define_success_metrics(self):
        """定义成功指标"""
        metrics = {
            'business_metrics': {
                'operational_efficiency': {
                    'oee_improvement': {
                        'baseline': '75%',
                        'target': '85%',
                        'measurement_period': '6个月'
                    },
                    'downtime_reduction': {
                        'baseline': '20%',
                        'target': '10%',
                        'measurement_period': '6个月'
                    }
                },
                'cost_optimization': {
                    'maintenance_cost_reduction': {
                        'target': '15%',
                        'measurement_period': '12个月'
                    },
                    'energy_cost_reduction': {
                        'target': '10%',
                        'measurement_period': '12个月'
                    }
                },
                'quality_improvement': {
                    'defect_rate_reduction': {
                        'baseline': '2%',
                        'target': '1%',
                        'measurement_period': '6个月'
                    }
                }
            },
            'technical_metrics': {
                'system_performance': {
                    'data_processing_latency': '< 100ms',
                    'system_availability': '> 99.5%',
                    'concurrent_users': '> 500'
                },
                'data_quality': {
                    'data_completeness': '> 95%',
                    'data_accuracy': '> 98%',
                    'data_timeliness': '< 5分钟延迟'
                }
            },
            'user_adoption_metrics': {
                'user_engagement': {
                    'daily_active_users': '目标80%使用率',
                    'feature_adoption': '核心功能50%+使用率',
                    'user_satisfaction': 'NPS > 50'
                },
                'training_effectiveness': {
                    'time_to_productivity': '< 1周',
                    'training_completion_rate': '> 90%',
                    'certification_pass_rate': '> 85%'
                }
            }
        }

        return metrics

    def create_roadmap(self):
        """创建产品路线图"""
        roadmap = {
            'phase_1_foundation': {
                'duration': '3个月',
                'objectives': ['基础数据采集', '实时监控', '基础报表'],
                'deliverables': [
                    '数据采集平台',
                    '实时监控仪表板',
                    '基础告警系统',
                    '用户权限管理'
                ],
                'success_criteria': [
                    '支持100台设备接入',
                    '实现99%数据采集率',
                    '基础监控功能可用'
                ]
            },
            'phase_2_intelligence': {
                'duration': '4个月',
                'objectives': ['预测分析', '智能告警', '移动应用'],
                'deliverables': [
                    '预测性维护模型',
                    '智能异常检测',
                    '移动端应用',
                    '高级报表系统'
                ],
                'success_criteria': [
                    '预测准确率>80%',
                    '误报率<15%',
                    '移动应用上线'
                ]
            },
            'phase_3_optimization': {
                'duration': '5个月',
                'objectives': ['生产优化', '供应链集成', '高级分析'],
                'deliverables': [
                    '生产优化引擎',
                    '供应链可视化',
                    '高级分析工具',
                    'API开放平台'
                ],
                'success_criteria': [
                    'OEE提升5%',
                    '支持第三方集成',
                    '完整API文档'
                ]
            },
            'phase_4_scale': {
                'duration': '持续',
                'objectives': ['规模化部署', '生态建设', '持续优化'],
                'deliverables': [
                    '多工厂部署',
                    '合作伙伴生态',
                    '持续学习机制',
                    '行业解决方案'
                ],
                'success_criteria': [
                    '支持10+工厂',
                    '建立合作伙伴网络',
                    '形成行业标杆案例'
                ]
            }
        }

        return roadmap

5.2 产品需求文档(PRD)模板

问题：为制造业数据产品编写一份完整的PRD文档。

参考答案：

# 智能制造数据平台 - 产品需求文档 (PRD)

## 1. 产品概述

### 1.1 产品定位
智能制造数据平台是面向制造企业的工业4.0数字化转型解决方案，通过集成IoT数据采集、实时分析、预测性维护、生产优化等功能，帮助制造企业提升生产效率、降低运营成本、改善产品质量。

### 1.2 目标用户
- **主要用户**：制造业企业（年收入1-100亿规模）
- **使用角色**：
  - 生产经理/工厂经理
  - 生产操作员/班组长
  - 维护工程师/设备管理员
  - 质量工程师/质量经理
  - IT管理员/数据分析师

### 1.3 核心价值主张
- **提升效率**：通过实时监控和优化算法提升OEE 5-15%
- **降低成本**：预测性维护降低维护成本10-30%
- **改善质量**：智能质量控制降低缺陷率50%+
- **增强可视化**：统一数据视图提升决策效率

## 2. 市场分析

### 2.1 市场规模
- 全球智能制造市场规模：2024年3000亿美元，年增长率12%
- 中国工业4.0市场：2024年1200亿人民币，年增长率15%
- 目标市场：中型制造企业数字化改造需求

### 2.2 竞争分析
| 竞争对手 | 优势 | 劣势 | 差异化策略 |
|---------|------|------|-----------|
| 西门子MindSphere | 品牌知名度高 | 价格昂贵，定制复杂 | 标准化产品，快速部署 |
| GE Predix | 技术先进 | 已停止发展 | 持续创新，开放生态 |
| 本土厂商 | 本地化服务 | 技术相对落后 | 技术领先，服务优质 |

## 3. 功能需求

### 3.1 核心功能模块

#### 3.1.1 数据采集模块
**功能描述**：支持多种工业协议的设备数据采集

**详细需求**：
- 支持协议：OPC UA, Modbus TCP/RTU, MQTT, HTTP/REST API
- 采集频率：1秒-1小时可配置
- 设备容量：单实例支持1000+设备并发
- 数据类型：数值、文本、状态、告警等
- 边缘计算：支持边缘预处理和本地存储

**验收标准**：
- [ ] 支持5种以上工业协议
- [ ] 数据采集成功率>99%
- [ ] 支持设备自动发现和配置
- [ ] 提供设备连接状态监控

#### 3.1.2 实时监控模块
**功能描述**：提供生产过程实时监控和可视化

**详细需求**：
- 实时仪表板：设备状态、生产指标、质量数据
- 告警管理：阈值告警、趋势告警、智能告警
- 数据更新：关键指标实时更新，其他1分钟刷新
- 自定义视图：用户可自定义监控布局
- 移动支持：响应式设计，支持移动设备访问

**验收标准**：
- [ ] 数据延迟<3秒
- [ ] 支持1000+并发用户
- [ ] 提供15+预设仪表板模板
- [ ] 告警响应时间<10秒

#### 3.1.3 预测分析模块
**功能描述**：基于机器学习的预测性维护和生产优化

**详细需求**：
- 预测性维护：设备故障预测、维护计划优化
- 质量预测：产品质量预测、工艺参数优化
- 生产预测：产量预测、需求预测
- 模型管理：模型训练、评估、部署、监控
- 算法库：回归、分类、聚类、时序分析等

**验收标准**：
- [ ] 设备故障预测准确率>85%
- [ ] 预测提前期7-30天
- [ ] 支持10+机器学习算法
- [ ] 模型自动更新和监控

### 3.2 用户界面需求

#### 3.2.1 Web界面
- **技术要求**：响应式设计，支持Chrome、Firefox、Safari、Edge
- **性能要求**：页面加载时间<3秒，操作响应时间<1秒
- **可用性要求**：新用户30分钟内掌握基本操作

#### 3.2.2 移动应用
- **平台支持**：iOS 12+, Android 8+, 微信小程序
- **核心功能**：设备监控、告警推送、数据查询、工单管理
- **离线功能**：基础数据查看、离线表单录入

## 4. 非功能性需求

### 4.1 性能需求
- **响应时间**：Web界面<3秒，API调用<1秒
- **并发能力**：支持500+并发用户
- **数据处理**：每秒处理10万条传感器数据
- **存储容量**：支持TB级历史数据存储

### 4.2 可靠性需求
- **系统可用性**：99.5%
- **数据完整性**：99.9%
- **故障恢复**：RTO<4小时，RPO<1小时
- **备份策略**：每日自动备份，异地备份

### 4.3 安全需求
- **身份认证**：支持LDAP、SSO集成
- **权限控制**：基于角色的访问控制(RBAC)
- **数据加密**：传输加密(TLS)、存储加密(AES-256)
- **审计日志**：完整的用户操作日志记录

### 4.4 集成需求
- **ERP集成**：SAP、Oracle、用友、金蝶等主流ERP
- **MES集成**：支持主流MES系统数据交换
- **API接口**：RESTful API，支持第三方系统集成
- **数据导入导出**：Excel、CSV、数据库直连等

## 5. 技术架构

### 5.1 系统架构

前端层：Web界面 + 移动应用应用层：业务逻辑 + API网关服务层：微服务架构（数据采集、分析、告警等）数据层：时序数据库 + 关系数据库 + 缓存基础设施：容器化部署 + 云平台

### 5.2 技术选型
- **前端**：React + TypeScript + Ant Design
- **后端**：Java Spring Boot + Python Flask
- **数据库**：InfluxDB + PostgreSQL + Redis
- **消息队列**：Apache Kafka
- **容器化**：Docker + Kubernetes
- **监控**：Prometheus + Grafana

## 6. 项目规划

### 6.1 开发计划
| 阶段 | 时间 | 主要功能 | 交付物 |
|-----|------|---------|--------|
| 阶段1 | 3个月 | 数据采集、基础监控 | MVP版本 |
| 阶段2 | 4个月 | 预测分析、移动应用 | V1.0版本 |
| 阶段3 | 5个月 | 高级功能、集成优化 | V2.0版本 |

### 6.2 资源需求
- **开发团队**：15-20人（前端3人、后端6人、算法3人、测试3人、产品2人、UI/UX 2人）
- **基础设施**：云服务器、开发环境、测试环境
- **预算估算**：人力成本400-500万/年，基础设施成本50-100万/年

## 7. 风险与应对

### 7.1 技术风险
- **数据接入复杂性**：工业协议多样，设备型号众多
  - 应对策略：建立设备兼容性测试实验室，与设备厂商深度合作

- **实时性能要求**：大量数据的实时处理和分析
  - 应对策略：采用流处理架构，边缘计算预处理

### 7.2 市场风险
- **客户接受度**：传统制造业数字化转型意愿和能力
  - 应对策略：提供完整的数字化转型咨询和培训服务

- **竞争加剧**：大厂进入市场，价格战风险
  - 应对策略：专注垂直领域，提供差异化价值

## 8. 成功指标

### 8.1 业务指标
- 客户数量：首年获得50+客户
- 收入目标：首年收入5000万+
- 客户满意度：NPS>50
- 续约率：>80%

### 8.2 产品指标
- 平台稳定性：可用性>99.5%
- 用户活跃度：DAU>80%
- 功能完成度：100%按时交付
- 缺陷率：<1‰

### 8.3 技术指标
- 性能达标率：100%满足性能需求
- 安全合规：100%通过安全审计
- 集成成功率：>95%
- 数据准确性：>99%

---

## 附录

### A. 术语表
- **OEE**：Overall Equipment Effectiveness，整体设备效率
- **IoT**：Internet of Things，物联网
- **MES**：Manufacturing Execution System，制造执行系统
- **SCADA**：Supervisory Control and Data Acquisition，数据采集与监控系统

### B. 参考资料
- 工业4.0白皮书
- 智能制造技术标准
- 数据安全法规要求
- 行业最佳实践案例

总结

制造业数据岗位具有以下特点：

技术融合性强：需要掌握IT技术和OT（运营技术）知识
实时性要求高：生产过程监控和控制需要实时响应
可靠性要求严格：系统故障可能导致生产停机和安全事故
领域知识重要：需要深入理解制造工艺和业务流程
成本敏感：ROI考核严格，需要量化业务价值

制造业数据人才应具备：

扎实的数据技术基础
工业领域知识
系统思维和问题解决能力
跨部门协作能力
持续学习新技术的能力

制造业正在向智能制造转型，为数据专业人士提供了广阔的发展机会和挑战。

本文节选自数据从业者全栈知识库。知识库包含 2300+ 篇体系化技术文档，覆盖数据分析、数据工程、数据治理、AI 等全栈领域。了解更多 ->

更多文章

概述

制造业数据工作特点

核心技术栈

1. 数据分析师 - 制造业

基础能力考察

1.1 制造业务理解

1.2 生产效率分析

高级应用场景

1.3 供应链风险分析

1.4 质量异常根因分析

2. 数据科学家 - 制造业

机器学习应用

2.1 预测性维护建模

2.2 工艺参数优化

3. 数据工程师 - 制造业

工业数据架构

3.1 IoT数据采集架构

3.2 数据仓库设计

4. BI分析师 - 制造业

制造业BI解决方案

4.1 生产监控仪表板设计

4.2 成本分析报表

5. 数据产品经理 - 制造业

工业4.0产品设计

5.1 智能制造平台产品设计

5.2 产品需求文档(PRD)模板

总结

加入免费社群

成为会员

1v1 咨询

制造业数据分析师面试题库：MES/ERP/SCADA系统解析与OEE建模

更多文章

概述

制造业数据工作特点

核心技术栈

1. 数据分析师 - 制造业

基础能力考察

1.1 制造业务理解

1.2 生产效率分析

高级应用场景

1.3 供应链风险分析

1.4 质量异常根因分析

2. 数据科学家 - 制造业

机器学习应用

2.1 预测性维护建模

2.2 工艺参数优化

3. 数据工程师 - 制造业

工业数据架构

3.1 IoT数据采集架构

3.2 数据仓库设计

4. BI分析师 - 制造业

制造业BI解决方案

4.1 生产监控仪表板设计

4.2 成本分析报表

5. 数据产品经理 - 制造业

工业4.0产品设计

5.1 智能制造平台产品设计

5.2 产品需求文档(PRD)模板

总结

加入免费社群

成为会员

1v1 咨询

相关文章

学了 30 个数据教程，为什么面试还是讲不清一个项目？

面试官问项目细节，90% 的人输在证据链

面试官问数据质量，别只背 3 类规则